Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add buffered reader #4018

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

benjaminhuth
Copy link
Member

@benjaminhuth benjaminhuth commented Jan 10, 2025

This reader can wrap an exisiting reader, preload some events in a buffer, and randomly picks events upon execution in the sequencer. Should help to mitigate I/O bottlenecks in throughput measurements.

--- END COMMIT MESSAGE ---

Any further description goes here, @-mentions are ok here!

  • Use a conventional commits prefix: quick summary
    • We mostly use feat, fix, refactor, docs, chore and build types.
  • A milestone will be assigned by one of the maintainers

Summary by CodeRabbit

  • New Features

    • Introduced BufferedReader for enhanced event data handling with buffering capabilities.
    • Added data sharing functionality for WhiteBoard.
  • Improvements

    • Implemented move semantics for WhiteBoard.
    • Added Python interface support for BufferedReader.
  • Tests

    • Added new test case for BufferedReader functionality.

Copy link

coderabbitai bot commented Jan 10, 2025

Walkthrough

A new BufferedReader component introduced to the ACTS Examples Framework, it is. Designed to enhance event data handling through buffered reading, the implementation spans multiple files. A new class implementing the IReader interface, with capabilities for configuring event sampling, buffer management, and downstream reader integration, it provides.

Changes

File Change Summary
Examples/Framework/CMakeLists.txt Added BufferedReader.cpp to build configuration
Examples/Framework/include/ActsExamples/Framework/BufferedReader.hpp New header defining BufferedReader class with configuration and reading methods
Examples/Framework/include/ActsExamples/Framework/SequenceElement.hpp Added BufferedReader as a friend class
Examples/Framework/include/ActsExamples/Framework/WhiteBoard.hpp Added move semantics and shareDataWith method
Examples/Framework/src/Framework/BufferedReader.cpp Implementation of BufferedReader class methods
Examples/Framework/src/Framework/WhiteBoard.cpp Added shareDataWith method implementation
Examples/Python/src/Input.cpp Integrated BufferedReader into Python interface
Examples/Python/tests/test_reader.py Added test for BufferedReader functionality

Sequence Diagram

sequenceDiagram
    participant DR as Downstream Reader
    participant BR as BufferedReader
    participant ES as Event Store
    
    DR->>BR: Provide Events
    BR->>BR: Buffer Events
    BR->>BR: Sample Random Event
    BR->>ES: Share Event Data
Loading

Possibly related PRs

Suggested labels

automerge

Suggested reviewers

  • paulgessinger

Poem

Buffered bytes flow like the Force, 🌟
Events dance in memory's course, 🎲
Random seeds of wisdom sown, 🌱
Through readers, knowledge has grown! 📚
Wisdom in data, calm and bright ✨

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added this to the next milestone Jan 10, 2025
@github-actions github-actions bot added the Component - Examples Affects the Examples module label Jan 10, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
Examples/Framework/include/ActsExamples/Framework/BufferedReader.hpp (1)

32-34: Document the impact of selectionSeed, we should.

The purpose and impact of selectionSeed on event sampling, explain we must. Help future maintainers understand the randomization process, this will.

Examples/Framework/src/Framework/WhiteBoard.cpp (1)

88-97: Exception safety improve, we must.

If exception thrown during sharing, partial state possible it is. Consider:

  • Transaction-like approach for atomic operations
  • Document sharing semantics (deep vs shallow)
  • Roll back on failure
 void ActsExamples::WhiteBoard::shareDataWith(WhiteBoard &other) const {
+    // Create temporary map for atomic operation
+    std::map<std::string, std::any> temp;
     for (auto &[key, val] : m_store) {
-        auto [it, success] = other.m_store.insert({key, val});
+        auto [it, success] = temp.insert({key, val});
         if (!success) {
             throw std::runtime_error("Cannot share key '" + key +
                                    "', is already present");
         }
     }
+    // Atomic operation
+    other.m_store.insert(temp.begin(), temp.end());
 }
Examples/Python/tests/test_reader.py (1)

440-483: Good start with testing, but more coverage needed, I sense!

Well structured, the basic test is. But additional test cases, beneficial they would be:

  • Different buffer sizes, test you should
  • Error handling for invalid configurations, verify you must
  • Behavior with empty files, understand we need
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6eb834 and ffcc912.

📒 Files selected for processing (8)
  • Examples/Framework/CMakeLists.txt (1 hunks)
  • Examples/Framework/include/ActsExamples/Framework/BufferedReader.hpp (1 hunks)
  • Examples/Framework/include/ActsExamples/Framework/SequenceElement.hpp (1 hunks)
  • Examples/Framework/include/ActsExamples/Framework/WhiteBoard.hpp (1 hunks)
  • Examples/Framework/src/Framework/BufferedReader.cpp (1 hunks)
  • Examples/Framework/src/Framework/WhiteBoard.cpp (1 hunks)
  • Examples/Python/src/Input.cpp (2 hunks)
  • Examples/Python/tests/test_reader.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: macos
  • GitHub Check: missing_includes
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: linux_ubuntu_extra (ubuntu2204, 20)
  • GitHub Check: linux_ubuntu
  • GitHub Check: build_debug
🔇 Additional comments (7)
Examples/Framework/include/ActsExamples/Framework/SequenceElement.hpp (1)

54-54: Careful with friend declarations, we must be! Hmmmm.

Access to private members through friendship, this grants. A powerful tool it is, but use it wisely we must. Consider documenting why this friendship necessary it is.

Examples/Framework/include/ActsExamples/Framework/BufferedReader.hpp (1)

59-63: Empty initialize and finalize methods, concerning they are.

Consider if initialization or cleanup needed it is. At minimum, document why these methods empty they are.

Examples/Framework/src/Framework/BufferedReader.cpp (1)

43-51: Memory management concerns, I sense.

Reserve buffer size upfront, good practice it is. But consider:

  • Large buffer sizes, out of memory errors cause they might
  • Progress logging for large buffers, helpful it would be
  • Error handling for failed reads, improve we should
Examples/Python/src/Input.cpp (1)

11-11: Approve the registration of BufferedReader, I do!

Properly declared, the new BufferedReader is. Follow the established pattern for reader declarations, it does. Clear and well-structured, the parameters are.

Also applies to: 43-46

Examples/Framework/include/ActsExamples/Framework/WhiteBoard.hpp (2)

44-45: Wise addition of move semantics, this is!

Efficient transfer of WhiteBoard resources, it enables. Default implementations, sufficient they are, given the member types we have.


49-54: Elegant solution for data sharing, you have created!

Share data between WhiteBoard instances with minimal overhead, this method does. Well documented, the behavior is. Through shared pointers, efficiency achieved it has.

Examples/Framework/CMakeLists.txt (1)

18-18: Correctly integrated into build system, the new source file is!

Follow project conventions for source organization, it does. Simple yet effective, the change is.

Copy link

github-actions bot commented Jan 10, 2025

📊: Physics performance monitoring for dec9e86

Full contents

physmon summary

Copy link
Member

@paulgessinger paulgessinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// shared pointers.
/// Throws an exception if the other whiteboard already contains one of
/// the keys in this whiteboard.
void shareDataWith(WhiteBoard& other) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of taste but I would personally implement this the other way round, where the method is called on the receiver of the contents and gets the source as an argument.

public:
struct Config {
/// The downstream reader that should be used
std::shared_ptr<IReader> downstreamReader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but in my mind, the source reader is upstream of the buffered reader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah right, now where I think of it this would make more sense


/// Buffer size. The reader will throw and exception if the downstream
/// reader does not provide enough events
std::size_t bufferSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not guaranteed to be initialized. Should we default this to some (low-ish) number?

Also maybe add a comment that this is cached in memory and therefore has memory footprint implications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I think it should be defaulted to 1.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
Examples/Python/tests/test_reader.py (2)

447-460: Hmm. Verify the event count for data generation, we must.

Only eventsInBuffer events are written, but eventsToProcess events we expect to read. Document this intentional design choice in comments, we should, to prevent confusion for future padawans.

 # need to write out some particles first
+# Note: We intentionally write fewer events than we process
+# to demonstrate the BufferedReader's resampling capability
 eventsInBuffer = 5
 eventsToProcess = 10

479-486: Incomplete, the test coverage is. Additional assertions, we need.

Only verify the number of events seen, we do. Add assertions to verify the content of resampled events and proper buffer management, we should.

     s2.run()
 
     assert alg.events_seen == eventsToProcess
+    # Verify that events are properly resampled from buffer
+    events = [ev.particles for ev in s2.events()]
+    assert len(set(str(ev) for ev in events)) <= eventsInBuffer
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffcc912 and a83827e.

📒 Files selected for processing (1)
  • Examples/Python/tests/test_reader.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: benjaminhuth
PR: acts-project/acts#4018
File: Examples/Framework/include/ActsExamples/Framework/BufferedReader.hpp:51-54
Timestamp: 2025-01-10T10:35:16.967Z
Learning: The BufferedReader in ActsExamples is designed to serve an unlimited number of events by reusing/resampling from its fixed-size buffer, which helps reduce I/O overhead in timing measurements. Therefore, its availableEvents() correctly returns std::numeric_limits<std::size_t>::max() as the upper bound.
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: linux_physmon
  • GitHub Check: linux_examples_test
  • GitHub Check: missing_includes
  • GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
  • GitHub Check: build_debug
🔇 Additional comments (2)
Examples/Python/tests/test_reader.py (2)

440-446: Clear and descriptive test setup, it is!

Well-defined variables for buffer size and total events to process, I see. Good practice for test readability and maintainability, this is.


471-477: Strong with the Force, this configuration is!

Correctly configured, the BufferedReader is. Match the retrieved learning about unlimited event serving through resampling, it does.

Comment on lines +440 to +486
@pytest.mark.root
def test_buffered_reader(tmp_path, conf_const, ptcl_gun):
# Test the buffered reader with the ROOT particle reader
# need to write out some particles first
eventsInBuffer = 5
eventsToProcess = 10

s = Sequencer(numThreads=1, events=eventsInBuffer, logLevel=acts.logging.WARNING)
evGen = ptcl_gun(s)

file = tmp_path / "particles.root"
s.addWriter(
conf_const(
RootParticleWriter,
acts.logging.WARNING,
inputParticles=evGen.config.outputParticles,
filePath=str(file),
)
)

s.run()

# reset sequencer for reading
s2 = Sequencer(events=eventsToProcess, numThreads=1, logLevel=acts.logging.WARNING)

reader = acts.examples.RootParticleReader(
level=acts.logging.WARNING,
outputParticles="particles_input",
filePath=str(file),
)

s2.addReader(
acts.examples.BufferedReader(
level=acts.logging.WARNING,
downstreamReader=reader,
bufferSize=eventsInBuffer,
)
)

alg = AssertCollectionExistsAlg(
"particles_input", "check_alg", acts.logging.WARNING
)
s2.addAlgorithm(alg)

s2.run()

assert alg.events_seen == eventsToProcess
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Missing edge cases, I sense in this test.

Test additional scenarios, we must:

  1. Buffer size of 1
  2. Buffer size equal to events to process
  3. Error cases for invalid buffer sizes

Help you write these additional test cases, shall I?

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Examples Affects the Examples module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants