test: introduce the new MockJsonHandler and MockParquetHandler #678

Open
wants to merge 4 commits into
base: main
from

Conversation

sebastiantia
Collaborator

@sebastiantia sebastiantia commented Feb 5, 2025

What changes are proposed in this pull request?

This PR introduces the MockEngine, which returns the new MockJsonHandler and MockParquetHandler.

These mock handlers enable functional testing without relying on real implementations of the Engine trait that perform actual file I/O. Instead, they simulate file reads in a controlled manner, ensuring tests remain isolated and predictable.

Both handlers extend a generic MockHandler, which:

  • Maintains an expected queue of file read operations (i.e., which files should be read, with what schema, and using which predicate).
  • Enforces order validation, ensuring that file reads occur as expected.
  • Returns predefined results for each file read operation, eliminating reliance on actual files.
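
The three responsibilities listed above can be illustrated with a minimal sketch. This is not the code from this PR: `SketchMockHandler`, `ExpectedRead`, `Batch`, `expect_read`, `read_files`, and `all_reads_consumed` are hypothetical names, and strings and integers stand in for the kernel's real file, schema, predicate, and batch types.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for the parameters of one file-read call.
#[derive(Debug, Clone, PartialEq)]
struct ExpectedRead {
    files: Vec<String>,
    schema: String,
    predicate: Option<String>,
}

/// Hypothetical stand-in for a batch of data returned by a handler.
type Batch = Vec<i64>;

/// Queue-backed mock: each expected call is paired with its canned result.
struct SketchMockHandler {
    expected: Mutex<VecDeque<(ExpectedRead, Result<Vec<Batch>, String>)>>,
}

impl SketchMockHandler {
    fn new() -> Self {
        Self { expected: Mutex::new(VecDeque::new()) }
    }

    /// Register the next expected call and the result it should produce.
    fn expect_read(&self, params: ExpectedRead, result: Result<Vec<Batch>, String>) {
        self.expected.lock().unwrap().push_back((params, result));
    }

    /// Simulate a read: the call must match the front of the queue.
    /// The front entry is consumed even when the call does not match.
    fn read_files(&self, actual: &ExpectedRead) -> Result<Vec<Batch>, String> {
        let next = self.expected.lock().unwrap().pop_front();
        match next {
            None => Err(format!("unexpected read: {actual:?}")),
            Some((expected, result)) => {
                if expected != *actual {
                    Err(format!("expected {expected:?}, got {actual:?}"))
                } else {
                    result
                }
            }
        }
    }

    /// True once every expected read has been consumed.
    fn all_reads_consumed(&self) -> bool {
        self.expected.lock().unwrap().is_empty()
    }
}
```

A test would register its expectations up front, run the code under test, and finally check `all_reads_consumed()` so that a missing read is caught as well as an unexpected one.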

This addition aims to improve our current testing utilities. Tests that use LocalMockTable are candidates for replacement, because LocalMockTable performs its operations in a temporary directory and so introduces an unnecessary dependency on file I/O.

How was this change tested?

This PR includes an example of the handlers' usage: test_log_replay, a unit test for a core piece of kernel functionality. The test:

  • asserts that the correct commit files are read in reversed order with the correct schema and predicate
  • asserts that the correct checkpoint files are read with the correct schema and predicate
  • asserts that the batches read from the files are mapped with the appropriate is_log_batch flag
  • asserts that the batches read from the files are chained and returned
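
The ordering and flagging behaviour these assertions target can be sketched in isolation. This is an illustration under assumptions, not the kernel's implementation: `replay_order` is a hypothetical function, strings stand in for data batches, and the choice of `true` for commit batches and `false` for checkpoint batches is an assumed reading of the is_log_batch flag.

```rust
/// Sketch of the replay order: commit batches newest-first and flagged
/// `true`, followed by checkpoint batches flagged `false`.
///
/// `commit_batches` holds one Vec of batches per commit file, in ascending
/// version order (hypothetical layout chosen for this example).
fn replay_order(
    commit_batches: Vec<Vec<String>>,
    checkpoint_batches: Vec<String>,
) -> impl Iterator<Item = (String, bool)> {
    // Reverse the commit files, but keep batch order within each file.
    let commits = commit_batches
        .into_iter()
        .rev()
        .flatten()
        .map(|batch| (batch, true));
    let checkpoints = checkpoint_batches.into_iter().map(|batch| (batch, false));
    // Chain the two streams into a single iterator.
    commits.chain(checkpoints)
}
```

With mock handlers returning predefined batches, a test can collect this iterator and compare it against the expected sequence of (batch, flag) pairs.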


codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 87.60000% with 31 lines in your changes missing coverage. Please review.

Project coverage is 84.14%. Comparing base (2240154) to head (92495c8).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/utils.rs | 82.82% | 28 Missing ⚠️ |
| kernel/src/log_segment/tests.rs | 96.42% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #678      +/-   ##
==========================================
+ Coverage   84.08%   84.14%   +0.06%     
==========================================
  Files          77       77              
  Lines       17823    18027     +204     
  Branches    17823    18027     +204     
==========================================
+ Hits        14986    15169     +183     
- Misses       2120     2144      +24     
+ Partials      717      714       -3     


@sebastiantia sebastiantia changed the title from "mvp" to "feat: introduce the MockEngine that returns the new MockJsonHandlerand MockParquetHandler" Feb 5, 2025
@sebastiantia sebastiantia changed the title from "feat: introduce the MockEngine that returns the new MockJsonHandlerand MockParquetHandler" to "feat: introduce the new MockJsonHandler and MockParquetHandler" Feb 5, 2025
/// This handler maintains a queue of expected read calls and their results,
/// enforcing that calls occur in a defined order.
struct MockHandler {
    expected_file_reads_params: Mutex<VecDeque<ExpectedFileReadParams>>,
Collaborator Author

@sebastiantia sebastiantia Feb 5, 2025

The Mutex is used because the read_{json|parquet}_files methods of the {Json|Parquet}Handler traits take a shared reference to self rather than a mutable one, and a mutable reference is required to remove elements from the expected calls queue. An alternative approach would be to use a pointer to track the position in expected_file_reads_params, but I opted for this simpler solution.
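
The two options mentioned here can be compared side by side in a small sketch. Everything in it is hypothetical: `ReadHandler` is a one-method stand-in for a handler trait whose read method takes `&self`, and the `AtomicUsize` cursor is just one possible interpretation of the "pointer" alternative.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical trait standing in for a handler whose read method takes `&self`.
trait ReadHandler {
    fn read(&self) -> Option<String>;
}

/// Option 1: a Mutex-wrapped queue, popped through a shared reference.
struct QueueMock {
    expected: Mutex<VecDeque<String>>,
}

impl ReadHandler for QueueMock {
    fn read(&self) -> Option<String> {
        // Locking yields mutable access to the queue even though `self` is shared.
        self.expected.lock().unwrap().pop_front()
    }
}

/// Option 2: an immutable list plus an atomic cursor tracking the position.
struct CursorMock {
    expected: Vec<String>,
    position: AtomicUsize,
}

impl ReadHandler for CursorMock {
    fn read(&self) -> Option<String> {
        // fetch_add returns the previous value, i.e. the index to serve now.
        let index = self.position.fetch_add(1, Ordering::SeqCst);
        self.expected.get(index).cloned()
    }
}
```

Both variants behave the same from the caller's side. The queue gives up ownership of each entry as it is consumed, while the cursor has to clone entries out of the list, which may matter if the stored results are not cheaply cloneable.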

@sebastiantia sebastiantia marked this pull request as ready for review February 5, 2025 21:20
}
}

// Initialize mock engine and handlers
Collaborator

I feel like this is a lot of setup code required for the mock 🤔

Also while I think some testing infrastructure is in order, I don't think this is fleshed out enough to be exposed as a general testing util. For instance, I want to be able to control what data we return.

If the mock handlers are just asserting that we get the expected paths, then I think this should only be part of the tests for the iterator.

Collaborator

Would be interested in what @zachschuermann thinks

Collaborator Author

@sebastiantia sebastiantia Feb 5, 2025

Yeah, I definitely think some of the testing setup can still be abstracted. The example looks large, but most of the code builds the expected parameters that are passed to the handlers. That part can be refactored into helper functions (e.g., for creating Vec<ParsedLogPath>) that I expect many tests to benefit from.

Also, the mock handlers do have functionality to control what data is returned.

Each expect_read_call takes the 3 params we expect to be passed to our handlers along with result: DeltaResult<FileDataReadResultIterator>, which is the iterator of data batches returned from the handler when called. I just pushed an update to demonstrate this in the unit test.

Comment on lines +190 to +194
/// Create a ParsedLogPath from this FileMeta
#[cfg(test)]
fn to_parsed_log_path(&self) -> ParsedLogPath {
    ParsedLogPath::try_from(self.clone()).unwrap().unwrap()
}
Collaborator

I think this could be its own function.

@sebastiantia sebastiantia requested a review from scovich February 6, 2025 18:53
@zachschuermann zachschuermann changed the title from "feat: introduce the new MockJsonHandler and MockParquetHandler" to "test: introduce the new MockJsonHandler and MockParquetHandler" Feb 6, 2025
2 participants