test: introduce the new `MockJsonHandler` and `MockParquetHandler` #678

sebastiantia · 2025-02-05T18:58:25Z

What changes are proposed in this pull request?

This PR introduces a MockEngine that returns the new MockJsonHandler and MockParquetHandler.

These mock handlers enable functional testing without relying on real implementations of the Engine trait that perform actual file I/O. Instead, they simulate file reads in a controlled manner, ensuring tests remain isolated and predictable.

Both handlers extend a generic MockHandler, which:

- Maintains a queue of expected file read operations (i.e., which files should be read, with which schema, and with which predicate).
- Validates ordering, ensuring that file reads occur in the expected sequence.
- Returns predefined results for each file read operation, eliminating any reliance on actual files.
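A minimal sketch of the queue-based behavior described above, assuming simplified stand-in types: ExpectedRead, next_read and is_exhausted are hypothetical names, and plain strings replace the kernel's real file, schema and predicate types. Each read pops the next expectation, asserts that it matches the actual call, and hands back the queued result.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Parameters a read call is expected to receive.
/// Hypothetical simplified stand-in: strings instead of the kernel's real types.
#[derive(Debug, Clone, PartialEq)]
struct ExpectedRead {
    files: Vec<String>,
    schema: String,
    predicate: Option<String>,
}

struct MockHandler<T> {
    // (expected parameters, predefined result), in the order the reads must occur
    expected: Mutex<VecDeque<(ExpectedRead, T)>>,
}

impl<T> MockHandler<T> {
    fn new() -> Self {
        Self { expected: Mutex::new(VecDeque::new()) }
    }

    /// Queue an expected read together with the result it should return.
    fn expect_read_call(&self, params: ExpectedRead, result: T) {
        self.expected.lock().unwrap().push_back((params, result));
    }

    /// Called from the mocked read method: checks the actual call against the
    /// next expectation and returns its predefined result. Panics if no read was
    /// expected or if the call does not match (wrong file, schema, predicate or order).
    fn next_read(&self, actual: ExpectedRead) -> T {
        let (expected, result) = self
            .expected
            .lock()
            .unwrap()
            .pop_front()
            .expect("unexpected read: no more reads were expected");
        assert_eq!(expected, actual, "read call did not match the next expected read");
        result
    }

    /// True once every expected read has been consumed.
    fn is_exhausted(&self) -> bool {
        self.expected.lock().unwrap().is_empty()
    }
}
```

A test would typically finish by asserting is_exhausted(), so that reads which were expected but never happened also fail the test.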

This addition aims to improve our current testing utilities. Tests that use LocalMockTable are candidates for migration, since LocalMockTable performs operations in a temporary directory and thereby introduces an unnecessary dependency on file I/O.

How was this change tested?

This PR includes an example of the handlers' usage in a unit test for a core piece of kernel functionality, test_log_replay, which:

- asserts that the correct commit files are read in reverse order, with the correct schema and predicate
- asserts that the correct checkpoint files are read with the correct schema and predicate
- asserts that the batches read from the files are mapped with the appropriate is_log_batch flag
- asserts that the batches read from the files are chained and returned
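To make those four assertions concrete, here is a toy model of the output shape the test checks. It assumes a hypothetical Batch type and a replay function that is not the kernel's implementation: commit batches come newest version first and are tagged is_log_batch = true, then checkpoint batches follow, tagged false, all chained into one iterator.

```rust
/// Hypothetical stand-in for an engine data batch.
#[derive(Debug, Clone, PartialEq)]
struct Batch(&'static str);

/// Toy model of the replay output: commits (given in ascending version order)
/// are emitted newest first with is_log_batch = true, followed by the
/// checkpoint batches with is_log_batch = false.
fn replay<'a>(
    commits_ascending: &'a [(u64, Vec<Batch>)],
    checkpoint: &'a [Batch],
) -> impl Iterator<Item = (Batch, bool)> + 'a {
    let commit_batches = commits_ascending
        .iter()
        .rev() // newest commit first
        .flat_map(|(_, batches)| batches.iter().cloned())
        .map(|b| (b, true)); // batches from commit files are log batches
    let checkpoint_batches = checkpoint.iter().cloned().map(|b| (b, false));
    commit_batches.chain(checkpoint_batches)
}
```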

codecov · 2025-02-05T19:00:43Z

Codecov Report

Attention: Patch coverage is 87.60000% with 31 lines in your changes missing coverage. Please review.

Project coverage is 84.14%. Comparing base (2240154) to head (92495c8).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
kernel/src/utils.rs	82.82%	28 Missing ⚠️
kernel/src/log_segment/tests.rs	96.42%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #678      +/-   ##
==========================================
+ Coverage   84.08%   84.14%   +0.06%     
==========================================
  Files          77       77              
  Lines       17823    18027     +204     
  Branches    17823    18027     +204     
==========================================
+ Hits        14986    15169     +183     
- Misses       2120     2144      +24     
+ Partials      717      714       -3


sebastiantia · 2025-02-05T19:17:58Z

kernel/src/utils.rs

+    /// This handler maintains a queue of expected read calls and their results,
+    /// enforcing that calls occur in a defined order.
+    struct MockHandler {
+        expected_file_reads_params: Mutex<VecDeque<ExpectedFileReadParams>>,


The Mutex is used because the read_{json|parquet}_files methods of the {Json|Parquet}Handler traits take &self rather than &mut self, yet a mutable reference is required to remove elements from the expected-calls queue; the Mutex provides that mutability through a shared reference. An alternative approach would be to use a pointer to track the current position in expected_file_reads_params, but I opted for this simpler solution.
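To illustrate the trade-off, here is a sketch of the alternative mentioned above (tracking the position instead of popping from a Mutex-guarded queue). It assumes a simplified, hypothetical FileReader trait whose read method takes &self; CursorMock is likewise a hypothetical name. An AtomicUsize cursor gives each call its own index without &mut self, but because the expectations live in a shared Vec, results can only be cloned out, not moved. That is awkward when the result is a one-shot iterator, which plausibly makes the Mutex<VecDeque<..>> approach the simpler one.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a handler trait whose read method takes &self.
trait FileReader {
    fn read_files(&self, path: &str) -> Vec<u32>;
}

/// Cursor-based alternative to Mutex<VecDeque<..>>: expectations stay in an
/// immutable Vec and an atomic index records how many have been consumed.
struct CursorMock {
    expected: Vec<(String, Vec<u32>)>,
    next: AtomicUsize,
}

impl FileReader for CursorMock {
    fn read_files(&self, path: &str) -> Vec<u32> {
        // fetch_add hands out each position exactly once, even across threads.
        let i = self.next.fetch_add(1, Ordering::SeqCst);
        let (expected_path, result) = self
            .expected
            .get(i)
            .unwrap_or_else(|| panic!("unexpected read #{i} of {path}"));
        assert_eq!(expected_path.as_str(), path, "read #{i} out of order");
        // The Vec is only reachable through &self, so the result must be cloned.
        result.clone()
    }
}
```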

OussamaSaoudi · 2025-02-05T22:12:17Z

kernel/src/log_segment/tests.rs

+        }
+    }
+
+    // Initialize mock engine and handlers


I feel like this is a lot of setup code required for the mock 🤔

Also, while I think some testing infrastructure is in order, I don't think this is fleshed out enough to be exposed as a general testing util. For instance, I want to be able to control what data we return.

If the mock handlers are just asserting that we get the expected paths, then I think this should only be part of the tests for the iterator.

Would be interested in what @zachschuermann thinks

Yeah, I definitely think some of the testing setup can still be abstracted. I also think this example looks large, but most of the code is for creating the expected parameters that the handlers are passed, which can be refactored into helper functions that I foresee many tests benefiting from (e.g. creating a Vec<ParsedLogPath>).
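As a sketch of the kind of helper meant here, assuming the standard Delta log naming scheme (a 20-digit zero-padded version followed by .json for commits, or .checkpoint.parquet for single-part checkpoints): commit_paths and checkpoint_path are hypothetical names, and they return plain path strings that a real helper would then turn into a Vec<ParsedLogPath>.

```rust
/// Hypothetical helper: commit file paths for a range of versions,
/// following the Delta log naming convention (20-digit zero-padded version).
fn commit_paths(table_root: &str, versions: impl IntoIterator<Item = u64>) -> Vec<String> {
    versions
        .into_iter()
        .map(|v| format!("{table_root}/_delta_log/{v:020}.json"))
        .collect()
}

/// Hypothetical helper: path of a single-part checkpoint at `version`.
fn checkpoint_path(table_root: &str, version: u64) -> String {
    format!("{table_root}/_delta_log/{version:020}.checkpoint.parquet")
}
```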

Also, the mock handlers do have functionality to control what data is returned.

Each expect_read_call takes the three parameters we expect our handlers to be called with, along with result: DeltaResult<FileDataReadResultIterator>, the iterator of data batches that the handler returns when called. I just pushed an update to demonstrate this in the unit test.
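A simplified sketch of that signature, assuming stand-in types: MockResult and BatchIter are hypothetical substitutes for DeltaResult and FileDataReadResultIterator, and MockReader::read stands in for read_json_files/read_parquet_files. Because the queued result is itself a Result, a test can make a read fail outright (Err) or return batches that each succeed or fail.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical simplified stand-ins for DeltaResult and FileDataReadResultIterator.
type MockResult<T> = Result<T, String>;
type BatchIter = Box<dyn Iterator<Item = MockResult<Vec<i64>>> + Send>;

#[derive(Default)]
struct MockReader {
    // ((files, schema, predicate) we expect, result to return for that call)
    calls: Mutex<VecDeque<((Vec<String>, String, Option<String>), MockResult<BatchIter>)>>,
}

impl MockReader {
    /// The three expected parameters plus the result the mocked read returns.
    fn expect_read_call(
        &self,
        files: Vec<String>,
        schema: &str,
        predicate: Option<&str>,
        result: MockResult<BatchIter>,
    ) {
        let key = (files, schema.to_string(), predicate.map(str::to_string));
        self.calls.lock().unwrap().push_back((key, result));
    }

    /// Mocked read: checks the parameters against the next expectation, then
    /// returns the queued result (an iterator of batches, or an error).
    fn read(&self, files: &[String], schema: &str, predicate: Option<&str>) -> MockResult<BatchIter> {
        let ((exp_files, exp_schema, exp_pred), result) =
            self.calls.lock().unwrap().pop_front().expect("no read was expected");
        assert_eq!(exp_files.as_slice(), files);
        assert_eq!(exp_schema, schema);
        assert_eq!(exp_pred.as_deref(), predicate);
        result
    }
}
```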

OussamaSaoudi · 2025-02-05T22:12:52Z

kernel/src/lib.rs

+    /// Create a ParsedLogPath from this FileMeta
+    #[cfg(test)]
+    fn to_parsed_log_path(&self) -> ParsedLogPath {
+        ParsedLogPath::try_from(self.clone()).unwrap().unwrap()
+    }


I think this could be its own function.
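One way to act on this suggestion, sketched with hypothetical simplified FileMeta and ParsedLogPath types: the conversion becomes a free function instead of a method on FileMeta. In this sketch try_from returns a Result<Option<..>> (Err for a malformed path, Ok(None) for a file that is not a recognized log file), which is what the double unwrap peels off; expect with a message makes a failing test easier to diagnose. In the PR the function would still be gated with #[cfg(test)].

```rust
// Hypothetical simplified stand-ins: the real FileMeta/ParsedLogPath carry more fields.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    location: String,
}

#[derive(Debug, PartialEq)]
struct ParsedLogPath {
    version: u64,
}

impl ParsedLogPath {
    /// Inherent fn (not std's TryFrom) so it can return Result<Option<_>>:
    /// Err = malformed version, Ok(None) = not a commit file.
    fn try_from(meta: FileMeta) -> Result<Option<Self>, String> {
        let name = meta.location.rsplit('/').next().unwrap_or("");
        let Some(stem) = name.strip_suffix(".json") else {
            return Ok(None);
        };
        let version = stem
            .parse::<u64>()
            .map_err(|e| format!("bad version in {name}: {e}"))?;
        Ok(Some(ParsedLogPath { version }))
    }
}

/// Free-function form of the test helper (gated with #[cfg(test)] in the PR).
fn to_parsed_log_path(meta: &FileMeta) -> ParsedLogPath {
    ParsedLogPath::try_from(meta.clone())
        .expect("failed to parse log path")
        .expect("not a recognized Delta log file")
}
```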

sebastiantia requested a review from OussamaSaoudi February 5, 2025 18:58

github-actions bot assigned sebastiantia Feb 5, 2025

sebastiantia changed the title ~~mvp~~ feat: introduce the MockEngine that returns the new MockJsonHandler and MockParquetHandler Feb 5, 2025

sebastiantia changed the title ~~feat: introduce the MockEngine that returns the new MockJsonHandler and MockParquetHandler~~ feat: introduce the new MockJsonHandler and MockParquetHandler Feb 5, 2025

sebastiantia commented Feb 5, 2025


sebastiantia marked this pull request as ready for review February 5, 2025 21:20

OussamaSaoudi reviewed Feb 5, 2025


sebastiantia added 2 commits February 5, 2025 16:09

mvp

908c381

assert batches returned in replay test

6ac530d

sebastiantia force-pushed the mock_file_readers_testing_utility branch from 84e6010 to 6ac530d February 6, 2025 00:10

sebastiantia mentioned this pull request Feb 6, 2025

feat: extract & insert sidecar batches in replay's action iterator #679

Open

sebastiantia added 2 commits February 6, 2025 00:24

MockEngineContext & docs

f21dc14

remove visibility

92495c8

sebastiantia requested a review from scovich February 6, 2025 18:53

zachschuermann changed the title ~~feat: introduce the new MockJsonHandler and MockParquetHandler~~ test: introduce the new MockJsonHandler and MockParquetHandler Feb 6, 2025
