-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: MetricWriter #123
feat: MetricWriter #123
Conversation
032388d
to
78f2246
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far, just a couple small things
1077e14
to
e3222b5
Compare
WalkthroughThe pull request introduces significant enhancements to the In the Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (6)
fgpyo/io/__init__.py (1)
Line range hint
92-101
: LGTM: Good use of deprecation warning for renamed function.The renaming of
assert_path_is_writeable
toassert_path_is_writable
improves consistency with American English spelling. The deprecation warning is a good practice for maintaining backward compatibility while encouraging the use of the new function name.Consider adding a
FutureWarning
instead of aDeprecationWarning
to make the warning more visible to users:warnings.warn( "assert_path_is_writeable is deprecated, use assert_path_is_writable instead", FutureWarning, stacklevel=2, )fgpyo/util/metric.py (3)
450-459
: Update parameter name in the docstring to match the method signature.In the
__init__
method ofMetricWriter
, the parameter is namedfilename
, but the docstring refers to it aspath
. For consistency and clarity, the docstring should usefilename
.Apply this diff to correct the parameter name in the docstring:
""" Args: - path: Path to the output file. + filename: Path to the output file. metric_class: Metric class. append: If `True`, the file will be appended to. Otherwise, the specified file will be overwritten.
523-525
: Remove unnecessary call tosuper().__exit__
in the__exit__
method.Since
AbstractContextManager
does not define a meaningful__exit__
method, callingsuper().__exit__
is unnecessary and can be omitted.Apply this diff to simplify the
__exit__
method:def __exit__( self, exc_type: Type[BaseException], exc_value: BaseException, traceback: TracebackType, ) -> None: self.close() - super().__exit__(exc_type, exc_value, traceback)
450-456
: Add type variable tometric_class
parameter for better type checking.In the
__init__
method ofMetricWriter
, specifying the generic type formetric_class
improves type safety and clarity.Apply this diff to add the type variable:
def __init__( self, filename: Union[Path, str], - metric_class: Type[Metric], + metric_class: Type[MetricType], append: bool = False, delimiter: str = "\t", include_fields: Optional[List[str]] = None,tests/fgpyo/util/test_metric.py (2)
590-603
: Add a docstring totest_writer
functionConsider adding a docstring to the
test_writer
test function to explain its purpose and improve code readability.
619-635
: Add a docstring totest_writer_writeall
functionAdding a docstring to the
test_writer_writeall
function would enhance clarity and maintain consistency with other test functions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- fgpyo/io/init.py (4 hunks)
- fgpyo/util/metric.py (3 hunks)
- tests/fgpyo/util/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (5)
fgpyo/io/__init__.py (5)
50-50
: LGTM: Import changes improve clarity and remove unused imports.The direct import of
TextIOWrapper
and removal of the unusedUnion
import align well with the changes in the file, improving code clarity and reducing unnecessary imports.
Line range hint
157-170
: LGTM: Improved type specificity into_reader
function.The changes to the
to_reader
function improve type specificity by consistently usingTextIOWrapper
as the return type. This enhances type checking and code clarity without affecting the function's behavior.
Line range hint
175-196
: LGTM: Enhanced type specificity into_writer
function.The modifications to the
to_writer
function improve type specificity by consistently usingTextIOWrapper
as the return type. This change enhances type checking and code clarity without altering the function's behavior.
Line range hint
92-92
: LGTM: Corrected typo in error messages.The typo "becasue" has been corrected to "because" in the error messages of assertion functions. This minor correction improves the readability and professionalism of the error messages.
Also applies to: 104-104
Line range hint
1-292
: Overall assessment: Excellent improvements to thefgpyo/io/__init__.py
module.The changes in this file significantly enhance type specificity, correct minor issues, and improve code clarity. The modifications to
to_reader
andto_writer
functions, along with the renaming ofassert_path_is_writeable
, align well with the PR objectives. These improvements contribute to better code quality, maintainability, and type safety.
dd8730f
to
4ab75b7
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #123 +/- ##
==========================================
+ Coverage 89.40% 89.71% +0.30%
==========================================
Files 18 18
Lines 2105 2207 +102
Branches 467 489 +22
==========================================
+ Hits 1882 1980 +98
- Misses 146 149 +3
- Partials 77 78 +1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also replace metric.write() with an implementaiton that uses the new writer so we don't have duplicate code paths/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (15)
fgpyo/util/metric.py (9)
179-203
: LGTM: New methods enhance Metric class functionality.The new methods
keys()
,items()
, andformatted_items()
provide a consistent and pythonic way to access field names and values. This addition improves the usability of the Metric class.However, there's a minor inconsistency in the
formatted_items()
method:The docstring for
formatted_items()
states it returns an iterator, but it actually returns a list. Consider updating the docstring to reflect this:- """An iterator over formatted attribute values in the same order as the header.""" + """A list of tuples containing field names and their formatted values in the same order as the header."""
302-304
: Complete the implementation of MetricWriter in the write() method.The change to use MetricWriter is a good improvement, allowing for more flexible file handling. However, the TODO comment suggests that the implementation is not yet complete.
Consider removing the TODO comment and fully implementing the MetricWriter usage. For example:
- # TODO: open a MetricWriter here instead - with MetricWriter[MetricType](path, metric_class=values[0].__class__) as writer: + with MetricWriter[MetricType](path, metric_class=cls) as writer: writer.writeall(values)This change uses
cls
instead ofvalues[0].__class__
, which is more consistent and doesn't assume thatvalues
is non-empty.
382-422
: LGTM: New _read_header() method handles file headers well.The new
_read_header()
method is well-structured and effectively handles comments, empty lines, and field name extraction. It correctly returns a MetricFileHeader object, maintaining consistency with the new dataclass.Consider adding error handling for potential edge cases:
- Handle the case where the file is empty or contains only comments/empty lines more explicitly.
- Consider adding a maximum number of lines to read before giving up on finding the header.
For example:
MAX_LINES_TO_READ = 1000 # Adjust as needed for line_num, line in enumerate(reader, 1): if line_num > MAX_LINES_TO_READ: raise ValueError(f"Could not find header within {MAX_LINES_TO_READ} lines") # ... rest of the existing logic ... if not fieldnames: raise ValueError("File contains no valid header")This addition would prevent the method from hanging on extremely large files with no valid header.
425-437
: LGTM: New _is_metric_class() function enhances type checking.The new
_is_metric_class()
function is well-structured and effectively handles both dataclasses and attr classes. The use of TypeGuard enhances type checking capabilities.Consider renaming the function to be more descriptive of its purpose:
-def _is_metric_class(cls: Any) -> TypeGuard[Metric]: +def _is_valid_metric_class(cls: Any) -> TypeGuard[Metric]:This name more clearly indicates that the function is checking for a valid Metric class, not just any Metric class.
440-570
: LGTM: New MetricWriter class provides flexible metric writing.The MetricWriter class is well-structured and offers flexible options for writing metrics, including appending, custom delimiters, and field inclusion/exclusion. The implementation of AbstractContextManager ensures proper file handling.
Consider enhancing error handling and messages:
- In the
__init__
method, provide more specific error messages:if not _is_metric_class(metric_class): raise TypeError(f"Expected a Metric subclass, got {type(metric_class).__name__}")
- In the
write
method, add a type check for the metric argument:if not isinstance(metric, self._metric_class): raise TypeError(f"Expected {self._metric_class.__name__}, got {type(metric).__name__}")These changes will provide more informative error messages, making debugging easier for users of the class.
573-606
: LGTM: New function handles fieldname validation and generation well.The
_validate_and_generate_final_output_fieldnames()
function is well-structured and correctly handles the include_fields and exclude_fields options. It properly validates that specified fields are attributes of the metric class.Consider shortening the function name for better readability:
-def _validate_and_generate_final_output_fieldnames( +def _get_validated_output_fieldnames(This shorter name still conveys the function's purpose while improving code readability.
609-648
: LGTM: New function effectively validates file headers against metric classes.The
_assert_file_header_matches_metric()
function is well-structured and performs necessary checks to ensure file headers match the given metric class. It appropriately uses the new_read_header()
method for parsing.Consider enhancing the error messages for better debugging:
- When the header is empty:
if not header.fieldnames: raise ValueError(f"The file '{path}' does not contain a valid header.")
- When there's a mismatch between expected and actual fields:
if header.fieldnames != fieldnames: missing_fields = set(fieldnames) - set(header.fieldnames) extra_fields = set(header.fieldnames) - set(fieldnames) error_msg = f"Header mismatch in file '{path}':\n" if missing_fields: error_msg += f"Missing fields: {', '.join(missing_fields)}\n" if extra_fields: error_msg += f"Extra fields: {', '.join(extra_fields)}" raise ValueError(error_msg)These changes provide more detailed information about the nature of the mismatch, which can be helpful for debugging.
651-671
: LGTM: New function efficiently validates fieldnames against metric attributes.The
_assert_fieldnames_are_metric_attributes()
function is concise and effectively checks if specified fieldnames are attributes of a metric class. The use of set operations for comparison is efficient.Consider improving the error message to be more specific about which fieldnames are invalid:
if invalid_fieldnames: raise ValueError( f"The following fields are not attributes of the Metric {metric_class.__name__}: " f"{', '.join(invalid_fieldnames)}. " f"Valid attributes are: {', '.join(metric_class.keys())}" )This change provides more detailed information about which fields are invalid and what the valid options are, making it easier for users to correct their input.
673-684
: LGTM: New function effectively asserts valid Metric classes.The
_assert_is_metric_class()
function is concise and effectively asserts that a given class is a valid Metric. It appropriately uses the previously defined_is_metric_class()
function.Consider improving the error message to provide more information about why the class is not a valid Metric:
if not _is_metric_class(cls): raise TypeError( f"{cls.__name__} is not a valid Metric class. " "Ensure it is a subclass of Metric and decorated with @dataclass or @attr.s." )This change provides more specific guidance on what makes a valid Metric class, which can help users understand and correct the issue more easily.
tests/fgpyo/util/test_metric.py (6)
548-568
: LGTM: Good test for Picard-formatted header parsingThis test effectively verifies the ability to read headers from Picard-formatted files. It creates a mock file with realistic header content and checks if the correct fieldnames are extracted.
Consider adding an assertion to verify the
preamble
content of the header as well, to ensure all parts of the header are correctly parsed.
591-603
: LGTM: Comprehensive test for MetricWriter basic functionalityThis test effectively verifies the core functionality of MetricWriter, including writing the header and multiple metric instances. The assertions thoroughly check the output file content.
Consider adding a test case with
append=True
to ensure the appending functionality works correctly as well.
766-772
: LGTM: Important edge case test for FIFO handlingThis test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a FIFO (named pipe). It's an important check to prevent misuse of the MetricWriter with non-file outputs.
Consider adding similar tests for other special file types (e.g.,
/dev/null
, character devices) to ensure comprehensive coverage of potential edge cases.
775-780
: LGTM: Basic test for _assert_is_metric_classThis test verifies that
_assert_is_metric_class
correctly handles a valid Metric subclass. It's a good starting point for testing this assertion function.Consider adding an explicit assertion (e.g.,
assert True
) at the end of the test to make the expected behavior more clear. Also, you might want to add tests with both attr and dataclasses implementations to ensure comprehensive coverage.
821-839
: LGTM: Well-structured test for valid fieldnamesThis test effectively verifies that
_assert_fieldnames_are_metric_attributes
correctly handles various valid fieldname scenarios. The use of parametrization allows for comprehensive coverage of different cases.Consider adding an explicit assertion (e.g.,
assert True
) at the end of the test to make the expected behavior more clear. Additionally, you might want to add a test case with an empty list of fieldnames to cover that edge case.
862-894
: LGTM: Thorough tests for file header matchingThese tests effectively verify the behavior of
_assert_file_header_matches_metric
for both valid and invalid headers. The use of parametrization in the second test allows for comprehensive coverage of different error scenarios.Consider adding an explicit assertion (e.g.,
assert True
) at the end of thetest_assert_file_header_matches_metric
function to make the expected behavior more clear.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- fgpyo/io/init.py (4 hunks)
- fgpyo/util/metric.py (4 hunks)
- tests/fgpyo/util/test_metric.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fgpyo/io/init.py
🧰 Additional context used
🔇 Additional comments (18)
fgpyo/util/metric.py (1)
116-166
: LGTM: New imports and classes added.The new imports (AbstractContextManager, DictWriter, TextIOWrapper, TypeGuard) and new classes (MetricFileHeader, MetricWriter) are well-structured and necessary for the enhanced functionality being introduced. The MetricFileHeader dataclass provides a clear structure for storing file header information, which will improve code readability and maintainability.
tests/fgpyo/util/test_metric.py (17)
395-397
: LGTM: Good test coverage forkeys()
methodThe test effectively verifies the
keys()
method for both attr and dataclasses implementations of the Person class. It's concise and covers the expected functionality.
405-409
: LGTM: Comprehensive test foritems()
methodThis test thoroughly checks the
items()
method of the Person class for both attr and dataclasses implementations. The docstring clearly explains the test's purpose, and the assertion verifies the correct output format and values.
423-426
: LGTM: Well-structured test forformatted_items()
methodThis test effectively verifies the
formatted_items()
method of the Person class. It correctly checks the output format and ensures that values are properly formatted (e.g., age as a string). The test covers both attr and dataclasses implementations.
570-582
: LGTM: Good edge case coverage for empty fileThis test effectively verifies the behavior of
_read_header
when given an empty file. It correctly checks that both the preamble and fieldnames are empty lists, which is the expected behavior for this edge case.
585-588
: LGTM: Well-defined FakeMetric class for testingThe
FakeMetric
class is correctly implemented as a dataclass and Metric subclass. It provides a simple and clear structure for testing Metric-related functionality with two different types of fields.
606-617
: LGTM: Good test for MetricWriter string filename supportThis test effectively verifies that MetricWriter can be initialized with a string filename. It's an important test case that ensures flexibility in how users can specify the output file path.
620-635
: LGTM: Thorough test for MetricWriter's writeall methodThis test effectively verifies the
writeall
method of MetricWriter, which is crucial for bulk writing of metrics. It correctly checks the header and all data lines in the output file, ensuring that multiple metrics are written properly.
638-654
: LGTM: Well-structured test for MetricWriter append functionalityThis test thoroughly verifies the append functionality of MetricWriter. It correctly checks that the original header is preserved and new data is properly appended to an existing file. This is a crucial feature for updating metric files without overwriting existing data.
657-664
: LGTM: Good edge case test for appending to empty fileThis test effectively verifies that MetricWriter raises the appropriate error when attempting to append to an empty file. It's an important edge case that ensures robust error handling in the MetricWriter class.
667-675
: LGTM: Thorough edge case test for appending to file without headerThis test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a file that contains data but no header. It's an important edge case that ensures the integrity of the metric file structure is maintained.
678-690
: LGTM: Crucial test for header mismatch scenarioThis test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a file with a header that doesn't match the metric class. It's a critical check that ensures data consistency and prevents mixing of incompatible metrics in the same file.
693-714
: LGTM: Well-implemented test for field inclusionThis test effectively verifies the ability to include only specific fields when writing metrics. It correctly checks that only the specified field ("foo") is present in the output, demonstrating the flexibility of the MetricWriter in customizing the metric output.
717-738
: LGTM: Thorough test for field reorderingThis test effectively verifies the ability to reorder fields when writing metrics. It correctly checks that the fields are written in the specified order, demonstrating the flexibility of the MetricWriter in customizing the output format. This feature is particularly useful for maintaining compatibility with existing systems or specific reporting requirements.
741-763
: LGTM: Comprehensive test for field exclusionThis test effectively verifies the ability to exclude specific fields when writing metrics. It correctly checks that the excluded field ("bar") is not present in the output, further demonstrating the flexibility of the MetricWriter in customizing the metric output. This feature is particularly useful for omitting sensitive or unnecessary information from the output.
783-794
: LGTM: Thorough test for undecorated Metric subclassThis test effectively verifies that
_assert_is_metric_class
raises the appropriate error when given an undecorated Metric subclass. It's an important check to ensure that Metric subclasses are properly implemented with the required decorators.
797-817
: LGTM: Comprehensive test for non-Metric classesThis test effectively verifies that
_assert_is_metric_class
raises the appropriate error when given classes that are decorated but not Metric subclasses. It covers both dataclass and attr implementations, ensuring robust checking of the assertion function's behavior.
841-859
: LGTM: Comprehensive test for invalid fieldnamesThis test effectively verifies that
_assert_fieldnames_are_metric_attributes
raises the appropriate error when given invalid fieldnames. The use of parametrization allows for thorough coverage of different error scenarios, ensuring robust checking of the assertion function's behavior.
* feat: add asdict * fix: typeguard fix: typeguard import doc: update docstring refactor: import Instance types refactor: import Instance types * fix: 3.8 compatible Dict typing * refactor: make instance checks private * tests: add coverage * fix: typeerror * doc: clarify that asdict does not format values
This PR introduces several of the assertion methods used when constructing the `MetricWriter`. (See #107 for how they are used in practice).
Closes #88. I've adapted the `DataclassWriter` from https://github.com/msto/dataclass_io/ to work with Metrics.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
c6714a1
to
d100baa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (9)
fgpyo/util/metric.py (5)
179-203
: LGTM! New methods enhance Metric class interface, but minor inconsistency in formatted_items().The new keys(), items(), and formatted_items() methods provide a more intuitive interface for accessing metric data. However, there's a minor inconsistency in the formatted_items() method.
Consider updating the docstring or return type of formatted_items():
- def formatted_items(self) -> List[Tuple[str, str]]: - """An iterator over formatted attribute values in the same order as the header.""" + def formatted_items(self) -> List[Tuple[str, str]]: + """A list of tuples containing field names and their formatted values in the same order as the header."""
381-421
: LGTM! Comprehensive _read_header() method with clear documentation.The new _read_header() static method is well-implemented and thoroughly documented. It handles various edge cases and provides a clear structure for the parsed header.
Consider adding a brief comment explaining the use of the
else
clause in thefor
loop:else: + # If we've exhausted the file without finding a non-comment, non-empty line # If the file was empty, kick back an empty header fieldnames = []
This would clarify the purpose of this edge case handling.
424-437
: LGTM! Robust _is_metric_class() function with good error handling.The new _is_metric_class() function provides a robust check for valid Metric classes, handling both dataclasses and attr-decorated classes. The use of TypeGuard enhances type checking capabilities.
Consider adding a brief comment explaining the purpose of the attr import attempt:
try: import attr + # Check for attr-decorated classes if attr is installed is_metric_cls = is_metric_cls and (dataclasses.is_dataclass(cls) or attr.has(cls)) except ImportError: + # Fall back to dataclass-only check if attr is not installed is_metric_cls = is_metric_cls and dataclasses.is_dataclass(cls)This would clarify the reasoning behind the import attempt and fallback behavior.
439-569
: LGTM! Comprehensive MetricWriter class with good resource management.The new MetricWriter class is well-implemented, providing flexible options for writing metrics to files. The use of AbstractContextManager ensures proper resource handling, and the docstrings are detailed and informative.
For consistency with other parts of the codebase, consider using ValueError instead of AssertionError in the init method:
- AssertionError: If the provided filepath is not writable. - AssertionError: If `append=True` and the provided file is not readable. (When appending, + ValueError: If the provided filepath is not writable. + ValueError: If `append=True` and the provided file is not readable. (When appending, we check to ensure that the header matches the specified metric class. The file must be readable to get the header.)Also, update the corresponding error raising in the method body:
- io.assert_path_is_writable(filepath) + if not io.is_writable(filepath): + raise ValueError(f"The file '{filepath}' is not writable.") if append: - io.assert_path_is_readable(filepath) + if not io.is_readable(filepath): + raise ValueError(f"The file '{filepath}' is not readable for appending.")
608-647
: LGTM! Comprehensive file header validation function.The _assert_file_header_matches_metric() function provides a thorough check for file header consistency with the metric class. The implementation uses the new _read_header() method, ensuring consistent header parsing.
Consider adding a check for the case where the file is empty:
with path.open("r") as fin: + if fin.read(1) == '': + raise ValueError(f"The provided file is empty: {path}") + fin.seek(0) header: MetricFileHeader = metric_class._read_header(fin, delimiter=delimiter)This would provide a more specific error message for empty files.
tests/fgpyo/util/test_metric.py (4)
395-398
: Add a docstring totest_metric_keys
for clarityThe test function
test_metric_keys
lacks a docstring. Adding a descriptive docstring will improve code readability and maintain consistency with other test functions.
423-427
: Add a docstring totest_metric_formatted_items
for consistencyIncluding a docstring in the
test_metric_formatted_items
function will enhance clarity and maintain consistency across the test suite.
591-604
: Include a docstring intest_writer
to describe the test's purposeAdding a docstring to the
test_writer
function will help others understand what is being tested and improve code documentation.
620-636
: Provide a docstring fortest_writer_writeall
The function
test_writer_writeall
lacks a docstring. Documenting the purpose of this test will enhance readability and clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- fgpyo/io/init.py (4 hunks)
- fgpyo/util/metric.py (4 hunks)
- tests/fgpyo/util/test_metric.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fgpyo/io/init.py
🧰 Additional context used
🔇 Additional comments (7)
fgpyo/util/metric.py (5)
116-166
: LGTM! Improved imports and new MetricFileHeader class.The new imports and the introduction of the MetricFileHeader class enhance the module's functionality and provide a clear structure for handling file headers. The dataclass implementation is clean and well-documented.
302-303
: LGTM! Improved write() method using MetricWriter.The change to use MetricWriter in the write() method improves the separation of concerns and allows for more flexible writing options. This modification enhances the overall design and maintainability of the class.
572-605
: LGTM! Well-implemented fieldname validation and generation function.The _validate_and_generate_final_output_fieldnames() function provides a robust mechanism for handling include and exclude field options. The implementation is thorough, with clear error messages and concise logic for generating the final output fieldnames.
650-670
: LGTM! Efficient fieldname validation function.The _assert_fieldnames_are_metric_attributes() function provides an efficient and clear mechanism for validating that specified fieldnames are attributes of a metric class. The use of set operations for comparison is a good choice for performance.
672-683
: LGTM! Clear and concise Metric class assertion function.The _assert_is_metric_class() function provides a clear mechanism for asserting that a class is a valid Metric class. It effectively leverages the _is_metric_class() function and provides a specific error message for invalid classes.
tests/fgpyo/util/test_metric.py (2)
776-781
: LGTM:test_assert_is_metric_class
correctly validates Metric classesThe test function
test_assert_is_metric_class
appropriately verifies that_assert_is_metric_class
works as intended, ensuring that valid Metric classes pass the assertion without raising exceptions.
821-840
: LGTM:test_assert_fieldnames_are_metric_attributes
correctly checks field namesThe test cases in
test_assert_fieldnames_are_metric_attributes
effectively validate that the provided field names are attributes of thePerson
metric class in various scenarios, including different orders and subsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/fgpyo/util/test_metric.py (1)
769-771
: Consider enhancing platform check for broader compatibilityIn the
test_writer_raises_if_fifo
function, the current checkif os.name == "nt":
skips the test on Windows systems. However, other non-Unix-like operating systems may also lack/dev/stdout
. To ensure the test is only run on Unix-like systems where/dev/stdout
is available, consider modifying the condition to:if os.name != "posix":This change will skip the test on all non-Posix systems, enhancing cross-platform compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- tests/fgpyo/util/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (7)
tests/fgpyo/util/test_metric.py (7)
798-799
: Verify exception message matches actual raised errorIn
test_assert_is_metric_class_raises_if_not_decorated
, thepytest.raises
call matches the error message"Not a dataclass or attr decorated Metric"
. Please verify that the_assert_is_metric_class
function raises an exception with this exact message when the class is aMetric
subclass but not decorated. Ensuring the error message matches prevents tests from passing erroneously or failing due to message discrepancies.
812-813
: Verify exception message matches actual raised errorSimilarly, in
test_assert_is_metric_class_raises_if_not_a_metric
, confirm that the exception raised by_assert_is_metric_class
contains the message"Not a dataclass or attr decorated Metric"
. Accurate error message matching is important for the reliability of the test, ensuring it fails only when appropriate.
396-399
: Well-structured tests for newMetric
methodsThe added tests
test_metric_keys
,test_metric_items
, andtest_metric_formatted_items
effectively validate the new methodskeys()
,items()
, andformatted_items()
of theMetric
class. The tests confirm that these methods return the expected keys and items, both in raw and formatted forms, enhancing confidence in the new functionality.Also applies to: 406-411, 424-428
549-570
: Robust testing of header-reading capabilitiesThe functions
test_read_header_can_read_picard
andtest_read_header_can_read_empty
provide solid tests for the header-reading functionality. They ensure that headers are correctly parsed from Picard-formatted files and that empty files are handled gracefully, which increases the robustness of theMetricFileHeader
class.Also applies to: 571-584
592-765
: Comprehensive testing ofMetricWriter
functionalityThe suite of tests for the
MetricWriter
class—includingtest_writer
,test_writer_append
,test_writer_include_fields
, and others—thoroughly checks various writing scenarios. These tests validate appending to files, handling empty or mismatched headers, including or excluding fields, and reordering fields. This comprehensive coverage ensures thatMetricWriter
behaves correctly in diverse situations.
781-785
: Correct validation of Metric subclass in testThe
test_assert_is_metric_class
function accurately tests that_assert_is_metric_class
accepts a validMetric
subclass. This positive test complements the negative tests and confirms that the validation behaves as expected for proper classes.
825-864
: Effective testing of field name assertionsThe tests
test_assert_fieldnames_are_metric_attributes
andtest_assert_fieldnames_are_metric_attributes_raises
effectively verify that_assert_fieldnames_are_metric_attributes
correctly identifies valid and invalid field names. By testing various combinations of field names, these tests ensure that only appropriate field names are accepted, which strengthens input validation.
Closes #88
Closes #90
Summary
This PR introduces a
MetricWriter
class.This PR is motivated by a limitation in
Metric.write()
, which does not permit appending metric instances to a file. The new class supports appending and streamed writing of metrics, validating that the existing file's structure matches the metrics being written.MetricWriter
also supports optionally reordering or removing fields when writing.Added
MetricWriter
class, which supports context-managed appending and writing of Metrics to file.Metric.keys()
,Metric.items()
, andMetric.formatted_items()
. These are companions to the existingMetric.values()
andMetric.formatted_values()
, and provide analogous key/item viewsMetric._read_header
and the associatedMetricFileHeader
class, to parse a file's header (and any preceding comment lines).