Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: MetricWriter #123

Merged
merged 30 commits into from
Oct 21, 2024
Merged

feat: MetricWriter #123

merged 30 commits into from
Oct 21, 2024

Conversation

msto
Copy link
Contributor

@msto msto commented Jun 4, 2024

Closes #88
Closes #90

Summary

This PR introduces a MetricWriter class.

This PR is motivated by a limitation in Metric.write(), which does not permit appending metric instances to a file. The new class supports appending and streamed writing of metrics, validating that the existing file's structure matches the metrics being written. MetricWriter also supports optionally reordering or removing fields when writing.

Added

  • Added the MetricWriter class, which supports context-managed appending and writing of Metrics to file.
  • Added Metric.keys(), Metric.items(), and Metric.formatted_items(). These are companions to the existing Metric.values() and Metric.formatted_values(), and provide analogous key/item views
  • Added Metric._read_header and the associated MetricFileHeader class, to parse a file's header (and any preceding comment lines).

nh13
nh13 previously requested changes Jul 13, 2024
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far, just a couple small things

fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/tests/test_metric.py Outdated Show resolved Hide resolved
@msto msto self-assigned this Sep 12, 2024
@msto msto force-pushed the ms_metric-writer-feature-branch branch from 1077e14 to e3222b5 Compare October 17, 2024 16:54
@msto msto marked this pull request as ready for review October 17, 2024 18:27
@msto msto requested review from tfenne and clintval as code owners October 17, 2024 18:27
@msto msto assigned tfenne and unassigned msto Oct 17, 2024
Copy link
Contributor

coderabbitai bot commented Oct 17, 2024

Walkthrough

The pull request introduces significant enhancements to the fgpyo/io/__init__.py and fgpyo/util/metric.py modules, along with corresponding tests in tests/fgpyo/util/test_metric.py. In the fgpyo/io/__init__.py, the to_reader and to_writer functions have been updated to return a more specific type, TextIOWrapper, and the function assert_path_is_writeable has been renamed to assert_path_is_writable with a deprecation warning. Minor corrections to error messages and the removal of unused imports were also performed.

In the fgpyo/util/metric.py, a new MetricFileHeader dataclass was added, and the Metric class received new methods for better handling of field names and values. A new MetricWriter class was introduced to facilitate writing metrics to files, supporting appending and excluding specific attributes. Tests in test_metric.py have been expanded to cover new functionalities and edge cases, ensuring robust validation of the metrics handling system.

Assessment against linked issues

Objective Addressed Explanation
Add MetricWriter to permit streamed writing of metrics (#88)
Metric.write should support excluding attributes (#90)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (6)
fgpyo/io/__init__.py (1)

Line range hint 92-101: LGTM: Good use of deprecation warning for renamed function.

The renaming of assert_path_is_writeable to assert_path_is_writable improves consistency with American English spelling. The deprecation warning is a good practice for maintaining backward compatibility while encouraging the use of the new function name.

Consider adding a FutureWarning instead of a DeprecationWarning to make the warning more visible to users:

warnings.warn(
    "assert_path_is_writeable is deprecated, use assert_path_is_writable instead",
    FutureWarning,
    stacklevel=2,
)
fgpyo/util/metric.py (3)

450-459: Update parameter name in the docstring to match the method signature.

In the __init__ method of MetricWriter, the parameter is named filename, but the docstring refers to it as path. For consistency and clarity, the docstring should use filename.

Apply this diff to correct the parameter name in the docstring:

             """
             Args:
-                path: Path to the output file.
+                filename: Path to the output file.
                 metric_class: Metric class.
                 append: If `True`, the file will be appended to. Otherwise, the specified file will be
                     overwritten.

523-525: Remove unnecessary call to super().__exit__ in the __exit__ method.

Since AbstractContextManager does not define a meaningful __exit__ method, calling super().__exit__ is unnecessary and can be omitted.

Apply this diff to simplify the __exit__ method:

         def __exit__(
             self,
             exc_type: Type[BaseException],
             exc_value: BaseException,
             traceback: TracebackType,
         ) -> None:
             self.close()
-            super().__exit__(exc_type, exc_value, traceback)

450-456: Add type variable to metric_class parameter for better type checking.

In the __init__ method of MetricWriter, specifying the generic type for metric_class improves type safety and clarity.

Apply this diff to add the type variable:

         def __init__(
             self,
             filename: Union[Path, str],
-            metric_class: Type[Metric],
+            metric_class: Type[MetricType],
             append: bool = False,
             delimiter: str = "\t",
             include_fields: Optional[List[str]] = None,
tests/fgpyo/util/test_metric.py (2)

590-603: Add a docstring to test_writer function

Consider adding a docstring to the test_writer test function to explain its purpose and improve code readability.


619-635: Add a docstring to test_writer_writeall function

Adding a docstring to the test_writer_writeall function would enhance clarity and maintain consistency with other test functions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 09ecac6 and 6035efb.

📒 Files selected for processing (3)
  • fgpyo/io/init.py (4 hunks)
  • fgpyo/util/metric.py (3 hunks)
  • tests/fgpyo/util/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (5)
fgpyo/io/__init__.py (5)

50-50: LGTM: Import changes improve clarity and remove unused imports.

The direct import of TextIOWrapper and removal of the unused Union import align well with the changes in the file, improving code clarity and reducing unnecessary imports.


Line range hint 157-170: LGTM: Improved type specificity in to_reader function.

The changes to the to_reader function improve type specificity by consistently using TextIOWrapper as the return type. This enhances type checking and code clarity without affecting the function's behavior.


Line range hint 175-196: LGTM: Enhanced type specificity in to_writer function.

The modifications to the to_writer function improve type specificity by consistently using TextIOWrapper as the return type. This change enhances type checking and code clarity without altering the function's behavior.


Line range hint 92-92: LGTM: Corrected typo in error messages.

The typo "becasue" has been corrected to "because" in the error messages of assertion functions. This minor correction improves the readability and professionalism of the error messages.

Also applies to: 104-104


Line range hint 1-292: Overall assessment: Excellent improvements to the fgpyo/io/__init__.py module.

The changes in this file significantly enhance type specificity, correct minor issues, and improve code clarity. The modifications to to_reader and to_writer functions, along with the renaming of assert_path_is_writeable, align well with the PR objectives. These improvements contribute to better code quality, maintainability, and type safety.

fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
tests/fgpyo/util/test_metric.py Outdated Show resolved Hide resolved
tests/fgpyo/util/test_metric.py Outdated Show resolved Hide resolved
tests/fgpyo/util/test_metric.py Outdated Show resolved Hide resolved
@msto msto force-pushed the ms_metric-writer-feature-branch branch from dd8730f to 4ab75b7 Compare October 17, 2024 21:53
Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 96.63866% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.71%. Comparing base (8c3c152) to head (2f47770).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
fgpyo/util/metric.py 96.42% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   89.40%   89.71%   +0.30%     
==========================================
  Files          18       18              
  Lines        2105     2207     +102     
  Branches      467      489      +22     
==========================================
+ Hits         1882     1980      +98     
- Misses        146      149       +3     
- Partials       77       78       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also replace metric.write() with an implementaiton that uses the new writer so we don't have duplicate code paths/

fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Show resolved Hide resolved
@msto msto requested a review from tfenne October 17, 2024 23:05
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (15)
fgpyo/util/metric.py (9)

179-203: LGTM: New methods enhance Metric class functionality.

The new methods keys(), items(), and formatted_items() provide a consistent and pythonic way to access field names and values. This addition improves the usability of the Metric class.

However, there's a minor inconsistency in the formatted_items() method:

The docstring for formatted_items() states it returns an iterator, but it actually returns a list. Consider updating the docstring to reflect this:

-    """An iterator over formatted attribute values in the same order as the header."""
+    """A list of tuples containing field names and their formatted values in the same order as the header."""

302-304: Complete the implementation of MetricWriter in the write() method.

The change to use MetricWriter is a good improvement, allowing for more flexible file handling. However, the TODO comment suggests that the implementation is not yet complete.

Consider removing the TODO comment and fully implementing the MetricWriter usage. For example:

-        # TODO: open a MetricWriter here instead
-        with MetricWriter[MetricType](path, metric_class=values[0].__class__) as writer:
+        with MetricWriter[MetricType](path, metric_class=cls) as writer:
             writer.writeall(values)

This change uses cls instead of values[0].__class__, which is more consistent and doesn't assume that values is non-empty.


382-422: LGTM: New _read_header() method handles file headers well.

The new _read_header() method is well-structured and effectively handles comments, empty lines, and field name extraction. It correctly returns a MetricFileHeader object, maintaining consistency with the new dataclass.

Consider adding error handling for potential edge cases:

  1. Handle the case where the file is empty or contains only comments/empty lines more explicitly.
  2. Consider adding a maximum number of lines to read before giving up on finding the header.

For example:

MAX_LINES_TO_READ = 1000  # Adjust as needed

for line_num, line in enumerate(reader, 1):
    if line_num > MAX_LINES_TO_READ:
        raise ValueError(f"Could not find header within {MAX_LINES_TO_READ} lines")
    # ... rest of the existing logic ...

if not fieldnames:
    raise ValueError("File contains no valid header")

This addition would prevent the method from hanging on extremely large files with no valid header.


425-437: LGTM: New _is_metric_class() function enhances type checking.

The new _is_metric_class() function is well-structured and effectively handles both dataclasses and attr classes. The use of TypeGuard enhances type checking capabilities.

Consider renaming the function to be more descriptive of its purpose:

-def _is_metric_class(cls: Any) -> TypeGuard[Metric]:
+def _is_valid_metric_class(cls: Any) -> TypeGuard[Metric]:

This name more clearly indicates that the function is checking for a valid Metric class, not just any Metric class.


440-570: LGTM: New MetricWriter class provides flexible metric writing.

The MetricWriter class is well-structured and offers flexible options for writing metrics, including appending, custom delimiters, and field inclusion/exclusion. The implementation of AbstractContextManager ensures proper file handling.

Consider enhancing error handling and messages:

  1. In the __init__ method, provide more specific error messages:
if not _is_metric_class(metric_class):
    raise TypeError(f"Expected a Metric subclass, got {type(metric_class).__name__}")
  1. In the write method, add a type check for the metric argument:
if not isinstance(metric, self._metric_class):
    raise TypeError(f"Expected {self._metric_class.__name__}, got {type(metric).__name__}")

These changes will provide more informative error messages, making debugging easier for users of the class.


573-606: LGTM: New function handles fieldname validation and generation well.

The _validate_and_generate_final_output_fieldnames() function is well-structured and correctly handles the include_fields and exclude_fields options. It properly validates that specified fields are attributes of the metric class.

Consider shortening the function name for better readability:

-def _validate_and_generate_final_output_fieldnames(
+def _get_validated_output_fieldnames(

This shorter name still conveys the function's purpose while improving code readability.


609-648: LGTM: New function effectively validates file headers against metric classes.

The _assert_file_header_matches_metric() function is well-structured and performs necessary checks to ensure file headers match the given metric class. It appropriately uses the new _read_header() method for parsing.

Consider enhancing the error messages for better debugging:

  1. When the header is empty:
if not header.fieldnames:
    raise ValueError(f"The file '{path}' does not contain a valid header.")
  1. When there's a mismatch between expected and actual fields:
if header.fieldnames != fieldnames:
    missing_fields = set(fieldnames) - set(header.fieldnames)
    extra_fields = set(header.fieldnames) - set(fieldnames)
    error_msg = f"Header mismatch in file '{path}':\n"
    if missing_fields:
        error_msg += f"Missing fields: {', '.join(missing_fields)}\n"
    if extra_fields:
        error_msg += f"Extra fields: {', '.join(extra_fields)}"
    raise ValueError(error_msg)

These changes provide more detailed information about the nature of the mismatch, which can be helpful for debugging.


651-671: LGTM: New function efficiently validates fieldnames against metric attributes.

The _assert_fieldnames_are_metric_attributes() function is concise and effectively checks if specified fieldnames are attributes of a metric class. The use of set operations for comparison is efficient.

Consider improving the error message to be more specific about which fieldnames are invalid:

if invalid_fieldnames:
    raise ValueError(
        f"The following fields are not attributes of the Metric {metric_class.__name__}: "
        f"{', '.join(invalid_fieldnames)}. "
        f"Valid attributes are: {', '.join(metric_class.keys())}"
    )

This change provides more detailed information about which fields are invalid and what the valid options are, making it easier for users to correct their input.


673-684: LGTM: New function effectively asserts valid Metric classes.

The _assert_is_metric_class() function is concise and effectively asserts that a given class is a valid Metric. It appropriately uses the previously defined _is_metric_class() function.

Consider improving the error message to provide more information about why the class is not a valid Metric:

if not _is_metric_class(cls):
    raise TypeError(
        f"{cls.__name__} is not a valid Metric class. "
        "Ensure it is a subclass of Metric and decorated with @dataclass or @attr.s."
    )

This change provides more specific guidance on what makes a valid Metric class, which can help users understand and correct the issue more easily.

tests/fgpyo/util/test_metric.py (6)

548-568: LGTM: Good test for Picard-formatted header parsing

This test effectively verifies the ability to read headers from Picard-formatted files. It creates a mock file with realistic header content and checks if the correct fieldnames are extracted.

Consider adding an assertion to verify the preamble content of the header as well, to ensure all parts of the header are correctly parsed.


591-603: LGTM: Comprehensive test for MetricWriter basic functionality

This test effectively verifies the core functionality of MetricWriter, including writing the header and multiple metric instances. The assertions thoroughly check the output file content.

Consider adding a test case with append=True to ensure the appending functionality works correctly as well.


766-772: LGTM: Important edge case test for FIFO handling

This test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a FIFO (named pipe). It's an important check to prevent misuse of the MetricWriter with non-file outputs.

Consider adding similar tests for other special file types (e.g., /dev/null, character devices) to ensure comprehensive coverage of potential edge cases.


775-780: LGTM: Basic test for _assert_is_metric_class

This test verifies that _assert_is_metric_class correctly handles a valid Metric subclass. It's a good starting point for testing this assertion function.

Consider adding an explicit assertion (e.g., assert True) at the end of the test to make the expected behavior more clear. Also, you might want to add tests with both attr and dataclasses implementations to ensure comprehensive coverage.


821-839: LGTM: Well-structured test for valid fieldnames

This test effectively verifies that _assert_fieldnames_are_metric_attributes correctly handles various valid fieldname scenarios. The use of parametrization allows for comprehensive coverage of different cases.

Consider adding an explicit assertion (e.g., assert True) at the end of the test to make the expected behavior more clear. Additionally, you might want to add a test case with an empty list of fieldnames to cover that edge case.


862-894: LGTM: Thorough tests for file header matching

These tests effectively verify the behavior of _assert_file_header_matches_metric for both valid and invalid headers. The use of parametrization in the second test allows for comprehensive coverage of different error scenarios.

Consider adding an explicit assertion (e.g., assert True) at the end of the test_assert_file_header_matches_metric function to make the expected behavior more clear.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6035efb and c6714a1.

📒 Files selected for processing (3)
  • fgpyo/io/init.py (4 hunks)
  • fgpyo/util/metric.py (4 hunks)
  • tests/fgpyo/util/test_metric.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fgpyo/io/init.py
🧰 Additional context used
🔇 Additional comments (18)
fgpyo/util/metric.py (1)

116-166: LGTM: New imports and classes added.

The new imports (AbstractContextManager, DictWriter, TextIOWrapper, TypeGuard) and new classes (MetricFileHeader, MetricWriter) are well-structured and necessary for the enhanced functionality being introduced. The MetricFileHeader dataclass provides a clear structure for storing file header information, which will improve code readability and maintainability.

tests/fgpyo/util/test_metric.py (17)

395-397: LGTM: Good test coverage for keys() method

The test effectively verifies the keys() method for both attr and dataclasses implementations of the Person class. It's concise and covers the expected functionality.


405-409: LGTM: Comprehensive test for items() method

This test thoroughly checks the items() method of the Person class for both attr and dataclasses implementations. The docstring clearly explains the test's purpose, and the assertion verifies the correct output format and values.


423-426: LGTM: Well-structured test for formatted_items() method

This test effectively verifies the formatted_items() method of the Person class. It correctly checks the output format and ensures that values are properly formatted (e.g., age as a string). The test covers both attr and dataclasses implementations.


570-582: LGTM: Good edge case coverage for empty file

This test effectively verifies the behavior of _read_header when given an empty file. It correctly checks that both the preamble and fieldnames are empty lists, which is the expected behavior for this edge case.


585-588: LGTM: Well-defined FakeMetric class for testing

The FakeMetric class is correctly implemented as a dataclass and Metric subclass. It provides a simple and clear structure for testing Metric-related functionality with two different types of fields.


606-617: LGTM: Good test for MetricWriter string filename support

This test effectively verifies that MetricWriter can be initialized with a string filename. It's an important test case that ensures flexibility in how users can specify the output file path.


620-635: LGTM: Thorough test for MetricWriter's writeall method

This test effectively verifies the writeall method of MetricWriter, which is crucial for bulk writing of metrics. It correctly checks the header and all data lines in the output file, ensuring that multiple metrics are written properly.


638-654: LGTM: Well-structured test for MetricWriter append functionality

This test thoroughly verifies the append functionality of MetricWriter. It correctly checks that the original header is preserved and new data is properly appended to an existing file. This is a crucial feature for updating metric files without overwriting existing data.


657-664: LGTM: Good edge case test for appending to empty file

This test effectively verifies that MetricWriter raises the appropriate error when attempting to append to an empty file. It's an important edge case that ensures robust error handling in the MetricWriter class.


667-675: LGTM: Thorough edge case test for appending to file without header

This test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a file that contains data but no header. It's an important edge case that ensures the integrity of the metric file structure is maintained.


678-690: LGTM: Crucial test for header mismatch scenario

This test effectively verifies that MetricWriter raises the appropriate error when attempting to append to a file with a header that doesn't match the metric class. It's a critical check that ensures data consistency and prevents mixing of incompatible metrics in the same file.


693-714: LGTM: Well-implemented test for field inclusion

This test effectively verifies the ability to include only specific fields when writing metrics. It correctly checks that only the specified field ("foo") is present in the output, demonstrating the flexibility of the MetricWriter in customizing the metric output.


717-738: LGTM: Thorough test for field reordering

This test effectively verifies the ability to reorder fields when writing metrics. It correctly checks that the fields are written in the specified order, demonstrating the flexibility of the MetricWriter in customizing the output format. This feature is particularly useful for maintaining compatibility with existing systems or specific reporting requirements.


741-763: LGTM: Comprehensive test for field exclusion

This test effectively verifies the ability to exclude specific fields when writing metrics. It correctly checks that the excluded field ("bar") is not present in the output, further demonstrating the flexibility of the MetricWriter in customizing the metric output. This feature is particularly useful for omitting sensitive or unnecessary information from the output.


783-794: LGTM: Thorough test for undecorated Metric subclass

This test effectively verifies that _assert_is_metric_class raises the appropriate error when given an undecorated Metric subclass. It's an important check to ensure that Metric subclasses are properly implemented with the required decorators.


797-817: LGTM: Comprehensive test for non-Metric classes

This test effectively verifies that _assert_is_metric_class raises the appropriate error when given classes that are decorated but not Metric subclasses. It covers both dataclass and attr implementations, ensuring robust checking of the assertion function's behavior.


841-859: LGTM: Comprehensive test for invalid fieldnames

This test effectively verifies that _assert_fieldnames_are_metric_attributes raises the appropriate error when given invalid fieldnames. The use of parametrization allows for thorough coverage of different error scenarios, ensuring robust checking of the assertion function's behavior.

msto added 9 commits October 17, 2024 16:10
* feat: add asdict

* fix: typeguard

fix: typeguard import

doc: update docstring

refactor: import Instance types

refactor: import Instance types

* fix: 3.8 compatible Dict typing

* refactor: make instance checks private

* tests: add coverage

* fix: typeerror

* doc: clarify that asdict does not format values
* feat: add read_header

* fix: rm kw_only to support 3.9

* fix: use Typing.List to support 3.8

* fix: pr suggestions
This PR introduces several of the assertion methods used when
constructing the `MetricWriter`. (See #107 for how they are used in
practice).
Closes #88.

I've adapted the `DataclassWriter` from
https://github.com/msto/dataclass_io/ to work with Metrics.
@msto msto force-pushed the ms_metric-writer-feature-branch branch from c6714a1 to d100baa Compare October 17, 2024 23:10
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (9)
fgpyo/util/metric.py (5)

179-203: LGTM! New methods enhance Metric class interface, but minor inconsistency in formatted_items().

The new keys(), items(), and formatted_items() methods provide a more intuitive interface for accessing metric data. However, there's a minor inconsistency in the formatted_items() method.

Consider updating the docstring or return type of formatted_items():

-    def formatted_items(self) -> List[Tuple[str, str]]:
-        """An iterator over formatted attribute values in the same order as the header."""
+    def formatted_items(self) -> List[Tuple[str, str]]:
+        """A list of tuples containing field names and their formatted values in the same order as the header."""

381-421: LGTM! Comprehensive _read_header() method with clear documentation.

The new _read_header() static method is well-implemented and thoroughly documented. It handles various edge cases and provides a clear structure for the parsed header.

Consider adding a brief comment explaining the use of the else clause in the for loop:

         else:
+            # If we've exhausted the file without finding a non-comment, non-empty line
             # If the file was empty, kick back an empty header
             fieldnames = []

This would clarify the purpose of this edge case handling.


424-437: LGTM! Robust _is_metric_class() function with good error handling.

The new _is_metric_class() function provides a robust check for valid Metric classes, handling both dataclasses and attr-decorated classes. The use of TypeGuard enhances type checking capabilities.

Consider adding a brief comment explaining the purpose of the attr import attempt:

     try:
         import attr
+        # Check for attr-decorated classes if attr is installed
         is_metric_cls = is_metric_cls and (dataclasses.is_dataclass(cls) or attr.has(cls))
     except ImportError:
+        # Fall back to dataclass-only check if attr is not installed
         is_metric_cls = is_metric_cls and dataclasses.is_dataclass(cls)

This would clarify the reasoning behind the import attempt and fallback behavior.


439-569: LGTM! Comprehensive MetricWriter class with good resource management.

The new MetricWriter class is well-implemented, providing flexible options for writing metrics to files. The use of AbstractContextManager ensures proper resource handling, and the docstrings are detailed and informative.

For consistency with other parts of the codebase, consider using ValueError instead of AssertionError in the init method:

-            AssertionError: If the provided filepath is not writable.
-            AssertionError: If `append=True` and the provided file is not readable. (When appending,
+            ValueError: If the provided filepath is not writable.
+            ValueError: If `append=True` and the provided file is not readable. (When appending,
                 we check to ensure that the header matches the specified metric class. The file must
                 be readable to get the header.)

Also, update the corresponding error raising in the method body:

-        io.assert_path_is_writable(filepath)
+        if not io.is_writable(filepath):
+            raise ValueError(f"The file '{filepath}' is not writable.")
         if append:
-            io.assert_path_is_readable(filepath)
+            if not io.is_readable(filepath):
+                raise ValueError(f"The file '{filepath}' is not readable for appending.")

608-647: LGTM! Comprehensive file header validation function.

The _assert_file_header_matches_metric() function provides a thorough check for file header consistency with the metric class. The implementation uses the new _read_header() method, ensuring consistent header parsing.

Consider adding a check for the case where the file is empty:

     with path.open("r") as fin:
+        if fin.read(1) == '':
+            raise ValueError(f"The provided file is empty: {path}")
+        fin.seek(0)
         header: MetricFileHeader = metric_class._read_header(fin, delimiter=delimiter)

This would provide a more specific error message for empty files.

tests/fgpyo/util/test_metric.py (4)

395-398: Add a docstring to test_metric_keys for clarity

The test function test_metric_keys lacks a docstring. Adding a descriptive docstring will improve code readability and maintain consistency with other test functions.


423-427: Add a docstring to test_metric_formatted_items for consistency

Including a docstring in the test_metric_formatted_items function will enhance clarity and maintain consistency across the test suite.


591-604: Include a docstring in test_writer to describe the test's purpose

Adding a docstring to the test_writer function will help others understand what is being tested and improve code documentation.


620-636: Provide a docstring for test_writer_writeall

The function test_writer_writeall lacks a docstring. Documenting the purpose of this test will enhance readability and clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c6714a1 and 7c998f9.

📒 Files selected for processing (3)
  • fgpyo/io/init.py (4 hunks)
  • fgpyo/util/metric.py (4 hunks)
  • tests/fgpyo/util/test_metric.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fgpyo/io/init.py
🧰 Additional context used
🔇 Additional comments (7)
fgpyo/util/metric.py (5)

116-166: LGTM! Improved imports and new MetricFileHeader class.

The new imports and the introduction of the MetricFileHeader class enhance the module's functionality and provide a clear structure for handling file headers. The dataclass implementation is clean and well-documented.


302-303: LGTM! Improved write() method using MetricWriter.

The change to use MetricWriter in the write() method improves the separation of concerns and allows for more flexible writing options. This modification enhances the overall design and maintainability of the class.


572-605: LGTM! Well-implemented fieldname validation and generation function.

The _validate_and_generate_final_output_fieldnames() function provides a robust mechanism for handling include and exclude field options. The implementation is thorough, with clear error messages and concise logic for generating the final output fieldnames.


650-670: LGTM! Efficient fieldname validation function.

The _assert_fieldnames_are_metric_attributes() function provides an efficient and clear mechanism for validating that specified fieldnames are attributes of a metric class. The use of set operations for comparison is a good choice for performance.


672-683: LGTM! Clear and concise Metric class assertion function.

The _assert_is_metric_class() function provides a clear mechanism for asserting that a class is a valid Metric class. It effectively leverages the _is_metric_class() function and provides a specific error message for invalid classes.

tests/fgpyo/util/test_metric.py (2)

776-781: LGTM: test_assert_is_metric_class correctly validates Metric classes

The test function test_assert_is_metric_class appropriately verifies that _assert_is_metric_class works as intended, ensuring that valid Metric classes pass the assertion without raising exceptions.


821-840: LGTM: test_assert_fieldnames_are_metric_attributes correctly checks field names

The test cases in test_assert_fieldnames_are_metric_attributes effectively validate that the provided field names are attributes of the Person metric class in various scenarios, including different orders and subsets.

tests/fgpyo/util/test_metric.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
tests/fgpyo/util/test_metric.py (1)

769-771: Consider enhancing platform check for broader compatibility

In the test_writer_raises_if_fifo function, the current check if os.name == "nt": skips the test on Windows systems. However, other non-Unix-like operating systems may also lack /dev/stdout. To ensure the test is only run on Unix-like systems where /dev/stdout is available, consider modifying the condition to:

if os.name != "posix":

This change will skip the test on all non-Posix systems, enhancing cross-platform compatibility.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7c998f9 and 2f47770.

📒 Files selected for processing (1)
  • tests/fgpyo/util/test_metric.py (5 hunks)
🧰 Additional context used
🔇 Additional comments (7)
tests/fgpyo/util/test_metric.py (7)

798-799: Verify exception message matches actual raised error

In test_assert_is_metric_class_raises_if_not_decorated, the pytest.raises call matches the error message "Not a dataclass or attr decorated Metric". Please verify that the _assert_is_metric_class function raises an exception with this exact message when the class is a Metric subclass but not decorated. Ensuring the error message matches prevents tests from passing erroneously or failing due to message discrepancies.


812-813: Verify exception message matches actual raised error

Similarly, in test_assert_is_metric_class_raises_if_not_a_metric, confirm that the exception raised by _assert_is_metric_class contains the message "Not a dataclass or attr decorated Metric". Accurate error message matching is important for the reliability of the test, ensuring it fails only when appropriate.


396-399: Well-structured tests for new Metric methods

The added tests test_metric_keys, test_metric_items, and test_metric_formatted_items effectively validate the new methods keys(), items(), and formatted_items() of the Metric class. The tests confirm that these methods return the expected keys and items, both in raw and formatted forms, enhancing confidence in the new functionality.

Also applies to: 406-411, 424-428


549-570: Robust testing of header-reading capabilities

The functions test_read_header_can_read_picard and test_read_header_can_read_empty provide solid tests for the header-reading functionality. They ensure that headers are correctly parsed from Picard-formatted files and that empty files are handled gracefully, which increases the robustness of the MetricFileHeader class.

Also applies to: 571-584


592-765: Comprehensive testing of MetricWriter functionality

The suite of tests for the MetricWriter class—including test_writer, test_writer_append, test_writer_include_fields, and others—thoroughly checks various writing scenarios. These tests validate appending to files, handling empty or mismatched headers, including or excluding fields, and reordering fields. This comprehensive coverage ensures that MetricWriter behaves correctly in diverse situations.


781-785: Correct validation of Metric subclass in test

The test_assert_is_metric_class function accurately tests that _assert_is_metric_class accepts a valid Metric subclass. This positive test complements the negative tests and confirms that the validation behaves as expected for proper classes.


825-864: Effective testing of field name assertions

The tests test_assert_fieldnames_are_metric_attributes and test_assert_fieldnames_are_metric_attributes_raises effectively verify that _assert_fieldnames_are_metric_attributes correctly identifies valid and invalid field names. By testing various combinations of field names, these tests ensure that only appropriate field names are accepted, which strengthens input validation.

@tfenne tfenne dismissed nh13’s stale review October 21, 2024 13:59

nh13 is on leave and can't re-review

@msto msto merged commit 318ff98 into main Oct 21, 2024
9 checks passed
@msto msto deleted the ms_metric-writer-feature-branch branch October 21, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metric.write should support excluding attributes Add MetricWriter to permit streamed writing of metrics
3 participants