-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for anndata #94
Conversation
WalkthroughThe changes update the batch correction command by enhancing the Changes
Sequence Diagram(s)sequenceDiagram
participant U as User/CLI
participant RC as run_batch_correction
participant CA as create_anndata
participant FS as FileSystem
U->>RC: Invoke batch correction with export_anndata=True
RC->>FS: Validate output file exists
RC->>CA: Create AnnData object from DataFrame
CA-->>RC: Return AnnData object
RC->>FS: Save AnnData file (.h5ad)
FS-->>RC: Confirm file saved
Possibly related PRs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
tests/test_file_utils.py (2)
18-25
: Enhance test coverage forcombine_ibaq_tsv_files
.The test only verifies the shape of the DataFrame. Consider adding assertions for:
- Content validation (e.g., specific column names, data types)
- Edge cases (e.g., empty files, malformed files)
- Error conditions (e.g., invalid separators, non-existent paths)
28-48
: Enhance test coverage forcreate_anndata
.While the test verifies basic functionality, consider adding assertions for:
- Content validation of the layers (e.g., specific values, data types)
- Edge cases (e.g., missing metadata columns, empty DataFrames)
- Error conditions (e.g., invalid column names, duplicate indices)
tests/test_batch_correction.py (1)
37-45
: Enhance test coverage for AnnData object validation.While the test verifies the creation and shape of the AnnData object, consider adding assertions for:
- Content validation of the layers (e.g., specific values, data types)
- Metadata validation (e.g., column names, data types)
- Error conditions (e.g., invalid export paths)
ibaqpy/ibaq/file_utils.py (2)
71-73
: Add stacklevel to warnings.Add
stacklevel=2
to the warnings to help users identify the source of the warning in their code.- warnings.warn( - f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'." - ) + warnings.warn( + f"Column '{col}' not found in the input DataFrame. Skipping metadata for '{col}'.", + stacklevel=2 + )- warnings.warn( - f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'." - ) + warnings.warn( + f"Layer column '{layer_col}' not found in the input DataFrame. Skipping layer '{layer_col}'.", + stacklevel=2 + )Also applies to: 92-94
🧰 Tools
🪛 Ruff (0.8.2)
71-71: No explicit
stacklevel
keyword argument found(B028)
104-149
: Enhance error handling incombine_ibaq_tsv_files
.Consider adding error handling for:
- Invalid file formats or corrupted files
- Permission issues when reading files
- Memory constraints when handling large files
def combine_ibaq_tsv_files( dir_path: str, pattern: str = "*", comment: str = "#", sep: str = "\t" ) -> pd.DataFrame: file_paths = glob.glob(f"{dir_path}/{pattern}") if not file_paths: raise FileNotFoundError( f"No files found in the directory '{dir_path}' matching the pattern '{pattern}'." ) dataframes = [] for file_path in file_paths: - # Read the TSV file, skipping lines that start with the comment character - df = pd.read_csv(file_path, sep=sep, comment=comment) - dataframes.append(df) + try: + # Read the TSV file, skipping lines that start with the comment character + df = pd.read_csv(file_path, sep=sep, comment=comment) + dataframes.append(df) + except pd.errors.EmptyDataError: + warnings.warn(f"Skipping empty file: {file_path}", stacklevel=2) + continue + except (pd.errors.ParserError, pd.errors.ParserWarning) as e: + raise ValueError(f"Error parsing file {file_path}: {str(e)}") + except PermissionError: + raise PermissionError(f"Permission denied when reading file: {file_path}") + except MemoryError: + raise MemoryError(f"Insufficient memory to read file: {file_path}") # Concatenate all DataFrames combined_df = pd.concat(dataframes, ignore_index=True) return combined_dfibaqpy/commands/correct_batches.py (2)
154-171
: Improve error handling in AnnData export.The error handling in the AnnData export block could be improved by properly chaining exceptions.
Apply this diff to improve error handling:
try: adata.write(adata_filename) except Exception as e: - raise ValueError(f"Failed to write AnnData object: {e}") + raise ValueError(f"Failed to write AnnData object: {e}") from e🧰 Tools
🪛 Ruff (0.8.2)
170-170: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
217-222
: Fix formatting inconsistency in CLI option.The formatting of the
ibaq_raw_column
option is inconsistent with other options. The indentation and line breaks should match the style used in other options.Apply this diff to fix the formatting:
-@click.option("-ibaq", - "--ibaq_raw_column", - help="Name of the raw iBAQ column", - required=False, - default=IBAQ - ) +@click.option( + "-ibaq", + "--ibaq_raw_column", + help="Name of the raw iBAQ column", + required=False, + default=IBAQ, +)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
ibaqpy/commands/correct_batches.py
(9 hunks)ibaqpy/ibaq/file_utils.py
(1 hunks)ibaqpy/ibaq/ibaqpy_postprocessing.py
(7 hunks)requirements.txt
(1 hunks)tests/test_batch_correction.py
(3 hunks)tests/test_file_utils.py
(1 hunks)tests/test_ibaqpy_postprocessing.py
(0 hunks)
💤 Files with no reviewable changes (1)
- tests/test_ibaqpy_postprocessing.py
✅ Files skipped from review due to trivial changes (1)
- requirements.txt
🧰 Additional context used
🪛 Ruff (0.8.2)
ibaqpy/ibaq/file_utils.py
71-71: No explicit stacklevel
keyword argument found
(B028)
92-92: No explicit stacklevel
keyword argument found
(B028)
ibaqpy/commands/correct_batches.py
170-170: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (3)
ibaqpy/ibaq/ibaqpy_postprocessing.py (1)
18-20
: LGTM!The formatting changes improve readability while maintaining functionality.
Also applies to: 57-60, 146-153, 156-158, 221-223
ibaqpy/commands/correct_batches.py (2)
2-2
: LGTM!The new imports appropriately support the AnnData export functionality and the use of constants for default values.
Also applies to: 8-9
80-106
: LGTM!The function signature updates are well-documented and use appropriate default values from constants.
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
ibaqpy/ibaq/file_utils.py (2)
78-79
: Add validation for metadata values.The metadata mapping could result in
NaN
values if the mapping is incomplete. Add validation to ensure all metadata values are mapped correctly.Apply this diff to add validation:
mapping = df[[key, col]].drop_duplicates().set_index(key)[col] + # Check for duplicate keys with different values + if len(mapping) != len(df[[key, col]].drop_duplicates()): + warnings.warn( + f"Found duplicate keys with different values in metadata column '{col}'.", + stacklevel=2 + ) metadata_df[col] = metadata_df.index.map(mapping) + # Check for unmapped values + if metadata_df[col].isna().any(): + warnings.warn( + f"Some metadata values in column '{col}' could not be mapped.", + stacklevel=2 + )
149-151
: Add validation for concatenation result.The concatenation operation should be validated to ensure the result is not empty and has the expected shape.
Apply this diff to add validation:
# Concatenate all DataFrames combined_df = pd.concat(dataframes, ignore_index=True) + # Validate concatenation result + if combined_df.empty: + raise ValueError("Concatenation resulted in an empty DataFrame") + + # Validate that the number of rows matches the sum of input DataFrames + expected_rows = sum(len(df) for df in dataframes) + if len(combined_df) != expected_rows: + warnings.warn( + f"Expected {expected_rows} rows after concatenation, but got {len(combined_df)}", + stacklevel=2 + ) + return combined_df
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibaqpy/ibaq/file_utils.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
ibaqpy/ibaq/file_utils.py
73-73: No explicit stacklevel
keyword argument found
(B028)
94-94: No explicit stacklevel
keyword argument found
(B028)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
This reverts commit 01843c7.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
ibaqpy/ibaq/file_utils.py (3)
82-83
: Validate metadata uniqueness.The metadata mapping assumes unique key-value pairs. Add validation to ensure there are no duplicate mappings that could lead to data loss.
Apply this diff to add validation:
# Create a mapping from key to metadata values. + # Check for duplicate mappings + duplicates = df[[key, col]].duplicated(subset=[key]) + if duplicates.any(): + warnings.warn( + f"Found duplicate values for '{col}' metadata. Using the first occurrence.", + stacklevel=2 + ) mapping = df[[key, col]].drop_duplicates().set_index(key)[col] metadata_df[col] = metadata_df.index.map(mapping)
165-165
: Use exception chaining for better error tracking.When re-raising exceptions within an except clause, use exception chaining to preserve the original traceback.
Apply this diff:
- raise ValueError(f"Error reading file '{file_path}': {str(e)}") + raise ValueError(f"Error reading file '{file_path}': {str(e)}") from e🧰 Tools
🪛 Ruff (0.8.2)
165-165: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
168-170
: Add validation for the combined DataFrame.The function should validate that the combined DataFrame is not empty and has the expected structure.
Apply this diff:
# Concatenate all DataFrames combined_df = pd.concat(dataframes, ignore_index=True) + # Validate the combined DataFrame + if combined_df.empty: + raise ValueError("Combined DataFrame is empty") + if not all(col in combined_df.columns for col in first_schema): + raise ValueError("Combined DataFrame is missing expected columns") + return combined_df
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibaqpy/ibaq/file_utils.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
ibaqpy/ibaq/file_utils.py
77-77: No explicit stacklevel
keyword argument found
(B028)
98-98: No explicit stacklevel
keyword argument found
(B028)
165-165: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (1)
ibaqpy/ibaq/file_utils.py (1)
77-79
: Add stacklevel to warnings.The warnings should include a
stacklevel
parameter to ensure they point to the correct line in the user's code.Also applies to: 98-100
🧰 Tools
🪛 Ruff (0.8.2)
77-77: No explicit
stacklevel
keyword argument found(B028)
PR Type
Enhancement, Tests
Description
Added support for exporting corrected iBAQ values to AnnData objects.
Introduced a new utility function
create_anndata
for AnnData creation.Updated batch correction CLI to include AnnData export option.
Added and updated tests for AnnData export and related utilities.
Changes walkthrough 📝
correct_batches.py
Enable AnnData export in batch correction
ibaqpy/commands/correct_batches.py
export_anndata
option to batch correction function.workflow.
file_utils.py
Introduce AnnData creation and file combination utilities
ibaqpy/ibaq/file_utils.py
create_anndata
function for AnnData object creation.ibaqpy_postprocessing.py
Refactor postprocessing utilities and remove redundancy
ibaqpy/ibaq/ibaqpy_postprocessing.py
combine_ibaq_tsv_files
function.test_batch_correction.py
Add tests for AnnData export in batch correction
tests/test_batch_correction.py
test_file_utils.py
Add tests for AnnData creation and file utilities
tests/test_file_utils.py
create_anndata
function.test_ibaqpy_postprocessing.py
Update and clean up postprocessing tests
tests/test_ibaqpy_postprocessing.py
combine_ibaq_tsv_files
.requirements.txt
Add AnnData dependency
requirements.txt
anndata
dependency for AnnData support.Summary by CodeRabbit