metaproteomics update #146

Status: Open. Wants to merge 2 commits into base: master.

Conversation

nithujohn (Collaborator) commented Feb 26, 2025

Metaproteomics with changes

Summary by CodeRabbit

  • New Features
    • Expanded the ontology with an extensive set of new metadata terms to improve data classification and analysis.
    • Introduced terms covering human gut studies, sample identification, host characteristics (e.g., age, disease status, diet), health conditions, and environmental parameters such as storage and measurement details.

coderabbitai bot (Contributor) commented Feb 26, 2025

Walkthrough

This pull request adds a substantial number of new ontology terms to the pride_cv.obo file. The new terms cover identifiers and metadata for human gut studies, including information on sample names, project names, host characteristics (such as age, diet, disease status, etc.), and detailed parameters for sample handling and environmental analysis. Each term includes a unique identifier, definition, and hierarchical classification within the ontology.

Changes

File: pride_cv.obo
Changes summary: Added extensive new terms covering human gut microbiome studies, sample metadata (name, project, storage, analysis), host characteristics, environmental parameters, and various analytical measurements.

Possibly related PRs

  • Metaproteomics #144: Both PRs add new terms to pride_cv.obo, including overlapping terms such as "Human gut," "sample name," "host age," and "sample storage temperature."
  • added Olink HT #129: Also modifies pride_cv.obo, adding terms related to Olink instruments.
  • Update pride_cv.obo #125: Also modifies pride_cv.obo, with a different focus on term addition and definition clarity.

Suggested labels

Review effort 4/5

Suggested reviewers

  • ypriverol
  • deeptijk

Poem

I'm a little rabbit, hopping through the code,
New terms bloom along each new node.
Fields of metadata dance in line and row,
With samples and host details all aglow.
In the world of ontologies, I cheer and grow! 🐇


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c88763 and ce0a562.

📒 Files selected for processing (1)
  • pride_cv.obo (1 hunks)
🔇 Additional comments (1)
pride_cv.obo (1)

4003-4009: Initial addition of 'human gut' term looks good overall.

Comment on lines +4031 to +4044
id: PRIDE:0000677
name: gastrointestinal tract disorder
def: "History of gastrointestinal tract disorders; can include multiple disorders. MIXS:0000280" [PRIDE:PRIDE]
is_a: MONDO:0000001 ! Disease
is_a: PRIDE:0000674 ! Human gut


[Term]
id: PRIDE:0000678
name: liver disorder
def: "History of liver disorders; can include multiple disorders. MIXS:0000282" [PRIDE:PRIDE]
is_a: MONDO:0000001 ! Disease
is_a: PRIDE:0000674 ! Human gut

⚠️ Potential issue

Avoid using 'human gut' as a parent for disease terms.
The new terms gastrointestinal tract disorder (lines 4031–4036) and liver disorder (lines 4040–4044) both inherit from MONDO:0000001 ! Disease and PRIDE:0000674 ! Human gut. This dual inheritance is semantically inconsistent, because a disease entity should not also be classified as an anatomical entity. Consider modeling these relationships with alternative properties (e.g., occurs_in or part_of), rather than is_a.
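
A minimal sketch of the suggested remodeling, assuming a relation named occurs_in is declared in the file. The [Typedef] stanza is an illustrative assumption rather than something taken from pride_cv.obo, and its xref is assumed to point at the "occurs in" relation; the term ID, name, and definition are copied from the lines under review.

```obo
! Hypothetical relation declaration; pride_cv.obo may already define an equivalent.
[Typedef]
id: occurs_in
name: occurs in
xref: BFO:0000066

[Term]
id: PRIDE:0000677
name: gastrointestinal tract disorder
def: "History of gastrointestinal tract disorders; can include multiple disorders. MIXS:0000280" [PRIDE:PRIDE]
is_a: MONDO:0000001 ! Disease
relationship: occurs_in PRIDE:0000674 ! Human gut
```

This keeps the disorder under a single is_a parent and records the gut context as a typed link, so the hierarchy no longer implies that the disorder is a kind of human gut.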

Comment on lines +4010 to +4199
def: "Substance produced by the body, e.g. Stool, mucus, where the sample was obtained from. MIXS:0000888" [PRIDE:PRIDE]
is_a: NCIT:C12219 ! Anatomic Structure, System, or Substance
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000687
name: host total mass
def: "Total mass of the host at collection, the unit depends on host. MIXS:0000263" [PRIDE:PRIDE]
is_a: OBA:2045413 ! anatomical entity mass
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000688
name: host height
def: "The height of subject. MIXS:0000264" [PRIDE:PRIDE]
synonym: "body height" EXACT []
is_a: NCIT:C25347 ! Height
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000689
name: host diet
def: "Type of diet depending on the host, for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types. MIXS:0000869" [PRIDE:PRIDE]
is_a: EFO:0002755 ! Diet
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000690
name: host last meal
def: "Content of last meal and time since feeding; can include multiple values. MIXS:0000870" [PRIDE:PRIDE]
is_a: NCIT:C80248 ! Meal
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000691
name: host family relationship
def: "Relationships to other hosts in the same study; can include multiple relationships. MIXS:0000872" [PRIDE:PRIDE]
synonym: "family relationship" EXACT []
is_a: BFO:0000016 ! disposition
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000692
name: host genotype
def: "Observed genotype MIXS:0000365" [PRIDE:PRIDE]
is_a: EFO:0004554 ! genomic measurement
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000693
name: host phenotype
def: "Phenotype of human or other host. Use terms from the phenotypic quality ontology (pato) or the Human Phenotype Ontology (HP). MIXS:0000274" [PRIDE:PRIDE]
is_a: BFO:0000019 ! quality
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000694
name: host body temperature
def: "Core body temperature of the host when sample was collected. MIXS:0000874" [PRIDE:PRIDE]
is_a: NCIT:C25206 ! Temperature
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000695
name: host body-mass index
def: "Body mass index, calculated as weight/(height)squared. MIXS:0000317" [PRIDE:PRIDE]
is_a: EFO:0004324 ! body weights and measurements
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000696
name: ethnicity
def: "A category of people who identify with each other, usually on the basis of presumed similarities such as a common language, ancestry, history, society, culture, nation or social treatment within their residing area. MIXS:0000895" [PRIDE:PRIDE]
is_a: BFO:0000019 ! quality
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000697
name: host occupation
def: "Most frequent job performed by subject. MIXS:0000896" [PRIDE:PRIDE]
is_a: NCIT:C19160 ! Occupation or Discipline
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000698
name: medical history performed
def: "Whether full medical history was collected. MIXS:0000897" [PRIDE:PRIDE]
is_a: NCIT:C16205 ! Healthcare activity
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000699
name: host pulse
def: "Resting pulse, measured as beats per minutes. MIXS:0000333" [PRIDE:PRIDE]
is_a: PRIDE:0000674 ! Human gut

[Term]
id: PRIDE:0000700
name: perturbation
def: "Type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with perturbation regimen including how many times the perturbation was repeated, how long each perturbation lasted, and the start and end time of the entire perturbation period; can include multiple perturbation types. MIXS:0000754" [PRIDE:PRIDE]

⚠️ Potential issue

Re-examine “is_a” relationships for data fields and descriptors.
Many newly introduced terms (e.g., sample name, project name, special diet, host subject id, host age, host diet, etc.) are declared with is_a: PRIDE:0000674 ! human gut, is_a: PRIDE:0000828 ! Soil, or is_a: PRIDE:0000829 ! Water. Data fields, disease statuses, or descriptors are not true subtypes of these anatomical or environmental entities. Instead of is_a, consider using relationships better suited for describing context or usage, such as applies_to_environment, collected_in, used_for, or part_of.
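
One way this could look for a descriptor term, sketched with host height from the excerpt above. The relation name applies_to_environment comes from the suggestion in this comment and is hypothetical: it is not assumed to exist in any published relation ontology and would need its own [Typedef] stanza.

```obo
! Hypothetical relation; the name is taken from the review suggestion.
[Typedef]
id: applies_to_environment
name: applies to environment

[Term]
id: PRIDE:0000688
name: host height
def: "The height of subject. MIXS:0000264" [PRIDE:PRIDE]
synonym: "body height" EXACT []
is_a: NCIT:C25347 ! Height
relationship: applies_to_environment PRIDE:0000674 ! Human gut
```

A term shared across packages could repeat the relationship line once per environment (for example PRIDE:0000828 ! Soil and PRIDE:0000829 ! Water) while keeping a single is_a parent that states what the field actually measures.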

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce0a562 and 3064778.

📒 Files selected for processing (1)
  • pride_cv.obo (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: convert_and_validate_owl
🔇 Additional comments (5)
pride_cv.obo (5)

4010-4019: Re-examine the 'is_a' relationships for data fields.

Using "human gut," "soil," or "water" as parents for a data field (sample name) is semantically incorrect. Per the previous reviewer’s feedback, consider relationships like collected_in, applicable_environment, or part_of, instead of is_a.


4020-4029: Re-examine the 'is_a' relationships for data fields.

Using "human gut," "soil," or "water" as parents for a data field (project name) is semantically incorrect. Consider modeling these relationships with a property more fitting than is_a.


4030-4036: Avoid using 'human gut' as a parent for disease terms.

Diseases (e.g., gastrointestinal tract disorder) should not inherit from an anatomical entity. Consider alternative properties such as occurs_in or associated_with, rather than is_a.


4038-4044: Avoid using 'human gut' as a parent for disease terms.

Diseases (e.g., liver disorder) should not be declared as subtypes of an anatomical or environmental entity. Use a more appropriate relationship than is_a.


4121-4122: Correct spelling of “Mediterranean.”

Please fix the typographical error in the definition.

- def: "Type of diet depending on the host, for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include..."
+ def: "Type of diet depending on the host, for animals omnivore, herbivore etc., for humans high-fat, Mediterranean etc.; can include..."

@ypriverol ypriverol linked an issue Feb 28, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

MIXS package terms - metaproteomics
3 participants