v0.3.0
Highlights
Generate-and-Extract Command
This release adds a new command, generate-extract, that composes two operations:
- generate a natural language (NL) description
- parse the NL description using SPIRES
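The composition can be pictured as a plain function pipeline: the output of the generation step becomes the input of the extraction step. This is a minimal sketch only. The names generate_extract, fake_generate and fake_extract are hypothetical, and the two stand-ins replace the real LLM call and the real SPIRES extraction with toy logic so the example runs offline. It does not reflect OntoGPT's internal API.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: the real steps are an LLM call and a SPIRES
# extraction; these signatures are assumptions for illustration only.
Generator = Callable[[str], str]
Extractor = Callable[[str], Dict[str, List[str]]]


def generate_extract(term: str, generate: Generator, extract: Extractor) -> Dict[str, List[str]]:
    """Compose the two steps: term -> NL description -> structured object."""
    description = generate(term)  # step 1: produce a free-text description of the term
    return extract(description)   # step 2: parse that text into schema slots


def fake_generate(term: str) -> str:
    # Hypothetical stand-in for the LLM: returns a fixed, templated description.
    return f"{term} is a cell type. Its subtypes include A and B."


def fake_extract(text: str) -> Dict[str, List[str]]:
    # Hypothetical stand-in for SPIRES: pulls the list after "subtypes include".
    tail = text.split("subtypes include", 1)[1].strip(" .")
    parts = tail.replace(" and ", ",").split(",")
    return {"subtypes": [p.strip() for p in parts if p.strip()]}


print(generate_extract("Interneuron", fake_generate, fake_extract))
```

Keeping the two steps as separate callables mirrors the release's framing: either half can be swapped (a different model, a different schema) without touching the other.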
Cell Type Use Case
(This use case is based on a conversation with @dosumis.)
For example, given a cell type such as Acinar Cell Of Salivary Gland, generate a description using GPT that covers many aspects of the cell type, from its marker genes through to its function and the diseases it is implicated in.
Then use the cell-type schema (https://w3id.org/ontogpt/cell_type) to extract this description into structured form. As an optional next step, use linkml-owl to generate OWL TBox axioms.
Iterative generate-extract
The command can also be run in iterative mode. Each iteration traverses the extracted subtypes, gradually building up an ontology that is generated entirely from the "latent knowledge" in the LLM.
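The iterative mode can be sketched as a queue-based expansion: every subtype extracted for one term is fed back in as the next term to describe. This is a minimal sketch under stated assumptions, not OntoGPT's implementation. The function iterate_generate_extract, the "subtypes" slot name, the breadth-first order and the max_iterations limit are all hypothetical choices for illustration, and a toy lookup table stands in for the LLM plus SPIRES. As in the release, no record is kept of concepts already generated.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical signature: term -> extracted object with a "subtypes" slot.
GenerateExtract = Callable[[str], Dict[str, List[str]]]


def iterate_generate_extract(
    root: str, generate_extract: GenerateExtract, max_iterations: int
) -> List[Tuple[str, str]]:
    """Expand from `root`, feeding each extracted subtype back in as a new term.

    Returns (child, parent) is-a edges. Each call is independent: nothing
    records which concepts were already generated, so duplicates can appear.
    """
    edges: List[Tuple[str, str]] = []
    queue: Deque[Tuple[str, int]] = deque([(root, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth >= max_iterations:  # stop expanding beyond the iteration limit
            continue
        result = generate_extract(term)
        for child in result.get("subtypes", []):
            edges.append((child, term))  # child is_a term
            queue.append((child, depth + 1))
    return edges


# Toy lookup table standing in for LLM generation + SPIRES extraction.
TOY = {
    "Interneuron": ["GABAergic interneuron", "cholinergic interneuron"],
    "GABAergic interneuron": ["basket cell"],
}

for edge in iterate_generate_extract("Interneuron", lambda t: {"subtypes": TOY.get(t, [])}, 2):
    print(edge)
```

The depth limit is what guarantees termination in this sketch; without a memory of earlier concepts, a cycle in the generated subtypes would otherwise expand forever.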
Here is a screenshot of an ontology generated entirely using OntoGPT by traversing from "Interneuron" downwards:
The result has many oddities. Each iteration is currently independent, so it has no way of knowing whether it has already made a concept; still, it is an interesting proof of principle. The ugly percent-encoded labels indicate cases where it couldn't match to an existing concept in CL or another ontology, and these may represent KB gaps to be filled.
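Percent-encoding replaces characters such as spaces with a "%" followed by their hexadecimal code, which is why unmatched labels look ugly in the screenshot. The snippet below is a hedged illustration only: it assumes standard RFC 3986 percent-encoding as provided by Python's urllib.parse, since these notes do not describe OntoGPT's exact encoding scheme.

```python
from urllib.parse import quote, unquote

# Assumption: unmatched labels are percent-encoded with standard RFC 3986
# rules; the exact scheme OntoGPT uses is not described in these notes.
label = "Acinar Cell Of Salivary Gland"
encoded = quote(label)
print(encoded)           # spaces become %20
print(unquote(encoded))  # decoding recovers the readable label
```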
More thoughts here: cell type summaries
What's Changed
- Playing around: adding a phenotype extractor by @matentzn in #14
- add unit test to makefile by @cmungall in #16
- Linted and minor flake8 edits by @hrshdhgd in #15
- Add linter to workflow by @hrshdhgd in #17
- Improve dependencies, add a web optional by @vemonet in #21
- add recipe for test by @sierra-moxon in #23
- added pad krapow recipe by @justaddcoffee in #25
- Add recipe URL by @pkalita-lbl in #24
- Add Waldorf Salad URL by @caufieldjh in #26
- Add rajma pulao to recipe-urls.csv by @turbomam in #27
- Adding gene set enrichment by @cmungall in #30
- enrichment by @cmungall in #31
- README updates; add project.Makefile by @caufieldjh in #32
- allow use of different models, entailing different API endpoints. extending enrichment comparison. by @cmungall in #34
- Add CITATION and version updater by @caufieldjh in #35
- Ingest and extract things from literature about inflammatory bowel disease by @justaddcoffee in #36
- eval enrich by @cmungall in #37
- Create dental-restoration-material-composite-polymer-1.txt by @wdduncan in #53
- Create dental-restoration-material-composite-resin-1.txt by @wdduncan in #52
- Create dental-restoration-material-ceramic-composite-1.txt by @wdduncan in #51
- Create dental-restoration-material-ceramic-composite-resin-1.txt by @wdduncan in #50
- Create dental-restoration-material-ceramic-composite-polymer-2.txt by @wdduncan in #49
- Create dental-restoration-material-ceramic-composite-polymer-1.txt by @wdduncan in #48
- Create dental-restoration-material-ceramic-composite-polymer-resin-2.txt by @wdduncan in #47
- Create dental-restoration-material-ceramic-composite-polymer-resin-1.txt by @wdduncan in #46
- Create dental-restoration-material-composite-2.txt by @wdduncan in #45
- Create dental-restoration-material-composite-1.txt by @wdduncan in #44
- Create dental-restoration-material-polymer-1.txt by @wdduncan in #42
- Create dental-restoration-material-resin-2.txt by @wdduncan in #41
- Create dental-restoration-material-ceramic-2.txt by @wdduncan in #39
- Create dental-restoration-material-ceramic-1.txt by @wdduncan in #38
- Create dental-restoration-material-resin-1.txt by @wdduncan in #40
- Create dental-restoration-material-polymer-2.txt by @wdduncan in #43
- similarity by @cmungall in #57
- Add option to provide path to input file by @caufieldjh in #56
- Bicluster enrichment by @realmarcin in #62
- Added command and code for computing Euclidean distances between embeddings by @justaddcoffee in #58
- Flake8 fixes + lint by @hrshdhgd in #63
- enrichment changes by @cmungall in #65
- Missed parenthesis for random.SystemRandom() by @hrshdhgd in #67
- Change citation updater in Makefile to get_version by @caufieldjh in #68
- Raise FileNotFoundError if filepath for extract is missing by @caufieldjh in #72
- Makefile uses all templates by @caufieldjh in #69
- interactive-mode by @cmungall in #71
- Added command to generate mock clinical notes by @justaddcoffee in #74
- Bump version of oaklib by @cmungall in #73
- msigdb hallmark gene sets by @realmarcin in #78
- use prompts for enrichment by @cmungall in #80
- fixing gene sets and updating analysis by @cmungall in #81
- Adding schema for ontology issues in github. refactor enrichment by @cmungall in #83
- p-value templates with edited end markers to run multiple independent… by @realmarcin in #82
- Add diagnostic_procedure template by @caufieldjh in #29
- Fix for web-ontogpt not working on new install by @caufieldjh in #85
- geneweaver format by @cmungall in #89
- re-ran notebook by @cmungall in #90
- re-ran notebooks for enrichGPT by @cmungall in #95
- Update documentation by @caufieldjh in #92
- Autogenerate docs by @caufieldjh in #98
- Fix for doc generation by @caufieldjh in #100
- Added streamlit app for spindoctor by @cmungall in #101
- Study class as tree root in environment_sample template by @sujaypatil96 in #104
- update sections in README by @sujaypatil96 in #105
- Fixed a bug where the 'skip_annotators' option was being ignored by @daikiad in #108
- first trait commits by @cmungall in #109
- first draft of biotic interaction template by @diatomsRcool in #107
- One more fix for biotic interaction template by @caufieldjh in #113
- Adding a GPT-based reasoner, for evaluation purposes. by @cmungall in #112
- more prompt language and adding ENVTHES ontology by @realmarcin in #118
- Add general framework for specifying models by name and source by @caufieldjh in #99
- Adding a MappingEngine by @cmungall in #121
- removing importlib dependency by @cmungall in #122
- very small typo fix by @PR0CK0 in #124
- Init environmental metadata template by @caufieldjh in #117
- use latest ruamel. Avoids problems like this: monarch-initiative/talisman-paper#4 by @cmungall in #126
- Add a command 'pubmed-annotate' to retrieve PMIDs for a search term, then apply a template to all of them to extract info by @justaddcoffee in #127
- relaxing pinning by @cmungall in #129
- reasoner gpt changes by @cmungall in #128
- Retrieve remote models for local use and pass extract prompt to them by @caufieldjh in #123
- First pass at PhenoEngine by @cmungall in #130
- New PubMed eutil functions by @caufieldjh in #131
- Cleanup and documentation updates by @caufieldjh in #115
- Start of PR for IBD literature project by @justaddcoffee in #120
- Fixes for #139 by @caufieldjh in #142
- Add interface for HuggingFace Hub by @caufieldjh in #145
- Dependency updates by @caufieldjh in #151
- Adding generate-extract command, 158. Add cell type templates #159 by @cmungall in #162
- Removed temporary hack when generating documentation. Adding cell type to docs by @cmungall in #163
- Adding to cell/neuron template by @cmungall in #164
- Fix 157 - make webapp work, expand set of available data models by @caufieldjh in #165
- Add functionality for PubMed Central text retrieval by @caufieldjh in #156
- Improvements to iterative generate-extract. by @cmungall in #166
New Contributors
- @matentzn made their first contribution in #14
- @vemonet made their first contribution in #21
- @sierra-moxon made their first contribution in #23
- @justaddcoffee made their first contribution in #25
- @pkalita-lbl made their first contribution in #24
- @turbomam made their first contribution in #27
- @wdduncan made their first contribution in #53
- @sujaypatil96 made their first contribution in #104
- @daikiad made their first contribution in #108
- @diatomsRcool made their first contribution in #107
- @PR0CK0 made their first contribution in #124
Full Changelog: v0.2.0...v0.3.0