Adding definition validation functionality #738

Merged (4 commits) on Apr 16, 2024

50 changes: 49 additions & 1 deletion docs/howtos/use-llms.rst
@@ -115,9 +115,51 @@ Suggesting Definitions
finger toe \
--style-hints "write definitions in formal genus-differentia form"

Validating Definitions
~~~~~~~~~~~~~~~~~~~~~~

The LLM adapter currently interprets ``validate-definitions`` as comparing each specified definition
against either the abstracts of papers cited in the definition provenance or the database objects
cited as definition provenance.
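
Which of the two comparisons applies depends on what the provenance points to. The sketch below is hypothetical and is not OAK's code. It assumes provenance is available as a list of CURIEs and that the ``PMID``, ``PMCID`` and ``DOI`` prefixes mark papers. Under those assumptions, it separates literature references from other database objects.

```python
# Hypothetical set of prefixes treated as literature references (an assumption).
PUBLICATION_PREFIXES = {"PMID", "PMCID", "DOI"}


def split_provenance(xrefs: list[str]) -> tuple[list[str], list[str]]:
    """Split definition provenance CURIEs into (papers, other objects).

    Papers would be compared against their abstracts; anything else
    (for example a database record) against that object itself.
    """
    papers: list[str] = []
    objects: list[str] = []
    for xref in xrefs:
        # The prefix is everything before the first colon, compared case-insensitively.
        prefix = xref.split(":", 1)[0].upper()
        if prefix in PUBLICATION_PREFIXES:
            papers.append(xref)
        else:
            objects.append(xref)
    return papers, objects
```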

Here is an example of validating definitions for GO terms:

.. code-block:: bash

    runoak --stacktrace -i llm:sqlite:obo:go validate-definitions \
        i^GO: -o out.jsonl -O jsonl

The semsql version of GO has other ontologies merged in, so the ``i^GO:`` query (entities whose
identifiers start with ``GO:``) restricts validation to actual GO terms.
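
A short script can give a quick overview of ``out.jsonl``. This sketch uses only the standard library and has several assumptions. It expects each line to be one JSON-serialized validation result. It also expects a ``severity`` field, a name assumed from the Validation Data Model, so check it against your own output.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_results(lines: Iterable[str]) -> Counter:
    """Count JSONL validation results by severity.

    Assumes one JSON object per line, optionally carrying a "severity" key
    (assumed field name); records without it are counted as "UNSPECIFIED".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("severity", "UNSPECIFIED")] += 1
    return counts
```

For example, ``with open("out.jsonl") as f: print(summarize_results(f).most_common())`` lists severities from most to least frequent.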

You can also pass in a configuration object, which should conform to the
`Validation Data Model <https://w3id.org/oak/validation-datamodel>`_.

For example, the following configuration YAML adds specific instructions to the prompt, along
with the URL of documentation aimed at ontology developers.

.. code-block:: yaml

    prompt_info: Please also use the following GO guidelines
    documentation_objects:
      - https://wiki.geneontology.org/Guidelines_for_GO_textual_definitions

All specified URLs are downloaded, converted to text, and included in the prompt.
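
The standard library alone is enough to picture the download-and-convert step. This is a hypothetical sketch, not OAK's implementation. The helper names are made up, and simple tag stripping with ``html.parser`` stands in for whatever conversion OAK actually performs.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to space-joined visible text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def build_prompt_context(prompt_info: str, urls: list[str]) -> str:
    """Hypothetical helper: append the text of each URL to prompt_info."""
    sections = [prompt_info]
    for url in urls:
        with urlopen(url) as response:  # requires network access
            html = response.read().decode("utf-8", errors="replace")
        sections.append(html_to_text(html))
    return "\n\n".join(sections)
```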

The configuration YAML is passed in with ``-C``:

.. code-block:: bash

    runoak --stacktrace -i llm:{claude-3-opus}:sqlite:obo:go validate-definitions \
        -C src/oaklib/conf/go-definition-validation-llm-config.yaml i^GO: -O yaml
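
A helper can write the YAML for you if you generate configurations for several ontologies. This is a hypothetical sketch that covers only the two keys shown above (``prompt_info`` and ``documentation_objects``). It does not handle any other fields of the Validation Data Model. String values are written with ``json.dumps``, because YAML 1.2 is designed as a superset of JSON and a JSON string is a valid YAML double-quoted scalar.

```python
import json


def render_validation_config(prompt_info: str, documentation_objects: list[str]) -> str:
    """Render a validation config with prompt_info and documentation_objects
    as YAML text, without needing a YAML library (hypothetical helper)."""
    lines = [f"prompt_info: {json.dumps(prompt_info)}"]
    if documentation_objects:
        lines.append("documentation_objects:")
        # One YAML sequence item per documentation URL.
        lines.extend(f"  - {json.dumps(url)}" for url in documentation_objects)
    return "\n".join(lines) + "\n"
```

Write the result to a file, for example with ``pathlib.Path("my-config.yaml").write_text(...)``, and pass that file with ``-C``.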

Validating Mappings
~~~~~~~~~~~~~~~~~~~

The LLM adapter validates mappings by looking up information about the mapped entity and
comparing it with the main entity.
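
This hypothetical sketch illustrates the idea of looking up both sides of a mapping. It gathers what is known about each entity into a single comparison prompt. The ``EntityInfo`` record, the function names and the prompt wording are all assumptions for illustration, not the prompt OAK actually sends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityInfo:
    """Hypothetical record of what is looked up for one side of a mapping."""
    curie: str
    label: Optional[str] = None
    definition: Optional[str] = None


def describe(entity: EntityInfo) -> str:
    """Join the identifier with whatever label/definition is available."""
    parts = [entity.curie]
    if entity.label:
        parts.append(f"label: {entity.label}")
    if entity.definition:
        parts.append(f"definition: {entity.definition}")
    return "; ".join(parts)


def mapping_prompt(subject: EntityInfo, mapped: EntityInfo, predicate: str) -> str:
    """Put both entities side by side so they can be compared."""
    return (
        f"Main entity: {describe(subject)}\n"
        f"Mapped entity: {describe(mapped)}\n"
        f"Asserted mapping predicate: {predicate}\n"
        "Is this mapping correct?"
    )
```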

.. code-block:: bash

    runoak --stacktrace -i llm:{gpt-4}:sqlite:obo:go validate-mappings \

@@ -165,8 +207,14 @@ as a developer, then you can do this:

This will install the plugin in the same environment as OAK.

If you need to update the plugin:

.. code-block:: bash

    cd ontology-access-kit
    poetry run llm install -U llm-gemini
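
The plugin has to live in the same environment as OAK, so it can help to check what that environment actually sees. This is a small standard-library sketch using ``importlib.metadata``, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is
    absent from the environment this interpreter runs in."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

For example, run it with ``poetry run python`` inside ``ontology-access-kit`` and call ``installed_version("llm-gemini")``. A result of ``None`` means the plugin is not installed in that environment.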

TODO: instructions for non-developers.

Mixtral via Ollama and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

28 changes: 24 additions & 4 deletions docs/packages/interfaces/validator.rst
@@ -3,13 +3,24 @@
Validator Interface
--------------------

The Validator Interface provides access to a number of different validation operations over ontologies.

The notion of validation in OAK is intentionally very flexible, and may encompass:

* *Schema* validation, for example, checking that definitions are strings and have 0..1 cardinality.
* *Logical* validation, using a reasoner.
* *Lexical* validation, for example, ensuring there are no spelling errors.
* *Stylistic* validation, against a style guide.
* *Content* validation, checking the content of the ontology against domain knowledge or other ontologies.

Different adapters may implement different portions of this.
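
Here is a concrete example of a *stylistic* check. Many ontology definition guidelines say a definition should not be circular, meaning it should not reuse the term's own label. The following is a minimal sketch of that single rule, not an OAK validator, and the function name is hypothetical.

```python
import re


def is_circular(label: str, definition: str) -> bool:
    """Return True if the definition repeats the term's own label as a
    standalone phrase (case-insensitive)."""
    label = label.strip()
    if not label:
        return False  # nothing to compare against
    # Lookarounds instead of \b so labels ending in punctuation still work.
    pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None
```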

Schema Validation
~~~~~~~~~~~~~~~~~

The core validate method is configured using a *metadata schema*. The default schema is:

- `Ontology Metadata <https://w3id.org/oak/ontology-metadata>`_

This is specified using LinkML, which provides an expressive way to state constraints on metadata elements,
such as :ref:`AnnotationProperty` assertions in ontologies. For example, this schema states that definition
@@ -19,6 +30,15 @@ Different projects may wish to configure this - it is possible to pass in a diff

For more details, see `this howto guide <https://incatools.github.io/ontology-access-kit/howtos/validate-an-obo-ontology>`_.
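
The plain-Python sketch below makes the "string-valued, 0..1 cardinality" constraint on definitions concrete. It does not show how OAK's validate method works, because that method is driven by the LinkML metadata schema. The function name and the ``{entity: {property: [values]}}`` data layout are hypothetical.

```python
from typing import Any


def check_definition_cardinality(
    annotations: dict[str, dict[str, list[Any]]],
    prop: str = "definition",
) -> list[tuple[str, str]]:
    """Return (entity, problem) pairs for entities whose `prop` values
    violate a 0..1, string-valued constraint."""
    problems: list[tuple[str, str]] = []
    for entity, props in annotations.items():
        values = props.get(prop, [])  # zero values is allowed
        if len(values) > 1:
            problems.append((entity, f"{len(values)} values for {prop}; max is 1"))
        for value in values:
            if not isinstance(value, str):
                problems.append((entity, f"non-string {prop}: {value!r}"))
    return problems
```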

.. warning::

    Currently only implemented for :ref:`sql_implementation`.

LLM-based validation
~~~~~~~~~~~~~~~~~~~~

See :ref:`use_llms`


.. currentmodule:: oaklib.interfaces.validator_interface

3 changes: 2 additions & 1 deletion docs/packages/utilities.rst
@@ -17,5 +17,6 @@ being turned into :ref:`interfaces`.
lexical.lexical_indexer
subsets.slimmer_utils
apikey_manager
taxon.taxon_constraint_utils
table_filler

681 changes: 681 additions & 0 deletions notebooks/Commands/ValidateDefinitions.ipynb


2 changes: 1 addition & 1 deletion notebooks/Commands/ValidateMappings.ipynb
@@ -5,7 +5,7 @@
"id": "0a28b88d-4deb-4d0a-a110-f27adf077e23",
"metadata": {},
"source": [
"# OAK validate-mappings command\n",
"\n",
"This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n",
"\n",

1 change: 1 addition & 0 deletions notebooks/Commands/input/validate-definition-conf.yaml
@@ -0,0 +1 @@
lookup_references: true