Merge branch 'main' into replay-obo-compliance-suite
matentzn committed Mar 10, 2024
2 parents 8507fbe + 564f06f commit 7e10193
Showing 78 changed files with 3,818 additions and 1,949 deletions.
189 changes: 189 additions & 0 deletions docs/howtos/use-llms.rst
How to use Large Language Models (LLMs) with OAK
================================================

Large Language Models (LLMs) such as ChatGPT have powerful text pattern matching and processing abilities,
and general question answering capabilities. LLMs can be used in conjunction with ontologies for a number
of tasks, including:

- Summarizing lists of ontology terms and other OAK outputs
- Annotating text using ontology terms
- Reviewing ontology branches or different kinds of ontology axioms

This guide is in three sections:

- Summary of ontology-LLM tools that directly leverage OAK, but are not part of the OAK framework
- How to use OAK in conjunction with existing generic LLM tools
- The OAK LLM implementation

LLM frameworks that use OAK
---------------------------

OntoGPT
~~~~~~~

`OntoGPT <https://github.com/monarch-initiative/ontogpt>`_ extracts knowledge from text according
to a LinkML schema and LinkML dynamic value set specifications. OAK is used for grounding ontology terms.

CurateGPT
~~~~~~~~~

`CurateGPT <https://github.com/monarch-initiative/curate-gpt>`_ is a general purpose knowledge management
and editing tool that uses LLMs for enhanced search and autosuggestions.

Talisman
~~~~~~~~

`Talisman <https://github.com/monarch-initiative/talisman>`_ provides an LLM analog of the
OAK ``enrichment`` command. It summarizes collections of terms or descriptions of genes.

Using OAK in conjunction with existing LLM tools
------------------------------------------------

LLMs such as ChatGPT can take any kind of textual output, including outputs of OAK.

For example, you could query all T-cell types:

.. code-block:: bash

    runoak -i sqlite:obo:cl labels .descendant//p=i "T cell"

Then copy the results into the ChatGPT window and ask "give me detailed descriptions of these T-cell types".

This kind of workflow is not very automatable. OAK is designed in part for the command line, so
it pairs naturally with LLM CLI tools such as the Datasette ``llm`` tool:

.. code-block:: bash

    pipx install llm
    runoak -i sqlite:obo:cl labels .descendant//p=i "T cell" | llm --system "summarize the following terms"

OAK LLM Adapter
---------------

OAK provides a number of different adapters (implementations) for each of its interfaces.
Some adapters provide direct access to an ontology or collection of ontologies; others act as *wrappers*
around another adapter, injecting additional functionality.

The OAK LLM adapter is one such adapter. It provides a number of implementations of a subset of OAK
commands and interfaces.

See :ref:`llm_implementation` for details on the OAK LLM adapter.
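The wrapper pattern described above can be pictured with a toy sketch. Note that all class and
method names below are hypothetical and purely illustrative, not the actual OAK implementation;
the sketch only shows the shape of the idea (an outer adapter delegating to an inner one while
injecting extra behaviour):

```python
# Toy illustration of the wrapper-adapter pattern.
# All names here are hypothetical; this is NOT the real OAK code.


class DictAdapter:
    """A trivial 'inner' adapter backed by a dict of CURIE -> label."""

    def __init__(self, labels):
        self._labels = labels

    def label(self, curie):
        return self._labels.get(curie)


class UpperCasingWrapper:
    """A 'wrapper' adapter: same interface, with extra behaviour injected."""

    def __init__(self, inner):
        self.inner = inner

    def label(self, curie):
        base = self.inner.label(curie)
        # Injected functionality: post-process the inner adapter's answer.
        return base.upper() if base else base


inner = DictAdapter({"CL:0000084": "T cell"})
wrapped = UpperCasingWrapper(inner)
print(wrapped.label("CL:0000084"))  # the wrapper transforms the inner result
```

In the real LLM adapter, the injected functionality is a call out to a language model rather
than a string transformation, but the delegation structure is the same.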

The basic idea is that you can prefix any existing adapter selector with ``llm:``; for example:

.. code-block:: bash

    runoak -i llm:my-ont.json ...

You can specify the model you wish to use within curly braces (``{}``); for example:

.. code-block:: bash

    runoak -i llm:{litellm-groq-mixtral}:sqlite:obo:cl ...

We recommend the LiteLLM package, which allows access to a broad range of models through a proxy.
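To make the selector syntax concrete, here is a simplified sketch of how an
``llm:{model}:inner-selector`` string decomposes into a model name and a wrapped selector.
This is an illustrative re-implementation for explanation only, not OAK's actual parsing code:

```python
import re


def parse_llm_selector(descriptor: str):
    """Split an llm: selector into (model, inner_selector).

    Illustrative only -- not OAK's real parsing logic.
    Shapes handled:
      llm:sqlite:obo:cl          -> (None, "sqlite:obo:cl")   # default model
      llm:{gpt-4}:sqlite:obo:cl  -> ("gpt-4", "sqlite:obo:cl")
    """
    m = re.match(r"llm:(?:\{(?P<model>[^}]+)\}:)?(?P<inner>.+)", descriptor)
    if not m:
        raise ValueError(f"not an llm: selector: {descriptor}")
    return m.group("model"), m.group("inner")


print(parse_llm_selector("llm:{litellm-groq-mixtral}:sqlite:obo:cl"))
```

The inner selector (here ``sqlite:obo:cl``) is handed off to the usual adapter machinery, so any
selector that works on its own also works behind the ``llm:`` prefix.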

Examples are provided here on the command line, but the same can be done programmatically:

.. code-block:: python

    from oaklib import get_adapter

    adapter = get_adapter("llm:sqlite:obo:cl")

Note that the output of LLMs is non-deterministic and unpredictable, so the LLM adapter should
not be used for tasks where precision is required.

Annotation
~~~~~~~~~~

.. code-block:: bash

    runoak -i llm:sqlite:obo:hp annotate "abnormalities were found in the eye and the liver"

Suggesting Definitions
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    runoak -i llm:sqlite:obo:uberon generate-definitions \
      finger toe \
      --style-hints "write definitions in formal genus-differentia form"

Validating Mappings
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash
runoak --stacktrace -i llm:{gpt-4}:sqlite:obo:go validate-mappings \
.desc//p=i molecular_function -o out.jsonl -O jsonl
Selecting alternative models
----------------------------

If you are using the :ref:`llm_implementation` then by default it will use a model such
as ``gpt-4`` or ``gpt-4-turbo`` (this may change in the future).

You can specify different models by using the ``{}`` syntax:

.. code-block:: bash

    runoak -i llm:{gpt-3.5-turbo}:sqlite:obo:cl generate-definitions .descendant//p=i "T cell"

We use the `Datasette LLM package <https://llm.datasette.io/en/stable/>`_, which provides a *plugin*
mechanism for adding new models. See the `plugin index <https://llm.datasette.io/en/stable/plugins/index.html>`_.

However, ``llm`` can sometimes be slow to add new models, so it can be useful to use the
`LiteLLM <https://github.com/BerriAI/litellm/>`_ package, which provides a proxy to a wide range of models.

Mixtral via Ollama and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First pull and run the model with Ollama:

.. code-block:: bash

    ollama run mixtral

Then start LiteLLM as a local proxy:

.. code-block:: bash

    pipx install litellm
    litellm -m ollama/mixtral

Next edit your ``extra-openai-models.yaml`` as detailed in the llm
`other model docs <https://llm.datasette.io/en/stable/other-models.html>`_:

.. code-block:: yaml

    - model_name: ollama/mixtral
      model_id: litellm-mixtral
      api_base: "http://0.0.0.0:8000"

Then you can use the model in OAK:

.. code-block:: bash

    runoak -i llm:{litellm-mixtral}:sqlite:obo:cl generate-definitions .descendant//p=i "T cell"

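Behind the scenes, the LiteLLM proxy configured above exposes an OpenAI-compatible
chat-completions endpoint at the ``api_base``. As an illustrative sketch (the payload below is
hypothetical and is only constructed, never sent), a client talking to that proxy would POST a
JSON body along these lines:

```python
import json

# Illustrative only: the shape of an OpenAI-style chat-completions request
# that a client would POST to the LiteLLM proxy (e.g. http://0.0.0.0:8000).
# The prompt text here is a made-up example, not OAK's actual prompt.
payload = {
    "model": "litellm-mixtral",  # the model_id from extra-openai-models.yaml
    "messages": [
        {"role": "system", "content": "You are an ontology curation assistant."},
        {"role": "user", "content": "Write a definition for the term 'T cell'."},
    ],
}

print(json.dumps(payload, indent=2))
```

Because the proxy speaks this common protocol, the same configuration works whether the model
runs locally (Ollama) or remotely (groq, below).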
Mixtral via groq and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`groq <https://groq.com/>`_ provides an API over souped-up hardware running Llama2 and Mixtral.
You can configure it in a similar way to Ollama above, but here we are proxying to a remote server:

.. code-block:: bash

    pipx install litellm
    litellm -m groq/mixtral-8x7b-32768

Next edit your ``extra-openai-models.yaml`` as detailed in the llm
`other model docs <https://llm.datasette.io/en/stable/other-models.html>`_:

.. code-block:: yaml

    - model_name: litellm-groq-mixtral
      model_id: litellm-groq-mixtral
      api_base: "http://0.0.0.0:8000"

Then you can use the model in OAK:

.. code-block:: bash

    runoak -i llm:{litellm-groq-mixtral}:sqlite:obo:cl validate-mappings .descendant//p=i "T cell"

1 change: 1 addition & 0 deletions docs/packages/implementations/index.rst
Implementations (also known as *adapters*) implement one or more :ref:`interfaces`
ols
gilda
aggregator
llm
25 changes: 25 additions & 0 deletions docs/packages/implementations/llm.rst
.. _llm_implementation:

LLM Adapter
=============

Command Line Examples
----------------------

Use the :code:`llm` selector, wrapping an existing source:

.. code:: shell

    runoak -i llm:sqlite:obo:cl COMMAND [COMMAND-ARGUMENTS-AND-OPTIONS]

Annotation
^^^^^^^^^^

.. code:: shell

    runoak -i llm:sqlite:obo:hp annotate "abnormalities were found in the eye and the liver"

Code
----

.. currentmodule:: oaklib.implementations.llm

.. autoclass:: LLMImplementation