Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a .sample operator #797

Merged
merged 2 commits into from
Aug 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 61 additions & 2 deletions docs/howtos/use-oak-expression-language.rst
Original file line number Diff line number Diff line change
Expand Up @@ -289,6 +289,24 @@ FILTER

The ``.filter`` operator allows you to provide arbitrary python filters.

QUERY
^^^^^

The ``.query`` operator allows you to pass through a query to the underlying store (SPARQL, SQL).

For example, the ``sqlite`` backend uses SQL, so you can pass through SQL:

.. code-block::

runoak -i sqlite:obo:uberon info .query \
"SELECT subject from has_dbxref_statement where value like 'ZFA:%'"

This is equivalent to:

.. code-block::

runoak -i sqlite:obo:uberon info x^ZFA:

NR
^^

Expand All @@ -307,18 +325,59 @@ Example:

runoak -i sqlite:obo:uberon info .mrca//p=i,p .idfile my_terms.txt

RAND
^^^^

Pick a random subset of terms. Parameterized by ``n`` (number of terms).

Definitions for random terms in the Cell Ontology:

.. code-block::

runoak -i sqlite:obo:cl definitions .rand

For 10 random terms

.. code-block::

runoak -i sqlite:obo:cl definitions .rand//n=10

.. note::

The ``.rand`` operator will sample from all terms in the ontology. This
could include terms imported and merged from other ontologies. For
finer-grained control, use the ``.sample`` operator, which allows the
combination of a sample operator with the results of evaluating any
OAK expression.

SAMPLE
^^^^^^

The ``.sample`` operator takes a random sample of terms. It is parameterized by ``n`` (number of terms
in sample).

Definitions for 3 random terms:

.. code-block::

runoak -i sqlite:obo:obi definitions .sample//n=3 i^OBI:

To compare 3 random terms with 3 other random terms:

.. code-block::

runoak -i sqlite:obo:cl similarity .sample//n=3 i^CL: @ .sample//n=3 i^CL:

Others
^^^^^^

* ``.is_obsolete``: all :term:`Obsolete` terms
* ``.non_obsolete``: all non-obsoletes
* ``.dangling``: all :term:`Dangling` terms
* ``.query``: pass through a query to the underlying store (SPARQL, SQL)
* ``.child``: non-transitive version of ``.desc``. Also parameterized by predicate.
* ``.parent``: non-transitive version of ``.anc``. Also parameterized by predicate.
* ``.sib``: all siblings of a term. Also parameterized by predicate.
* ``.all``: all terms
* ``.classes``: all classes
* ``.relations``: all relations
* ``.rand``: random subset of terms. Parameters: ``n`` (number of terms)

28 changes: 18 additions & 10 deletions src/oaklib/query.py
Original file line number Diff line number Diff line change
Expand Up @@ -595,10 +595,20 @@ def chain_results(v):
logging.debug(f"Random: {term}")
params = _parse_params(term)
sample_size = params.get("n", "100")
entities = list(adapter.entities())
sample = [
entities[secrets.randbelow(len(entities))] for x in range(1, int(sample_size))
]
entities = set(adapter.entities())
sample = secrets.SystemRandom().sample(entities, int(sample_size))
chain_results(sample)
elif term.startswith(".sample"):
logging.debug(f"Sampling: {term}")
params = _parse_params(term)
sample_size = params.get("n", "100")
rest = list(query_terms_iterator([query_terms[0]], adapter))
query_terms = query_terms[1:]
try:
sample = secrets.SystemRandom().sample(rest, int(sample_size))
except ValueError as e:
logging.error(f"Error sampling {sample_size} / {len(rest)}: {e}")
raise e
chain_results(sample)
elif term.startswith(".in"):
logging.debug(f"IN: {term}")
Expand Down Expand Up @@ -648,19 +658,17 @@ def chain_results(v):
this_predicates = params.get("predicates", predicates)
rest = list(query_terms_iterator([query_terms[0]], adapter))
query_terms = query_terms[1:]
if isinstance(adapter, OboGraphInterface):
chain_results(adapter.descendants(rest, predicates=this_predicates))
else:
if not isinstance(adapter, OboGraphInterface):
raise NotImplementedError
chain_results(adapter.descendants(rest, predicates=this_predicates))
elif term.startswith(".sub"):
logging.debug(f"Subclasses: {term}")
# graph query: is-a descendants
rest = list(query_terms_iterator([query_terms[0]], adapter))
query_terms = query_terms[1:]
if isinstance(adapter, OboGraphInterface):
chain_results(adapter.descendants(rest, predicates=[IS_A]))
else:
if not isinstance(adapter, OboGraphInterface):
raise NotImplementedError
chain_results(adapter.descendants(rest, predicates=[IS_A]))
elif term.startswith(".child"):
logging.debug(f"Children: {term}")
# graph query: children
Expand Down
Loading