minor doc updates
fschlatt committed Nov 15, 2024
1 parent 4a02a30 commit 0c86c8f
Showing 2 changed files with 9 additions and 6 deletions.
14 changes: 8 additions & 6 deletions docs/howto/dataset.rst
.. _howto-dataset:

.. _ir_datasets: https://ir-datasets.com/

====================
Use a Custom Dataset
====================

Lightning IR currently supports all datasets registered with the `ir_datasets`_ library. However, it is also possible to use custom datasets with Lightning IR. `ir_datasets`_ supports five different data types:

- Documents (a collection of documents)
- Queries (a collection of queries)
- Qrels (a collection of relevance judgements for query-document pairs)
- Training n-tuples (a collection of n-tuples consisting of a query and n-1 documents used for training)
- Run Files (a collection of queries and ranked documents)

Depending on your use case, you may need to integrate one or more of these data types. In the following, we will show you how to locally register datasets with `ir_datasets`_ for easy use in Lightning IR. First, however, we will demonstrate how to integrate custom run files, as these are often generated for datasets already supported by `ir_datasets`_.

Run Files
---------

Integrating your own run files is as simple as providing the run file to the :py:class:`~lightning_ir.data.dataset.RunDataset`. Two types of run files are supported.

1. The first is a standard TREC run file. When using this format, the file name must follow a specific naming convention: it must correspond to the `ir_datasets`_ dataset id that the run file is associated with, with slashes replaced by dashes. For example, for a run file for the TREC Deep Learning 2019 track, whose `ir_datasets`_ dataset id is ``msmarco-passage/trec-dl-2019/judged``, the run file should be named ``msmarco-passage-trec-dl-2019-judged.run``. Optionally, to discern between different run files, you can prefix the file name with meta information surrounded by two underscores, e.g., ``__my-cool-model__msmarco-passage-trec-dl-2019-judged.run``.
2. The second format is a ``.jsonl`` file that provides not only the ``query_id``, ``doc_id``, and ``score``, but also the actual query and document texts. This format is useful when you want to re-rank a run file but do not want to register the dataset with `ir_datasets`_. The file can optionally contain relevance judgements for easy evaluation.

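As an illustration, a ``.jsonl`` run file might contain lines like the following. The ``query_id``, ``doc_id``, and ``score`` fields follow the description above; the remaining field names and all values are hypothetical placeholders, not the exact schema expected by Lightning IR:

.. code-block:: json

   {"query_id": "q1", "query": "what is a custom dataset", "doc_id": "d7", "text": "A custom dataset is ...", "score": 14.7, "qrel": 1}
   {"query_id": "q1", "query": "what is a custom dataset", "doc_id": "d3", "text": "Datasets can be registered ...", "score": 12.3, "qrel": 0}
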
Registering a Local Dataset
---------------------------

To integrate a custom dataset, it must first be registered locally with `ir_datasets`_. Lightning IR provides a :py:class:`~lightning_ir.lightning_utils.callbacks.RegisterLocalDatasetCallback` class to make registering datasets easy. The callback takes a dataset id and optional paths to local files or already valid `ir_datasets`_ dataset ids.

Let's look at an example. Say we want to register a new set of training triples for the MS MARCO passage dataset, stored in a tab-separated file named ``msmarco-passage-train-triples.tsv``.

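The original example is truncated here, so as a minimal sketch, assume each line of the triples file holds a query id, a relevant document id, and a non-relevant document id separated by tabs (this three-column layout is an assumption, not the verified format). Such a file can be produced and parsed with Python's standard library:

```python
import csv
import io

# Hypothetical triples: (query_id, relevant doc_id, non-relevant doc_id).
# The three-column tab-separated layout is an assumption for illustration.
triples = [
    ("1030303", "8726436", "8726433"),
    ("1030304", "8726440", "8726441"),
]

# Write the triples to a tab-separated buffer; writing to
# msmarco-passage-train-triples.tsv works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerows(triples)

# Read the triples back to verify the round trip.
buffer.seek(0)
parsed = [tuple(row) for row in csv.reader(buffer, delimiter="\t")]
print(parsed == triples)  # True
```
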
1 change: 1 addition & 0 deletions docs/model-zoo.rst
The following command and configuration can be used to reproduce the results:
.. code-block:: yaml

   trainer:
     logger: false
     enable_checkpointing: false
   model:
     class_path: CrossEncoderModule # for cross-encoders
     # class_path: BiEncoderModule # for bi-encoders
