Backend ‐ Categories
Categories provide an information architecture that lets us associate unique tags with plugins on the hub. This makes it easier for hub users to find and access plugins associated with a particular category.
Category data is sourced from portions of the alpha06 version of the EDAM Bioimaging ontology, with some terms remapped based on the hub mappings defined in hub-mapping-alpha06.json.
The category data is generated using the backend/category/edam.py script. To run it, create a new virtual environment, install the dependencies from requirements.txt, and run the script:
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd category
python edam.py
This will generate a new file, backend/category/data/EDAM-BIOIMAGING/alpha06.json, that contains every category mapping on the hub.
Category mappings are manually uploaded to S3 using the path category/<version>.json. There is currently no automation for generating and uploading this file, so you will need to generate the .json file and upload it by hand.
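The manual upload could be scripted along these lines. This is only a sketch: the bucket name is not given here, so it is passed in as a parameter, and category_s3_key and upload_category_mapping are hypothetical helper names rather than functions from the backend. It assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import Path


def category_s3_key(version: str) -> str:
    """Build the S3 object key for a category mapping file."""
    return f"category/{version}.json"


def upload_category_mapping(local_file: Path, bucket: str, version: str) -> str:
    """Upload a generated mapping file to S3 and return the key used.

    Assumption: `bucket` is whatever bucket the hub reads from; it is not
    named in this document.
    """
    # boto3 is imported lazily so the key helper above works without it.
    import boto3

    key = category_s3_key(version)
    boto3.client("s3").upload_file(str(local_file), bucket, key)
    return key
```

The exact value expected for <version> is not spelled out here, so check an existing object in the bucket before choosing it.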
Category data in DynamoDB is populated by fetching the category mapping file that was uploaded to S3 in the previous step. Eventually, as we migrate away from S3, we'll move this process to a data workflow that populates DynamoDB directly. The schema for each row in the table is defined as:
interface CategoryRow {
  name: string
  version_hash: string
  version: string
  formatted_name: string
  dimension: string
  hierarchy: string[]
  label: string
  last_updated_timestamp: number
}
The name column is the hash key and the version_hash column is the range key. The name column is a slugified version of the category key, while formatted_name is the unmodified key with spacing and punctuation. For example, Scanning electron cryomicroscopy becomes scanning-electron-cryomicroscopy.
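As a rough illustration of the slug rule, the sketch below lowercases the key and replaces runs of non-alphanumeric characters with hyphens. The slugify function is a hypothetical stand-in; the backend may use a library or handle edge cases differently, and only the example above is confirmed by this page.

```python
import re


def slugify(category_key: str) -> str:
    """Lowercase the key and join its alphanumeric runs with hyphens.

    Assumption: this approximates the backend's slug rule; it is not the
    backend's actual implementation.
    """
    return re.sub(r"[^a-z0-9]+", "-", category_key.lower()).strip("-")
```

With this rule, slugify("Scanning electron cryomicroscopy") gives the name value shown above, while the original string would be stored as formatted_name.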
The version_hash column is a combination of the category version and an MD5 hash of all the contents of a category entry. This is required to store categories that have multiple entries, such as Scanning electron cryomicroscopy. Looking at the response for https://api.napari-hub.org/categories/Scanning%20electron%20cryomicroscopy, we see that this category has two entries that differ by only one item in the hierarchy list:
[
  {
    "dimension": "Image modality",
    "hierarchy": [
      "Electron microscopy",
      "Cryo electron microscopy",
      "Scanning electron cryomicroscopy"
    ],
    "label": "Electron microscopy"
  },
  {
    "dimension": "Image modality",
    "hierarchy": [
      "Electron microscopy",
      "Scanning electron microscopy",
      "Scanning electron cryomicroscopy"
    ],
    "label": "Electron microscopy"
  }
]
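To see why hashing the entry contents keeps the two rows apart, here is a minimal sketch that derives a range key for each entry above. The serialization (JSON with sorted keys), the ":" separator, and the "alpha06" version string are all assumptions made for illustration; the real version_hash format may differ.

```python
import hashlib
import json


def version_hash(version: str, entry: dict) -> str:
    """Combine a category version with an MD5 digest of the entry contents."""
    # Assumption: sorted keys make the digest independent of key order.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    # Assumption: the separator between version and digest is ":".
    return f"{version}:{digest}"


# The two entries returned for "Scanning electron cryomicroscopy".
entry_a = {
    "dimension": "Image modality",
    "hierarchy": [
        "Electron microscopy",
        "Cryo electron microscopy",
        "Scanning electron cryomicroscopy",
    ],
    "label": "Electron microscopy",
}
entry_b = {
    "dimension": "Image modality",
    "hierarchy": [
        "Electron microscopy",
        "Scanning electron microscopy",
        "Scanning electron cryomicroscopy",
    ],
    "label": "Electron microscopy",
}

hash_a = version_hash("alpha06", entry_a)
hash_b = version_hash("alpha06", entry_b)
```

Both entries share the same name (hash key), but because their hierarchy lists differ, hash_a and hash_b differ, so each entry gets its own row.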
Categories can be accessed using the following APIs:
- https://api.napari-hub.org/categories
- https://api.napari-hub.org/categories/<name>
- https://api.napari-hub.org/categories/<name>/versions/<version>
The APIs are relatively simple: they mostly work by fetching the mappings from DynamoDB and returning the list of category entries for the given <name>.
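A small client for these endpoints might look like the sketch below, using only the Python standard library. category_url and fetch_categories are hypothetical helper names, and the response is assumed to be JSON, as in the example response shown earlier.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.napari-hub.org"


def category_url(name: Optional[str] = None, version: Optional[str] = None) -> str:
    """Build the URL for one of the three category endpoints."""
    url = f"{API_BASE}/categories"
    if name is not None:
        # Percent-encode the name so spaces become %20.
        url += "/" + quote(name, safe="")
        if version is not None:
            url += "/versions/" + quote(version, safe="")
    return url


def fetch_categories(name: Optional[str] = None, version: Optional[str] = None):
    """Fetch and decode the JSON response for the chosen endpoint."""
    with urlopen(category_url(name, version), timeout=10) as response:
        return json.load(response)
```

For example, fetch_categories("Scanning electron cryomicroscopy") requests the same URL used in the version_hash discussion and should return the entries for that category.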