This spaCy project uses the Healthsea dataset to compare the performance of the Spancat and NER architectures.

The `project.yml` defines the data assets required by the
project, as well as the available commands and workflows. For details, see the
spaCy projects documentation.

The following commands are defined by the project. They
can be executed using `spacy project run [name]`.
Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| `preprocess` | Format `.jsonl` annotations into the `.spacy` training format for NER and Spancat |
| `train_ner` | Train an NER model |
| `train_spancat` | Train a Spancat model |
| `evaluate_ner` | Evaluate the trained NER model |
| `evaluate_spancat` | Evaluate the trained Spancat model |
| `evaluate` | Evaluate NER vs. Spancat on the dev dataset and create a detailed performance analysis, which is saved in the `metrics` folder |
| `reset` | Reset the project to its original state and delete all training progress |

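The rule that commands are only re-run when their inputs have changed can be pictured as a checksum comparison. The sketch below only illustrates that idea and is not spaCy's actual implementation; the function names and the shape of the `lock` dictionary are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_checksum(path):
    """Return the MD5 hex digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def snapshot(deps):
    """Map every input path to its current checksum."""
    return {str(dep): file_checksum(dep) for dep in deps}


def needs_rerun(command_name, deps, lock):
    """Decide whether a command should run again.

    `lock` is a hypothetical mapping {command_name: {dep_path: checksum}}
    recorded after the previous successful run.
    """
    previous = lock.get(command_name)
    if previous is None:
        return True  # the command has never been run
    return snapshot(deps) != previous


def record_run(command_name, deps, lock):
    """Store the current checksums so the next check can compare against them."""
    lock[command_name] = snapshot(deps)
```

With this scheme, editing an input file changes its checksum, so `needs_rerun` reports `True` until `record_run` is called again after the next successful run.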
The following workflows are defined by the project. They
can be executed using `spacy project run [name]`
and will run the specified commands in order. Commands are only re-run if their
inputs have changed.

| Workflow | Steps |
| --- | --- |
| `all` | `preprocess` → `train_ner` → `train_spancat` → `evaluate` |
| `ner` | `preprocess` → `train_ner` → `evaluate_ner` |
| `spancat` | `preprocess` → `train_spancat` → `evaluate_spancat` |

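To make the ordering explicit, a workflow can be thought of as a list of individual `spacy project run` calls executed one after another. The following is a minimal sketch of that expansion; the `WORKFLOWS` dictionary is copied from the table above, while `workflow_commands` and `run_workflow` are hypothetical helper names, not part of spaCy.

```python
import subprocess
import sys

# Workflow steps as listed in the table above.
WORKFLOWS = {
    "all": ["preprocess", "train_ner", "train_spancat", "evaluate"],
    "ner": ["preprocess", "train_ner", "evaluate_ner"],
    "spancat": ["preprocess", "train_spancat", "evaluate_spancat"],
}


def workflow_commands(name, python=sys.executable):
    """Return one `spacy project run <step>` invocation per workflow step."""
    return [
        [python, "-m", "spacy", "project", "run", step]
        for step in WORKFLOWS[name]
    ]


def run_workflow(name, project_dir="."):
    """Run the steps of a workflow in order, stopping at the first failure."""
    for cmd in workflow_commands(name):
        subprocess.run(cmd, cwd=project_dir, check=True)
```

In practice, `spacy project run ner` does this for you; calling `run_workflow("ner")` from the project directory would only be useful when the steps need to be driven from a Python script.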
The following assets are defined by the project. They can
be fetched by running `spacy project assets`
in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/annotation.jsonl` | URL | NER annotations exported from Prodigy with 5000 examples and 2 labels |

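As a rough illustration of what converting this asset involves, the sketch below reads Prodigy-style JSONL records and builds one `DocBin` for NER (`doc.ents`) and one for Spancat (`doc.spans`). It is not the project's actual preprocessing script. The helper names, the blank English pipeline, the `"sc"` spans key, and the choice to skip examples that were not accepted are all assumptions of this example.

```python
import json


def read_annotations(lines):
    """Yield (text, spans) pairs from Prodigy-style JSONL lines.

    Each span is a (start_char, end_char, label) tuple. Records whose
    "answer" is not "accept" are skipped (an assumption of this sketch).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("answer", "accept") != "accept":
            continue
        spans = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        yield record["text"], spans


def to_docbins(examples, spans_key="sc"):
    """Build a DocBin for NER and a DocBin for Spancat from parsed examples."""
    # Imported lazily so that read_annotations works without spaCy installed.
    import spacy
    from spacy.tokens import DocBin
    from spacy.util import filter_spans

    nlp = spacy.blank("en")  # assumed language
    ner_db, spancat_db = DocBin(), DocBin()
    for text, spans in examples:
        ner_doc, sc_doc = nlp(text), nlp(text)
        ner_spans = [ner_doc.char_span(s, e, label=label) for s, e, label in spans]
        sc_spans = [sc_doc.char_span(s, e, label=label) for s, e, label in spans]
        # char_span returns None when offsets do not match token boundaries.
        # doc.ents cannot hold overlapping spans, so filter them for NER only.
        ner_doc.ents = filter_spans([sp for sp in ner_spans if sp is not None])
        sc_doc.spans[spans_key] = [sp for sp in sc_spans if sp is not None]
        ner_db.add(ner_doc)
        spancat_db.add(sc_doc)
    return ner_db, spancat_db
```

With spaCy installed, the two results could be written out with `DocBin.to_disk`, for example to hypothetical paths such as `data/ner.spacy` and `data/spancat.spacy`. The main difference between the two formats is that Spancat can keep overlapping spans, while NER cannot.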