🪐 Weasel Project: Prodigy annotation tool integration

This project shows how to integrate the Prodigy annotation tool (requires v1.11+) into your spaCy project template to automatically export annotations you've created and train your model on the collected data. Note that in order to run this template, you'll need to install Prodigy separately into your environment. For details on how the data was created, check out this project template and blog post.

⚠️ Important note: The example in this project uses a separate step, db-in, to import the example annotations into your database, so you can easily run it end-to-end. In your own workflows, you can leave this step out and access the dataset you've annotated directly.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the Weasel documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using weasel run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| `db-in` | Load data into Prodigy (only for example purposes) |
| `data-to-spacy` | Merge your annotations and create data in spaCy's binary format |
| `train_spacy` | Train a named entity recognition model with spaCy |
| `train_prodigy` | Train a named entity recognition model with Prodigy |
| `train_curve` | Train the model with Prodigy by using different portions of training examples to evaluate if more annotations can potentially improve the performance |
| `package` | Package the trained model so it can be installed |
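
The rule that commands are only re-run if their inputs have changed can be illustrated with a content checksum. The following is a minimal sketch of that general idea, not Weasel's actual implementation: the helper names `checksum` and `needs_rerun` and the file `train.jsonl` are hypothetical, and the real tool's bookkeeping may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(paths):
    """Return one MD5 hex digest over the contents of all given files."""
    digest = hashlib.md5()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_rerun(inputs, previous_checksum):
    """A command is stale when its inputs hash differently than last time."""
    return checksum(inputs) != previous_checksum


# Demo with a throwaway file standing in for a command input (hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.jsonl"
    data.write_text('{"text": "first version"}\n', encoding="utf8")
    recorded = checksum([data])  # what would be stored after the first run
    unchanged = needs_rerun([data], recorded)
    data.write_text('{"text": "edited"}\n', encoding="utf8")
    changed = needs_rerun([data], recorded)

print(unchanged, changed)  # prints: False True
```

In this sketch a command is skipped while its input files are byte-identical to the last recorded run, and becomes eligible to run again as soon as any input changes.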

⏭ Workflows

The following workflows are defined by the project. They can be executed using weasel run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| `all` | `db-in` → `data-to-spacy` → `train_spacy` |
| `all_prodigy` | `db-in` → `train_prodigy` |
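
A workflow is an ordered list of commands. As a rough sketch, the two workflows listed here can be expanded into the individual `weasel run` invocations they correspond to. The `WORKFLOWS` dictionary and the `expand` helper are illustrative names, not part of Weasel, and in practice `weasel run all` handles this for you.

```python
# Workflow definitions copied from the workflow table in this README.
WORKFLOWS = {
    "all": ["db-in", "data-to-spacy", "train_spacy"],
    "all_prodigy": ["db-in", "train_prodigy"],
}


def expand(workflow):
    """Return the `weasel run` argument list for each step, in order."""
    return [["weasel", "run", step] for step in WORKFLOWS[workflow]]


for argv in expand("all_prodigy"):
    print(" ".join(argv))
# prints:
# weasel run db-in
# weasel run train_prodigy
```

Each argument list could be passed to `subprocess.run(argv, check=True)` from the project directory, assuming the `weasel` executable is on your `PATH`.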

🗂 Assets

The following assets are defined by the project. They can be fetched by running weasel assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/fashion_brands_training.jsonl.jsonl` | Local | JSONL-formatted training data exported from Prodigy, annotated with `FASHION_BRAND` entities (1235 examples) |
| `assets/fashion_brands_eval.jsonl.jsonl` | Local | JSONL-formatted development data exported from Prodigy, annotated with `FASHION_BRAND` entities (500 examples) |
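
To sanity-check an asset before training, you could count the entity spans it contains. The sketch below assumes each JSONL line is a Prodigy-style record with a `text` field, a `spans` list whose items carry a `label`, and an optional `answer` field. The `count_labels` helper and the two sample records are made up for illustration and are not taken from the actual asset files.

```python
import io
import json
from collections import Counter


def count_labels(lines):
    """Count span labels across accepted JSONL records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: records without an "answer" key count as accepted.
        if record.get("answer", "accept") != "accept":
            continue
        for span in record.get("spans", []):
            counts[span["label"]] += 1
    return counts


# Two made-up records standing in for a real asset file.
sample = io.StringIO(
    '{"text": "I like ACME shoes", "spans": '
    '[{"start": 7, "end": 11, "label": "FASHION_BRAND"}], "answer": "accept"}\n'
    '{"text": "Nothing to tag here", "spans": [], "answer": "accept"}\n'
)
counts = count_labels(sample)
print(counts["FASHION_BRAND"])  # prints: 1
```

For a real file, pass an open file handle for one of the assets listed above instead of the in-memory sample.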