New multi-label synthetic train data biomedBERT #81

paluchasz · 2024-12-16T12:51:13Z

This PR introduces a new, improved Transformer NER model. The model is multi-label and was trained on LLM-annotated data. It supports 18 classes and achieves a mean F1 score of 95.6% on a held-out, LLM-annotated test set. It was trained on over 7,000 documents comprising 295,822 samples in total.

The model is added to the model pack with an example config but is not enabled by default, as it is still experimental and further work should be done to investigate/improve the quality of the data it was trained on.

This model supports a much wider set of 18 classes and achieves a 95.6% F1 score on LLM-generated multi-label data.
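
For readers unfamiliar with the term, here is a minimal sketch of what "multi-label" and "mean F1" could mean in this setting. It is not taken from this PR: the label names are hypothetical (the 18 classes are not listed here), per-class sigmoid thresholding is an assumed decoding scheme, and the mean is assumed to be an unweighted (macro) average over classes.

```python
import math

# Hypothetical label subset; the PR does not list the 18 class names.
LABELS = ["gene", "disease", "drug"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Keep every label whose independent sigmoid probability exceeds the threshold.

    Unlike softmax decoding, a token can receive zero, one, or several labels.
    """
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) > threshold]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over token-level label sets (assumed metric)."""
    scores = []
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# One token can carry two labels at once.
print(decode_multilabel([2.0, 1.5, -3.0]))

gold = [{"gene"}, {"disease", "drug"}]
pred = [{"gene"}, {"disease"}]
print(macro_f1(gold, pred))
```

Under a macro average, a class that is missed entirely pulls the mean down as much as any other class, which is one reason per-class metrics are worth reporting alongside the headline number.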

mariosaenger

LGTM

docs/training_multilabel_ner.rst

paluchasz added 5 commits December 16, 2024 11:29

feat: new syntheic data trained multilabel BERT

d1dbbe5

This model supports a much wider 18 different classes with a 95.6% F1 score on LLM generated multilabel data

feat: add a sample config to use the new multilabel BERT

b180b13

docs: update changelog with new model info

47b8e1f

docs: update changelog with info about previous pr fix

b883d7c

docs: fix typo in docs

948dc2f

paluchasz requested a review from RichJackson December 16, 2024 12:51

mariosaenger approved these changes Dec 17, 2024

RichJackson reviewed Dec 17, 2024

docs/training_multilabel_ner.rst (resolved)

RichJackson approved these changes Dec 17, 2024

paluchasz added 2 commits December 17, 2024 12:40

docs: include detailed model metrics per class

655c068

fix: missing blank line after code block

20a3a7d

paluchasz merged commit 07e54a4 into main Dec 17, 2024
2 checks passed

paluchasz deleted the release_multilabel_synthetic_train_data_bert branch December 17, 2024 15:30

paluchasz changed the title ~~New multi-label synthetic train data bimedBERT~~ New multi-label synthetic train data biomedBERT Dec 17, 2024
