SIA is an annotation service for the BioCreative V.5 BeCalm TIPS task. Mutation mentions are annotated using SETH, miRNA mentions using mirNer, and diseases using a dictionary lookup. Results are returned in JSON according to these definitions.
To cite SIA, please use the following reference:
@Article{Kirschnick2018,
title = {{SIA:} a scalable interoperable annotation server for biomedical named entities},
author = {Johannes Kirschnick and Philippe Thomas and Roland Roller and Leonhard Hennig},
journal = {Journal of Cheminformatics},
volume = {10},
number = {1},
pages = {63:1--63:7},
year = {2018},
month = {Dec},
url = {https://doi.org/10.1186/s13321-018-0319-2},
doi = {10.1186/s13321-018-0319-2}
}
A PDF version of the paper is freely available here.
The system uses RabbitMQ for load balancing, so make sure it is running locally before starting the application; refer to the RabbitMQ installation guide for help.
For convenience, if you want to skip the RabbitMQ installation, you can start it via Maven instead (this might not work on every machine):
./mvnw rabbitmq:start
The management interface is available at http://localhost:15672/ (default login: guest/guest).
To tear down RabbitMQ afterwards, issue:
./mvnw rabbitmq:stop
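Before starting the application, it can help to confirm that the broker is actually reachable. The following is a minimal sketch using only the JDK; it assumes RabbitMQ listens on its default AMQP port 5672 on localhost (the management interface on 15672 is a separate port). The class RabbitMqCheck is hypothetical and not part of SIA.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of SIA: checks whether a TCP port accepts connections.
public class RabbitMqCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: RabbitMQ uses its default AMQP port 5672.
        boolean up = isReachable("localhost", 5672, 2000);
        System.out.println(up ? "RabbitMQ port 5672 is open" : "RabbitMQ does not appear to be running");
    }
}
```

An open port only shows that something is listening; the management interface remains the better place to check the broker's actual state.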
To start the system in development mode, issue:
./mvnw spring-boot:run
This starts the backend without submitting results to the TIPS server; instead, results are printed to the console.
By default, the server listens on port 8080.
Issue the following curl request to trigger a new annotation request with a sample payload:
curl -vX POST http://localhost:8080/call -d @src/test/resources/samplepayloadGetannotations.json --header "Content-Type: application/json"
Then watch the console for the results.
To request a status report, use the following curl request:
curl -vX POST http://localhost:8080/call -d @src/test/resources/sampleplayloadGetStatus.json --header "Content-Type: application/json"
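If you prefer to call the endpoint from Java rather than curl, the sketch below sends a payload file to /call with the same Content-Type header. It assumes Java 11+ (for java.net.http) and the default port 8080; the class and method names are hypothetical, and the payload paths are the sample files used in the curl examples.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical Java equivalent of the curl calls (requires Java 11+ for java.net.http).
public class SiaClient {

    // Builds a POST request that sends the given JSON file to the /call endpoint.
    static HttpRequest buildCallRequest(String baseUrl, Path payload) throws IOException {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/call"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofFile(payload))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Default payload mirrors the first curl example; pass another path as the first argument.
        Path payload = Path.of(args.length > 0 ? args[0]
                : "src/test/resources/samplepayloadGetannotations.json");
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildCallRequest("http://localhost:8080", payload),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Passing src/test/resources/sampleplayloadGetStatus.json as the argument corresponds to the status request.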
To extend SIA with additional named entity recognition (NER) tools, you have to:
- Implement the Annotator interface.
Consult the examples in the corresponding package for implementation details. Afterwards, define the input channel so that messages are routed correctly. Input channels can be named freely, but we recommend using the name of the annotator. For example:
@Transformer(inputChannel = "yourAnnotator")
Placed on the annotator, this annotation specifies that its inputs come from the yourAnnotator channel. Internally, channels are mapped to queues automatically.
- Add your annotator as a recipient in FlowHandler and define the set of PredictionType values your annotator responds to.
For example:
.recipientMessageSelector("yourAnnotator", message -> headerContains(message, CHEMICAL) && enabledAnnotators.yourAnnotator)
Here, yourAnnotator has to match the inputChannel of the @Transformer definition. The selector specifies that all requests that need to be tagged with CHEMICAL are sent to the yourAnnotator channel, provided the annotator is enabled.
headerContains(message, CHEMICAL) is a helper method that checks whether the header field called types contains the enum value CHEMICAL.
The header is populated automatically from the request message with the requested annotation types.
- Furthermore, enabledAnnotators is an injected configuration bean that specifies which annotators are enabled.
Add a new boolean property named yourAnnotator to this class to control whether your annotator is enabled; check application.properties for the existing settings.
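The three steps above can be illustrated with a plain-Java sketch that needs no Spring on the classpath. Every type in it (Annotator, Message, EnabledAnnotators, the router) is a simplified, hypothetical stand-in: SIA's real Annotator interface, FlowHandler and Spring Integration wiring may look different. It only shows the idea: a message carries a types header, each channel has a selector combining headerContains with an enabled flag, and the channel name must match the annotator's input channel.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the extension steps. All types are hypothetical stand-ins,
// not SIA's Annotator, FlowHandler or Spring Integration API.
public class ExtensionSketch {

    enum PredictionType { CHEMICAL, DISEASE, MUTATION, MIRNA }

    // Stand-in for a message: the "types" header plus the text to annotate.
    static final class Message {
        final Set<PredictionType> types;
        final String text;
        Message(Set<PredictionType> types, String text) { this.types = types; this.text = text; }
    }

    // A found mention with end-exclusive character offsets.
    static final class Annotation {
        final int init, end;
        final String text;
        Annotation(int init, int end, String text) { this.init = init; this.end = end; this.text = text; }
        @Override public String toString() { return text + "[" + init + "," + end + ")"; }
    }

    // Step 1: hypothetical annotator contract (SIA's real interface may differ).
    interface Annotator {
        List<Annotation> annotate(String text);
    }

    // Toy annotator tagging a fixed word list. In SIA, the annotation
    // @Transformer(inputChannel = "yourAnnotator") would be placed on the annotator.
    static final class YourAnnotator implements Annotator {
        private static final Pattern CHEMICALS = Pattern.compile("\\b(aspirin|ethanol|glucose)\\b");

        @Override
        public List<Annotation> annotate(String text) {
            List<Annotation> result = new ArrayList<>();
            Matcher m = CHEMICALS.matcher(text);
            while (m.find()) {
                result.add(new Annotation(m.start(), m.end(), m.group()));
            }
            return result;
        }
    }

    // Step 3: stand-in for the enabledAnnotators configuration bean.
    static final class EnabledAnnotators {
        boolean yourAnnotator = true;
    }

    // Same role as headerContains: does the "types" header contain the given type?
    static boolean headerContains(Message message, PredictionType type) {
        return message.types.contains(type);
    }

    // Step 2: one selector per channel; the key must match the annotator's input channel.
    static Map<String, Predicate<Message>> buildSelectors(EnabledAnnotators enabledAnnotators) {
        Map<String, Predicate<Message>> selectors = new LinkedHashMap<>();
        selectors.put("yourAnnotator",
                message -> headerContains(message, PredictionType.CHEMICAL) && enabledAnnotators.yourAnnotator);
        return selectors;
    }

    // Recipient-list style routing: a message goes to every channel whose selector accepts it.
    static List<String> route(Message message, Map<String, Predicate<Message>> selectors) {
        List<String> channels = new ArrayList<>();
        selectors.forEach((channel, selector) -> {
            if (selector.test(message)) {
                channels.add(channel);
            }
        });
        return channels;
    }

    public static void main(String[] args) {
        Map<String, Annotator> annotators = Map.of("yourAnnotator", new YourAnnotator());
        Map<String, Predicate<Message>> selectors = buildSelectors(new EnabledAnnotators());
        Message message = new Message(EnumSet.of(PredictionType.CHEMICAL), "Patients took aspirin daily.");
        for (String channel : route(message, selectors)) {
            System.out.println(channel + " -> " + annotators.get(channel).annotate(message.text));
        }
    }
}
```

Because the selector reads the flag at routing time, setting yourAnnotator to false stops messages from reaching the channel without touching the annotator itself, which is the purpose of the configuration bean.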
BANNER is a named entity recognition system, primarily intended for biomedical text.
http://banner.sourceforge.net/
DiseasesNER uses a large dictionary of disease mentions.
LINNAEUS is species name recognition and normalization software.
http://linnaeus.sourceforge.net/
mirNer is a simple regex-based tool to detect microRNA mentions in text, following the miRNA definition of Victor Ambros et al. (2003). A uniform system for microRNA annotation. RNA 9(3):277-279.
https://github.com/Erechtheus/mirNer
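To give an idea of what regex-based miRNA detection looks like, here is a deliberately simplified, hypothetical pattern. It is not mirNer's actual pattern set (see the repository for that); it only covers common forms such as miR-21, hsa-mir-155, miR-17-5p and let-7a, and it treats any three lowercase letters followed by a hyphen as a species prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical miRNA matcher; NOT mirNer's actual pattern set.
public class MiRnaRegexSketch {

    // Optional 3-letter species prefix (e.g. "hsa-"), then either miR/mir/MIR-<number>
    // with optional letter, paralog number and arm suffix (-3p/-5p), or the let-7 family.
    private static final Pattern MIRNA = Pattern.compile(
            "\\b(?:[a-z]{3}-)?(?:miR|mir|MIR)-\\d+[a-z]?(?:-\\d+)?(?:-[35]p)?\\b"
            + "|\\b(?:[a-z]{3}-)?let-7[a-z]?\\b");

    // Returns all non-overlapping miRNA-like mentions in order of appearance.
    static List<String> findMentions(String text) {
        List<String> mentions = new ArrayList<>();
        Matcher m = MIRNA.matcher(text);
        while (m.find()) {
            mentions.add(m.group());
        }
        return mentions;
    }

    public static void main(String[] args) {
        System.out.println(findMentions("hsa-miR-199a-3p and let-7a, but not mirror."));
        // prints [hsa-miR-199a-3p, let-7a]
    }
}
```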
SETH (SNP Extraction Tool for Human Variations) is software that performs named entity recognition (NER) of genetic variants, with an emphasis on single nucleotide polymorphisms (SNPs) and other short sequence variations, in natural language texts.
ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities.
https://www.informatik.hu-berlin.de/de/forschung/gebiete/wbi/resources/chemspot/chemspot
DNorm is an automated method for determining which diseases are mentioned in biomedical text, the task of disease normalization. Diseases have a central role in many lines of biomedical research, making this task important for many areas of inquiry, including etiology (e.g. gene-disease relationships) and clinical aspects (e.g. diagnosis, prevention, and treatment).
https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/DNorm.html
DNorm and ChemSpot are integrated out of process. This means that you need to start these annotators before you can use them. Communication with each annotator is handled via its own dedicated queue.
- Start DNorm:
./mvnw -f tools/dnorm/pom.xml -DskipTests package
java -Xmx8g -jar tools/dnorm/target/dnorm-0.0.1-SNAPSHOT.jar
- Start ChemSpot:
./mvnw -f tools/chemspot/pom.xml package
java -Xmx16g -jar tools/chemspot/target/chemspot-0.0.1-SNAPSHOT.jar
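The dedicated-queue setup for external annotators can be pictured with an in-memory analogy using only the JDK: each annotator consumes its own BlockingQueue, and the caller waits for a reply. In SIA the queues are RabbitMQ queues between separate processes; the class, the method names and the toy DNorm/ChemSpot stand-ins below are hypothetical. As with the external tools, a request placed on a queue is only processed once the corresponding worker has been started.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// In-memory analogy only: SIA uses RabbitMQ queues between processes, not these classes.
public class QueuePerAnnotator {

    // A request carries the text and a future that the worker completes with the result.
    static final class Request {
        final String text;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String text) { this.text = text; }
    }

    // Starts a daemon worker that consumes one queue and applies the annotator to each request.
    static BlockingQueue<Request> startWorker(String name, Function<String, String> annotator) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Request request = queue.take();
                    try {
                        request.reply.complete(annotator.apply(request.text));
                    } catch (RuntimeException e) {
                        request.reply.completeExceptionally(e); // propagate annotator failures
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop the worker when interrupted
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
        return queue;
    }

    // Puts text on the named annotator's queue and blocks until the reply arrives.
    static String call(Map<String, BlockingQueue<Request>> queues, String annotator, String text)
            throws Exception {
        Request request = new Request(text);
        queues.get(annotator).put(request);
        return request.reply.get();
    }

    public static void main(String[] args) throws Exception {
        // Toy stand-ins for the external DNorm and ChemSpot processes.
        Map<String, BlockingQueue<Request>> queues = Map.of(
                "dnorm", startWorker("dnorm", text -> "DISEASE:" + text),
                "chemspot", startWorker("chemspot", text -> "CHEMICAL:" + text));
        System.out.println(call(queues, "dnorm", "breast cancer"));
        System.out.println(call(queues, "chemspot", "aspirin"));
    }
}
```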
You can tag PubMed articles from ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ by putting them into the directory tools/pubmedcache.
Configure which annotators to use by creating an application.properties file in the current directory and enabling the annotators you want.
Then start any external annotators that you want to use.
If you don't customize the annotators, the following default configuration is applied:
sia.annotators.banner=false
sia.annotators.diseaseNer=false
sia.annotators.mirNer=false
sia.annotators.linnaeus=false
sia.annotators.seth=true
# external
sia.annotators.dnorm=false
sia.annotators.chemspot=false
Finally, start the SiaPubmedAnnotator class with the driver and backend profiles enabled.
The driver profile ensures that output is collected into the directory annotated, while the backend profile ensures that the internal annotators are started as well.
./mvnw -DskipTests package
java -cp target/sia-0.0.1-SNAPSHOT.jar \
-Dloader.main=de.dfki.nlp.SiaPubmedAnnotator \
org.springframework.boot.loader.PropertiesLauncher \
--spring.profiles.active=backend,driver
Example output
$ ls -lh annotated
1.0K Jun 28 23:15 annotation-results_2018-06-28_11-15-07.json
$ head annotated/a*
{"predictionResults":[{"document_id":"10022392","section":"A","init":1085,"end":1090,"score":1.0,"annotated_text":"T337A","type":"MUTATION"} ....
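In the example output, end minus init (1090 - 1085 = 5) equals the length of "T337A", which suggests that init and end are character offsets with an exclusive end. The sketch below encodes that reading in a hypothetical class mirroring the fields of one predictionResults entry; the offset convention is inferred from this single example, not from a specification.

```java
// Hypothetical mirror of one predictionResults entry; field names follow the example output.
public class PredictionResultSketch {

    static final class PredictionResult {
        final String documentId;
        final String section;
        final int init;
        final int end;
        final double score;
        final String annotatedText;
        final String type;

        PredictionResult(String documentId, String section, int init, int end,
                         double score, String annotatedText, String type) {
            this.documentId = documentId;
            this.section = section;
            this.init = init;
            this.end = end;
            this.score = score;
            this.annotatedText = annotatedText;
            this.type = type;
        }

        // Checks the assumed [init, end) convention: span length equals text length.
        boolean spanMatchesText() {
            return end - init == annotatedText.length();
        }

        // Recovers the mention from the section text, assuming end-exclusive offsets.
        String extractFrom(String sectionText) {
            return sectionText.substring(init, end);
        }
    }

    public static void main(String[] args) {
        // Values copied from the example output.
        PredictionResult r = new PredictionResult("10022392", "A", 1085, 1090, 1.0, "T337A", "MUTATION");
        System.out.println(r.spanMatchesText()); // prints true: 1090 - 1085 == "T337A".length()
    }
}
```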