The Akka-based CoreNLP Processor Server
Note: this component is no longer supported in processors v8 and later.
The `corenlp` sub-project now includes an Akka-based server that provides access to the processors implemented in the `org.clulab.corenlp.processors` package. By default, the server is configured to use the 'bionlp' processor (`BioNLPProcessor`), but it may be configured to start any one of the processors in the `corenlp` package:
- `BioNLPProcessor` (default)
- `CoreNLPProcessor`
- `FastNLPProcessor`
- `FastBioNLPProcessor`
- `ShallowNLPProcessor`
The choice of processor, and the settings for the chosen processor, are configured in the `reference.conf` file in the `corenlp/src/main/resources/` directory. All settings for the server, including the Akka settings, are grouped under the label `ProcessorCoreServer`. Settings specific to the choice of processor are grouped under `ProcessorCoreServer.server.processor` and include the following:
- `type`: which processor the server should use. One of `bio` (default), `core`, `fast`, `fastbio`, or `shallow`.
- `internStrings`: whether to intern strings: `true` or `false`.
- `maxSentenceLength`: skip sentences with more than N characters. Default: `100` characters.
- `removeFigTabReferences`: whether to remove figure and table references before processing: `true` or `false`.
- `removeBibReferences`: whether to remove bibliographic references before processing: `true` or `false`.
- `useMalt`: whether to use the MALT parser: `true` or `false`.
- `withChunks`: whether to compute and save chunk information: `true` or `false`.
- `withContext`: whether to compute and save context information: `true` or `false`.
- `withCRFNER`: whether to use the CRF for NER: `true` or `false`.
- `withDiscourse`: whether to compute and save a discourse tree: `"NO_DISCOURSE"`, `"WITH_DISCOURSE"`, or `"JUST_EDUS"`.
- `withRuleNER`: whether to use rule-based NER: `true` or `false`.
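As a sketch of how the `type` setting might map onto the processors listed earlier, the following stdlib-only Scala snippet resolves a configured value, falling back to the documented default `bio`, to a processor class name. The object `ProcessorSelection`, its `select` method, and the exact value-to-class correspondence are assumptions for illustration; the server's actual selection code is not shown on this page.

```scala
// Hypothetical sketch: resolving the `type` setting to a processor class name.
// The mapping is inferred from the processor names, not taken from the server code.
object ProcessorSelection {
  val DefaultType = "bio" // documented default (BioNLPProcessor)

  val processorForType: Map[String, String] = Map(
    "bio"     -> "BioNLPProcessor",
    "core"    -> "CoreNLPProcessor",
    "fast"    -> "FastNLPProcessor",
    "fastbio" -> "FastBioNLPProcessor",
    "shallow" -> "ShallowNLPProcessor"
  )

  /** Returns Right(className) for a known type, or Left(error) for an unknown one. */
  def select(configured: Option[String]): Either[String, String] = {
    val key = configured.getOrElse(DefaultType).trim.toLowerCase
    processorForType.get(key).toRight(s"unknown processor type: $key")
  }

  def main(args: Array[String]): Unit = {
    println(select(None))             // Right(BioNLPProcessor)
    println(select(Some("fastbio")))  // Right(FastBioNLPProcessor)
    println(select(Some("stanford"))) // Left(unknown processor type: stanford)
  }
}
```

Returning an `Either` rather than throwing lets a caller report a misconfigured `type` before any (potentially slow) processor construction begins.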
Akka settings are grouped under `ProcessorCoreServer.akka`. Currently, only the actor pool type and size are specified (a `round-robin-pool` with 4 instances).
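A `round-robin-pool` hands each incoming message to the next routee in a fixed cyclic order. The stdlib-only sketch below illustrates just that dispatch order for a pool of 4 workers; `RoundRobinRouter` is a hypothetical class, not Akka's router. In Akka itself, a comparable pool can be created programmatically with `akka.routing.RoundRobinPool(4).props(...)`, or declared in deployment configuration with `router = round-robin-pool` and `nr-of-instances = 4`.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of round-robin dispatch over a fixed pool (not Akka's implementation).
final class RoundRobinRouter[W](workers: IndexedSeq[W]) {
  require(workers.nonEmpty, "the pool needs at least one worker")
  private val counter = new AtomicInteger(0)

  /** Returns the next worker in cyclic order; the atomic counter makes this thread-safe. */
  def route(): W = workers(Math.floorMod(counter.getAndIncrement(), workers.size))
}

object RoundRobinDemo {
  def main(args: Array[String]): Unit = {
    // Pool of 4, matching the pool size described above.
    val router = new RoundRobinRouter((1 to 4).map(i => s"worker-$i"))
    val picks = (1 to 6).map(_ => router.route())
    println(picks.mkString(", ")) // worker-1, worker-2, worker-3, worker-4, worker-1, worker-2
  }
}
```

`Math.floorMod` keeps the index non-negative even after the counter wraps past `Int.MaxValue`.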
The implementation of the Processor Server is located in the `org.clulab.processors.coserver` package. The `ProcessorCoreServer` object instantiates its companion class with the configuration described above. The instantiated `ProcessorCoreServer` class creates the configuration-specified processor and the router that controls a pool of actors. Each actor (of type `ProcessorActor`) is constructed with a reference to the previously instantiated processor, which it calls to implement the API. Because Akka is designed specifically for the asynchronous exchange of immutable messages, `ProcessorActor` closely mimics, but does not implement, the `Processor` trait, which relies on synchronous calls, some of which mutate their arguments.
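To illustrate the difference between a synchronous, mutating API and an actor protocol built on immutable messages, the sketch below models requests and replies as case classes and has a handler return a fresh document instead of modifying its input. `Doc`, the message classes, and `ToyProcessorActor` are hypothetical stand-ins with no Akka dependency, and the "annotators" are placeholders; the real `ProcessorActor` message types are not documented here.

```scala
// Hypothetical, immutable toy document: annotations are added by copying, never by mutation.
final case class Doc(
  words: Vector[String],
  tags: Option[Vector[String]] = None,
  lemmas: Option[Vector[String]] = None
)

// Hypothetical request and reply messages, loosely named after Processor methods.
sealed trait Request
final case class TagPartsOfSpeech(doc: Doc) extends Request
final case class Lemmatize(doc: Doc) extends Request
final case class DocumentReply(doc: Doc)

object ToyProcessorActor {
  // Stand-in for an actor's receive logic: every request yields a reply carrying a new Doc.
  def handle(request: Request): DocumentReply = request match {
    case TagPartsOfSpeech(d) => DocumentReply(d.copy(tags = Some(d.words.map(_ => "X"))))        // placeholder tag
    case Lemmatize(d)        => DocumentReply(d.copy(lemmas = Some(d.words.map(_.toLowerCase)))) // placeholder lemma
  }

  def main(args: Array[String]): Unit = {
    val original   = Doc(Vector("Children", "like", "documents"))
    val tagged     = handle(TagPartsOfSpeech(original)).doc
    val lemmatized = handle(Lemmatize(tagged)).doc
    println(original.tags)     // None (the input was not modified)
    println(lemmatized.tags)   // Some(Vector(X, X, X))
    println(lemmatized.lemmas) // Some(Vector(children, like, documents))
  }
}
```

Because no message is ever mutated, the same document can safely be sent to any actor in the pool, which is what makes the round-robin routing above unproblematic.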
The `ProcessorCoreServer` in the Processors project is accessed from Reach through the `org.clulab.reach.coserver.ProcessorCoreClient` class. Using the client is similar to using a class that implements the `Processor` trait, since the client implements almost the same interface. The notable exception is that the client's methods return a new document (i.e., there are no side-effecting calls). Because all configuration is confined to the server package, the client takes no constructor arguments and may be instantiated and called directly, as in this example:

    import org.clulab.reach.coserver.ProcessorCoreClient

    object ClientExample extends App {
      // The processor used (BioNLPProcessor by default) is chosen by the server's configuration
      val client = new ProcessorCoreClient

      val text = "Children like smaller documents with smaller sentences."
      val doc1 = client.mkDocument(text)       // by default, the original text is kept
      val doc2 = client.tagPartsOfSpeech(doc1) // returns a new, POS-tagged document
      val doc = client.lemmatize(doc2)         // returns a new, lemmatized document
      val sentences = doc.sentences            // retrieve the sentences from the final document
    }
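Since each client call returns a new document, the calls compose like ordinary functions. The sketch below assumes a hypothetical `ToyClient` and `ToyDoc` that only mirror the method names used in the example (no connection to a server), and chains two steps with the standard library's `Function.chain`.

```scala
// Hypothetical sketch: because each call returns a new document, steps compose as functions.
// `ToyDoc` and `ToyClient` are illustrative stand-ins for the real document and ProcessorCoreClient.
final case class ToyDoc(text: String, steps: List[String] = Nil)

class ToyClient {
  def mkDocument(text: String): ToyDoc = ToyDoc(text)
  def tagPartsOfSpeech(doc: ToyDoc): ToyDoc = doc.copy(steps = doc.steps :+ "pos")
  def lemmatize(doc: ToyDoc): ToyDoc = doc.copy(steps = doc.steps :+ "lemma")
}

object PipelineDemo {
  def main(args: Array[String]): Unit = {
    val client = new ToyClient
    // Function.chain applies the functions left to right: tag first, then lemmatize.
    val annotate: ToyDoc => ToyDoc =
      Function.chain(Seq[ToyDoc => ToyDoc](client.tagPartsOfSpeech, client.lemmatize))
    val doc = annotate(client.mkDocument("Children like smaller documents."))
    println(doc.steps) // List(pos, lemma)
  }
}
```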