nifi-open-nlp

A set of NiFi processors implementing Apache OpenNLP engine tools.

Project structure

The project was generated from the Maven archetype org.apache.nifi:nifi-processor-bundle-archetype:1.8.0.

It is a Java 8 project, built with Maven 3.3+, that follows the standard Maven layout conventions.

A docker-compose setup is included to run NiFi locally with a predefined workflow that serves as a usage example.

Building & running

You can either build the project and drop the resulting nar file into your own NiFi installation, or boot a ready-to-use Docker container.

From sources

Build the project with Maven:

mvn clean package

This runs the tests locally and produces a nar file that you can drop into your existing NiFi installation, should you have one.

Inside Docker container

Simply run the docker-compose file using

docker-compose up

The build runs inside the container, as a separate Maven layer, so expect the first run to take a while as Maven downloads its dependencies.

The nar file is then copied into the NiFi lib/ folder and NiFi starts in a container, available on port 8080.

The configuration directory for NiFi ($NIFI_HOME/conf or /opt/nifi/nifi-current/conf) has been mapped to the local folder ./nifi-local-data/conf.

NLP models folder

A new folder, $NIFI_HOME/models, contains the pre-trained models for the English language:

en-chunker.bin
en-doccat-tweets.bin
en-ner-date.bin
en-ner-location.bin
en-ner-money.bin
en-ner-organization.bin
en-ner-percentage.bin
en-ner-person.bin
en-ner-time.bin
en-parser-chunking.bin
en-pos-maxent.bin
en-pos-perceptron.bin
en-sent.bin
en-token.bin
langdetect-183.bin
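
To check which models are actually present in a given installation, the folder can be listed with plain Java. This is a minimal sketch, not code from this project: the class name ModelFolder and the method listModels are hypothetical, and it only assumes that model files carry the .bin extension, as in the list above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of nifi-open-nlp.
class ModelFolder {

    // Returns the sorted names of all regular *.bin files directly under modelsDir.
    static List<String> listModels(Path modelsDir) throws IOException {
        try (Stream<Path> files = Files.list(modelsDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".bin"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Inside the container, the folder to pass would be the models directory under $NIFI_HOME (/opt/nifi/nifi-current, as described above).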

NLP training

A new folder, $NIFI_HOME/training, contains tweets.txt, an example of training data for sentiment analysis on tweets (see Document Categorizer), taken from this discussion on Stack Overflow.
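
As an illustration of what such training data looks like, the sketch below splits one training line into a category and a text. It assumes tweets.txt follows the usual OpenNLP Document Categorizer training format (one sample per line, the category first, then whitespace, then the text); that assumption has not been checked against the file itself, and the class name DoccatLine and the sample label are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of nifi-open-nlp.
class DoccatLine {

    // Splits "category some text" into (category, text).
    // Returns null when the line has no text after the category.
    static Map.Entry<String, String> parse(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            return null;
        }
        return new SimpleEntry<>(parts[0], parts[1]);
    }
}
```

For example, DoccatLine.parse("1 Watching a nice movie") would yield the category "1" and the text "Watching a nice movie".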

NLP model store

Another new folder, $NIFI_HOME/model-store, holds the trained models for the processors.

The rationale is that processors can be trained from model files, training files, or training data, so input types differ, but every path ends in a model file that can be stored and reused by the processors. The training and evaluation lifecycle of the processors will be explained later.
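
The store-and-reuse idea can be sketched as follows. This is only an illustration of the rationale, not the processors' actual logic, which is not documented here: the class ModelStoreSketch, the method loadOrTrain, the .bin naming, and the trainer callback are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical illustration; the real processors may store models differently.
class ModelStoreSketch {

    private final Path storeDir;

    ModelStoreSketch(Path storeDir) {
        this.storeDir = storeDir;
    }

    // Returns the stored model bytes when <storeDir>/<name>.bin exists.
    // Otherwise runs the trainer, writes its output to that file and returns it.
    byte[] loadOrTrain(String name, Supplier<byte[]> trainer) throws IOException {
        Path modelFile = storeDir.resolve(name + ".bin");
        if (Files.isRegularFile(modelFile)) {
            return Files.readAllBytes(modelFile);
        }
        byte[] trained = trainer.get();
        Files.createDirectories(storeDir);
        Files.write(modelFile, trained);
        return trained;
    }
}
```

Whatever the input type, the trainer runs only once per model name in this sketch; later calls read the stored file back.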

Importing from JitPack

Add the JitPack repository to your Maven project:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Then add the Maven dependency on the GitHub project:

<dependency>
    <groupId>com.github.rdlopes</groupId>
    <artifactId>nifi-open-nlp</artifactId>
    <version>${nifi-open-nlp.version}</version>
</dependency>

Importing from GitHub Package Registry

This feature is temporarily disabled while I wait for GitHub feedback on a few issues.

Apache NLP tools

The following tools, listed in the OpenNLP developer documentation, are implemented:

For further documentation, please refer to processors usage page.
