Latest commit

987fd51 · Sep 11, 2024


NER Playground

Today, data can be found in many different places. To use it, however, one must first extract it and process it into a usable format. As it turns out, Large Language Models (LLMs) are a great tool for this purpose.

NER Playground is a place to try open source LLMs for data extraction from a variety of media: PDFs, images, audio, websites, and more!

NER Playground is powered by LLMs served by OctoAI.

Prerequisites

To get started with the NER Playground, you need an OctoAI account and an API key. Both can be created at Octo.AI.

Overview

Using LLMs for data extraction can benefit several industries and applications:

  • Healthcare: Create reports from recordings of patient consultations.
  • Finance: Extract structured pieces of information from large sets of irregular documents, such as companies' quarterly reports.
  • Law: Extract subject names, company names, addresses, facts, and other key points from contracts.
  • Education: Create preparation cards based on student assignments.

In this playground you can easily modify the schema that the LLM uses to identify and extract key pieces of information in a structured manner. You can run several files through the same schema and end up with a table that can be exported to a CSV file or to other destinations such as a database.
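To make the schema-to-table idea concrete, here is a minimal sketch using only the Python standard library. It is not the application's actual code: the field names in `SCHEMA`, the helper functions, and the stand-in replies are all hypothetical, and the LLM call itself is left out. The sketch only shows how one schema can drive both the prompt and the columns of the final CSV.

```python
import csv
import io
import json

# Hypothetical schema: field name -> description shown to the model.
SCHEMA = {
    "company_name": "Legal name of the company",
    "quarter": "Fiscal quarter covered by the report",
    "revenue_usd": "Total revenue in US dollars",
}


def build_prompt(schema, document_text):
    """Build a prompt asking for one JSON object with the schema's keys."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract the following fields from the document and reply with "
        "a single JSON object using exactly these keys:\n"
        f"{fields}\n\nDocument:\n{document_text}"
    )


def parse_reply(schema, reply_text):
    """Parse a JSON reply; fields the model omitted become empty strings."""
    data = json.loads(reply_text)
    return {name: data.get(name, "") for name in schema}


def rows_to_csv(schema, rows):
    """Serialize extracted rows to CSV text, one column per schema field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schema))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in replies (illustrative only); a real run would get one reply
# per file by sending build_prompt(...) to the model.
replies = [
    '{"company_name": "Acme Corp", "quarter": "Q2", "revenue_usd": 1200000}',
    '{"company_name": "Globex", "quarter": "Q2"}',
]
rows = [parse_reply(SCHEMA, reply) for reply in replies]
csv_text = rows_to_csv(SCHEMA, rows)
print(csv_text)
```

Keeping the schema as a single dictionary means that adding or renaming a field changes the prompt and the CSV header together.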

For Developers

If you are a developer, you are more than welcome to look at the code for this application and build your own. This project is packaged with Poetry for Python.

Open in GitHub Codespaces

Running locally

First, enter your API keys in env.sh. Then:

source env.sh
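Sourcing the file only sets variables in the current shell session, so a quick check that Python can see them may save some debugging. The sketch below assumes env.sh exports its keys as environment variables. The name `OCTOAI_API_KEY` is a hypothetical placeholder; use whichever names your env.sh actually defines.

```python
import os

# Hypothetical variable name; replace with the names exported by env.sh.
REQUIRED_KEYS = ["OCTOAI_API_KEY"]


def missing_keys(required, environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set.")
```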

Now, let's set up a new Poetry environment:

poetry install --no-root

Now run the app:

poetry run streamlit run ner_solution.py