Process large chunks of text into a node tree, which can then be traversed to grab phrases that match the given criteria.
To install:

```shell
npm install frequent-phrases
```
The workflow is generally:
- Construct FrequentPhrase instance
- Define custom config (Optional)
- Process text
- Output frequent phrases
```js
const FP = new FrequentPhrase();
```
Custom Config (more info HERE)
The default config object is as follows:

```js
const defaultConfig = {
  maxPhraseLength: 6,
  selectionAlgorithm: 'dropOff',
  selectionConfig: {
    dropOff: {
      threshold: 0.5
    }
  },
  scoringAlgorithm: 'default',
  parserConfig: {
    chunkSentences: true,
    removeTypedSentences: true
  },
  preProcessing: {
    trim: 3
  },
  postProcessing: {
    uniqueWordAtCutoffDepth: 1
  }
}
```
Access the config property to modify this after instantiation, or construct a new config object and pass it in.
```js
const FP = new FrequentPhrase();
FP.config = newConfigObject;
// or
FP.config.maxPhraseLength = 8; // etc.
```
The final two steps can be performed separately or in a single call.
```js
const speech = 'Five score years ago, a great American, in whose symbolic shadow'; // ... MLK's I Have A Dream speech
```
To process text and then extract phrases:
```js
await FP.process(speech);

// then get the frequent phrases
const res = await FP.getFrequentPhrases();
console.log(res);
```
To do both in one step, pass the text directly to getFrequentPhrases(). Note that this method overwrites previous tree data, so it works best when you instantiate a new FrequentPhrase each time.
```js
const res = await FP.getFrequentPhrases(speech);
console.log(res);
```
Both methods yield the same result:

```js
// console.log(res);
{
  ok: true,
  msg: '',
  frequentPhrases: [
    { phrase: '', score: 0 },
    { phrase: '', score: 0 },
    { phrase: '', score: 0 },
    // ...
  ],
  executionTime: '3.544ms'
}
```
To help you decide how best to adapt the library to a specific use case, here is how it works internally:
- Input corpus
- Pre-process potential candidates
- Select Candidates
- Score selected Candidates
- Post-process candidates
- Output
- trim - Trims the candidate pool so that candidates only originate from the top `trim` starter words. `trim` defaults to `0`, i.e. no trimming.
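A minimal sketch of what a starter-word trim might look like. This is illustrative only, not the library's implementation, and `trimCandidates` is a hypothetical helper:

```js
// Hypothetical sketch of the `trim` pre-processing step: keep only
// candidate phrases whose first word is among the `trim` most
// frequent starter words.
function trimCandidates(phrases, trim) {
  if (trim === 0) return phrases; // trim: 0 means no trimming

  // Count how often each starter word begins a candidate phrase
  const counts = {};
  for (const phrase of phrases) {
    const starter = phrase.split(' ')[0];
    counts[starter] = (counts[starter] || 0) + 1;
  }

  // Keep the top `trim` starter words by count
  const top = Object.keys(counts)
    .sort((a, b) => counts[b] - counts[a])
    .slice(0, trim);

  return phrases.filter((phrase) => top.includes(phrase.split(' ')[0]));
}

trimCandidates(['free at last', 'free at', 'dream today', 'justice rolls'], 1);
// → ['free at last', 'free at']
```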
- Selection Algorithm - The algorithm used to select candidate phrases. The default is a simple drop-off, which cuts off phrases based on the relative visit counts between child and parent nodes.
- Selection Config - Stores constants to modify how selection algorithms perform. See here.
- Scoring Algorithm - Defines which scoring algorithm is used. The default is based solely on averaged visits, meaning a higher average visit count yields a higher score.
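As a rough illustration of visits-averaged scoring (`scorePhrase` is a hypothetical helper, not the library's internals):

```js
// Hypothetical sketch: a phrase's score is the mean visit count of
// the tree nodes that make up the phrase.
function scorePhrase(nodes) {
  const total = nodes.reduce((sum, node) => sum + node.visits, 0);
  return total / nodes.length;
}

scorePhrase([{ visits: 10 }, { visits: 8 }, { visits: 6 }]); // → 8
```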
- uniqueWordAtCutoffDepth - Trims scored candidates so that the highest-scored phrase from each starter word is represented.
- chunkSentences - converts a string into an array of its contained sentences
- removeTypedSentences - finds the unique, longest sentence among a set of progressively typed copies of the same sentence.
- e.g.: We are only interested in the sentence 'How are you?' but we have:
- 'H'
- 'Ho'
- 'How'
- ...
- 'How are you?'
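The removeTypedSentences idea can be sketched as follows (an illustrative stand-in, not the library's actual code): keep only sentences that are not a prefix of a longer sentence in the set.

```js
// Hypothetical sketch: drop any sentence that is a strict prefix of
// another sentence, leaving only the fully typed versions.
function removeTypedSentences(sentences) {
  return sentences.filter(
    (s) => !sentences.some((other) => other !== s && other.startsWith(s))
  );
}

removeTypedSentences(['H', 'Ho', 'How', 'How are you?']);
// → ['How are you?']
```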
Returns frequent phrases from data that has already been processed.
Returns: Promise.<FP>
- Frequent phrases present in the text
| Param | Description |
| --- | --- |
| body | OPTIONAL - a string of text; if passed, it is processed and phrases are then extracted. If omitted, phrases are extracted from existing data. |
Process a string of sentences. Frequent phrases can only be extracted from processed text.
Returns: Promise<string[] | FPNode[]>
- [registry, rootNode]
| Param | Description |
| --- | --- |
| body | string of text to be processed into the sentence registry and node tree |
Cleans out the sentence registry and destroys the node tree.
Returns: Promise<string[] | FPNode[]>
- [registry, rootNode]