Probabilistically splits joined words based on their unigram frequencies (i.e., each word's frequency is the number of times that word appears divided by the total number of words).

tayoogunbiyi/word-splitter

Word-splitter

This package splits words that are joined together withoutanydelimiter.

While working on a problem that involved extracting text from some oddly formatted PDF files, I came across a really smart Stack Overflow answer, "How to split text without spaces into list of words?", which in turn led me to a great package, Word Ninja.

I decided to rewrite it in Go.

Installation

go get github.com/tayoogunbiyi/word-splitter

Usage

package main

import (
	"fmt"

	wordsplitter "github.com/tayoogunbiyi/word-splitter"
)

func main() {
	fmt.Println(wordsplitter.Split("welcometomycity")) // outputs ["welcome", "to", "my", "city"]
	fmt.Println(wordsplitter.Split("2020istheyear"))   // outputs ["2020", "is", "the", "year"]
}
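
How it works (sketch)

To give a feel for the idea of splitting by unigram frequency, here is a minimal, self-contained sketch. It is not this package's actual implementation: the `splitSketch` function, the `unigramCounts` table, and the `unknownCount` penalty are all hypothetical names with made-up values, chosen only for illustration. The sketch uses dynamic programming to pick the segmentation whose words have the highest combined log-probability, where each word's probability is its count divided by the total count. It assumes lowercase ASCII input and does not attempt to keep digit runs such as "2020" together.

```go
package main

import (
	"fmt"
	"math"
)

// unigramCounts is a tiny, made-up frequency table (illustration only).
var unigramCounts = map[string]float64{
	"welcome": 50, "to": 900, "my": 400, "city": 80,
	"we": 300, "come": 150, "it": 500,
}

// unknownCount is an assumed pseudo-count for single characters that are
// not in the table, so that every input can still be segmented.
const unknownCount = 0.01

// splitSketch returns the segmentation of s that maximizes the sum of
// log(count/total) over its words. Input is assumed to be ASCII.
func splitSketch(s string, counts map[string]float64) []string {
	total := 0.0
	for _, c := range counts {
		total += c
	}

	n := len(s)
	best := make([]float64, n+1) // best[i]: best score for the prefix s[:i]
	prev := make([]int, n+1)     // prev[i]: start index of the last word in s[:i]

	for i := 1; i <= n; i++ {
		best[i] = math.Inf(-1)
		for j := 0; j < i; j++ {
			c, ok := counts[s[j:i]]
			if !ok {
				if i-j != 1 {
					continue // unknown multi-character chunks are not allowed
				}
				c = unknownCount
			}
			if score := best[j] + math.Log(c/total); score > best[i] {
				best[i], prev[i] = score, j
			}
		}
	}

	// Walk the back-pointers from the end, then reverse into reading order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append(words, s[prev[i]:i])
	}
	for l, r := 0, len(words)-1; l < r; l, r = l+1, r-1 {
		words[l], words[r] = words[r], words[l]
	}
	return words
}

func main() {
	fmt.Println(splitSketch("welcometomycity", unigramCounts)) // prints [welcome to my city]
}
```

With this table, "welcome" beats "we" + "l" + "come" because the unknown single character "l" carries a heavy penalty. A real word list would be far larger, and the scoring function may differ from the plain count ratio used here.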
