Machine Learning news headline categorizer in Ruby

In this primitive machine learning experiment, a Ruby program categorizes news article headlines into one of the following categories:

Business
Science and technology
Entertainment
Health

Aggregated historical news headline datasets are used to initially train the 'brain' of the machine.

These datasets are parsed into dictionaries containing one wordlist per category listed above. Each word has a frequency value associated with it, which records how many times the word appeared in headlines of that category.
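A minimal sketch of how such per-category frequency dictionaries could be built. The helper name build_wordlists and the sample records are made up for illustration and are not taken from parsedata.rb; reading the CSV file itself is left out.

```ruby
# Hypothetical helper (not from parsedata.rb): builds
# { category => { word => count } } from [category, headline] pairs.
def build_wordlists(records)
  # Each category gets its own counter hash that defaults to 0.
  wordlists = Hash.new { |hash, category| hash[category] = Hash.new(0) }
  records.each do |category, headline|
    # Lowercase the headline and count every word-like token.
    headline.downcase.scan(/[a-z0-9']+/).each do |word|
      wordlists[category][word] += 1
    end
  end
  wordlists
end

# Made-up sample records for illustration only.
records = [
  ["b", "Stocks rally as markets open"],
  ["b", "Markets slide on weak earnings"],
  ["m", "New drug shows promise in trial"]
]

wordlists = build_wordlists(records)
puts wordlists["b"]["markets"]
puts wordlists["m"]["drug"]
```

Here "markets" is counted twice for category b because it appears in two business headlines.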

Wordlists are adjusted to exclude stop words.
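Stop word removal could look roughly like the sketch below. The stop word list and the helper name remove_stop_words are assumptions for illustration; the list the project actually uses is not shown in this README.

```ruby
require "set"

# Illustrative stop word list (an assumption, not the project's list).
STOP_WORDS = Set.new(%w[a an and as for in of on the to]).freeze

# Hypothetical helper: returns a copy of one category's word => count
# hash without the stop words.
def remove_stop_words(wordlist, stop_words = STOP_WORDS)
  wordlist.reject { |word, _count| stop_words.include?(word) }
end

# Made-up frequencies for illustration only.
business = { "the" => 120, "markets" => 45, "for" => 80, "stocks" => 30 }
filtered = remove_stop_words(business)
p filtered
```

Only "markets" and "stocks" remain, so very common words no longer dominate the category scores.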

The algorithm cleans and tokenizes the given input, then scores each category by the sum of the stored frequencies of the input words and reports the category with the highest score as the most probable one.
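The scoring step described above can be sketched as follows. The helper names tokenize and categorize and the sample wordlists are assumptions for illustration, not the code in newscategorizer.rb.

```ruby
# Hypothetical helper: lowercases the input and splits it into word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# Hypothetical helper: for every category, sums the stored frequencies of
# the input tokens, then returns the best category together with all scores.
def categorize(headline, wordlists)
  tokens = tokenize(headline)
  scores = wordlists.transform_values do |frequencies|
    # Words unknown to a category contribute 0 to its score.
    tokens.sum { |token| frequencies.fetch(token, 0) }
  end
  best_category, _best_score = scores.max_by { |_category, score| score }
  [best_category, scores]
end

# Made-up wordlists for illustration only.
wordlists = {
  "b" => { "markets" => 45, "stocks" => 30, "drug" => 2 },
  "m" => { "drug" => 60, "doctors" => 25, "markets" => 1 }
}

best, scores = categorize("Doctors found an effective drug", wordlists)
puts "Highest probability: #{best} #{scores}"
```

Note that raw frequency sums tend to favor categories with more training headlines, so a variant of this sketch might normalize each score by the size of the category's wordlist.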

Usage

First download the uci-news-aggregator.csv dataset (102.9MB) from here
Adjust the 'brain size' by changing DEBUGLIMIT in parsedata.rb, or set DEBUG = 0 to disable the limit completely
Run the program:

$ ruby main.rb /path/to/uci-news-aggregator.csv

Processing uci-news-aggregator.csv
Finished processing uci-news-aggregator.csv
Records processed: 20000

[...]

Brain initialization complete
Let me try to categorize your sentence (type quit or Ctrl+D to exit)
> Doctors found an effective drug for Alzheimers
Highest probability: m {"b"=>7, "t"=>27, "e"=>78, "m"=>338}
>

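Enter news headlines on standard input to evaluate them. The one-letter keys in the output are the dataset's category codes: b (business), t (science and technology), e (entertainment), m (health).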

Further steps of development

Training facility
Web aggregation
Human supervised brain training
