prnicolas/24x7Content
24x7 Content library

This is a draft implementation of a semantic analyzer that extracts the topic topography from a document or a set of documents.

The implementation runs on JDK 1.6 and relies on
- Information retrieval (modified tf-idf)
- Semantic analysis (classification of Wikipedia short and long descriptions, WordNet hypernyms and categories)
- Machine learning (Conditional Random Fields, Naive Bayes, ...)
- Natural Language Processing (tagging, chunking, ...)

The following open source libraries must be added to the classpath in order to compile and execute the code base
- Apache Log4j 1.2.15
- Apache Commons Codec 1.5
- JUnit 4.0
- Apache OpenNLP Tools 1.5
- OAuth Signpost 1.2.1
- Apache Commons Net 2.2
- MySQL Connector/J 5.1.1

The application has been tested by extracting topics from similar documents retrieved through search, with an accuracy of 82%.

Related patent: "Methods and systems for extracting topics from documents using taxonomy graphs and kirchoff's"
United States 61645413 - May 2012

Patrick Nicolas
June 2012
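
For readers unfamiliar with the information retrieval component listed above, here is a minimal sketch of the textbook tf-idf weighting. This README does not describe how the project modifies tf-idf, so the code shows only the standard formula. The class name TfIdfSketch and its methods are hypothetical and are not part of the 24x7 Content code base. The sketch avoids language features newer than JDK 1.6.

```java
import java.util.List;

// Hypothetical illustration class; NOT part of the 24x7 Content code base.
// Implements the standard (unmodified) tf-idf weighting.
final class TfIdfSketch {
    private TfIdfSketch() { }

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return (double) count / doc.size();
    }

    // Inverse document frequency: log(N / df), where N is the corpus size and
    // df the number of documents containing the term (0 if df is 0).
    static double idf(String term, List<List<String>> corpus) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // tf-idf weight of a term in one document of the corpus.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }
}
```

Under this weighting, a term that appears in every document of the corpus gets a weight of zero, while a term concentrated in a few documents is weighted higher, which is why tf-idf is a common starting point for selecting candidate topic terms.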
About
Open source version of Semantic/Taxonomy search project - 24x7 Content - 2011-2012