problem while extracting the proper noun #2

Open
nandanii opened this issue Aug 4, 2018 · 2 comments

nandanii commented Aug 4, 2018

File not found at line 24: "wordsEn.txt"
[Screenshot from 2018-08-04 16-40-02]

dereckson (Owner) commented Aug 4, 2018

The README provides the following instructions:

Source text
-----------
You need a copy of the text you want to extract from as plain text.

Source English word list
------------------------
The expected format is a list in lowercase, each line a substantive word.
Filename should be wordsEn.txt or modified in eliminate-common-nouns script.

Such file is available at http://www-01.sil.org/linguistics/wordlists/english/

Usage
-----
./extract-proper-nouns source.txt > nouns.txt

To sort them and eliminate duplicates:
./extract-proper-nouns source.txt | sort | uniq > nouns.txt

To discard known English words:
./eliminate-common-nouns nouns.txt

I guess there are two things to solve:

  1. offer a nice message if the file hasn't been found to explain how to generate it
  2. clarify README to indicate the download of a list of common nouns is mandatory
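
For the first point, here is a minimal sketch of what a friendlier failure could look like. It is written in Python purely for illustration; it is not the project's actual code, and the function names (`missing_word_list_message`, `load_word_list`), the message wording, and the UTF-8 encoding are all assumptions. The URL is the one quoted in this thread.

```python
import sys
from pathlib import Path

# URL as quoted in this thread; it may no longer be reachable.
WORD_LIST_URL = "http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt"


def missing_word_list_message(path):
    """Build a hint explaining how to obtain the word list (hypothetical wording)."""
    return (
        f"Word list not found: {path}\n"
        "Download one first, for example:\n"
        f"  wget {WORD_LIST_URL}\n"
        "or change the filename in the eliminate-common-nouns script."
    )


def load_word_list(path="wordsEn.txt"):
    """Return the word list as a set, or exit with a hint if the file is missing."""
    word_file = Path(path)
    if not word_file.is_file():
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(missing_word_list_message(path))
    # Encoding is an assumption; undecodable bytes are replaced rather than fatal.
    with word_file.open(encoding="utf-8", errors="replace") as handle:
        return {line.strip() for line in handle if line.strip()}
```

With something along these lines, a missing wordsEn.txt would produce download instructions instead of a bare "file not found" error.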

dereckson (Owner) commented

This works:

wget http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt
./extract-proper-nouns somebook.txt
