
make this an importable lib #14

Open
schlitzered opened this issue Jul 16, 2020 · 1 comment

Comments

@schlitzered

schlitzered commented Jul 16, 2020

Hi, I find the tool pretty useful, and it would be nice if you could turn it into a library with a stable interface that can be imported into other projects.

For this, I would suggest moving the logic that chooses the "corpus" into find_acronyms.
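A minimal sketch of that suggestion, assuming a hypothetical `resolve_corpus` helper, a hypothetical `CORPUS_LOADERS` registry and a hypothetical `find_acronyms_by_name` wrapper (none of these exist in the project). Callers could pass either a corpus name such as "gutenberg" or an already-loaded corpus object; nltk is only imported when a name is actually resolved.

```python
from typing import Any, Callable, Dict, Union


def _nltk_corpus(name: str) -> Callable[[], Any]:
    """Return a loader that imports nltk lazily and fetches nltk.corpus.<name>."""
    def load() -> Any:
        import nltk  # imported only when a named corpus is requested
        return getattr(nltk.corpus, name)
    return load


# Hypothetical registry of the corpora mentioned in this thread.
CORPUS_LOADERS: Dict[str, Callable[[], Any]] = {
    "words": _nltk_corpus("words"),
    "brown": _nltk_corpus("brown"),
    "gutenberg": _nltk_corpus("gutenberg"),
}


def resolve_corpus(corpus: Union[str, Any],
                   loaders: Dict[str, Callable[[], Any]] = CORPUS_LOADERS) -> Any:
    """Accept either a registered corpus name or an already-loaded corpus object."""
    if isinstance(corpus, str):
        if corpus not in loaders:
            raise ValueError(f"unknown corpus {corpus!r}; choose from {sorted(loaders)}")
        return loaders[corpus]()
    return corpus  # already a corpus object: pass it through unchanged


def find_acronyms_by_name(s: str, corpus: Union[str, Any] = "words",
                          min_length: int = 5, max_length: int = 7) -> Any:
    """Hypothetical wrapper: choose the corpus here, then delegate to find_acronyms."""
    from acronym.acronym import find_acronyms
    return find_acronyms(s, resolve_corpus(corpus),
                         min_length=min_length, max_length=max_length)
```

Passing a pre-loaded object keeps the current calling style working, while the name lookup gives the stable, import-friendly interface the issue asks for.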

@mshemuni

mshemuni commented Nov 17, 2022

I'd say it already is.
Just looking at the code, one can see that acronym can be used as:

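import nltk
from acronym.acronym import find_acronyms

# Look for acronyms of "Hello World" among the words of the Gutenberg corpus;
# min_length=2 overrides the default minimum acronym length of 5.
print(find_acronyms("Hello World", nltk.corpus.gutenberg, min_length=2))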

Output:


Collecting word corpus
Identifying matching acronyms
Process Complete
        long_version  score
acronym
HOWL     HellO WorLd     18
HEW      HEllo World     15
HOOD     HellO wOrlD     15
HOW      HellO World     15
HELD     HEllo worLD     13
HERD     HEllo woRlD     13
HOLD     HellO worLD     13
HOD      HellO worlD     10
HOO      HellO wOrld     10
HER      HEllo woRld      8
HOR      HellO woRld      8
HO       Hello wOrld      5
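Judging only from the output above, every listed acronym appears to be an in-order subsequence of the letters in "Hello World" that starts with the phrase's first letter, and the capitalized letters in long_version mark one such match. A minimal sketch of that check, assuming this is (part of) the matching rule; the function name is hypothetical, and it does not reproduce the tool's scoring or exactly which letters it highlights:

```python
def is_in_order_subsequence(candidate: str, phrase: str) -> bool:
    """True if the letters of `candidate` occur in `phrase` in the same order
    (case-insensitive, ignoring spaces and other non-letters) and the
    candidate starts with the phrase's first letter."""
    letters = [c for c in phrase.lower() if c.isalpha()]
    cand = candidate.lower()
    if not cand or not letters or cand[0] != letters[0]:
        return False
    it = iter(letters)
    # `ch in it` advances the iterator past the first match, so each
    # following letter must be found after the previous one.
    return all(ch in it for ch in cand)


shown = ["HOWL", "HEW", "HOOD", "HOW", "HELD", "HERD",
         "HOLD", "HOD", "HOO", "HER", "HOR", "HO"]
print(all(is_in_order_subsequence(a, "Hello World") for a in shown))
```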

See the function signature:

def find_acronyms(s, corpus, min_length=5, max_length=7):

One can change the corpus, for example to:

  1. nltk.corpus.words
  2. nltk.corpus.brown
  3. nltk.corpus.gutenberg

Do not forget to adjust min_length and max_length. In my example, the default min_length of 5 was too long and the output was an empty DataFrame.
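If you would rather not guess min_length up front, one option is to retry with a smaller minimum while the result is empty. A minimal sketch, assuming a hypothetical `find_with_fallback` helper (not part of the project) that takes the search function as a parameter; `len(result) == 0` works both for an empty DataFrame and for a plain list:

```python
from typing import Any, Callable


def find_with_fallback(find: Callable[..., Any], s: str, corpus: Any,
                       min_length: int = 5, max_length: int = 7,
                       floor: int = 2) -> Any:
    """Hypothetical helper: call `find`, lowering min_length one step at a
    time (down to `floor`) until it returns at least one row."""
    result = find(s, corpus, min_length=min_length, max_length=max_length)
    while len(result) == 0 and min_length > floor:
        min_length -= 1
        result = find(s, corpus, min_length=min_length, max_length=max_length)
    return result
```

With the real library this would be called as `find_with_fallback(find_acronyms, "Hello World", nltk.corpus.gutenberg)`; the `floor` keeps it from dropping to single-letter acronyms.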
