Remove enchant #59
base: main
Conversation
…due to limited nature of basic word list...
Initialize the Word Chains game environment

Args:
    max_turns (int): Maximum number of turns before the game ends in a draw.
"""

# Ensure NLTK words are loaded
self.word_list = list(set(word.lower() for word in words.words()))
Technically this behaviour is changed, as the dictionary is now also built from the other dicts, so it's not just word stems.
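To illustrate the point above: the new dictionary deduplicates lowercased words from several sources, not just NLTK. A minimal sketch, where `nltk_words` and `hunspell_us` are illustrative stand-ins for the real word sources, not the PR's actual attributes:

```python
# Illustrative stand-ins for words.words() and an extra hunspell dict
nltk_words = ["Apple", "apple", "Banana"]
hunspell_us = ["banana", "cherry"]

# Lowercase, merge, and deduplicate across all sources
merged = sorted(set(w.lower() for w in nltk_words + hunspell_us))
print(merged)  # ['apple', 'banana', 'cherry']
```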
@@ -56,19 +42,21 @@ def reset(self, seed: Optional[int]=None):
     random.seed(seed)

     # pick a starting word
-    starting_word = random.choice(self.word_list)
+    starting_word = random.choice(self.dictionary.get_all_words())
For the prior behaviour, set this to self.dictionary.nltk_words instead of get_all_words().
if flag in suffixes:
    # apply suffix
    for rule in suffixes[flag]:
        # continue if flag is not mergeable
This code is not very pretty. I'm fairly certain it works, but it could be cleaner.
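For context, hunspell-style suffix flags expand a stem via (strip, add, condition) rules. The sketch below is a simplified, hypothetical version of that mechanism — the rule data and the `apply_suffix` helper are invented for illustration and are not the PR's actual parser:

```python
import re

# Hypothetical hunspell-style suffix table: flag -> list of
# (strip, add, condition) rules, condition being a regex on the stem's tail.
suffix_rules = {
    "S": [("", "s", ".")],                              # simple plural
    "D": [("y", "ied", "[^aeiou]y"), ("", "ed", "[^y]")],  # past tense
}

def apply_suffix(stem, flag):
    """Expand a stem with every matching rule for the given flag."""
    out = []
    for strip, add, cond in suffix_rules.get(flag, []):
        if re.search(cond + "$", stem):
            base = stem[:-len(strip)] if strip else stem
            out.append(base + add)
    return out

print(apply_suffix("try", "D"))   # ['tried']
print(apply_suffix("walk", "D"))  # ['walked']
```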
Obviously not ideal to add 120,000 LOC, but it is just a dictionary file. Do we need to add sourcing somewhere?
no effective differences here outside of changing dictionary source
-for i, word1 in enumerate(filtered_words):
-    for word2 in filtered_words[i+1:]:
-        if self.one_letter_difference(word1, word2):
+# Add edges for words differing by one letter using a more efficient approach
more efficient but equivalent version (checked for equivalence)
Efficiency really matters here for a large word list.
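One equivalent-but-faster approach worth noting is the wildcard-bucket trick: two equal-length words differ by exactly one letter iff they share a pattern with one position masked, which replaces the O(n²) pairwise scan with one pass over the words. This sketch is illustrative and not necessarily the PR's exact implementation:

```python
from collections import defaultdict
from itertools import combinations

words = ["cat", "cot", "cog", "dog", "dot"]

# Bucket each word under every "one letter masked" pattern, e.g. "c_t"
buckets = defaultdict(set)
for w in words:
    for i in range(len(w)):
        buckets[w[:i] + "_" + w[i + 1:]].add(w)

# Words sharing a bucket differ by exactly one substitution
edges = set()
for group in buckets.values():
    edges.update(combinations(sorted(group), 2))

print(sorted(edges))
# [('cat', 'cot'), ('cog', 'cot'), ('cog', 'dog'), ('cot', 'dot'), ('dog', 'dot')]
```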
# for i, word in enumerate(self.k_len_words):
#     for other_word in self.k_len_words[i + 1:]:
#         if sum(a != b for a, b in zip(word, other_word)) == 1:  # check if the words differ by exactly one letter
#             graph.add_edge(word, other_word)
remove dead code?
for start_word in start_words:
    # Use single-source shortest paths from each start word
    lengths = nx.single_source_shortest_path_length(G, start_word)
This is more efficient than computing shortest paths between all pairs of nodes, and it should still give a decent sampling since the start words are sampled anyway.
This was really annoying me...
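For reference, nx.single_source_shortest_path_length is a single BFS from one source, O(V + E) per start word, versus all-pairs shortest paths over the whole graph. A dependency-free sketch of the same computation (the `bfs_lengths` helper and toy graph are illustrative):

```python
from collections import deque

# Toy word-ladder graph as an adjacency dict
graph = {
    "cat": ["cot"],
    "cot": ["cat", "cog"],
    "cog": ["cot", "dog"],
    "dog": ["cog"],
}

def bfs_lengths(graph, source):
    """Shortest-path length from source to every reachable node (plain BFS)."""
    lengths = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in lengths:
                lengths[v] = lengths[u] + 1
                q.append(v)
    return lengths

print(bfs_lengths(graph, "cat"))  # {'cat': 0, 'cot': 1, 'cog': 2, 'dog': 3}
```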
Changes:
- Adds dictionaries for English US and UK, downloaded from hunspell, which is used by e.g. OpenOffice (envs.utils.word_lists.py)
- Unifies the usage of word lists, depending on whether NLTK is used and whether to include pronouns
- Removes all references to enchant, plus the dependency itself
- Also reworks the WordLadder environment (mainly on the algorithm side)

Minor:
- My parser corrected some things, sorted imports, etc.
- Some of the typing in the word_lists utils file may need correcting, since it uses Python 3.11 syntax