Remove enchant #59

Open
wants to merge 18 commits into main
Conversation

@DylanASHillier (Contributor) commented Feb 18, 2025

This was really annoying me...

Changes:
- Adds dictionaries for English (US and UK), downloaded from hunspell, which is used by e.g. OpenOffice (envs.utils.word_lists.py)
- Unifies the usage of word lists depending on whether NLTK is available, and on whether pronouns should be included
- Removes all references to enchant, plus the dependency itself
- Also reworks the WordLadder environment (mainly on the algorithm side)

Minor:
- My parser corrected some things, sorted imports, etc.
- The typing in the word_lists utils file may also need correcting, since it uses Python 3.11 syntax (a rough sketch of the module's interface follows below)
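For context, here is a minimal sketch of the kind of interface the new word_lists module might expose. The attribute and method names are taken from the diff below (get_all_words, nltk_words); everything else is a guess, not the PR's actual code:

```python
# Hypothetical sketch of envs.utils.word_lists, assuming the names visible
# in this diff; constructor parameters and loading logic are illustrative.
class Dictionary:
    def __init__(self, use_nltk: bool = True, include_pronouns: bool = False) -> None:
        self.nltk_words: list[str] = []      # stems from the NLTK corpus
        self.hunspell_words: list[str] = []  # expanded en_US / en_GB entries

    def get_all_words(self) -> list[str]:
        # De-duplicated union of all loaded sources, as sampled by the environments.
        return sorted(set(self.nltk_words) | set(self.hunspell_words))
```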

        Initialize the Word Chains game environment.

        Args:
            max_turns (int): Maximum number of turns before the game ends in a draw.
        """

        # Ensure NLTK words are loaded
        self.word_list = list(set(word.lower() for word in words.words()))
Technically this behaviour changes: the dictionary is now also built from the other dicts, so it contains more than just word stems.
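Roughly, the combined set would be assembled like this (a sketch under assumed names; the loading order and normalisation are guesses):

```python
from nltk.corpus import words

# Hypothetical helper: union the NLTK stems with the expanded hunspell
# word lists added in this PR. `hunspell_word_lists` is an assumed name.
def build_word_set(hunspell_word_lists: list[list[str]]) -> set[str]:
    combined = {word.lower() for word in words.words()}  # NLTK stems
    for word_list in hunspell_word_lists:
        combined.update(word.lower() for word in word_list)
    return combined
```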

@@ -56,19 +42,21 @@ def reset(self, seed: Optional[int]=None):
         random.seed(seed)

         # pick a starting word
-        starting_word = random.choice(self.word_list)
+        starting_word = random.choice(self.dictionary.get_all_words())
For the prior behaviour, use self.dictionary.nltk_words here instead of get_all_words().
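In other words, restoring the old sampling is a one-line change (nltk_words being the attribute referred to above):

```python
# Pre-PR behaviour: sample only from the NLTK-derived stems.
starting_word = random.choice(self.dictionary.nltk_words)
```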

if flag in suffixes:
    # apply suffix
    for rule in suffixes[flag]:
        # continue if flag is not mergeable
This code is not very pretty. Fairly certain it works, though.
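For anyone reviewing the affix logic: hunspell SFX rules are (strip, affix, condition) triples, applied roughly as below. This is a simplified sketch of the format's semantics, not the PR's actual parser:

```python
import re

# Simplified hunspell-style suffix expansion. Each rule strips `strip`
# from the end of the word (the literal "0" means strip nothing) and
# appends `affix`, but only if the word's ending matches `condition`.
def apply_suffix_rules(word: str, rules: list[tuple[str, str, str]]) -> list[str]:
    expanded = []
    for strip, affix, condition in rules:
        if re.search(condition + "$", word):
            stem = word[:-len(strip)] if strip != "0" else word
            expanded.append(stem + affix)
    return expanded

# e.g. the rule ("y", "ies", "[^aeiou]y") turns "carry" into "carries"
apply_suffix_rules("carry", [("y", "ies", "[^aeiou]y")])  # -> ["carries"]
```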

Obviously not ideal to add 120,000 lines, but it is just a dictionary file. Do we need to add sourcing/attribution somewhere?

No effective difference here beyond changing the dictionary source.

-        for i, word1 in enumerate(filtered_words):
-            for word2 in filtered_words[i+1:]:
-                if self.one_letter_difference(word1, word2):
+        # Add edges for words differing by one letter using a more efficient approach
This is a more efficient but equivalent version (checked for equivalence).

Efficiency really matters here for a large word list.
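The diff doesn't show the full replacement, but the standard way to get this below O(n^2) pairwise comparisons is wildcard bucketing; whether the PR uses exactly this trick is an assumption:

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx

# Bucket words by each single-wildcard pattern ("c_t" collects cat, cot,
# cut, ...); every pair within a bucket differs at exactly that position.
# Bucketing is roughly O(n * k) for n words of length k, versus O(n^2 * k)
# for all-pairs comparison.
def build_ladder_graph(words_of_len_k: list[str]) -> nx.Graph:
    buckets: dict[str, list[str]] = defaultdict(list)
    for word in words_of_len_k:
        for i in range(len(word)):
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    graph = nx.Graph()
    graph.add_nodes_from(words_of_len_k)
    for bucket in buckets.values():
        for w1, w2 in combinations(bucket, 2):
            graph.add_edge(w1, w2)
    return graph
```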

# for i, word in enumerate(self.k_len_words):
#     for other_word in self.k_len_words[i + 1 :]:
#         if sum(a != b for a, b in zip(word, other_word)) == 1:  # check if the words differ by exactly one letter
#             graph.add_edge(word, other_word)
Remove this dead code?


for start_word in start_words:
    # Use single-source shortest paths from each start word
    lengths = nx.single_source_shortest_path_length(G, start_word)
This is more efficient than computing shortest paths between all pairs of nodes, and it should still give a decent sampling, since the start words are sampled anyway.
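Concretely, the sampling idea looks something like this (target_len and the pair selection are illustrative assumptions; only the networkx call is from the diff):

```python
import networkx as nx

# BFS from each sampled start word only, instead of all-pairs shortest
# paths. `cutoff` bounds the search depth to the ladder length we want.
def sample_ladder_pairs(G: nx.Graph, start_words: list[str], target_len: int):
    pairs = []
    for start_word in start_words:
        lengths = nx.single_source_shortest_path_length(G, start_word, cutoff=target_len)
        candidates = [w for w, d in lengths.items() if d == target_len]
        if candidates:
            pairs.append((start_word, candidates[0]))
    return pairs
```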
