Group Anagrams together #65
Another really nice tokenizer challenge! ;-) Could you please remove words_alpha.txt from the PR (keeping only the final anagrams.jsonl end product)? The notebook can be kept (ideally, add the source of words_alpha.txt).
Nice, thanks!
What does this PR do?
This PR implements the reasoning task of grouping anagrams together from a list of words. An anagram is a word formed by rearranging the letters of a different word, using all the original letters exactly once.
For the purposes of the task, before implementing the environment, I first downloaded a list of all English words that contain only alphabetic characters (no digits or special characters). Then, I created a Jupyter Notebook that processes these words and groups them into anagrams, and also provides character-level counts for each group. The output is stored in reasoning_gym/data/anagrams.jsonl for potential future use. Currently the notebook is also committed in this PR, but I can delete it upon request.
From there, the code for the task is pretty straightforward: go over each word in the provided input list, count the occurrences of each character, and add the word to a hashmap keyed by its character-level counts.
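As a rough illustration of that approach, here is a minimal sketch. It is not the code from this PR: the function names char_count_key and group_anagrams are hypothetical, and it assumes every word consists only of the letters a-z (case-insensitive), matching the alphabetic-only word list described above.

```python
from collections import defaultdict


def char_count_key(word: str) -> tuple:
    """Return a 26-slot tuple of letter counts (assumes a-z letters only)."""
    counts = [0] * 26
    for ch in word.lower():
        counts[ord(ch) - ord("a")] += 1
    # Tuples are hashable, so the counts can be used as a dictionary key.
    return tuple(counts)


def group_anagrams(words: list) -> list:
    """Group words that share the same letter counts (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        groups[char_count_key(word)].append(word)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    return list(groups.values())


if __name__ == "__main__":
    print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
```

Using the count tuple as the key costs O(k) per word of length k, whereas sorting the letters to build a key would cost O(k log k); either choice puts all anagrams of a word in the same bucket.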
Example