Group Anagrams together #65

zafstojano · 2025-02-05T23:20:58Z

What does this PR do?

This PR implements the reasoning task of grouping anagrams together from a list of words. An anagram is a word formed by rearranging the letters of a different word, using all the original letters exactly once.

For this task, before implementing the environment, I first downloaded a list of all English words that contain only alphabetic characters (no digits or special characters). Then I created a Jupyter Notebook that processes these words and groups them into anagrams, and also provides character-level counts for each group. The output is stored in reasoning_gym/data/anagrams.jsonl for potential future use. The notebook is currently also committed in this PR, but I can delete it on request.

From there, the code for the task is pretty straightforward: go over each word in the provided input list, count the occurrences of each character, and add the word to a hashmap keyed on its character-level counts.
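The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `group_anagrams` and the 26-slot count key are assumptions, and it expects lowercase a-z words only.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that share the same per-letter counts (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        # Build a 26-slot histogram of the letters; assumes lowercase a-z input.
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord("a")] += 1
        # Lists are not hashable, so the histogram is frozen into a tuple key.
        groups[tuple(counts)].append(word)
    # Sort inside each group and across groups so the result is deterministic.
    return sorted(sorted(group) for group in groups.values())


print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
```

Keying on letter counts takes time linear in the word length, whereas keying on the sorted letters would cost a sort per word. Both approaches produce the same groups.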

Example

import reasoning_gym

# Defaults:
# anagram_groups: int = 10  # Groups of anagrams present in the input
# max_words_per_group: int = 5  # Maximum number of words in a single anagram group

data = reasoning_gym.create_dataset('group_anagrams', size=3, seed=42)

for i, item in enumerate(data):
    print(f"Example {i+1}:")
    print(f"Question: {item['question']}")
    print(f"Answer: {item['answer']}\n")
    print(f"Metadata: {item['metadata']}\n")
    print("--------------------------------------------------\n")

Example 1:
Question: An anagram is a word formed by rearranging the letters of a different word, using all the original letters exactly once.

Your job is to group the anagrams together. You can return the answer in any order.

Example:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]
Explanation:
    - There is no string in the input that can be rearranged to form "bat".
    - The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.

Group the following list of words into anagrams:
["tinglers", "argonon", "ditas", "palinodist", "merocyte", "conterminal", "canny", "nancy", "outasight", "autosight", "oversauciness", "applauders", "suprapedal"]

Answer: [["applauders", "suprapedal"], ["argonon"], ["autosight", "outasight"], ["canny", "nancy"], ["conterminal"], ["ditas"], ["merocyte"], ["oversauciness"], ["palinodist"], ["tinglers"]]

Metadata: {'words': ['tinglers', 'argonon', 'ditas', 'palinodist', 'merocyte', 'conterminal', 'canny', 'nancy', 'outasight', 'autosight', 'oversauciness', 'applauders', 'suprapedal'], 'solution': [['applauders', 'suprapedal'], ['argonon'], ['autosight', 'outasight'], ['canny', 'nancy'], ['conterminal'], ['ditas'], ['merocyte'], ['oversauciness'], ['palinodist'], ['tinglers']]}

--------------------------------------------------

Example 2:
Question: An anagram is a word formed by rearranging the letters of a different word, using all the original letters exactly once.

Your job is to group the anagrams together. You can return the answer in any order.

Example:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]
Explanation:
    - There is no string in the input that can be rearranged to form "bat".
    - The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.

Group the following list of words into anagrams:
["regear", "escrod", "coders", "decors", "credos", "scored", "semitaur", "muriates", "peripterous", "zanies", "expatiater", "wooled", "meningomyelocele", "myelomeningocele", "vainest", "natives", "naivest", "preludes", "repulsed"]

Answer: [["coders", "credos", "decors", "escrod", "scored"], ["expatiater"], ["meningomyelocele", "myelomeningocele"], ["muriates", "semitaur"], ["naivest", "natives", "vainest"], ["peripterous"], ["preludes", "repulsed"], ["regear"], ["wooled"], ["zanies"]]

Metadata: {'words': ['regear', 'escrod', 'coders', 'decors', 'credos', 'scored', 'semitaur', 'muriates', 'peripterous', 'zanies', 'expatiater', 'wooled', 'meningomyelocele', 'myelomeningocele', 'vainest', 'natives', 'naivest', 'preludes', 'repulsed'], 'solution': [['coders', 'credos', 'decors', 'escrod', 'scored'], ['expatiater'], ['meningomyelocele', 'myelomeningocele'], ['muriates', 'semitaur'], ['naivest', 'natives', 'vainest'], ['peripterous'], ['preludes', 'repulsed'], ['regear'], ['wooled'], ['zanies']]}

--------------------------------------------------

Example 3:
Question: An anagram is a word formed by rearranging the letters of a different word, using all the original letters exactly once.

Your job is to group the anagrams together. You can return the answer in any order.

Example:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]
Explanation:
    - There is no string in the input that can be rearranged to form "bat".
    - The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.

Group the following list of words into anagrams:
["eagerest", "granitite", "helium", "nizam", "nazim", "striplings", "slipstring", "rearrest", "arrester", "bf", "tadpolism", "canun", "cunan", "isotonic"]

Answer: [["arrester", "rearrest"], ["bf"], ["canun", "cunan"], ["eagerest"], ["granitite"], ["helium"], ["isotonic"], ["nazim", "nizam"], ["slipstring", "striplings"], ["tadpolism"]]

Metadata: {'words': ['eagerest', 'granitite', 'helium', 'nizam', 'nazim', 'striplings', 'slipstring', 'rearrest', 'arrester', 'bf', 'tadpolism', 'canun', 'cunan', 'isotonic'], 'solution': [['arrester', 'rearrest'], ['bf'], ['canun', 'cunan'], ['eagerest'], ['granitite'], ['helium'], ['isotonic'], ['nazim', 'nizam'], ['slipstring', 'striplings'], ['tadpolism']]}

--------------------------------------------------
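Because the prompt allows the answer in any order, comparing a model's answer with the reference solution has to ignore both the order of groups and the order of words within a group. The sketch below shows one way this could be done, assuming the answer arrives as a JSON string. `canonical` and `score_answer` are hypothetical names, not the scoring function from this PR.

```python
import json


def canonical(groups):
    """Sort words within each group, then sort the groups themselves."""
    return sorted(sorted(group) for group in groups)


def score_answer(answer, solution):
    """Return 1.0 for an order-insensitive match, else 0.0 (hypothetical scorer)."""
    try:
        parsed = json.loads(answer)
    except (TypeError, json.JSONDecodeError):
        # Missing or malformed JSON counts as a wrong answer.
        return 0.0
    # Reject anything that is not a list of lists of strings.
    if not isinstance(parsed, list) or not all(
        isinstance(group, list) and all(isinstance(word, str) for word in group)
        for group in parsed
    ):
        return 0.0
    return 1.0 if canonical(parsed) == canonical(solution) else 0.0


solution = [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]
print(score_answer('[["tea", "eat", "ate"], ["bat"], ["tan", "nat"]]', solution))
print(score_answer('[["bat"', solution))
```

The first call reorders both the groups and the words and still matches. The second passes truncated JSON and scores zero.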

zafstojano · 2025-02-05T23:22:24Z

"quick fix" 😆

andreaskoepf

Another really nice tokenizer challenge! ;-) Could you please remove words_alpha.txt from the PR, keeping only the anagrams.jsonl end product? The notebook can be kept (ideally, add the source of words_alpha.txt).

reasoning_gym/algorithmic/group_anagrams.py

notebooks/generate_anagrams.ipynb

reasoning_gym/algorithmic/group_anagrams.py

andreaskoepf

Nice, thanks!

zafstojano added 3 commits February 5, 2025 16:25

generate all english anagrams

76a3d47

group anagrams env

b4f4129

docs

7827025

andreaskoepf requested changes Feb 6, 2025

zafstojano added 4 commits February 6, 2025 10:12

delete words_alpha.txt

5cc45d6

add source for words_alpha.txt

6ec6876

use get_data_file_path to read file contents

256eb71

test malformed json answer

8244657

andreaskoepf approved these changes Feb 6, 2025

andreaskoepf merged commit d2bef8d into open-thought:main Feb 6, 2025
3 checks passed

zafstojano deleted the env/group-anagrams branch February 6, 2025 12:08
