v1.0.0: CLI, datasets, & README (#5)
## Overview
- add command line interface based on the `cxxopts` [library](https://github.com/jarro2783/cxxopts), and `src` build flow
- add three new word dictionary datasets
- add proper `README.md`
- add `GNU AGPL v3` license
- general code cleanup for release

## Details
- the CLI, after parsing options, instantiates a proper `cw_csp` and attempts to solve it
- add `cw_gen` class to handle CLI actions and own the `cw_csp` to be solved
- add preset example crossword generation parameters to CLI
- add progress bar option
- datasets added were the top 3000 English words, top 10000 English words, and [Crossfire's](https://beekeeperlabs.com/crossfire/) default dictionary
- gifs in `README.md` created using [terminalizer](https://github.com/faressoft/terminalizer)

## Notes
- this is the last PR before the project is paused until the summer 2024 SWE internship recruiting season ends or an offer is accepted
- official `v1.0.0` release to follow soon
- the `Future Work` section in `README.md` serves as the feature map for future versions
- `word_finder` (finally) updated to follow RAII as referenced in #4
ashleyzhang216 authored Sep 14, 2023
1 parent 178a9d7 commit 028db42
Showing 30 changed files with 203,892 additions and 45 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -31,6 +31,10 @@
*.out
*.app

#macOS
.DS_Store

# build products
*.o
test/build
src/build
661 changes: 661 additions & 0 deletions LICENSE

93 changes: 92 additions & 1 deletion README.md
@@ -1,3 +1,94 @@
# crossword-gen

This project is in progress: to run current testing suite, navigate to `crossword-gen/test` and run `sh test.sh`
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0) ![Version: 1.0](https://img.shields.io/badge/Version-1.0-blue)

`cw_gen` lets you generate crossword puzzles by specifying a word list and a grid to fill in.

![Animated terminal gif to illustrate generation of example crosswords](assets/example.gif)

## 🚀 Installation

![Animated terminal gif to illustrate building of project](assets/build.gif)

The unix binary is already included as `cw_gen`. To re-build, run the build script:

```bash
sh build.sh
```

## ⚙️ Usage

![Animated terminal gif to illustrate generation of crosswords with user parameters](assets/demo.gif)

```bash
./cw_gen [OPTIONS]
```

### 🧩 `size`

| Option | Description | Type | Default |
|----------------|--------------------------------------------------------------------------------------------------------------|---------|---------|
| `-s` | Dimensions of the crossword to be generated, in the format `columns,rows`. Size must be greater than 1x1. Large sizes may take several minutes or more to solve, especially if a large dictionary is used, or if difficult or no grid contents are specified. | int,int | `4,4` |

### 📝 `grid contents`

| Option | Description | Type | Default |
|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|---------|
| `-c` | Crossword grid to fill in, from top left to bottom right by row, and **contained within quotes**. Each char represents one tile and must be a set letter (`a-z`), wildcard (`?`), or black tile (single space). Total length must equal `rows * columns`. If left unspecified, all tiles are set to wildcards. | string | none |
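
As a rough illustration of the layout described above, the sketch below indexes a row-major contents string and checks it against the stated rules. The helper names `tile_at` and `contents_valid` are hypothetical and are not part of `cw_gen`; the wildcard and black tile characters are taken from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of cw_gen): returns the tile at (row, col)
// of a grid whose contents are stored row by row, top left to bottom right.
char tile_at(const std::string& contents, std::size_t columns,
             std::size_t row, std::size_t col) {
    return contents[row * columns + col];
}

// Hypothetical check mirroring the rules in the table: the length must equal
// rows * columns, and each char must be a-z, '?' (wildcard), or ' ' (black).
bool contents_valid(const std::string& contents,
                    std::size_t columns, std::size_t rows) {
    if (contents.size() != rows * columns) return false;
    for (char c : contents) {
        const bool is_letter = (c >= 'a' && c <= 'z');
        if (!is_letter && c != '?' && c != ' ') return false;
    }
    return true;
}
```

For a 3x3 grid passed as `"ab?? cd??"`, the center tile (row 1, column 1) is the single space, so it is a black tile.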

### 📚 `word dictionary`

| Option | Description | Type | Default |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------------|
| `-d` | Word list to use. Possible options from smallest to largest are `top1000`, `top3000`, `top10000`, `crossfire`, and `all`. Larger dictionaries are more effective for finding solutions for larger puzzles at the cost of likely increased runtime and possibly more unfamiliar words chosen. | string | `top10000` |

### 📋 `examples`

| Option | Description | Type | Default |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|---------|
| `-e` | Preset example crossword generation params, **overriding size, contents, and dictionary if specified**. Possible options are `empty`, `cross`, `bridge`, `stairs`, `donut`, `crosshair1`, and `crosshair2`. | string | none |

### 🗣️ `verbosity`

| Option | Description | Type | Default |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------|--------|---------|
| `-v` | Minimum verbosity for print statements. Possible options in decreasing level are `fatal`, `error`, `warning`, `info`, and `debug`. | string | `fatal` |

### 🔄 `progress bar`

| Option | Description | Type | Default |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|------|---------|
| `-p` | Enables progress bar; useful when generating larger crossword puzzles with longer runtimes. Disabled unless specified. | none | none |

### ℹ️ `help`

| Option | Description | Type | Default |
|------------|--------------------------------------------------------------------|------|---------|
| `-h` | Print a summary of all options with their usage and descriptions. | none | none |

Since generation is done purely via heuristics without randomness, re-generating a crossword with the same parameters is likely to return the same output, especially if not many valid solutions for those parameters exist. Also note that, if used, the grid contents string length must equal `rows * columns` for the specified puzzle size, or 16 for the default 4x4 puzzle size if none is specified.

## ✅ Tests

![Animated terminal gif to illustrate testing the project](assets/test.gif)

This project uses the [Catch2 v2.x](https://github.com/catchorg/Catch2/tree/v2.x) testing framework to test each module. To run the test suite, navigate to the `test/` directory and run the test script:

```bash
cd test
sh test.sh
```

## ⚙️ How it works

After parsing the crossword parameters, `cw_gen` constructs a [constraint satisfaction problem (CSP)](https://en.wikipedia.org/wiki/Constraint_satisfaction_problem) that represents the crossword. Each CSP variable represents one word slot, and its domain is initially set to all the words from the user-specified dictionary that fit in that slot. A CSP constraint exists wherever two variables intersect in the crossword: the letters of the two words at the shared tile must be equal.
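
To make the domain setup concrete, here is a minimal sketch of how an initial domain could be computed for one word slot. It is an illustration only, not the project's implementation: `fits_pattern` and `initial_domain` are hypothetical names, and the slot is assumed to be given as a pattern string using `?` for wildcards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if `word` fits `pattern`: same length, and every non-wildcard
// tile in the pattern equals the corresponding letter of the word.
bool fits_pattern(const std::string& word, const std::string& pattern) {
    if (word.size() != pattern.size()) return false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (pattern[i] != '?' && pattern[i] != word[i]) return false;
    }
    return true;
}

// Initial domain of one word variable: every dictionary word that fits
// the slot, in dictionary order.
std::vector<std::string> initial_domain(const std::vector<std::string>& dictionary,
                                        const std::string& pattern) {
    std::vector<std::string> domain;
    for (const std::string& word : dictionary) {
        if (fits_pattern(word, pattern)) domain.push_back(word);
    }
    return domain;
}
```

With the dictionary `{"cat", "cot", "dog", "cart"}` and the slot pattern `c?t`, the initial domain would be `cat` and `cot`.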

Using a combination of the [AC-3 algorithm](https://en.wikipedia.org/wiki/AC-3_algorithm) and [backtracking](https://en.wikipedia.org/wiki/Backtracking), the CSP iteratively assigns values to variables until a valid solution is found or the search space is exhausted.
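
The core of AC-3 is a "revise" step that prunes one variable's domain against a neighboring variable. The sketch below shows that step for a single crossing between two words. It is a generic illustration under assumed data structures (`crossing` and `revise` are hypothetical names), not the code used by `cw_csp`, and it assumes every word is long enough for the given positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Constraint between two crossing words: letter `pos_a` of word A must
// equal letter `pos_b` of word B.
struct crossing {
    std::size_t pos_a;
    std::size_t pos_b;
};

// AC-3 revise step: remove from domain_a every word that has no compatible
// word in domain_b. Returns true if domain_a changed.
bool revise(std::vector<std::string>& domain_a,
            const std::vector<std::string>& domain_b,
            const crossing& c) {
    const auto unsupported = [&](const std::string& a) {
        return std::none_of(domain_b.begin(), domain_b.end(),
                            [&](const std::string& b) {
                                return a[c.pos_a] == b[c.pos_b];
                            });
    };
    const auto new_end = std::remove_if(domain_a.begin(), domain_a.end(), unsupported);
    const bool changed = (new_end != domain_a.end());
    domain_a.erase(new_end, domain_a.end());
    return changed;
}
```

In a full AC-3 loop, whenever `revise` returns true, the other constraints involving the pruned variable are queued for re-checking. Backtracking then assigns a word to a variable, re-runs the pruning, and undoes the assignment if any domain becomes empty.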

## 💡 Future work

- Incorporate word frequency or other scoring heuristics into word selection to minimize unfamiliar words used and speed up generation
- Explore alternatives to arc consistency such as path consistency to improve CSP simplification performance
- Investigate different CSP solving options in addition to backtracking like the VLNS method, local search, and/or linear programming
- Upgrade CSPs to be [dynamic CSPs](https://en.wikipedia.org/wiki/Constraint_satisfaction_problem#Dynamic_CSPs) to allow insertion of black tiles to complete crosswords. This will minimize user effort required to avoid impossible crossword grid patterns
- Clue generation/lookup
Binary file added assets/build.gif
Binary file added assets/demo.gif
Binary file added assets/example.gif
Binary file added assets/test.gif
24 changes: 24 additions & 0 deletions build.sh
@@ -0,0 +1,24 @@
# ==================================================================
# Author: Ashley Zhang ([email protected])
# Date: 9/6/2023
# Description: build script to build tool w/ Makefile
# ==================================================================

SRCDIR="src"
BUILDDIR="build"
MAINFILE="cw_gen"
LOGFILE="buildlog"

cd "${SRCDIR}" || exit 1

# soft clean
rm -rf "${BUILDDIR}"
mkdir "${BUILDDIR}"

# build; stop if compilation fails so no stale artifacts are moved
make > "${BUILDDIR}/${LOGFILE}.txt" || exit 1

# clean up
mv *.o "${BUILDDIR}"
cd ..
mv "${SRCDIR}/${MAINFILE}" "${MAINFILE}"
Binary file added cw_gen
Binary file not shown.
78 changes: 78 additions & 0 deletions src/Makefile
@@ -0,0 +1,78 @@
# ==================================================================
# Author: Ashley Zhang ([email protected])
# Date: 7/4/2023
# Description: Makefile to build tool for running
# ==================================================================

# main directories
SRC_DIR=./

# src dirs
UTIL_DIR=utils
COMMON_DIR=common
WORDFINDER_DIR=word_finder
CROSSWORD_DIR=crossword
CWCSP_DIR=cw_csp
CWGEN_DIR=cli

# g++ settings
D = -D
DEFINES = $(D) CW_TOP
CPPFLAGS += -pipe
CXXFLAGS += -g -W -Werror -Wall -Wconversion -Wextra -std=c++17 $(DEFINES)
INC=-I$(SRC_DIR)

BIN = cw_gen

all : $(BIN)

clean :
	rm -f $(BIN) *.o

# path of where object files will be created
OFILES = \
common_data_types.o \
cw_utils.o \
common_parent.o \
\
word_finder.o \
word_finder_data_types.o \
\
crossword.o \
crossword_data_types.o \
\
cw_csp.o \
cw_csp_data_types.o \
\
cw_gen.o \

# full path of all header files to build
HFILES = \
$(SRC_DIR)/$(UTIL_DIR)/cw_utils.h \
$(SRC_DIR)/$(COMMON_DIR)/common_data_types.h \
$(SRC_DIR)/$(COMMON_DIR)/common_parent.h \
\
$(SRC_DIR)/$(WORDFINDER_DIR)/word_finder.h \
$(SRC_DIR)/$(WORDFINDER_DIR)/word_finder_data_types.h \
\
$(SRC_DIR)/$(CROSSWORD_DIR)/crossword.h \
$(SRC_DIR)/$(CROSSWORD_DIR)/crossword_data_types.h \
\
$(SRC_DIR)/$(CWCSP_DIR)/cw_csp.h \
$(SRC_DIR)/$(CWCSP_DIR)/cw_csp_data_types.h \
\
$(SRC_DIR)/$(CWGEN_DIR)/cw_gen.h \
$(SRC_DIR)/$(CWGEN_DIR)/cw_gen_data_types.h \
$(SRC_DIR)/$(CWGEN_DIR)/cxxopts.hpp \

$(BIN) : $(OFILES)
	$(CXX) $(INC) *.o -o $(BIN)

cw_gen.o : $(SRC_DIR)/$(CWGEN_DIR)/cw_gen.cpp
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(INC) -c $<

#################### SRC ####################

# rule to compile all cpp files
%.o : $(SRC_DIR)/*/%.cpp $(HFILES)
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(INC) -c $<
173 changes: 173 additions & 0 deletions src/cli/cw_gen.cpp
@@ -0,0 +1,173 @@
// ==================================================================
// Author: Ashley Zhang ([email protected])
// Date: 9/6/2023
// Description: class implementation & main() for main crossword generation tool
// ==================================================================

#include "cw_gen.h"

using namespace cw_gen_ns;

/**
 * @brief constructor for a crossword generation tool with all default values
 */
cw_gen::cw_gen(string name) : common_parent(name) {
    // do nothing
}

/**
 * @brief shorthand function to squash param options into a readable str
 *
 * @param options vector of options to squash
 * @returns str representation of options
 */
string cw_gen::squash_options(vector<string> options) {
    stringstream opts_ss;

    // comparisons use i + k rather than size() - k to avoid unsigned underflow
    for(uint i = 0; i < options.size(); i++) {
        opts_ss << "\"" << options[i] << "\"";
        if(i + 1 != options.size()) opts_ss << ", ";  // separator after all but the last
        if(i + 2 == options.size()) opts_ss << "or "; // "or" before the last option
    }

    return opts_ss.str();
}

/**
 * @brief build cw_csp after all params set
 */
void cw_gen::build() {
    if(has_grid_contents) {
        csp = make_unique<cw_csp>("puzzle", length, height, contents, dict_path[dict]);
    } else {
        csp = make_unique<cw_csp>("puzzle", length, height, dict_path[dict]);
    }
}

/**
 * @brief main function using command line interface w/ cxxopts to generate crosswords
 */
int main(int argc, char** argv) {

    unique_ptr<cw_gen> cwgen = make_unique<cw_gen>("cwgen");
    cxxopts::Options options("cw_gen", "Crossword puzzle generation tool given word list and blank grid to fill in");

    // key: param name, value: set of allowed param values
    unordered_map<string, vector<string> > param_vals = {
        {"dict", {"top1000", "top3000", "top10000", "crossfire", "all"}},
        {"example", {"empty", "cross", "bridge", "stairs", "donut", "crosshair1", "crosshair2"}},
        {"verbosity", {"fatal", "error", "warning", "info", "debug"}}
    };

    // because these are very long
    stringstream contents_desc, examples_desc;
    contents_desc << "Puzzle grid contents from top left to bottom right by row, all lowercase, in quotes. wildcard is: \""
                  << WILDCARD << "\", black is: \"" << BLACK << "\"";
    examples_desc << "Example crossword puzzles (overrides size/contents/dict): " << cw_gen::squash_options(param_vals["example"]);

    options.add_options()
        ("s,size", "Puzzle size, in format length,height", cxxopts::value<vector<int>>()->default_value("4,4"))
        ("c,contents", contents_desc.str(), cxxopts::value<string>())
        ("d,dict", "Word dictionary: " + cw_gen::squash_options(param_vals["dict"]), cxxopts::value<string>()->default_value("top10000"))
        ("e,example", examples_desc.str(), cxxopts::value<string>())
        ("v,verbosity", "Debug verbosity: " + cw_gen::squash_options(param_vals["verbosity"]), cxxopts::value<string>()->default_value("fatal"))
        ("p,progress", "Enable progress bar (default: false)", cxxopts::value<bool>())
        ("h,help", "Print usage")
    ;

    auto result = options.parse(argc, argv);

    // ############### help ###############

    // print help msg if specified
    if(result.count("help")) {
        cout << options.help() << endl;
        exit(0);
    }

    // ############### example ###############

    if(result.count("example")) {
        string example = result["example"].as<string>();
        if(std::find(param_vals["example"].begin(), param_vals["example"].end(), example) == param_vals["example"].end()) {
            cout << "Error: got invalid example option " << example << ", allowed: " << cw_gen::squash_options(param_vals["example"]) << endl;
            exit(1);
        }

        cwgen->set_dimensions(examples_map[example].length, examples_map[example].height);
        cwgen->set_dict(examples_map[example].dict);
        cwgen->set_contents(examples_map[example].contents);
    } else {
        // non-example, use user-specified size/dict/contents

        // ############### size ###############

        vector<int> dimensions = result["size"].as<vector<int>>();
        if(dimensions.size() != 2) {
            cout << "Error: got unknown number of size dimensions (" << dimensions.size() << "), should be 2" << endl;
            exit(1);
        }
        if(dimensions[0] <= 0 || dimensions[1] <= 0) {
            cout << "Error: crossword dimensions must be positive and non-zero" << endl;
            exit(1);
        } else if(dimensions[0] == 1 && dimensions[1] == 1) {
            cout << "Error: 1x1 crossword contains no valid words, word min len = " << MIN_WORD_LEN << endl;
            exit(1);
        }
        cwgen->set_dimensions((uint)dimensions[0], (uint)dimensions[1]);

        // ############### dictionary ###############

        string dict = result["dict"].as<string>();
        if(std::find(param_vals["dict"].begin(), param_vals["dict"].end(), dict) == param_vals["dict"].end()) {
            cout << "Error: got invalid dictionary option " << dict << ", allowed: " << cw_gen::squash_options(param_vals["dict"]) << endl;
            exit(1);
        }
        cwgen->set_dict(dict);

        // ############### contents ###############

        if(result.count("contents")) {
            string contents = result["contents"].as<string>();
            if(contents.size() != cwgen->num_tiles()) {
                cout << "Error: got illegal contents length (" << contents.size() << ") given dimensions, should be: " << cwgen->num_tiles() << endl;
                exit(1);
            }
            for(char c : contents) {
                if(c != BLACK && c != WILDCARD && (c < 'a' || c > 'z')) {
                    cout << "Error: got illegal char in contents: " << c << ", all chars should be a lowercase letter, a wildcard (\"" << WILDCARD << "\"), or a black tile (\"" << BLACK << "\")" << endl;
                    exit(1);
                }
            }
            cwgen->set_contents(contents);
        }
    }

    // ############### verbosity ###############

    string verbosity = result["verbosity"].as<string>();
    if(std::find(param_vals["verbosity"].begin(), param_vals["verbosity"].end(), verbosity) == param_vals["verbosity"].end()) {
        cout << "Error: got invalid verbosity option " << verbosity << ", allowed " << cw_gen::squash_options(param_vals["verbosity"]) << endl;
        exit(1);
    }
    VERBOSITY = verbosity_map[verbosity];

    // ############### progress bar ###############

    if(result.count("progress")) {
        cwgen->enable_progress_bar();
    }

    // ############### cw_csp solving ###############

    cwgen->build();
    if(cwgen->solve()) {
        cout << "found crossword: " << cwgen->result() << endl;
    } else {
        cout << "Error: no valid crossword generated for the given parameters. "
             << "Try an example, or use different dimensions, grid contents, or dictionary" << endl;
        exit(1);
    }

    return 0;
}