pygramm : Grammar processing in Python

Experiments in processing BNF with Python. Work in progress.

Why?

While Ply provides a semi-yaccalike, it has some characteristics that bother me. First, it prioritizes lexical processing by the length of the pattern, not the length of the token ... in violation of the "maximum munch" rule. Second, it tries to do everything at run time.

Also I want to experiment with generation of sentences as well as parsing, and with LL as well as LR parsing.

Work in progress

Done

Parse BNF (llparse.py) and create an internal form. The BNF form is extended with Kleene *, but a grammar in pure BNF without Kleene is also fine.
Internal structure (grammar.py) represents the BNF structure directly. The Grammar object contains a list of symbols, each of which has a single expansion (which could be a sequence or a choice). The following two grammars will produce precisely the same internal form:
```
S ::= "a";
S ::= "b";
```
and
```
S ::= "a" | "b";
```
A phrase generator (generator.py), together with some grammar analysis in grammar.py, can produce sentences within a given length limit (the budget) with or without direction. See choicebot.py for an example of how grammar choices can be controlled.

To Do

Distinguish lexical from CFG productions even for sentence generation because we will want different tactics for tokens than for RHS. In CFG we budget for length of sentence. In lexical productions we should choose between new and previously used tokens. Currently the BNF goes all the way to string constants, always. The works for the kinds of grammars that Glade learns, but it is not really ideal for generating useful program inputs.
Related to the prior point: Infer a good boundary between CFG and lexical structure. In conventional grammar processing, a developer makes this distinction. For grammar learners like Glade, though, the distinction is not trivial to recognize.
Add classic grammar analyses, starting with analyses for LL(1) grammars (first, follow), then checking for conflicts, and likewise for LALR(1) and/or LR(1).
Add simple transformations, such as left-factoring for LL(1).

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
data		data
doc		doc
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
biased_choice.py		biased_choice.py
binary_choice.py		binary_choice.py
char_classes.py		char_classes.py
choicebot.py		choicebot.py
config.py		config.py
demo_gen.py		demo_gen.py
generator.py		generator.py
glade_fix.py		glade_fix.py
grammar.py		grammar.py
grammar.sty		grammar.sty
grammar_bias.py		grammar_bias.py
grammar_prune.py		grammar_prune.py
latex-demo.tex		latex-demo.tex
latexify.py		latexify.py
lex.py		lex.py
llparse.py		llparse.py
unit_productions.py		unit_productions.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pygramm : Grammar processing in Python

Why?

Work in progress

Done

To Do

About

Releases

Packages

Languages

uo-se-research/pygramm

Folders and files

Latest commit

History

Repository files navigation

pygramm : Grammar processing in Python

Why?

Work in progress

Done

To Do

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages