DictParser

DictParser is a tiny library for decoding rudimentary dictionary-like objects from a stream of bytes. The library contains two implementations, one in Python and one in C++. This document describes the format of an encoded dictionary object.

Acknowledgements

A big thanks to my former employer Nordic River, for letting me share this code.

The Encoding

We define dictionary (or dict for short) to mean a set of properties where each property has a name and a value. The set of properties may or may not be ordered and property names may or may not repeat; it's up to the user to define.

An encoded dictionary is an ordered sequence of encoded properties enclosed in curly braces: {, }. The empty dictionary is the string '{}'. The significance of the ordering of properties is user defined.

An encoded property is either simple or binary.

A simple property has a name and a value (each a sequence of bytes) separated by a colon and terminated by a semicolon, e.g., 'name:value;'. A simple property name must not be empty and must not contain parentheses ((, )) or colon (:). A simple property value must not contain a semicolon (;).
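The rules for a simple property can be sketched as a small encoder. Note that DictParser itself only decodes; `encode_simple` is a hypothetical helper written for illustration and is not part of the library.

```python
def encode_simple(name: bytes, value: bytes) -> bytes:
    """Encode one simple property as b'name:value;' (hypothetical helper)."""
    if not name:
        raise ValueError("simple property name must not be empty")
    # Iterating over bytes yields integers, and `int in bytes` is a valid test.
    if any(c in name for c in b"():"):
        raise ValueError("simple property name must not contain '(', ')' or ':'")
    if b";" in value:
        raise ValueError("simple property value must not contain ';'")
    return name + b":" + value + b";"


print(encode_simple(b"name", b"value"))  # b'name:value;'
```

A value that contains a semicolon is rejected here; such a value would have to be written as a binary property instead.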

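A binary property has the same structure as a simple property, but its name ends with the length (in 8-bit bytes) of the value, in parentheses ((, )), e.g., 'hello(7): world!;' and 'hello(6):world!;'. In the first example the value is the seven bytes ' world!' (including the leading space); in the second it is the six bytes 'world!'. A binary property value may contain any character, including a semicolon.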
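As a rough illustration, a binary property could be produced like this. `encode_binary` is a hypothetical helper, not part of the library, and it assumes the length is written as a decimal ASCII number, as the examples above suggest.

```python
def encode_binary(name: bytes, value: bytes) -> bytes:
    """Encode one binary property as b'name(length):value;' (hypothetical helper)."""
    # Assumption: the length is the decimal byte count of the value.
    length = str(len(value)).encode("ascii")
    return name + b"(" + length + b"):" + value + b";"


print(encode_binary(b"hello", b" world!"))  # b'hello(7): world!;'
print(encode_binary(b"hello", b"world!"))   # b'hello(6):world!;'
print(encode_binary(b"k", b"a;b"))          # b'k(3):a;b;'
```

Because the length is stated up front, the value can contain semicolons or any other byte without being mistaken for the terminator.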

Property names may be repeated, so that '{a:x;a:y;a:z;}' is a valid encoded dictionary with three distinct properties. The interpretation of properties with identical names is user defined.
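One possible interpretation of repeated names is to collect every value under its name. The sketch below assumes the properties have already been decoded into (name, value) pairs; `group_by_name` is a hypothetical helper, not a library function.

```python
from collections import defaultdict


def group_by_name(pairs):
    """Collect the values of properties that share a name, keeping their order."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


# The three properties of '{a:x;a:y;a:z;}' as decoded pairs.
pairs = [(b"a", b"x"), (b"a", b"y"), (b"a", b"z")]
print(group_by_name(pairs))  # {b'a': [b'x', b'y', b'z']}
```

Other policies, such as "last value wins" or rejecting duplicates, are equally valid; the format leaves the choice to the user.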

Note that whitespace characters are treated like any other characters: any line feed, space, tab, etc. is interpreted verbatim, as part of a property name or value.
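To make the encoding rules concrete, here is a minimal decoder for a complete in-memory buffer. It is a sketch for illustration only and not the library's implementation (which reads from a stream); `decode_dict` is a hypothetical name, and the length of a binary property is assumed to be decimal.

```python
def decode_dict(data: bytes) -> list:
    """Decode an encoded dictionary into a list of (name, value) pairs."""
    if len(data) < 2 or data[:1] != b"{" or data[-1:] != b"}":
        raise ValueError("dictionary must be enclosed in curly braces")
    props = []
    pos, end = 1, len(data) - 1  # 'end' is the index of the closing brace
    while pos < end:
        colon = data.find(b":", pos, end)
        if colon == -1:
            raise ValueError("missing ':' after property name")
        name = data[pos:colon]
        value_start = colon + 1
        if name.endswith(b")") and b"(" in name:
            # Binary property: the name carries the value length in parentheses.
            paren = name.rindex(b"(")
            digits = name[paren + 1:-1]
            if not digits.isdigit():
                raise ValueError("invalid length in binary property name")
            name = name[:paren]
            value_end = value_start + int(digits)
            if value_end >= end or data[value_end:value_end + 1] != b";":
                raise ValueError("binary property is not terminated by ';'")
        else:
            # Simple property: the value runs up to the next semicolon.
            value_end = data.find(b";", value_start, end)
            if value_end == -1:
                raise ValueError("simple property is not terminated by ';'")
        if not name:
            raise ValueError("property name must not be empty")
        props.append((name, data[value_start:value_end]))
        pos = value_end + 1
    return props


print(decode_dict(b"{a:x;a:y;a:z;}"))       # [(b'a', b'x'), (b'a', b'y'), (b'a', b'z')]
print(decode_dict(b"{hello(7): world!;}"))  # [(b'hello', b' world!')]
print(decode_dict(b"{ a : b ;}"))           # [(b' a ', b' b ')]
```

The last call shows the whitespace rule: the spaces become part of the name and the value rather than being stripped.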

The API

This library provides an interface to decode an encoded dictionary one property at a time. The interface is straightforward: you instantiate a DictParser with a stream and call getNextProperty() until all available (or desired) properties have been read. The parser instance will throw an exception on invalid input or stream errors.

Here's an example of how to parse the contents of a file in Python:

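import io
import DictParser

# Open the file read-only in binary mode; the parser works on raw bytes.
with io.open("dict.txt", mode="rb") as f:
    parser = DictParser.DictParser(f)
    while True:
        prop = parser.getNextProperty()
        if not prop:
            break  # no more properties to read
        print("%s: %s" % (prop.name(), prop.value()))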

Here's the same example in C++:

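#include <fstream>
#include <iostream>  // std::cout, std::endl
#include "DictParser.h"

int main(int argc, char *argv[]) {
  // Open the file read-only in binary mode; the parser works on raw bytes.
  std::fstream stream("dict.txt", std::ios::in|std::ios::binary);
  DictParser parser(stream);
  DictParser::Property prop;
  // getNextProperty() fills in prop and returns false when nothing is left.
  while (parser.getNextProperty(prop)) {
    std::cout << prop.name() << ": " << prop.value() << std::endl;
  }
  std::cout << std::flush;
  return 0;
}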