Introduction
In Gloss, byte formats are called frames. A frame is a standard Clojure data structure, with types substituted for actual values. For instance, this is a valid frame:
{:a :int16, :b :float32}
To turn a frame into something that can handle bytes, we call compile-frame. This will return a codec, which can be used with encode and decode.
> (def fr (compile-frame {:a :int16, :b :float32}))
#'fr
> (encode fr {:a 1, :b 2})
[ #<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=6 cap=6]> ]
> (decode fr *1)
{:a 1, :b 2.0}
(defcodec name frame) can also be used to create a new codec.
Notice that encode returns a sequence of ByteBuffers. Gloss consumes and emits sequences of ByteBuffers, because it’s designed to deal with streaming data. Turning these sequences into a contiguous ByteBuffer can be accomplished by calling (contiguous buffer-sequence), but this is only necessary when interfacing with external libraries.
When a frame is defined using a map, Gloss encodes the vals in an arbitrary but consistent order. To work with existing binary protocols, use Gloss’ ordered-map (or a vector) to achieve the correct ordering.
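As a rough illustration of the two points above, the sketch below builds a frame with ordered-map and flattens the encoded output with contiguous so the byte layout can be inspected. It assumes ordered-map takes alternating keys and frames; encoded-bytes and ordered-fr are hypothetical names introduced only for this example and are not part of Gloss.

```clojure
(require '[gloss.core :refer [compile-frame ordered-map]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; ordered-map fixes the field order: :a is written first, then :b.
(def ordered-fr
  (compile-frame (ordered-map :a :int16, :b :float32)))

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

;; Two bytes for :a, followed by four bytes for :b.
(encoded-bytes ordered-fr {:a 1, :b 2.0})

;; Decoding the encoded buffers gives back a map with the same keys.
(decode ordered-fr (encode ordered-fr {:a 1, :b 2.0}))
```

With a plain map frame the same six bytes would be produced, but which field comes first would not be something to rely on.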
To encode and decode sequences of frames, use encode-all and decode-all.
> (defcodec fr :float32)
#'fr
> (encode-all fr [1 2])
[ #<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]> #<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]> ]
> (decode-all fr *1)
[1.0 2.0]
Frames can have constant values which are not encoded. For instance, this frame
[:foo :int32]
only takes up 4 bytes of space. When decoding this frame, :foo will be placed in front of every integer that is read. This can be useful as an identifier, giving us a way to differentiate between similar frames:
{:type :foo
 :num :byte}
{:type :bar
 :num :int64}
Gloss supports the following primitive types: :byte, :int16, :int32, :int64, :float32, :float64, :ubyte, :uint16, :uint32, and :uint64.
Defining the endianness of numerical primitives can be achieved by adding -le or -be to the end of the respective primitive:
> ; Frame with a little endian float.
> (def fr (compile-frame {:a :byte, :b :float32-le}))
#'fr
> ; Frame with a big endian unsigned integer.
> (def fr (compile-frame {:a :byte, :b :uint32-be}))
#'fr
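To make the effect of the suffixes visible, the following sketch encodes the same integer with both byte orders and prints the raw bytes. The encoded-bytes helper is a hypothetical name introduced for this example, not a Gloss function.

```clojure
(require '[gloss.core :refer [compile-frame]]
         '[gloss.io :refer [encode contiguous]])
(import 'java.nio.ByteBuffer)

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

;; Little-endian: the least significant byte comes first.
(encoded-bytes (compile-frame :int32-le) 1) ; [1 0 0 0]

;; Big-endian: the least significant byte comes last.
(encoded-bytes (compile-frame :int32-be) 1) ; [0 0 0 1]
```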
Defining a simple stream of text is straightforward:
(string :utf-8)
The charset argument can be any valid name for a standard character set.
The above works great if our entire data structure is a string, but doesn’t allow for mixing different types of data with a string. If we want to limit the length of the string, we have two choices: we can make the string fixed length, or terminate it with a delimiter. The first case is straightforward:
(string :utf-8 :length 10)
This defines a string which only contains ten characters. This means that it will consume a finite number of bytes, and we can place it in a data structure beside other data types:
[:int32 (string :utf-8 :length 10) :int32]
Delimiters are common tools for marking the end of strings. Every string in C, for instance, is by convention null terminated. Text files are often delimited by newline characters. When dealing with strings, we may specify a list of possible delimiters, specified as ByteBuffers or anything which can be transformed into a ByteBuffer.
(string :utf-8 :delimiters ["abc" \z 32 [1 2 3]])
This specifies a string which is terminated by one of the available terminators: the byte sequence corresponding to the string “abc”, the byte corresponding to the character ‘z’, the byte value ‘32’, or the byte sequence [1 2 3]. The largest delimiter will always be consumed, so we can specify two delimiters where one is just a shorter version of the other:
(string :utf-8 :delimiters ["\r" "\r\n"])
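A minimal round trip through a delimited string frame might look like the sketch below. It uses a single "\r\n" delimiter to keep the byte count unambiguous; the name line is hypothetical, and str is applied to the decoded value on the assumption that Gloss may return a CharSequence rather than a String.

```clojure
(require '[gloss.core :refer [compile-frame string]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; A string frame terminated by CRLF.
(def line (compile-frame (string :utf-8 :delimiters ["\r\n"])))

;; Five payload bytes plus the two delimiter bytes.
(.remaining ^ByteBuffer (contiguous (encode line "hello"))) ; 7

;; The delimiter is consumed on decode and is not part of the value.
(str (decode line (encode line "hello"))) ; "hello"
```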
Sometimes strings are both prepended with a description of their length, and terminated with a marker simply for human readability. We can specify this using a delimiter, but that means the string can’t contain that terminating marker, and adds the overhead of actually scanning the string for the delimiter. Instead, we can simply specify a :suffix:
(string :utf-8 :length 10 :suffix ",")
Unlike delimiters, there can only be a single possible suffix, but it can be specified using any of the types that can be used for delimiters. It can be used in conjunction with either :length or :delimiters (the suffix is assumed to appear after the delimiters), or by itself.
We can have sequences of the same data-type:
[:int32 :int32 :int32]
but this only works for sequences of fixed length. To support dynamically sized sequences, we need to use repeated:
(repeated :int32)
This will encode to a sequence of 32-bit integers, with an integer prepended that describes the length. By default it will be a 32-bit integer, but this can be customized using :prefix:
(repeated [:int16 :int16]
          :prefix :byte)
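The layout produced by this frame can be checked with a short sketch: one count byte, then two 16-bit integers per element. The names pairs and encoded-bytes are hypothetical and exist only for this example.

```clojure
(require '[gloss.core :refer [repeated]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

(def pairs (repeated [:int16 :int16] :prefix :byte))

;; One prefix byte holding the element count (2), then 4 bytes per pair.
(encoded-bytes pairs [[1 2] [3 4]]) ; [2 0 1 0 2 0 3 0 4]

;; Decoding reads the count first, then that many pairs.
(decode pairs (encode pairs [[1 2] [3 4]]))
```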
We can also create a sequence terminated by a delimiter:
(repeated (string :utf-8 :delimiters ["\n"])
          :delimiters ["\\0"])
Notice that both the string and the sequence are delimited. There will be a \n delimiter for each string, and a single \0 delimiter for the entire sequence.
Some prefixes are simple numbers describing the number of elements that follow, but others can be more complicated. Consider the case where a 16-bit prefix describes the byte-length of the sequence of 32-bit integers that follows, rather than just the number of integers. To support this, we need (prefix frame to-count from-count):
(repeated :int32
          :prefix (prefix :int16 #(/ % 4) #(* % 4)))
This defines the frame of the prefix, a decoder function which transforms the prefix value into the count of the sequence that follows, and an encoder function which transforms the sequence count into a prefix value. In this case, it’s as simple as dividing the byte-count by 4 to get the number of 32-bit integers, and multiplying the number of integers by 4 to get the byte-count.
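The sketch below encodes two integers with this byte-length prefix, so the 16-bit prefix should hold 8 rather than 2. The names ints and encoded-bytes are hypothetical and introduced only for illustration.

```clojure
(require '[gloss.core :refer [repeated prefix]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

(def ints
  (repeated :int32
            :prefix (prefix :int16 #(/ % 4) #(* % 4))))

;; The prefix is the byte-length (2 elements * 4 bytes = 8),
;; followed by the two 32-bit integers.
(encoded-bytes ints [1 2]) ; [0 8 0 0 0 1 0 0 0 2]

;; On decode, the prefix value 8 is divided by 4 to recover the count.
(decode ints (encode ints [1 2]))
```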
Any type of frame can be used as a prefix. This is a completely valid frame:
(repeated :byte
          :prefix (prefix (string :ascii :delimiters [":"])
                          #(Long/parseLong %)
                          str))
This will preface a sequence of bytes with a string terminated by : describing its length.
It can be useful to have a simple mapping between values and a unique numerical identifier.
(enum :byte :a :b :c)
This assigns unique numbers to each value, and allows the enumeration to be used as a data-type in other frames.
> (defcodec animal (enum :int16 :dog :cat :horse))
#'animal
> (defcodec pet
    {:name (string :utf-8 :delimiters ["\n"])
     :type animal})
Enumerations can also have specific values associated with them:
(enum :byte {:a \$, :b 37})
This can be useful when trying to create friendly representations of hard-coded values in other protocols.
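As a sketch of how an enumeration behaves inside a larger frame, the example below round-trips a pet value. It assumes the :delimiters option for the name field; str is applied to the decoded name on the assumption that Gloss may return a CharSequence rather than a String.

```clojure
(require '[gloss.core :refer [defcodec enum string]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

(defcodec animal (enum :int16 :dog :cat :horse))

(defcodec pet
  {:name (string :utf-8 :delimiters ["\n"])
   :type animal})

;; The enum occupies a single :int16, whichever keyword is encoded.
(.remaining ^ByteBuffer (contiguous (encode animal :cat))) ; 2

;; Three name bytes, one delimiter byte, and two bytes for the enum.
(.remaining ^ByteBuffer (contiguous (encode pet {:name "Rex", :type :dog}))) ; 6

;; The keyword survives the round trip.
(update (decode pet (encode pet {:name "Rex", :type :dog})) :name str)
```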
A header is a frame which specifies the following frame. A header is created using (header frame header->body body->header). The first argument is the frame for the header. The second argument is a function which takes the value from the header, and returns a codec for the body. The third argument is a function which takes the value of the body, and returns the header value.
Let’s look at a frame that can describe a rectangle, a triangle, or a circle:
(defcodec type (enum :byte :rectangle :triangle :circle))
(defcodec triangle {:type :triangle, :width :int32, :height :int32})
(defcodec rectangle {:type :rectangle, :width :int32, :height :int32})
(defcodec circle {:type :circle, :radius :int32})
(defcodec shapes
  (header
    type
    {:triangle triangle, :rectangle rectangle, :circle circle}
    :type))
Each frame starts with an enum, and header->body is just a hash of enum values to codecs. Since the frames have :type hardcoded, going from the decoded frame to the header value is trivial.
Protocols often describe lengths in terms of bytes, not elements. The prefix example given earlier is one way of working around this, but this isn’t always possible. A more general solution can be found in (finite-frame prefix frame). This creates a frame which is prefaced by a descriptor of its byte-length. A key point to realize is that the inner frame doesn’t define its own boundaries; it will simply consume all the bytes that are given to it.
The earlier example of an integer sequence, then, can also be implemented like this:
(finite-frame :int16
              (repeated :int32 :prefix :none))
Notice that we have to explicitly state that there’s no prefix for the (repeated ...) frame. A string with a byte-length prefix is also straightforward:
(finite-frame :int32 (string :utf-8))
Again, notice that the string doesn’t have a finite length or delimiters.
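A short sketch of the length-prefixed string shows the prefix carrying the byte-length of the payload. The names sized and encoded-bytes are hypothetical; str is applied to the decoded value on the assumption that Gloss may return a CharSequence rather than a String.

```clojure
(require '[gloss.core :refer [finite-frame string]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

(def sized (finite-frame :int32 (string :utf-8)))

;; A 4-byte length (2), then the UTF-8 bytes of "hi".
(encoded-bytes sized "hi") ; [0 0 0 2 104 105]

;; The inner string consumes exactly the bytes the prefix announced.
(str (decode sized (encode sized "hi"))) ; "hi"
```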
(finite-block prefix) is just like finite-frame, except that there’s no inner frame. The decoded value is simply a sequence of ByteBuffers, and the encoded value is just the ByteBuffers that are given to it, prefaced by the byte-length.
(string-integer ...) and (string-float ...) are string frames that are decoded into numerical values. They are otherwise identical to (string ...), and take the same parameters. This greatly simplifies the earlier example of a string prefix:
(repeated :byte
          :prefix (prefix (string :ascii :delimiters [":"])
                          #(Long/parseLong %)
                          str))
becomes simply
(repeated :byte
          :prefix (string-integer :ascii :delimiters [":"]))
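A sketch of the simplified frame in use follows: the count is written as ASCII digits, then the ":" delimiter, then the bytes themselves. The names counted and encoded-bytes are hypothetical and introduced only for this example.

```clojure
(require '[gloss.core :refer [repeated string-integer]]
         '[gloss.io :refer [encode decode contiguous]])
(import 'java.nio.ByteBuffer)

;; Hypothetical helper (not part of Gloss): flatten the encoded
;; sequence of ByteBuffers into a vector of signed bytes.
(defn encoded-bytes [codec value]
  (let [^ByteBuffer buf (contiguous (encode codec value))
        arr (byte-array (.remaining buf))]
    (.get (.duplicate buf) arr)
    (vec arr)))

(def counted
  (repeated :byte
            :prefix (string-integer :ascii :delimiters [":"])))

;; ASCII "3" (51) and ":" (58), followed by the three payload bytes.
(encoded-bytes counted [1 2 3]) ; [51 58 1 2 3]

;; The textual count is parsed back into a number on decode.
(decode counted (encode counted [1 2 3]))
```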