What are Identifiers?

VERSION 0.4 Draft (Jan 1, 2019)

Identifiers are data that identify a unique entity apart from other entities. The concept of identifiers has many uses in the world. In software, identifiers are found in every facet of development. Some types of identifiers, like UUID and URI, are standardized. Most identifiers, however, are not externally defined in a specification and depend on many factors specific to their application.

In practice, identifiers are serialized strings that must be interpreted, parsed, encoded and decoded along software system pathways. They transit multiple systems in many kinds of media, like JSON, emails and log files. Software that must interpret this data along the way has to know how to consume the identifier and interpret its value.

To illustrate this problem, consider a string identifier encoded with Base64. The generator of the identifier needs to convert their identifier value into a byte array or string. It transforms this array into Base64 and sends or stores this result. Later, another application encounters this Base64 string, and then must make several determinations:

Is this string encoded?
If so, how should it be decoded?
Once decoded into a byte array, should it be transformed into another data structure?
Once it is transformed, what are the semantics of the value?
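
The ambiguity is easy to reproduce. The following is a minimal sketch using only the Python standard library; the account number and the choice of an 8-byte big-endian layout are made up for illustration and are not part of this specification. It shows that the consumer needs out-of-band knowledge at every step:

```python
import base64
import struct

# Hypothetical producer: packs a 64-bit integer ID as 8 big-endian bytes,
# then Base64-encodes the bytes for transport.
account_id = 1234567890123
encoded = base64.b64encode(struct.pack(">q", account_id)).decode("ascii")

# Consumer side: only the string arrives. Each step below is a guess
# unless someone documents the producer's choices.
raw = base64.b64decode(encoded)          # guess 1: the string is Base64
as_long = struct.unpack(">q", raw)[0]    # guess 2: one signed 64-bit integer
as_two_ints = struct.unpack(">ii", raw)  # equally valid reading: two 32-bit integers

print(encoded, as_long, as_two_ints)
```

Both readings of the decoded bytes succeed without error, so nothing in the string itself tells the consumer which one the producer intended.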

The developer must find a source of truth to answer their questions about this multi-step process. Often the documentation is out of date, the original developers are unavailable, or the guidance they provide is incorrect. This process is hard and error-prone, and it is the source of many bugs, failures, and other negative outcomes.

The Identifiers project hopes to tackle this problem by defining sharable identifier types that can be applied across software domains. It intends to make it simple to convert a data identifier into a string, transmit it or store it, and then allow a different application to convert the encoded identifier into a semantic data value for processing.

Identifier Types

Identifier types can be primitive values, semantic values or structured identifiers.

Primitive Identifiers

string (UTF-8)
boolean
integer (32-bit signed)
float (64-bit signed decimal, IEEE 754)
long (64-bit signed)
bytes (array of bytes)

Structural Variants

All identifier types have collection variants that hold multiple values of the type. The two collection types are list and map. Collections can only hold same-typed values at this time.

List

A list identifier is a list of values. It is not a list of identifiers, but a single identifier composed of multiple values of the same type.

Maps

A map identifier is a map of values. Maps are useful to create a single identifier composed of multiple labeled values of the same type. These values are labeled by the map keys. The keys are stored in alphabetically-sorted order for consistency.

Composite

Composite identifiers combine other identifiers of mixed types into a single identifier. One can combine primitive identifiers, semantic identifiers and structured variants together into one composite identifier. They can be either a list or a map of other identifiers.

Semantic Identifiers

Semantic identifiers are based on either single or structured primitive identifiers. They can be considered to "extend" a base identifier type.

type	base type	structure	notes
`uuid`	`bytes`	16 bytes	Supports all uuid versions. https://en.wikipedia.org/wiki/Universally_unique_identifier
`datetime`	`long`	single value	Time in Unix/Posix Epoch, in milliseconds. https://en.wikipedia.org/wiki/Epoch_(computing)
`geo`	`float-list`	[latitude, longitude]	decimal latitude & longitude. https://en.wikipedia.org/wiki/Geotagging
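
As a rough illustration of the table above, the sketch below reduces three native Python values to the base shapes listed for `uuid`, `datetime` and `geo`. The sample UUID, date and coordinates are arbitrary, and the variable names are hypothetical; actual implementations may expose this conversion differently.

```python
import uuid
from datetime import datetime, timedelta, timezone

# uuid -> bytes (always 16 bytes, regardless of UUID version)
sample_uuid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
uuid_base = sample_uuid.bytes

# datetime -> long (milliseconds since the Unix epoch, computed with integer math)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
moment = datetime(2019, 1, 1, tzinfo=timezone.utc)
datetime_base = (moment - epoch) // timedelta(milliseconds=1)

# geo -> float-list in [latitude, longitude] order
geo_base = [40.7128, -74.0060]

print(len(uuid_base), datetime_base, geo_base)
```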

Future Possibilities

If you have suggestions please file an Issue to start a discussion.

Cross-Version Consumption

Semantic identifiers are guaranteed safe passage through older systems that do not understand the semantics of the identifier. Such systems can consume a semantic identifier, parse its data, and pass it through to another system without losing the semantic type information.

As an example, if a system encounters an IPv6 semantic identifier but has no explicit support for IPv6 identifiers, it will interpret the value as its base identifier type, which is a fixed list of 2 longs. If this system then passes the identifier on to another system that does understand IPv6 identifiers, that system will interpret it as an IPv6 identifier. The IPv6 type information is not lost along the way.
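
One way to picture this fallback is with the type-code layout described under Type Codes later in this document: the first byte carries the base type plus a semantic flag (0x80), and the second byte carries the semantic slot. The sketch below is an assumption about how a consumer might apply that layout, not the behavior of any particular implementation; `split_type_code`, `consume` and `KNOWN_SLOTS` are hypothetical names. It uses the `datetime` code (0x184) because no IPv6 code is defined here.

```python
SEMANTIC_FLAG = 0x80  # bit 7 of the first type-code byte marks a semantic identifier


def split_type_code(code):
    """Split a type code into (base_code, slot); slot is None when not semantic."""
    first_byte = code & 0xFF
    if first_byte & SEMANTIC_FLAG:
        return first_byte & 0x7F, code >> 8
    return first_byte, None


# Hypothetical older system: it only knows the semantic type in slot 0 (uuid).
KNOWN_SLOTS = {0}


def consume(code, value):
    """Pick the code used for local handling while keeping the original for forwarding."""
    base_code, slot = split_type_code(code)
    if slot is not None and slot not in KNOWN_SLOTS:
        local_code = base_code  # unknown semantic type: treat the value as its base type
    else:
        local_code = code
    return {"local_code": local_code, "forward_code": code, "value": value}


# A datetime identifier (slot 1) passing through a system that lacks datetime support.
result = consume(0x184, 1546300800000)
print(hex(result["local_code"]), hex(result["forward_code"]))
```

The value is handled locally as a plain `long` (0x4), while the original code travels onward unchanged, so a downstream system that knows slot 1 can still recognize it as a `datetime`.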

String Encoding

Identifiers have two forms of string encoding: Data and Human. These forms have different uses.

Data Form

The data form is intended for identifiers that go into transmitted data like JSON and XML, as well as data storage like a SQL database. It is not intended for use in URIs and is not human-enterable, though it is composed of visible characters.

Identifiers serialized for data purposes are encoded with a Base128 symbol set for minimum size bloat and safe transferability.

Human Form

Identifiers are often read and entered by humans, which imposes different constraints. Examples of this form are account identifiers, URLs and serial numbers. These identifiers are often encountered in messages like text and email. The encoding for this form is defined in the Base32 specification.

Implementations

The following projects implement the Identifiers specification:

Implementation Requirements

This section applies to library authors who build implementations of the Identifiers spec for platforms of their choosing.

Primitive Identifiers

The primitive identifiers should map to existing platform types wherever possible. Most platforms have string, boolean, and the other primitive types natively implemented. If one is not available, the implementer is encouraged to build the type support into the library rather than requiring the consumer to explicitly use a third-party library. For instance, JS does not support a full 64-bit long value, so the JS implementation uses a popular Long library to support the long number space.

Type Codes

All primitive identifier types are associated with a 1-byte type code. Semantic identifiers have a second type code to identify themselves. The type codes are calculated with bitwise operators to accumulate the various flags that compose their full value.

Byte 1 Positions

`0`	`1`	`2`	`3`	`4`	`5`	`6`	`7`
primitive	primitive	primitive	list	map	list-of	map-of	semantic

Structured Variants

All identifier types also have structured variants that hold their values in collections. Their type codes combine the structural flags and the type code of the value. To create the full type code, bitwise-OR (|) the appropriate structured type code with the base primitive type code.

type	code	MsgPack family
`list`	`0x8`	array
`map`	`0x10`	map

MsgPack

String-encoded identifiers are compressed using MsgPack. More details are in the following section, but the related MsgPack information is included in the type tables for easy reference.

Primitive Types

Here are the type codes for primitive types, as well as their list and map structured types.

type	code	MsgPack family	list	map
`string`	`0x0`	string	`0x8`	`0x10`
`boolean`	`0x1`	bool	`0x9`	`0x11`
`integer`	`0x2`	int	`0xa`	`0x12`
`float`	`0x3`	float	`0xb`	`0x13`
`long`	`0x4`	int	`0xc`	`0x14`
`bytes`	`0x5`	bin	`0xd`	`0x15`
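
The list and map columns above follow directly from OR-ing the structural flags onto each primitive code. The sketch below recomputes them; the constant and function names are illustrative only and do not come from any implementation.

```python
# Primitive type codes, copied from the table above.
PRIMITIVES = {
    "string": 0x0,
    "boolean": 0x1,
    "integer": 0x2,
    "float": 0x3,
    "long": 0x4,
    "bytes": 0x5,
}

LIST_FLAG = 0x8   # bit 3 of the first byte
MAP_FLAG = 0x10   # bit 4 of the first byte


def list_code(primitive_code):
    """Type code for a list of the given primitive."""
    return primitive_code | LIST_FLAG


def map_code(primitive_code):
    """Type code for a map of the given primitive."""
    return primitive_code | MAP_FLAG


for name, code in PRIMITIVES.items():
    print(f"{name:8} {hex(code)}  list={hex(list_code(code))}  map={hex(map_code(code))}")
```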

Composite Types

Composite identifiers can be either lists or maps of other identifiers. Composite identifiers are not typed with primitive type flags. They contain fully-formed identifiers of any type. They can be used to define Semantic identifiers.

When encoded to MsgPack, the outer type is either composite-list or composite-map. The contents of composite identifiers are fully-encoded identifiers.

type	code	MsgPack family
`composite-list`	`0x38`	array
`composite-map`	`0x58`	map

Semantic Types

Semantic identifiers have 2-byte type values. The first byte holds the primitive and structural information, and the second byte holds the "slot" number. The integer type value is computed by starting with the base type (including any structural flag), OR-ing in the semantic flag (0x80), and then OR-ing in the slot number shifted left by 8 bits (0x8). The left shift pushes the slot number into the second byte. For example, the geo type code is calculated like this:

float	list	semantic	slot
`0x3`	`0x8`	`0x80`	`2 << 0x8`

0x3 | 0x8 | 0x80 | (2 << 0x8) = 0x28b

The following table lists the defined semantic types:

type	slot	code	MsgPack format	list	map
`uuid`	`0`	`0x85`	bin 16 size 16	`0x8d`	`0x95`
`datetime`	`1`	`0x184`	int	`0x18c`	`0x194`
`geo`	`2`	`0x28b`	fixarray size 2 floats	`0x2ab`	`0x2cb`
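
The code column above can be recomputed from the base type and slot number. The following sketch does so; `semantic_code` is a hypothetical helper name used only for this illustration.

```python
SEMANTIC_FLAG = 0x80  # bit 7 of the first byte
LIST_FLAG = 0x8       # structural flag used by geo's float-list base type


def semantic_code(base_code, slot):
    """Combine a base type code (with any structural flag) and a slot number."""
    return base_code | SEMANTIC_FLAG | (slot << 8)


uuid_code = semantic_code(0x5, 0)                  # bytes, slot 0
datetime_code = semantic_code(0x4, 1)              # long, slot 1
geo_code = semantic_code(0x3 | LIST_FLAG, 2)       # float list, slot 2

print(hex(uuid_code), hex(datetime_code), hex(geo_code))
```

Because slot 0 contributes nothing to the second byte, the `uuid` code still fits in a single byte, while `datetime` and `geo` need two.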

List/Map of Structured Semantic Identifiers

It is possible for a semantic identifier's base type to be a list or map of primitives. The example of this is the geo identifier. In order to create a list or map of these identifiers, the structured types must be marked as either a list-of or map-of the semantic identifier. These type code addenda are defined in the following table:

type	code	MsgPack family
`list-of`	`0x20`	array[semantic]
`map-of`	`0x40`	map[semantic]

For example, to create the type code for a list of geos, set the list-of flag bit (0x20):

0x28b | 0x20 = 0x2ab
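
Reading the semantic type table together with the flags above suggests a simple rule: if the semantic type's base is already a list or map, use list-of or map-of; otherwise use the plain list or map flag. The sketch below encodes that reading. It is an inference from the tables, and the helper names are hypothetical.

```python
LIST_FLAG = 0x8
MAP_FLAG = 0x10
LIST_OF_FLAG = 0x20
MAP_OF_FLAG = 0x40


def is_structured(code):
    """True when the base type of the code is itself a list or a map."""
    return bool(code & (LIST_FLAG | MAP_FLAG))


def list_variant(code):
    """Type code for a list of identifiers that have the given type code."""
    return code | (LIST_OF_FLAG if is_structured(code) else LIST_FLAG)


def map_variant(code):
    """Type code for a map of identifiers that have the given type code."""
    return code | (MAP_OF_FLAG if is_structured(code) else MAP_FLAG)


GEO = 0x28b       # base type is a float list, so list-of / map-of apply
DATETIME = 0x184  # base type is a single long, so the plain flags apply

print(hex(list_variant(GEO)), hex(map_variant(GEO)))
print(hex(list_variant(DATETIME)), hex(map_variant(DATETIME)))
```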

Encoding Format

In order to encode an identifier, one must first encode it to bytes using the MsgPack encoding format. These bytes are then encoded using either Base128 for data uses or Base32 for human uses. Implementations will auto-detect the encoding format and decode into an identifier value correctly.

MsgPack

Internally, encoded Identifiers are compressed MsgPack data structures. In order to inter-operate with MsgPack correctly, one must pass the MsgPack encoder the following array:

[type-code, identifier-encode-value]

Each identifier type has a specific encode value shape that must be met. Implementations will often have platform-specific formats of the identifier values, like native class representations, but these must be transformed into formats that are usable by MsgPack.

Most MsgPack implementations have cross-platform quirks that will require fine-tuning or even fixing. For instance, the Java version of MsgPack treats all doubles as FLOAT64, while other platforms encode float values as either FLOAT32 or FLOAT64. The Java version of Identifiers has to manually emit FLOAT32 for doubles that hold single-precision values. The Test Compatibility Kit will aid the implementer in discovering and mitigating their platform's MsgPack anomalies.
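
To make the `[type-code, identifier-encode-value]` shape concrete, the sketch below hand-encodes it for a tiny subset of MsgPack (booleans, small non-negative integers, short strings and short arrays). It is written out only so the bytes are visible; a real implementation would use an established MsgPack library. The `pack` function is hypothetical, and the assumption that the type code is written in the smallest available integer format is not confirmed by this document.

```python
import struct


def pack(value):
    """Encode a small MsgPack subset: bool, int in [0, 65535], short str, short list."""
    if isinstance(value, bool):  # test bool first, since bool is a subclass of int
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])                       # positive fixint
        if 0 <= value <= 0xFF:
            return b"\xcc" + bytes([value])             # uint 8
        if 0 <= value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)   # uint 16
        raise ValueError("integer out of range for this sketch")
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 31:
            raise ValueError("string too long for this sketch")
        return bytes([0xA0 | len(data)]) + data         # fixstr
    if isinstance(value, list):
        if len(value) > 15:
            raise ValueError("list too long for this sketch")
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)  # fixarray
    raise TypeError(f"unsupported type: {type(value).__name__}")


STRING = 0x0        # primitive string type code
BOOLEAN_LIST = 0x9  # boolean | list

packed_string = pack([STRING, "abc"])
packed_booleans = pack([BOOLEAN_LIST, [True, False]])

print(packed_string.hex(), packed_booleans.hex())
```

The resulting bytes would then go through Base128 or Base32 as described under Encoding Format; those steps are outside the scope of this sketch.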

Cross-Implementation Compatibility

It is expected that encoded identifiers created in one system will be consumed in another system of a different architecture. For instance, a Java server will encode an Identifier that will be consumed by a JavaScript client. To support this goal, all implementations must pass the Test Compatibility Kit.
