Home
On television, nothing is created, everything is copied.
– Abelardo Barbosa
It's quite an old idea.
TSID Creator is just an implementation of Snowflake ID.
At the time, I couldn't come up with a more fitting name, and I'm not even certain if "sortable" is a recognized word. So, that's on me. :)
You might refer to it as a Time-Sorted Identifier, Twitter Snowflake Identifier, Time Series Identifier, Time Stamp Identifier, or any other term that suits your preference, as long as it conveys the intended meaning.
Ultimately, it's just a name, and you're free to choose one that resonates with people and effectively communicates the concept.
It is now called Time-Sorted Unique Identifier. Sounds better. :)
The TSID Creator was introduced in that particular year.
In your application, you have the flexibility to choose any starting date you prefer, but it's essential to maintain consistency. For instance, if you opt for the date 1990-01-01, ensure its uniform use throughout the entire application.
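As a minimal sketch of what "consistent" means here, assume the time component is simply the number of milliseconds elapsed since the chosen starting date. The class name EpochSketch and the constant CUSTOM_EPOCH are hypothetical and are not part of TSID Creator; the date 1990-01-01 is the one from the example above.

```java
import java.time.Instant;

final class EpochSketch {

    // Hypothetical application-wide constant: the one starting date used everywhere.
    static final Instant CUSTOM_EPOCH = Instant.parse("1990-01-01T00:00:00Z");

    // Milliseconds elapsed between the custom epoch and the given instant.
    static long timeComponent(Instant now) {
        return now.toEpochMilli() - CUSTOM_EPOCH.toEpochMilli();
    }

    // Reverses the calculation; the result is only right if the same epoch is used.
    static Instant toInstant(long timeComponent) {
        return Instant.ofEpochMilli(timeComponent + CUSTOM_EPOCH.toEpochMilli());
    }
}
```

If one part of the application wrote identifiers with one starting date and another part read them with a different one, every recovered instant would be shifted by the difference between the two dates.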
These time durations are the longest periods that can be represented using either 41 or 42 bits.
If your programming language or database only supports signed integers, the first bit of the identifier serves as the sign bit, which leaves 41 bits for the timestamp. The limit is then reached about 69 years after the starting date.
In languages and databases that allow for unsigned integers, the limit is extended to 139 years.
If the identifier is stored in a string or byte array format, which is not a common practice for a 64-bit value, the limit also goes up to 139 years.
Honestly, I don't know what will happen in 69 or 139 years.
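The 69 and 139 year figures can be checked with a little arithmetic. The sketch below assumes the timestamp counts milliseconds and uses an average year of 365.25 days; the class name TsidLifetime is hypothetical.

```java
final class TsidLifetime {

    // Average year length in milliseconds (365.25 days), an assumption of this sketch.
    static final double MILLIS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

    // Number of years that fit into a millisecond timestamp of the given bit length.
    static double yearsFor(int timestampBits) {
        return Math.pow(2, timestampBits) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.printf("41 bits: %.1f years%n", yearsFor(41));
        System.out.printf("42 bits: %.1f years%n", yearsFor(42));
    }
}
```

The two results come out at roughly 69.7 and 139.4 years, which are the limits mentioned above.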
It's a way to identify the ID generator and prevent clashes among IDs created by different generators.
When multiple processes are generating IDs, there's a higher chance of collisions, particularly with 64-bit IDs. But if you give each generator in your application a unique node identifier, like a virtual machine ID, container ID, running process ID, app instance ID, etc., you can effectively eliminate the risk of collisions. How you interpret this node identifier in your application is up to you.
If your application only has one process generating IDs, you don't have to be concerned about collisions.
It is a group of bits that is incremented with each new identifier and reset to a random value whenever the timestamp changes.
The purpose of these bits is to keep the identifiers in order, so that each new identifier is always greater than the one before. It also stops identifiers created by the same generator from accidentally being equal, because the values only move forward and never return to a previous one.
There's a chance of identifiers clashing if the system clock goes backward, but it's tough to avoid this issue.
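The counter behaviour described above could be sketched roughly as follows. This is not the code used by TSID Creator; the class name CounterSketch, the 12-bit counter width, and the choice to throw on overflow are all assumptions made for illustration.

```java
import java.security.SecureRandom;

final class CounterSketch {

    private static final int COUNTER_BITS = 12;                 // assumed width
    private static final int COUNTER_MAX = (1 << COUNTER_BITS) - 1;

    private final SecureRandom random = new SecureRandom();
    private long lastTime = -1L;
    private int counter;

    // Returns the counter value to use for an identifier created at timeMillis.
    synchronized int next(long timeMillis) {
        if (timeMillis != lastTime) {
            // The timestamp changed: reset the counter to a random value.
            // A clock that moved backward also lands here, which is where clashes can occur.
            lastTime = timeMillis;
            counter = random.nextInt(COUNTER_MAX + 1);
        } else if (counter == COUNTER_MAX) {
            // This sketch simply gives up when the counter is exhausted.
            throw new IllegalStateException("counter exhausted for this millisecond");
        } else {
            // Same millisecond: move forward by one.
            counter++;
        }
        return counter;
    }
}
```

Within one millisecond the values only increase, so two identifiers from the same generator cannot share a counter value for the same timestamp.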
I understand it might be a little confusing. When I first made the TSID Creator, the last 22 bits were entirely random. To simplify things, I decided to split these bits into two parts. I still call them "random" because both parts are set up randomly.
These two parts work together to ensure each identifier in the application is unique. The node identifier prevents conflicts between different generators, so identifiers from different generators won't be the same.
At the same time, the counter prevents conflicts within a single generator. This way, an identifier generated by one generator won't be the same as others generated by the same generator.
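To make the two parts concrete, here is a minimal sketch of how a 64-bit value could be assembled from a timestamp, a node identifier and a counter. It assumes the Snowflake-style order (timestamp first, then node, then counter) and a 22-bit tail shared by node and counter; the class and method names are hypothetical.

```java
final class LayoutSketch {

    // Bits shared by the node identifier and the counter.
    static final int RANDOM_BITS = 22;

    // Packs the three components into one long: time | node | counter.
    static long compose(long time, int node, int counter, int nodeBits) {
        int counterBits = RANDOM_BITS - nodeBits;
        long nodeMask = (1L << nodeBits) - 1;
        long counterMask = (1L << counterBits) - 1;
        return (time << RANDOM_BITS)
                | ((node & nodeMask) << counterBits)
                | (counter & counterMask);
    }

    // Extracts the node identifier again, given the same nodeBits.
    static int node(long tsid, int nodeBits) {
        int counterBits = RANDOM_BITS - nodeBits;
        return (int) ((tsid >>> counterBits) & ((1L << nodeBits) - 1));
    }
}
```

Two generators with different node values can never produce the same result for the same timestamp, because their node bits differ; the counter takes care of the rest within one generator.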
Because it is the encoding used by ULID (Crockford's base 32), and because it is very efficient.
Nothing prevents you from representing the TSID in another encoding of your choice, such as base-62.
In the current implementation of TSID Creator, this is the only encoding.
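For illustration, a 64-bit value can be written in Crockford's base 32 as 13 characters of 5 bits each (the first character carries only the top 4 bits). The following is a sketch of that idea, not the library's own encoder; the class name Base32Sketch is hypothetical.

```java
final class Base32Sketch {

    // Crockford's alphabet: digits and upper-case letters without I, L, O and U.
    private static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // Encodes a 64-bit value as 13 characters, most significant bits first.
    static String encode(long value) {
        char[] chars = new char[13];
        for (int i = 12; i >= 0; i--) {
            chars[i] = ALPHABET.charAt((int) (value & 0x1F));
            value >>>= 5;
        }
        return new String(chars);
    }

    // Decodes 13 characters produced by encode back into the 64-bit value.
    static long decode(String text) {
        if (text.length() != 13) {
            throw new IllegalArgumentException("expected 13 characters");
        }
        long value = 0L;
        for (int i = 0; i < text.length(); i++) {
            int index = ALPHABET.indexOf(text.charAt(i));
            if (index < 0) {
                throw new IllegalArgumentException("invalid character: " + text.charAt(i));
            }
            value = (value << 5) | index;
        }
        return value;
    }
}
```

Since the shift and mask operations are cheap and the output has a fixed length, this kind of encoding is fast, and the strings sort in the same order as the unsigned integer values.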
This might also appear confusing.
However, the TSID Creator actually has only one kind of identifier.
What varies are the bits set aside for the node identifier and the counter.
I could have gone with the 1024 node version, similar to what Twitter Snowflake used. I chose to divide it into 3 types because I thought it would be more useful.
Now, I understand that the 256 node version is the one most commonly used.
The TsidCreator class is the easiest way to generate TSIDs.
The TsidFactory class is the factory that actually creates the TSIDs. This class can be configured to create TSIDs however you see fit. For example, you can change the number of bits reserved for the node identifier, the start date of the timestamp, the random number generator, etc.
The Tsid class is a value object. In some applications, it may be more convenient to use a value object than a basic data type like long or String.
Because it was the default in Twitter Snowflake IDs.
In fact, the Twitter Snowflake timestamp is 41 bits long. I added 1 bit to turn the TSID into an unsigned integer, doubling the lifetime of TSIDs. It also made integer format sorting consistent with string format sorting.
Some implementations of the concept have different bit counts for timestamps. For example, the Mastodon timestamp is 48 bits long.
In the current implementation of TSID Creator, this number of bits cannot be changed.
Because it was the default in Twitter Snowflake IDs.
If you want, you can use any number of bits between 0 and 20. But if you do that, you also change the number of bits in the counter, because the two share the same 22 bits.
In Twitter Snowflake, the node identifier actually consisted of two parts: Datacenter ID and Worker ID. Together they take up 10 bits.
Because it was the default in Twitter Snowflake IDs.
If you want, you can use any number of bits between 2 and 22. But to do that, you have to change the number of bits of the node identifier.
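The trade-off between node bits and counter bits is plain arithmetic on the 22 bits they share. A small sketch, with a hypothetical class name and method names, assuming the 0 to 20 range for the node identifier given above:

```java
final class BitSplitSketch {

    // Bits shared by the node identifier and the counter.
    static final int RANDOM_BITS = 22;

    // Counter bits left over once the node bits are chosen.
    static int counterBits(int nodeBits) {
        if (nodeBits < 0 || nodeBits > 20) {
            throw new IllegalArgumentException("node bits must be between 0 and 20");
        }
        return RANDOM_BITS - nodeBits;
    }

    // Number of distinct node identifiers available.
    static long maxNodes(int nodeBits) {
        counterBits(nodeBits); // validates the range
        return 1L << nodeBits;
    }

    // Number of distinct counter values available.
    static long counterValues(int nodeBits) {
        return 1L << counterBits(nodeBits);
    }
}
```

With 10 node bits this gives 1024 nodes and a 12-bit counter, the Twitter Snowflake split. With 8 node bits it gives 256 nodes and a 14-bit counter. Every bit moved to one side is taken from the other.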
Nobody asked me any of this. I use this text format to try to better explain the decisions I had to make during the implementation of the TSID Creator. The questions I've included here are ones I think I would ask myself if I saw this project for the first time. Hope this is helpful.