Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: 64-bit Integer Type #227

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 102 additions & 0 deletions docs/rfc/007-integer64-type.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
# RFC: Conjure 64-bit Integer Type

2019-02-14

_The 54-bit Integer is Not Enough_

Conjure currently supports three numeric types: `integer`, `double`, and `safelong`. Unfortunately for many use-cases, 2<sup>53</sup> - 1 is not large enough.

These cases are relatively common and often result in one of three failure scenarios:
1. Use safelong because it exists. We often measure nanoseconds, however a safelong can only represent three months of nanoseconds, which is long enough that it's unlikely to be caught in testing.
1. Represent 64-bit integer values as strings in conjure. This is only as correct as the application code which sets and parses values, which is precisely the problem Conjure is built to solve. In this case the conjure definition fails to accurately describe the wire API.
1. Use external type imports to polyfill the 64-bit integer type. This is incredibly dangerous, often resulting in code that behaves inconsistently between languages.

## Proposal

New conjure `integer64` type semantics matching `integer` except for the allowed values, and JSON serialized form.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: should we use long as the name? We match Java's naming for integer, boolean, double currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth between the two, I don't have a strong preference. I think integer<bits> is a bit cleaner if we ever need to add larger integer types, and allows us to alias integer to integer32 and safelong to integer54.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be on board with moving to integer32, float64, etc.


For the immediate future, typescript clients may represent these values using branded strings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me like this change should block on the typescript serialization layer, or we end up further down the path of this not really being a language-independent RPC layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing the typescript generator feature will make it easier to support this proposal with rich numerical types, though the serde layer implementation detail should block improving the Conjure specification unless a feature is provably unsupportable in a target language. By adding this type, we provide an incentive to improve the TS generator. Inverting that dependency would push us to optimize around the wrong details.

Once a serialization layer is implemented for the typescript client, they may take advantage of the `integer64`
type using the proposed [BigInt](https://github.com/tc39/proposal-bigint) type which is already supported by
chrome, with a [polyfill](https://github.com/GoogleChromeLabs/jsbi) for other browsers.

### Allowed values

The `integer64` type supports signed 64-bit values ranging from -2<sup>63</sup> to 2<sup>63</sup> - 1.

### JSON Format

The `integer64` type is encoded to JSON as a JSON string containing the base-10 value, otherwise javascript
clients may transparently truncate numeric values. The [BigInt](https://github.com/tc39/proposal-bigint)
specification does not modify how json is parsed, so values must use existing JSON types.

For example, take a type defined as:

```yml
types:
definitions:
objects:
Example:
fields:
integerField: integer
safelongField: safelong
integer64Field: integer64
```

Where each field takes the value `123`, would be serialized to json as:

```json
{
"integerField": 123,
"safelongField": 123,
"integer64Field": "123"
}
```

#### The case against using JSON numeric format

We use JSON numbers elsewhere in Conjure and they, by definition, better capture the integer data type. However,
in javascript `JSON.parse` is the most common json parsing function, and it result in _incorrect data_ when
used with numeric values beyond 52-bits. Even if we do provide a serialization layer capable of handling large
integer values, applications are still likely to attempt usage of `JSON.parse` resulting in data loss.

Secondly, many existing applications already polyfill the 64-bit integer type using string encoding with the
following block:

```yml
types:
imports:
Long:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type definition will only work if deserializers coerce strings to longs, but I'm pretty sure that feature was disabled recently in Java, for instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct, the reference java implementation recently broke this, however other internal java conjure implementations still support it. My goal for rfc is to standardize behavior, and avoid maintaining multiple implementations.

My preference is to provide a supported mechanism for this feature rather than revert the coercion change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as Rob and I noted, it then likely needs to be handled in all languages.

I think if you weren't dealing with math, that an additional identifier type would give equal treatment in all languages and simply make more efficient use of memory in some languages than others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are in agreement that the ecmascript bigint type is the correct binding for TS (as noted in the Proposal section). The full implementation requires a bit of work on our typescript generator, but I cannot invest my team in doing so until the specification change has been ratified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think we could agree to this specification without the pre-work from happening in TypeScript to support the then small-ish change this RFC would make.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: pending the typescript client serialization layer, you support this feature. Is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if and only there is a serialization layer.

base-type: string
external:
java: java.lang.Long
```

String encoded values allow these applications to upgrade and take advantage of the new type without breaking
existing clients. Without this implementation, it would not be possible to remove support for external type
imports without an impossibly severe API break.

### PLAIN Format

The PLAIN format of an `integer64` value is the base-10 numerical value.

## Alternatives Considered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to elaborate on the suitability of a JSON number representation, beyond what's written above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I will update this when I’m at a computer (responses will be a bit delayed for the next couple weeks as I’m ooto)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, let me know if there are points you'd like me to expand upon.

### Arbitrary-Precision Integers

While arbitrary-precision integers are more flexible, the goal of the conjure language is to describe data as
accurately as possible, which does not align with arbitrary precision. The need for such a type is dramatically
less common than 64-bit values, and the existing `binary` type provides sufficient utility to represent large
values.

### Use External Type Imports

Assuming java servers, and external type import for `java.lang.Long` may unblock in some cases, except it is
difficult to guarantee correctness and compatibility with typescript clients as conjure implementations evolve over time.
This is not an option for non-java servers, none of which support external type imports.
We would like to remove external type imports from conjure entirely, without a mechanism to support 64-bit integers
many consumers will be unable to accept new versions of conjure.

### Serialization using JSON Numeric encoding

See [The case against using JSON numeric format](#the-case-against-using-json-numeric-format).