add intdot as version scheme #166

Open
tschmidtb51 opened this issue Apr 27, 2022 · 11 comments

@tschmidtb51
Contributor

I suggest adding a version scheme intdot, defined as follows.
To compare two versions a and b:

def compare(a, b):
    A = a.split('.'); B = b.split('.')   # split each version on .
    for i in range(min(len(A), len(B))):
        # the first differing field decides
        if int(A[i]) > int(B[i]):
            return 'A greater than B'
        elif int(A[i]) < int(B[i]):
            return 'B greater than A'

    # all shared fields are equal: the version with more fields is greater
    if len(A) > len(B):
        return 'A greater than B'
    if len(B) > len(A):
        return 'B greater than A'
    return 'A matches B'

This will cover 80% of the currently used version schemes that do not follow one of the well-known version-schemes.

Thoughts?

Flagging @pombredanne for comments.

@tschmidtb51
Contributor Author

@pombredanne What would be the process to get this included?

@immqu

immqu commented Nov 13, 2024

This scheme would essentially be like semver but would allow more labels, e.g., 1.2.3.4.5, correct?

@tschmidtb51
Contributor Author

More or less: it just splits at the . and compares each part as an integer. So your example is valid, as are 2020.0001 and 2.002.10.1.1.1.1.1.
However, any prerelease or build part (in the SemVer sense) would be ignored...

@immqu

immqu commented Nov 15, 2024

So, we allow any kind of prerelease/build part, like 1.39.2828-alpha and 1.39.2828.pre, but simply ignore it in the comparisons?

@tschmidtb51
Contributor Author

I'm currently not sure what the best way would be. The algorithm in the description does a straight cast to int... Would that be appropriate? Or would 1.1-100 be then interpreted as [1, -99]? That would not be intended. Also, 10e2 should not be parsed as 10*10^2...
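
To make the concern about a straight cast concrete, here is a small sketch of what Python's built-in int() accepts and rejects. The helper name lenient_int is made up for illustration, and other languages behave differently, so this only shows that the answer depends on the implementation language.

```python
def lenient_int(text):
    """Return int(text), or None if int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None

# Accepted by int(), although probably not intended as intdot fields:
print(lenient_int("-100"))   # -100: a leading sign is allowed
print(lenient_int(" 7 "))    # 7: surrounding whitespace is stripped
print(lenient_int("1_0"))    # 10: underscores between digits are allowed

# Rejected by int():
print(lenient_int("10e2"))   # None: no exponent notation
print(lenient_int("1-100"))  # None: the second field of "1.1-100" is not an integer
```

So in Python, 1.1-100 would raise an error on the second field rather than turn into [1, -99], but a field such as -100 on its own would be parsed as a negative number.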

So, please provide a suggestion as you go.

@immqu

immqu commented Nov 15, 2024

The easiest is to write a regex that separates the two groups, see my current draft.

@matt-phylum
Contributor

matt-phylum commented Nov 15, 2024

Things to be careful of:

  • Is "0b10" 0 or 2 or 10 or invalid?
  • Is "010" 8 or 10?
  • Is "0x10" 0 or 10 or 16 or invalid?
  • Is "a10" 10?
  • Is "10a" 10? ECMAScript parseInt thinks so.
  • Is "1a0" 1 or 10? parseInt says 1.
  • Is "١٠" (Arabic-Indic digits) 10? It is in Java but not in ECMAScript (formerly Javascript). Implementing this depends on having decent Unicode support. (univers draft bug: "1" < "٠")
  • Is "𐒡𐒠" (Osmanya digits) 10? These are newer characters outside the Unicode BMP so it won't be 10 if your Unicode implementation is old or broken (like Maven's). It's easy to say that broken Unicode implementations are wrong, but Unicode versioning is a much more complicated issue to deal with because most implementations will just use whatever version is provided by the most convenient string implementation and they'll behave almost the same. If vers is supposed to describe an existing version scheme (why else would vers support this?), exact alignment here may be difficult. These characters are valid in Maven versions, but they are treated as letters instead of numbers.
  • Is "2147483648" (2^31) a valid version number? Some languages don't support unsigned integers.
  • Is "4294967296" (2^32) a valid version number? Some languages don't handle 64-bit integers or may perform a lossy conversion to floating point.
  • Is "9007199254740992" (2^53) less than "9007199254740993"? In ECMAScript these can be equal.
  • Is "18446744073709551616" (2^64) a valid version number? Some languages don't support 128-bit integers.
  • Is "01" greater than, less than, or equal to "1"? It's less in lexicographical sorting and greater for some broken digit-by-digit comparisons (univers draft bug), but probably should be equal.
  • Is "1𐒠" greater than, less than, or equal to "10"? This is similar, but the zero is the Osmanya digit 0, and it's very common for Unicode string implementations to report the length of "1𐒠" as 3.

The safest option is to specify that the digits must be ASCII digits and include either a maximum supported size or some test cases with excessively large numbers to catch implementations that have unexpected limits.
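
A minimal sketch of the "ASCII digits only" option, assuming Python, where int is arbitrary precision so the large values from the list above need no special handling. The names ASCII_DIGITS and parse_field are hypothetical and not taken from univers.

```python
import re

# [0-9] rather than \d: in Python str patterns, \d also matches
# non-ASCII decimal digits such as Arabic-Indic ones.
ASCII_DIGITS = re.compile(r"[0-9]+")

def parse_field(text):
    """Parse one version field, accepting ASCII digits only."""
    if ASCII_DIGITS.fullmatch(text) is None:
        raise ValueError(f"not an ASCII-digit field: {text!r}")
    return int(text)  # base 10, so "010" is ten, not eight

# Large-number test cases in the spirit of the list above.
assert parse_field("18446744073709551616") == 2**64
assert parse_field("9007199254740992") < parse_field("9007199254740993")
assert parse_field("01") == parse_field("1")
```

With this check, "0x10", "0b10", "a10", "10a" and "١٠" are all rejected as invalid instead of being silently reinterpreted.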

@tschmidtb51
Contributor Author

@matt-phylum Thank you for your comments and insights.

I would suggest:

  • ASCII digits-only
  • test cases with excessively large numbers
  • no support for non-digits => break-off at the first one
  • leading zeros are ignored
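
One possible reading of these four rules, sketched in Python. The names intdot_key and compare are hypothetical, and "break-off at the first one" is interpreted here as "stop parsing at the first character that is neither an ASCII digit nor a separating dot", which is an assumption.

```python
import re

LEADING_DIGITS = re.compile(r"[0-9]*")  # ASCII digits only

def intdot_key(version):
    """Return the list of integer fields that take part in the comparison."""
    fields = []
    for part in version.split("."):
        digits = LEADING_DIGITS.match(part).group()
        if not digits:
            break                    # field has no leading digits: stop here
        fields.append(int(digits))   # int() ignores leading zeros
        if len(digits) < len(part):
            break                    # non-digit found: ignore everything after it
    return fields

def compare(a, b):
    """Return -1, 0 or 1; a longer version wins if all shared fields are equal."""
    ka, kb = intdot_key(a), intdot_key(b)
    return (ka > kb) - (ka < kb)

print(intdot_key("2.002.10.1.1.1.1.1"))  # [2, 2, 10, 1, 1, 1, 1, 1]
print(intdot_key("1.39.2828-alpha"))     # [1, 39, 2828]
print(compare("01", "1"))                # 0
```

Python compares lists element by element and treats a strict prefix as smaller, which matches the length rule in the original description.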

@immqu

immqu commented Nov 21, 2024

See the univers PR here: aboutcode-org/univers#148

@fvsamson

fvsamson commented Nov 27, 2024

> Or would 1.1-100 be then interpreted as [1, -99]?

No, absolutely not. First of all, I believe it is crucial to use proper, common terms which were coined by RPM and DPKG. This also became obvious when reading this statement:

> So, we allow any kind of prerelease/build part, like 1.39.2828-alpha and 1.39.2828.pre, but simply ignore it in the comparisons?

The definition for a complete versioning string is <version>-<release>.
Hence 1.39.2828.pre is not a full version string, but only the <version> part of one.

  • The <release> part

    Calling the <release> part the "build part" correctly identifies its intended use. Unfortunately, many do not understand that it should be used solely to denote a new packaging round, not any change outside of the packaging configuration. A new packaging round includes changes to the RPM spec file and/or the RPM changelog, but also simply another packaging run, e.g. a time-based CI run that generates "nightlies".

    The <release> part can basically contain any printable character except for whitespace (IIRC) and the dash (-), because the <release> part is separated from the <version> part by the last dash in the full versioning string.

    The <release> part must be evaluated to determine the order of versioning strings with an identical <version> part.

    The <release> part is always evaluated as a single field, i.e. the full <release> parts are used for comparisons.

  • The <version> part

    The <version> part comprises fields separated by dots (.).

    The fields of the <version> part must be compared one by one, until a field differs or the final dash of the whole versioning string is parsed (indicating identical <version> parts in a comparison).

    IIRC, the first field of the <version> part must start with a number.

    IIRC, the <version> part can basically contain any printable character except for whitespace.

    IIRC, no other limitations exist for <version> part.
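
The split described above (last dash separates <version> from <release>, dots separate the fields of <version>) could be sketched as follows. This only illustrates the description given in this comment from memory, it is not RPM's actual implementation, and the function name split_version_release is made up.

```python
def split_version_release(full):
    """Split a full versioning string at its last dash.

    Returns (version_fields, release); release is "" when there is no dash.
    """
    version, dash, release = full.rpartition("-")
    if not dash:
        # No dash at all: the whole string is the <version> part.
        version, release = full, ""
    return version.split("."), release

print(split_version_release("1.39.2828-alpha"))  # (['1', '39', '2828'], 'alpha')
print(split_version_release("1.39.2828.pre"))    # (['1', '39', '2828', 'pre'], '')
```

Under this reading, 1.39.2828.pre has an empty <release> part, and pre is simply the fourth field of the <version> part.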

I will try to obtain the comprehensive description of the evaluation algorithm RPM uses, which was once (at least from 2003 to 2020) publicly available at https://www.redhat.com/archives/rpm-list/2003-January/msg00182.html, then moved behind a login wall to https://listman.redhat.com/mailman/private/rpm-list/2003-January/msg00182.html, and now it seems to be completely inaccessible.

I thoroughly studied the algorithm for versioning string evaluations and comparisons documented in this message in 2018 (but unfortunately did not save it), and AFAIR 1a.bc-+*~.xyz9-0O0-foobar2000!$§%&4rhubarb is a valid versioning string (at least for RPM) comprising the <version> part 1a.bc-+*~.xyz9-0O0 and the <release> part foobar2000!$§%&4rhubarb. All characters in both parts must be evaluated for comparisons! But how exactly that is supposed to happen escapes me six years after reading that message. I will try to get hold of it again, but an exhaustive web search has been futile so far, so that may take a while (maybe someone else might try, too; I have not yet consulted archive.org).

Consequently, IIRC, a rule like this is more or less a no-go (or rather, it will fail for some valid cases):

  • no support for non-digits => break-off at the first one

@immqu

immqu commented Nov 28, 2024

Thank you for the explanations and fixing the link!
My understanding is that we want to allow for a versioning scheme that is defined by the regex ^([0-9]+(\.[0-9]+)*)(.*)$, i.e., a scheme that does not follow a standard scheme like rpm (which is already implemented in univers). Instead, we want to allow for versioning strings like 1.2.3pre in this scheme, because we assume that that is a common scheme found in the wild which we want to cover as well. What do you think @tschmidtb51 ?
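
For reference, a small Python sketch of how that regex separates the dotted integer prefix from whatever follows. The helper name split_numeric_prefix is hypothetical, and whether the suffix is ignored or compared in some way is left open here.

```python
import re

# The regex from the comment above: dotted ASCII integers, then an arbitrary suffix.
INTDOT_RE = re.compile(r"^([0-9]+(\.[0-9]+)*)(.*)$")

def split_numeric_prefix(version):
    """Return (integer_fields, suffix), or None if the string does not match."""
    m = INTDOT_RE.match(version)
    if m is None:
        return None
    fields = [int(f) for f in m.group(1).split(".")]
    return fields, m.group(3)

print(split_numeric_prefix("1.2.3pre"))         # ([1, 2, 3], 'pre')
print(split_numeric_prefix("1.39.2828-alpha"))  # ([1, 39, 2828], '-alpha')
print(split_numeric_prefix("1.39.2828.pre"))    # ([1, 39, 2828], '.pre')
```

A string that does not start with an ASCII digit, such as a10, does not match at all, so an implementation would have to decide whether to reject it or treat it as an empty prefix.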
