Goun Lee edited this page May 15, 2018 · 2 revisions
  • Collation values come from the Unicode collation data. Each string maps to an array of numbers or an array of arrays of numbers. The lowest-level array has 4 levels: primary, secondary, tertiary, and quaternary. Primary = base character, secondary = case (upper case vs. lower case), tertiary = accents, quaternary = character variants.
  • For example, a and A have the same base character, so a = A at the primary level, but a != A at the secondary level. So "a" might have the sort key "[1]" (the same as "[1,0,0,0]") and "A" might have the sort key "[1,1]" (the same as "[1,1,0,0]"). Everything with 1 in the first position is a type of a. Everything with 0 in the second position is lower case, and everything with 1 there is upper case. The number in the 3rd position distinguishes the various accented versions of a, and the 4th position distinguishes variants of the character, such as precomposed/decomposed or half-width/full-width forms.
  • When you sort at primary strength, you ignore the 2nd to 4th positions and sort only by the 1st position. When you sort at secondary strength (case), you sort by both the 1st and 2nd positions and ignore the 3rd and 4th.
  • The Unicode collation data does not specify the actual numbers. Instead, it says things like "here is a block where each character is a primary difference from the previous one". For English, for example, it gets more complicated: a << A <<< à <<< á <<< â <<< ã <<< ä <<< å means that a is less than A with a secondary difference, A is less than à with a tertiary difference, and so on. That has to be converted into numbers a sort algorithm can use, and the algorithm for doing so is very complicated, so I just made the file by hand: if a=1, then b=2, etc.
  • There was an issue regarding collation data: https://github.com/enyojs/iLib/issues/51
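The sort-key layout and the strength rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not iLib's implementation: the table `SORT_KEYS`, its hand-assigned values, and the helpers `full_key` and `string_key` are hypothetical. A string's key lists all primary weights first, then all secondary weights, and so on, so a later level only breaks ties when every earlier level is equal.

```python
# Hypothetical hand-assigned sort keys, one list per character, using the
# layout described above: [primary (base letter), secondary (case),
# tertiary (accent), quaternary (variant)]. Missing trailing levels are 0,
# so [1] means [1, 0, 0, 0].
SORT_KEYS = {
    "a": [1],
    "A": [1, 1],
    "\u00e0": [1, 0, 1],  # à: lower case, accented
    "b": [2],
}

LEVELS = 4
STRENGTHS = {"primary": 1, "secondary": 2, "tertiary": 3, "quaternary": 4}


def full_key(ch):
    """Return the 4-level key for one character, padding missing levels with 0."""
    key = SORT_KEYS[ch]
    return tuple(key) + (0,) * (LEVELS - len(key))


def string_key(s, strength="tertiary"):
    """Build a comparable key for a whole string.

    All primary weights come first, then all secondary weights, and so on,
    stopping at the requested strength. Tuples compare element by element,
    so a later level only breaks ties when every earlier level is equal.
    """
    depth = STRENGTHS[strength]
    per_char = [full_key(ch) for ch in s]
    return tuple(tuple(k[level] for k in per_char) for level in range(depth))


if __name__ == "__main__":
    words = ["b", "A", "\u00e0", "a"]
    print(sorted(words, key=lambda s: string_key(s, "tertiary")))
```

With these keys, sorting ["b", "A", "à", "a"] by `string_key(s, "tertiary")` yields a, à, A, b, while at primary strength a, A and à all compare equal because only the first position is kept.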
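Turning a rule chain such as a << A <<< à into numbers can be sketched for the simplest case. This is a hypothetical helper, `weights_from_chain`, and not the full algorithm mentioned above: it assumes a single chain of space-separated items joined by `<` (primary), `<<` (secondary) and `<<<` (tertiary) relations, and it ignores resets, contractions, and expansions. Each relation increments the weight at its level and resets the later levels to 0, so every item sorts after the one before it.

```python
# Hypothetical helper: the level each relation operator affects
# (0 = primary, 1 = secondary, 2 = tertiary).
RELATIONS = {"<": 0, "<<": 1, "<<<": 2}


def weights_from_chain(rule, levels=4):
    """Assign weight vectors to a single chain such as "a << A <<< à".

    Simplified sketch: tokens must be separated by spaces, and only the
    <, << and <<< operators are supported (no resets or contractions).
    """
    tokens = rule.split()
    if not tokens:
        return {}
    current = [1] + [0] * (levels - 1)  # the first item gets [1, 0, 0, 0]
    weights = {tokens[0]: list(current)}
    # After the first item, tokens alternate: operator, item, operator, item...
    for op, item in zip(tokens[1::2], tokens[2::2]):
        level = RELATIONS[op]
        current[level] += 1  # one step greater at this level
        for later in range(level + 1, levels):
            current[later] = 0  # later levels restart from 0
        weights[item] = list(current)
    return weights


if __name__ == "__main__":
    chain = "a << A <<< \u00e0 <<< \u00e1 < b"
    for item, weight in weights_from_chain(chain).items():
        print(item, weight)
```

For "a << A <<< à <<< á < b" this assigns a=[1,0,0,0], A=[1,1,0,0], à=[1,1,1,0], á=[1,1,2,0] and b=[2,0,0,0]. The numbers only record position in the chain, which is why à inherits A's secondary weight here; a hand-made table can instead assign values by meaning (case, accent).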