CLDR-18271 More v47 release page work #4346

Merged
merged 3 commits into from
Feb 6, 2025
Changes from 1 commit
74 changes: 37 additions & 37 deletions docs/site/downloads/cldr-47.md
@@ -25,17 +25,21 @@ The most significant changes in this release are:
- Updated time zone data to tzdata 2025a
- [RBNF](#number-spellout-data-changes) (Number Spellout Data Improvements) for multiple languages
- Assorted transforms improvements
- Updated language matching for Afrikaans to English (en) from Dutch (nl) [CLDR-18198](https://unicode-org.atlassian.net/browse/CLDR-18198)
- Ordered scripts in descending order of usage per locale [CLDR-18155](https://unicode-org.atlassian.net/browse/CLDR-18155)
- Fixed invalid codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
- Updated population data
- Updated and revised population data
- Incorporates all changes from CLDR v46.1.
- [CLDR v46.1](https://cldr.unicode.org/downloads/cldr-46#461-changes) was a special release, which many users of CLDR (including ICU) have not updated to.
So the listed changes are relative to [CLDR v46.0](https://cldr.unicode.org/downloads/cldr-46).
- More explicit well-formedness and validity constraints for unit of measurement identifiers
- Addition of derived emoji annotations that were missing: emoji with skin tones facing right
- Fixes to make the ja, ko, yue, zh datetimeSkeletons useful for generating the standard patterns
- Improved date/time test data

For more details, see below.

### Locale Coverage Status

CLDR 47 was a closed cycle which means that locale data changes were limited to addition of new locales, and bug fixes.
This means that coverage levels for existing locales did not change in this release.
CLDR 47 focused on MessageFormat 2.0 and tooling for an expansion of DDL support.
Contributor

I thought you wanted to move this line up to the overview section?

Member Author

Yup, changed it just now (and a couple of minor other items)

It was a closed cycle: locale data changes were limited to bug fixes and the addition of new locales, mostly regional variants.

#### Current Levels

@@ -51,6 +55,8 @@ For a full listing, see [Coverage Levels](https://unicode.org/cldr/charts/dev/su

## [Specification Changes](https://www.unicode.org/reports/tr35/proposed.html)

**NOTE: the specification changes will be completed by the specification beta: only a few of them are listed here, and the Modifications section is not yet complete.**

The following are the most significant changes to the specification (LDML).

- Don't produce "Unknown City Time" for VVV and VVVV, use localized offset format instead ([CLDR-18237](https://unicode-org.atlassian.net/browse/CLDR-18237))
@@ -59,10 +65,14 @@ There are many more changes that are important to implementations, such as chang
See the [Modifications section](https://www.unicode.org/reports/tr35/proposed.html#Modifications) of the specification for details.

## Data Changes
**TBD: Flesh out overview items**
- Updated language matching for Afrikaans to English (en) from Dutch (nl) [CLDR-18198](https://unicode-org.atlassian.net/browse/CLDR-18198)
- Ordered scripts in `<languageData>` in descending order of usage per locale [CLDR-18155](https://unicode-org.atlassian.net/browse/CLDR-18155)
- Fixed certain invalid codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)

### DTD Changes

- TBD
Most of the DTD changes were in 46.1. One additional change was to order currency values in **TBD get ticket number**

For a full listing, see [Delta DTDs](https://unicode.org/cldr/charts/dev/supplemental/dtd_deltas.html).

@@ -80,15 +90,8 @@ For a full listing, see [¤¤BCP47 Delta](https://unicode.org/cldr/charts/dev/de
- Cleanups for current pattern variants `alt="alphaNextToNumber"` and `alt="noCurrency"`: These were introduced in CLDR 42
(per [CLDR-14336](https://unicode-org.atlassian.net/browse/CLDR-14336)) to provide a cleaner way of adjusting currency
patterns when an alphabetic currency symbol is used, or when a currency-style pattern is desired without a currency symbol
(as for use in a table). Some further adjustments were needed ([CLDR-17879](https://unicode-org.atlassian.net/browse/CLDR-17879)):
- Adjust coverage so that these variants are at moderate (not comprehensive) coverage for standard/accounting currency formats with
`numberSystem="latn"`, and so that `alt="alphaNextToNumber"` is at modern (not comprehensive) for other relevant number systems
in a locale. Coverage was already correct for other combinations of these attributes with various numberSystems.
- Adjust PathHeader so compact currency for relevant non-Latn number systems in a locale will appear in Survey Tool.
- In root, add an `alt="alphaNextToNumber"` variant for the standard/accounting currency patterns.
- Ensure that in the most commonly-used locales, for all relevant number systems in the locale, the standard/accounting currency
patterns have both `alt="alphaNextToNumber"` and `alt="noCurrency"` variants (inheriting as necessary), and the compact currency
formats have the `alt="alphaNextToNumber"` variants.
(as for use in a table). Gaps in the data coverage showed up, because the translators weren't shown the right values.
Fixes were made in [CLDR-17879](https://unicode-org.atlassian.net/browse/CLDR-17879).
- As noted below in [Migration](#migration), number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
should all have a `numberSystem` attribute, and such elements without a `numberSystem` attribute will be deprecated in CLDR 48. To
prepare for this, in CLDR 47, all such elements were either removed (if redundant) or corrected by adding a `numberSystem` attribute.
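
As a rough illustration of how a formatter might use the `alt="alphaNextToNumber"` variant described above, here is a minimal Python sketch. It assumes the currency symbol is placed before the number, so the symbol's last character is the one next to the digits. The function name `choose_currency_pattern`, the dictionary keys, and the sample patterns are hypothetical and are not taken from CLDR data or from any library API.

```python
def choose_currency_pattern(patterns, symbol):
    """Pick a currency pattern for a symbol placed before the number.

    Assumption: `patterns` maps "standard" (required) and
    "alphaNextToNumber" (optional) to pattern strings.
    """
    # With a leading symbol, its last character is the one next to the number.
    next_to_number_is_alpha = bool(symbol) and symbol[-1].isalpha()
    if next_to_number_is_alpha and "alphaNextToNumber" in patterns:
        return patterns["alphaNextToNumber"]
    # Otherwise fall back to the regular pattern.
    return patterns["standard"]


# Illustrative patterns only; real values come from the locale data.
SAMPLE_PATTERNS = {
    "standard": "¤#,##0.00",
    "alphaNextToNumber": "¤\u00a0#,##0.00",  # adds a no-break space
}
```

With these sample patterns, a symbol such as `$` keeps the standard pattern, while an alphabetic symbol such as `USD` selects the variant with the extra space.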
@@ -103,6 +106,7 @@ For a full listing, see [Delta Data](https://unicode.org/cldr/charts/dev/delta/i
### Collation Data Changes

- Two old `zh` collation variants are removed: big5han and gb2312.
They are no longer typically used, and only cover a fraction of the CJK ideographs.
([CLDR-16062](https://unicode-org.atlassian.net/browse/CLDR-16062))

### Number Spellout Data Changes
@@ -121,15 +125,15 @@ For a full listing, see [Delta Data](https://unicode.org/cldr/charts/dev/delta/i

### Segmentation Data Changes

- The word break tailorings for `fi` and `sv` are removed to align with recent discussions in the UTC
- The word break tailorings for `fi` and `sv` are removed to align with recent changes to the root collation
and recent changes to ICU behavior. ([CLDR-18272](https://unicode-org.atlassian.net/browse/CLDR-18272))

### Transform Data Changes

- A new `Hant-Latn` transform is added, and `Hans-Latn` is added as an alias for the existing `Hani-Latn`
transform. When the Unihan data `kMandarin` field has two values, the first is preferred for a `CN`/`Hans`
context, and is used by the `Hani-Latn`/`Hans-Latn` transform; the second is preferred for a `TW`/`Hant`
context, and is now used by the new `Hant-Latn` transform.
- A new `Hant-Latn` transform is added, and `Hans-Latn` is added as an alias for the existing `Hani-Latn` transform.
When the Unihan data `kMandarin` field has two values,
the first is preferred for a `CN`/`Hans` context, and is used by the `Hani-Latn`/`Hans-Latn` transform;
the second is preferred for a `TW`/`Hant` context, and is now used by the new `Hant-Latn` transform.
([CLDR-18080](https://unicode-org.atlassian.net/browse/CLDR-18080))
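
The selection rule for `kMandarin` described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the CLDR transform itself. It assumes the field value is a space-separated string holding one or two readings; the function name `pick_mandarin_reading` and the placeholder readings in the usage note are hypothetical.

```python
def pick_mandarin_reading(k_mandarin, script):
    """Choose a reading from a kMandarin field value for a script context.

    Assumption: `k_mandarin` is a space-separated string of one or two
    readings, and `script` is "Hans", "Hani", or "Hant".
    """
    readings = k_mandarin.split()
    if not readings:
        raise ValueError("empty kMandarin value")
    # The second value, when present, is preferred for a TW/Hant context.
    if script == "Hant" and len(readings) > 1:
        return readings[1]
    # The first value is preferred for CN/Hans (and for Hani).
    return readings[0]
```

For a made-up field value `"first second"`, the `Hans` and `Hani` contexts yield `first` and the `Hant` context yields `second`; a field with a single value yields that value in every context.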

### JSON Data Changes
@@ -166,33 +170,29 @@ In 46.0, but not in 47.0:

### Tooling Changes

- Assorted SurveyTool improvements including:
There were various SurveyTool improvements targeting expansion of DDL support and error detection, such as the following:
- Added a CLA check
-
- Improved validity checks for codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
- Improved ability to detect invalid URLs in the site and spec
- Improved validity checks for codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
- Improved ability to detect invalid URLs in the site and spec

### Keyboard Changes

- TBD

## Migration

- Number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
should all have a `numberSystem` attribute. In CLDR v48 such elements without a `numberSystem` attribute will be deprecated, and the
corresponding entries in root will be removed; these were only intended as a long-ago migration aid. See the relevant sections of the
LDML specification: [Number Symbols](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#Number_Symbols) and
[Number Formats](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#number-formats).
- Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed [CLDR-16004](https://unicode-org.atlassian.net/browse/CLDR-16004)
- The default week numbering will change to ISO instead of being based on the calendar week starting in CLDR 48 [CLDR-18275](https://unicode-org.atlassian.net/browse/CLDR-18275).
- Removal of number data without `numberSystem` attributes.
- Number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
should all have a `numberSystem` attribute. In CLDR v48 such elements without a `numberSystem` attribute will be deprecated, and the
corresponding entries in root will be removed; these were only intended as a long-ago migration aid. See the relevant sections of the
LDML specification: [Number Symbols](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#Number_Symbols) and
[Number Formats](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#number-formats).
- V48 advance warnings
- Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed [CLDR-16004](https://unicode-org.atlassian.net/browse/CLDR-16004)
- The default week numbering will change to ISO instead of being based on the calendar week starting in CLDR 48 [CLDR-18275](https://unicode-org.atlassian.net/browse/CLDR-18275).
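
To prepare for the `numberSystem` migration item above, an implementation that maintains its own LDML-style data could scan for the affected elements. The following Python sketch is one possible check, assuming those elements are direct children of a `<numbers>` element under the document root. The function name `elements_missing_number_system` and the `SAMPLE` snippet are hypothetical and are not taken from CLDR tooling or data.

```python
import xml.etree.ElementTree as ET

# Elements named in the migration note that should carry numberSystem.
NUMBER_ELEMENTS = {
    "symbols",
    "currencyFormats",
    "decimalFormats",
    "percentFormats",
    "scientificFormats",
}


def elements_missing_number_system(ldml_xml):
    """Return tags of number elements that lack a numberSystem attribute."""
    root = ET.fromstring(ldml_xml)
    numbers = root.find("numbers")
    if numbers is None:
        return []
    return [
        child.tag
        for child in numbers
        if child.tag in NUMBER_ELEMENTS and "numberSystem" not in child.attrib
    ]


# Made-up snippet for illustration; not real locale data.
SAMPLE = """<ldml><numbers>
  <symbols numberSystem="latn"><decimal>.</decimal></symbols>
  <decimalFormats><decimalFormatLength/></decimalFormats>
  <percentFormats numberSystem="latn"/>
</numbers></ldml>"""
```

Running the check on `SAMPLE` reports only `decimalFormats`, the one element written without a `numberSystem` attribute.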

## Known Issues

1. [CLDR-17095] The region-based firstDay value (see weekData) is currently used for several different purposes. In the future, some of these functions will be separated out:
- The day that should be shown as the first day of the week in a calendar view.
- The first day of the week (day 1) for weekday numbering.
- The first day of the week for week-of-year calendar calculations.

## Acknowledgments

Many people have made significant contributions to CLDR and LDML;