[v4,0/4] Add new C.UTF-8 locale (Bug 17318)

Message ID 20210428130033.3196848-1-carlos@redhat.com

Adhemerval Zanella via Libc-alpha April 28, 2021, 1 p.m.
To make implementing the C.UTF-8 locale easier, several steps
should be taken before the locale is added:

1) Implement wide ellipsis range handling for UTF-8 to simplify
   the LC_COLLATE description in the locale.
2) Update the UTF-8 charmap processing to include all code points
   (excluding surrogates) and make use of the wide ellipsis ranges.
3) Regenerate the UTF-8 character map with the new characters
   for full code point coverage.
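The charmap side of steps 1) to 3) can be illustrated with a small
Python sketch. It is not utf8_gen.py, and every function name in it
is hypothetical. It assumes the glibc charmap notation (<Uxxxx> inside
the BMP, <Uxxxxxxxx> above it, /xNN byte sequences) and assumes that
an ellipsis line <Ua>..<Ub> may only span code points whose UTF-8
forms differ in the last byte alone, i.e. an aligned block of 64
code points, mirroring ranges such as <U3400>..<U343F> in the existing
UTF-8 charmap.

```python
# Hypothetical sketch (not utf8_gen.py): emit UTF-8 charmap-style lines
# that cover every Unicode scalar value, i.e. all code points except the
# surrogates U+D800..U+DFFF, using "<Ua>..<Ub>" ellipsis ranges.

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def sym(cp):
    """Symbolic name: <Uxxxx> inside the BMP, <Uxxxxxxxx> above it."""
    return f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>"


def utf8_bytes(cp):
    """Charmap byte notation for cp, e.g. /xe3/x90/x80 for U+3400.

    Raises UnicodeEncodeError for surrogates, which have no UTF-8 form.
    """
    return "".join(f"/x{b:02x}" for b in chr(cp).encode("utf-8"))


def split_range(start, end):
    """Split [start, end] into chunks in which only the last UTF-8 byte
    changes (the assumed requirement for one ellipsis line)."""
    chunks = []
    cp = start
    while cp <= end:
        if cp < 0x80:
            block_end = 0x7F       # ASCII: the single byte is the code point
        else:
            block_end = cp | 0x3F  # last byte holds the low 6 bits of cp
        chunk_end = min(end, block_end)
        chunks.append((cp, chunk_end))
        cp = chunk_end + 1
    return chunks


def charmap_lines(start, end, name):
    """Charmap lines for [start, end]; multi-code-point chunks use '..'."""
    lines = []
    for lo, hi in split_range(start, end):
        first = utf8_bytes(lo)  # bytes of the first code point in the chunk
        if lo == hi:
            lines.append(f"{sym(lo)} {first} {name}")
        else:
            lines.append(f"{sym(lo)}..{sym(hi)} {first} {name}")
    return lines


def scalar_value_ranges():
    """All code points minus the surrogate block, as inclusive ranges."""
    return [(0x0000, SURROGATE_FIRST - 1),
            (SURROGATE_LAST + 1, MAX_CODE_POINT)]


if __name__ == "__main__":
    for line in charmap_lines(0x3400, 0x347F, "<CJK Ideograph Extension A>"):
        print(line)
```

Running it prints two lines, <U3400>..<U343F> /xe3/x90/x80 and
<U3440>..<U347F> /xe3/x91/x80, each followed by the name. Together,
scalar_value_ranges() covers 1,112,064 code points (0x110000 minus
the 2,048 surrogates), which is the "full code point coverage" the
steps aim for.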

The new C.UTF-8 locale is not added to SUPPORTED because it is
28 MiB in size, due to the size of the weights array in LC_COLLATE
for the full set of code points. Before we can make C.UTF-8
supported, we must simplify the weights processing to use strcmp
and remove the weights array from the binary data. To some extent,
this is a reference implementation against which we can test a
newer version, or a built-in version, that has the size and
performance we expect.
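Replacing the weights array with strcmp only works if byte order
matches the intended collation order. A minimal Python check of that
property, assuming C.UTF-8 is meant to collate by plain code point
order (an assumption; the text above does not spell it out): for
well-formed UTF-8 without surrogates, lexicographic comparison of the
encoded bytes, which is what strcmp does on unsigned char values,
agrees with code point order. The helper names and sampled strings
are illustrative only.

```python
# Sketch: byte-wise comparison of well-formed UTF-8 (as strcmp does on
# unsigned char values) orders strings the same way as comparing their
# code points one by one. Helper names are hypothetical.
import random

# Code points at the edges of the 1-, 2-, 3- and 4-byte UTF-8 forms and
# on both sides of the surrogate gap.
BOUNDARIES = [0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xD7FF, 0xE000,
              0xFFFF, 0x10000, 0x10FFFF]


def random_scalar(rng):
    """Random code point outside the surrogate block U+D800..U+DFFF."""
    while True:
        cp = rng.randrange(0x110000)
        if not 0xD800 <= cp <= 0xDFFF:
            return cp


def sample_strings(n, seed=0):
    """n short strings mixing boundary and random scalar values."""
    rng = random.Random(seed)
    strings = []
    for _ in range(n):
        chars = []
        for _ in range(rng.randint(1, 4)):
            if rng.random() < 0.5:
                cp = rng.choice(BOUNDARIES)
            else:
                cp = random_scalar(rng)
            chars.append(chr(cp))
        strings.append("".join(chars))
    return strings


def code_point_key(s):
    """Collation key under plain code point order."""
    return [ord(c) for c in s]


def utf8_key(s):
    # Python compares bytes lexicographically as unsigned values,
    # like strcmp on NUL-free strings.
    return s.encode("utf-8")


def orders_agree(strings):
    """True if sorting by code points and by UTF-8 bytes agree."""
    return sorted(strings, key=code_point_key) == sorted(strings, key=utf8_key)


if __name__ == "__main__":
    print(orders_agree(sample_strings(5000)))
```

Running it prints True. Because UTF-8 encoding is injective, equal
keys can only come from equal strings, so the two sorted lists match
exactly rather than merely up to ties.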

Carlos O'Donell (4):
  Add support for processing wide ellipsis ranges in UTF-8.
  Update UTF-8 charmap processing.
  Regenerate localedata files.
  Add generic C.UTF-8 locale (Bug 17318)

 locale/programs/charmap.c              |  174 +-
 localedata/C.UTF-8.in                  |  156 +
 localedata/Makefile                    |    2 +
 localedata/charmaps/UTF-8              | 4396 ++++--------------------
 localedata/locales/C                   |  188 +
 localedata/locales/i18n_ctype          |    2 +-
 localedata/locales/tr_TR               |    2 +-
 localedata/locales/translit_circle     |    2 +-
 localedata/locales/translit_cjk_compat |    2 +-
 localedata/locales/translit_combining  |    2 +-
 localedata/locales/translit_compat     |    2 +-
 localedata/locales/translit_font       |    2 +-
 localedata/locales/translit_fraction   |    2 +-
 localedata/unicode-gen/utf8_gen.py     |  133 +-
 14 files changed, 1288 insertions(+), 3777 deletions(-)
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

-- 
2.26.3