[RFC] New language: Lower Sorbian (dsb_DE) [BZ #23208]

Message ID 744203676.887878.1529363268253@poczta.nazwa.pl
State Superseded

Commit Message

Rafal Luzynski June 18, 2018, 11:07 p.m.
Here is a work-in-progress version of the new locale data, which was
originally posted in Bugzilla:

https://sourceware.org/bugzilla/show_bug.cgi?id=23208

Comments are welcome.  I am going to polish it and commit before the
end of June.

Regards,

Rafal

Comments

Rafal Luzynski June 18, 2018, 11:44 p.m. | #1
And here is my review:

> diff --git a/localedata/locales/dsb_DE b/localedata/locales/dsb_DE

> new file mode 100644

> index 0000000..71bca81

> --- /dev/null

> +++ b/localedata/locales/dsb_DE

> @@ -0,0 +1,256 @@

> +comment_char %

> +escape_char /

> +

> +% This file is part of the GNU C Library and contains locale data.

> +% The Free Software Foundation does not claim any copyright interest

> +% in the locale data contained in this file.  The foregoing does not

> +% affect the license of the GNU C Library as a whole.  It does not

> +% exempt you from the conditions of the license if your use would

> +% otherwise be governed by that license.

> +

> +% Lower Sorbian Language Locale for Germany

> +

> +% Source: information from Michael Wolf <milupo at sorbzilla de>

> +

> +LC_IDENTIFICATION

> +title      "Lower Sorbian locale for Germany"

> +source     "Information from Michael Wolf"

> +address    ""

> +contact    ""

> +email      ""

> +tel        ""

> +fax        ""


This is not obligatory, but would you like to add your contact details here?
If not, maybe we should set "bug-glibc-locales@gnu.org" as the email?

> [...]

> +LC_COLLATE

> +copy "iso14651_t1"

> +

> +% CLDR collation rules for Lower Sorbian:

> +% (see:https://unicode.org/cldr/trac/browser/trunk/common/collation/dsb.xml)

> +%

> [...]


We have agreed [1] to accept this chunk as is, even if it is not perfect
(I'm not saying it is imperfect, I'm just allowing for the possibility),
so we will have a chance to tweak it in the future.

> +% &E<ě<<<Ě

> +% &H<ch<<<cH<<<Ch<<<CH

> +% &[before 1] L<ł<<<Ł

> +% &N<ń<<<Ń

> +% &O<ó<<<Ó

> +% &R<ŕ<<<Ŕ

> +% &S<š<<<Š<ś<<<Ś

> +% &Z<ž<<<Ž<ź<<<Ź

> +%

> +% And CLDR also lists the following

> +% index characters:

> +% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/dsb.xml)

> +%

> +% <exemplarCharacters type="index">[A B C Č Ć D E F G H {Ch} I J K Ł L M N O P Q R S Š Ś T U V W X Y Z Ž Ź]</exemplarCharacters>

> +% <exemplarCharacters>[a b c č ć d e ě f g h {ch} i j k ł l m n ń o ó p q r ŕ s š ś t u v w x y z ž ź]</exemplarCharacters>

> +

> +% The characters ě, ń, ó, ŕ are usually used as lower case characters only,

> +% only in fully capitalized words they exist as upper case characters

> +% In contrast to Upper Sorbian, the character ř does not exist in Lower Sorbian

> +

> +

> +

> +

> +

> +


I think we can collapse this vertical space here.  One empty line
should be sufficient.
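
For readers who want to see the intended ordering in action, here is a
minimal Python sketch of a primary-level sort key built from the CLDR
exemplar characters quoted above.  It is an illustration only: glibc
applies the ordering through the LC_COLLATE rules in the patch, not through
code like this.  The names ALPHABET, RANK and dsb_key are hypothetical, the
sample strings are arbitrary, and case and accent tie-breaking are ignored.

```python
# Hypothetical illustration of the exemplar order; not how glibc collates.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "e", "ě", "f", "g", "h", "ch",
            "i", "j", "k", "ł", "l", "m", "n", "ń", "o", "ó", "p", "q",
            "r", "ŕ", "s", "š", "ś", "t", "u", "v", "w", "x", "y", "z",
            "ž", "ź"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def dsb_key(word):
    """Return a primary-level sort key, treating "ch" as a single letter."""
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("ch", i):
            # The digraph sorts as one letter, right after "h".
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the exemplar set sort after the alphabet.
            key.append(RANK.get(lowered[i], len(ALPHABET)))
            i += 1
    return key


# Arbitrary sample strings chosen to exercise c/č/ć, h/ch and ł/l.
words = ["lipa", "chto", "ćma", "cas", "łuka", "hyś", "čas", "ceło"]
result = sorted(words, key=dsb_key)
print(result)
```

Note that "ch" lands after "h" and "ł" lands before "l", matching the
"&H<ch" and "&[before 1] L<ł" rules.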

> +collating-element <c-h> from "<U0063><U0068>"

> +collating-element <c-H> from "<U0063><U0048>"

> +collating-element <C-h> from "<U0043><U0068>"

> +collating-element <C-H> from "<U0043><U0048>"

> +

> [...]

> +

> +reorder-end

> +

> +END LC_COLLATE

> +

> +LC_CTYPE

> +copy "i18n"

> +END LC_CTYPE


I'm not sure.  I have a feeling that something is missing here.
But if we can't figure out what, let's leave it as is.

> +LC_MESSAGES

> +yesexpr "^[+1hHyY]"

> +noexpr  "^[-0nN]"

> +yesstr  "jo"

> +nostr   "n<U011B>"

> +END LC_MESSAGES



If "yes" is "jo" in DSB then "yesexpr" must contain "jJ".  Also, since this
has been copied from HSB, I think that HSB should include "jJ" as well, for
compatibility with German.  Whether DSB should include "hH" for
compatibility with HSB... well, that is a question for you: are there DSB
computer users so used to HSB that they might press 'H' to answer "yes"?
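
To make the mismatch concrete, here is a small Python sketch that checks a
few answers against the "yesexpr" from the patch and against a hypothetical
variant with "jJ" added.  Python's re module is not the POSIX regex engine
glibc uses, but for a simple anchored bracket expression like this one the
two should behave the same.  SUGGESTED_YESEXPR is my assumption of what the
fix could look like, not something already in the patch.

```python
import re

# "yesexpr" exactly as quoted from the patch.
PATCH_YESEXPR = r"^[+1hHyY]"
# Hypothetical variant with "jJ" added, as suggested in this review.
SUGGESTED_YESEXPR = r"^[+1jJhHyY]"


def accepts(pattern, answer):
    """Return True if the answer matches the given yes-expression."""
    return re.search(pattern, answer) is not None


for answer in ("jo", "Jo", "y", "h"):
    print(answer,
          accepts(PATCH_YESEXPR, answer),
          accepts(SUGGESTED_YESEXPR, answer))
```

With the expression from the patch, typing the locale's own "yesstr" ("jo")
is not recognized as an affirmative answer.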

> +LC_MONETARY

> +copy "de_DE"

> +END LC_MONETARY

> +

> +LC_NUMERIC

> +copy "de_DE"

> +END LC_NUMERIC


Good, copy from "de_DE" whatever is common.

> +LC_TIME

> +abday   "Nj";"P<U00F3>";/

> +        "Wa";"Sr";/

> +        "St";"P<U011B>";/

> +        "So"

> +day     "Nje<U017A>ela";/

> +        "P<U00F3>n<U017A>ela";/


This says: "Pónźela" - CLDR says "pónjeźele".

> +        "Wa<U0142>tora";/

> +        "Srjoda";/

> +        "Stw<U00F3>rtk";/

> +        "P<U011B>tk";/

> +        "Sobota"


Do you want to start all weekday names with uppercase?  According to CLDR
it is not necessary, but if you think that weekday names usually appear at
the beginning of a sentence and therefore want to leave it like this,
then it is OK.

> +abmon   "Jan";"Feb";/

> +        "M<U011B>r";"Apr";/

> +        "Maj";"Jun";/

> +        "Jul";"Awg";/

> +        "Sep";"Okt";/

> +        "Now";"Dec"

> +alt_mon     "Januar";/


I will adjust spaces here.

> +        "Februar";/

> +        "M<U011B>rc";/

> +        "Apryl";/

> +        "Maj";/

> +        "Junij";/

> +        "Julij";/

> +        "Awgust";/

> +        "September";/

> +        "Oktober";/

> +        "Nowember";/

> +        "December"


Again, there is no reason to start the month names with uppercase
unless you think it is good because they will usually appear at the beginning
of a sentence (including standalone).

> +mon  "januara";/


Again I will adjust spaces here.

> +        "februara";/

> +        "m<U011B>rca";/

> +        "apryla";/

> +        "maja";/

> +        "junija";/

> +        "julija";/

> +        "awgusta";/

> +        "septembra";/

> +        "oktobra";/

> +        "nowembra";/

> +        "decembra"

> +d_t_fmt "%a %d %b %Y %T %Z"

> +d_fmt   "%d.%m.%Y"

> +t_fmt   "%T"

> +am_pm   "";""

> +t_fmt_ampm ""

> +

> +week    7;19971130;4

> +first_weekday 2

> +END LC_TIME


Otherwise looks good.
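
As a sanity check on "week" and "first_weekday", here is a short Python
sketch.  It assumes the usual glibc reading of these keywords: the second
field of "week" is a reference date falling on the weekday of the first
entry in "day", and "first_weekday" is a 1-based index into that list.  The
names WEEK_REFERENCE, FIRST_WEEKDAY and DAY are hypothetical; DAY holds the
decoded strings from the patch.

```python
from datetime import datetime

# Values quoted from the LC_TIME section of the patch.
WEEK_REFERENCE = "19971130"   # second field of "week 7;19971130;4"
FIRST_WEEKDAY = 2             # assumed to be a 1-based index into DAY
DAY = ["Njeźela", "Pónźela", "Wałtora", "Srjoda",
       "Stwórtk", "Pětk", "Sobota"]

reference = datetime.strptime(WEEK_REFERENCE, "%Y%m%d").date()

# isoweekday() numbers Monday as 1 and Sunday as 7.
print(reference.isoweekday())
# Under the assumption above, this is the first day of the week.
print(DAY[FIRST_WEEKDAY - 1])
```

The reference date is a Sunday, so the "day" list starts with Sunday and
"first_weekday 2" selects Monday, consistent with de_DE.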

> +LC_PAPER

> +copy "de_DE"

> +END LC_PAPER


Most of the locales use either “copy "i18n"” or “copy "en_US"”.

> +LC_TELEPHONE

> +copy "de_DE"

> +END LC_TELEPHONE

> +

> +LC_MEASUREMENT

> +copy "de_DE"

> +END LC_MEASUREMENT

> +

> +LC_NAME

> +name_fmt    "%d%t%g%t%m%t%f"

> +name_miss   "kn<U011B><U017E>na"

> +name_mr     "kn<U011B>z"

> +name_mrs    "kn<U011B>ni"

> +%name_ms     ""

> +END LC_NAME


What about:

name_ms     "kn<U011B>ni"

> +LC_ADDRESS

> +postal_fmt    "%f%N%a%N%d%N%b%N%s %h %e %r%N%z %T%N%c%N"

> +country_name  "Nimska"

> +country_post  "D"

> +country_ab2   "DE"

> +country_ab3   "DEU"

> +country_num   276

> +country_car   "D"

> +country_isbn  3

> +lang_name     "dolnoserb<U0161><U0107>ina"

> +lang_ab      ""

> +lang_term    "dsb"

> +lang_lib     "dsb"

> +END LC_ADDRESS


I can see this is copied from hsb_DE except for a few fields which
necessarily had to be changed.  Therefore I believe this is correct.

Again, thank you Michael.

Regards,

Rafal

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=23208#c4

Patch

From 7df137009b3bbaf5acff9d4cea6b788e95d126da Mon Sep 17 00:00:00 2001
From: Michael Wolf <milupo@sorbzilla.de>
Date: Fri, 8 Jun 2018 01:26:43 +0200
Subject: [PATCH] New language: Lower Sorbian (dsb_DE) [BZ #23208]

	[BZ #23208]
	* localedata/SUPPORTED (dsb_DE.UTF-8/UTF-8): New entry.
	* localedata/locales/dsb_DE: New file.
---
 localedata/SUPPORTED      |   1 +
 localedata/locales/dsb_DE | 256 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+)
 create mode 100644 localedata/locales/dsb_DE

diff --git a/localedata/SUPPORTED b/localedata/SUPPORTED
index ab5ac11..ee959de 100644
--- a/localedata/SUPPORTED
+++ b/localedata/SUPPORTED
@@ -119,6 +119,7 @@  de_LU.UTF-8/UTF-8 \
 de_LU/ISO-8859-1 \
 de_LU@euro/ISO-8859-15 \
 doi_IN/UTF-8 \
+dsb_DE.UTF-8/UTF-8 \
 dv_MV/UTF-8 \
 dz_BT/UTF-8 \
 el_GR.UTF-8/UTF-8 \
diff --git a/localedata/locales/dsb_DE b/localedata/locales/dsb_DE
new file mode 100644
index 0000000..71bca81
--- /dev/null
+++ b/localedata/locales/dsb_DE
@@ -0,0 +1,256 @@ 
+comment_char %
+escape_char /
+
+% This file is part of the GNU C Library and contains locale data.
+% The Free Software Foundation does not claim any copyright interest
+% in the locale data contained in this file.  The foregoing does not
+% affect the license of the GNU C Library as a whole.  It does not
+% exempt you from the conditions of the license if your use would
+% otherwise be governed by that license.
+
+% Lower Sorbian Language Locale for Germany
+
+% Source: information from Michael Wolf <milupo at sorbzilla de>
+
+LC_IDENTIFICATION
+title      "Lower Sorbian locale for Germany"
+source     "Information from Michael Wolf"
+address    ""
+contact    ""
+email      ""
+tel        ""
+fax        ""
+language   "Lower Sorbian"
+territory  "Germany"
+revision   "0.1"
+date       ""
+
+category "i18n:2012";LC_IDENTIFICATION
+category "i18n:2012";LC_CTYPE
+category "i18n:2012";LC_COLLATE
+category "i18n:2012";LC_TIME
+category "i18n:2012";LC_NUMERIC
+category "i18n:2012";LC_MONETARY
+category "i18n:2012";LC_MESSAGES
+category "i18n:2012";LC_PAPER
+category "i18n:2012";LC_NAME
+category "i18n:2012";LC_ADDRESS
+category "i18n:2012";LC_TELEPHONE
+category "i18n:2012";LC_MEASUREMENT
+END LC_IDENTIFICATION
+
+LC_COLLATE
+copy "iso14651_t1"
+
+% CLDR collation rules for Lower Sorbian:
+% (see:https://unicode.org/cldr/trac/browser/trunk/common/collation/dsb.xml)
+%
+% &C<č<<<Č<ć<<<Ć
+% &E<ě<<<Ě
+% &H<ch<<<cH<<<Ch<<<CH
+% &[before 1] L<ł<<<Ł
+% &N<ń<<<Ń
+% &O<ó<<<Ó
+% &R<ŕ<<<Ŕ
+% &S<š<<<Š<ś<<<Ś
+% &Z<ž<<<Ž<ź<<<Ź
+%
+% And CLDR also lists the following
+% index characters:
+% (see: https://unicode.org/cldr/trac/browser/trunk/common/main/dsb.xml)
+%
+% <exemplarCharacters type="index">[A B C Č Ć D E F G H {Ch} I J K Ł L M N O P Q R S Š Ś T U V W X Y Z Ž Ź]</exemplarCharacters>
+% <exemplarCharacters>[a b c č ć d e ě f g h {ch} i j k ł l m n ń o ó p q r ŕ s š ś t u v w x y z ž ź]</exemplarCharacters>
+
+% The characters ě, ń, ó, ŕ are usually used as lower case characters only,
+% only in fully capitalized words they exist as upper case characters
+% In contrast to Upper Sorbian, the character ř does not exist in Lower Sorbian
+
+
+
+
+
+
+collating-element <c-h> from "<U0063><U0068>"
+collating-element <c-H> from "<U0063><U0048>"
+collating-element <C-h> from "<U0043><U0068>"
+collating-element <C-H> from "<U0043><U0048>"
+
+collating-symbol <c-caron>
+collating-symbol <c-acute>
+collating-symbol <d-z-acute-digraph>
+collating-symbol <e-caron>
+collating-symbol <c-h-digraph>
+collating-symbol <l-stroke>
+collating-symbol <n-acute>
+collating-symbol <o-acute>
+collating-symbol <r-acute>
+collating-symbol <s-caron>
+collating-symbol <s-acute>
+collating-symbol <z-caron>
+collating-symbol <z-acute>
+
+reorder-after <AFTER-C>
+<c-caron>
+<c-acute>
+reorder-after <AFTER-D>
+<d-z-acute-digraph>
+reorder-after <AFTER-E>
+<e-caron>
+reorder-after <AFTER-H>
+<c-h-digraph>
+reorder-after <AFTER-K>
+<l-stroke>
+reorder-after <AFTER-N>
+<n-acute>
+reorder-after <AFTER-O>
+<o-acute>
+reorder-after <AFTER-R>
+<r-acute>
+reorder-after <AFTER-S>
+<s-caron>
+<s-acute>
+reorder-after <AFTER-Z>
+<z-caron>
+<z-acute>
+
+<U010D> <c-caron>;<BASE>;<MIN>;IGNORE % č
+<U010C> <c-caron>;<BASE>;<CAP>;IGNORE % Č
+<U0107> <c-acute>;<BASE>;<MIN>;IGNORE % ć
+<U0106> <c-acute>;<BASE>;<CAP>;IGNORE % Ć
+<d-z'> <d-z-acute-digraph>;<BASE>;"<MIN><MIN>";IGNORE % dź
+<d-Z'> <d-z-acute-digraph>;<BASE>;"<MIN><CAP>";IGNORE % dŹ
+<D-z'> <d-z-acute-digraph>;<BASE>;"<CAP><MIN>";IGNORE % Dź
+<D-Z'> <d-z-acute-digraph>;<BASE>;"<CAP><CAP>";IGNORE % DŹ
+<U011B> <e-caron>;<BASE>;<MIN>;IGNORE % ě
+<U011A> <e-caron>;<BASE>;<CAP>;IGNORE % Ě
+<c-h> <c-h-digraph>;<BASE>;"<MIN><MIN>";IGNORE % ch
+<c-H> <c-h-digraph>;<BASE>;"<MIN><CAP>";IGNORE % cH
+<C-h> <c-h-digraph>;<BASE>;"<CAP><MIN>";IGNORE % Ch
+<C-H> <c-h-digraph>;<BASE>;"<CAP><CAP>";IGNORE % CH
+<U0142> <l-stroke>;<BASE>;<MIN>;IGNORE % ł
+<U0141> <l-stroke>;<BASE>;<CAP>;IGNORE % Ł
+<U0144> <n-acute>;<BASE>;<MIN>;IGNORE % ń
+<U0143> <n-acute>;<BASE>;<CAP>;IGNORE % Ń
+<U00F3> <o-acute>;<BASE>;<MIN>;IGNORE % ó
+<U00D3> <o-acute>;<BASE>;<CAP>;IGNORE % Ó
+<U0155> <r-acute>;<BASE>;<MIN>;IGNORE % ŕ
+<U0154> <r-acute>;<BASE>;<CAP>;IGNORE % Ŕ
+<U0161> <s-caron>;<BASE>;<MIN>;IGNORE % š
+<U0160> <s-caron>;<BASE>;<CAP>;IGNORE % Š
+<U015B> <s-acute>;<BASE>;<MIN>;IGNORE % ś
+<U015A> <s-acute>;<BASE>;<CAP>;IGNORE % Ś
+<U017E> <z-caron>;<BASE>;<MIN>;IGNORE % ž
+<U017D> <z-caron>;<BASE>;<CAP>;IGNORE % Ž
+<U017A> <z-acute>;<BASE>;<MIN>;IGNORE % ź
+<U0179> <z-acute>;<BASE>;<CAP>;IGNORE % Ź
+
+reorder-end
+
+END LC_COLLATE
+
+LC_CTYPE
+copy "i18n"
+END LC_CTYPE
+
+LC_MESSAGES
+yesexpr "^[+1hHyY]"
+noexpr  "^[-0nN]"
+yesstr  "jo"
+nostr   "n<U011B>"
+END LC_MESSAGES
+
+LC_MONETARY
+copy "de_DE"
+END LC_MONETARY
+
+LC_NUMERIC
+copy "de_DE"
+END LC_NUMERIC
+
+LC_TIME
+abday   "Nj";"P<U00F3>";/
+        "Wa";"Sr";/
+        "St";"P<U011B>";/
+        "So"
+day     "Nje<U017A>ela";/
+        "P<U00F3>n<U017A>ela";/
+        "Wa<U0142>tora";/
+        "Srjoda";/
+        "Stw<U00F3>rtk";/
+        "P<U011B>tk";/
+        "Sobota"
+abmon   "Jan";"Feb";/
+        "M<U011B>r";"Apr";/
+        "Maj";"Jun";/
+        "Jul";"Awg";/
+        "Sep";"Okt";/
+        "Now";"Dec"
+alt_mon     "Januar";/
+        "Februar";/
+        "M<U011B>rc";/
+        "Apryl";/
+        "Maj";/
+        "Junij";/
+        "Julij";/
+        "Awgust";/
+        "September";/
+        "Oktober";/
+        "Nowember";/
+        "December"
+mon  "januara";/
+        "februara";/
+        "m<U011B>rca";/
+        "apryla";/
+        "maja";/
+        "junija";/
+        "julija";/
+        "awgusta";/
+        "septembra";/
+        "oktobra";/
+        "nowembra";/
+        "decembra"
+d_t_fmt "%a %d %b %Y %T %Z"
+d_fmt   "%d.%m.%Y"
+t_fmt   "%T"
+am_pm   "";""
+t_fmt_ampm ""
+
+week    7;19971130;4
+first_weekday 2
+END LC_TIME
+
+LC_PAPER
+copy "de_DE"
+END LC_PAPER
+
+LC_TELEPHONE
+copy "de_DE"
+END LC_TELEPHONE
+
+LC_MEASUREMENT
+copy "de_DE"
+END LC_MEASUREMENT
+
+LC_NAME
+name_fmt    "%d%t%g%t%m%t%f"
+name_miss   "kn<U011B><U017E>na"
+name_mr     "kn<U011B>z"
+name_mrs    "kn<U011B>ni"
+%name_ms     ""
+END LC_NAME
+
+LC_ADDRESS
+postal_fmt    "%f%N%a%N%d%N%b%N%s %h %e %r%N%z %T%N%c%N"
+country_name  "Nimska"
+country_post  "D"
+country_ab2   "DE"
+country_ab3   "DEU"
+country_num   276
+country_car   "D"
+country_isbn  3
+lang_name     "dolnoserb<U0161><U0107>ina"
+lang_ab      ""
+lang_term    "dsb"
+lang_lib     "dsb"
+END LC_ADDRESS
-- 
2.7.5