Re: [PATCH 1/1] http: don't send C or POSIX in Accept-Language

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Fri, 11 Jul 2025 21:29:40 +0000

On 2025-07-11 at 20:57:03, Carlo Marcelo Arenas Belón wrote:
> except that it would be incorrect, as language tags are defined in RFC5646
> and are larger than that.
> 
> most importantly, deriving language tags from locales provides some very
> useful tags when including the characters after the _, because zh_CN and
> zh_HK use completely different scripts, for example.

Yes, that's true.  You have some private use and some irregular tags and
you also have some tags that include scripts or country codes.

For instance, Swahili can be written in Latin or Arabic script.  As I
understand it, the Arabic script form is older and less common these
days, so if I learned Swahili (which I would like to), then I might only
learn the Latin script variant in a course.  I would need to specify
that script in the language code to be sure that I was presented with
content in a form that I could read and understand.  Similar concerns
exist with the variants of Serbo-Croatian: some are written in Latin
script, some in Cyrillic, and some in both, and it's not guaranteed
that all speakers can read both scripts.

And then there's pt-PT and pt-BR, which are not always mutually
intelligible.  Most free software I've seen ships these as separate
translations.

I don't want to implement language tag parsing here since we don't need
to do that.  I would like to do the simple thing to prevent commonly
used locales that don't represent actual language tags from being
included and not overengineer this design.
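
To illustrate the "simple thing" described above, here is a minimal sketch
in Python.  It is not Git's actual code (Git is written in C), and the
helper name locale_to_language_tag is hypothetical.  It assumes the usual
POSIX locale shape lang[_TERRITORY][.codeset][@modifier] and only filters
out C and POSIX; it does no RFC 5646 validation.

```python
def locale_to_language_tag(locale_name):
    """Return an Accept-Language candidate, or None for non-language locales.

    Hypothetical helper for illustration only; assumes POSIX-style names
    such as "pt_BR.UTF-8" or "sr_RS@latin".
    """
    # Drop the codeset and modifier: "pt_BR.UTF-8@euro" -> "pt_BR".
    base = locale_name.split(".", 1)[0].split("@", 1)[0]

    # "C" and "POSIX" name the default locale, not a language, so skip them.
    # An empty name is skipped as well.
    if base in ("", "C", "POSIX"):
        return None

    # Locale names use "_" where language tags use "-".
    return base.replace("_", "-")


if __name__ == "__main__":
    for name in ("C", "POSIX", "C.UTF-8", "pt_BR.UTF-8", "zh_HK"):
        print(name, "->", locale_to_language_tag(name))
```

Note that this sketch discards the @modifier, so a locale that carries its
script there would lose that information; handling that properly is the
kind of language tag parsing this message argues against doing here.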
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA