On 2025-07-11 at 20:57:03, Carlo Marcelo Arenas Belón wrote:
> except that it would be incorrect, as language tags are defined in RFC5646
> and are larger than that.
>
> most importantly, deriving language tags from locales provides some very
> useful tags when including the characters after the _, because zh_CN and
> zh_HK use completely different scripts, for example.

Yes, that's true.  You have some private use and some irregular tags and
you also have some tags that include scripts or country codes.

For instance, Swahili can be written in Latin or Arabic script.  As I
understand it, the Arabic script form is older and less common these
days, so if I learned Swahili (which I would like to), then I might only
learn the Latin script variant in a course.  I would need to specify
that script in the language code to be sure that I was presented with
content in a form that I could read and understand.

Similar concerns exist with the variants of Serbo-Croatian: some are
written in Latin scripts, some in Cyrillic, and some in both, and it's
not guaranteed that all speakers understand all forms.

And then there's pt-PT and pt-BR, which are not always mutually
intelligible.  Most free software I've seen ships these as separate
translations.

I don't want to implement language tag parsing here since we don't need
to do that.  I would like to do the simple thing to prevent commonly
used locales that don't represent actual language tags from being
included and not overengineer this design.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA