On 7/10/25 6:16 PM, brian m. carlson wrote:
> The LANGUAGE environment variable is not specified by POSIX, but a
> variety of programs using GNU gettext accept it. The Linux manpages
> state that it can contain a colon-separated list of locales.
>
> However, not all locales are valid as languages. The C and POSIX
> locales, for instance, are not languages and are not registered with
> IANA, nor are they a part of ISO 639. In fact, "C" is too short to
> match the ABNF production for a language, which must be at least two
> characters in length.
>
> Nonetheless, many users provide these values in the LANGUAGE environment
> variable for unknown reasons and if they do, we do not want to send a
> malformed Accept-Language header to the server. If there are no other
> valid language tags, then send no header; otherwise, send only the valid
> tags, ignoring "C" and "POSIX" wherever they may appear, as well as any
> variants (such as the "C.UTF-8" locale found on some Linux systems).

Better docs -- the gettext manpages suck:

https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html
https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

At minimum this commit message needs revising. Gettext was adopted into
POSIX 2024 (Issue 8). Respected by tools of course:

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/gettext.html#tag_20_54_08
https://pubs.opengroup.org/onlinepubs/9799919799/functions/gettext.html

$LANGUAGE docs can be found at
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02

"""
The value of LANGUAGE shall be a list of locale names separated by a
<colon> (':') character. If LANGUAGE is set to a non-empty string, each
locale name shall be tried in the specified order and if a messages
object is found, it shall be used for translation.

If a locale name has the format
language[_territory][.codeset][@modifier], additional searches of locale
names without .codeset (if present), without _territory (if present),
and without @modifier (if present) may be performed
"""

And, for locale name values,

"""
If the locale value is "C" or "POSIX", the POSIX locale shall be used
and the standard utilities behave in accordance with the rules in 7.2
POSIX Locale for the associated category.

If the locale value begins with a <slash>, it shall be interpreted as
the pathname of a file that was created in the output format used by the
localedef utility; see OUTPUT FILES under localedef. Referencing such a
pathname shall result in that locale being used for the indicated
category.

[XSI] [Option Start] If the locale value has the form:

    language[_territory][.codeset]

it refers to an implementation-provided locale, where settings of
language, territory, and codeset are implementation-defined.

LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME
are defined to accept an additional field @modifier, which allows the
user to select a specific instance of localization data within a single
category (for example, for selecting the dictionary as opposed to the
character ordering of data). The syntax for these environment variables
is thus defined as:

    [language[_territory][.codeset][@modifier]]
"""

Your tests and code are probably broken -- they appear to normalize
nearly none of the standard grammar into valid Accept-Language entries.

Of course, "surely nobody actually does that" (except when they do!) --
but it's a relatively simple grammar structure, simply getting the
"shape" correct seems like a good idea.

--
Eli Schwartz
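
To illustrate what getting the "shape" right could look like, here is a
rough sketch in Python (not the patch's C code, and not an existing Git
or gettext API). The function names locale_to_tag and language_to_tags
are made up for this example. It assumes that a language is two or three
ASCII letters and a territory is two letters or three digits, and it
simply drops .codeset and @modifier rather than mapping modifiers such as
@latin onto a script subtag.

```python
import re

# Assumed shapes, loosely following the Accept-Language tag grammar:
# a 2-3 letter language and a 2-letter or 3-digit region.
_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
_REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")


def locale_to_tag(name):
    """Map one language[_territory][.codeset][@modifier] name to a tag.

    Returns None when the name cannot be expressed as a language tag
    (empty entry, localedef pathname, "C"/"POSIX" and their variants).
    """
    if not name or name.startswith("/"):
        return None
    # Drop @modifier first, then .codeset, leaving language[_territory].
    base = name.partition("@")[0].partition(".")[0]
    if base in ("C", "POSIX"):
        return None
    language, _, territory = base.partition("_")
    if not _LANGUAGE.fullmatch(language):
        return None
    tag = language.lower()
    if territory and _REGION.fullmatch(territory):
        tag += "-" + territory.upper()
    return tag


def language_to_tags(value):
    """Convert a colon-separated LANGUAGE value into an ordered tag list."""
    tags = []
    for name in value.split(":"):
        tag = locale_to_tag(name)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags
```

With this sketch, a value such as "de_DE.UTF-8@euro:C.UTF-8:en_US:C"
would yield the tags de-DE and en-US, and a value consisting only of
"C" or "POSIX" entries would yield an empty list, in which case no
header would be sent.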