Re: [PATCH 1/1] http: don't send C or POSIX in Accept-Language

Carlo Marcelo Arenas Belón <carenas@xxxxxxxxx> · Fri, 11 Jul 2025 13:57:03 -0700

On Fri, Jul 11, 2025 at 10:02:01AM -0800, Collin Funk wrote:
> Justin Tobler <jltobler@xxxxxxxxx> writes:
> 
> > From my understanding, each language is expected to be defined in the
> > following form:
> >
> >   language[_territory][.codeset][@modifier]
> >
> > When we parse the list of languages we only care about the
> > `language[_territory]` part though.
> >
> > From looking at ISO 639 language codes, only codes with two or three
> > characters are valid. If we wanted to be a bit more strict, we could
> > check the length of the language code (everything before the first '_')
> > and filter out anything outside of those limits. This would naturally
> > filter out "C" and "POSIX" without having to mention them explicitly.
> 
> Filtering out anything that isn't 2-3 letters seems like a good
> heuristic to me.

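except that it would be incorrect, as language tags are defined by RFC 5646
and can be longer than that: its ABNF allows a primary language subtag of
2 to 8 letters, and a full tag may add script and region subtags on top
(e.g. zh-Hant-HK).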

most importantly, deriving language tags from locales yields very useful
tags when the part after the '_' is kept, because zh_CN and zh_HK, for
example, use completely different scripts (Simplified vs. Traditional
Chinese).

Carlo