Re: [PATCH 1/1] http: don't send C or POSIX in Accept-Language

Collin Funk <collin.funk1@xxxxxxxxx> · Fri, 11 Jul 2025 10:02:01 -0700

Justin Tobler <jltobler@xxxxxxxxx> writes:

> From my understanding, each language is expected to be defined in the
> following form:
>
>   language[_territory][.codeset][@modifier]
>
> When we parse the list of languages we only care about the
> `language[_territory]` part though.
>
> From looking at ISO 639 language codes, only codes with two or three
> characters are valid. If we wanted to be a bit more strict, we could
> check the length of the language code (everything before the first '_')
> and filter out anything outside of those limits. This would naturally
> filter out "C" and "POSIX" without having to mention them explicitly.
>
> Not sure if being more strict adds much more value here in practice
> though. So it may be fine to keep it as-is. :)

Filtering out anything that isn't 2-3 letters seems like a good
heuristic to me.

It seems better than filtering out only "C" and "POSIX" and allowing
everything else. And it saves us from having to maintain an up-to-date
list of BCP 47 language tags.

Collin