Justin Tobler <jltobler@xxxxxxxxx> writes:

> From my understanding, each language is expected to be defined in the
> following form:
>
>     language[_territory][.codeset][@modifier]
>
> When we parse the list of languages we only care about the
> `language[_territory]` part though.
>
> From looking at ISO 639 language codes, only codes with two or three
> characters are valid. If we wanted to be a bit more strict, we could
> check the length of the language code (everything before the first '_')
> and filter out anything outside of those limits. This would naturally
> filter out "C" and "POSIX" without having to mention them explicitly.
>
> Not sure if being more strict adds much more value here in practice
> though. So it may be fine to keep it as-is. :)

Filtering out anything that isn't 2-3 letters seems like a good
heuristic to me. It seems better than only filtering out "C" and "POSIX"
and allowing anything else. And it keeps us from having to keep a list
of updated BCP 47 language tags.

Collin
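
For illustration, the 2-3 letter heuristic could look roughly like the
sketch below. This is not taken from any patch in this thread; the
helper name looks_like_language_code() is hypothetical. It also assumes
that the language part ends at the first '_', '.', or '@' (slightly
broader than "everything before the first '_'", so that an entry such as
"C.UTF-8" is measured as "C"), and that only ASCII letters count.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if the language part of a locale entry of the form
 * language[_territory][.codeset][@modifier] consists of two or three
 * ASCII letters, and 0 otherwise. "C" and "POSIX" are rejected purely
 * by length, without being named explicitly.
 */
static int looks_like_language_code(const char *entry)
{
	/* Assumption: the language part ends at the first '_', '.' or '@'. */
	size_t len = strcspn(entry, "_.@");
	size_t i;

	if (len < 2 || len > 3)
		return 0;

	for (i = 0; i < len; i++) {
		char c = entry[i];
		int is_letter = (c >= 'a' && c <= 'z') ||
				(c >= 'A' && c <= 'Z');
		if (!is_letter)
			return 0;
	}
	return 1;
}
```

With this, entries such as "en_US.UTF-8", "de", and "ast_ES" would be
kept, while "C", "POSIX", "C.UTF-8", and an empty entry would be
dropped.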