On May 2, 2025 at 1:12:45 PM, Addison Phillips <addisoni18n@xxxxxxxxx> wrote:
The problem with disallowing unassigned code points is that it disadvantages languages whose code points are assigned later. Such languages can go many years with support gaps and barriers.
Confusables *are* a problem, but most new assignments don't represent new confusables. Perhaps better coordination between Unicode and IETF is called for to prevent gaps and better document problem vectors?
What Addison said. Having read PRECIS carefully, and observed the lack of upkeep on the PRECIS and IDNA tables, I think it is fair to say that dealing with unassigned-but-assignable code points is a largely unsolved problem.
I have a lot of exposure to this problem because my primary open-source project is a text processor that includes regexp matching, and the question arises of what “.” should match. You need to either anchor your software to a particular Unicode version (as Go does, for example, and it’s now a major version behind) or consult the Unicode Character Database at runtime, which is probably the right thing to do but not always practical.
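To make the version-anchoring point concrete, here is a minimal Python sketch. It is not taken from any of the projects or specifications discussed in this thread; the names classify and dot_matches are hypothetical, and the policy of letting “.” match unassigned code points is only one possible choice. It uses the Unicode data bundled with the Python interpreter, so its answers are anchored to whatever version unicodedata.unidata_version reports.

```python
import unicodedata

def classify(cp: int) -> str:
    """Classify a code point using the Unicode data bundled with this Python."""
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters are U+FDD0..U+FDEF plus the last two code points of
    # every plane (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF). They carry
    # General_Category Cn, so they must be checked before the Cn test.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # Cn that is not a noncharacter means "unassigned, but assignable later".
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"

def dot_matches(cp: int) -> bool:
    """Hypothetical policy for regexp '.': reject only surrogates and
    noncharacters, and accept unassigned-but-assignable code points."""
    return classify(cp) in ("assigned", "unassigned")

if __name__ == "__main__":
    # The answers below are only as current as this database version.
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0x41, 0xD800, 0xFFFE, 0x40000):
        print(f"U+{cp:04X}", classify(cp), dot_matches(cp))
```

A code point assigned in a Unicode version newer than the bundled data is reported as "unassigned" by this sketch until the runtime is upgraded, which is exactly the gap described above. Under the policy sketched here, “.” would still match it.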
For that reason, I do not see unassigned-but-assignable code points as “problematic” in the sense that noncharacters and surrogates and so on are, and thus I think Unichars gets this right. Also, in this particular area, I think PRECIS needs more work. -T