[Last-Call] Re: draft-bray-unichars-14 ietf last call Secdir review

The problem with disallowing unassigned code points is that it disadvantages languages whose code points are assigned later. Such languages can go many years with support gaps and barriers.

Confusables *are* a problem, but most new assignments don't represent new confusables. Perhaps better coordination between Unicode and IETF is called for to prevent gaps and better document problem vectors?

Addison

On Fri, May 2, 2025, 12:45 John C Klensin <john-ietf@xxxxxxx> wrote:


--On Thursday, May 1, 2025 14:25 -0700 Tim Hollebeek via Datatracker
<noreply@xxxxxxxx> wrote:

> Document: draft-bray-unichars
> Title: Unicode Character Repertoire Subsets
> Reviewer: Tim Hollebeek
> Review result: Ready
>
> This is a very important and useful document. I found it useful and
> will recommend it to others once published.
>
> The only thing I'd point out is the opportunity to perhaps add a
> sentence opining on the intersection between "confusables" and
> "unassigned code points", and point out that if "confusables" is in
> your threat model, you have to admit you've signed up for reviewing
> and/or consuming a new list of valid code points every new Unicode
> release.

And, of course, that assumes there is an entity that will create such
lists and do so accurately (presumably reflecting broad consensus)
and on a timely basis.  That is where the scope of this document
slides toward those of, e.g., PRECIS and IDNA2008.  To put the
concern into sharper perspective, PRECIS has not been updated since
2017 (Unicode 10.0) and IDNA2008 since 2022 (Unicode 12.0.0).  Draft
updates to both are floating around, but neither has been queued for
community review and action, and at this point either could be
outdated before being approved and published.

FWIW, the problem Tim points out is exactly the reason why IDNA2008
and PRECIS disallow the use of unassigned code points -- there is no
way to know what might end up assigned to them in some future Unicode
release.  They might not only be assigned to characters that create
confusability problems but could also end up being assigned to
noncharacters or device or presentation controls not covered by the
current spec (although, unlike confusables, those other uses are
unlikely).  The confusable issue is not theoretical: we've had real
examples in which assignment of previously unassigned code points has
created the potential for confusion with code points assigned earlier,
as well as versioning issues with IDNA2008.
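To make the versioning point concrete, here is a minimal sketch of an "unassigned" check, assuming Python's standard unicodedata module; the function names are hypothetical and this is an illustration, not code from the draft, IDNA2008, or PRECIS. The rule used (General_Category Cn, excluding noncharacters) is intended to mirror how RFC 5892 defines its Unassigned category, but the answer is only as current as the Unicode tables bundled with the Python build, which unicodedata.unidata_version reports.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters: U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_unassigned(cp: int) -> bool:
    # Python reports both unassigned code points and noncharacters as
    # General_Category "Cn", so noncharacters are excluded explicitly.
    # The result depends on unicodedata.unidata_version: a code point that is
    # "Cn" in one Python build may be assigned in a newer one.
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)

def unassigned_in(s: str) -> list[int]:
    # Hypothetical helper: the code points a strict, IDNA2008/PRECIS-style
    # profile would reject because they are unassigned.
    return [ord(c) for c in s if is_unassigned(ord(c))]

print("Unicode version in use:", unicodedata.unidata_version)
print([hex(cp) for cp in unassigned_in("a\U00040000b")])
```

Running the same check under two Python releases whose unidata_version differs can give different answers for the same string, which is the versioning hazard described above.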

That set of issues associated with future code point assignments puts
the statement at the end of the introduction to Section 2 of the I-D:

        "Since unassigned code points regularly become assigned when
        new characters are added to Unicode, it is usually not a good
        practice to specify that unassigned code points should be
        avoided."

in direct contradiction to the IDNA2008 and PRECIS specs, which
consider the use of unassigned code points to be very bad practice.
So while I think some variation on the sentence suggested by Tim
would be useful, it may not be sufficient.  In particular, the last
paragraph of the introductory part of Section 1, which was intended
to deal with uses of character strings in context, might reasonably
be altered to include an explicit statement about confusable
characters.  The statements that unassigned code points should be
allowed (as above) and that Private Use code points should be allowed
too should perhaps be reviewed once again.  Absent context that
specifically identifies the private use and its conventions, there is
no way to know what those code points represent.
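The Private Use concern can be illustrated with a minimal sketch, assuming Python's standard unicodedata module; acceptable() and its private_use_agreement flag are hypothetical stand-ins for whatever context a protocol would use to identify a private use and its conventions.

```python
import unicodedata

def private_use_in(s: str) -> list[int]:
    # General_Category "Co": U+E000..U+F8FF, U+F0000..U+FFFFD and
    # U+100000..U+10FFFD. (U+xxFFFE/U+xxFFFF in planes 15-16 are noncharacters.)
    return [ord(c) for c in s if unicodedata.category(c) == "Co"]

def acceptable(s: str, private_use_agreement: bool = False) -> bool:
    # Hypothetical policy: accept Private Use code points only when the
    # protocol context explicitly identifies the private use and its conventions.
    return private_use_agreement or not private_use_in(s)

print(acceptable("abc"), acceptable("a\uE000c"),
      acceptable("a\uE000c", private_use_agreement=True))
```

Unlike unassigned code points, Private Use code points will never be assigned by Unicode, so this check does not depend on the Unicode version; what cannot be known without context is what they mean.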

   john





--
last-call mailing list -- last-call@xxxxxxxx
To unsubscribe send an email to last-call-leave@xxxxxxxx