[Last-Call] Re: Last Call: <draft-bray-unichars-10.txt> (Unicode Character Repertoire Subsets) to Proposed Standard

On Sun, Apr 6, 2025 at 5:48 AM Carsten Bormann <cabo@xxxxxxx> wrote:
# Review of draft-bray-unichars-12

[....]
 
The character LF is widely used as the newline character and is
probably the only control character that has a defined, non-empty
meaning.
LF is actually "problematic" when the text in question is not actually
intended to be structured into lines, which in the era of structured
data representation formats is now the predominant use of text in
protocols (*).
If this document had a normative intent, it should say that the
decision to include LF in the repertoire for a data item MUST always
be explicit.

The character CR is just noise when preceding an LF (and has no
defined meaning when occurring anywhere else).
It is required by some older standards such as certain mail formats,
but is vestigial.
It is not wrong to allow it for line-structured text, but the protocol
probably has to state how it is ignored; a document like this could
provide boilerplate that can simply be copied or referenced.

I think these have to be allowed, given that they are built into ABNF (see LWSP, etc.).
https://datatracker.ietf.org/doc/html/rfc5234

At least, eliminating these or requiring them to be explicit would make ABNF difficult to use.
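
On the point about a protocol having to state how CR is ignored, here is a minimal Python sketch of one possible rule: drop a CR only when it immediately precedes an LF, and report any CR that is left over. The function names are hypothetical and the policy itself is an assumption for illustration, not text from the draft or from any existing protocol.

```python
def strip_cr_before_lf(text: str) -> str:
    """Drop each CR that immediately precedes an LF; other CRs are kept."""
    return text.replace("\r\n", "\n")


def has_leftover_cr(text: str) -> bool:
    """Report whether any CR is still present after the CR-before-LF removal.

    What to do with such a CR (reject, keep, map to LF) is left to the
    protocol; this sketch only detects it.
    """
    return "\r" in strip_cr_before_lf(text)
```

A protocol that wants strict input could reject any data item for which has_leftover_cr() returns True.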

CR is also still used. Let's consult YANG (RFC 7950):
https://datatracker.ietf.org/doc/html/rfc7950#section-6

"Legal characters in YANG modules are the Unicode and ISO/IEC 10646
   [ISO.10646] characters, including tab, carriage return, and line feed
   but excluding the other C0 control characters, the surrogate blocks,
   and the noncharacters.  The character syntax is formally defined by
   the rule "yang-char" in Section 14."

Seems familiar.
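
To make the quoted YANG prose concrete, here is a rough Python sketch of a checker that follows only the sentence quoted above: tab, CR, and LF are allowed, the other C0 controls, the surrogates, and the noncharacters are not. The function names are hypothetical, and this is my reading of the prose, not a transcription of the "yang-char" rule in Section 14.

```python
def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_legal_yang_char(ch: str) -> bool:
    cp = ord(ch)
    if cp < 0x20:
        # C0 controls: only tab, LF, and CR are allowed.
        return cp in (0x09, 0x0A, 0x0D)
    return not (is_surrogate(cp) or is_noncharacter(cp))


def is_legal_yang_text(text: str) -> bool:
    return all(is_legal_yang_char(ch) for ch in text)
```

Note that, read this way, DEL and the C1 controls pass, since the quoted sentence only excludes C0 controls.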


Characters such as FF or RS are in limited use for providing
additional structure to text that goes beyond a line structure.
They are not "problematic" if their meaning is defined in the
application protocol.
(Compare the usage of RS in RFC 8142, as it is defined in RFC 8091, or
the way the publication form of RFCs up to RFC 8649 employed FF to
provide pagination.)

I consider this a feature. There's nothing that says these ABNF productions have to cover the whole protocol. Maybe something like this:

FF = %x0C
paginated-unicode-text = *unicode-assignable *(FF *unicode-assignable)
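
The same idea as a rough Python sketch, in case it helps: split the text on FF and check every page separately. It assumes each page is a run of zero or more characters, and that unicode-assignable means tab, LF, and CR, but no other C0 controls, no DEL or C1 controls, no surrogates, and no noncharacters. That is my approximation of the draft's subset, not a copy of its ABNF, and all names here are hypothetical.

```python
FF = "\x0c"


def is_unicode_assignable(ch: str) -> bool:
    """Approximation (assumed) of the draft's unicode-assignable subset."""
    cp = ord(ch)
    if cp in (0x09, 0x0A, 0x0D):
        return True                      # tab, LF, CR
    if cp < 0x20 or 0x7F <= cp <= 0x9F:
        return False                     # other C0 controls, DEL, C1 controls
    if 0xD800 <= cp <= 0xDFFF:
        return False                     # surrogates
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return False                     # noncharacters
    return True


def split_pages(text: str) -> list[str]:
    """Split on FF and verify each page; raise ValueError on a bad character."""
    pages = text.split(FF)
    for number, page in enumerate(pages, start=1):
        for ch in page:
            if not is_unicode_assignable(ch):
                raise ValueError(f"page {number}: disallowed U+{ord(ch):04X}")
    return pages
```

FF never has to be part of the per-page repertoire here; it only appears as the separator the application defines.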

thanks,
Rob