Hello Tim,

On 2025-05-06 00:48, Tim Bray wrote:
On May 5, 2025 at 1:59:27 AM, Martin J. Dürst <duerst@xxxxxxxxxxxxxxx> wrote:
What I wrote was from the viewpoint of a language/library implementer. In that position, you have to know what version you're dealing with.
I am the lead author of such a library; see https://github.com/timbray/quamina. It supports case-folding using code generated by reading CaseFolding.txt from the UCD. It has a Makefile that I run as part of the release process which updates that generated code, so its contract with the world is “up to date with Unicode as of the last release timestamp”. I think that’s reasonable but haven’t really talked it over with anyone in detail.
I think what you are doing is reasonable, for your case. One thing is that you only re-download the CaseFolding.txt file every three months, which makes sure there's not too much network pressure on the Unicode web site. The other thing is that you can expect that the basic idea of case folding won't evolve.
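To make the CaseFolding.txt discussion concrete, here is a minimal sketch of how a table could be built from that file's line format (code; status; mapping; # name). It is not how quamina or Ruby actually do it: the names parse_case_folding, fold, and SAMPLE are made up for illustration, and the sample is a handful of lines rather than the real download.

```python
# Hypothetical sketch; parse_case_folding, fold and SAMPLE are illustrative
# names, not taken from quamina or Ruby.

# A few lines in the CaseFolding.txt format: code; status; mapping; # name
SAMPLE = """\
# sample excerpt only, not the full file
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_case_folding(text, statuses=("C", "S")):
    """Build a {char: folded string} table from CaseFolding.txt-style text.

    statuses=("C", "S") selects simple case folding,
    statuses=("C", "F") selects full case folding.
    """
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank/comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        if status not in statuses:
            continue
        folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Apply the folding table character by character."""
    return "".join(table.get(ch, ch) for ch in s)


simple = parse_case_folding(SAMPLE)
full = parse_case_folding(SAMPLE, statuses=("C", "F"))
```

A generator run at release time would do the same over the real file and emit the table as source code; the Turkic (T) lines are skipped in both modes here because they only apply to specific languages.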
For Ruby, the situation is a bit more complicated, because we support all kinds of Unicode properties in regular expressions, as well as various other Unicode-related operations (upper casing, lower casing, title casing, normalization, grapheme clusters). We had to adjust some of the code for a significant percentage of Unicode version upgrades. For example, for the upgrade to 15.1, the grapheme cluster algorithm needed to be adapted, and for the upgrade to 16.0, we had to fix our normalization code.
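As a small illustration of "knowing what version you're dealing with", here is a sketch of a version check, written in Python only because it exposes the version of its bundled Unicode data as unicodedata.unidata_version. The helper names parse_version and at_least are hypothetical, and this says nothing about how Ruby handles it internally.

```python
import unicodedata


def parse_version(text):
    """Turn a version string such as "15.1.0" into a tuple (15, 1, 0)."""
    return tuple(int(part) for part in text.split("."))


def at_least(version_text, minimum):
    """True if the given Unicode version is at or above `minimum`."""
    return parse_version(version_text) >= tuple(minimum)


def runtime_unicode_version():
    """Unicode version of the data shipped with this Python runtime."""
    return parse_version(unicodedata.unidata_version)


# Example: gate behaviour that depends on the data being recent enough.
# The (15, 1) threshold is just the version mentioned above, used as an example.
uses_newer_rules = at_least(unicodedata.unidata_version, (15, 1))
```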
Regards, Martin.