On Sun, 27 Apr 2025 at 20:01, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> I'm having trouble finding anything authoritative, but what I'm seeing
> indicates that NTFS does do Unicode casefolding (and their own
> incompatible version, at that).

I think it's effectively a fixed table, yes.

I've never actually used NT.

Back in the DOS days, you could set the codepage on your medium, and
people did. And it did cause problems, but they were pretty rare.
People in non-US locales tended to learn to not rely on local case
folding rules for the local odd characters.

[ And I don't know if the Finnish situation was typical, but we
  actually had a *7-bit* version of Finnish characters, which meant
  that there were special case folding rules where '{' was the
  lowercase version of '['.

  I know, that sounds insane, but there you are. Those rules never
  extended to the filesystem, though, but they showed up in editors
  etc ]

> I'm becoming more and more convinced that I want more separation between
> casefolded lookups and non casefolded lookups, the potential for
> casefolding rule changes to break case-sensitive lookups is just bad.

Yeah. The problem is that it's just *hard*.

So when I am made Emperor and Grand Poo-Bah, I will solve all these
problems by just making case folding illegal.

But until that time, I really wish that people would at least try to
actively minimize the damage.

It would be interesting to hear from the Wine people (and Android
people) what the minimal set of case folding would be.

Because I really do suspect that there may not be any actual steam
games that rely on *anything* else than just A-Z.

That case really is much simpler to handle. You can do some really
cheap things like saying "for hashing filenames, I will just ignore
bit #5".

Of course, for *existing* setups, we're kind of screwed. The "two
different versions of the heart emoji" was from a real case,
apparently. Because those filesystems had already encoded the overly
complex rules in their hashes.

                Linus
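
For readers following along, the "ignore bit #5" trick for A-Z-only
folding can be sketched roughly as below. This is only an illustration,
not code from any filesystem: the function names are hypothetical, and
FNV-1a is used simply as a stand-in hash. The point is that the hash only
has to agree for names that differ in ASCII case. Masking bit 5 also
lumps together pairs like '[' and '{', but that merely costs an extra
hash collision, because the final comparison folds A-Z and nothing else.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: FNV-1a over the name, with bit 5 (0x20) of every
 * byte masked off before mixing. 'a' (0x61) and 'A' (0x41) differ only
 * in that bit, so ASCII case never affects the hash value.
 */
static uint32_t name_hash_nobit5(const char *name, size_t len)
{
	uint32_t h = 2166136261u;		/* FNV-1a offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i] & ~0x20u;
		h *= 16777619u;			/* FNV-1a prime */
	}
	return h;
}

/* Fold only ASCII A-Z to lowercase; every other byte is left alone. */
static unsigned char fold_ascii(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 0x20) : c;
}

/*
 * Hypothetical exact comparison used after a hash match. Returns 1 if
 * the two names of equal length match under A-Z-only folding, else 0.
 */
static int name_eq_ascii_ci(const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (fold_ascii((unsigned char)a[i]) !=
		    fold_ascii((unsigned char)b[i]))
			return 0;
	}
	return 1;
}
```

With this split, "Makefile" and "MAKEFILE" land in the same hash bucket
and compare equal, while "a[" and "a{" share a bucket but still compare
as different names.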