Re: [GIT PULL] bcachefs fixes for 6.15-rc4

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 27 Apr 2025 20:16:25 -0700

On Sun, 27 Apr 2025 at 20:01, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> I'm having trouble finding anything authoritative, but what I'm seeing
> indicates that NTFS does do Unicode casefolding (and their own
> incompatible version, at that).

I think it's effectively a fixed table, yes.

I've never actually used NT. Back in the DOS days, you could set the
codepage on your medium, and people did. And it did cause problems,
but they were pretty rare.

People in non-US locales tended to learn to not rely on local case
folding rules for the local odd characters.

[ And I don't know if the Finnish situation was typical, but we
actually had a *7-bit* version of Finnish characters, which meant that
there were special case folding rules where '{' was the lowercase
version of '['. I know, that sounds insane, but there you are.

  Those rules never extended to the filesystem, but they
showed up in editors etc ]

> I'm becoming more and more convinced that I want more separation between
> casefolded lookups and non casefolded lookups, the potential for
> casefolding rule changes to break case-sensitive lookups is just bad.

Yeah. The problem is that it's just *hard*.

So when I am made Emperor and Grand Poo-Bah, I will solve all these
problems by just making case folding illegal.

But until that time, I really wish that people would at least try to
actively minimize the damage.

It would be interesting to hear from the Wine people (and Android
people) what the minimal set of case folding would be.

Because I really do suspect that there may not be any actual Steam
games that rely on *anything* other than just A-Z.

That case really is much simpler to handle. You can do some really
cheap things like saying "for hashing filenames, I will just ignore
bit #5".
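
[ A minimal sketch of that "ignore bit #5" idea, to make it concrete.
  Everything here is an assumption for illustration: the function names
  are hypothetical, and FNV-1a is just a stand-in hash, not what any
  real filesystem uses. Masking bit 5 also lumps together a few
  non-letters (such as '[' and '{'), which is fine for the hash as long
  as the final comparison only folds A-Z: ]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: FNV-1a over the name with bit 5 (0x20) cleared
 * in every byte, so "Makefile" and "MAKEFILE" hash to the same value.
 * This over-merges some bytes ('[' vs '{', for example), which only
 * costs extra hash collisions, not correctness.
 */
static uint32_t name_hash_ignore_bit5(const char *name, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i] & ~0x20u;
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return h;
}

/* Fold only ASCII A-Z to a-z; every other byte is left untouched. */
static unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c | 0x20) : c;
}

/*
 * Hypothetical helper: the exact comparison run on names whose hashes
 * matched. This is what actually decides equality.
 */
static bool name_equal_ascii_fold(const char *a, size_t alen,
				  const char *b, size_t blen)
{
	if (alen != blen)
		return false;
	for (size_t i = 0; i < alen; i++) {
		if (ascii_fold((unsigned char)a[i]) !=
		    ascii_fold((unsigned char)b[i]))
			return false;
	}
	return true;
}
```

  The hash is allowed to be sloppy because it only picks the bucket;
  the compare is what keeps "a[" and "a{" as two different names.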

Of course, for *existing* setups, we're kind of screwed. The "two
different versions of the heart emoji" was from a real case,
apparently. Because those filesystems had already encoded the overly
complex rules in their hashes.

               Linus