On Sun, 27 Apr 2025 at 19:34, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> Do you mean to say that we invented yet another incompatible unicode
> casefolding scheme?
>
> Dear god, why?

Oh, Unicode itself comes with multiple "you can do this" schemes.

It's designed by committee, and meant for different situations and
different uses. Because the rules for things like sorting names are
wildly different even for the same language, just for different contexts.

Think of Unicode as "several decades of many different people coming
together, all having very different use cases".

So you find four different normalization forms, all with different
use-cases.

And guess what? The only actual *valid* scheme for a filesystem is none
of the four. Literally. It's to say "we don't normalize".

Because the normalization forms are not meant to be some kind of "you
should do this". They are meant as a kind of "if you are going to do X,
then you can normalize into form Y, which makes doing X easier".

And often the normalized form should only ever be an intermediate
_temporary_ form for doing comparisons, not the actual form you save
things in.

Sadly, people so often get it wrong.

For example, one very typical "you got it wrong, because you didn't
understand the problem" case is to do comparisons by normalizing both
sides (in one of the normalization forms) and then doing the comparison
in that form.

And guess what? 99.9% of the time, you just wasted enormous amounts of
time, because you could have done the comparison first *without* any
normalization at all, because equality is equality even when neither
side is normalized.

And the *common* case is that you are comparing things that are in the
same form. For example, in filesystem operations, 99.999% of the time
when you do a 'stat()' the *source* of the 'stat()' is typically a
'readdir()' operation.
So you are going to be using the same exact form that the filesystem
ALREADY HAD, and it's going to be an exact match, and there will NEVER
EVER be any case folding issues in those situations.

But the "simplistic" way to do it is to always normalize - which involves
allocating temporary storage for the new form, doing a fairly expensive
transformation including case folding, and then comparing those things.

Christ. The pure incompetence in case-insensitivity *hurts*.

And what is so sad is that all of this is self-inflicted damage by
filesystem people who SHOULD NOT HAVE DONE THE COMPLEXITY IN THE FIRST
PLACE!

It's a classic case of "Doctor, doctor, it hurts when I hit myself in the
balls with this hammer" and then people wonder why I still claim the
answer still remains - and always will remain - "Don't do that then".

Linus
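
To illustrate the compare-first point made above, here is a minimal
Python sketch. It is not taken from any filesystem's code: `fold` and
`names_match` are hypothetical names, and NFD plus `str.casefold()` is
only a stand-in for whatever folding scheme a case-insensitive
filesystem actually defines. The sketch assumes names are already
decoded strings, and it keeps the folded form as a temporary that is
never stored.

```python
import unicodedata


def fold(name: str) -> str:
    """Hypothetical folding step: NFD-normalize, then case-fold.

    This is an assumed stand-in, not any real filesystem's scheme.
    It allocates a new temporary string on every call.
    """
    return unicodedata.normalize("NFD", name).casefold()


def names_match(a: str, b: str) -> bool:
    """Case-insensitive name comparison with an exact-match fast path."""
    # Fast path: identical strings are equal under any folding scheme,
    # so no allocation or transformation is needed.
    if a == b:
        return True
    # Slow path: only now pay for the temporary folded copies.
    return fold(a) == fold(b)


if __name__ == "__main__":
    # A name that came back from a directory listing matches itself exactly.
    print(names_match("Makefile", "Makefile"))
    # Differing case or differing Unicode composition takes the slow path.
    print(names_match("Makefile", "MAKEFILE"))
    print(names_match("caf\u00e9", "cafe\u0301"))
```

When the name being looked up came straight from a readdir(), the first
branch is the one taken, and the stored name is never rewritten in
either branch.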