Re: [GIT PULL] bcachefs changes for 6.17

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Sat, 9 Aug 2025 13:36:39 -0400

On Thu, Aug 07, 2025 at 07:42:38PM +0700, Aquinas Admin wrote:
> Generally, this drama is more like a kindergarten. I honestly don't understand 
> why there's such a reaction. It's a management issue, solely a management 
> issue. The fact is that there are plenty of administrative possibilities to 
> resolve this situation.

Yes, this is accurate. I've been getting entirely too many emails from
Linus about how pissed off everyone is, completely absent of details -
or anything engineering related, for that matter. Lots of "you need to
work with us better" - i.e. bend to demands - without being willing to
put forth an argument that stands to scrutiny.

This isn't high school, and it's not a popularity contest. This is
engineering, and it's about engineering standards.

Those engineering standards have been notably lacking in the Linux
filesystem world.

When btrfs shipped, it did so with clear design issues that have never
been adequately resolved. These were brought up on the list in the very
early days of btrfs, when it was still experimental, with detailed
analysis - that was ignored.

The issues in btrfs are the stuff of legend; I've been to conferences
(past LSFs) where after dinner the stories kept coming out from people
who had worked on it - for easily an _hour_ - and had people falling out
of their chairs.

As a result, to this day, people don't trust it, and for good reason.
Multidevice data corruptions, unfixed bugs with no real information,
people who have tried to help out and fund getting this stuff fixed only
to be turned away. This stuff is still going on:
https://news.ycombinator.com/item?id=44508601

This is what you'd expect to happen when you rush to have all the
features, skip the design, and don't build a community that's focused on
working with users.

Let's compare what's going on in bcachefs:

Bug tracker:
https://github.com/koverstreet/bcachefs/issues?q=is%3Aissue%20state%3Aopen%20-label%3Aenhancement%20-label%3A%22waiting%20confirmation%20fixed%22

Syzbot, and the other major filesystems for comparison:
https://syzkaller.appspot.com/upstream/s/bcachefs
https://syzkaller.appspot.com/upstream/s/ext4
https://syzkaller.appspot.com/upstream/s/xfs
https://syzkaller.appspot.com/upstream/s/btrfs

(Does btrfs even have a central bug tracker?)

An important note: with bcachefs most of the activity doesn't happen on
the bug tracker, it's on IRC (and the IRC channel is by far the most
active out of all the major filesystems). The bug tracker is for making
sure bugs don't get lost if they can't get fixed right away - most bugs
never make it there. So the bug tracker is a good measure of outstanding
bugs, but not fixed bugs or gauging usage.

How did we get here, what are we doing differently - and where are we
now?

The recipe has been: patient, methodical engineering, with a focus on
the users and building the user community, and working closely with the
people who are using, testing and QAing.

Get the design right, keep the codebase reasonably clean and well
organized so that we can work efficiently; _heavy_ focus on assertions,
automated testing (i.e. basic modern engineering best practices),
introspection and debug tooling.

Get enough feature work done to validate the design, and then - fix
every last bug, and work with users to make sure that bugs are fixed and
it's working well; work with people who are doing every kind of torture
testing imaginable.

A refrain I've been hearing has been about "working with the community",
but to the kernel community, I need to hammer the point home that the
community is not just us; it's all the people running our code, too.

We have to actively work with those people if we want our code to
actually work reliably in the real world, and this is something that's
been frighteningly absent elsewhere in filesystem development these
days.

30 years ago, Linux took over by being a real community effort.

But now, most of the development is very corporate, and getting
corporate developers to actually engage with the community and do
anything that smells of unpaid support is worse than pulling teeth - it
just doesn't happen.

Now bcachefs is the community based up and comer...

But it's not really "up and coming" anymore.

6.16 is "unofficially unexperimental" - it's solid.

It's attracting real interest and feedback from the ZFS community, and
that hasn't happened before; those are the people who care about
reliability and good engineering above all else.

All the hard engineering problems are solved; stabilizing is basically
done. We've got petabyte scalability, the majority of online fsck in
place, all the multi device stuff rock solid (a major area where btrfs
falls over); the error handling, logging and debugging tools are top
notch. Repair is comprehensive and robust, with real defense in depth,
and an extensive suite of tools for analyzing issues and making sure we
can debug anything that may occur in the wild.

The kernel community is being caught with their pants down here.

The decisionmaking process has, at every step of the way, been "things
couldn't possibly be that insane" - and yet, I am continually proven
wrong.

Post btrfs, I seriously expected there to be real design review for any
future filesystems, and a retrospective on development process.

Needless to say, that did not happen - it seems we're still in the
"trust me bro, I got this" stage in the development of an engineering
culture.

But a cowboy culture only takes you so far; at some point you really do
need actual engineering standards; you need to be able to explain your
designs, your methods, your processes and decisionmaking.

I've talked at length in the past about the need for a tight feedback
loop on getting bugfixes out to users if we want to be able to work with
those users (and to be honest, that should not even have been a
discussion; I've been going over RC pull requests and there's been
nothing remotely unusual about what I've been sending - except for
volume, which is exactly what you want and expect for a filesystem
that's been rapidly stabilizing).

But "shipping bugfixes" has been called "whining" - that's the mentality
we're dealing with here.

I have to hammer on this one: there are certain bedrock principles of
systems engineering we all know.  "Make sure things work and stay
working" is one of them. The rest of the kernel knows this as "do not
break userspace", but in filesystem land that same underlying principle
is written as "we do not lose user data".

Our job is to ship things that work, and make sure they work.

I also talk a lot about the need for automated testing; and that's
another area where the kernel is woefully behind - and it's been one of
the sources of conflict. I've asked people in other subsystems to please
make sure they add tests when regressions have hit bcachefs; it's good for
everyone, not just bcachefs. But this has been cited (!) as one of the
causes of conflict that's been pissing Linus off.

Engineering principles. Basic stuff, here.

And regarding management processes: Linus has been saying repeatedly
(and loudly, and in public) that it's his decision whether or not to
remove bcachefs from the kernel - but the criteria and decisionmaking
process have been notably absent.

It is not for me to say whether or not the kernel should still be a
personal project, with decisions made in this way. And at the end of the
day, we're all human beings, I'm not going to argue against the human
factor, or against considering the people behind these projects.

But it should be noted that the uncertainty this has caused has created
massive problems for building a sustainable developer community around
this thing.

For my part, I just want to reassure people that I'm not going anywhere;
bcachefs will continue to be developed and supported, in or out of the
kernel.

Cheers,
Kent