Thanks for the report.

I'd like to add: bcachefs is a _lot_ of people investing a ton of time
QAing this thing. There are a lot of users like Jérôme that I've worked
with for extended periods tracking down all sorts of crazy stuff, and I
thank them for their patience and all their help. It's been a true
community effort.

On Sat, Jun 21, 2025 at 05:07:51PM -0400, Jérôme Poulin wrote:
> The filesystem is very resilient at being rebooted anywhere, anytime.
> It went through many random resets during any of... fsck repairs, fsck
> rebuilding the btree from scratch, upgrades, in the middle of snapshot
> operations, while replaying the journal. It just always recovers at
> places where I wouldn't expect to be able to hit the power switch. Worst
> case, it mounted read-only and needed fsck, but it could always be
> mounted read-only.

That's the dream :)

I don't think the filesystem should ever fail in a way that leads to
data loss, and I think this is a more than achievable goal.

> Where things get a bit more touchy is when combining all those
> features together; operations tend to be a bit "racy" with each
> other and tend to lock up when there are multiple features running or
> being used in parallel. I think this is where we get to the "move fast,
> break things" part of the filesystem. The foundation is solid: read,
> write, inode creation/deletion, bucket management, all basic POSIX
> operations, checksums, scrub, device addition. Many of the
> bcachefs-specific operations are stable; being able to set compression
> and replication level and data target per folder is awesome stuff and
> works well.

It's not "move fast and break things" - we haven't had a problem with
regressions that I've seen. It's just a project with massive scope, and
it takes a while to find all the corner cases and make sure there's no
pathological behaviour in any scenario.

> From my experience, what is less polished are: snapshots and snapshot
> operations, reflink, nocow, and multiprocess-heavy workloads; those
> seem to be where the "experimental" part of the filesystem goes into
> the spotlight.

This mostly fits with what I've been seeing; the exception being that I
haven't seen any major issues with reflink in ages (you mentioned a
reflink corruption earlier - are you sure that was reflink?).

And rebalance (background data movement) has taken a while to polish,
and we're still not done - I think as of 6.16 we've got all the outright
bugs I know of fixed, but there's still behaviour that's less than ideal
(charitably): if you ask it to move more data to a target than fits,
it'll spin (no longer wasting IO, though). That one needs some real work
to fix properly - another auxiliary index of "pending" extents, extents
that rebalance would like to move but can't until something changes.

Re: multiprocess workloads, that livelock-ish behaviour has been the
most problematic to track down - but we've made some recent progress on
understanding where it's coming from, and the new btree iterator
tracepoints should help. The new error_throw tracepoint is also already
proving useful for tracking down wonky behaviour (just not the one
you're talking about).
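
If you want to watch these yourself, they're ordinary kernel
tracepoints, so - assuming tracefs is mounted in the usual place, and
with the caveat that exact event names can vary by kernel version -
something along these lines works (as root):

    # enable every bcachefs tracepoint and stream the output
    echo 1 > /sys/kernel/tracing/events/bcachefs/enable
    cat /sys/kernel/tracing/trace_pipe

    # or just the error reporting, if it's present under that name on
    # your kernel
    echo 1 > /sys/kernel/tracing/events/bcachefs/error_throw/enable
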
> I've been running rotating snapshots on many machines; it works well
> until it doesn't and I need to reboot or fsck. Reflink before 6.14
> seemed a bit hacky and could result in errors. Nocow tends to lock up
> but isn't really useful with bcachefs anyway. Maybe also casefolding,
> which might not be fully tested yet. Those are the true experimental
> features, and they aren't really labelled as such.

Casefolding still has a strange rename bug. Some of the recent
self-healing work was partly to make it easier to track down - we'll now
notice that something went wrong and pop a fsck error on the first 'ls'
of an improperly renamed dirent.

> We can always say "yes, this is fixed in master, this is fixed in
> 6.XX-rc4", but it is still experimental and tends to be what causes
> the most pain right now. I think this needs to be communicated more
> clearly. If the filesystem goes off experimental, I think a subset of
> features should be gated by filesystem options to reduce the need for
> big and urgent rc patches.

Yeah, this is coming up more as the userbase grows.

For the moment, doing more backports is infeasible due to sheer volume,
but I expect this to change soon - 6.17 is when I expect to start doing
more backports.

> The problem is... when the experimental label is removed, it needs to
> be very clear that users aren't expected to be running the latest rc
> and master branch. All the features marked as stable should have
> settled enough that there won't be 6 users requiring a developer to
> mount their filesystem read-write or recover files from a catastrophic
> race condition.

Correct. Stable backports will start happening _before_ the
experimental label is lifted.

> This is where communication needs to be clear: the bcachefs website,
> tools, and options should all clearly label features that might
> require someone to ask for a developer's help, or to run the latest
> release candidate or a debug version of the kernel.

Everything just needs to be solid before the experimental label is
lifted. I don't want users to have to check a website to know what's
safe to use.
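
(Aside, since per-folder options came up: those are set via extended
attributes in the bcachefs namespace - roughly like the below, though
check the docs for the exact option names on your version. "slow" here
is just a placeholder for whatever disk group you've labelled.

    # new files created under the directory inherit these settings
    setfattr -n bcachefs.compression -v zstd /mnt/photos
    setfattr -n bcachefs.data_replicas -v 2 /mnt/photos
    setfattr -n bcachefs.background_target -v slow /mnt/photos
)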