Re: [PATCH RFC 5/6] fs: introduce a shutdown_bdev super block operation

Christian Brauner <brauner@xxxxxxxxxx> · Tue, 24 Jun 2025 12:15:13 +0200

On Tue, Jun 24, 2025 at 07:21:50PM +0930, Qu Wenruo wrote:
> 
> 
> On 2025/6/24 18:43, Christian Brauner wrote:
> [...]
> > > It's not hard for btrfs to provide it, we already have a check function
> > > btrfs_check_rw_degradable() to do that.
> > > 
> > > Although I'd say, that will be something way down the road.
> > 
> > Yes, for sure. I think long-term we should hoist at least the bare
> > infrastructure for multi-device filesystem management into the VFS.
> 
> Just want to mention that, "multi-device filesystem" already includes fses
> with external journal.

Yes, that's what I meant below by "We've already done a bit of that".
It's now possible to actually reach all devices associated with a
filesystem from the block layer. It works for xfs and ext4 with
external journal devices. So for example, you can freeze the log device and
the main device as the block layer is now able to find both and the fs
stays frozen until both have been unfrozen. This wasn't possible before
the rework we did.
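
To illustrate the freeze semantics described above, here is a toy
userspace model in C. It is only a sketch and not the kernel
implementation: struct fs_model, model_bdev_freeze(), model_bdev_thaw()
and fs_is_frozen() are hypothetical names made up for this example. The
only assumption it encodes is the one stated above: the fs stays frozen
for as long as at least one of its devices is frozen.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical model of one filesystem spanning several block devices. */
struct fs_model {
	int nr_devs;                      /* e.g. 2: main device + log device */
	bool dev_frozen[MODEL_MAX_DEVS];  /* per-device freeze state */
	int frozen_devs;                  /* how many devices are frozen now */
};

static void fs_model_init(struct fs_model *fs, int nr_devs)
{
	assert(nr_devs > 0 && nr_devs <= MODEL_MAX_DEVS);
	fs->nr_devs = nr_devs;
	fs->frozen_devs = 0;
	for (int i = 0; i < MODEL_MAX_DEVS; i++)
		fs->dev_frozen[i] = false;
}

/* Freezing any device of the fs freezes the fs as a whole. */
static void model_bdev_freeze(struct fs_model *fs, int dev)
{
	assert(dev >= 0 && dev < fs->nr_devs);
	if (!fs->dev_frozen[dev]) {
		fs->dev_frozen[dev] = true;
		fs->frozen_devs++;
	}
}

/* Thawing one device only thaws the fs once no device is left frozen. */
static void model_bdev_thaw(struct fs_model *fs, int dev)
{
	assert(dev >= 0 && dev < fs->nr_devs);
	if (fs->dev_frozen[dev]) {
		fs->dev_frozen[dev] = false;
		fs->frozen_devs--;
	}
}

static bool fs_is_frozen(const struct fs_model *fs)
{
	return fs->frozen_devs > 0;
}
```

With two devices (index 0 as the main device, index 1 as the log
device), freezing both and then thawing only one leaves
fs_is_frozen() true. It only turns false after the second thaw.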

Now follows a tiny rant not targeted at you specifically but something
that still bugs me in general:

We had worked on extending this to btrfs so that it's all integrated
properly with the block layer. And we heard long-standing promises that
you would make that switch happen, while you refused to let us make it.
So now it's 2 years later and nothing has happened in that area.

That also means block device freezing on btrfs is broken. If you freeze
a block device used by btrfs via the dm layer (even though that's
unlikely) you freeze the block device without btrfs being informed about
that.

It also means that block device removal is likely a bit janky because
btrfs won't be notified when any device other than the main device is
suddenly yanked. You probably have logic there but the block layer can
easily inform the filesystem about such an event nowadays and let it
take appropriate action.

And fwiw, you also don't restrict writing to mounted block devices.
That's another thing you blocked us from implementing even though we
sent the changes for that already and so we disabled that in
ead622674df5 ("btrfs: Do not restrict writes to btrfs devices"). So
you're also still vulnerable to that stuff.

> 
> Thus the new callback may be a good chance for those mature fses to explore
> some corner case availability improvement, e.g. the loss of the external
> journal device while there is no live journal on it.

Already handled for xfs and ext4 cleanly since our rework iiuc.

> (I have to admit it's super niche, and live-migration to internal journal
> may be way more complex than my uneducated guess)
> 
> Thanks,
> Qu
> 
> > Or we should at least explore whether that's feasible and if it's
> > overall advantageous to maintenance and standardization. We've already
> > done a bit of that and imho it's now a lot easier to reason about the
> > basics already.
> > 
> > > 
> > > We don't even have a proper way to let the end user configure the
> > > device loss behavior.
> > > E.g. some end users may prefer a full shutdown to be extra cautious,
> > > rather than continuing degraded.
> > 
> > Right.
>