Re: [RFC PATCH 4/4] vfs: add filesystem freeze/thaw callbacks for power management

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 28 Mar 2025 10:21:03 -0400

On Thu, 2025-03-27 at 19:20 +0100, Jan Kara wrote:
> On Thu 27-03-25 10:06:13, James Bottomley wrote:
> > Introduce a freeze function, which iterates superblocks in reverse
> > order freezing filesystems.  The indicator a filesystem is
> > freezable is either possessing a s_bdev or a freeze_super method. 
> > So this can be used in efivarfs, whether the freeze is for
> > hibernate is also passed in via the new FREEZE_FOR_HIBERNATE flag.
> > 
> > Thawing is done opposite to freezing (so superblock traversal in
> > regular order) and the whole thing is plumbed into power
> > management.
> > The original ksys_sync() is preserved so the whole freezing step is
> > optional (if it fails we're no worse off than we are today) so it
> > doesn't inhibit suspend/hibernate if there's a failure.
> > 
> > Signed-off-by: James Bottomley
> > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> 
> OK, I've seen you are setting the new FREEZE_FOR_HIBERNATE flag but I
> didn't find anything using that flag. What do you plan to use it for?
> Does you efivars usecase need it? I find passing down this detail
> about the caller down to all filesystems a bit awkward. Isn't it
> possible to extract the information "hibernate is ongoing" from PM
> subsystem?

That's right.  I'm happy to post my patch below, but it depends on Al
accepting the simple_next_child() proposal, so it doesn't apply to any
tree.

> > +/*
> > + * Kernel freezing and thawing is only done in the power
> > management
> > + * subsystem and is thus single threaded (so we don't have to
> > worry
> > + * here about multiple calls to filesystems_freeze/thaw().
> > + */
> > +
> > +static int freeze_flags;
> 
> Frankly, the global variable to propagate flags is pretty ugly... If
> we really have to propagate some context into the iterator callback,
> rather do it explicitly like iterate_supers() does it.

Christian said the same thing.  I can do it, but if you look in the
power management subsystem, it relies on single threading and has a lot
of global variables like this, so I thought of this as a
simplification.

> > +static void filesystems_freeze_callback(struct super_block *sb)
> > +{
> > +	/* errors don't fail suspend so ignore them */
> > +	if (sb->s_op->freeze_super)
> > +		sb->s_op->freeze_super(sb, FREEZE_MAY_NEST
> > +				       | FREEZE_HOLDER_KERNEL
> > +				       | freeze_flags);
> > +	else if (sb->s_bdev)
> > +		freeze_super(sb, FREEZE_MAY_NEST |
> > FREEZE_HOLDER_KERNEL
> > +			     | freeze_flags);
> 
> Style nit - braces around above blocks would be IMHO appropriate.
> 
> > +	else {
> > +		pr_info("Ignoring filesystem %s\n", sb->s_type-
> > >name);
> > +		return;
> > +	}
> > +
> > +	pr_info("frozen %s, now syncing block ...", sb->s_type-
> > >name);
> > +	sync_blockdev(sb->s_bdev);
> > +	pr_info("done.");
> > +}
> 
> Generally this callback is not safe because it can race with
> filesystem unmount and calling ->freeze_super() after the
> filesystem's ->put_super() was called may have all sorts of
> interesting effects (freeze_super() itself will just bail with a
> warning, which is better but not great either).
> 
> The cleanest way I see how to make the iteration safe is to grab
> active sb reference (like grab_super() does it) for the duration of
> freeze_super() calls. Another possibility would be to grab sb-
> >s_umount rwsem exclusively as Luis does it in his series but that
> requires a bit of locking surgery and ->freeze_super() handlers make
> this particularly nasty these days so I think active sb reference is
> going to be nicer these days.

Before getting into the how of this, could you just confirm my
understanding of what the various locks do:

https://lore.kernel.org/cd5c3d8aab9c5fb37fa018cb3302ecf7d2bdb140.camel@xxxxxxxxxxxxxxxxxxxxx

Is correct?

Regards,

James