Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume

Jan Kara <jack@xxxxxxx> · Mon, 24 Mar 2025 20:28:57 +0100

On Mon 24-03-25 10:34:56, James Bottomley wrote:
> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > > 
> > > OK, I think I've gone over it all.  The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > > Most of the suspend system has been rearchitected to separate
> > > suspending user space processes from kernel ones.  The sync it
> > > currently does occurs before even user processes are frozen.  I
> > > think (as most of the original proposals did) that we just do freeze all
> > > supers (using the reverse list) after user processes are frozen but
> > > just before kernel threads are (this shouldn't perturb the image
> > > allocation in hibernate, which was another source of bugs in xfs).
> > 
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> > 
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <-
> > another fs
> > 
> > So you really have to be careful to freeze this stack without causing
> > deadlocks.
> 
> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> freezing userspace ... that always appeared odd to me.
> 
> >  So you need to be freezing userspace after filesystems are
> > frozen but then you have to deal with the fact that parts of your
> > userspace will be blocked in the kernel (trying to do some write)
> > waiting for the filesystem to thaw. But it might be tractable these
> > days since I have a vague recollection that system suspend is now
> > able to gracefully handle even tasks in uninterruptible sleep.
> 
> There is another thing I thought about: we don't actually have to
> freeze across the sleep.  It might be possible simply to invoke
> freeze/thaw where sys_sync() is now done to get a better on stable
> storage image?  That should have fewer deadlock issues.

Well, there's not going to be a huge difference between doing sync(2) and
doing freeze+thaw for each filesystem. After you thaw the filesystem, the
driver usually marks the fs as being in an inconsistent state again, and
that triggers journal replay / fsck on the next mount.
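
For illustration only, here is a minimal userspace sketch of the two
operations being compared, assuming a Linux system with glibc. It is not
the in-kernel suspend path; the helper names sync_fs() and
freeze_thaw_fs() are hypothetical. syncfs(2) stands in for sync(2)
restricted to a single filesystem, and the freeze+thaw pair uses the
FIFREEZE / FITHAW ioctls from <linux/fs.h>, which need CAP_SYS_ADMIN and
are not supported by every filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: flush the one filesystem containing `mountpoint`.
 * Returns 0 on success, -1 with errno set on failure. */
int sync_fs(const char *mountpoint)
{
    int fd = open(mountpoint, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    int ret = syncfs(fd);
    int saved_errno = errno;

    close(fd);
    errno = saved_errno;    /* keep the errno of syncfs(), not close() */
    return ret;
}

/* Hypothetical helper: freeze the filesystem, then thaw it immediately.
 * Returns 0 on success, -1 with errno set on failure. */
int freeze_thaw_fs(const char *mountpoint)
{
    int fd = open(mountpoint, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    /* Freeze blocks new writers and pushes dirty state to disk. */
    int ret = ioctl(fd, FIFREEZE, 0);
    if (ret == 0) {
        /* Once thawed, the filesystem is writable again and is no
         * longer guaranteed to be in a clean on-disk state. */
        ret = ioctl(fd, FITHAW, 0);
    }
    int saved_errno = errno;

    close(fd);
    errno = saved_errno;
    return ret;
}
```

Calling sync_fs("/mnt") or freeze_thaw_fs("/mnt") on a mounted
filesystem ends in the same place in both cases: a live, writable
filesystem.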

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR