On Mon, Mar 24, 2025 at 10:34:56AM -0400, James Bottomley wrote:
> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > >
> > > OK, I think I've gone over it all. The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > > Most of the suspend system has been rearchitected to separate
> > > suspending user space processes from kernel ones. The sync it
> > > currently does occurs before even user processes are frozen. I
> > > think (as most of the original proposals did) that we just do freeze all
> > > supers (using the reverse list) after user processes are frozen but
> > > just before kernel threads are (this shouldn't perturb the image
> > > allocation in hibernate, which was another source of bugs in xfs).
> >
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> >
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <-
> > another fs
> >
> > So you really have to be careful to freeze this stack without causing
> > deadlocks.
>
> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> freezing userspace ... that always appeared odd to me.
>
> > So you need to be freezing userspace after filesystems are
> > frozen but then you have to deal with the fact that parts of your
> > userspace will be blocked in the kernel (trying to do some write)
> > waiting for the filesystem to thaw. But it might be tractable these
> > days since I have a vague recollection that system suspend is now
> > able to gracefully handle even tasks in uninterruptible sleep.
>
> There is another thing I thought about: we don't actually have to
> freeze across the sleep.

Yes we do. Filesystems have background workers that do stuff even
when the filesystem has been synced, and this can race with hibernate
shutting stuff down. This is the whole reason we needed to move to
filesystem freezing - to tell the filesystems to *temporarily stop
dirtying* new objects.

> It might be possible simply to invoke
> freeze/thaw where sys_sync() is now done to get a better on stable
> storage image? That should have fewer deadlock issues.

A freeze/thaw cycle still allows the filesystems to dirty objects
in the background whilst hibernate continues onwards assuming the
filesystems are all clean. It took a long time to get all those worms
in the can, and we really don't want to let them back out....

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx