On Mon 24-03-25 10:34:56, James Bottomley wrote:
> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > >
> > > OK, I think I've gone over it all. The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > > Most of the suspend system has been rearchitected to separate
> > > suspending user space processes from kernel ones. The sync it
> > > currently does occurs before even user processes are frozen. I think
> > > (as most of the original proposals did) that we just do freeze all
> > > supers (using the reverse list) after user processes are frozen but
> > > just before kernel threads are (this shouldn't perturb the image
> > > allocation in hibernate, which was another source of bugs in xfs).
> >
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> >
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <-
> > another fs
> >
> > So you really have to be careful to freeze this stack without causing
> > deadlocks.
>
> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> freezing userspace ... that always appeared odd to me.
>
> > So you need to be freezing userspace after filesystems are
> > frozen but then you have to deal with the fact that parts of your
> > userspace will be blocked in the kernel (trying to do some write)
> > waiting for the filesystem to thaw. But it might be tractable these
> > days since I have a vague recollection that system suspend is now
> > able to gracefully handle even tasks in uninterruptible sleep.
>
> There is another thing I thought about: we don't actually have to
> freeze across the sleep. It might be possible simply to invoke
> freeze/thaw where sys_sync() is now done to get a better on stable
> storage image? That should have fewer deadlock issues.

Well, there's not going to be a huge difference between doing sync(2) and
doing freeze+thaw for each filesystem. After you thaw, the filesystem
drivers usually mark that the fs is in an inconsistent state, and that
triggers journal replay / fsck on next mount.

                                                                Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR