On Sun, Apr 20, 2025 at 06:54:06AM +0100, Al Viro wrote:
> On Tue, Apr 08, 2025 at 04:58:34PM -0400, Eric Chanudet wrote:
> > Defer releasing the detached file-system when calling namespace_unlock()
> > during a lazy umount to return faster.
> >
> > When requesting MNT_DETACH, the caller does not expect the file-system
> > to be shut down upon returning from the syscall.
>
> Not quite. Sure, there might be another process pinning a filesystem;
> in that case umount -l simply removes it from mount tree, drops the
> reference and goes away. However, we need to worry about the following
> case:
>     umount -l has succeeded
>     <several minutes later>
>     shutdown -r now
>     <apparently clean shutdown, with all processes killed just fine>
>     <reboot>
>     WTF do we have a bunch of dirty local filesystems? Where has the data gone?
>
> Think what happens if you have e.g. a subtree with several local filesystems
> mounted in it, along with an NFS on a slow server. Or a filesystem with
> shitloads of dirty data in cache, for that matter.
>
> Your async helper is busy in the middle of shutting a filesystem down, with
> several more still in the list of mounts to drop. With no indication for anyone
> and anything that something's going on.
>

I'm not quite following. With umount -l, I thought there was no guarantee
that the file-system is shut down. Doesn't "shutdown -r now" already risk
losses today, without any of these changes? Or am I missing your point
entirely? This looks like the use case described in the umount(8) man
page.

> umount -l MAY leave filesystem still active; you can't e.g. do it and pull
> a USB stick out as soon as it finishes, etc. After all, somebody might've
> opened a file on it just as you called umount(2); that's expected behaviour.
> It's not fully async, though - having unobservable fs shutdown going on
> with no way to tell that it's not over yet is not a good thing.
>
> Cost of synchronize_rcu_expedited() is an issue, all right, and it does
> feel like an excessively blunt tool, but that's a separate story. Your
> test does not measure that, though - you have fs shutdown mixed with
> the cost of synchronize_rcu_expedited(), with no way to tell how much
> does each of those cost.
>
> Could you do mount -t tmpfs tmpfs mnt; sleep 60 > mnt/foo &
> followed by umount -l mnt to see where the costs are?

I was under the impression that the tests provided did not account for
the file-system shutdown, or that its cost was negligible.

Something like the following, on mainline PREEMPT_RT, without any of the
patches mentioned before?

# mount -t tmpfs tmpfs mnt; sleep 60 > mnt/foo &
perf ftrace -G path_umount --graph-opts="depth=4" umount -l /mnt/

[Eliding most calls <100us]
 0)               |  path_umount() {
[...]
 0)               |    namespace_unlock() {
[...]
 0)               |      synchronize_rcu_expedited() {
 0)   0.108 us    |        rcu_gp_is_normal();
 0)               |        synchronize_rcu_normal() {
 0) * 15820.29 us |        }
 0) * 15829.52 us |      }
[...]
 0) * 15852.90 us |    }
[...]
 0) * 15918.07 us |  }

Thanks,

-- 
Eric Chanudet
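
For reference, the share of path_umount() spent in
synchronize_rcu_expedited() can be read off the trace above with a small
script. This is only a sketch: the helper name durations(), the regular
expression and the embedded TRACE string are illustrative and not part of
perf or of the patch. It assumes the function-graph layout shown above,
where a closing "}" carries the duration of the most recently opened
function.

```python
import re

# Trace excerpt copied from the message above ("[...]" marks elided calls).
TRACE = """\
 0)               |  path_umount() {
[...]
 0)               |    namespace_unlock() {
[...]
 0)               |      synchronize_rcu_expedited() {
 0)   0.108 us    |        rcu_gp_is_normal();
 0)               |        synchronize_rcu_normal() {
 0) * 15820.29 us |        }
 0) * 15829.52 us |      }
[...]
 0) * 15852.90 us |    }
[...]
 0) * 15918.07 us |  }
"""

# "CPU)", an optional overhead marker such as "*", an optional "<n> us"
# duration, then "|" and the call text.
LINE = re.compile(
    r"^\s*\d+\)\s*(?:[!#*+@$]\s*)?(?:([\d.]+)\s+us)?\s*\|\s*(.*)$"
)


def durations(trace):
    """Map each function name to its reported duration in microseconds."""
    stack, out = [], {}
    for raw in trace.splitlines():
        m = LINE.match(raw)
        if not m:
            continue  # skip "[...]" and any other non-trace line
        dur, body = m.group(1), m.group(2).strip()
        if body.endswith("{"):
            # Function entry: remember the name until its closing brace.
            stack.append(body.split("(")[0])
        elif body == "}":
            # Function exit: the duration belongs to the innermost open call.
            name = stack.pop()
            if dur is not None:
                out[name] = float(dur)
        elif body.endswith(";") and dur is not None:
            # Leaf call: entry and exit are reported on a single line.
            out[body.split("(")[0]] = float(dur)
    return out


if __name__ == "__main__":
    d = durations(TRACE)
    total = d["path_umount"]
    rcu = d["synchronize_rcu_expedited"]
    print(f"synchronize_rcu_expedited: {rcu:.2f} us of {total:.2f} us "
          f"({rcu / total:.1%})")
    print(f"rest of path_umount: {total - rcu:.2f} us")
```

On the numbers above this gives about 99.4% of path_umount() inside
synchronize_rcu_expedited(), leaving roughly 88.55 us for everything else
in that single run.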