On Thu, Apr 17, 2025 at 06:17:01PM +0800, Ian Kent wrote: > > On 17/4/25 17:01, Christian Brauner wrote: > > On Wed, Apr 16, 2025 at 11:11:51PM +0100, Mark Brown wrote: > > > On Tue, Apr 08, 2025 at 04:58:34PM -0400, Eric Chanudet wrote: > > > > Defer releasing the detached file-system when calling namespace_unlock() > > > > during a lazy umount to return faster. > > > > > > > > When requesting MNT_DETACH, the caller does not expect the file-system > > > > to be shut down upon returning from the syscall. Calling > > > > synchronize_rcu_expedited() has a significant cost on RT kernel that > > > > defaults to rcupdate.rcu_normal_after_boot=1. Queue the detached struct > > > > mount in a separate list and put it on a workqueue to run post RCU > > > > grace-period. > > > For the past couple of days we've been seeing failures in a bunch of LTP > > > filesystem related tests on various arm64 systems. The failures are > > > mostly (I think all) in the form: > > > > > > 20101 10:12:40.378045 tst_test.c:1833: TINFO: === Testing on vfat === > > > 20102 10:12:40.385091 tst_test.c:1170: TINFO: Formatting /dev/loop0 with vfat opts='' extra opts='' > > > 20103 10:12:40.391032 mkfs.vfat: unable to open /dev/loop0: Device or resource busy > > > 20104 10:12:40.395953 tst_test.c:1170: TBROK: mkfs.vfat failed with exit code 1 > > > > > > ie, a failure to stand up the test environment on the loopback device > > > all happening immediately after some other filesystem related test which > > > also used the loop device. A bisect points to commit a6c7a78f1b6b97 > > > which is this, which does look rather relevant. LTP is obviously being > > > very much an edge case here. > > Hah, here's something I didn't consider and that I should've caught. > > > > Look, on current mainline no matter if MNT_DETACH/UMOUNT_SYNC or > > non-MNT_DETACH/UMOUNT_SYNC. The mntput() calls after the > > synchronize_rcu_expedited() calls will end up in task_work(): > > > > if (likely(!(mnt->mnt.mnt_flags & MNT_INTERNAL))) { > > struct task_struct *task = current; > > if (likely(!(task->flags & PF_KTHREAD))) { > > init_task_work(&mnt->mnt_rcu, __cleanup_mnt); > > if (!task_work_add(task, &mnt->mnt_rcu, TWA_RESUME)) > > return; > > } > > if (llist_add(&mnt->mnt_llist, &delayed_mntput_list)) > > schedule_delayed_work(&delayed_mntput_work, 1); > > return; > > } > > > > because all of those mntput()s are done from the task's contect. > > > > IOW, if userspace does umount(MNT_DETACH) and the task has returned to > > userspace it is guaranteed that all calls to cleanup_mnt() are done. > > > > With your change that simply isn't true anymore. The call to > > queue_rcu_work() will offload those mntput() to be done from a kthread. > > That in turn means all those mntputs end up on the delayed_mntput_work() > > queue. So the mounts aren't cleaned up by the time the task returns to > > userspace. > > > > And that's likely problematic even for the explicit MNT_DETACH use-case > > because it means EBUSY errors are a lot more likely to be seen by > > concurrent mounters especially for loop devices. > > > > And fwiw, this is exactly what I pointed out in a prior posting to this > > patch series. > > And I didn't understand what you said then but this problem is more > > understandable to me now. > > > > > > But we've also worsened that situation by doing the deferred thing for > > any non-UMOUNT_SYNC. That which includes namespace exit. IOW, if the > > last task in a new mount namespace exits it will drop_collected_mounts() > > without UMOUNT_SYNC because we know that they aren't reachable anymore, > > after all the mount namespace is dead. > > > > But now we defer all cleanup to the kthread which means when the task > > returns to userspace there's still mounts to be cleaned up. > > Correct me if I'm wrong but the actual problem is that the mechanism used > > to wait until there are no processes doing an rcu-walk on mounts in the > > discard list is unnecessarily long according to what Eric has seen. So a I think that the current approach is still salvagable but I need to test this and currently LTP doesn't really compile for me.