On Fri, Jun 13, 2025 at 02:02:53PM +0100, Filipe Manana wrote: > On Fri, Jun 6, 2025 at 1:53 AM Zorro Lang <zlang@xxxxxxxxxx> wrote: > > > > On Thu, Jun 05, 2025 at 08:37:35PM +0100, Filipe Manana wrote: > > > On Thu, Jun 5, 2025 at 5:52 PM Zorro Lang <zlang@xxxxxxxxxx> wrote: > > > > > > > > On Wed, May 28, 2025 at 12:42:20PM +0100, fdmanana@xxxxxxxxxx wrote: > > > > > From: Filipe Manana <fdmanana@xxxxxxxx> > > > > > > > > > > Running generic/032 can sporadically fail like this: > > > > > > > > > > generic/032 11s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/032.out.bad) > > > > > --- tests/generic/032.out 2023-03-02 21:47:53.884609618 +0000 > > > > > +++ /home/fdmanana/git/hub/xfstests/results//generic/032.out.bad 2025-05-28 10:39:34.549499493 +0100 > > > > > @@ -1,5 +1,6 @@ > > > > > QA output created by 032 > > > > > 100 iterations > > > > > +/home/fdmanana/git/hub/xfstests/tests/generic/032: line 90: wait: pid 3708239 is not a child of this shell > > > > > 000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd >................< > > > > > * > > > > > 100000 > > > > > ... > > > > > (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/032.out /home/fdmanana/git/h > > > > > > > > > > This is because we are attempting to wait for a process that is not a > > > > > child process of the test process and it's instead a child of a process > > > > > spawned by the test. > > > > > > > > > > To make sure that after we kill the process running _syncloop() there > > > > > isn't any xfs_io process still running syncfs, add instead a trap to > > > > > to _syncloop() that waits for xfs_io to finish before the process running > > > > > that function exits. > > > > > > > > > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx> > > > > > --- > > > > > > > > Oh... I didn't remove the _pgrep when I reverted those "setsid" things. > > > > > > > > CC Darrick, what do you think if I remove the _pgrep from common/rc > > > > and generic/032 :) On the other words, revert the: > > > > > > > > commit 1bb15a27573eea1df493d4b7223ada2e6c04a07a > > > > Author: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > Date: Mon Feb 3 14:00:29 2025 -0800 > > > > > > > > generic/032: fix pinned mount failure > > > > > > Reverting that commit won't fix anything. One still needs a mechanism > > > to ensure that we don't attempt to unmount the scratch device while > > > xfs_io from sync_pid is still running. The mechanism implemented in > > > that commit is buggy and the trap based one from this patch should > > > always work (and we do this trap based approach on several other tests > > > to solve this same problem). > > > > Sure, don't worry, I didn't try to Nack your patch:) Just due to you remove > > the _pgrep() in your patch, then I thought it can be removed from common/rc > > totally, looks like nothing need that function. So I tried to confirm that > > with Darrick (who brought in this function:) > > > > Due to commit 1bb15a27573 does two things: > > 1) create a new function _pgrep > > 2) call _pgrep in g/032 > > > > You've removed the 2) in this patch, so I'm wondering how about removing > > the 1) and 2) totally. As you can see, g/032 is the only one place uses > > _pgrep: > > Ok, but that shouldn't be a blocker to fix a bug. > Can we make some progress on this? Sure, you're right. It'll be merged in this week fstests release. You can find it in patches-in-queue branch :) > > Do you want me to remove the function in a v2 of this patch, or do it > as a separate patch? I'll send a patch to remove it, don't worry. Reviewed-by: Zorro Lang <zlang@xxxxxxxxxx> > > Thanks. > > > > > $ grep -rsn _pgrep . > > ./common/rc:40:_pgrep() > > ./tests/generic/032:87:dead_syncfs_pid=$(_pgrep xfs_io) > > > > Thanks, > > Zorro > > > > > > > > > > Thanks. > > > > > > > > > > > > > > Thanks, > > > > Zorro > > > > > > > > > tests/generic/032 | 13 ++++--------- > > > > > 1 file changed, 4 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/tests/generic/032 b/tests/generic/032 > > > > > index 48d594fe..b04b84de 100755 > > > > > --- a/tests/generic/032 > > > > > +++ b/tests/generic/032 > > > > > @@ -28,6 +28,10 @@ _cleanup() > > > > > > > > > > _syncloop() > > > > > { > > > > > + # Wait for any running xfs_io command running syncfs before we exit so > > > > > + # that unmount will not fail due to the mount being pinned by xfs_io. > > > > > + trap "wait; exit" SIGTERM > > > > > + > > > > > while [ true ]; do > > > > > _scratch_sync > > > > > done > > > > > @@ -81,15 +85,6 @@ echo $iters iterations > > > > > kill $syncpid > > > > > wait > > > > > > > > > > -# The xfs_io instance started by _scratch_sync could be stuck in D state when > > > > > -# the subshell running _syncloop & is killed. That xfs_io process pins the > > > > > -# mount so we must kill it and wait for it to die before cycling the mount. > > > > > -dead_syncfs_pid=$(_pgrep xfs_io) > > > > > -if [ -n "$dead_syncfs_pid" ]; then > > > > > - _pkill xfs_io > > > > > - wait $dead_syncfs_pid > > > > > -fi > > > > > - > > > > > # clear page cache and dump the file > > > > > _scratch_cycle_mount > > > > > _hexdump $SCRATCH_MNT/file > > > > > -- > > > > > 2.47.2 > > > > > > > > > > > > > > > > > > > >