Re: [PATCH] generic/032: fix failure due to attempt to wait for non-child process

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Fri, Jun 13, 2025 at 02:02:53PM +0100, Filipe Manana wrote:
> On Fri, Jun 6, 2025 at 1:53 AM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 05, 2025 at 08:37:35PM +0100, Filipe Manana wrote:
> > > On Thu, Jun 5, 2025 at 5:52 PM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 28, 2025 at 12:42:20PM +0100, fdmanana@xxxxxxxxxx wrote:
> > > > > From: Filipe Manana <fdmanana@xxxxxxxx>
> > > > >
> > > > > Running generic/032 can sporadically fail like this:
> > > > >
> > > > >   generic/032 11s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/032.out.bad)
> > > > >       --- tests/generic/032.out   2023-03-02 21:47:53.884609618 +0000
> > > > >       +++ /home/fdmanana/git/hub/xfstests/results//generic/032.out.bad    2025-05-28 10:39:34.549499493 +0100
> > > > >       @@ -1,5 +1,6 @@
> > > > >        QA output created by 032
> > > > >        100 iterations
> > > > >       +/home/fdmanana/git/hub/xfstests/tests/generic/032: line 90: wait: pid 3708239 is not a child of this shell
> > > > >        000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd  >................<
> > > > >        *
> > > > >        100000
> > > > >       ...
> > > > >       (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/032.out /home/fdmanana/git/h
> > > > >
> > > > > This is because we are attempting to wait for a process that is not a
> > > > > child process of the test process and it's instead a child of a process
> > > > > spawned by the test.
> > > > >
> > > > > To make sure that after we kill the process running _syncloop() there
> > > > > isn't any xfs_io process still running syncfs, add instead a trap to
> > > > > to _syncloop() that waits for xfs_io to finish before the process running
> > > > > that function exits.
> > > > >
> > > > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > > > > ---
> > > >
> > > > Oh... I didn't remove the _pgrep when I reverted those "setsid" things.
> > > >
> > > > CC Darrick, what do you think if I remove the _pgrep from common/rc
> > > > and generic/032 :) On the other words, revert the:
> > > >
> > > >   commit 1bb15a27573eea1df493d4b7223ada2e6c04a07a
> > > >   Author: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > >   Date:   Mon Feb 3 14:00:29 2025 -0800
> > > >
> > > >       generic/032: fix pinned mount failure
> > >
> > > Reverting that commit won't fix anything. One still needs a mechanism
> > > to ensure that we don't attempt to unmount the scratch device while
> > > xfs_io from sync_pid is still running. The mechanism implemented in
> > > that commit is buggy and the trap based one from this patch should
> > > always work (and we do this trap based approach on several other tests
> > > to solve this same problem).
> >
> > Sure, don't worry, I didn't try to Nack your patch:) Just due to you remove
> > the _pgrep() in your patch, then I thought it can be removed from common/rc
> > totally, looks like nothing need that function. So I tried to confirm that
> > with Darrick (who brought in this function:)
> >
> > Due to commit 1bb15a27573 does two things:
> > 1) create a new function _pgrep
> > 2) call _pgrep in g/032
> >
> > You've removed the 2) in this patch, so I'm wondering how about removing
> > the 1) and 2) totally. As you can see, g/032 is the only one place uses
> > _pgrep:
> 
> Ok, but that shouldn't be a blocker to fix a bug.
> Can we make some progress on this?

Sure, you're right. It'll be merged in this week fstests release. You
can find it in patches-in-queue branch :)

> 
> Do you want me to remove the function in a v2 of this patch, or do it
> as a separate patch?

I'll send a patch to remove it, don't worry.

Reviewed-by: Zorro Lang <zlang@xxxxxxxxxx>

> 
> Thanks.
> 
> >
> > $ grep -rsn _pgrep .
> > ./common/rc:40:_pgrep()
> > ./tests/generic/032:87:dead_syncfs_pid=$(_pgrep xfs_io)
> >
> > Thanks,
> > Zorro
> >
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks,
> > > > Zorro
> > > >
> > > > >  tests/generic/032 | 13 ++++---------
> > > > >  1 file changed, 4 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/tests/generic/032 b/tests/generic/032
> > > > > index 48d594fe..b04b84de 100755
> > > > > --- a/tests/generic/032
> > > > > +++ b/tests/generic/032
> > > > > @@ -28,6 +28,10 @@ _cleanup()
> > > > >
> > > > >  _syncloop()
> > > > >  {
> > > > > +     # Wait for any running xfs_io command running syncfs before we exit so
> > > > > +     # that unmount will not fail due to the mount being pinned by xfs_io.
> > > > > +     trap "wait; exit" SIGTERM
> > > > > +
> > > > >       while [ true ]; do
> > > > >               _scratch_sync
> > > > >       done
> > > > > @@ -81,15 +85,6 @@ echo $iters iterations
> > > > >  kill $syncpid
> > > > >  wait
> > > > >
> > > > > -# The xfs_io instance started by _scratch_sync could be stuck in D state when
> > > > > -# the subshell running _syncloop & is killed.  That xfs_io process pins the
> > > > > -# mount so we must kill it and wait for it to die before cycling the mount.
> > > > > -dead_syncfs_pid=$(_pgrep xfs_io)
> > > > > -if [ -n "$dead_syncfs_pid" ]; then
> > > > > -     _pkill xfs_io
> > > > > -     wait $dead_syncfs_pid
> > > > > -fi
> > > > > -
> > > > >  # clear page cache and dump the file
> > > > >  _scratch_cycle_mount
> > > > >  _hexdump $SCRATCH_MNT/file
> > > > > --
> > > > > 2.47.2
> > > > >
> > > > >
> > > >
> > >
> >
> 





[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux