Re: [PATCH v2 01/16] netfs: Fix hang due to missing case in final DIO read result collection

You can add my Tested-by to the first 11 patches in the series.  I have
verified that they fix the netfs regression (e.g. hangs in xfstest
generic/013 etc.).  It appears important to make sure the series gets
into 6.16.

On Wed, Jun 25, 2025 at 11:44 AM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> When doing a DIO read, if the subrequests we issue fail and cause the
> request PAUSE flag to be set to put a pause on subrequest generation, we
> may complete collection of the subrequests (possibly discarding them) prior
> to the ALL_QUEUED flag being set.
>
> In such a case, netfs_read_collection() doesn't see ALL_QUEUED being set
> after netfs_collect_read_results() returns and will just return to the app
> (the collector can be seen unpausing the generator in the trace log).
>
> The subrequest generator can then set ALL_QUEUED and the app thread reaches
> netfs_wait_for_request().  This causes netfs_collect_in_app() to be called
> to see if we're done yet, but there's a missing case here.
>
> netfs_collect_in_app() will see that a thread is active and set inactive to
> false, but won't see any subrequests in the read stream, and so won't set
> need_collect to true.  The function will then just return 0, indicating
> that the caller should just sleep until further activity (which won't be
> forthcoming) occurs.
>
> Fix this by making netfs_collect_in_app() check to see if an active thread
> is complete - i.e. that ALL_QUEUED is set and the subrequests list is empty
> - and to skip the sleep return path.  The collector will then be called
> which will clear the request IN_PROGRESS flag, allowing the app to
> progress.
>
> Fixes: 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used")
> Reported-by: Steve French <sfrench@xxxxxxxxx>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> Reviewed-by: Paulo Alcantara <pc@xxxxxxxxxxxxx>
> cc: linux-cifs@xxxxxxxxxxxxxxx
> cc: netfs@xxxxxxxxxxxxxxx
> cc: linux-fsdevel@xxxxxxxxxxxxxxx
> ---
>  fs/netfs/misc.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
> index 43b67a28a8fa..0a54b1203486 100644
> --- a/fs/netfs/misc.c
> +++ b/fs/netfs/misc.c
> @@ -381,7 +381,7 @@ void netfs_wait_for_in_progress_stream(struct netfs_io_request *rreq,
>  static int netfs_collect_in_app(struct netfs_io_request *rreq,
>                                 bool (*collector)(struct netfs_io_request *rreq))
>  {
> -       bool need_collect = false, inactive = true;
> +       bool need_collect = false, inactive = true, done = true;
>
>         for (int i = 0; i < NR_IO_STREAMS; i++) {
>                 struct netfs_io_subrequest *subreq;
> @@ -400,9 +400,11 @@ static int netfs_collect_in_app(struct netfs_io_request *rreq,
>                         need_collect = true;
>                         break;
>                 }
> +               if (subreq || !test_bit(NETFS_RREQ_ALL_QUEUED, &rreq->flags))
> +                       done = false;
>         }
>
> -       if (!need_collect && !inactive)
> +       if (!need_collect && !inactive && !done)
>                 return 0; /* Sleep */
>
>         __set_current_state(TASK_RUNNING);
>
>
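
For anyone following the thread, the sleep decision that the patch changes
can be modelled in userspace roughly as below.  This is only a sketch under
stated assumptions, not the kernel code: struct model_stream, struct
model_request, hanging_case() and should_sleep() are hypothetical names, the
subrequest list is flattened to two booleans, and NR_STREAMS merely stands in
for NR_IO_STREAMS.  The with_fix argument selects between the old and the new
return condition from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_STREAMS 2	/* hypothetical; stands in for NR_IO_STREAMS */

/* Hypothetical, flattened model of one I/O stream. */
struct model_stream {
	bool active;		/* stream is in use */
	bool has_subreq;	/* subrequest list is non-empty */
	bool head_collectable;	/* head subrequest finished or made progress */
};

/* Hypothetical model of the request; all_queued models ALL_QUEUED. */
struct model_request {
	bool all_queued;
	struct model_stream streams[NR_STREAMS];
};

/* The case from the commit message: active stream, empty list, ALL_QUEUED set. */
static struct model_request hanging_case(void)
{
	struct model_request rreq = {
		.all_queued = true,
		.streams = { { .active = true } },
	};
	return rreq;
}

/* Returns true if the waiting thread would go back to sleep. */
static bool should_sleep(const struct model_request *rreq, bool with_fix)
{
	bool need_collect = false, inactive = true, done = true;

	for (int i = 0; i < NR_STREAMS; i++) {
		const struct model_stream *s = &rreq->streams[i];

		if (!s->active)
			continue;
		inactive = false;
		if (s->has_subreq && s->head_collectable) {
			need_collect = true;
			break;
		}
		/* Not done while work is still listed or still to be queued. */
		if (s->has_subreq || !rreq->all_queued)
			done = false;
	}

	if (!with_fix)
		return !need_collect && !inactive;	/* old condition */
	return !need_collect && !inactive && !done;	/* patched condition */
}
```

In this model, should_sleep() on the request from hanging_case() returns true
with the old condition (the waiter sleeps with nothing left to wake it) and
false with the patched one, so the collector gets called instead.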


-- 
Thanks,

Steve




