Re: [PATCH 2/2] NFS: Improve nfsiod workqueue detection for allocation flags

Trond Myklebust <trondmy@xxxxxxxxxx> · Mon, 07 Jul 2025 12:25:17 -0700

On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote:
> The NFS client writeback paths change which flags are passed to their
> memory allocation calls based on whether the current task is running
> from
> within a workqueue or not.  More specifically, it appears that during
> writeback allocations with PF_WQ_WORKER set on current->flags will
> add
> __GFP_NORETRY | __GFP_NOWARN.  Presumably this is because nfsiod can
> simply fail quickly and later retry to write back that specific page
> should
> the allocation fail.
> 
> However, the check for PF_WQ_WORKER is too general because tasks can
> enter NFS
> writeback paths from other workqueues.  Specifically, the loopback
> driver
> tends to perform writeback into backing files on NFS with
> PF_WQ_WORKER set,
> and additionally sets PF_MEMALLOC_NOIO.  The combination of
> PF_MEMALLOC_NOIO with __GFP_NORETRY can easily result in allocation
> failures and the loopback driver has no retry functionality.  As a
> result,
> after commit 0bae835b63c5 ("NFS: Avoid writeback threads getting
> stuck in
> mempool_alloc()") users are seeing corrupted loop-mounted filesystems
> backed
> by image files on NFS.
> 
> In a preceding patch, we introduced a function to allow NFS to detect
> if
> the task is executing within a specific workqueue.  Here we use that
> helper
> to set __GFP_NORETRY | __GFP_NOWARN only if the workqueue is nfsiod.
> 
> Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck in
> mempool_alloc()")
> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> ---
>  fs/nfs/internal.h | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 69c2c10ee658..173172afa3f5 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -12,6 +12,7 @@
>  #include <linux/nfs_page.h>
>  #include <linux/nfslocalio.h>
>  #include <linux/wait_bit.h>
> +#include <linux/workqueue.h>
>  
>  #define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
>  
> @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct nfs_writeverf
> *verf,
>  		!nfs_write_verifier_cmp(&req->wb_verf, &verf-
> >verifier);
>  }
>  
> +static inline bool is_nfsiod(void)
> +{
> +	struct workqueue_struct *current_wq = current_workqueue();
> +
> +	if (current_wq)
> +		return current_wq == nfsiod_workqueue;
> +	return false;
> +}
> +
>  static inline gfp_t nfs_io_gfp_mask(void)
>  {
> -	if (current->flags & PF_WQ_WORKER)
> +	if (is_nfsiod())
>  		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
>  	return GFP_KERNEL;
>  }

Instead of trying to identify the nfsiod_workqueue, why not apply
current_gfp_context() in order to weed out callers that set
PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS?

i.e.

static inline gfp_t nfs_io_gfp_mask(void)
{
	gfp_t ret = current_gfp_context(GFP_KERNEL);

	if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL)
		ret |= __GFP_NORETRY | __GFP_NOWARN;
	return ret;
}

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx