On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote: > The NFS client writeback paths change which flags are passed to their > memory allocation calls based on whether the current task is running > from > within a workqueue or not. More specifically, it appears that during > writeback allocations with PF_WQ_WORKER set on current->flags will > add > __GFP_NORETRY | __GFP_NOWARN. Presumably this is because nfsiod can > simply fail quickly and later retry to write back that specific page > should > the allocation fail. > > However, the check for PF_WQ_WORKER is too general because tasks can > enter NFS > writeback paths from other workqueues. Specifically, the loopback > driver > tends to perform writeback into backing files on NFS with > PF_WQ_WORKER set, > and additionally sets PF_MEMALLOC_NOIO. The combination of > PF_MEMALLOC_NOIO with __GFP_NORETRY can easily result in allocation > failures and the loopback driver has no retry functionality. As a > result, > after commit 0bae835b63c5 ("NFS: Avoid writeback threads getting > stuck in > mempool_alloc()") users are seeing corrupted loop-mounted filesystems > backed > by image files on NFS. > > In a preceding patch, we introduced a function to allow NFS to detect > if > the task is executing within a specific workqueue. Here we use that > helper > to set __GFP_NORETRY | __GFP_NOWARN only if the workqueue is nfsiod. > > Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck in > mempool_alloc()") > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > --- > fs/nfs/internal.h | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h > index 69c2c10ee658..173172afa3f5 100644 > --- a/fs/nfs/internal.h > +++ b/fs/nfs/internal.h > @@ -12,6 +12,7 @@ > #include <linux/nfs_page.h> > #include <linux/nfslocalio.h> > #include <linux/wait_bit.h> > +#include <linux/workqueue.h> > > #define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS) > > @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct nfs_writeverf > *verf, > !nfs_write_verifier_cmp(&req->wb_verf, &verf- > >verifier); > } > > +static inline bool is_nfsiod(void) > +{ > + struct workqueue_struct *current_wq = current_workqueue(); > + > + if (current_wq) > + return current_wq == nfsiod_workqueue; > + return false; > +} > + > static inline gfp_t nfs_io_gfp_mask(void) > { > - if (current->flags & PF_WQ_WORKER) > + if (is_nfsiod()) > return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; > return GFP_KERNEL; > } Instead of trying to identify the nfsiod_workqueue, why not apply current_gfp_context() in order to weed out callers that set PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS? i.e. static inline gfp_t nfs_io_gfp_mask(void) { gfp_t ret = current_gfp_context(GFP_KERNEL); if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL) ret |= __GFP_NORETRY | __GFP_NOWARN; return ret; } -- Trond Myklebust Linux NFS client maintainer, Hammerspace trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx