On Mon, 2025-07-07 at 16:28 -0400, Laurence Oberman wrote: > On Mon, 2025-07-07 at 12:25 -0700, Trond Myklebust wrote: > > On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote: > > > The NFS client writeback paths change which flags are passed to > > > their > > > memory allocation calls based on whether the current task is > > > running > > > from > > > within a workqueue or not. More specifically, it appears that > > > during > > > writeback allocations with PF_WQ_WORKER set on current->flags > > > will > > > add > > > __GFP_NORETRY | __GFP_NOWARN. Presumably this is because nfsiod > > > can > > > simply fail quickly and later retry to write back that specific > > > page > > > should > > > the allocation fail. > > > > > > However, the check for PF_WQ_WORKER is too general because tasks > > > can > > > enter NFS > > > writeback paths from other workqueues. Specifically, the > > > loopback > > > driver > > > tends to perform writeback into backing files on NFS with > > > PF_WQ_WORKER set, > > > and additionally sets PF_MEMALLOC_NOIO. The combination of > > > PF_MEMALLOC_NOIO with __GFP_NORETRY can easily result in > > > allocation > > > failures and the loopback driver has no retry functionality. As > > > a > > > result, > > > after commit 0bae835b63c5 ("NFS: Avoid writeback threads getting > > > stuck in > > > mempool_alloc()") users are seeing corrupted loop-mounted > > > filesystems > > > backed > > > by image files on NFS. > > > > > > In a preceding patch, we introduced a function to allow NFS to > > > detect > > > if > > > the task is executing within a specific workqueue. Here we use > > > that > > > helper > > > to set __GFP_NORETRY | __GFP_NOWARN only if the workqueue is > > > nfsiod. > > > > > > Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck > > > in > > > mempool_alloc()") > > > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> > > > --- > > > fs/nfs/internal.h | 12 +++++++++++- > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h > > > index 69c2c10ee658..173172afa3f5 100644 > > > --- a/fs/nfs/internal.h > > > +++ b/fs/nfs/internal.h > > > @@ -12,6 +12,7 @@ > > > #include <linux/nfs_page.h> > > > #include <linux/nfslocalio.h> > > > #include <linux/wait_bit.h> > > > +#include <linux/workqueue.h> > > > > > > #define NFS_SB_MASK > > > (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS) > > > > > > @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct > > > nfs_writeverf > > > *verf, > > > !nfs_write_verifier_cmp(&req->wb_verf, &verf- > > > > verifier); > > > } > > > > > > +static inline bool is_nfsiod(void) > > > +{ > > > + struct workqueue_struct *current_wq = > > > current_workqueue(); > > > + > > > + if (current_wq) > > > + return current_wq == nfsiod_workqueue; > > > + return false; > > > +} > > > + > > > static inline gfp_t nfs_io_gfp_mask(void) > > > { > > > - if (current->flags & PF_WQ_WORKER) > > > + if (is_nfsiod()) > > > return GFP_KERNEL | __GFP_NORETRY | > > > __GFP_NOWARN; > > > return GFP_KERNEL; > > > } > > > > > > Instead of trying to identify the nfsiod_workqueue, why not apply > > current_gfp_context() in order to weed out callers that set > > PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS? > > > > i.e. > > > > > > static inline gfp_t nfs_io_gfp_mask(void) > > { > > gfp_t ret = current_gfp_context(GFP_KERNEL); > > > > if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL) > > ret |= __GFP_NORETRY | __GFP_NOWARN; > > return ret; > > } > > > > > > I am testing both patch options to see if both prevent the failed > write > with no other impact and will report back. > > The test is confined to the use case of an XFS file system served by > an > image that is located on NFS. as that is where the failed writes were > seen. > > > Both Ben's patch and Trond's fix the failing write issue so I guess we need to decide what the final fix will be. For both solutions Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>