Re: [PATCH] NFSv4/pnfs: Reset the layout state after a layoutreturn

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2025-05-13 at 00:35 +0000, Trond Myklebust wrote:
> On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote:
> > On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote:
> > > Hi Jeff and Omar,
> > > 
> > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@xxxxxxxxxx wrote:
> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > 
> > > > If there are still layout segments in the layout plh_return_lsegs
> > > > list
> > > > after a layout return, we should be resetting the state to ensure
> > > > they
> > > > eventually get returned as well.
> > > > 
> > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are
> > > > marked for return")
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  fs/nfs/pnfs.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > > index 10fdd065a61c..fc7c5fb10198 100644
> > > > --- a/fs/nfs/pnfs.c
> > > > +++ b/fs/nfs/pnfs.c
> > > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct
> > > > pnfs_layout_hdr *lo,
> > > >   return remaining;
> > > >  }
> > > >  
> > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo)
> > > > +{
> > > > + struct pnfs_layout_segment *lseg;
> > > > +
> > > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list)
> > > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0);
> > > > +}
> > > > +
> > > >  static void
> > > >  pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo,
> > > >   struct list_head *free_me,
> > > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct
> > > > pnfs_layout_hdr *lo,
> > > >   pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq);
> > > >   pnfs_free_returned_lsegs(lo, &freeme, range, seq);
> > > >   pnfs_set_layout_stateid(lo, stateid, NULL, true);
> > > > + pnfs_reset_return_info(lo);
> > > >   } else
> > > >   pnfs_mark_layout_stateid_invalid(lo, &freeme);
> > > >  out_unlock:
> > > 
> > > Could the above bug perhaps explain the issue with leaked layout
> > > segments that you were seeing?
> > > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the
> > > server
> > > is unable to recall the layout due to the network getting shut
> > > down,
> > > then it seems to me that these layout segments just disappear down
> > > a
> > > black hole.
> > > 
> > > IOW: the scenario is something like this:
> > >  * The client holds a read and a read/write layout.
> > >  * The server recalls the read layout.
> > >  * The client closes the file while the recall is being processed,
> > > so
> > >    that the read and read/write layout segments are both put on the
> > >    plh_return_segs list.
> > >  * The client returns the read layout, and clears the associated
> > > read
> > >    layout segments. The read/write layout segments are still on the
> > >    list, but without NFS_LAYOUT_RETURN_REQUESTED being set.
> > > 
> > 
> > Maybe?
> > 
> > The problem I think we hit was that pnfs_put_layout_hdr() got called
> > and its refcount went to zero while there were still entries on
> > plh_return_segs.
> > 
> > pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr()
> > as its "last ditch" effort to clean out the plh_return_segs list. It
> > looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set
> > on
> > all of those entries, but if that flag gets set during the
> > pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may
> > be
> > too late and they'll just leak anyway.
> > 
> > So I guess the question is: is every entry on plh_return_segs
> > guaranteed to get a first attempt at a LAYOUTRETURN before
> > pnfs_layoutreturn_before_put_layout_hdr() is called?
> > 
> > If so, then yes that should fix it. If not, then I think it may not
> > (unless I'm misunderstanding this code).
> 
> So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will
> take a reference to the layout (in pnfs_prepare_layoutreturn()) before
> it kicks off the layoutreturn RPC call. Upon finishing the
> layoutreturn, the nfs4_layoutreturn_release() callback will then
> trigger another layoutreturn attempt when it calls
> pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was
> set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info().
> 

Oh lovely. I missed that detail. So the thing that puts a reference to
the layout_hdr will _itself_ take and put an extra reference. I guess
the extra put happens in another context though so you don't need to
worry about recursion.

So yeah, that probably will fix the problem Omar found. I'll see about
spinning up a kernel internally that we can test.

Thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux