On Mon, 2025-05-12 at 13:59 -0500, Jeff Layton wrote: > On Sun, 2025-05-11 at 13:48 +0000, Trond Myklebust wrote: > > Hi Jeff and Omar, > > > > On Sun, 2025-05-11 at 09:28 -0400, trondmy@xxxxxxxxxx wrote: > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > > > > If there are still layout segments in the layout plh_return_lsegs > > > list > > > after a layout return, we should be resetting the state to ensure > > > they > > > eventually get returned as well. > > > > > > Fixes: 68f744797edd ("pNFS: Do not free layout segments that are > > > marked for return") > > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > > > --- > > > fs/nfs/pnfs.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c > > > index 10fdd065a61c..fc7c5fb10198 100644 > > > --- a/fs/nfs/pnfs.c > > > +++ b/fs/nfs/pnfs.c > > > @@ -745,6 +745,14 @@ pnfs_mark_matching_lsegs_invalid(struct > > > pnfs_layout_hdr *lo, > > > return remaining; > > > } > > > > > > +static void pnfs_reset_return_info(struct pnfs_layout_hdr *lo) > > > +{ > > > + struct pnfs_layout_segment *lseg; > > > + > > > + list_for_each_entry(lseg, &lo->plh_return_segs, pls_list) > > > + pnfs_set_plh_return_info(lo, lseg->pls_range.iomode, 0); > > > +} > > > + > > > static void > > > pnfs_free_returned_lsegs(struct pnfs_layout_hdr *lo, > > > struct list_head *free_me, > > > @@ -1292,6 +1300,7 @@ void pnfs_layoutreturn_free_lsegs(struct > > > pnfs_layout_hdr *lo, > > > pnfs_mark_matching_lsegs_invalid(lo, &freeme, range, seq); > > > pnfs_free_returned_lsegs(lo, &freeme, range, seq); > > > pnfs_set_layout_stateid(lo, stateid, NULL, true); > > > + pnfs_reset_return_info(lo); > > > } else > > > pnfs_mark_layout_stateid_invalid(lo, &freeme); > > > out_unlock: > > > > Could the above bug perhaps explain the issue with leaked layout > > segments that you were seeing? > > If the client doesn't set NFS_LAYOUT_RETURN_REQUESTED, and the > > server > > is unable to recall the layout due to the network getting shut > > down, > > then it seems to me that these layout segments just disappear down > > a > > black hole. > > > > IOW: the scenario is something like this: > > * The client holds a read and a read/write layout. > > * The server recalls the read layout. > > * The client closes the file while the recall is being processed, > > so > > that the read and read/write layout segments are both put on the > > plh_return_segs list. > > * The client returns the read layout, and clears the associated > > read > > layout segments. The read/write layout segments are still on the > > list, but without NFS_LAYOUT_RETURN_REQUESTED being set. > > > > Maybe? > > The problem I think we hit was that pnfs_put_layout_hdr() got called > and its refcount went to zero while there were still entries on > plh_return_segs. > > pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr() > as its "last ditch" effort to clean out the plh_return_segs list. It > looks like your patch will ensure NFS_LAYOUT_RETURN_REQUESTED is set > on > all of those entries, but if that flag gets set during the > pnfs_layoutreturn_before_put_layout_hdr() call, then I think it may > be > too late and they'll just leak anyway. > > So I guess the question is: is every entry on plh_return_segs > guaranteed to get a first attempt at a LAYOUTRETURN before > pnfs_layoutreturn_before_put_layout_hdr() is called? > > If so, then yes that should fix it. If not, then I think it may not > (unless I'm misunderstanding this code). So, the point is that pnfs_layoutreturn_before_put_layout_hdr() will take a reference to the layout (in pnfs_prepare_layoutreturn()) before it kicks off the layoutreturn RPC call. Upon finishing the layoutreturn, the nfs4_layoutreturn_release() callback will then trigger another layoutreturn attempt when it calls pnfs_put_layout_hdr(), assuming that NFS_LAYOUT_RETURN_REQUESTED was set by pnfs_layoutreturn_free_lsegs()->pnfs_reset_return_info(). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx