On Wed, Apr 23, 2025 at 9:23 AM Max Kellermann <max.kellermann@xxxxxxxxx> wrote: > > Sometimes, when a file was read while it was being truncated by > another NFS client, the kernel could deadlock because folio_unlock() > was called twice, and the second call would XOR back the `PG_locked` > flag. > > Most of the time (depending on the timing of the truncation), nobody > notices the problem because folio_unlock() gets called three times, > which flips `PG_locked` back off: > > 1. vfs_read, nfs_read_folio, ... nfs_read_add_folio, > nfs_return_empty_folio > 2. vfs_read, nfs_read_folio, ... netfs_read_collection, > netfs_unlock_abandoned_read_pages > 3. vfs_read, ... nfs_do_read_folio, nfs_read_add_folio, > nfs_return_empty_folio > > The problem is that nfs_read_add_folio() is not supposed to unlock the > folio if fscache is enabled, and a nfs_netfs_folio_unlock() check is > missing in nfs_return_empty_folio(). > > Rarely this leads to a warning in netfs_read_collection(): > > ------------[ cut here ]------------ > R=0000031c: folio 10 is not locked > WARNING: CPU: 0 PID: 29 at fs/netfs/read_collect.c:133 netfs_read_collection+0x7c0/0xf00 > [...] > Workqueue: events_unbound netfs_read_collection_worker > RIP: 0010:netfs_read_collection+0x7c0/0xf00 > [...] > Call Trace: > <TASK> > netfs_read_collection_worker+0x67/0x80 > process_one_work+0x12e/0x2c0 > worker_thread+0x295/0x3a0 > > Most of the time, however, processes just get stuck forever in > folio_wait_bit_common(), waiting for `PG_locked` to disappear, which > never happens because nobody is really holding the folio lock. > > Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Max Kellermann <max.kellermann@xxxxxxxxx> > --- > fs/nfs/read.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/read.c b/fs/nfs/read.c > index 81bd1b9aba17..3c1fa320b3f1 100644 > --- a/fs/nfs/read.c > +++ b/fs/nfs/read.c > @@ -56,7 +56,8 @@ static int nfs_return_empty_folio(struct folio *folio) > { > folio_zero_segment(folio, 0, folio_size(folio)); > folio_mark_uptodate(folio); > - folio_unlock(folio); > + if (nfs_netfs_folio_unlock(folio)) > + folio_unlock(folio); > return 0; > } > > -- > 2.47.2 > LGTM Thanks for tracking this down, Max. Reviewed-by: Dave Wysochanski <dwysocha@xxxxxxxxxx>