On Sat, 2025-04-19 at 16:28 +0800, Li Lingfeng wrote:
> On 2025/4/17 20:43, Jeff Layton wrote:
> > On Thu, 2025-04-17 at 20:24 +0800, Li Lingfeng wrote:
> > > On 2025/4/17 18:29, Jeff Layton wrote:
> > > > On Thu, 2025-04-17 at 15:25 +0800, Li Lingfeng wrote:
> > > > > When memory is insufficient, the allocation of nfs_lock_context in
> > > > > nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat
> > > > > an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM)
> > > > > as valid and proceed to execute rpc_run_task(), this will trigger a NULL
> > > > > pointer dereference in nfs4_locku_prepare. For example:
> > > > >
> > > > > BUG: kernel NULL pointer dereference, address: 000000000000000c
> > > > > PGD 0 P4D 0
> > > > > Oops: Oops: 0000 [#1] SMP PTI
> > > > > CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40
> > > > > Workqueue: rpciod rpc_async_schedule
> > > > > RIP: 0010:nfs4_locku_prepare+0x35/0xc2
> > > > > Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3
> > > > > RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246
> > > > > RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000
> > > > > RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40
> > > > > RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38
> > > > > R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030
> > > > > R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30
> > > > > FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000
> > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  __rpc_execute+0xbc/0x480
> > > > >  rpc_async_schedule+0x2f/0x40
> > > > >  process_one_work+0x232/0x5d0
> > > > >  worker_thread+0x1da/0x3d0
> > > > >  ? __pfx_worker_thread+0x10/0x10
> > > > >  kthread+0x10d/0x240
> > > > >  ? __pfx_kthread+0x10/0x10
> > > > >  ret_from_fork+0x34/0x50
> > > > >  ? __pfx_kthread+0x10/0x10
> > > > >  ret_from_fork_asm+0x1a/0x30
> > > > >  </TASK>
> > > > > Modules linked in:
> > > > > CR2: 000000000000000c
> > > > > ---[ end trace 0000000000000000 ]---
> > > > >
> > > > > Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and
> > > > > return NULL to terminate the subsequent rpc_run_task(), preventing a NULL
> > > > > pointer dereference.
> > > > >
> > > > > Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock")
> > > > > Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
> > > > > ---
> > > > >  fs/nfs/nfs4proc.c | 9 ++++++++-
> > > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > index 970f28dbf253..9f5689c43a50 100644
> > > > > --- a/fs/nfs/nfs4proc.c
> > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > @@ -7074,10 +7074,18 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
> > > > >  	struct nfs4_unlockdata *p;
> > > > >  	struct nfs4_state *state = lsp->ls_state;
> > > > >  	struct inode *inode = state->inode;
> > > > > +	struct nfs_lock_context *l_ctx;
> > > > >
> > > > >  	p = kzalloc(sizeof(*p), GFP_KERNEL);
> > > > >  	if (p == NULL)
> > > > >  		return NULL;
> > > > > +	l_ctx = nfs_get_lock_context(ctx);
> > > > > +	if (!IS_ERR(l_ctx)) {
> > > > > +		p->l_ctx = l_ctx;
> > > > > +	} else {
> > > > > +		kfree(p);
> > > > > +		return NULL;
> > > > > +	}
> > > > >  	p->arg.fh = NFS_FH(inode);
> > > > >  	p->arg.fl = &p->fl;
> > > > >  	p->arg.seqid = seqid;
> > > > > @@ -7085,7 +7093,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
> > > > >  	p->lsp = lsp;
> > > > >  	/* Ensure we don't close file until we're done freeing locks! */
> > > > >  	p->ctx = get_nfs_open_context(ctx);
> > > > Not exactly the same problem, but get_nfs_open_context() can fail too.
> > > > Does it need error handling for that as well?
> > > Hi,
> > >
> > > IIUC, nfs_open_context is allocated during file open and attached to
> > > filp->private_data. Upon successful file opening, the context remains
> > > valid. After the lock has been acquired, the nfs_open_context can be
> > > retrieved via the file_lock->file->nfs_open_context chain, so
> > > get_nfs_open_context() here should not fail in standard code paths.
> >
> > I'm not so sure. This function can get called from the rpc_release
> > callback for a LOCK request:
> >
> > ->rpc_release
> >  nfs4_lock_release
> >   nfs4_do_unlck
> >    nfs4_alloc_unlockdata
> >
> > Can that happen after the open_ctx->lock_context.count goes to 0?
> >
> > Given that we have a safe failure path in this code, it seems like we
> > ought to check for that here, just to be safe. If it really shouldn't
> > happen like you say, then we could throw in a WARN_ON_ONCE() too.
> Thank you for raising this concern.
> During file open, the nfs_open_context is allocated, and
> open_ctx->lock_context.count is initialized to 1. Based on the current
> flow, I think it's unlikely for this counter to reach 0 during
> lock/unlock operations, since its decrement is tied to file closure.
>
> However, I agree with your suggestion to add checks when
> get_nfs_open_context() fails. Furthermore, this check might be
> necessary not only in the unlock path but also in the lock path, if
> get_nfs_open_context() fails there as well.
>
> Additionally, I noticed that both the lock and unlock release callbacks
> dereference nfs_open_context. If get_nfs_open_context() were to fail
> (assuming such a scenario is possible), this could lead to a NULL
> pointer dereference. Instead of relying solely on WARN_ON_ONCE(), it
> might be safer to halt the operation immediately upon detecting a
> failure in get_nfs_open_context().
>
> // unlock
> nfs4_locku_release_calldata
>  put_nfs_open_context
>   __put_nfs_open_context
>    // dereference nfs_open_context
>
> // lock
> nfs4_lock_release
>  nfs4_do_unlck
>   // dereference nfs_open_context
>  put_nfs_open_context
>   // dereference nfs_open_context
>
> I'll incorporate your feedback and send a patchset soon. Thanks.

I think that it pays to be safe here. If this scenario turns out to not
be possible, then the alternative would be to add a comment that
explains why.

Just handling the error case is probably best though as this is not a
particularly hot codepath, and that would help safeguard against future
changes.

> > > > > -	p->l_ctx = nfs_get_lock_context(ctx);
> > > > >  	locks_init_lock(&p->fl);
> > > > >  	locks_copy_lock(&p->fl, fl);
> > > > >  	p->server = NFS_SERVER(inode);
> > > > Good catch:
> > > >
> > > > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > >
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
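
For illustration, here is a rough sketch of how nfs4_alloc_unlockdata()
might unwind both failures. This is untested, the label names are only
illustrative, and it assumes get_nfs_open_context() returns NULL when
the open context reference can no longer be taken:

	p = kzalloc(sizeof(*p), GFP_KERNEL);
	if (p == NULL)
		return NULL;

	/* Bail out if the lock context can't be obtained */
	l_ctx = nfs_get_lock_context(ctx);
	if (IS_ERR(l_ctx))
		goto out_free;
	p->l_ctx = l_ctx;

	...

	/* Ensure we don't close file until we're done freeing locks! */
	p->ctx = get_nfs_open_context(ctx);
	if (p->ctx == NULL)
		goto out_put_lock_ctx;

	...

	return p;

out_put_lock_ctx:
	/* Drop the lock context reference taken above */
	nfs_put_lock_context(p->l_ctx);
out_free:
	kfree(p);
	return NULL;

A WARN_ON_ONCE() on the p->ctx == NULL branch would still give
visibility into whether that path ever fires in practice.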