在 2025/4/17 20:43, Jeff Layton 写道:
On Thu, 2025-04-17 at 20:24 +0800, Li Lingfeng wrote:
在 2025/4/17 18:29, Jeff Layton 写道:
On Thu, 2025-04-17 at 15:25 +0800, Li Lingfeng wrote:
When memory is insufficient, the allocation of nfs_lock_context in
nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat
an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM)
as valid and proceed to execute rpc_run_task(), this will trigger a NULL
pointer dereference in nfs4_locku_prepare. For example:
BUG: kernel NULL pointer dereference, address: 000000000000000c
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40
Workqueue: rpciod rpc_async_schedule
RIP: 0010:nfs4_locku_prepare+0x35/0xc2
Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3
RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246
RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40
RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38
R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030
R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30
FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0
Call Trace:
<TASK>
__rpc_execute+0xbc/0x480
rpc_async_schedule+0x2f/0x40
process_one_work+0x232/0x5d0
worker_thread+0x1da/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x10d/0x240
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in:
CR2: 000000000000000c
---[ end trace 0000000000000000 ]---
Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and
return NULL to terminate subsequent rpc_run_task, preventing NULL pointer
dereference.
Fixes: f30cb757f680 ("NFS: Always wait for I/O completion before unlock")
Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
---
fs/nfs/nfs4proc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 970f28dbf253..9f5689c43a50 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7074,10 +7074,18 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
struct nfs4_unlockdata *p;
struct nfs4_state *state = lsp->ls_state;
struct inode *inode = state->inode;
+ struct nfs_lock_context *l_ctx;
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (p == NULL)
return NULL;
+ l_ctx = nfs_get_lock_context(ctx);
+ if (!IS_ERR(l_ctx)) {
+ p->l_ctx = l_ctx;
+ } else {
+ kfree(p);
+ return NULL;
+ }
p->arg.fh = NFS_FH(inode);
p->arg.fl = &p->fl;
p->arg.seqid = seqid;
@@ -7085,7 +7093,6 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
p->lsp = lsp;
/* Ensure we don't close file until we're done freeing locks! */
p->ctx = get_nfs_open_context(ctx);
Not exactly the same problem, but get_nfs_open_context() can fail too.
Does it need error handling for that as well?
Hi,
IIUC, nfs_open_context is allocated during file open and attached to
filp->private_data. Upon successful file opening, the context remains valid.
Post-lock acquisition, nfs_open_context can be retrieved via
file_lock->file->nfs_open_context chain. Thus get_nfs_open_context() here
should have non-failure guarantee in standard code paths.
I'm not so sure. This function can get called from the rpc_release
callback for a LOCK request:
->rpc_release
nfs4_lock_release
nfs4_do_unlck
nfs4_alloc_unlockdata
Can that happen after the open_ctx->lock_context.count goes to 0?
Given that we have a safe failure path in this code, it seems like we
ought to check for that here, just to be safe. If it really shouldn't
happen like you say, then we could throw in a WARN_ON_ONCE() too.
Thank you for raising this concern.
During file open, the nfs_open_context is allocated, and
open_ctx->lock_context.count is initialized to 1. Based on the current
flow, I think it's unlikely for this counter to reach 0 during lock/unlock
operations since its decrement is tied to file closure.
However, I agree with your suggestion to add checks when
get_nfs_open_context fails. Furthermore, this check might also be
necessary not only in the unlock path but potentially in the lock path if
get_nfs_open_contextb fails there as well.
Additionally, I noticed that both the lock and unlock release callbacks
dereference nfs_open_context. If get_nfs_open_context were to fail
(assuming such a scenario is possible), this could lead to a NULL pointer
dereference. Instead of relying solely on WARN_ON_ONCE(), it might be
safer to halt the operation immediately upon detecting a failure in
get_nfs_open_context.
// unlock
nfs4_locku_release_calldata
put_nfs_open_context
__put_nfs_open_context
// dereference nfs_open_context
// lock
nfs4_lock_release
nfs4_do_unlck
// dereference nfs_open_context
put_nfs_open_context
// dereference nfs_open_context
I'll incorporate your feedback and send a patchset soon.
- p->l_ctx = nfs_get_lock_context(ctx);
locks_init_lock(&p->fl);
locks_copy_lock(&p->fl, fl);
p->server = NFS_SERVER(inode);
Good catch:
Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>