From: Wang Zhaolong <wangzhaolong1@xxxxxxxxxx> Date: Tue, 1 Apr 2025 21:54:47 +0800 > Hi. > > My colleagues and I have been investigating the issue addressed by this patch > and have discovered some significant concerns that require broader discussion. > > ### Socket Leak Issue > > After testing this patch extensively, I've confirmed it introduces a socket leak > when TCP connections don't complete proper termination (e.g., when FIN packets > are dropped). The leak manifests as a continuous increase in TCP slab usage: > > I've documented this issue with a reproducer in Bugzilla: > > https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0 > > The key issue appears to stem from the interaction between the SMB client and TCP > socket lifecycle management: > > 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in > `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when > `!sk->sk_net_refcnt` > 2. This early timer termination prevents proper reference counting resolution > when connections don't complete the 4-way TCP termination handshake > 3. The resulting socket references are never fully released, leading to a leak > > #### Timeline of Related Changes > > 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets") > - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0) > - Kernel sockets don't hold netns references, which can lead to potential UAF issues > > 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets") > - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0) > - This prevents UAF when netns is destroyed before socket timers complete > - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()` > > 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace") > - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1` > - Also called `maybe_get_net()` to maintain netns references > - This effectively made kernel sockets behave like user sockets for reference counting > > 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod") > - Problem commit: Removed `sk->sk_net_refcnt = 1` setting > - Changed to using explicit `get_net()/put_net()` at CIFS layer > - This change leads to socket leaks because timers are terminated early > > ### Lockdep Warning Analysis > > I've also investigated the lockdep warning mentioned in the patch. My analysis > suggests it may be a false positive rather than an actual deadlock. The crash > actually occurs in the lockdep subsystem itself (null pointer dereference in > `check_wait_context()`), not in the CIFS or networking code directly. > > The procedure for the null pointer dereference is as follows: > > When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket > connection is established. > > ``` > cifs_do_mount > cifs_mount > mount_get_conns > cifs_get_tcp_session > ip_connect > generic_ip_connect > cifs_reclassify_socket4 > sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS", > lockdep_init_map > lockdep_init_map_wait > lockdep_init_map_type > lockdep_init_map_type > register_lock_class > __set_bit(class - lock_classes, lock_classes_in_use); > ``` > > When the module is unloaded, the lock class is cleaned up. > > ``` > free_mod_mem > lockdep_free_key_range > __lockdep_free_key_range > zap_class > __clear_bit(class - lock_classes, lock_classes_in_use); > ``` > > After the module is uninstalled and the network connection is restored, the > timer is woken up. > > ``` > run_timer_softirq > run_timer_base > __run_timers > call_timer_fn > tcp_write_timer > bh_lock_sock > spin_lock(&((__sk)->sk_lock.slock)) > _raw_spin_lock > lock_acquire > __lock_acquire > check_wait_context > hlock_class > if (!test_bit(class_idx, lock_classes_in_use)) { > return NULL; > hlock_class(next)->wait_type_inner; // Null pointer dereference > ``` > > The problem lies within lockdep, as Kuniyuki says: > > > I tried the repro and confirmed it triggers null deref. > > > > It happens in LOCKDEP internal, so for me it looks like a problem in > > LOCKDEP rather than CIFS or TCP. > > > > I think LOCKDEP should hold a module reference and prevent related > > modules from being unloaded. > > Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning > do not belong to the same lock instance. A deadlock should not occur. > > ### Discussion Points > > 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket > API, or is it a networking layer issue that should handle socket cleanup differently? > > 2. Approach to Resolution: Would it be better to: > - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48? > - Work with networking subsystem maintainers on a more comprehensive solution that > handles socket cleanup properly? > > 3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue > while introducing an actual vulnerability, what's the appropriate way to address this? I tested on 6.14 and e9f2517a3e18, but the issue still reproduces, so e9f2517a3e18 is a bogus fix, and we will need to ask the CNA team to update the description once the correct fix is merged. repro: ---8<--- #!/bin/bash CIFS_SERVER="10.0.0.137" CIFS_PATH="//${CIFS_SERVER}/Users/Administrator/Desktop/CIFS_TEST" DEV="enp0s3" CRED="/root/WindowsCredential.sh" MNT=$(mktemp -d XXXXXX) echo "/tmp/${MNT}" mkdir -p /tmp/${MNT} mount -t cifs ${CIFS_PATH} /tmp/${MNT} -o vers=3.0,sec=ntlmssp,credentials=${CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1 echo "hello world" > /tmp/${MNT}/a.txt iptables -A INPUT -s ${CIFS_SERVER} -j DROP cat /tmp/${MNT}/a.txt & for i in $(seq 10); do umount /tmp/${MNT} rmmod cifs sleep 1 done rm -r /tmp/${MNT} iptables -D INPUT -s ${CIFS_SERVER} -j DROP ---8<--- splat: ---8<--- ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 10 PID: 0 at kernel/locking/lockdep.c:234 hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Not tainted 6.14.0 #36 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) Code: ef 90 e8 c4 66 4c 00 85 c0 74 23 8b 05 ba dd bf 01 85 c0 75 19 90 48 c7 c6 4d 83 a1 82 48 c7 c7 bc 0d a0 82 e8 e2 2f f8 ff 90 <0f> 0b 90 90 90 31 c0 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 RSP: 0018:ffa0000000468a08 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027 RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100081f09ca40 RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88 R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1 FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f10a998d1b4 CR3: 0000000002c4a003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? __warn (kernel/panic.c:748) ? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:285) ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) ? hlock_class (kernel/locking/lockdep.c:234 kernel/locking/lockdep.c:223) __lock_acquire (kernel/locking/lockdep.c:4853 kernel/locking/lockdep.c:5178) ? lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) ? sk_filter_trim_cap (./include/linux/rcupdate.h:337 ./include/linux/rcupdate.h:849 net/core/filter.c:155) lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) ? tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ? sk_filter_trim_cap (./include/linux/rcupdate.h:883 net/core/filter.c:166) ? __inet_lookup_established (./include/net/inet_hashtables.h:358 net/ipv4/inet_hashtables.c:515) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) ? tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ? raw_local_deliver (net/ipv4/raw.c:206) ? lock_is_held_type (kernel/locking/lockdep.c:5592 kernel/locking/lockdep.c:5923) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1)) ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234) ip_sublist_rcv_finish (net/ipv4/ip_input.c:576) ip_list_rcv_finish (net/ipv4/ip_input.c:628) ip_list_rcv (net/ipv4/ip_input.c:670) __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986) netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129) napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496) e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815) __napi_poll.constprop.0 (net/core/dev.c:7191) net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382) ? handle_irq_event (kernel/irq/internals.h:236 kernel/irq/handle.c:213) ? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) ? find_held_lock (kernel/locking/lockdep.c:5341) handle_softirqs (kernel/softirq.c:561) __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662) irq_exit_rcu (kernel/softirq.c:680) common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14)) </IRQ> <TASK> asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693) RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744) Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202 RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5 RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118) do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:315) common_startup_64 (arch/x86/kernel/head_64.S:421) </TASK> irq event stamp: 25636 hardirqs last enabled at (25636): [<ffffffff81297fe1>] __local_bh_enable_ip+0x71/0xd0 hardirqs last disabled at (25635): [<ffffffff8129800a>] __local_bh_enable_ip+0x9a/0xd0 softirqs last enabled at (25606): [<ffffffff81297b2e>] handle_softirqs+0x2ee/0x3b0 softirqs last disabled at (25613): [<ffffffff81297d51>] __irq_exit_rcu+0xa1/0xc0 ---[ end trace 0000000000000000 ]--- BUG: kernel NULL pointer dereference, address: 00000000000000c4 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Tainted: G W 6.14.0 #36 Tainted: [W]=WARN Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire (kernel/locking/lockdep.c:4852 kernel/locking/lockdep.c:5178) Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44 RSP: 0018:ffa0000000468a10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027 RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88 RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88 R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1 FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853 kernel/locking/lockdep.c:5816) _raw_spin_lock_nested (kernel/locking/spinlock.c:379) tcp_v4_rcv (./include/linux/skbuff.h:1678 ./include/net/tcp.h:2547 net/ipv4/tcp_ipv4.c:2350) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1)) ip_local_deliver_finish (./include/linux/rcupdate.h:878 net/ipv4/ip_input.c:234) ip_sublist_rcv_finish (net/ipv4/ip_input.c:576) ip_list_rcv_finish (net/ipv4/ip_input.c:628) ip_list_rcv (net/ipv4/ip_input.c:670) __netif_receive_skb_list_core (net/core/dev.c:5939 net/core/dev.c:5986) netif_receive_skb_list_internal (net/core/dev.c:6040 net/core/dev.c:6129) napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:519 ./include/net/gro.h:514 net/core/dev.c:6496) e1000_clean (drivers/net/ethernet/intel/e1000/e1000_main.c:3815) __napi_poll.constprop.0 (net/core/dev.c:7191) net_rx_action (net/core/dev.c:7262 net/core/dev.c:7382) handle_softirqs (kernel/softirq.c:561) __irq_exit_rcu (kernel/softirq.c:596 kernel/softirq.c:435 kernel/softirq.c:662) irq_exit_rcu (kernel/softirq.c:680) common_interrupt (arch/x86/kernel/irq.c:280 (discriminator 14)) </IRQ> <TASK> asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693) RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:744) Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 2b 15 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 RSP: 0018:ffa00000000ffee8 EFLAGS: 00000202 RAX: 000000000000640b RBX: ff1100010091c200 RCX: 0000000000061aa4 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff812f30c5 RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118) do_idle (kernel/sched/idle.c:186 kernel/sched/idle.c:325) cpu_startup_entry (kernel/sched/idle.c:422 (discriminator 1)) start_secondary (arch/x86/kernel/smpboot.c:315) common_startup_64 (arch/x86/kernel/head_64.S:421) </TASK> Modules linked in: cifs_arc4 nls_ucs2_utils cifs_md4 [last unloaded: cifs] CR2: 00000000000000c4 ---[ end trace 0000000000000000 ]--- RIP: 0010:__lock_acquire+0x222/0x16f0 Code: 15 41 09 c7 41 8b 44 24 20 25 ff 1f 00 00 41 09 c7 8b 84 24 a0 00 00 00 45 89 7c 24 20 41 89 44 24 24 e8 e1 bc ff ff 4c 89 e7 <44> 0f b6 b8 c4 00 00 00 e8 d1 bc ff ff 0f b6 80 c5 00 00 00 88 44 RSP: 0018:ffa0000000468a10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ff1100010091cc38 RCX: 0000000000000027 RDX: ff1100081f09ca48 RSI: 0000000000000001 RDI: ff1100010091cc88 RBP: ff1100010091c200 R08: ff1100083fe6e228 R09: 00000000ffffbfff R10: ff1100081eca0000 R11: ff1100083fe10dc0 R12: ff1100010091cc88 R13: 0000000000000001 R14: 0000000000000000 R15: 00000000000424b1 FS: 0000000000000000(0000) GS:ff1100081f080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c4 CR3: 0000000002c4a003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- ---8<---