Hi. My colleagues and I have been investigating the issue addressed by this patch and have discovered some significant concerns that require broader discussion. ### Socket Leak Issue After testing this patch extensively, I've confirmed it introduces a socket leak when TCP connections don't complete proper termination (e.g., when FIN packets are dropped). The leak manifests as a continuous increase in TCP slab usage: I've documented this issue with a reproducer in Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=219972#c0 The key issue appears to stem from the interaction between the SMB client and TCP socket lifecycle management: 1. Removing `sk->sk_net_refcnt = 1` causes TCP timers to be terminated early in `tcp_close()` via the `inet_csk_clear_xmit_timers_sync()` call when `!sk->sk_net_refcnt` 2. This early timer termination prevents proper reference counting resolution when connections don't complete the 4-way TCP termination handshake 3. The resulting socket references are never fully released, leading to a leak #### Timeline of Related Changes 1. v4.2-rc1 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets") - Added `sk_net_refcnt` field to distinguish user sockets (=1) from kernel sockets (=0) - Kernel sockets don't hold netns references, which can lead to potential UAF issues 2. v6.9-rc2 151c9c724d05: ("tcp: properly terminate timers for kernel sockets") - Modified `tcp_close()` to check `sk->sk_net_refcnt` and explicitly terminate timers for kernel sockets (=0) - This prevents UAF when netns is destroyed before socket timers complete - **Key change**: If `!sk->sk_net_refcnt`, call `inet_csk_clear_xmit_timers_sync()` 3. v6.12-rc7 ef7134c7fc48: ("smb: client: Fix use-after-free of network namespace") - Fixed netns UAF in CIFS by manually setting `sk->sk_net_refcnt = 1` - Also called `maybe_get_net()` to maintain netns references - This effectively made kernel sockets behave like user sockets for reference counting 4. v6.13-rc4 e9f2517a3e18: ("smb: client: fix TCP timers deadlock after rmmod") - Problem commit: Removed `sk->sk_net_refcnt = 1` setting - Changed to using explicit `get_net()/put_net()` at CIFS layer - This change leads to socket leaks because timers are terminated early ### Lockdep Warning Analysis I've also investigated the lockdep warning mentioned in the patch. My analysis suggests it may be a false positive rather than an actual deadlock. The crash actually occurs in the lockdep subsystem itself (null pointer dereference in `check_wait_context()`), not in the CIFS or networking code directly. The procedure for the null pointer dereference is as follows: When lockdep is enabled, the lock class "slock-AF_INET-CIFS" is set when a socket connection is established. ``` cifs_do_mount cifs_mount mount_get_conns cifs_get_tcp_session ip_connect generic_ip_connect cifs_reclassify_socket4 sock_lock_init_class_and_name(sk, "slock-AF_INET-CIFS", lockdep_init_map lockdep_init_map_wait lockdep_init_map_type lockdep_init_map_type register_lock_class __set_bit(class - lock_classes, lock_classes_in_use); ``` When the module is unloaded, the lock class is cleaned up. ``` free_mod_mem lockdep_free_key_range __lockdep_free_key_range zap_class __clear_bit(class - lock_classes, lock_classes_in_use); ``` After the module is uninstalled and the network connection is restored, the timer is woken up. ``` run_timer_softirq run_timer_base __run_timers call_timer_fn tcp_write_timer bh_lock_sock spin_lock(&((__sk)->sk_lock.slock)) _raw_spin_lock lock_acquire __lock_acquire check_wait_context hlock_class if (!test_bit(class_idx, lock_classes_in_use)) { return NULL; hlock_class(next)->wait_type_inner; // Null pointer dereference ``` The problem lies within lockdep, as Kuniyuki says:
I tried the repro and confirmed it triggers null deref. It happens in LOCKDEP internal, so for me it looks like a problem in LOCKDEP rather than CIFS or TCP. I think LOCKDEP should hold a module reference and prevent related modules from being unloaded.
Regarding the deadlock issue, it is clear that the locks mentioned in the deadlock warning do not belong to the same lock instance. A deadlock should not occur. ### Discussion Points 1. API Design Question: Is this fundamentally an issue with how CIFS uses the socket API, or is it a networking layer issue that should handle socket cleanup differently? 2. Approach to Resolution: Would it be better to: - Revert to the original solution (setting `sk->sk_net_refcnt = 1`) from ef7134c7fc48? - Work with networking subsystem maintainers on a more comprehensive solution that handles socket cleanup properly? 3. CVE Process Question: Given that CVE-2024-54680 appears to "fix" a non-existent issue while introducing an actual vulnerability, what's the appropriate way to address this? What's the best path forward? Best regards, Wang Zhaolong
Adding fsdevel and networking in case any thoughts on this fix for network/namespaces refcount issue (that causes rmmod UAF). Any opinions on Enzo's proposed Fix? ---------- Forwarded message --------- From: Steve French <smfrench@xxxxxxxxx> Date: Tue, Dec 17, 2024 at 9:24 PM Subject: [PATCH][SMB3 client] fix TCP timers deadlock after rmmod To: CIFS <linux-cifs@xxxxxxxxxxxxxxx> Cc: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>, Enzo Matsumiya <ematsumiya@xxxxxxx> Enzo had an interesting patch, that seems to fix an important problem. Here was his repro scenario: tw:~ # mount.cifs -o credentials=/root/wincreds,echo_interval=10 //someserver/target1 /mnt/test tw:~ # ls /mnt/test abc dir1 dir3 target1_file.txt tsub tw:~ # iptables -A INPUT -s someserver -j DROP Trigger reconnect and wait for 3*echo_interval: tw:~ # cat /mnt/test/target1_file.txt cat: /mnt/test/target1_file.txt: Host is down Then umount and rmmod. Note that rmmod might take several iterations until it properly tears down everything, so make sure you see the "not loaded" message before proceeding: tw:~ # umount /mnt/*; rmmod cifs umount: /mnt/az: not mounted. umount: /mnt/dfs: not mounted. umount: /mnt/local: not mounted. umount: /mnt/scratch: not mounted. rmmod: ERROR: Module cifs is in use ... tw:~ # rmmod cifs rmmod: ERROR: Module cifs is not currently loaded Then kickoff the TCP internals: tw:~ # iptables -F Gets the lockdep warning (requires CONFIG_LOCKDEP=y) + a NULL deref later on. Any thoughts on his patch? See below (and attached) Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabled socket refcounting (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)). The reason the patch worked for that bug was because we now hold references to the netns (get_net_track() gets a ref internally) and they're properly released (internally, on __sk_destruct()), but only because sk->sk_net_refcnt was set. Problem: (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless if init_net or other) Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not only out of cifs scope, but also technically wrong -- it's set conditionally based on user (=1) vs kernel (=0) sockets. And net/ implementations seem to base their user vs kernel space operations on it. e.g. upon TCP socket close, the TCP timers are not cleared because sk->sk_net_refcnt=1: (cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets")) net/ipv4/tcp.c: void tcp_close(struct sock *sk, long timeout) { lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); if (!sk->sk_net_refcnt) inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } Which will throw a lockdep warning and then, as expected, deadlock on tcp_write_timer(). A way to reproduce this is by running the reproducer from ef7134c7fc48 and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep warning shows up. Fix: We shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually. Also change __sock_create() to sock_create_kern() for explicitness. As for non-init_net network namespaces, we deal with it the best way we can -- hold an extra netns reference for server->ssocket and drop it when it's released. This ensures that the netns still exists whenever we need to create/destroy server->ssocket, but is not directly tied to it. Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") -- Thanks, Steve