On 2025-05-18 11:48:31, Cong Wang wrote:
> On Thu, May 08, 2025 at 02:24:22PM +0800, Jiayuan Chen wrote:
> > Triggering WARN_ON_ONCE(sk->sk_forward_alloc) by running the following
> > command, followed by pressing Ctrl-C after 2 seconds:
> > ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
> > '''
> > ------------[ cut here ]------------
> > WARNING: CPU: 2 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct
> >
> > Call Trace:
> >  <TASK>
> >  __sk_destruct+0x46/0x222
> >  sk_psock_destroy+0x22f/0x242
> >  process_one_work+0x504/0x8a8
> >  ? process_one_work+0x39d/0x8a8
> >  ? __pfx_process_one_work+0x10/0x10
> >  ? worker_thread+0x44/0x2ae
> >  ? __list_add_valid_or_report+0x83/0xea
> >  ? srso_return_thunk+0x5/0x5f
> >  ? __list_add+0x45/0x52
> >  process_scheduled_works+0x73/0x82
> >  worker_thread+0x1ce/0x2ae
> > '''
> >
> > Reason:
> > When we are in the backlog process, we allocate the sk_msg and then
> > perform the charge process. Meanwhile, in the user process context,
> > the recvmsg() operation performs the uncharge process, leading to
> > concurrency issues between them.
> >
> > The charge process (2 functions):
> > 1. sk_rmem_schedule(size) -> sk_forward_alloc increases by PAGE_SIZE
> >    multiples
> > 2. sk_mem_charge(size)    -> sk_forward_alloc -= size
> >
> > The uncharge process (sk_mem_uncharge()):
> > 3. sk_forward_alloc += size
> > 4. check if sk_forward_alloc > PAGE_SIZE
> > 5. reclaim -> sk_forward_alloc decreases, possibly becoming 0
> >
> > Because the sk performing charge and uncharge is not locked
> > (mainly because the backlog process does not lock the socket),
> > steps 1 to 5 can execute concurrently as follows:
> >
> > cpu0                cpu1
> > 1
> >                     3
> >                     4 --> sk_forward_alloc >= PAGE_SIZE
> >                     5 --> reclaim sk_forward_alloc
> > 2 --> sk_forward_alloc may
> >       become negative
> >
> > Solution:
> > 1. Add locking to the kfree_sk_msg() process, which is only called in
> >    the user process context.
> > 2. Integrate the charge process into sk_psock_create_ingress_msg() in
> >    the backlog process and add locking.
> > 3. Reuse the existing psock->ingress_lock.
>
> Reusing the psock->ingress_lock looks weird to me, as it is intended for
> locking the ingress queue, at least at the time it was introduced.
>
> And technically speaking, it is the sock lock which is supposed to
> serialize socket charging.
>
> So is there any better solution here?

Agree, I would be more apt to add the sock_lock back to the backlog than
to punish the fast path this way. Holding the ref cnt on the psock in the
backlog now blocks sk_psock_destroy(), so is this still an issue?

Thanks,
John
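
For reference, a minimal userspace sketch of the quoted interleaving
(plain C, not kernel code; the byte counts and the whole-page reclaim
are assumptions picked only to show the effect), using a bare counter in
place of sk->sk_forward_alloc to replay steps 3-5 slipping in between
steps 1 and 2:

'''
/* Hypothetical userspace model of the race described above; the
 * numbered steps match the cpu0/cpu1 diagram in the quoted report. */
#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
	long fwd_alloc = 0;		/* models sk->sk_forward_alloc */
	long charge_sz = 3000;		/* bytes the backlog wants to charge */
	long uncharge_sz = 1200;	/* bytes recvmsg() uncharges */

	/* cpu0 (backlog): step 1, sk_rmem_schedule() grabs a page */
	fwd_alloc += PAGE_SIZE;

	/* cpu1 (recvmsg), nothing serializes it against the backlog: */
	fwd_alloc += uncharge_sz;			/* step 3 */
	if (fwd_alloc >= PAGE_SIZE)			/* step 4 */
		fwd_alloc -= (fwd_alloc / PAGE_SIZE) * PAGE_SIZE; /* step 5 */

	/* cpu0 (backlog): step 2, sk_mem_charge() finally runs */
	fwd_alloc -= charge_sz;

	printf("sk_forward_alloc ends at %ld\n", fwd_alloc);	/* -1800 */
	return 0;
}
'''

Holding one lock around both sides, e.g. the socket lock as suggested
above, keeps the reclaim out of the window between steps 1 and 2.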