On 8/22/25 8:42 AM, Yafang Shao wrote: > During recent testing with the netem qdisc to inject delays into TCP > traffic, we observed that our CLS BPF program failed to function correctly > due to incorrect classid retrieval from task_get_classid(). The issue > manifests in the following call stack: > > bpf_get_cgroup_classid+5 > cls_bpf_classify+507 > __tcf_classify+90 > tcf_classify+217 > __dev_queue_xmit+798 > bond_dev_queue_xmit+43 > __bond_start_xmit+211 > bond_start_xmit+70 > dev_hard_start_xmit+142 > sch_direct_xmit+161 > __qdisc_run+102 <<<<< Issue location > __dev_xmit_skb+1015 > __dev_queue_xmit+637 > neigh_hh_output+159 > ip_finish_output2+461 > __ip_finish_output+183 > ip_finish_output+41 > ip_output+120 > ip_local_out+94 > __ip_queue_xmit+394 > ip_queue_xmit+21 > __tcp_transmit_skb+2169 > tcp_write_xmit+959 > __tcp_push_pending_frames+55 > tcp_push+264 > tcp_sendmsg_locked+661 > tcp_sendmsg+45 > inet_sendmsg+67 > sock_sendmsg+98 > sock_write_iter+147 > vfs_write+786 > ksys_write+181 > __x64_sys_write+25 > do_syscall_64+56 > entry_SYSCALL_64_after_hwframe+100 > > The problem occurs when multiple tasks share a single qdisc. In such cases, > __qdisc_run() may transmit skbs created by different tasks. Consequently, > task_get_classid() retrieves an incorrect classid since it references the > current task's context rather than the skb's originating task. > > Given that dev_queue_xmit() always executes with bh disabled, we can safely > use in_softirq() instead of in_serving_softirq() to properly identify the > softirq context and obtain the correct classid. > > The simple steps to reproduce this issue: > 1. Add network delay to the network interface: > such as: tc qdisc add dev bond0 root netem delay 1.5ms > 2. Create two distinct net_cls cgroups, each running a network-intensive task > 3. Initiate parallel TCP streams from both tasks to external servers. > > Under this specific condition, the issue reliably occurs. The kernel > eventually dequeues an SKB that originated from Task-A while executing in > the context of Task-B. > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > Cc: Thomas Graf <tgraf@xxxxxxx> > Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > > --- > > v1->v2: use softirq_count() instead of in_softirq() > --- > include/net/cls_cgroup.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h > index 7e78e7d6f015..668aeee9b3f6 100644 > --- a/include/net/cls_cgroup.h > +++ b/include/net/cls_cgroup.h > @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb) > * calls by looking at the number of nested bh disable calls because > * softirqs always disables bh. > */ > - if (in_serving_softirq()) { > + if (softirq_count()) { > struct sock *sk = skb_to_full_sk(skb); > > /* If there is an sock_cgroup_classid we'll use that. */ AFAICS the above changes the established behavior for a slightly different scenario: <sock S is created by task A> <class ID for task A is changed> <skb is created by sock S xmit and classified> prior to this patch the skb will be classified with the 'new' task A classid, now with the old/original one. I'm unsure if such behavior change is acceptable; I think at very least it should be mentioned in the changelog and likely this change should target net-next. Thanks, Paolo