On Thu, Aug 28, 2025 at 3:55 PM Paolo Abeni <pabeni@xxxxxxxxxx> wrote: > > On 8/22/25 8:42 AM, Yafang Shao wrote: > > During recent testing with the netem qdisc to inject delays into TCP > > traffic, we observed that our CLS BPF program failed to function correctly > > due to incorrect classid retrieval from task_get_classid(). The issue > > manifests in the following call stack: > > > > bpf_get_cgroup_classid+5 > > cls_bpf_classify+507 > > __tcf_classify+90 > > tcf_classify+217 > > __dev_queue_xmit+798 > > bond_dev_queue_xmit+43 > > __bond_start_xmit+211 > > bond_start_xmit+70 > > dev_hard_start_xmit+142 > > sch_direct_xmit+161 > > __qdisc_run+102 <<<<< Issue location > > __dev_xmit_skb+1015 > > __dev_queue_xmit+637 > > neigh_hh_output+159 > > ip_finish_output2+461 > > __ip_finish_output+183 > > ip_finish_output+41 > > ip_output+120 > > ip_local_out+94 > > __ip_queue_xmit+394 > > ip_queue_xmit+21 > > __tcp_transmit_skb+2169 > > tcp_write_xmit+959 > > __tcp_push_pending_frames+55 > > tcp_push+264 > > tcp_sendmsg_locked+661 > > tcp_sendmsg+45 > > inet_sendmsg+67 > > sock_sendmsg+98 > > sock_write_iter+147 > > vfs_write+786 > > ksys_write+181 > > __x64_sys_write+25 > > do_syscall_64+56 > > entry_SYSCALL_64_after_hwframe+100 > > > > The problem occurs when multiple tasks share a single qdisc. In such cases, > > __qdisc_run() may transmit skbs created by different tasks. Consequently, > > task_get_classid() retrieves an incorrect classid since it references the > > current task's context rather than the skb's originating task. > > > > Given that dev_queue_xmit() always executes with bh disabled, we can safely > > use in_softirq() instead of in_serving_softirq() to properly identify the > > softirq context and obtain the correct classid. > > > > The simple steps to reproduce this issue: > > 1. Add network delay to the network interface: > > such as: tc qdisc add dev bond0 root netem delay 1.5ms > > 2. Create two distinct net_cls cgroups, each running a network-intensive task > > 3. Initiate parallel TCP streams from both tasks to external servers. > > > > Under this specific condition, the issue reliably occurs. The kernel > > eventually dequeues an SKB that originated from Task-A while executing in > > the context of Task-B. > > > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx> > > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > Cc: Thomas Graf <tgraf@xxxxxxx> > > Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> > > > > --- > > > > v1->v2: use softirq_count() instead of in_softirq() > > --- > > include/net/cls_cgroup.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h > > index 7e78e7d6f015..668aeee9b3f6 100644 > > --- a/include/net/cls_cgroup.h > > +++ b/include/net/cls_cgroup.h > > @@ -63,7 +63,7 @@ static inline u32 task_get_classid(const struct sk_buff *skb) > > * calls by looking at the number of nested bh disable calls because > > * softirqs always disables bh. > > */ > > - if (in_serving_softirq()) { > > + if (softirq_count()) { > > struct sock *sk = skb_to_full_sk(skb); > > > > /* If there is an sock_cgroup_classid we'll use that. */ > > AFAICS the above changes the established behavior for a slightly > different scenario: right. > > <sock S is created by task A> > <class ID for task A is changed> > <skb is created by sock S xmit and classified> > > prior to this patch the skb will be classified with the 'new' task A > classid, now with the old/original one. > > I'm unsure if such behavior change is acceptable; The classid of a skb is only meaningful within its original network context, not from a random task. > I think at very least > it should be mentioned in the changelog and likely this change should > target net-next. Will add this to the commit log and tag it for net-next in the next version. -- Regards Yafang