On Fri, Aug 22, 2025 at 3:26 PM Nikolay Aleksandrov <razor@xxxxxxxxxxxxx> wrote:
>
> On 8/22/25 09:42, Yafang Shao wrote:
> > During recent testing with the netem qdisc to inject delays into TCP
> > traffic, we observed that our CLS BPF program failed to function correctly
> > due to incorrect classid retrieval from task_get_classid(). The issue
> > manifests in the following call stack:
> >
> >         bpf_get_cgroup_classid+5
> >         cls_bpf_classify+507
> >         __tcf_classify+90
> >         tcf_classify+217
> >         __dev_queue_xmit+798
> >         bond_dev_queue_xmit+43
> >         __bond_start_xmit+211
> >         bond_start_xmit+70
> >         dev_hard_start_xmit+142
> >         sch_direct_xmit+161
> >         __qdisc_run+102             <<<<< Issue location
> >         __dev_xmit_skb+1015
> >         __dev_queue_xmit+637
> >         neigh_hh_output+159
> >         ip_finish_output2+461
> >         __ip_finish_output+183
> >         ip_finish_output+41
> >         ip_output+120
> >         ip_local_out+94
> >         __ip_queue_xmit+394
> >         ip_queue_xmit+21
> >         __tcp_transmit_skb+2169
> >         tcp_write_xmit+959
> >         __tcp_push_pending_frames+55
> >         tcp_push+264
> >         tcp_sendmsg_locked+661
> >         tcp_sendmsg+45
> >         inet_sendmsg+67
> >         sock_sendmsg+98
> >         sock_write_iter+147
> >         vfs_write+786
> >         ksys_write+181
> >         __x64_sys_write+25
> >         do_syscall_64+56
> >         entry_SYSCALL_64_after_hwframe+100
> >
> > The problem occurs when multiple tasks share a single qdisc. In such cases,
> > __qdisc_run() may transmit skbs created by different tasks. Consequently,
> > task_get_classid() retrieves an incorrect classid since it references the
> > current task's context rather than the skb's originating task.
> >
> > Given that dev_queue_xmit() always executes with bh disabled, we can safely
> > use in_softirq() instead of in_serving_softirq() to properly identify the
> > softirq context and obtain the correct classid.
> >
>
> nit: you are no longer using in_softirq() in v2, you should update the
> commit message as well.

Oh, my bad. I will update it.

--
Regards
Yafang