Re: [PATCH v4 next-next] net/cls_cgroup: Fix task_get_classid() during qdisc run

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/2/25 8:29 AM, Yafang Shao wrote:
> During recent testing with the netem qdisc to inject delays into TCP
> traffic, we observed that our CLS BPF program failed to function correctly
> due to incorrect classid retrieval from task_get_classid(). The issue
> manifests in the following call stack:
> 
>         bpf_get_cgroup_classid+5
>         cls_bpf_classify+507
>         __tcf_classify+90
>         tcf_classify+217
>         __dev_queue_xmit+798
>         bond_dev_queue_xmit+43
>         __bond_start_xmit+211
>         bond_start_xmit+70
>         dev_hard_start_xmit+142
>         sch_direct_xmit+161
>         __qdisc_run+102             <<<<< Issue location
>         __dev_xmit_skb+1015
>         __dev_queue_xmit+637
>         neigh_hh_output+159
>         ip_finish_output2+461
>         __ip_finish_output+183
>         ip_finish_output+41
>         ip_output+120
>         ip_local_out+94
>         __ip_queue_xmit+394
>         ip_queue_xmit+21
>         __tcp_transmit_skb+2169
>         tcp_write_xmit+959
>         __tcp_push_pending_frames+55
>         tcp_push+264
>         tcp_sendmsg_locked+661
>         tcp_sendmsg+45
>         inet_sendmsg+67
>         sock_sendmsg+98
>         sock_write_iter+147
>         vfs_write+786
>         ksys_write+181
>         __x64_sys_write+25
>         do_syscall_64+56
>         entry_SYSCALL_64_after_hwframe+100
> 
> The problem occurs when multiple tasks share a single qdisc. In such cases,
> __qdisc_run() may transmit skbs created by different tasks. Consequently,
> task_get_classid() retrieves an incorrect classid since it references the
> current task's context rather than the skb's originating task.
> 
> Given that dev_queue_xmit() always executes with bh disabled, we can use
> softirq_count() instead to obtain the correct classid.
> 
> The simple steps to reproduce this issue:
> 1. Add network delay to the network interface:
>   such as: tc qdisc add dev bond0 root netem delay 1.5ms
> 2. Build two distinct net_cls cgroups, each with a network-intensive task
> 3. Initiate parallel TCP streams from both tasks to external servers.
> 
> Under this specific condition, the issue reliably occurs. The kernel
> eventually dequeues an SKB that originated from Task-A while executing in
> the context of Task-B.
> 
> It is worth noting that it will change the established behavior for a
> slightly different scenario:
> 
>   <sock S is created by task A>
>   <class ID for task A is changed>
>   <skb is created by sock S xmit and classified>
> 
> prior to this patch the skb will be classified with the 'new' task A
> classid, now with the old/original one. The bpf_get_cgroup_classid_curr()
> function is a more appropriate choice for this case.
> 
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: Thomas Graf <tgraf@xxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Nikolay Aleksandrov <razor@xxxxxxxxxxxxx>

@Daniel: I'm wondering if you have some specific use-case explicit
leveraging the affected helper. If so, could you please test this change
does not break it?

Thanks,

Paolo





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux