Re: [PATCH bpf-next v3 4/4] selftests/bpf: add icmp_send_unreach kfunc tests

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Tue, 29 Jul 2025 17:32:07 -0700

On 7/29/25 5:01 PM, Martin KaFai Lau wrote:
On 7/29/25 4:27 PM, Martin KaFai Lau wrote:
On 7/29/25 2:09 AM, Mahe Tardy wrote:
On Mon, Jul 28, 2025 at 06:18:11PM -0700, Martin KaFai Lau wrote:
On 7/28/25 2:43 AM, Mahe Tardy wrote:
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+    void *data = (void *)(long)skb->data;
+    void *data_end = (void *)(long)skb->data_end;
+    struct iphdr *iph;
+    struct tcphdr *tcph;
+
+    iph = data;
+    if ((void *)(iph + 1) > data_end || iph->version != 4 ||
+        iph->protocol != IPPROTO_TCP || iph->daddr != bpf_htonl(SERVER_IP))
+        return SK_PASS;
+
+    tcph = (void *)iph + iph->ihl * 4;
+    if ((void *)(tcph + 1) > data_end ||
+        tcph->dest != bpf_htons(SERVER_PORT))
+        return SK_PASS;
+
+    kfunc_ret = bpf_icmp_send_unreach(skb, unreach_code);
+
+    /* returns SK_PASS to execute the test case quicker */

Do you know why the user space is slower if 0 (SK_DROP) is used?

I tried to write my understanding of this in the commit description:

"Note that the BPF program returns SK_PASS to let the connection being
established to finish the test cases quicker. Otherwise, you have to
wait for the TCP three-way handshake to time out in the kernel and
retrieve the errno translated from the unreach code set by the ICMP
control message."

It feels a bit hacky to let the 3WHS finish when the objective of the
patch set is to drop it. It is not unusual for people to borrow this code
directly. Does a non-blocking connect() help?

After reading more on how sk_err_soft is used, a non-blocking connect() won't
help. I think I see why a TCP RST is better.

Actually, while replying on the cover letter and looking at tcp_v4_err again,
there is an exception to do ip_icmp_error for TCP_SYN_SENT, so it may be worth
trying a non-blocking connect() and then polling the sk for the error, if you
haven't tried that before.
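
For reference, a minimal user-space sketch of what "non-blocking connect()
then poll the sk for err" could look like. The helper name
connect_nb_errno() and its return convention are made up for illustration
and are not part of the patch set. Whether the errno translated from the
unreach code actually shows up through SO_ERROR while the socket is in
TCP_SYN_SENT is exactly the open question above; this only shows the
mechanics and has not been tried against the kfunc.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch set).
 *
 * Returns 0 if the connection was established, a positive errno value if
 * connect() failed (either immediately or as reported by SO_ERROR), and -1
 * on timeout or on a local setup failure.
 */
static int connect_nb_errno(const char *ip, uint16_t port, int timeout_ms)
{
	struct sockaddr_in addr;
	struct pollfd pfd;
	socklen_t len;
	int fd, flags, err = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Switch to non-blocking so connect() returns right after the SYN. */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		goto out;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		err = 0;
		goto out;
	}
	if (errno != EINPROGRESS) {
		err = errno;
		goto out;
	}

	/* The socket becomes writable on success and on a pending error. */
	pfd.fd = fd;
	pfd.events = POLLOUT;
	pfd.revents = 0;
	if (poll(&pfd, 1, timeout_ms) <= 0)
		goto out;	/* timeout or poll() failure: err stays -1 */

	/* Fetch (and clear) the pending socket error, 0 means connected. */
	len = sizeof(err);
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		err = -1;
out:
	close(fd);
	return err;
}
```

In the selftest, the return value could then be compared against the errno
expected for each unreach code, with a short timeout_ms so a missing ICMP
error fails the test quickly instead of waiting for the 3WHS to time out.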