On Fri, Aug 15, 2025 at 01:17:39PM -0700, Martin KaFai Lau wrote: > On 8/15/25 8:44 AM, Stanislav Fomichev wrote: > > On 08/15, Matt Bobrowski wrote: > > > Based on a bisect, it appears that commit 7ee988770326 ("timers: > > > Implement the hierarchical pull model") has somehow inadvertently > > > broken BPF selftest test_tcpnotify_user. The error that is being > > > generated by this test is as follows: > > > > > > FAILED: Wrong stats Expected 10 calls, got 8 > > > > > > It looks like the change allows timer functions to be run on CPUs > > > different from the one they are armed on. The test had pinned itself > > > to CPU 0, and in the past the retransmit attempts also occurred on CPU > > > 0. The test had set the max_entries attribute for > > > BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling > > > bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was > > > likely to be in range. With the change to allow timers to run on other > > > CPUs, the current CPU tasked with performing the retransmit might be > > > bumped and in turn fall out of range, as the event will be filtered > > > out via __bpf_perf_event_output() using: > > > > > > if (unlikely(index >= array->map.max_entries)) > > > return -E2BIG; > > > > [..] > > > > > A possible change would be to explicitly set the max_entries attribute > > > for perf_event_map in test_tcpnotify_kern.c to a value that's at least > > > as large as the number of CPUs. As it turns out however, if the field > > > is left unset, then the BPF selftest library will determine the number > > > of CPUs available on the underlying system and update the max_entries > > > attribute accordingly. > > > > nit: the max_entries is set by libbpf in map_set_def_max_entries. 'BPF > > selftest library' seems a bit vague. But not a reason for respin. > > Fixed the commit message. Thanks. Applied. ACK. Apologies about the oversimplification there. > > > A further problem with the test is that it has a thread that continues > > > running up until the program exits. The main thread cleans up some > > > LIBBPF data structures, while the other thread continues to use them, > > > which inevitably will trigger a SIGSEGV. This can be dealt with by > > > telling the thread to run for as long as necessary and doing a > > > pthread_join on it before exiting the program. > > Some of the "goto err" seems to have similar problem but ok-ish as long as > the iptables runs fine. I didn't look why the test needs to start a thread > at all, so I leave it as is. The CI is not running this test. The test is > getting rotten overall. It should be moved to test_progs. Probably as a > subtest in some of the existing sockops test in test_progs. Agree, this test is archaic and needs should be converted or folded such that it runs under test_progs. If I have time, I'll perform this clean up.