Re: [PATCH v1 bpf-next 03/10] bpf: tcp: Get rid of st_bucket_done

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Tue, 27 May 2025 11:19:43 -0700

On 5/24/25 2:09 PM, Jordan Rife wrote:
On Fri, May 23, 2025 at 03:07:32PM -0700, Martin KaFai Lau wrote:
On 5/22/25 1:42 PM, Kuniyuki Iwashima wrote:
From: Jordan Rife <jordan@xxxxxxxx>
Date: Thu, 22 May 2025 11:16:13 -0700
   static void bpf_iter_tcp_put_batch(struct bpf_tcp_iter_state *iter)
   {
-	while (iter->cur_sk < iter->end_sk)
-		sock_gen_put(iter->batch[iter->cur_sk++]);
+	unsigned int cur_sk = iter->cur_sk;
+
+	while (cur_sk < iter->end_sk)
+		sock_gen_put(iter->batch[cur_sk++]);

Why is this chunk included in this patch ?

This should be in patch 5 to keep cur_sk for find_cookie

Without this, iter->cur_sk is mutated when iteration stops, and we lose
our place. When iteration resumes and we call bpf_iter_tcp_batch the
iter->cur_sk == iter->end_sk condition will always be true, so we will
skip to the next bucket without seeking to the offset.

Before, we relied on st_bucket_done to tell us if we had remaining items
in the current bucket to process but now need to preserve iter->cur_sk
through iterations to make the behavior equivalent to what we had before.

Thanks for explanation, I was confused by calling tcp_seek_last_pos()
multiple times, and I think we need to preserve/restore st->offset too
in patch 2 and need this change.

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ac00015d5e7a..0816f20bfdff 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2791,6 +2791,7 @@ static void *tcp_seek_last_pos(struct seq_file *seq)
   			break;
   		st->bucket = 0;
   		st->state = TCP_SEQ_STATE_ESTABLISHED;
+		offset = 0;

This seems like an existing bug not necessarily related to this set.

Agree that this is more of an existing bug.

The patch 5 has also removed the tcp_seek_last_pos() dependency, so I think
it can be a standalone fix on its own.

With the tcp_seq_* ops there are also other corner cases that can lead
to skips, since they rely on st->offset to seek to the last position.

In the scenario described above, sockets disappearing from the last lhash
bucket leads to skipped sockets in the first ehash bucket, but you could
also have a scenario where, for example, the current lhash bucket has 6
sockets, iter->offset is currently 3, 3 sockets disappear from the start
of the current lhash bucket then tcp_seek_last_pos skips the remaining 3
sockets and goes to the next bucket.

I'm not sure it's worth fixing just this one case without also
overhauling the tcp_seq_* logic to prevent these other cases. Otherwise,
it seems more like a Band-aid fix. Perhaps a later series could explore
a more comprehensive solution there.

It is arguable that the missing "offset = 0;" here is a programmer’s error 
rather than the limitation of the offset approach itself. Adding it could be a 
quick fix for this corner case.

That said, it is a very rare case, given there is a "while (... && bucket == 
st->bucket)" condition, and the bug has probably existed since 2010 in commit 
a8b690f98baf. If there is a plan for a long-term fix in /proc/net/tcp[6], I 
think it is reasonable to wait also. I do not have a strong opinion either way. 
I am just unsure if any users care about the skip improvement in /proc/net/tcp[6].