On 06/30, Jordan Rife wrote: > Require that iter->batch always contains a full bucket snapshot. This > invariant is important to avoid skipping or repeating sockets during > iteration when combined with the next few patches. Before, there were > two cases where a call to bpf_iter_tcp_batch may only capture part of a > bucket: > > 1. When bpf_iter_tcp_realloc_batch() returns -ENOMEM. > 2. When more sockets are added to the bucket while calling > bpf_iter_tcp_realloc_batch(), making the updated batch size > insufficient. > > In cases where the batch size only covers part of a bucket, it is > possible to forget which sockets were already visited, especially if we > have to process a bucket in more than two batches. This forces us to > choose between repeating or skipping sockets, so don't allow this: > > 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation > fails instead of continuing with a partial batch. > 2. Try bpf_iter_tcp_realloc_batch() with GFP_USER just as before, but if > we still aren't able to capture the full bucket, call > bpf_iter_tcp_realloc_batch() again while holding the bucket lock to > guarantee the bucket does not change. On the second attempt use > GFP_NOWAIT since we hold onto the spin lock. > > I did some manual testing to exercise the code paths where GFP_NOWAIT is > used and where ERR_PTR(err) is returned. I used the realloc test cases > included later in this series to trigger a scenario where a realloc > happens inside bpf_iter_tcp_batch and made a small code tweak to force > the first realloc attempt to allocate a too-small batch, thus requiring > another attempt with GFP_NOWAIT. Some printks showed both reallocs with > the tests passing: > > Jun 27 00:00:53 crow kernel: again GFP_USER > Jun 27 00:00:53 crow kernel: again GFP_NOWAIT > Jun 27 00:00:53 crow kernel: again GFP_USER > Jun 27 00:00:53 crow kernel: again GFP_NOWAIT > > With this setup, I also forced each of the bpf_iter_tcp_realloc_batch > calls to return -ENOMEM to ensure that iteration ends and that the > read() in userspace fails. > > Signed-off-by: Jordan Rife <jordan@xxxxxxxx> > Reviewed-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> Looks more approachable now, thank you! Acked-by: Stanislav Fomichev <sdf@xxxxxxxxxxx>