From: Martin KaFai Lau <martin.lau@xxxxxxxxx> Date: Thu, 17 Apr 2025 16:12:57 -0700 > On 4/17/25 3:41 PM, Kuniyuki Iwashima wrote: > > From: Jordan Rife <jordan@xxxxxxxx> > > Date: Wed, 16 Apr 2025 16:36:17 -0700 > >> Require that iter->batch always contains a full bucket snapshot. This > >> invariant is important to avoid skipping or repeating sockets during > >> iteration when combined with the next few patches. Before, there were > >> two cases where a call to bpf_iter_udp_batch may only capture part of a > >> bucket: > >> > >> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1]. > >> 2. When more sockets are added to the bucket while calling > >> bpf_iter_udp_realloc_batch(), making the updated batch size > >> insufficient [2]. > >> > >> In cases where the batch size only covers part of a bucket, it is > >> possible to forget which sockets were already visited, especially if we > >> have to process a bucket in more than two batches. This forces us to > >> choose between repeating or skipping sockets, so don't allow this: > >> > >> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation > >> fails instead of continuing with a partial batch. > >> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to > >> capture the full bucket. On the third attempt, hold onto the bucket > >> lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with > >> GFP_ATOMIC to guarantee that the bucket size doesn't change before > >> our next attempt. Try with GFP_USER first to improve the chances > >> that memory allocation succeeds; only use GFP_ATOMIC as a last > >> resort. > > > > kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and > > if both of them fails, then tries 3. vmalloc(). But, vmalloc() does not > > support GFP_ATOMIC, __kvmalloc_node_noprof() returns at > > !gfpflags_allow_blocking(). > > > > So, falling back to GFP_ATOMIC is most unlikely to work as the last resort. > > > > GFP_ATOMIC first and falling back to GFP_USER few more times, or not using > > GFP_ATOMIC makes more sense to me. > > If I read it correctly, the last retry with GFP_ATOMIC is not because of the > earlier GFP_USER allocation failure but the size of the bucket has changed a lot > that it is doing one final attempt to get the whole bucket and this requires to > hold the bucket lock to ensure the size stays the same which then must use > GFP_ATOMIC. Ah exactly, when allocation fails, it always returned an error. Sorry, I should've read code first.