On 6/5/25 2:30 PM, Benjamin Coddington wrote:
> On 5 Jun 2025, at 14:08, Chuck Lever wrote:
>
>> On 6/5/25 12:54 PM, Benjamin Coddington wrote:
>>> On 5 Jun 2025, at 10:26, Chuck Lever wrote:
>>>
>>>> This doesn't apply to v6.16-rc1 due to recent changes to use a
>>>> dynamically-allocated rq_pages array. This array is allocated in
>>>> svc_init_buffer(); the array allocation has to remain.
>>>
>>> Well, shucks. I guess I should be paying better attention.
>>>
>>> Can we drop the bulk allocation in svc_init_buffer if we're just going to
>>> try it more robustly in svc_alloc_arg?
>>
>> Maybe!
>
> Ok, I'll send something.
>
>> I would like to understand the failure a little better. Why is mount
>> susceptible to this issue?
>
> For v3, we're starting lockd, and on v4.0 it's the callback thread(s). It's
> pretty easy to reproduce if you bump the cb threads to something insane like
> 64k.
>
> Customers have a really hard time handling this on autofs; it's not
> like the system just booted - instead the system will be up for long periods
> doing work, then the automount fails, requiring manual intervention.
>
> I think the bulk allocator can be pretty sensitive to some conditions which
> cause it to bail out and only return a single page.

I see, it's client-side initialization that is failing. I didn't get
that before.

So, there is already a logic change in v6.16-rc that defers backchannel
set-up to svc_alloc_arg(). Either that will fix it or you can piggyback
on those changes.

The problem I see is if you want to backport this fix. The bulk of the
rq_pages changes in v6.16 are not appropriate for stable/LTS, so we'll
have to work something out.

-- 
Chuck Lever