On 6/5/25 2:30 PM, Benjamin Coddington wrote:
> On 5 Jun 2025, at 14:08, Chuck Lever wrote:
>
>> On 6/5/25 12:54 PM, Benjamin Coddington wrote:
>>> On 5 Jun 2025, at 10:26, Chuck Lever wrote:
>>>
>>>> This doesn't apply to v6.16-rc1 due to recent changes to use a
>>>> dynamically-allocated rq_pages array. This array is allocated in
>>>> svc_init_buffer(); the array allocation has to remain.
>>>
>>> Well, shucks. I guess I should be paying better attention.
>>>
>>> Can we drop the bulk allocation in svc_init_buffer if we're just going to
>>> try it more robustly in svc_alloc_arg?
>>
>> Maybe!
>
> Ok, I'll send something.
>
>> I would like to understand the failure a little better. Why is mount
>> susceptible to this issue?
>
> For v3, we're starting lockd, and on v4.0 it's the callback thread(s). It's
> pretty easy to reproduce if you bump the cb threads to something insane like
> 64k.
>
> Customers have a really hard time handling this on autofs; it's not
> like the system just booted - instead the system will be up for long periods
> doing work, then the automount fails, requiring manual intervention.
>
> I think the bulk allocator can be pretty sensitive to some conditions which
> cause it to bail out and only return a single page.

I see, it's client-side initialization that is failing. I didn't get
that before.

So, there is already a logic change in v6.16-rc that defers backchannel
set-up to svc_alloc_arg(). Either that will fix it or you can piggyback
on those changes.

The problem I see is if you want to backport this fix. The bulk of the
rq_pages changes in v6.16 are not appropriate for stable/LTS, so we'll
have to work something out.

-- 
Chuck Lever