Re: [PATCH] nfsd: Using guard() to simplify nfsd_cache_lookup()

On 2025/6/25 06:15, NeilBrown wrote:
On Wed, 25 Jun 2025, Chuck Lever wrote:
On 6/23/25 8:19 PM, NeilBrown wrote:
On Mon, 23 Jun 2025, Su Hui wrote:
Using guard() to replace the *unlock* label. guard() makes the lock/unlock
code clearer. Change the order of the code so that all the locking code is
in the same scope. No functional changes.
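
(For illustration only, a minimal sketch of the before/after shape, assuming
a bucket lock called b->cache_lock; this is not the actual nfsd_cache_lookup()
body:)

        /* Before: every early exit jumps to a label that drops the lock. */
        spin_lock(&b->cache_lock);
        ...
        if (error)
                goto out_unlock;
        ...
out_unlock:
        spin_unlock(&b->cache_lock);
        return status;

        /* After: guard() from <linux/cleanup.h> releases the lock
         * automatically whenever the scope is left.
         */
        guard(spinlock)(&b->cache_lock);
        ...
        if (error)
                return status;          /* the lock is dropped here too */
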
While I agree that this code could usefully be cleaned up and that you
have made some improvements, I think the use of guard() is a nearly
insignificant part of the change.  You could easily do exactly the same
patch without using guard() but with an explicit spin_unlock() before
the new return.  That doesn't mean you shouldn't use guard(), but it
does mean that the comment explaining the change could be more usefully
focused on the "Change the order ..." part, and maybe explain why that
is important.
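
(Sketch of that guard()-free alternative: with the locked section made
contiguous, an explicit spin_unlock() before each return does the same job;
the return codes here are illustrative:)

        spin_lock(&b->cache_lock);
        ...
        if (found) {
                spin_unlock(&b->cache_lock);
                return RC_REPLY;
        }
        ...
        spin_unlock(&b->cache_lock);
        return RC_DOIT;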

I actually think there is room for other changes which would make the
code even better:
- Change nfsd_prune_bucket_locked() to nfsd_prune_bucket().  Have it
   take the lock when needed, then drop it, then call
   nfsd_cacherep_dispose() - and return the count (a rough sketch
   follows this list).
- Change nfsd_cache_insert() to also skip updating the chain-length stats
   when it finds a match - in that case the "entries" count isn't a chain
   length, so just lru_put_end() and return.  Have it return NULL if
   no match was found.
- After the found_entry label don't use nfsd_reply_cache_free_locked(),
   just free rp.  It has never been included in any rbtree or list, so it
   doesn't need to be removed.
- I'd be tempted to have nfsd_cache_insert() take the spinlock itself
   and call it under rcu_read_lock() - and use RCU to free the cached
   items.
- Put the chunk of code after the found_entry label into a separate
   function and instead just return RC_REPLY (and maybe rename that
   RC_CACHED).  Then in nfsd_dispatch(), if RC_CACHED was returned, call
   the function that has the found_entry code.
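
For the first item, something along these lines (the helper names exist in
fs/nfsd/nfscache.c today, but the signatures and the max parameter in this
sketch are assumptions, not the real code):

static long nfsd_prune_bucket(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
                              unsigned int max)
{
        LIST_HEAD(dispose);

        /* Hold the lock only while unhooking entries to be pruned ... */
        spin_lock(&b->cache_lock);
        nfsd_prune_bucket_locked(nn, b, max, &dispose);
        spin_unlock(&b->cache_lock);

        /* ... then free them outside the lock and return the count. */
        return nfsd_cacherep_dispose(&dispose);
}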

I think that would make the code a lot easier to follow.  Would you like
to have a go at that - I suspect it would be several patches - or shall
I do it?
I'm going to counsel some caution.

nfsd_cache_lookup() is a hot path. Source code readability, though
important, is not the priority in this area.

I'm happy to consider changes to this function, but the bottom line is
patches need to be accompanied by data that show that proposed code
modifications do not negatively impact performance. (Plus the usual
test results that show no impact to correctness).

That data might include:
- flame graphs that show a decrease in CPU utilization
- objdump output showing a smaller instruction cache footprint
   and/or shorter instruction path lengths
- perf results showing better memory bandwidth
- perf results showing better branch prediction
- lockstat results showing less contention and/or shorter hold
   time on locks held in this path

Macro benchmark results are also welcome: equal or lower latency for
NFSv3 operations, and equal or higher I/O throughput.

The benefit of the scoped_guard construct is that it might make it more
difficult to add code that returns from this function with a lock held.
However, so far that hasn't been an issue.

Thus I'm not sure there's a lot of strong technical justification for
modification of this code path. But, you might know of one -- if so,
please make sure that appears in the patch descriptions.

What is more interesting to me is trying out more sophisticated abstract
data types for the DRC hashtable. rhashtable is one alternative; so is
Maple tree, which is supposed to handle lookups with better memory-bandwidth
efficiency than walking a linked list.
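
(Purely as an API-shape illustration, and assuming the DRC key could be
reduced to a single unsigned long - which the real checksum-based key cannot
be - the names below are hypothetical:)

static DEFINE_MTREE(drc_mt);            /* hypothetical per-net tree */

static struct nfsd_cacherep *drc_find(unsigned long key)
{
        return mtree_load(&drc_mt, key);
}

static int drc_store(unsigned long key, struct nfsd_cacherep *rp)
{
        /* mtree_store() takes the tree's internal lock itself */
        return mtree_store(&drc_mt, key, rp, GFP_KERNEL);
}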

While I generally like rhashtable there is an awkwardness.  It doesn't
guarantee that an insert will always succeed.  If you get lots of new
records that hash to the same value, it will start failing insert
requests until it has re-hashed the table with a new seed.  This is
intended to defeat collision attacks.  That means we would need to drop
requests sometimes.  Maybe that is OK.  The DRC could be the target of
collision attacks, so maybe we really do want to drop requests if
rhashtable refuses to store them.
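
(Handling that in the lookup path might look roughly like this; the ht,
field and params names are hypothetical, while rhashtable_insert_fast(),
drc_slab and RC_DROPIT are real:)

        /* If rhashtable refuses the new entry, drop the request rather
         * than serve it without duplicate-reply protection.
         */
        ret = rhashtable_insert_fast(&nn->drc_ht, &rp->c_rhash,
                                     drc_rht_params);
        if (ret) {
                kmem_cache_free(drc_slab, rp);
                return RC_DROPIT;
        }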

I think the other area that could use improvement is pruning old entries.
I would not include RC_INPROG entries in the lru at all - they are
always ignored, and will be added when they are switched to RC_DONE.
I'd generally like to prune less often in larger batches, but removing
each of the batch from the rbtree could hold the lock for longer than we
would like.  I wonder if we could have an 'old' and a 'new' rbtree and,
when the 'old' gets too old or the 'new' gets too full, we extract 'old',
move 'new' to 'old', and outside the spinlock we free all of the moved
'old'.
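
(A very rough sketch of that swap, with hypothetical old_entries/new_entries
fields on the bucket:)

        struct nfsd_cacherep *rp, *tmp;
        struct rb_root old;

        spin_lock(&b->cache_lock);
        old = b->old_entries;                   /* detach the stale tree */
        b->old_entries = b->new_entries;        /* age the current tree */
        b->new_entries = RB_ROOT;
        spin_unlock(&b->cache_lock);

        /* Free everything that was in 'old' without holding the lock. */
        rbtree_postorder_for_each_entry_safe(rp, tmp, &old, c_node)
                kmem_cache_free(drc_slab, rp);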

But if we switched to rhashtable, we probably wouldn't need an lru -
just walk the entire table occasionally - there would be little conflict
with concurrent lookups.
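
(The occasional walk would be along these lines; the expiry helpers are
hypothetical, and rhashtable_walk_next() returning ERR_PTR(-EAGAIN) across a
resize needs better handling than shown:)

        struct rhashtable_iter iter;
        struct nfsd_cacherep *rp;

        rhashtable_walk_enter(&nn->drc_ht, &iter);
        rhashtable_walk_start(&iter);
        while ((rp = rhashtable_walk_next(&iter)) != NULL) {
                if (IS_ERR(rp))
                        continue;       /* -EAGAIN: table resized, keep walking */
                if (nfsd_cache_entry_expired(rp))       /* hypothetical helper */
                        nfsd_cache_remove_entry(nn, rp);        /* hypothetical helper */
        }
        rhashtable_walk_stop(&iter);
        rhashtable_walk_exit(&iter);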

But as you say, measuring would be useful.  Hopefully the DRC lookup
would be a small contribution to the total request time, so we would need
to measure just what happens in the code to compare different versions.

NeilBrown

Anyway, have fun, get creative, and let's see what comes up.

Thanks for the suggestions above. I think I need more time to complete this,
both for the code and the related tests. I will do my best with curiosity and
creativity :).

Regards,
Su Hui



