Re: [BUG] mlx5_core memory management issue

On 15/08/2025 16.59, Jakub Kicinski wrote:
On Thu, 14 Aug 2025 17:58:21 +0200 Jesper Dangaard Brouer wrote:
Found-by: Dragos Tatulea <dtatulea@xxxxxxxxxx>

ENOSUCHTAG?


I pre-checked that "Found-by:" has already been used 32 times in git
history. But don't worry, Martin applied it such that it isn't in the
tags section, by removing the ":" and moving the credit into the
description part.

Reported-by: Chris Arges <carges@xxxxxxxxxxxxxx>

The XDP code has evolved since the xdp_set_return_frame_no_direct()
calls were added.  Now page_pool keeps track of pp->napi and
pool->cpuid.  Maybe the __xdp_return [1] checks should be updated?
(And maybe that would allow us to remove the no_direct helpers.)
So you mean to drop the napi_direct flag in __xdp_return and let
page_pool_put_unrefed_netmem() decide if direct should be used by
page_pool_napi_local()?

Yes, something like that, but I would like Kuba/Jakub's input, as IIRC
he introduced the page_pool->cpuid and page_pool->napi.

There are some corner cases we need to consider, to check whether they
are still valid.  If cpumap gets redirected to the *same* CPU as the
"previous" NAPI instance, which then makes page_pool->cpuid match, is
it then still valid to do a "direct" return(?).

I think/hope so, but it depends on xdp_return only being called from
softirq context.  Since softirqs can't nest, if the producer and
consumer of the page_pool pages are on the same CPU they can't race.

That is true, softirqs can't nest.

Jesse pointed me at the tun device driver, where we are in principle
missing an xdp_set_return_frame_no_direct() section.  Except I believe
that the memory type cannot be page_pool in this driver.  (Code hint:
tun_xdp_act() calls xdp_do_redirect().)

The tun driver made me realize that we do have users that don't run
under a softirq, but they do remember to disable BH.  (IIRC, BH-disable
can nest.)  Are we also race-safe in this case(?)

Is the code change as simple as below or did I miss something?

void __xdp_return
  [...]
  case MEM_TYPE_PAGE_POOL:
   [...]
    if (napi_direct && READ_ONCE(pool->cpuid) != smp_processor_id())
	napi_direct = false;

It is true that when we exit NAPI, pool->cpuid becomes -1.
Or was that only during shutdown?

I'm slightly worried that drivers which don't have dedicated Tx XDP
rings will clean it up from hard IRQ when netpoll calls. But that'd
be a bug, right? We don't allow XDP processing from IRQ context.

I didn't consider this code path.  But yes, that would be considered a
netpoll bug IMHO.

--Jesper



