Re: [PATCH v1 5/5] relayfs: uniformally use possible cpu iteration

Jason Xing <kerneljasonxing@xxxxxxxxx> · Tue, 13 May 2025 11:25:10 +0800

On Tue, May 13, 2025 at 11:21 AM Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 13 May 2025 10:03:01 +0800 Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> > On Tue, May 13, 2025 at 8:52 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 12 May 2025 10:49:35 +0800 Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> > >
> > > > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> > > >
> > > > Use for_each_possible_cpu to create per-cpu relayfs file to avoid later
> > > > hotplug cpu which doesn't have its own file.
> > >
> > > I don't understand this.  Exactly what problem are we trying to solve?
> >
> > The reason behind this change is can we directly allocate per possible
> > cpu at the initialization phase. After this, even if some cpu goes
> > online, we don't need to care about it.
> >
> > The idea is directly borrowed from the networking area where people
> > use possible cpu iteration for most cases.
>
> I'd rephrase that as "I know this is wrong but I'm lazy so I'll do it
> this way for now and I'll fix it up later but I never actually do so".
>
> Yes, handling hotplug is a hassle and for_each_possible_cpu() saves a
> couple of hours at development time, but its cost is forever.
>
> > >
> > > > Reviewed-by: Yushan Zhou <katrinzhou@xxxxxxxxxxx>
> > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> > > > ---
> > > >  kernel/relay.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/relay.c b/kernel/relay.c
> > > > index 27f7e701724f..dcb099859e83 100644
> > > > --- a/kernel/relay.c
> > > > +++ b/kernel/relay.c
> > > > @@ -519,7 +519,7 @@ struct rchan *relay_open(const char *base_filename,
> > > >       kref_init(&chan->kref);
> > > >
> > > >       mutex_lock(&relay_channels_mutex);
> > > > -     for_each_online_cpu(i) {
> > > > +     for_each_possible_cpu(i) {
> > >
> > > num_possible_cpus() can sometimes greatly exceed num_online_cpus(), so
> > > this is an unfortunate change.
> >
> > Are you worried about too much extra memory to waste in this case?
>
> That, and the cost of iterating over 1024 CPUs instead of 4 CPUs!
>
> > A relevant thing I would like to share here:
> > To keep the high performance of transferring data between kernel space
> > and user space, the per-cpu mechanism is needed like how relay works
> > at the moment. It allocates many unnecessary/big memory chunks
> > especially when the cpu number is very large, say, 256. I'm still
> > working on this to see if we can figure out a good approach to balance
> > the performance and memory.
> >
> > > It would be better to implement the
> > > hotplug notifier?
> >
> > Right, but sorry, I hesitate to do so because it involves much more
> > work and corresponding tests.
>
> I did this conversion a few times, a million years ago.  It's
> surprisingly easy.

I see. Let me try this :)

Thanks,
Jason