On Wed 30-04-25 14:53:50, Roman Gushchin wrote:
> On Wed, Apr 30, 2025 at 09:27:39AM +0200, Michal Hocko wrote:
> > On Tue 29-04-25 21:31:35, Roman Gushchin wrote:
> > > On Tue, Apr 29, 2025 at 01:46:07PM +0200, Michal Hocko wrote:
> > > > On Mon 28-04-25 03:36:15, Roman Gushchin wrote:
> > > > > Introduce the bpf_out_of_memory() bpf kfunc, which allows declaring
> > > > > an out-of-memory event and triggering the corresponding kernel OOM
> > > > > handling mechanism.
> > > > >
> > > > > It takes a trusted memcg pointer (or NULL for system-wide OOMs)
> > > > > as an argument, as well as the page order.
> > > > >
> > > > > Only one OOM can be declared and handled in the system at once,
> > > > > so if the function is called in parallel with another OOM handling,
> > > > > it bails out with -EBUSY.
> > > >
> > > > This makes sense for the global OOM handler because concurrent handlers
> > > > are cooperative. But is this really correct for memcg OOMs, which could
> > > > happen in different hierarchies? Currently we block on oom_lock in
> > > > that case to make sure one OOM doesn't starve the others. Do we want the
> > > > same behavior for custom OOM handlers?
> > >
> > > It's a good point and I had similar thoughts when I was working on it.
> > > But I think it's orthogonal to the customization of the OOM handling.
> > > Even for the existing OOM killer it makes no sense to serialize memcg OOMs
> > > in independent memcg subtrees. But I'm worried about the dmesg reporting;
> > > it can become really messy with 2+ concurrent OOMs.
> > >
> > > Also, some memory can be shared, so one OOM can eliminate the need for
> > > another OOM, even if they look independent.
> > >
> > > So my conclusion here is to leave things as they are until we see signs
> > > of real-world problems with the (lack of) concurrency between OOMs.
> >
> > How do we learn about that happening, though? I do not think we have any
> > counters to watch that would suggest some OOM handlers cannot run.
>
> The bpf program which declares an OOM can handle this: e.g. retry, wait
> and retry, etc. We can also try to mimic the existing behavior and wait
> on oom_lock (potentially splitting it into multiple locks to support
> concurrent OOMs in various memcgs). Do you think that's preferable?

Yes, I would just provide different callbacks for global and memcg OOMs
and do the blocking for the latter. That would be consistent with the
in-kernel implementation (and therefore less surprising behavior).
-- 
Michal Hocko
SUSE Labs
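
For illustration, below is a minimal sketch of the "retry on -EBUSY" approach
Roman mentions, i.e. a BPF program declaring an OOM via the proposed kfunc and
retrying a bounded number of times if another OOM is already being handled.
The exact kfunc declaration and the attach point are not shown in this thread,
so the signature, the retry helper, and the EBUSY definition below are
assumptions made for the sake of the example, not the series' actual code.

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical sketch only; not taken from the patch under discussion. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Assumed kfunc declaration; the real signature may differ. */
extern int bpf_out_of_memory(struct mem_cgroup *memcg, int order) __ksym;

#define EBUSY		16	/* errno constants are not in vmlinux.h */
#define MAX_OOM_RETRIES	3

/*
 * Declare an OOM for @memcg (NULL means a system-wide OOM) at the given
 * page order. If another OOM is already being handled anywhere in the
 * system, bpf_out_of_memory() bails out with -EBUSY; retry a bounded
 * number of times, since the concurrent OOM may already have released
 * enough memory. A sleepable program could additionally wait between
 * attempts instead of retrying immediately.
 */
static int declare_oom_with_retry(struct mem_cgroup *memcg, int order)
{
	int ret = -EBUSY;
	int i;

	for (i = 0; i < MAX_OOM_RETRIES && ret == -EBUSY; i++)
		ret = bpf_out_of_memory(memcg, order);

	return ret;
}

char LICENSE[] SEC("license") = "GPL";

Whether such a helper is called from a PSI-style trigger, a struct_ops
callback, or some other hook depends on the rest of the series; the sketch
only shows the declare-and-retry pattern itself.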