Re: [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF

Djalal Harouni <tixxdz@xxxxxxxxx> · Fri, 22 Aug 2025 19:16:15 +0100

On 8/20/25 02:14, Tejun Heo wrote:
Hello,

On Wed, Aug 20, 2025 at 12:31:01AM +0100, Djalal Harouni wrote:
...
Approach 1:
First RFC months ago was something like that "bpf_task_freeze_cgroup" [1],
can make it bpf_cgroup_freeze() as a proper kfunc, so resurrect approach 1?

Thanks for reminding me. I often feel like my memory is a ring buffer which
lasts a few weeks at most.

yeh been a while...

Internally it used an ugly path to workaround kernfs active reference since
we don't hold a kernfs_open_file coming from userspace
kernfs->write path.

I can improve it, but let's discuss please approach (2) since you
suggested it ;-)

Approach 2:
Per the old suggestions from you and Alexei [2] [3] you wanted something
like:

   s32 bpf_kernfs_knob_write(struct kernfs_node *dir,
                             const char *knob, char *buf);

I didn't make it generic for kernfs, since don't know yet about sysfs use
cases and named it "bpf_cgroup_write_interface" to focus on cgroup base
interfaces.
Doing something that generic now including sysfs without a proper valid use
cases seems a bit too much. Also we have some cgroup kfunc to acquire and
release that integrate well, so I kept it focused.

Alexei suggested to refactor the cgroup_base_file[] [4][5] to take
"kernfs_node" as argument instead of "kernfs_open_file", which will open
other possibilities for BPF.

However, instead of going full change on cgroup_base_files[], I added a
minimalist: cgroup_kn_cftype kn_cfts[] that for now hold only
"cgroup.freeze".

I think there's some misunderstanding here. IIUC, Alexei didn't want to
expose direct file interface because none of the necessary abstractions are
available and the effort becomes too big and wide. However, I don't think
the suggestion to use @kn as the path designator (or @cgroup for that
matter) means that we want to pipe that all the way down to each op that's
to be written to. That'd be rather pointless - why add generic interface if
each needs a dedicated backend anyway?

I took time to check...

I thought to avoid precisely creating a kernfs_open_file, since we don't
have the corresponding context. Beside the user information who opened
the file, kernfs_open_file assumes a corresponding opened file and a
corresponding kernfs_open_node for each kernfs_node with one or more
open files.

Also it seems each kernfs_open_file is added back into
kernfs_open_node->files...

Can't you make kernfs to create
open_file from kn and follow the usual write path?

Seems tricky, needs to handle also kernfs_global_locks->open_file_mutex
compared with writing directly to the dedicated cgroup backend. The
only protection there beside cgroup_lock() is break kernfs active
protection so kernfs nodes can be drained later...

The cgroup_kn_lock_live() grabs a ref on the cgroup, breaks the active
protection, takes the cgroup_lock() and ensures the cgroup is not dead,
from there we don't dereference kn->  until we restore the protection.
Plus we have the kernfs_get(cgrp->kn) on cgroup_mkdir which ensures it
is always accessible.

I do realize taking the same usual path with write is the obvious thing,
but we don't have the corresponding open context, and faking it seems
more trouble than calling directly cgroup backends...

Allow me please to do it again directly on cgroup_base_file[] assuming
it was Alexei suggestion and see how it looks.

Also Tejun, could you please point me to extra cgroup or kernfs tests
you run? much appreciated!

Thanks!

Thanks.