On Thu, Sep 11, 2025 at 4:44 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Tue, Sep 9, 2025 at 7:46 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > +/* Detecting whether a task can successfully allocate THP is unreliable because > > + * it may be influenced by system memory pressure. Instead of making the result > > + * dependent on unpredictable factors, we should simply check > > + * bpf_hook_thp_get_orders()'s return value, which is deterministic. > > + */ > > +SEC("fexit/bpf_hook_thp_get_orders") > > +int BPF_PROG(thp_run, struct vm_area_struct *vma, u64 vma_flags, enum tva_type tva_type, > > + unsigned long orders, int retval) > > +{ > > ... > > > +SEC("struct_ops/thp_get_order") > > +int BPF_PROG(alloc_in_khugepaged, struct vm_area_struct *vma, enum bpf_thp_vma_type vma_type, > > + enum tva_type tva_type, unsigned long orders) > > +{ > > This is a bad idea to mix struct_ops logic with fentry/fexit style. > struct_ops hook will not be affected by compiler optimizations, > while fentry depends on a whim of compilers. > struct_ops can be scoped, while fentry is always global. > sched-ext already struggles with the later, since some scheds > need tracing data from other parts of the kernel and they cannot > be grouped together. All sorts of workarounds were proposed, but > no good solution in sight. So don't go this route for THP. > Make everything you need to be struct_ops based and/or pass > whatever extra data into these ops. will change it. > > Also think of scoping for bpf-thp from the start. > Currently st_ops/thp_get_order is only one and it's global. > It's ok for prototypes and experiments, but not ok for landing upstream. > I think cgroup would a natural scope and different cgroups might > want their own bpf based THP hints. Once you do that, think through > how delegation of suggested order will propagate through hierarchy. + Tejun As Johannes Weiner previously explained [[0]], cgroups are designed as nested hierarchies for partitioning resources. They are a poor fit for enforcing arbitrary, non-hierarchical policies. : Cgroups are for nested trees dividing up resources. They're not a good : fit for arbitrary, non-hierarchical policy settings. [0] https://lore.kernel.org/linux-mm/20250430175954.GD2020@xxxxxxxxxxx/ The THP policy is a quintessential example of such an arbitrary setting. Even within a single cgroup, it is often necessary to enable THP for performance-critical tasks while disabling it for others to avoid latency spikes. Implementing this policy through a cgroup interface that propagates hierarchically would eliminate the crucial ability to configure it on a per-task basis. While the bpf-thp mechanism has a global scope, this does not limit its application to a single system-wide policy. In contrast to a hierarchical cgroup-based setting, bpf-thp offers the flexibility to set policies per-task, per-cgroup, or globally. Fundamentally, it is a more powerful variant of prctl(), not a variant of cgroup interface file. > > bpf-oom seems to be aligning toward the same design principles, > so don't reinvent the wheel. Since bpf-oom's role is to select a task to kill from within **a defined group of tasks**, it is inherently well-suited for cgroup-based management. -- Regards Yafang