On 2025/9/11 10:48, Yafang Shao wrote:
On Wed, Sep 10, 2025 at 9:57 PM Lance Yang <lance.yang@xxxxxxxxx> wrote:
[...]
+/**
+ * @thp_order_fn_t: Get the suggested THP orders from a BPF program for allocation
+ * @vma: vm_area_struct associated with the THP allocation
+ * @vma_type: The VMA type, such as BPF_THP_VM_HUGEPAGE if VM_HUGEPAGE is set
+ * BPF_THP_VM_NOHUGEPAGE if VM_NOHUGEPAGE is set, or BPF_THP_VM_NONE if
+ * neither is set.
+ * @tva_type: TVA type for current @vma
+ * @orders: Bitmask of requested THP orders for this allocation
+ * - PMD-mapped allocation if PMD_ORDER is set
+ * - mTHP allocation otherwise
+ *
+ * Return: The suggested THP order from the BPF program for allocation. It will
+ * not exceed the highest requested order in @orders. Return -1 to
+ * indicate that the original requested @orders should remain unchanged.
A minor documentation nit: the comment says "Return -1 to indicate that the
original requested @orders should remain unchanged". It might be slightly
clearer to say "Return a negative value to fall back to the original
behavior". This would cover all error codes as well ;)
will change it.
Please feel free to change it ;)
[...]
Also, for future extensions, it might be a good idea to add a reserved
flags argument to the thp_order_fn_t signature.
For example thp_order_fn_t(..., unsigned long flags).
This would give us aforward-compatible way to add new semantics later
without breaking the ABI and needing a v2. We could just require it to be
0 for now.
That makes sense. However, as Lorenzo mentioned previously, we should
keep the interface as minimal as possible.
Got it.
[...]
Forgot to add:
Noticed that if the hook returns 0, bpf_hook_thp_get_orders() falls
back to 'orders', preventing us from dynamically disabling mTHP
allocations.
Could you please clarify what you mean by that?
+ thp_order = bpf_hook_thp_get_order(vma, vma_type, tva_type, orders);
+ if (thp_order < 0)
+ goto out;
In my implementation, it only falls back to @orders if the return
value is negative. If the return value is 0, it uses BIT(0):
My bad, I completely misread the code last night ...
I see now that returning 0 forces a base page (order-0)
+ if (thp_order <= highest_order(orders))
+ thp_orders = BIT(thp_order);
Yes, this is exactly the behavior we need. It will allow us to dynamically
disable mTHP for low-priority containers when we need to, which is perfect
for our use case!
Honoring a return of 0 is critical for our use case, which is to
dynamically disable mTHP for low-priority containers when memory gets
low in mixed workloads.
And then re-enable it for them when memory is back above the low
watermark.
Thank you for detailing your use case; that context is very helpful.
Cheers,
Lance