On Thu, 12 Jun 2025 13:13:26 -0500 Bijan Tabatabai <bijan311@xxxxxxxxx> wrote:

> From: Bijan Tabatabai <bijantabatab@xxxxxxxxxx>
>
> A recent patch set automatically set the interleave weight for each node
> according to the node's maximum bandwidth [1]. In another thread, the patch
> set's author, Joshua Hahn, wondered if/how these weights should be changed
> if the bandwidth utilization of the system changes [2].

Hi Bijan,

Thank you for this patchset, and thank you for finding interest in my
question!

> This patch set adds the mechanism for dynamically changing how application
> data is interleaved across nodes while leaving the policy of what the
> interleave weights should be to userspace. It does this by adding a new
> DAMOS action: DAMOS_INTERLEAVE. We implement DAMOS_INTERLEAVE with both
> paddr and vaddr operations sets. Using the paddr version is useful for
> managing page placement globally. Using the vaddr version limits tracking
> to one process per kdamond instance, but the va based tracking better
> captures spatial locality.
>
> DAMOS_INTERLEAVE interleaves pages within a region across nodes using the
> interleave weights at /sys/kernel/mm/mempolicy/weighted_interleave/node<N>
> and the page placement algorithm in weighted_interleave_nid via
> policy_nodemask. We chose to reuse the mempolicy weighted interleave
> infrastructure to avoid reimplementing code. However, this has the awkward
> side effect that only pages that are mapped to processes using
> MPOL_WEIGHTED_INTERLEAVE will be migrated according to new interleave
> weights. This might be fine because workloads that want their data to be
> dynamically interleaved will want their newly allocated data to be
> interleaved at the same ratio.

I think this is generally true.
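For readers who have not looked at weighted_interleave_nid(), the placement
rule the quoted text reuses is essentially a weighted round robin over the
nodes. The following is a userspace sketch of that rule, illustrative only
and not the kernel code:

```python
# Illustrative sketch (not kernel code) of the placement rule behind
# weighted_interleave_nid(): pages of an interleaved range are spread
# across nodes in proportion to their interleave weights, e.g. as read
# from /sys/kernel/mm/mempolicy/weighted_interleave/node<N>.

def pick_node(weights, page_idx):
    """Return the target node for the page_idx-th page, given a list of
    per-node interleave weights."""
    total = sum(weights)
    r = page_idx % total  # position within one full weight cycle
    for node, w in enumerate(weights):
        if r < w:
            return node
        r -= w

# With weights [3, 1], three of every four pages land on node 0:
placement = [pick_node([3, 1], i) for i in range(8)]
print(placement)  # -> [0, 0, 0, 1, 0, 0, 0, 1]
```

The real kernel logic additionally filters nodes through policy_nodemask
and handles the cgroup/task context, which this sketch ignores.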
Maybe until a user says that they have a use case where they would like a
non-weighted-interleave policy to allocate pages, but would still like to
place them according to a set weight, we can leave support for other
mempolicies out for now.

> If exposing policy_nodemask is undesirable, we have two alternative methods
> for having DAMON access the interleave weights it should use. We would
> appreciate feedback on which method is preferred.
> 1. Use mpol_misplaced instead
>    pros: mpol_misplaced is already exposed publicly
>    cons: Would require refactoring mpol_misplaced to take a struct vm_area
>    instead of a struct vm_fault, and require refactoring mpol_misplaced and
>    get_vma_policy to take in a struct task_struct rather than just using
>    current. Also requires processes to use MPOL_WEIGHTED_INTERLEAVE.
> 2. Add a new field to struct damos, similar to target_nid for the
>    MIGRATE_HOT/COLD schemes.
>    pros: Keeps changes contained inside DAMON. Would not require processes
>    to use MPOL_WEIGHTED_INTERLEAVE.
>    cons: Duplicates page placement code. Requires discussion on the sysfs
>    interface to use for users to pass in the interleave weights.

Here I agree with SJ's sentiment -- I think mpol_misplaced runs with the
context of working with current / fault contexts, like you pointed out.
Perhaps it is best to keep the scope of the changes as local as
possible :-)

As for duplicating the page placement code, I think that is something we
can refine over iterations of this patchset, and maybe SJ will have some
great ideas on how this can best be done as well.

> This patchset was tested on an AMD machine with a NUMA node with CPUs
> attached to DDR memory and a cpu-less NUMA node attached to CXL memory.
> However, this patch set should generalize to other architectures and
> numbers of NUMA nodes.

I think moving the test results to the cover letter will help reviewers
better understand the intent of the work.
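To help frame the discussion of alternative 2 above, here is a hypothetical
sketch of what a scheme carrying its own per-node weights might look like.
All names here are made up for illustration -- this is not the actual
struct damos layout or the DAMON API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of alternative 2: a DAMOS scheme carrying its own
# interleave weights next to the existing target_nid. Field names are
# illustrative only; this is not the real struct damos.
@dataclass
class DamosSchemeSketch:
    action: str
    target_nid: int = -1  # existing: single target for MIGRATE_HOT/COLD
    node_weights: dict = field(default_factory=dict)  # hypothetical: nid -> weight

    def weight_for(self, nid):
        # Nodes with no configured weight receive no pages.
        return self.node_weights.get(nid, 0)

scheme = DamosSchemeSketch(action="DAMOS_INTERLEAVE", node_weights={0: 3, 1: 1})
print(scheme.weight_for(0), scheme.weight_for(2))  # -> 3 0
```

The open question the cover letter raises would then be how userspace sets
node_weights, e.g. through a new per-scheme sysfs file.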
Also, I think it will be very helpful to include some potential use cases
in here as well. That is, what workloads would benefit from placing pages
according to a set ratio, rather than using existing migration policies
that adjust placement based on hotness / coldness?

One such use case that I can think of is using this patchset + weighted
interleave auto-tuning, which would help alleviate bandwidth limitations
by ensuring that, past the allocation stage, pages are being accessed in a
way that maximizes the bandwidth usage of the system (at the cost of
latency, which may or may not even be true based on how bandwidth-bound
the workload is).

Thank you again for the amazing patchset! Have a great day :-)
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)