Hi SeongJae,

Thanks for your helpful auto-tuning patchset, which improves the ease of
use of DAMON on tiered memory systems. I have tested the demotion
mechanism with a microbenchmark and would like to share the result.

On Sun, 20 Apr 2025 12:40:23 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:
[..snip..]
> Utilizing DAMON for memory tiering usually requires manual tuning and/
> Evaluation Limitations
> ----------------------
>
> As mentioned above, this evaluation shows only comparison of promotion
> mechanisms. DAMON-based tiering is recommended to be used together with
> reclaim-based demotion as a faster backup under significant memory
> pressure, though.
>
> From some perspective, the modified version of Taobench may seems making
> the picture distorted too much. It would be better to evaluate with
> more realistic workload, or more finely tuned micro benchmarks.
>

Hardware
- Node 0: 512GB DRAM
- Node 1: 0GB (memoryless)
- Node 2: 96GB CXL memory

Kernel
- RFC patchset on top of v6.14-rc7
  https://lore.kernel.org/damon/20250320053937.57734-1-sj@xxxxxxxxxx/

Workload
- The microbenchmark creates hot and cold regions based on the specified
  parameters.

    $ ./hot_cold 1g 100g

  It repeatedly performs memset on a 1GB hot region, but performs memset
  only once on a 100GB cold region.

DAMON setup
- My intention is to demote most of the cold memory regions from node 0
  to node 2, so I started damo with the yaml configuration below:

    ...
    # damo v2.7.2 from https://git.kernel.org/pub/scm/linux/kernel/git/sj/damo.git/
    schemes:
    - action: migrate_cold
      target_nid: 2
    ...
      apply_interval_us: 0
      quotas:
        time_ms: 0 s
        sz_bytes: 0 GiB
        reset_interval_ms: 6 s
        goals:
        - metric: node_mem_free_bp
          target_value: 99%
          nid: 0
          current_value: 1
        effective_sz_bytes: 0 B
    ...

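The access pattern of the workload described above can be approximated by
the sketch below. This is not the hot_cold source: the function names, the
size-suffix parsing, the `rounds` parameter, and the use of Python with
anonymous mmap regions are all assumptions made for illustration. It is run
here with tiny sizes rather than 1g and 100g.

```python
import mmap

CHUNK = 1 << 20  # fill in 1 MiB pieces to avoid a second full-size buffer


def parse_size(text):
    """Parse sizes such as '1g' or '100g' (suffix handling is an assumption)."""
    units = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def fill(region, value):
    """Write `value` to every byte of the mapping, similar in spirit to memset."""
    block = bytes([value]) * CHUNK
    region.seek(0)
    remaining = len(region)
    while remaining > 0:
        n = min(remaining, CHUNK)
        region.write(block[:n])
        remaining -= n


def hot_cold(hot_bytes, cold_bytes, rounds):
    """Touch the cold region once, then touch the hot region `rounds` times."""
    hot = mmap.mmap(-1, hot_bytes)    # anonymous mapping for the hot region
    cold = mmap.mmap(-1, cold_bytes)  # anonymous mapping for the cold region
    fill(cold, 1)                     # cold data is written exactly once
    for i in range(rounds):           # hot data is rewritten repeatedly
        fill(hot, i % 256)
    return hot, cold


# Small sizes so the sketch finishes instantly; the real run used 1g and 100g.
hot, cold = hot_cold(parse_size("4k"), parse_size("16k"), rounds=3)
print(len(hot), len(cold))
```

With such a pattern, only the hot region keeps being accessed after the
initial pass, so the cold region is what a migrate_cold scheme would be
expected to pick up.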
Results

I ran the hot_cold benchmark for approximately 2 days and monitored the
memory usage of each node as follows:

    $ numastat -c -p hot_cold

    Per-node process memory usage (in MBs)
    PID              Node 0 Node 1 Node 2 Node 3  Total
    ---------------  ------ ------ ------ ------ ------
    2689746 (watch)       2      0      0      1      3
    2690067 (hot_col 100122      0   3303      0 103426
    3770656 (watch)       0      0      0      1      1
    3770657 (sh)          2      0      0      0      2
    ---------------  ------ ------ ------ ------ ------
    Total            100127      0   3303      1 103432

I expected most of the cold data on node 0 to be demoted to node 2, but
it was not. In this situation, DAMON's variables are displayed as follows:

    [2067202.863431] totalram 131938449 free 84504526 used 47433923 numerator 84504526
    [2067202.863446] goal->current_value: 6404
    [2067202.863452] score: 6468
    [2067202.863455] quota->esz: 1844674407370955

A `score` of 6468 means the goal hasn't been achieved yet, and
`quota->esz`, which specifies the aggressiveness of the demotion action,
has reached ULONG_MAX. However, the demotion has not occurred.

[..snip..]

I think there may be some errors or misunderstandings in my experiment.
I would be grateful for any insights or feedback you might have regarding
these results.

Best Regards,
Yunjeong
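
For reference, the logged values above can be cross-checked with a few
lines of arithmetic. The sketch below assumes that node_mem_free_bp is the
free/total ratio in basis points (1 bp = 0.01%), that the 99% target
corresponds to 9900 bp, and that the score is current * 10000 / target
with integer division. These formulas are assumptions inferred from the
logged numbers, not quoted from the patchset, and the function names are
hypothetical.

```python
# Values copied from the kernel log quoted above.
TOTALRAM_PAGES = 131938449
FREE_PAGES = 84504526
TARGET_BP = 9900  # "target_value: 99%" expressed in basis points (assumption)


def node_mem_free_bp(free, total):
    """Free-memory ratio in basis points, using integer division."""
    return free * 10000 // total


def goal_score(current_bp, target_bp):
    """Assumed score: current relative to target, where 10000 means achieved."""
    return current_bp * 10000 // target_bp


current = node_mem_free_bp(FREE_PAGES, TOTALRAM_PAGES)
score = goal_score(current, TARGET_BP)
print(current, score)
```

Under these assumptions the script prints 6404 and 6468, which match the
logged goal->current_value and score. About 64% of node 0 is free against
a 99% target, so the score stays below 10000.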