On Tue, Apr 29, 2025, James Houghton wrote:
> On Mon, Apr 28, 2025 at 9:19 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > Using MGLRU on my home box fails.  It's full cgroup v2, and has both
> > CONFIG_IDLE_PAGE_TRACKING=y and MGLRU enabled.
> >
> > ==== Test Assertion Failure ====
> >   access_tracking_perf_test.c:244: false
> >   pid=114670 tid=114670 errno=17 - File exists
> >      1  0x00000000004032a9: find_generation at access_tracking_perf_test.c:244
> >      2  0x00000000004032da: lru_gen_mark_memory_idle at access_tracking_perf_test.c:272
> >      3  0x00000000004034e4: mark_memory_idle at access_tracking_perf_test.c:391
> >      4   (inlined by) run_test at access_tracking_perf_test.c:431
> >      5  0x0000000000403d84: for_each_guest_mode at guest_modes.c:96
> >      6  0x0000000000402c61: run_test_for_each_guest_mode at access_tracking_perf_test.c:492
> >      7  0x000000000041d8e2: cg_run at cgroup_util.c:382
> >      8  0x00000000004027fa: main at access_tracking_perf_test.c:572
> >      9  0x00007fa1cb629d8f: ?? ??:0
> >     10  0x00007fa1cb629e3f: ?? ??:0
> >     11  0x00000000004029d4: _start at ??:?
> >   Could not find a generation with 90% of guest memory (235929 pages).
> >
> > Interestingly, if I force the test to use /sys/kernel/mm/page_idle/bitmap, it
> > passes.
> >
> > Please try to reproduce the failure (assuming you haven't already tested that
> > exact combination of cgroups v2, MGLRU=y, and CONFIG_IDLE_PAGE_TRACKING=y).  I
> > don't have bandwidth to dig any further at this time.
>
> Sorry... please see the bottom of this message for a diff that should fix this.
> It fixes these bugs:
>
> 1.  Tracking generation numbers without hardware Accessed bit management.
>     (This is addition of lru_gen_last_gen.)
> 1.5 It does an initial aging pass so that pages always move to newer
>     generations in (or before) the subsequent aging passes. This probably
>     isn't needed given the change I made for (1).
> 2.  Fixes the expected number of pages for guest page sizes > PAGE_SIZE.
>     (This is the move of test_pages. test_pages has also been renamed to avoid
>     shadowing.)
> 3.  Fixes an off-by-one error when looking for the generation with the most
>     pages. Previously it failed to check the youngest generation, which I think
>     is the bug you ran into. (This is the change to lru_gen_util.c.)

Ya, this was the bug I initially ran into.  I also encountered more failures
after applying just that fix.  But, with the full diff applied, it's passing, so
good to go for the next version from my end.
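
For anyone following the thread without the diff at hand, the off-by-one
described in (3) can be pictured roughly as below.  This is only an
illustrative sketch, not the actual code from lru_gen_util.c: the struct,
the field names, MAX_NR_GENS_SKETCH and both function names are hypothetical.
The only thing it assumes from the thread is that the search for the
generation with the most pages stopped one short of the youngest generation.

```c
/* Hypothetical bound for this sketch only. */
#define MAX_NR_GENS_SKETCH 4

/* Hypothetical per-generation page counts; not the selftest's real layout. */
struct gen_stats {
	int min_gen;                        /* oldest generation (inclusive)   */
	int max_gen;                        /* youngest generation (inclusive) */
	long nr_pages[MAX_NR_GENS_SKETCH];  /* indexed by gen - min_gen        */
};

/* Buggy variant: 'gen < max_gen' never inspects the youngest generation. */
static int find_fullest_gen_buggy(const struct gen_stats *s)
{
	int best = s->min_gen;

	for (int gen = s->min_gen; gen < s->max_gen; gen++)
		if (s->nr_pages[gen - s->min_gen] > s->nr_pages[best - s->min_gen])
			best = gen;
	return best;
}

/* Fixed variant: 'gen <= max_gen' includes the youngest generation. */
static int find_fullest_gen(const struct gen_stats *s)
{
	int best = s->min_gen;

	for (int gen = s->min_gen; gen <= s->max_gen; gen++)
		if (s->nr_pages[gen - s->min_gen] > s->nr_pages[best - s->min_gen])
			best = gen;
	return best;
}
```

With generations 5..7 holding 10, 20 and 235929 pages respectively, the buggy
variant settles on generation 6 while the fixed one picks generation 7.  If
nearly all pages sit in the youngest generation, missing it would be enough
to trip a "Could not find a generation with 90% of guest memory" assertion.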