On Wed, Mar 26, 2025 at 11:59:48AM -0700, Luis Chamberlain wrote:
> I'd like to propose this as a BoF for MM.
>
> We can find issues if we test them, but some bugs are hard to reproduce,
> especially some mm bugs. How far are we willing to add knobs to help with
> synthetic tests which may not apply to numa for instance? An
> example is the recent patch I just posted to force testing page
> migration [0]. We can only run that test if we have a numa system, and a
> lot of testing today runs on guests without numa. Would we be willing
> to add a fake numa node to help with synthetic tests like page
> migration?

Boot your test VMs with fake NUMA (numa=fake=4 on the kernel command
line), and now you have a 4 node system being tested even though it's
not a real, physical NUMA machine.

I've been doing this for the best part of 15 years now with a couple
of my larger test VMs, explicitly to test NUMA interactions.

I also have a large 64p VM with an explicit qemu NUMA configuration
that mirrors the underlying hardware NUMA layout. This allows NUMA
aware perf testing from inside that VM that responds the same as a
real physical machine would.

$ lscpu
....
CPU(s):                 64
On-line CPU(s) list:    0-63
Thread(s) per core:     1
Core(s) per socket:     16
Socket(s):              4
.....
NUMA:
  NUMA node(s):         4
  NUMA node0 CPU(s):    0-15
  NUMA node1 CPU(s):    16-31
  NUMA node2 CPU(s):    32-47
  NUMA node3 CPU(s):    48-63

This is also the VM I'm doing most of my performance testing and
check-parallel development on, so I see the NUMA scalability issues
that occur when trying to make use of the underlying hardware NUMA
capability...

> Then what else could we add to help stress test page migration and
> compaction further? We already have generic/750 and that has found some
> snazzy issues so far. But what else can we do to help random guests
> all over running fstests start covering complex mm tests better?

Use check-parallel on buffered loop devices - it'll generate a heap of
page cache pressure from all the IO, and run a heap more tests at the
same time as the compaction is running from g/740. This often
overlaps with g/650 which does background CPU hotplug, and it
definitely overlaps with other tests running drop_caches, mount,
unmount, etc, too.

One of the eventual goals of check-parallel is to have all these
environmental variables like memory load, compaction, CPU hotplug,
etc. changing in the background whilst the tests are running, so that
we can exercise all the filesystem functionality under changing MM and
environmental conditions without having to code that into individual
tests....

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
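
As a footnote to the fake NUMA suggestion above: a minimal sketch, in Python, of confirming from inside a guest that more than one node is actually visible before running migration tests. It assumes the kernel exposes the online node list at /sys/devices/system/node/online in the usual "0-3" list format; the helper names parse_id_list and online_nodes are made up for illustration and are not part of fstests or check-parallel.

```python
#!/usr/bin/env python3
"""Report how many NUMA nodes the running kernel exposes via sysfs."""
from pathlib import Path

# Assumed location of the online node list on a NUMA-enabled kernel.
NODE_ONLINE = Path("/sys/devices/system/node/online")


def parse_id_list(text):
    """Expand a kernel list string such as "0-3" or "0,2-3" into ints."""
    ids = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare "5" has no upper bound, so treat it as the range 5-5.
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def online_nodes(path=NODE_ONLINE):
    """Return online NUMA node ids, or [] if the sysfs file is missing."""
    try:
        return parse_id_list(path.read_text())
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    nodes = online_nodes()
    print(f"{len(nodes)} NUMA node(s) online: {nodes}")
    if len(nodes) < 2:
        print("fewer than two nodes: nothing to migrate pages between")
```

Run it in the guest after booting with fake NUMA enabled; if it reports fewer than two nodes, the fake NUMA setup did not take effect.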