On Tue, Mar 25, 2025 at 02:52:49PM +0800, Oliver Sang wrote:
> hi, Luis,
>
> On Sun, Mar 23, 2025 at 12:07:27AM -0700, Luis Chamberlain wrote:
> > On Sat, Mar 22, 2025 at 06:02:13PM -0700, Luis Chamberlain wrote:
> > > On Sat, Mar 22, 2025 at 07:14:40PM -0400, Johannes Weiner wrote:
> > > > Hey Luis,
> > > >
> > > > This looks like the same issue the bot reported here:
> > > >
> > > > https://lore.kernel.org/all/20250321135524.GA1888695@xxxxxxxxxxx/
> > > >
> > > > There is a fix for it queued in next-20250318 and later. Could you
> > > > please double check with your reproducer against a more recent next?
> > >
> > > Confirmed, at least it's been 30 minutes and no crashes now where as
> > > before it would crash in 1 minute. I'll let it soak for 2.5 hours in
> > > the hopes I can trigger the warning originally reported by this thread.
> > >
> > > Even though from code inspection I see how the kernel warning would
> > > trigger I just want to force trigger it on a test, and I can't yet.
> >
> > Survied 5 hours now. This certainly fixed that crash.
> >
> > As for the kernel warning, I can't yet reproduce that, so trying to
> > run generic/750 forever and looping
> > ./testcases/kernel/syscalls/close_range/close_range01
> > and yet nothing.
> >
> > Oliver can you reproduce the kernel warning on next-20250321 ?
>
> the issue still exists on
> 9388ec571cb1ad (tag: next-20250321, linux-next/master) Add linux-next specific files for 20250321
>
> but randomly (reproduced 7 times in 12 runs, then ltp.close_range01 also failed.
> on another 5 times, the issue cannot be reproduced then ltp.close_range01 pass)

OK, I narrowed down a reproducer; it requires the debug patch below:

diff --git a/mm/util.c b/mm/util.c
index 448117da071f..3585bdb8700a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -735,6 +735,8 @@ int folio_mc_copy(struct folio *dst, struct folio *src)
 	long nr = folio_nr_pages(src);
 	long i = 0;
 
+	might_sleep();
+
 	for (;;) {
 		if (copy_mc_highpage(folio_page(dst, i), folio_page(src, i)))
 			return -EHWPOISON;

And then just running:

dd if=/dev/zero of=/dev/vde bs=1024M count=1024

For some reason, a kernel with the following options enabled did not
trigger it, so the above patch is needed:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_ACPI_SLEEP=y

It may have to do with my preemption settings:

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_LAZY is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y

And so now to see how we should fix it.

  Luis