On Fri, Mar 28, 2025 at 02:48:00AM -0700, Luis Chamberlain wrote:
> On Thu, Mar 27, 2025 at 09:21:30PM -0700, Luis Chamberlain wrote:
> > Would the extra ref check added via commit 060913999d7a9e50 ("mm:
> > migrate: support poisoned recover from migrate folio") make the removal
> > of the spin lock safe now given all the buffers are locked from the
> > folio? This survives some basic sanity checks on my end with
> > generic/750 against ext4 and also filling a drive at the same time with
> > fio. I have a feeling is we are not sure, do we have a reproducer for
> > the issue reported through ebdf4de5642fb6 ("mm: migrate: fix reference
> > check race between __find_get_block() and migration")? I suspect the
> > answer is no.

Sebastian, David, is there a reason CONFIG_DEBUG_ATOMIC_SLEEP=y won't
trigger an atomic sleeping-context warning when cond_resched() is used?
Syzbot and 0-day had ways to reproduce a kernel warning under these
conditions, but this config didn't trigger it and required an explicit
might_sleep():

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Are there some preemption configs under which cond_resched() won't
trigger a kernel splat where expected? The only thing I can think of is
that perhaps some preempt configs don't imply a sleep. If true, instead
of adding might_sleep() to one piece of code (in this case
folio_mc_copy()), I wonder if just adding it to cond_resched() may be
useful.

Note that the issue in question wouldn't trigger at all with ext4; that
some reports suggest it happened with btrfs (0-day, with LTP) or with
another test from syzbot was just coincidence on any filesystem. The
only way to really reproduce this was by triggering compaction on the
block device cache, which we now hit as we're enabling large folios for
it. We've narrowed that down to a simple reproducer of running

  dd if=/dev/zero of=/dev/vde bs=1024M count=1024

and adding the might_sleep() on folio_mc_copy().

Then as for the issue we're analyzing: now that I'm back home, I think
it's important to highlight that generic/750 seems able to reproduce
the original issue reported by commit ebdf4de5642fb6 ("mm: migrate: fix
reference check race between __find_get_block() and migration"), and
that it takes about 3 hours to do so. This requires reverting that
commit, which added the spin lock:

Mar 28 03:36:37 extra-ext4-4k unknown: run fstests generic/750 at 2025-03-28 03:36:37
<-- snip -->
Mar 28 05:57:09 extra-ext4-4k kernel: EXT4-fs error (device loop5): ext4_get_first_dir_block:3538: inode #5174: comm fsstress: directory missing '.'

Jan, can you confirm whether the symptoms match the original report? It
would be good to see if the generic/764 test I am proposing [0] can
reproduce that corruption faster than 3 hours. If we have a reproducer,
we can work on evaluating a fix both for the older ext4 issue reported
by commit ebdf4de5642fb6 and for removing the spin lock from page
migration to support large folios.
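For reference, below is roughly the shape of the busy check that the
spin lock guards today. This is a sketch from memory rather than a copy
of __buffer_migrate_folio(), and the helper name folio_buffers_busy()
is made up, so treat it as illustrative only:

/*
 * Illustrative sketch only, not the actual migration code. The
 * i_private_lock here is what ebdf4de5642fb6 added (as private_lock
 * back then) so __find_get_block() cannot grab a new reference on a
 * buffer head between this check and the migration of the folio.
 */
static bool folio_buffers_busy(struct address_space *mapping,
			       struct folio *folio)
{
	struct buffer_head *bh, *head;
	bool busy = false;

	spin_lock(&mapping->i_private_lock);
	bh = head = folio_buffers(folio);
	do {
		if (atomic_read(&bh->b_count)) {
			busy = true;
			break;
		}
		bh = bh->b_this_page;
	} while (bh != head);
	spin_unlock(&mapping->i_private_lock);

	return busy;
}

The race closed by ebdf4de5642fb6 was __find_get_block() taking a
reference on one of these buffer heads right after the check passed;
the question above is whether the extra ref check from 060913999d7a9e50
plus the locked buffers now make that window impossible, so the lock
could go.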
And lastly, can __find_get_block() avoid running while page migration
is in progress? Do we have semantics from a filesystem perspective to
prevent filesystem work from going on while page migration of a folio
is happening in atomic context? If not, do we need them?

[0] https://lore.kernel.org/all/20250326185101.2237319-1-mcgrof@xxxxxxxxxx/T/#u

  Luis