On Mon, May 12, 2025 at 01:14 PM -0400, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> On Mon, May 12, 2025 at 12:35:39PM -0400, Zi Yan wrote:
>> On 12 May 2025, at 12:16, Lorenzo Stoakes wrote:
>>
>> > +cc Zi
>> >
>> > Hi Marc,
>> >
>> > I noticed this same bug as reported in [0], but only for a _very_ recent
>> > patch series by Zi, which is present only in mm-new, the most unstable
>> > mm branch right now :)
>> >
>> > So I wonder if it's related, or a coincidence caused by something else?
>>
>> Unless Marc's branch has my "make MIGRATE_ISOLATE a standalone bit" patchset,
>> it should be caused by something else.
>>
>> A bisect would be very helpful.
>>
>> >
>> > This is triggered by the mm self-test (in tools/testing/selftests/mm, you
>> > can just make -jXX there) transhuge-stress, invoked as:
>> >
>> > $ sudo ./transhuge-stress -d 20
>> >
>> > The stack traces do look very different though, so perhaps it's unrelated?
>>
>> The warning is triggered, in both cases, when a pageblock with MIGRATE_UNMOVABLE(0)
>> is moved to MIGRATE_RECLAIMABLE(2). The pageblock is supposed to have
>> MIGRATE_RECLAIMABLE(2) before the movement.
>
> The weird thing is that the warning is from expand(), when the broken up
> chunks are put *back*. Marc, can you confirm that this is the only
> warning in dmesg, and there aren't any before this one?

Yep, I've just checked: it was the first warning, and `panic_on_warn` is set to 1.

I managed to reproduce a similar crash using 6.15.0-rc7 (this time THP seems to be involved):

…
root@qemus390x:~# [ 40.442403] ------------[ cut here ]------------
[ 40.442471] page type is 0, passed migratetype is 1 (nr=256)
[ 40.442525] WARNING: CPU: 0 PID: 350 at mm/page_alloc.c:669 expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
[ 40.442558] Modules linked in: pkey_pckmo(E) pkey(E) diag288_wdt(E) watchdog(E) s390_trng(E) virtio_console(E) rng_core(E) vmw_vsock_virtio_transport(E) vmw_vsock_virtio_transport_common(E) vsock(E) ghash_s390(E) prng(E) aes_s390(E) des_s390(E) libdes(E) sha3_512_s390(E) sha3_256_s390(E) sha512_s390(E) sha256_s390(E) sha1_s390(E) sha_common(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) drm(E) i2c_core(E) drm_panel_orientation_quirks(E) nfnetlink(E) autofs4(E)
[ 40.442651] Unloaded tainted modules: hmac_s390(E):1
[ 40.442677] CPU: 0 UID: 0 PID: 350 Comm: mempig_verify Tainted: G E 6.15.0-rc7-11557-ga01c92c55b53 #1 PREEMPT
[ 40.442683] Tainted: [E]=UNSIGNED_MODULE
[ 40.442687] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[ 40.442692] Krnl PSW : 0404d00180000000 000002ff929af40c expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
[ 40.442696] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 40.442699] Krnl GPRS: 000002ff80000004 0000000000000005 0000000000000030 0000000000000000
[ 40.442701] 0000000000000005 0000027f80000005 0000000000000100 0000000000000008
[ 40.442703] 000002ff93f99290 000001f63a415900 0000027500000008 00000275829f4000
[ 40.442704] 0000000000000000 0000000000000008 000002ff929af408 0000027f928c36f8
[ 40.442722] Krnl Code: 000002ff929af3fc: c02000883f4b larl %r2,000002ff93ab7292

Code starting with the faulting instruction
===========================================
[ 40.442722] 000002ff929af402: c0e5ffe7bd17 brasl %r14,000002ff926a6e30
[ 40.442722] #000002ff929af408: af000000 mc 0,0
[ 40.442722] >000002ff929af40c: a7f4ff49 brc 15,000002ff929af29e
[ 40.442722] 000002ff929af410: b904002b lgr %r2,%r11
[ 40.442722] 000002ff929af414: c03000881980 larl %r3,000002ff93ab2714
[ 40.442722] 000002ff929af41a: c0e5fffdd883 brasl %r14,000002ff9296a520
[ 40.442722] 000002ff929af420: af000000 mc 0,0
[ 40.442736] Call Trace:
[ 40.442738] expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
[ 40.442741] expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
[ 40.442743] rmqueue_bulk (mm/page_alloc.c:1587 mm/page_alloc.c:1758 mm/page_alloc.c:2311 mm/page_alloc.c:2364)
[ 40.442745] __rmqueue_pcplist (mm/page_alloc.c:3086)
[ 40.442748] rmqueue.isra.0 (mm/page_alloc.c:3124 mm/page_alloc.c:3155)
[ 40.442751] get_page_from_freelist (mm/page_alloc.c:3683)
[ 40.442754] __alloc_frozen_pages_noprof (mm/page_alloc.c:4967 (discriminator 1))
[ 40.442756] alloc_pages_mpol (mm/mempolicy.c:2290)
[ 40.442764] folio_alloc_mpol_noprof (mm/mempolicy.c:2322)
[ 40.442766] vma_alloc_folio_noprof (mm/mempolicy.c:2355 (discriminator 1))
[ 40.442769] vma_alloc_anon_folio_pmd (mm/huge_memory.c:1167 (discriminator 1))
[ 40.442773] __do_huge_pmd_anonymous_page (mm/huge_memory.c:1227 (discriminator 1))
[ 40.442775] __handle_mm_fault (mm/memory.c:5862 mm/memory.c:6111)
[ 40.442781] handle_mm_fault (mm/memory.c:6321)
[ 40.442783] do_exception (arch/s390/mm/fault.c:298)
[ 40.442792] __do_pgm_check (arch/s390/kernel/traps.c:345)
[ 40.442802] pgm_check_handler (arch/s390/kernel/entry.S:334)
[ 40.442805] Last Breaking-Event-Address:
[ 40.442806] __warn_printk (kernel/panic.c:801)
[ 40.442818] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 40.442822] CPU: 0 UID: 0 PID: 350 Comm: mempig_verify Tainted: G E 6.15.0-rc7-11557-ga01c92c55b53 #1 PREEMPT
[ 40.442825] Tainted: [E]=UNSIGNED_MODULE
[ 40.442826] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[ 40.442827] Call Trace:
[ 40.442828] dump_stack_lvl (lib/dump_stack.c:122)
[ 40.442831] panic (kernel/panic.c:372)
[ 40.442833] check_panic_on_warn (kernel/panic.c:247)
[ 40.442836] __warn (kernel/panic.c:751)
[ 40.443057] report_bug (lib/bug.c:176 lib/bug.c:215)
[ 40.443064] monitor_event_exception (arch/s390/kernel/traps.c:227 (discriminator 1))
[ 40.443067] __do_pgm_check (arch/s390/kernel/traps.c:345)
[ 40.443071] pgm_check_handler (arch/s390/kernel/entry.S:334)
[ 40.443074] expand (mm/page_alloc.c:669 (discriminator 10) mm/page_alloc.c:1572 (discriminator 10))
[ 40.443077] expand (mm/page_alloc.c:669 (discriminator 2) mm/page_alloc.c:1572 (discriminator 2))
[ 40.443080] rmqueue_bulk (mm/page_alloc.c:1587 mm/page_alloc.c:1758 mm/page_alloc.c:2311 mm/page_alloc.c:2364)
[ 40.443087] __rmqueue_pcplist (mm/page_alloc.c:3086)
[ 40.443090] rmqueue.isra.0 (mm/page_alloc.c:3124 mm/page_alloc.c:3155)
[ 40.443093] get_page_from_freelist (mm/page_alloc.c:3683)
[ 40.443097] __alloc_frozen_pages_noprof (mm/page_alloc.c:4967 (discriminator 1))
[ 40.443100] alloc_pages_mpol (mm/mempolicy.c:2290)
[ 40.443104] folio_alloc_mpol_noprof (mm/mempolicy.c:2322)
[ 40.443110] vma_alloc_folio_noprof (mm/mempolicy.c:2355 (discriminator 1))
[ 40.443114] vma_alloc_anon_folio_pmd (mm/huge_memory.c:1167 (discriminator 1))
[ 40.443117] __do_huge_pmd_anonymous_page (mm/huge_memory.c:1227 (discriminator 1))
[ 40.443120] __handle_mm_fault (mm/memory.c:5862 mm/memory.c:6111)
[ 40.443123] handle_mm_fault (mm/memory.c:6321)
[ 40.443126] do_exception (arch/s390/mm/fault.c:298)
[ 40.443129] __do_pgm_check (arch/s390/kernel/traps.c:345)
[ 40.443132] pgm_check_handler (arch/s390/kernel/entry.S:334)
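For reference, mm/page_alloc.c:669 in the decoded trace is the pageblock/freelist
consistency check, inlined into expand() at mm/page_alloc.c:1572. In recent kernels
the check sits in __add_to_free_list(); a paraphrased sketch (not the verbatim
source, details may differ by version):

static inline void __add_to_free_list(struct page *page, struct zone *zone,
				      unsigned int order, int migratetype,
				      bool tail)
{
	struct free_area *area = &zone->free_area[order];

	/*
	 * The migratetype recorded in the pageblock bits must match the
	 * freelist the page is being placed on. "page type is 0, passed
	 * migratetype is 1 (nr=256)" means an order-8 chunk from a block
	 * still marked MIGRATE_UNMOVABLE (0) was being put back on the
	 * MIGRATE_MOVABLE (1) list while expand() split a larger chunk.
	 */
	VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
		     "page type is %lu, passed migratetype is %d (nr=%d)\n",
		     get_pageblock_migratetype(page), migratetype, 1 << order);

	/* ... actual insertion into area->free_list[migratetype] elided ... */
}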
This time, the setup is even simpler:

1. Start a 2GB QEMU/KVM guest.
2. Now run some memory stress test (a rough stand-in sketch follows below).

I run this test in a loop (starting and shutting down the VM each time), and after many iterations the bug occurs.
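The stress test itself isn't included here, so as a purely hypothetical stand-in:
a program along the following lines exercises the same anonymous-THP fault path
the trace shows (handle_mm_fault -> __do_huge_pmd_anonymous_page -> ... -> expand()):

/*
 * Hypothetical stand-in for the stress test -- the real mempig_verify
 * is not shown in this thread. It faults in anonymous, THP-eligible
 * memory and then verifies a pattern, allocating through the same
 * path as the trace above.
 */
#include <stdio.h>
#include <sys/mman.h>

#define SZ (1UL << 30)	/* touch 1 GiB of the 2 GiB guest */

int main(void)
{
	unsigned char *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(p, SZ, MADV_HUGEPAGE);	/* request THP, as in the trace */

	for (unsigned long i = 0; i < SZ; i += 4096)	/* fault pages in */
		p[i] = (unsigned char)(i >> 12);
	for (unsigned long i = 0; i < SZ; i += 4096)	/* verify pattern */
		if (p[i] != (unsigned char)(i >> 12)) {
			fprintf(stderr, "mismatch at byte %lu\n", i);
			return 1;
		}
	puts("ok");
	return 0;
}

Run inside the guest with THP enabled, the write loop takes PMD-sized faults,
which is the allocation path that trips the warning above.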
[…snip…]