On Thu, 2025-08-28 at 21:05 +0200, Ilya Dryomov wrote:
> On Thu, Aug 28, 2025 at 8:55 PM Viacheslav Dubeyko
> <Slava.Dubeyko@xxxxxxx> wrote:
> >
> > On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> > > The function ceph_process_folio_batch() sets folio_batch entries to
> > > NULL, which is an illegal state. Before folio_batch_release() crashes
> > > due to this API violation, the function
> > > ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> > > the array.
> > >
> > > However, since commit ce80b76dd327 ("ceph: introduce
> > > ceph_process_folio_batch() method"), this shifting doesn't happen
> > > anymore because the "for" loop got moved to
> > > ceph_process_folio_batch(), and now the `i` variable that remains in
> > > ceph_writepages_start() doesn't get incremented anymore, making the
> > > shifting effectively unreachable much of the time.
> > >
> > > Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> > > method") added more preconditions for doing the shift, replacing the
> > > `i` check (with something that is still just as broken):
> > >
> > > - if ceph_process_folio_batch() fails, shifting never happens
> > >
> > > - if ceph_move_dirty_page_in_page_array() was never called (because
> > >   ceph_process_folio_batch() has returned early for any of various
> > >   reasons), shifting never happens
> > >
> > > - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> > >   has returned early for some of the reasons mentioned above or
> > >   because ceph_move_dirty_page_in_page_array() has failed), shifting
> > >   never happens
> > >
> > > Since those two commits, any problem in ceph_process_folio_batch()
> > > could crash the kernel, e.g. this way:
> > >
> > > BUG: kernel NULL pointer dereference, address: 0000000000000034
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 0 P4D 0
> > > Oops: Oops: 0002 [#1] SMP NOPTI
> > > CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> > > Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> > > Workqueue: writeback wb_workfn (flush-ceph-1)
> > > RIP: 0010:folios_put_refs+0x85/0x140
> > > Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0
> > > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> > > RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> > > RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> > > R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> > > FS:  0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> > > PKRU: 55555554
> > > Call Trace:
> > >  <TASK>
> > >  ceph_writepages_start+0xeb9/0x1410
> > >
> > > The crash can be reproduced easily by changing the
> > > ceph_check_page_before_write() return value to `-E2BIG`.
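(For context, a simplified sketch of the compaction step being discussed,
assuming only the folio_batch layout from include/linux/pagevec.h; this is
an illustration, not the exact fs/ceph/addr.c code. ceph_process_folio_batch()
NULLs out the batch entries it has moved into the OSD request, and unless the
surviving entries are shifted to the front before folio_batch_release(),
folios_put_refs() walks straight into those NULL slots.)

#include <linux/pagevec.h>

/* sketch: compact the batch so no NULL entry remains */
static void shift_unused_folios_left(struct folio_batch *fbatch)
{
	unsigned int i, kept = 0;

	for (i = 0; i < folio_batch_count(fbatch); i++) {
		/* skip slots already handed over to the OSD request */
		if (!fbatch->folios[i])
			continue;

		if (kept != i)
			fbatch->folios[kept] = fbatch->folios[i];
		kept++;
	}

	/* folio_batch_release() will now only see valid folios */
	fbatch->nr = kept;
}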
> >
> > I cannot reproduce the crash/issue. If ceph_check_page_before_write()
> > returns `-E2BIG`, then nothing happens. There is no crash, and no write
> > operations can be processed by the file system driver anymore. So it
> > doesn't look like a recipe for reproducing the issue. I cannot confirm
> > that the patch fixes the issue without a clear way to reproduce it.
> >
> > Could you please provide a clearer explanation of how to reproduce the
> > issue?
>
> Hi Slava,
>
> Was this bit taken into account?
>
> (Interestingly, the crash happens only if `huge_zero_folio` has
> already been allocated; without `huge_zero_folio`,
> is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
> entries instead of dereferencing them. That makes reproducing the bug
> somewhat unreliable. See
> https://lore.kernel.org/20250826231626.218675-1-max.kellermann@xxxxxxxxx
> for a discussion of this detail.)

Hi Ilya,

And what practical steps do you suggest for reproducing it? :)

Thanks,
Slava.
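P.S. To make the `huge_zero_folio` dependency quoted above concrete:
is_huge_zero_folio() is essentially a pointer comparison against the global
huge_zero_folio, so as long as that global is still NULL a leftover NULL
batch entry compares equal and is skipped; once the huge zero folio has been
allocated, the same NULL entry reaches the refcount update. A hypothetical,
simplified sketch of the checks involved (not the real mm/swap.c or
mm/huge_memory.c code):

#include <linux/mm.h>
#include <linux/pagevec.h>
#include <linux/huge_mm.h>

/* roughly what is_huge_zero_folio() does: compare against the global */
static bool sketch_is_huge_zero_folio(const struct folio *folio)
{
	return READ_ONCE(huge_zero_folio) == folio;
}

/* simplified stand-in for the loop in folios_put_refs() */
static void sketch_folios_put(struct folio_batch *fbatch)
{
	unsigned int i;

	for (i = 0; i < folio_batch_count(fbatch); i++) {
		struct folio *folio = fbatch->folios[i];

		/*
		 * While huge_zero_folio is still NULL, a NULL entry
		 * compares equal here and is silently skipped ...
		 */
		if (sketch_is_huge_zero_folio(folio))
			continue;

		/*
		 * ... but once huge_zero_folio has been allocated, a NULL
		 * entry falls through and the refcount update writes to
		 * folio->_refcount (offset 0x34 on x86-64, which lines up
		 * with the faulting address in the oops above).
		 */
		if (!folio_ref_sub_and_test(folio, 1))
			continue;

		/* the real code would go on to free the folio here */
	}
}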