On Tue 27-05-25 11:00:56, Parav Pandit wrote:
> > From: Jan Kara <jack@xxxxxxx>
> > Sent: Monday, May 26, 2025 10:09 PM
> >
> > Hello!
> >
> > On Sat 24-05-25 05:56:55, Parav Pandit wrote:
> > > I am running a basic test of block device driver unbind, bind while
> > > the fio is running random write IOs with direct=0. The test hits the
> > > WARN_ON assert on:
> > >
> > > void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
> > > {
> > > 	int bsize = i_blocksize(inode);
> > > 	loff_t rounded_from;
> > > 	struct folio *folio;
> > >
> > > 	WARN_ON(to > inode->i_size);
> > >
> > > This is because when the block device is removed during driver unbind,
> > > the driver flow is,
> > >
> > > del_gendisk()
> > >   __blk_mark_disk_dead()
> > >     set_capacity(disk, 0);
> > >       bdev_set_nr_sectors()
> > >         i_size_write() -> This will set the inode's isize to 0, while the
> > >                           page cache is yet to be flushed.
> > >
> > > Below is the kernel call trace.
> > >
> > > Can someone help to identify where the fix should be?
> > > Should the block layer not set the capacity to 0?
> > > Or should the page cache cope with this dynamic change of the size?
> > > Or?
> >
> > After thinking about this the proper fix would be for i_size_write() to happen
> > under i_rwsem because the change in the middle of the write is what's
> > confusing the iomap code. I smell some deadlock potential here but it's
> > perhaps worth trying :)
> >
> Without it, I gave a quick try with inode_lock() unlock() in
> i_size_write() and at initramfs level it was stuck. I am yet to try with
> LOCKDEP.

You definitely cannot put inode_lock() into i_size_write(). i_size_write()
is expected to be called under inode_lock. And bdev_set_nr_sectors() is
breaking this rule by not holding it. So what you can try is to do
inode_lock() in bdev_set_nr_sectors() instead of grabbing bd_size_lock.

> I was thinking, can the existing sequence lock be used for 64-bit case as
> well?

The sequence lock is about updating the inode->i_size value itself. But we
need much larger-scale protection here - we need to make sure a write to the
block device is not happening while the device size changes. And that's what
inode_lock is usually used for.

								Honza

> > > WARNING: CPU: 58 PID: 9712 at mm/truncate.c:819
> > > pagecache_isize_extended+0x186/0x2b0
> > > Modules linked in: virtio_blk xt_CHECKSUM xt_MASQUERADE xt_conntrack
> > > ipt_REJECT nf_reject_ipv4 xt_set ip_set xt_tcpudp xt_addrtype
> > > nft_compat xfrm_user xfrm_algo nft_chain_nat nf_nat nf_conntrack
> > > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink nfsv3
> > > rpcsec_gss_krb5 nfsv4 nfs netfs nvme_fabrics nvme_core cuse overlay
> > > bridge stp llc binfmt_misc intel_rapl_msr intel_rapl_common
> > > intel_uncore_frequency intel_uncore_frequency_common skx_edac
> > > skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp
> > > kvm_intel ipmi_ssif kvm dell_pc dell_smbios platform_profile dcdbas
> > > rapl intel_cstate dell_wmi_descriptor wmi_bmof mei_me mei
> > > intel_pch_thermal ipmi_si acpi_power_meter acpi_ipmi nfsd sch_fq_codel
> > > auth_rpcgss nfs_acl ipmi_devintf ipmi_msghandler lockd grace
> > > dm_multipath msr scsi_dh_rdac scsi_dh_emc scsi_dh_alua parport_pc
> > > sunrpc ppdev lp parport efi_pstore ip_tables x_tables autofs4 raid10
> > > raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx
> > > raid6_pq raid1 raid0 linear mlx5_core mgag200 i2c_algo_bit
> > > drm_client_lib drm_shmem_helper drm_kms_helper mlxfw
> > > ghash_clmulni_intel psample sha512_ssse3 drm sha256_ssse3 i2c_i801 tls
> > > sha1_ssse3 ahci i2c_mux megaraid_sas tg3 pci_hyperv_intf i2c_smbus
> > > lpc_ich libahci wmi aesni_intel crypto_simd cryptd
> > > CPU: 58 UID: 0 PID: 9712 Comm: fio Not tainted 6.15.0-rc7-vblk+ #21
> > > PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R740/0DY2X0,
> > > BIOS 2.11.2 004/21/2021
> > > RIP: 0010:pagecache_isize_extended+0x186/0x2b0
> > > Code: 04 00 00 00 e8 2b bc 1f 00 f0 41 ff 4c 24 34 75 08 4c 89 e7 e8
> > > ab bd ff ff 48 83 c4 08 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc <0f> 0b
> > > e9 04 ff ff ff 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 20
> > > RSP: 0018:ffff88819a16f428 EFLAGS: 00010287
> > > RAX: dffffc0000000000 RBX: ffff88908380c738 RCX: 000000000000000c
> > > RDX: 1ffff112107018f1 RSI: 000000002e47f000 RDI: ffff88908380c788
> > > RBP: ffff88819a16f450 R08: 0000000000000001 R09: fffff94008933c86
> > > R10: 000000002e47f000 R11: 0000000000000000 R12: 0000000000001000
> > > R13: 0000000033956000 R14: 000000002e47f000 R15: ffff88819a16f690
> > > FS: 00007f1be37fe640(0000) GS:ffff889069680000(0000)
> > > knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f1c05205018 CR3: 000000115d00d001 CR4: 00000000007726f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > PKRU: 55555554
> > > Call Trace:
> > >  <TASK>
> > >  iomap_file_buffered_write+0x763/0xa90
> > >  ? aa_file_perm+0x37e/0xd40
> > >  ? __pfx_iomap_file_buffered_write+0x10/0x10
> > >  ? __kasan_check_read+0x15/0x20
> > >  ? __pfx_down_read+0x10/0x10
> > >  ? __kasan_check_read+0x15/0x20
> > >  ? inode_needs_update_time.part.0+0x15c/0x1e0
> > >  blkdev_write_iter+0x628/0xc90
> > >  aio_write+0x2f9/0x6e0
> > >  ? io_submit_one+0xc98/0x1c20
> > >  ? __pfx_aio_write+0x10/0x10
> > >  ? kasan_save_stack+0x40/0x60
> > >  ? kasan_save_stack+0x2c/0x60
> > >  ? kasan_save_track+0x18/0x40
> > >  ? kasan_save_free_info+0x3f/0x60
> > >  ? kasan_save_track+0x18/0x40
> > >  ? kasan_save_alloc_info+0x3c/0x50
> > >  ? __kasan_slab_alloc+0x91/0xa0
> > >  ? fget+0x17c/0x250
> > >  io_submit_one+0xb9c/0x1c20
> > >  ? io_submit_one+0xb9c/0x1c20
> > >  ? __pfx_aio_write+0x10/0x10
> > >  ? __pfx_io_submit_one+0x10/0x10
> > >  ? __kasan_check_write+0x18/0x20
> > >  ? _raw_spin_lock_irqsave+0x96/0xf0
> > >  ? __kasan_check_write+0x18/0x20
> > >  __x64_sys_io_submit+0x14e/0x390
> > >  ? __pfx___x64_sys_io_submit+0x10/0x10
> > >  ? aio_read_events+0x489/0x800
> > >  ? read_events+0xc1/0x2f0
> > >  x64_sys_call+0x20ad/0x2150
> > >  do_syscall_64+0x6f/0x120
> > >  ? __pfx_read_events+0x10/0x10
> > >  ? __x64_sys_io_submit+0x1c6/0x390
> > >  ? __x64_sys_io_submit+0x1c6/0x390
> > >  ? __pfx___x64_sys_io_submit+0x10/0x10
> > >  ? __x64_sys_io_getevents+0x14c/0x2a0
> > >  ? __kasan_check_read+0x15/0x20
> > >  ? do_io_getevents+0xfa/0x220
> > >  ? __x64_sys_io_getevents+0x14c/0x2a0
> > >  ? __pfx___x64_sys_io_getevents+0x10/0x10
> > >  ? fpregs_assert_state_consistent+0x25/0xb0
> > >  ? __kasan_check_read+0x15/0x20
> > >  ? fpregs_assert_state_consistent+0x25/0xb0
> > >  ? syscall_exit_to_user_mode+0x5e/0x1d0
> > >  ? do_syscall_64+0x7b/0x120
> > >  ? __x64_sys_io_getevents+0x14c/0x2a0
> > >  ? __pfx___x64_sys_io_getevents+0x10/0x10
> > >  ? __kasan_check_read+0x15/0x20
> > >  ? fpregs_assert_state_consistent+0x25/0xb0
> > >  ? syscall_exit_to_user_mode+0x5e/0x1d0
> > >  ? do_syscall_64+0x7b/0x120
> > >  ? syscall_exit_to_user_mode+0x5e/0x1d0
> > >  ? do_syscall_64+0x7b/0x120
> > >  ? syscall_exit_to_user_mode+0x5e/0x1d0
> > >  ? clear_bhb_loop+0x40/0x90
> > >  ? clear_bhb_loop+0x40/0x90
> > >  ? clear_bhb_loop+0x40/0x90
> > >  ? clear_bhb_loop+0x40/0x90
> > >  ? clear_bhb_loop+0x40/0x90
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x7f1c0431e88d
> > > Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48
> > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> > > 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007f1be37f9628 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
> > > RAX: ffffffffffffffda RBX: 00007f1be37fc7a8 RCX: 00007f1c0431e88d
> > > RDX: 00007f1bd40032e8 RSI: 0000000000000001 RDI: 00007f1bfa545000
> > > RBP: 00007f1bfa545000 R08: 00007f1af0512010 R09: 0000000000000718
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > > R13: 0000000000000000 R14: 00007f1bd40032e8 R15: 00007f1bd4000b70
> > >  </TASK>
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > fio: attempt to access beyond end of device
> > > vda: rw=2049, sector=0, nr_sectors = 8 limit=0
> > > Buffer I/O error on dev vda, logical block 0, lost async page write
> > >
> >
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR