RE: warning on flushing page cache on block device removal

Parav Pandit <parav@xxxxxxxxxx> · Tue, 27 May 2025 12:07:20 +0000

> From: Jan Kara <jack@xxxxxxx>
> Sent: Tuesday, May 27, 2025 5:27 PM
> 
> On Tue 27-05-25 11:00:56, Parav Pandit wrote:
> > > From: Jan Kara <jack@xxxxxxx>
> > > Sent: Monday, May 26, 2025 10:09 PM
> > >
> > > Hello!
> > >
> > > On Sat 24-05-25 05:56:55, Parav Pandit wrote:
> > > > I am running a basic test of block device driver unbind, bind
> > > > while the fio is running random write IOs with direct=0.  The test
> > > > hits the WARN_ON assert on:
> > > >
> > > > void pagecache_isize_extended(struct inode *inode, loff_t from,
> > > > loff_t
> > > > to) {
> > > >         int bsize = i_blocksize(inode);
> > > >         loff_t rounded_from;
> > > >         struct folio *folio;
> > > >
> > > >         WARN_ON(to > inode->i_size);
> > > >
> > > > This is because when the block device is removed during driver
> > > > unbind, the driver flow is,
> > > >
> > > > del_gendisk()
> > > >     __blk_mark_disk_dead()
> > > >             set_capacity((disk, 0);
> > > >                 bdev_set_nr_sectors()
> > > >                     i_size_write() -> This will set the inode's
> > > > isize to 0, while the
> > > page cache is yet to be flushed.
> > > >
> > > > Below is the kernel call trace.
> > > >
> > > > Can someone help to identify, where should be the fix?
> > > > Should block layer to not set the capacity to 0?
> > > > Or page catch to overcome this dynamic changing of the size?
> > > > Or?
> > >
> > > After thinking about this the proper fix would be for i_size_write()
> > > to happen under i_rwsem because the change in the middle of the
> > > write is what's confusing the iomap code. I smell some deadlock
> > > potential here but it's perhaps worth trying :)
> > >
> > Without it, I gave a quick try with inode_lock() unlock() in
> > i_size_write() and initramfs level it was stuck.  I am yet to try with
> > LOCKDEP.
> 
> You definitely cannot put inode_lock() into i_size_write(). i_size_write() is
> expected to be called under inode_lock. And bdev_set_nr_sectors() is
> breaking this rule by not holding it. So what you can try is to do
> inode_lock() in bdev_set_nr_sectors() instead of grabbing bd_size_lock.
>
Ok. will try this.
I am off for few days on travel, so earliest I can do is on Sunday.

> > I was thinking, can the existing sequence lock be used for 64-bit case
> > as well?
> 
> The sequence lock is about updating inode->i_size value itself. But we need
> much larger scale protection here - we need to make sure write to the block
> device is not happening while the device size changes. And that's what
> inode_lock is usually used for.
> 
Other option to explore (with my limited knowledge) is, 
When the block device is removed, not to update the size,

Because queue dying flag and other barriers are placed to prevent the IOs entering lower layer or to fail them.
Can that be the direction to fix?