On Thu, Mar 21, 2019 at 02:13:04PM +0100, Andreas Gruenbacher wrote:
> Hi Christoph,
>
> we need your help fixing a gfs2 deadlock involving iomap. What's going
> on is the following:
>
> * During iomap_file_buffered_write, gfs2_iomap_begin grabs the log flush
> lock and keeps it until gfs2_iomap_end. It currently always does that
> even though there is no point other than for journaled data writes.
>
> * iomap_file_buffered_write then calls balance_dirty_pages_ratelimited.
> If that ends up calling gfs2_write_inode, gfs2 will try to grab the
> log flush lock again and deadlock.

What is the exact call chain?  balance_dirty_pages_ratelimited these
days doesn't start I/O, but just wakes up the flusher threads.  Or
do we have an issue where it is blocking on those threads?

Also why do you need to flush the log for background writeback in
->write_inode?

balance_dirty_pages_ratelimited is by definition not a data integrity
writeback, so there shouldn't be a good reason to flush the log
(which I assume the log flush lock is for).  If we look at
gfs2_write_inode, this seems to be the code:

	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));

	if (flush_all)
		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
			       GFS2_LOG_HEAD_FLUSH_NORMAL |
			       GFS2_LFC_WRITE_INODE);

But what is the requirement to do this in writeback context? Can't
we move it out into another context instead?
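
To make the condition in that snippet concrete, here is a minimal
user-space sketch of it.  It is a model only, not gfs2 code: the
model_* names and the enum are hypothetical stand-ins, and the second
helper assumes (per the description quoted at the top) that a log flush
needs the same lock the buffered write path is already holding.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's writeback sync modes. */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Mirrors the flush_all expression from the gfs2_write_inode snippet. */
static bool model_flush_all(enum model_sync_mode mode, bool is_jdata)
{
	return mode == MODEL_WB_SYNC_ALL || is_jdata;
}

/*
 * Assumption: flushing the log takes the log flush lock, so a flush
 * requested while the caller already holds that lock would block forever.
 */
static bool model_would_deadlock(enum model_sync_mode mode, bool is_jdata,
				 bool caller_holds_flush_lock)
{
	return caller_holds_flush_lock && model_flush_all(mode, is_jdata);
}
```

Under these assumptions, background writeback (MODEL_WB_SYNC_NONE) only
reaches the log flush for jdata inodes, which is the case where a caller
that still holds the lock would get stuck.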