On Wed 14-05-25 02:04:41, Jeongjun Park wrote: > Since handle->h_transaction may be a NULL pointer, so we should change it > to call is_handle_aborted(handle) first before dereferencing it. > > And the following data-race was reported in my fuzzer: > > ================================================================== > BUG: KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata > > write to 0xffff888011024104 of 4 bytes by task 10881 on cpu 1: > jbd2_journal_dirty_metadata+0x2a5/0x770 fs/jbd2/transaction.c:1556 > __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 > ext4_do_update_inode fs/ext4/inode.c:5220 [inline] > ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 > __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 > ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 > .... > > read to 0xffff888011024104 of 4 bytes by task 10880 on cpu 0: > jbd2_journal_dirty_metadata+0xf2/0x770 fs/jbd2/transaction.c:1512 > __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 > ext4_do_update_inode fs/ext4/inode.c:5220 [inline] > ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 > __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 > ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 > .... > > value changed: 0x00000000 -> 0x00000001 > ================================================================== > > According to this crash report, there is a read/write data-race in > jh->b_modified. > > This is because the b_state_lock is locked too late. > > For some reason, jbd2_journal_dirty_metadata() has been written in a way > that it does not lock b_state_lock before checking jh->b_transaction. Yes and that is deliberate because for some buffers like bitmaps the contention (and cacheline bouncing) on the b_state_lock can be actually quite significant. See commit 6e06ae88edae ("jbd2: speedup jbd2_journal_dirty_metadata()") for a bit more description. > However, This makes the code that checks jh->b_transaction messy and > causes a data-race in jh->b_* variables. Since locking b_state_lock > earlier is not enough to significantly affect performance and most of the > functions defined in transaction.c lock b_state_lock before > reading/writing jh->b_*. Well, the code is written with the expectation that b_transaction, b_jlist, and b_modified fields can be changing underneath. Since this was implemented 10 years ago, we didn't really bother with properly annotating this but the code as is is safe. I agree the data_race() annotation for the b_modified check was missed when adding data_race() annotations 4 years ago so that should be fixed. Honza > Thereforce, I think it would be appropriate to modify > jbd2_journal_dirty_metadata() to lock b_state_lock earlier as well. > > Reported-by: syzbot+de24c3fe3c4091051710@xxxxxxxxxxxxxxxxxxxxxxxxx > Closes: https://syzkaller.appspot.com/bug?extid=de24c3fe3c4091051710 > Fixes: 6e06ae88edae ("jbd2: speedup jbd2_journal_dirty_metadata()") > Signed-off-by: Jeongjun Park <aha310510@xxxxxxxxx> > --- > fs/jbd2/transaction.c | 49 +++++++++++++++---------------------------- > 1 file changed, 17 insertions(+), 32 deletions(-) > > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c > index cbc4785462f5..7e6dbf37396f 100644 > --- a/fs/jbd2/transaction.c > +++ b/fs/jbd2/transaction.c > @@ -1496,41 +1496,25 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) > jbd2_debug(5, "journal_head %p\n", jh); > JBUFFER_TRACE(jh, "entry"); > > - /* > - * This and the following assertions are unreliable since we may see jh > - * in inconsistent state unless we grab bh_state lock. But this is > - * crucial to catch bugs so let's do a reliable check until the > - * lockless handling is fully proven. > - */ > - if (data_race(jh->b_transaction != transaction && > - jh->b_next_transaction != transaction)) { > - spin_lock(&jh->b_state_lock); > - J_ASSERT_JH(jh, jh->b_transaction == transaction || > - jh->b_next_transaction == transaction); > - spin_unlock(&jh->b_state_lock); > - } > + spin_lock(&jh->b_state_lock); > + > + J_ASSERT_JH(jh, jh->b_transaction == transaction || > + jh->b_next_transaction == transaction); > + > if (jh->b_modified == 1) { > /* If it's in our transaction it must be in BJ_Metadata list. */ > - if (data_race(jh->b_transaction == transaction && > - jh->b_jlist != BJ_Metadata)) { > - spin_lock(&jh->b_state_lock); > - if (jh->b_transaction == transaction && > - jh->b_jlist != BJ_Metadata) > - pr_err("JBD2: assertion failure: h_type=%u " > - "h_line_no=%u block_no=%llu jlist=%u\n", > - handle->h_type, handle->h_line_no, > - (unsigned long long) bh->b_blocknr, > - jh->b_jlist); > - J_ASSERT_JH(jh, jh->b_transaction != transaction || > - jh->b_jlist == BJ_Metadata); > - spin_unlock(&jh->b_state_lock); > - } > - goto out; > + if (jh->b_transaction == transaction && > + jh->b_jlist != BJ_Metadata) > + pr_err("JBD2: assertion failure: h_type=%u " > + "h_line_no=%u block_no=%llu jlist=%u\n", > + handle->h_type, handle->h_line_no, > + (unsigned long long) bh->b_blocknr, > + jh->b_jlist); > + J_ASSERT_JH(jh, jh->b_transaction != transaction || > + jh->b_jlist == BJ_Metadata); > + goto out_unlock_bh; > } > > - journal = transaction->t_journal; > - spin_lock(&jh->b_state_lock); > - > if (is_handle_aborted(handle)) { > /* > * Check journal aborting with @jh->b_state_lock locked, > @@ -1543,6 +1527,8 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) > goto out_unlock_bh; > } > > + journal = transaction->t_journal; > + > if (jh->b_modified == 0) { > /* > * This buffer's got modified and becoming part > @@ -1628,7 +1614,6 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) > spin_unlock(&journal->j_list_lock); > out_unlock_bh: > spin_unlock(&jh->b_state_lock); > -out: > JBUFFER_TRACE(jh, "exit"); > return ret; > } > -- -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR