On Mon, Mar 10, 2025 at 11:10:26AM +0800, Baokun Li wrote: > On 2025/3/8 23:21, Ojaswin Mujoo wrote: > > On Sat, Mar 08, 2025 at 06:37:41PM +0530, Ojaswin Mujoo wrote: > > > On Sat, Mar 08, 2025 at 09:09:28AM +0800, Baokun Li wrote: > > > > On 2025/3/6 22:28, Ojaswin Mujoo wrote: > > > > > ** Changes since v1 [1] ** > > > > > > > > > > * Picked up RVBs from Jan and Ritesh > > > > > * In patch 2/3, we now use a flag in sbi instead of SB_ACITVE > > > > > to determine when to journal sb vs when to commit directly. > > > > > * Added a prep patch 1/3 > > > > > > > > > > [1] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@xxxxxxxxxxxxx/T/#m5e659425b8c8fe2ac01e7242b77fed315ff89db4 > > > > > > > > > > @Baokun, I didn't get a chance to look into the journal_inode > > > > > modifications we were discussing in [2]. I'll try to spend some time and > > > > > send that as a separate patch. Hope that's okay > > > > > > > > > > [2] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@xxxxxxxxxxxxx/T/#mad8feb44d9b6ddadf87830b92caa7b78d902dc05 > > > > That's fine, it's not a priority. And if this patch set makes sure we > > > > don't crash when things go wrong, I'm okay with leaving it as is. > > > > > > > > It's possible that jbd2_journal_commit_transaction() could call > > > > ext4_handle_error() in other places as the code evolves. Fixing known > > > > problems and protecting against potential ones is always a good thing. > > > Yep thats true, I did spend some time on this since the codepath was a > > > bit unfamiliar to me. Seems like a straighforward enough change. I'll > > > add it to the next patch. > > > > > > thanks, > > > ojaswin > > Hey Baokun, > > > > So while coding this up, I started looking at some codepaths and it got > > me wondering when can we actually change the sbi->s_es->s_journal_inum > > (or sbi->s_sbh) from the time it gets populated to the umount time? > > > > Since the sbi->s_sbh buffer head is always in memory and never reclaimed > > due the elevated reference, the only way to modify it should be if we > > modify the memory page somehow. Or is there some codepath/tooling magic > > I'm missing that can modify this value? > For the ext4 file system, s_journal_inum is fixed after mkfs and is not > expected to be modified. However, the sbi->s_sbh buffer head belongs to > the block device, so direct write operations to the file system's block > device /dev/xxx may directly modify sbi->s_sbh. Right got it, now I get the full picture of the change you are suggesting. Thanks Baokun, I'll do that in next revision. Regards, ojaswin > > For example, you can use the following script to trigger a journal bmap > failure: > > mkfs.ext4 -F /dev/sda > mount /dev/sda /tmp/test > dd if=/dev/zero of=/dev/sda bs=4096 count=1 > echo 1 > /tmp/test/file > umount /tmp/test > > (Adding a delay in put_super and performing the script during the delay > can trigger the issue described in the patch.) > > > Cheers, > Baokun > > > > Regards, > > ojaswin > > > > > > > > Cheers, > > > > Baokun > > > > > ** Original Cover ** > > > > > > > > > > When running LTP stress tests on ext4, after a multiday run we seemed to > > > > > have hit the following BUG_ON: > > > > > > > > > > [NIP : start_this_handle+268] > > > > > #3 [c000001067c27a40] start_this_handle at c008000004d40f74 [jbd2] (unreliable) > > > > > #4 [c000001067c27b60] jbd2__journal_start at c008000004d415cc [jbd2] > > > > > #5 [c000001067c27be0] update_super_work at c0080000053f9758 [ext4] > > > > > #6 [c000001067c27c70] process_one_work at c000000000188790 > > > > > #7 [c000001067c27d20] worker_thread at c00000000018973c > > > > > #8 [c000001067c27dc0] kthread at c000000000196c84 > > > > > #9 [c000001067c27e10] ret_from_kernel_thread at c00000000000cd64 > > > > > > > > > > Which comes out to > > > > > > > > > > 382 repeat: > > > > > 383 read_lock(&journal->j_state_lock); > > > > > * 384 BUG_ON(journal->j_flags & JBD2_UNMOUNT); > > > > > 385 if (is_journal_aborted(journal) || > > > > > 386 (journal->j_errno != 0 && !(journal->j_flags & JBD2_ACK_ERR))) { > > > > > 387 read_unlock(&journal->j_state_lock); > > > > > > > > > > > > > > > Initially this seemed like it should never happen but upon crash > > > > > analysis it seems like it could indeed be hit as described in patch 1/2. > > > > > > > > > > I would like to add that through the logs we only knew that: > > > > > > > > > > - ext4_journal_bmap -> ext4_map_blocks is failing with EFSCORRUPTED. > > > > > - update_super_work had hit the BUG_ON > > > > > > > > > > I was not able to hit this bug again (without modifying the kernel to > > > > > inject errors) but the above backtrace seems to be one possible paths > > > > > where this BUG_ON can be hit. Rest of the analysis and fix is in patch > > > > > 2/3. Patch 3 is just a small tweak that i found helpful while debugging. > > > > > > > > > > That being said, journalling is something I'm not very familiar with and > > > > > there might be gaps in my understanding so thoughts and suggestions are > > > > > welcome. > > > > > > > > > > Ojaswin Mujoo (3): > > > > > ext4: define ext4_journal_destroy wrapper > > > > > ext4: avoid journaling sb update on error if journal is destroying > > > > > ext4: Make sb update interval tunable > > > > > > > > > > fs/ext4/ext4.h | 11 +++++++++++ > > > > > fs/ext4/ext4_jbd2.h | 22 ++++++++++++++++++++++ > > > > > fs/ext4/super.c | 35 +++++++++++++++++------------------ > > > > > fs/ext4/sysfs.c | 4 ++++ > > > > > 4 files changed, 54 insertions(+), 18 deletions(-) > > > > > >