[PATCH AUTOSEL 6.16-5.15] fs: writeback: fix use-after-free in __mark_inode_dirty()

From: Jiufei Xue <jiufei.xue@xxxxxxxxxxx>

[ Upstream commit d02d2c98d25793902f65803ab853b592c7a96b29 ]

A use-after-free issue occurred when __mark_inode_dirty() got the
bdi_writeback that was in the process of switching.

CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
 __mark_inode_dirty+0x124/0x418
 generic_update_time+0x4c/0x60
 file_modified+0xcc/0xd0
 ext4_buffered_write_iter+0x58/0x124
 ext4_file_write_iter+0x54/0x704
 vfs_write+0x1c0/0x308
 ksys_write+0x74/0x10c
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x40/0xe4
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x194/0x198

Root cause is:

systemd-random-seed                         kworker
----------------------------------------------------------------------
__mark_inode_dirty                      inode_switch_wbs_work_fn

  spin_lock(&inode->i_lock);
  inode_attach_wb
  locked_inode_to_wb_and_lock_list
     get inode->i_wb
     spin_unlock(&inode->i_lock);
     spin_lock(&wb->list_lock)
  spin_lock(&inode->i_lock)
  inode_io_list_move_locked
  spin_unlock(&wb->list_lock)
  spin_unlock(&inode->i_lock)
                                    spin_lock(&old_wb->list_lock)
                                      inode_do_switch_wbs
                                        spin_lock(&inode->i_lock)
                                        inode->i_wb = new_wb
                                        spin_unlock(&inode->i_lock)
                                    spin_unlock(&old_wb->list_lock)
                                    wb_put_many(old_wb, nr_switched)
                                      cgwb_release
                                      old wb released
  wb_wakeup_delayed() accesses wb,
  then triggers the use-after-free
  issue

Fix this race condition by holding the inode spinlock until
wb_wakeup_delayed() has finished.

Signed-off-by: Jiufei Xue <jiufei.xue@xxxxxxxxxxx>
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@xxxxxxxxxxx
Reviewed-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

Based on my thorough analysis of the commit and the kernel code, here's
my assessment:

**Backport Status: YES**

## Extensive Explanation:

This commit fixes a **critical use-after-free vulnerability** in the
writeback subsystem that should definitely be backported to stable
kernel trees. Here's why:

### 1. **Real Security Bug with Clear Impact**
The commit fixes a genuine use-after-free vulnerability in
`__mark_inode_dirty()`. The bug manifests as a kernel crash with the
call trace shown in the commit message. This is not a theoretical issue:
it was hit on a real system running a 6.6.56-based kernel.

### 2. **Race Condition Details**
The race condition occurs between two concurrent operations:
- **Thread A** (`__mark_inode_dirty`): Reads `inode->i_wb`, moves the
  inode onto that wb's dirty list, drops both `wb->list_lock` and
  `inode->i_lock`, then calls `wb_wakeup_delayed(wb)` without holding a
  reference of its own
- **Thread B** (`inode_switch_wbs_work_fn`): Switches the inode's
  writeback context, releases the old wb via `wb_put_many()`, which can
  trigger `cgwb_release` and free the wb structure

The vulnerability window exists because Thread A dereferences the wb
(`wb->bdi->capabilities` and `wb_wakeup_delayed(wb)`) after dropping
both locks. Once those locks are released, nothing pins the wb: Thread B
can complete the switch, drop the last reference, and free the wb before
Thread A touches it.
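The interleaving can be replayed deterministically as a minimal sketch. This is a hypothetical userspace model, not kernel code: `struct model_wb`, `model_switch_wbs()` and `replay_interleaving()` are invented names, locking is expressed only by the order of the steps, and instead of freeing memory the model sets a `freed` flag so the stale access is detected without real undefined behavior.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bdi_writeback: a refcount plus a
 * "freed" marker, so a stale access is detected instead of being UB. */
struct model_wb {
	int refcnt;
	bool freed;
};

/* Hypothetical stand-in for struct inode: the inode pins *i_wb. */
struct model_inode {
	struct model_wb *i_wb;
};

/* Drop one reference; mark the wb freed when the count reaches zero
 * (stands in for cgwb_release() and the eventual free). */
static void model_wb_put(struct model_wb *wb)
{
	if (--wb->refcnt == 0)
		wb->freed = true;
}

/* Kworker side: repoint the inode, then drop the old wb's reference,
 * like inode_do_switch_wbs() followed by wb_put_many(). */
static void model_switch_wbs(struct model_inode *inode,
			     struct model_wb *new_wb)
{
	struct model_wb *old_wb = inode->i_wb;

	new_wb->refcnt++;
	inode->i_wb = new_wb;
	model_wb_put(old_wb);
}

/* Replay the interleaving from the commit message. Returns true if the
 * dirtier's wakeup touched a wb that had already been freed.
 * wakeup_before_unlock == false models the old code; true models the
 * fix, where the switch cannot start until the dirtier drops its locks. */
bool replay_interleaving(bool wakeup_before_unlock)
{
	struct model_wb old_wb = { .refcnt = 1, .freed = false };
	struct model_wb new_wb = { .refcnt = 0, .freed = false };
	struct model_inode inode = { .i_wb = &old_wb };
	bool touched_freed;

	/* Dirtier: under i_lock/list_lock, read i_wb and enqueue the inode. */
	struct model_wb *wb = inode.i_wb;

	if (wakeup_before_unlock) {
		touched_freed = wb->freed;   /* wakeup while locks are held */
		/* locks dropped here; only now can the switch run */
		model_switch_wbs(&inode, &new_wb);
	} else {
		/* locks dropped here; the switch runs inside the window */
		model_switch_wbs(&inode, &new_wb);
		touched_freed = wb->freed;   /* wakeup on a stale pointer */
	}
	return touched_freed;
}
```

`replay_interleaving(false)` (the old ordering) reports a touch of freed memory, while `replay_interleaving(true)` (the fixed ordering) does not.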

### 3. **Minimal and Contained Fix**
The fix is small and surgical: it only moves the lock releases (and the
trace point) below the wakeup:
```diff
- spin_unlock(&wb->list_lock);
- spin_unlock(&inode->i_lock);
- trace_writeback_dirty_inode_enqueue(inode);
-
  if (wakeup_bdi && (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
      wb_wakeup_delayed(wb);
+
+ spin_unlock(&wb->list_lock);
+ spin_unlock(&inode->i_lock);
+ trace_writeback_dirty_inode_enqueue(inode);
```

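The fix ensures that `wb_wakeup_delayed()` (and the
`wb->bdi->capabilities` check before it) runs while `wb->list_lock` and
`inode->i_lock` are still held. `inode_switch_wbs_work_fn()` must take
`old_wb->list_lock` and then `inode->i_lock` before it can repoint
`inode->i_wb`, and it drops the old wb's references only after the
switch, so the wb cannot be freed while it is being accessed. This is a
classic extension of an existing critical section with a minimal code
change (three statements moved).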
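The same idea can be sketched with real threads. This is a hypothetical userspace model, not kernel code: a single pthread mutex stands in for both `inode->i_lock` and `wb->list_lock`, `model_dirtier()`, `model_switcher()` and `run_fixed_order_once()` are invented names, and the revalidation done by `locked_inode_to_wb_and_lock_list()` is not modeled. Because the switcher can repoint `i_wb` only while holding the lock, and drops the old reference only afterwards, a dirtier that finishes its wb access before unlocking never sees a freed wb, whichever thread wins the lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bdi_writeback. */
struct model_wb {
	int refcnt;
	bool freed;
};

/* One mutex models both inode->i_lock and wb->list_lock. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_wb old_wb, new_wb;
static struct model_wb *model_i_wb;   /* stands in for inode->i_wb */
static bool dirtier_saw_freed;

/* Dirtier with the fixed ordering: the wb access that stands in for
 * wb_wakeup_delayed(wb) happens before the lock is released. */
static void *model_dirtier(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_lock);
	struct model_wb *wb = model_i_wb;
	/* ... move the inode onto wb's dirty list ... */
	if (wb->freed)
		dirtier_saw_freed = true;
	pthread_mutex_unlock(&model_lock);
	return NULL;
}

/* Switcher: must hold the lock to repoint i_wb; the old reference is
 * dropped only after unlocking, like wb_put_many() after the switch. */
static void *model_switcher(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_lock);
	struct model_wb *old = model_i_wb;
	new_wb.refcnt++;
	model_i_wb = &new_wb;
	pthread_mutex_unlock(&model_lock);

	if (--old->refcnt == 0)
		old->freed = true;   /* stands in for cgwb_release() */
	return NULL;
}

/* Run both sides concurrently once. Returns true if the dirtier observed
 * a freed wb (or if thread setup failed); the fix should make it false. */
bool run_fixed_order_once(void)
{
	pthread_t a, b;

	old_wb = (struct model_wb){ .refcnt = 1, .freed = false };
	new_wb = (struct model_wb){ .refcnt = 0, .freed = false };
	model_i_wb = &old_wb;
	dirtier_saw_freed = false;

	if (pthread_create(&a, NULL, model_dirtier, NULL) != 0)
		return true;
	if (pthread_create(&b, NULL, model_switcher, NULL) != 0) {
		pthread_join(a, NULL);
		return true;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return dirtier_saw_freed;
}
```

Calling `run_fixed_order_once()` in a loop should always return `false`. Moving the `wb->freed` check below the unlock in `model_dirtier()` recreates the original window, though a simple loop may expose it only rarely because the window is narrow.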

### 4. **Critical Subsystem**
The writeback subsystem is fundamental to filesystem operations - it
handles dirty page writeback, inode management, and I/O scheduling. A
use-after-free here can lead to:
- System crashes/panics
- Data corruption
- Security vulnerabilities (potential for exploitation)

### 5. **Matches Stable Criteria**
This fix aligns well with the stable kernel rules:
- **Fixes a real bug**: Documented use-after-free with a crash trace
- **Already upstream**: Reviewed by Jan Kara (filesystem maintainer)
  and merged by Christian Brauner
- **Small change**: 5 insertions and 4 deletions in one file, moving
  existing statements with no new logic
- **No new features**: Pure bug fix
- **Low regression risk**: Simply extends the existing lock protection
  slightly

### 6. **Similar Historical Fixes**
The git history shows that several similar use-after-free fixes in this
subsystem have been backported:
- `f87904c07551` - writeback: avoid use-after-free after removing device
- `8e00c4e9dd85` - writeback: fix use-after-free in
  finish_writeback_work()
- `614a4e377314` - writeback, cgroup: fix premature wb_put()

### 7. **CONFIG_CGROUP_WRITEBACK Dependency**
This bug only affects kernels built with `CONFIG_CGROUP_WRITEBACK`,
since the per-inode wb switching done by `inode_switch_wbs_work_fn()`
exists only with cgroup-aware writeback. That is a common configuration
in modern distributions, especially those using cgroups v2 for container
workloads.

### 8. **No Side Effects**
The change has no functional side effects beyond fixing the race:
- The trace event still fires once per enqueue, now just after the
  wakeup check instead of before it
- `wb_wakeup_delayed()` still executes under the same conditions
- Only the lock hold duration is slightly extended to cover the wb
  access

This is a textbook example of a commit that should be backported to
stable kernels: it fixes a serious bug with minimal, well-understood
changes in a critical subsystem.

 fs/fs-writeback.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index cc57367fb641..a07b8cf73ae2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2608,10 +2608,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			wakeup_bdi = inode_io_list_move_locked(inode, wb,
 							       dirty_list);
 
-			spin_unlock(&wb->list_lock);
-			spin_unlock(&inode->i_lock);
-			trace_writeback_dirty_inode_enqueue(inode);
-
 			/*
 			 * If this is the first dirty inode for this bdi,
 			 * we have to wake-up the corresponding bdi thread
@@ -2621,6 +2617,11 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 			if (wakeup_bdi &&
 			    (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
 				wb_wakeup_delayed(wb);
+
+			spin_unlock(&wb->list_lock);
+			spin_unlock(&inode->i_lock);
+			trace_writeback_dirty_inode_enqueue(inode);
+
 			return;
 		}
 	}
-- 
2.50.1
