On 19/03/2025 10:11, Dave Chinner wrote: > On Wed, Mar 19, 2025 at 08:19:19AM +0000, Hans Holmberg wrote: >> Presently we start garbage collection late - when we start running >> out of free zones to backfill max_open_zones. This is a reasonable >> default as it minimizes write amplification. The longer we wait, >> the more blocks are invalidated and reclaim cost less in terms >> of blocks to relocate. >> >> Starting this late however introduces a risk of GC being outcompeted >> by user writes. If GC can't keep up, user writes will be forced to >> wait for free zones with high tail latencies as a result. >> >> This is not a problem under normal circumstances, but if fragmentation >> is bad and user write pressure is high (multiple full-throttle >> writers) we will "bottom out" of free zones. >> >> To mitigate this, introduce a gc_pressure mount option that lets the >> user specify a percentage of how much of the unused space that gc >> should keep available for writing. A high value will reclaim more of >> the space occupied by unused blocks, creating a larger buffer against >> write bursts. >> >> This comes at a cost as write amplification is increased. To >> illustrate this using a sample workload, setting gc_pressure to 60% >> avoids high (500ms) max latencies while increasing write amplification >> by 15%. > > It seems to me that this is runtime workload dependent, and so maybe > a tunable variable in /sys/fs/xfs/<dev>/.... might suit better? > > That way it can be controlled by a userspace agent as the filesystem > fills and empties rather than being fixed at mount time and never > really being optimal for a changing workload... Yes, that would make sense. > >> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h >> index 799b84220ebb..af595024de00 100644 >> --- a/fs/xfs/xfs_mount.h >> +++ b/fs/xfs/xfs_mount.h >> @@ -229,6 +229,7 @@ typedef struct xfs_mount { >> bool m_finobt_nores; /* no per-AG finobt resv. */ >> bool m_update_sb; /* sb needs update in mount */ >> unsigned int m_max_open_zones; >> + unsigned int m_gc_pressure; > > This is not explicitly initialised somewhere. If the magic "mount > gets zeroed on allocation" value of zero it gets means this feature > is turned off, there needs to be a comment somewhere explaining why > it is turned completely off rather than having a default of, say, > 5% like we have for low space allocation thresholds in various > other lowspace allocation and reclaim algorithms.... Right, it is not at all obvious why 0 is a good default. I'll add a comment in the next rev. >> + >> + free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS); >> + if (available < div_s64(free * mp->m_gc_pressure, 100)) > > mult_frac(free, mp->m_gc_pressure, 100) to avoid overflow. > > Also, this is really a free space threshold, not a dynamic > "pressure" measurement... > Yeah, naming is hard. I can't come up with a better name off the bat, but I'll give it some thought. Thanks, Hans