Re: [DISCUSSION] Revisiting Slab Movable Objects


On Mon, Apr 21, 2025 at 10:47:39PM +0900, Harry Yoo wrote:
> Hi folks,
> 
> As a long term project, I'm starting to look into resurrecting
> Slab Movable Objects. The goal is to make certain types of slab memory
> movable and thus enable targeted reclamation, migration, and
> defragmentation.
> 
> The main purpose of this posting is to briefly review what's been tried
> in the past, ask people why prior efforts have stalled (due to lack of
> time or insufficient justification for additional complexity?),
> and discuss what's feasible today.
> 
> Please add anyone I may have missed to Cc. :)

Adding -fsdevel because dentry/inode cache discussion needs to be
visible to all the fs/VFS developers.

I'm going to cut straight to the chase here, but I'll leave the rest
of the original email quoted below for -fsdevel readers.

> Previous Work on Slab Movable Objects
> =====================================

<snip>

Without including any sort of viable proposal for dentry/inode
relocation (i.e. the showstopper for past attempts), what is the
point of trying to resurrect this?

I don't have a solution for the dentry cache reference issues - the
dentry cache maintains the working set of files, so anything that
randomly shoots down unused dentries for compaction is likely to
have negative performance implications for dentry cache intensive
workloads.

However, I can think of two possible solutions to the untracked
external inode reference issue.

The first is that external inode references need to take an active
reference to the inode (like a dentry does), and this prevents
inodes from being relocated whilst such external references exist.
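
As a rough illustration of that rule, here is a userspace model (not
kernel code). The struct and every model_* name are made up for this
sketch, and it ignores the race between checking the count and
actually moving the object, which a real implementation would have to
close.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <stdbool.h>
#include <stdatomic.h>

struct model_inode {
	atomic_int active;	/* references that pin the object's address */
};

/* An external user (or a dentry) pins the inode in place. */
static void model_iget_active(struct model_inode *inode)
{
	atomic_fetch_add(&inode->active, 1);
}

/* Drop a pin taken with model_iget_active(). */
static void model_iput_active(struct model_inode *inode)
{
	atomic_fetch_sub(&inode->active, 1);
}

/* Relocation is only allowed while nobody holds an active reference. */
static bool model_can_relocate(struct model_inode *inode)
{
	return atomic_load(&inode->active) == 0;
}
```

The point of the model is only that a compaction pass would skip any
inode for which model_can_relocate() returns false.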

Josef has proposed an active/passive reference counting mechanism
for all references to inodes recently on -fsdevel here:

https://lore.kernel.org/linux-fsdevel/20250303170029.GA3964340@perftesting/

However, the ability to revoke external references and/or resolve
internal references atomically has not really been considered at
this point in time.

To allow referenced inodes to be relocated, I'd suggest that any
subsystem that takes an external reference to the inode needs to
provide something like an SRCU notifier block to allow the external
reference to be dynamically removed. Once the relocation is done,
another notifier method can be called allowing the external
reference to be updated with the new inode address.  Any attempt to
access the inode whilst it is being relocated through that external
mechanism should probably block.
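
A minimal userspace sketch of that recall/update flow might look like
the following. It is not based on any existing kernel interface:
struct inode_reloc_notifier, the recall/update callbacks and
model_relocate() are hypothetical names, plain function pointers stand
in for an SRCU notifier chain, and the "block while relocating"
behaviour is not modelled (the external user simply sees NULL).

```c
/* Userspace model only: the struct and function names are hypothetical. */
#include <stddef.h>

struct model_inode {
	unsigned long ino;
};

/* Callbacks an external subsystem would register for its reference. */
struct inode_reloc_notifier {
	void (*recall)(void *owner);				/* drop the pointer */
	void (*update)(void *owner, struct model_inode *newi);	/* install new one */
	void *owner;
};

/* Example external user holding a single inode pointer. */
struct ext_user {
	struct model_inode *inode;	/* NULL while relocation is in flight */
};

static void ext_recall(void *owner)
{
	((struct ext_user *)owner)->inode = NULL;
}

static void ext_update(void *owner, struct model_inode *newi)
{
	((struct ext_user *)owner)->inode = newi;
}

/* Recall every external reference, copy the object, then republish it. */
static void model_relocate(struct model_inode *old, struct model_inode *newi,
			   struct inode_reloc_notifier *nb, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		nb[i].recall(nb[i].owner);
	*newi = *old;
	for (i = 0; i < n; i++)
		nb[i].update(nb[i].owner, newi);
}
```

Splitting recall and update into two passes is what gives a window in
which no external pointer to the old object exists.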

[ Note: this could be leveraged as a general ->revoke mechanism for
external inode references. Instead of the external access blocking
after reference recall, it would return an error if access
revocation has occurred. This mechanism could likely also solve some
of the current lifetime issues with fsnotify and landlock objects. ]

This leaves internal (passive) references, which can be resolved by
locking the inode itself, e.g. getting rid of mapping tree
references (such as folio->mapping->host) by invalidating the
inode page cache.

The other solution is to prevent excessive inode slab cache
fragmentation in the first place. i.e. *stop caching unreferenced
inodes*. In this case, the inode LRU goes away and we rely fully on
the dentry cache pinning inodes to maintain the working set of
inodes in memory. This works with/without Josef's proposed reference
counting changes - though Josef's proposed changes make getting rid
of the inode LRU a lot easier.
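
To make the "no inode LRU" lifetime concrete, here is a toy userspace
model. All names are invented for illustration, and it assumes the
only thing that can pin an inode is a dentry; the real code has many
more reference sources.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <stdlib.h>

struct model_inode {
	int pins;	/* number of dentries pinning this inode */
};

static int model_live_inodes;	/* inodes currently allocated */

static struct model_inode *model_inode_alloc(void)
{
	struct model_inode *inode = calloc(1, sizeof(*inode));

	if (inode)
		model_live_inodes++;
	return inode;
}

/* A dentry attaching to the inode pins it in memory. */
static void model_dentry_attach(struct model_inode *inode)
{
	inode->pins++;
}

/* Dropping the last pin frees the inode at once; nothing parks it on an LRU. */
static void model_dentry_detach(struct model_inode *inode)
{
	if (--inode->pins == 0) {
		free(inode);
		model_live_inodes--;
	}
}
```

In this model an unreferenced inode never sits in the slab waiting for
a shrinker, so the inode working set is whatever the dentry cache
keeps alive.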

I talk about some of that stuff in the discussion of this superblock
inode list iteration patchset here:

https://lore.kernel.org/linux-fsdevel/20241002014017.3801899-1-david@xxxxxxxxxxxxx/

-Dave.

> 
> Previous Work on Slab Movable Objects
> =====================================
> 
> Christoph Lameter, Slab Defragmentation Reduction, 2007-2017 (V16: [2]):
> Christoph Lameter, Slab object migration for xarray, 2017-2018 (V2: [3]):
>   Christoph's long-standing effort (since 2007) aiming to defragment
>   slab memory in cases where sparsely populated slabs occupy excessive
>   amount of memory.
> 
>   Early versions of the work focused on defragmenting slab caches
>   for filesystem data structures such as inode, dentry, and buffer head.
>   updatedb was suggested as the standard way to generate
>   sparsely populated slabs on file servers.
> 
>   However, defragmenting slabs for filesystem data structures has proven
>   to be very difficult to fully solve, because inodes and dentries are
>   neither reclaimable nor migratable, limiting the effectiveness of
>   defragmentation.
> 
>   In late 2018, the effort was revived with a new focus on migrating
>   XArray nodes. However, it appears the work was discontinued after
>   V2 [3]?
> 
> Tobin C. Harding, Slab Movable Objects, 2019 (First Non-RFC: [5])
> - Tobin C. Harding revived Christoph's earlier work and introduced
>   a few enhancements, including partial shrinking of dentries, moving
>   objects to and from a specific NUMA node, and balancing objects across
>   all NUMA nodes.
> 
>   Also appears to be discontinued after the first non-RFC version [5]? 
> 
> At LSFMM 2017, Andrea Arcangeli suggested [6] virtually mapped slabs,
> which might be useful since migrating them does not require changing the
> address of objects. But as Rik van Riel pointed out at that time, it
> isn't really useful for defragmentation. Andrea Arcangeli responded
> that it can be beneficial for memory hotplug, compaction and out-of-memory
> avoidance.
> 
> The exact mechanism wasn't described in [6], but I assume it'll involve
> 1) unmap a slab (and page faults after unmap need to wait for migration
> to complete), 2) copy objects to a new slab, and 3) map the new slab?
> But the idea hasn't gained enough attention for anyone to actually
> implement it.
> 
> Potential Candidates of SMO
> ===========================
> 
> Basic Rules
> -----------
> 
> - Slab memory can only be reclaimed or migrated if the user of the slab
>   provides a way to isolate / migrate objects.
> - If objects can be reclaimed, it makes sense to simply reclaim them
>   instead of migrating them (unless we know it's better to keep that
>   object in memory).
> - Some objects can't be reclaimed, but migrating them is (if possible)
>   still useful for defragmentation and compaction.
>   - However it is not always feasible 
> 
> Potential candidates include (but not limited to):
> --------------------------------------------------
> 
> - XArray nodes can be migrated (can't be reclaimed as they're being used)
>   - Can be reclaimed if it only includes shadow entries.
> - Maple tree nodes (if without external locking) and VMAs can be migrated
>   and obviously can't be reclaimed.
> - Negative dentry should be reclaimed, instead of being migrated.
> - Only unused dentries can be reclaimed without high cost.
>   - Dentries with nonzero refcount are not really relocatable? (per [1])
> - Even unused inodes can't be reclaimed nor relocated due to external
>   references? (per [4])
> 
> Al Viro made it clear [1] that inodes / dentries are not really
> relocatable. He also mentioned:
> > So from the correctness POV
> > 	* you can kick out everything with zero refcount not
> > on shrink lists.
> > 	* you _might_ try shrink_dcache_parent() on directory
> > dentries, in hope to drive their refcount to zero.  However,
> > that's almost certainly going to hit too hard and be too costly.
> > 	* d_invalidate() is no-go; if anything, you want something
> > weaker than shrink_dcache_parent(), not stronger.
> > 
> > For anything beyond "just kick out everything in that page that
> > happens to have zero refcount" I would really like to see the
> > stats - how much does it help, how costly it is _and_ how much
> > of the cache does it throw away (see above re running into a root
> > dentry of some filesystem and essentially trimming dcache for
> > that fs down to the unevictable stuff).
> 
> Dave Chinner mentioned [4] why it is hard to reclaim or migrate (in a
> targeted manner) even inodes with no active references:
> > On Wed, Dec 27, 2017 at 04:06:36PM -0600, Christoph Lameter wrote:
> > > This is a patchset on top of Matthew Wilcox Xarray code and implements
> > > object migration of xarray nodes. The migration is integrated into
> > > the defragmentation and shrinking logic of the slab allocator.
> > .....
> > > This is only possible for xarray for now but it would be worthwhile
> > > to extend this to dentries and inodes.
> > 
> > Christoph, you keep saying this is the goal, but I'm yet to see a
> > solution proposed for the atomic replacement of all the pointers to
> > an inode from external objects.  An inode that has no active
> > references still has an awful lot of passive and internal references
> > that need to be dealt with.
> > 
> > e.g. racing page operations accessing mapping->host, the inode in
> > various lists (e.g. superblock inode list, writeback lists, etc),
> > the inode lookup cache(s), backpointers from LSMs, fsnotify marks,
> > crypto information, internal filesystem pointers (e.g. log items,
> > journal handles, buffer references, etc) and so on. And each
> > filesystem has a different set of passive references, too.
> > 
> > Oh, and I haven't even mentioned deadlocks yet, either. :P
> > 
> > IOWs, just saying "it would be worthwhile to extend this to dentries
> > and inodes" completely misrepresents the sheer complexity of doing
> > so. We've known that atomic replacement is the big problem for
> > defragging inodes and dentries since this work was started, what,
> > more than 10 years? And while there's been many revisions of the
> > core defrag code since then, there has been no credible solution
> > presented for atomic replacement of objects with complex external
> > references. This is a show-stopper for inode/dentry slab defrag, and
> > I don't see that this new patchset is any different...
> 
> [1] https://lore.kernel.org/linux-mm/20190403190520.GW2217@xxxxxxxxxxxxxxxxxx
> [2] https://lore.kernel.org/linux-mm/20170307212429.044249411@xxxxxxxxx
> [3] https://marc.info/?l=linux-mm&m=154533371911133
> [4] https://lore.kernel.org/linux-mm/20171228222419.GQ1871@rh
> [5] https://lore.kernel.org/linux-mm/20190603042637.2018-1-tobin@xxxxxxxxxx
> [6] https://lwn.net/Articles/717650
> 
> -- 
> Cheers,
> Harry / Hyeonggon
> 

-- 
Dave Chinner
david@xxxxxxxxxxxxx



