On Tue, Apr 22, 2025 at 07:54:08AM +1000, Dave Chinner wrote:
> On Mon, Apr 21, 2025 at 10:47:39PM +0900, Harry Yoo wrote:
> > Hi folks,
> >
> > As a long term project, I'm starting to look into resurrecting
> > Slab Movable Objects. The goal is to make certain types of slab memory
> > movable and thus enable targeted reclamation, migration, and
> > defragmentation.
> >
> > The main purpose of this posting is to briefly review what's been tried
> > in the past, ask people why prior efforts have stalled (due to lack of
> > time or insufficient justification for additional complexity?),
> > and discuss what's feasible today.
> >
> > Please add anyone I may have missed to Cc. :)
>
> Adding -fsdevel because dentry/inode cache discussion needs to be
> visible to all the fs/VFS developers.
>
> I'm going to cut straight to the chase here, but I'll leave the rest
> of the original email quoted below for -fsdevel readers.
>
> > Previous Work on Slab Movable Objects
> > =====================================
>
> <snip>
>
> Without including any sort of viable proposal for dentry/inode
> relocation (i.e. the showstopper for past attempts), what is the
> point of trying to resurrect this?

Migrating slabs still makes sense for other objects such as xarray /
maple tree nodes, and VMAs.

Of course, if filesystem folks could enhance it further and make more
dentry/inode objects movable, that would be very welcome.

> However, I can think of two possible solutions to the untracked
> external inode reference issue.
>
> The first is that external inode references need to take an active
> reference to the inode (like a dentry does), and this prevents
> inodes from being relocated whilst such external references exist.
>
> Josef has proposed an active/passive reference counting mechanism
> for all references to inodes recently on -fsdevel here:
>
> https://lore.kernel.org/linux-fsdevel/20250303170029.GA3964340@perftesting/
>
> However, the ability to revoke external references and/or resolve
> internal references atomically has not really been considered at
> this point in time.

...alright, I expect that'll be the trickier part.

> To allow referenced inodes to be relocated, I'd suggest that any
> subsystem that takes an external reference to the inode needs to
> provide something like a SRCU notifier block to allow the external
> reference to be dynamically removed. Once the relocation is done,
> another notifier method can be called allowing the external
> reference to be updated with the new inode address. Any attempt to
> access the inode whilst it is being relocated through that external
> mechanism should probably block.
>
> [ Note: this could be leveraged as a general ->revoke mechanism for
> external inode references. Instead of the external access blocking
> after reference recall, it would return an error if access
> revocation has occurred. This mechanism could likely also solve some
> of the current lifetime issues with fsnotify and landlock objects. ]
>
> This leaves internal (passive) references that can be resolved by
> locking the inode itself. e.g. getting rid of mapping tree
> references (e.g. folio->mapping->host) by invalidating the
> inode page cache.

Thank you so much for such a detailed writeup.

The former approach would allow allocating inodes from movable areas,
help mm/compaction.c build high-order folios, and help slab reduce
fragmentation.

> The other solution is to prevent excessive inode slab cache
> fragmentation in the first place. i.e. *stop caching unreferenced
> inodes*. In this case, the inode LRU goes away and we rely fully on
> the dentry cache pinning inodes to maintain the working set of
> inodes in memory.
> This works with/without Josef's proposed reference
> counting changes - though Josef's proposed changes make getting rid
> of the inode LRU a lot easier.
>
> I talk about some of that stuff in the discussion of this superblock
> inode list iteration patchset here:
>
> https://lore.kernel.org/linux-fsdevel/20241002014017.3801899-1-david@xxxxxxxxxxxxx/

The latter approach, while it does not make inodes relocatable, will
at least reduce fragmentation.

Unfortunately, as an MM developer, I don't have enough experience with
filesystems to assess which proposal is more feasible.

It would be really helpful to get consensus from the FS folks before we
push this forward, whether that means relocating inodes or avoiding
their fragmentation in the first place.

-- 
Cheers,
Harry / Hyeonggon