NOTE: this is a WIP not meant to be included anywhere yet and perhaps should be split into 2 patchsets. It is generated against against: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries The first patch in the series is ready to use(tm) and was sent separately here: https://lore.kernel.org/linux-fsdevel/20250909082613.1296550-1-mjguzik@xxxxxxxxx/T/#u It is included in this posting as the newly patched routine has to get patched further due to the i_state accessor thing. Having it here should make it handier to test for interested. This is a cleaned up continuation of the churn-ey patch which merely removed I_WILL_FREE: https://lore.kernel.org/linux-fsdevel/20250902145428.456510-1-mjguzik@xxxxxxxxx/ The entire thing is a response to the patchset by Josef Bacik concerning refcount changes, see: https://lore.kernel.org/linux-fsdevel/cover.1756222464.git.josef@xxxxxxxxxxxxxx/ I'm writing my second reply to that patchset, but in the meantime the stuff below should facilitate work forward, regardless if the refcount patchset goes in or not. The patchset splits churn from actual changes. Plain ->i_state access is still possible to reduce upfront churn, only some of the tree got covered so far. short rundown: fs: hide ->i_state handling behind accessors This is churn-ey and should largely be a nop, worst case there will be failed lock assertions if I messed up some annotations and which should be easy to sort out. It covers fs/*.c and friends, but no filesystems. bcachefs: use the new ->i_state accessors btrfs: use the new ->i_state accessors ext4: use the new ->i_state accessors gfs2: use the new ->i_state accessors ocfs2: use the new ->i_state accessors This patches only the filesystems which reference I_WILL_FREE. Again should be a nop. ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage Actual change, needs ocfs2 folks approval. fs: set I_FREEING instead of I_WILL_FREE in iput_final() prior to writeback Actual change. I_WILL_FREE still exists as a macro but is no longer set by anything. As of right now I'm not fully confident this is correct, the writeback code is more fuck-ey than not. However, figuring this out is imo a hard prerequisite for the refcount patchset by Josef anyway. fs: retire the I_WILL_FREE flag Churn change to whack the flag. It also uses the opportunity to do some cosmetics. Onto the rationale: I. i_state The handling in the stock kernel is highly error prone and with 0 assert coverage. Notably there is nothing guaranteeing the caller owns the necessary lock when making changes. But apart from that there are spots which look at ->i_state several times and it is unclear if they think it is stable or not. Moreover spots used on inode teardown use WRITE_ONCE, some spots in hash lookup use READ_ONCE, but everyone else issues plain loads and stores which invites compiler mischief. All of this is easily preventable. The ideal state as I see it would also hide the field behind a struct so that plain open-coded accesses fail to compile. Not done yet to reduce churn. Another step not taken here but to be sorted out later is strict handling of flags, where it is illegal to clear flags which are not present and to set flags which are already set -- the kernel should know whether a given flag can legally be present or not (trivial example: I_FREEING). Setting a flag which is already set is more likely to be a logic error than not. If it turns out there are cases where a flag can be legally already present/missing, an additional helper can be added to forego the assertion. Practical examples: spin_lock(&inode->i_lock); if (inode_state_read(inode) & I_WHATEVER) { .... } spin_unlock(&inode->i_lock); This asserts the lock is held. But if the caller is looking to do a lockless check first, they can do it and explicitly denote this is what they want: if (inode_state_read_unlocked(inode) & I_WHATEVER) { spin_lock(&inode->i_lock); if (inode_state_read(inode) & I_WHATEVER) { .... } spin_unlock(&inode->i_lock); } Similarly: state = inode_state_read_unlocked(inode); if (state & I_CRAP) { } else (state & I_MEH) { } ... We are guaranteed no mischief and the caller acknowledges the value in the inode could have changed from under them and the code is READ_ONCE (as opposed to plain ->i_state loads now). Furthermore, should better lifecycle tracking get introduced, the helpers can validate no flags get added when it is invalid to do so. The *current* routines are as below. I don't care about specific names, I do care about semantics. /* * i_state handling * * We hide all of it behind helpers so that we can validate consumers. */ static inline enum inode_state_flags_enum inode_state_read(struct inode *inode) { lockdep_assert_held(&inode->i_lock); return inode->i_state; } static inline enum inode_state_flags_enum inode_state_read_unlocked(struct inode *inode) { return READ_ONCE(inode->i_state); } static inline void inode_state_add(struct inode *inode, enum inode_state_flags_enum newflags) { lockdep_assert_held(&inode->i_lock); WRITE_ONCE(inode->i_state, inode->i_state | newflags); } static inline void inode_state_del(struct inode *inode, enum inode_state_flags_enum rmflags) { lockdep_assert_held(&inode->i_lock); WRITE_ONCE(inode->i_state, inode->i_state & ~rmflags); } static inline void inode_state_set_unchecked(struct inode *inode, enum inode_state_flags_enum newflags) { WRITE_ONCE(inode->i_state, newflags); } The inode_state_set_unchecked() crapper is there to handle early access during inode construction (before it lands in the hash). II. I_WILL_FREE removal Sounds like nobody likes this flag and even the developer documenting it in fs.h was not able to provide a justification for its existence, merely stating how it is used. As far as I can tell the only use was to allow ->drop_inode() handlers to drop ->i_lock and still prevent anyone from picking up the inode. I *suspect* this was used instead of I_FREEING because the routine could have decided to *not* drop afterwards. Differentiating between indicating the inode is going down vs just telling the consumer to bugger off for the time being seemed like an ok idea. However, the only filesystem using today is ocfs2, it always returns "drop it" and this usage does not even have to be there. Removed in one of the patches. Apart from that the only use was write_inode_now() call in iput_final() prior to setting I_FREEING anyway. This probably works as posted here, but there might be some fuckery to sort out in writeback to truly eliminate the flag. In the worst case it's just some work, but *so far* I'm not staking anything on the patchset being fully correct yet. tl;dr the flag does not have to be there, but there may be dragons in writeback (to be seen). No matter what, shaking bugs out of this should be considered a pre-requisite for any future work regarding inode lifecycle (whether the refcount patchset lands or not, imo it should not which I'll elaborate on later in that thread). Apart from that the I_CREATING flag seems to have inconsistent handling, but that's for another e-mail after I get a better hang of it. So.. comments? Mateusz Guzik (10): fs: expand dump_inode() fs: hide ->i_state handling behind accessors bcachefs: use the new ->i_state accessors btrfs: use the new ->i_state accessors ext4: use the new ->i_state accessors gfs2: use the new ->i_state accessors ocfs2: use the new ->i_state accessors ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage fs: set I_FREEING instead of I_WILL_FREE in iput_final() prior to writeback fs: retire the I_WILL_FREE flag block/bdev.c | 4 +- fs/bcachefs/fs.c | 8 +- fs/btrfs/inode.c | 10 +-- fs/buffer.c | 4 +- fs/crypto/keyring.c | 2 +- fs/crypto/keysetup.c | 2 +- fs/dcache.c | 8 +- fs/drop_caches.c | 2 +- fs/ext4/inode.c | 10 +-- fs/ext4/orphan.c | 4 +- fs/fs-writeback.c | 131 +++++++++++++++---------------- fs/gfs2/file.c | 2 +- fs/gfs2/glops.c | 2 +- fs/gfs2/inode.c | 4 +- fs/gfs2/ops_fstype.c | 2 +- fs/inode.c | 115 ++++++++++++++------------- fs/libfs.c | 6 +- fs/namei.c | 8 +- fs/notify/fsnotify.c | 8 +- fs/ocfs2/dlmglue.c | 2 +- fs/ocfs2/inode.c | 27 +------ fs/ocfs2/inode.h | 1 - fs/ocfs2/ocfs2_trace.h | 2 - fs/ocfs2/super.c | 2 +- fs/pipe.c | 2 +- fs/quota/dquot.c | 2 +- fs/sync.c | 2 +- fs/xfs/scrub/common.c | 3 +- include/linux/backing-dev.h | 5 +- include/linux/fs.h | 75 ++++++++++++------ include/linux/writeback.h | 4 +- include/trace/events/writeback.h | 11 ++- security/landlock/fs.c | 12 +-- 33 files changed, 249 insertions(+), 233 deletions(-) -- 2.43.0