On Tue, Apr 22, 2025 at 03:33:03PM +0200, Mateusz Guzik wrote:
> On Tue, Apr 22, 2025 at 12:37 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Wed 16-04-25 15:17:22, Christian Brauner wrote:
> > > We currently always chase a pointer inode->i_sb->s_user_ns whenever we
> > > need to map a uid/gid which is noticeable during path lookup as noticed
> > > by Linus in [1]. In the majority of cases we don't need to bother with
> > > that pointer chase because the inode won't be located on a filesystem
> > > that's mounted in a user namespace. The user namespace of the superblock
> > > cannot ever change once it's mounted. So introduce and raise IOP_USERNS
> > > on all inodes and check for that flag in i_user_ns() when we retrieve
> > > the user namespace.
> > >
> > > Link: https://lore.kernel.org/CAHk-=whJgRDtxTudTQ9HV8BFw5-bBsu+c8Ouwd_PrPqPB6_KEQ@xxxxxxxxxxxxxx [1]
> > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> >
> > Some performance numbers would be in place here I guess - in particular
> > whether this change indeed improved the speed of path lookup or whether the
> > cost just moved elsewhere.
>
> Note that right now path lookup is a raging branchfest, with some
> avoidable memory references to boot.
>
> I have a WIP patch to bypass inode permission checks with an
> ->i_opflag and get over 5% speed up when stating stuff in
> /usr/include/linux/. This might be slightly more now.
>
> Anyhow, this bit here probably does not help that much in isolation
> and I would not worry about that fact given the overall state.
> Demonstrating that this indeed avoids some work in the common case
> would be sufficient for me.
>
> To give you a taste: stat(2) specifically around 4.28 mln ops/s on my
> box. Based on perf top I estimate sorting out the avoidable
> single-threaded slowdowns will bring it above 5 mln.
>
> The slowdowns notably include the dog slow memory allocation (likely
> to be sorted out with sheaves), the smp_mb fence in legitimize_mnt and
> more.
>
> Part of the problem is LOOKUP_RCU checks all over the place. I presume
> the intent was to keep this and refwalk closely tied to reduce code
> duplication and make sure all parties get updated as needed. I know
> the code would be faster (and I *suspect* cleaner) if this got
> refactored into dedicated routines instead. Something to ponder after
> the bigger fish is fried.

I think the cleanup itself is the right thing to do because it makes it
obvious that we're not doing any work when no idmapped mounts are
involved. v2 is a lot cleaner and simpler as well.
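
To illustrate the idea from the patch description, here is a minimal
userspace sketch. It is not the actual patch. The struct layouts are cut
down to the fields needed, the IOP_USERNS bit value is made up, and
inode_init_userns() is a hypothetical helper. It also assumes the flag is
raised only for inodes whose superblock is not in init_user_ns, so that
the common case returns &init_user_ns without touching i_sb at all.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel structures (illustration only). */
struct user_namespace { const char *name; };
struct super_block { struct user_namespace *s_user_ns; };
struct inode {
	unsigned short i_opflags;
	struct super_block *i_sb;
};

/* Hypothetical bit value; the real definition lives in the patch. */
#define IOP_USERNS 0x0080

static struct user_namespace init_user_ns = { "init_user_ns" };

/*
 * Hypothetical helper: decide once, at inode setup, whether the
 * superblock lives in a non-initial user namespace. s_user_ns never
 * changes after mount, so the cached flag cannot go stale.
 */
static void inode_init_userns(struct inode *inode, struct super_block *sb)
{
	assert(inode != NULL && sb != NULL);
	inode->i_sb = sb;
	inode->i_opflags = 0;
	if (sb->s_user_ns != &init_user_ns)
		inode->i_opflags |= IOP_USERNS;
}

/*
 * Common case: flag clear, no inode->i_sb->s_user_ns pointer chase.
 * Only inodes on a userns-mounted filesystem take the slow path.
 */
static struct user_namespace *i_user_ns(const struct inode *inode)
{
	if (inode->i_opflags & IOP_USERNS)
		return inode->i_sb->s_user_ns;
	return &init_user_ns;
}
```

With that, an inode set up against a superblock in init_user_ns gets
&init_user_ns back from i_user_ns() by testing a flag in the inode
itself, and only an inode on a userns-mounted superblock dereferences
i_sb.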