Hi Mickaël, Song, Al and other reviewers, In this series I present some experiments and investigations I did on a potential optimization for Landlock, which is to avoid taking references during the walk from the target being accessed toward the global root. I initially made this change and tested it, but did not send for review as I was slightly uncertain about the correctness here, but I did saw significant performance improvements. However, earlier I saw Al Viro mention in another discussion [1]: > Note, BTW, that it might be better off by doing that similar to > d_path.c - without arseloads of dget_parent/dput et.al.; not sure > how feasible it is, but if everything in it can be done under > rcu_read_lock(), that's something to look into. which prompted me to pick this up again (even though I realize in the reply to [1] Song wasn't sure this would work). After some thinking, I can think of one particular case where a simple chase of `d_parent`s would be incorrect - that is, when unlinks are involved and dentry can turn negative. I _think_ that if we can check the global rename seqcount and restart the pathwalk with the old approach if it changes, we might avoid these sort of issues. However I'm new to this and very welcome to be proven wrong here :) I realize that Song has already submitted a patch [2] to extract out the pathwalk logic here and improve the mountpoint handling, and I don't want this to block that in any way, but I think this is worth looking into separate from that, based on the benchmarking results below: Earlier when testing other Landlock performance improvement ideas I came up with what I think is a "typical" workload [3] involving two layers and a few rules to places like $HOME, /tmp, /proc etc. (Note that this is just something I came up with, and Mickaël might not agree on the "typicalness" here.) With that we have: Comparing: original pathwalk-noref // this is the % of time spent in Landlock within an openat syscall landlock_overhead: avg = 33 27 (-6) (%) median = 34 28 (-6) (%) // absolute time spent in Landlock landlock_hook: avg = 852 627 (-26.4%) (ns) median = 827 611 (-26.1%) (ns) // duration of the entire openat syscall open_syscall: avg = 2507 2260 (-9.9%) (ns) median = 2451 2222 (-9.3%) (ns) I also ran the benchmarks Mickaël created [4], which tests openat performance at various depths in the file hierarchy, all with 2 rules, and the results are basically: depth inc. / % change in median time spent in Landlock % change in median openat duration % in Landlock, old -> new 1 -21.9% +3.4% 9 -> 7 10 -45.3% -2.4% 16 -> 9 20 -57.2% -7.7% 21 -> 10 30 -61.1% -75.9% 22 -> 11 // ^^ On the unchanged kernel, this seems to be a bimodal // distribution when landlock is involved. Outliers happens // much less on the kernel with this patch applied. Raw results including histograms at https://fileshare.maowtm.org/landlock/20250603/index.html [1]: https://lore.kernel.org/all/20250529231018.GP2023217@ZenIV/ [2]: https://lore.kernel.org/all/20250603065920.3404510-1-song@xxxxxxxxxx/ [3]: https://github.com/torvalds/linux/commit/f1865ce970af97ac3b6f4edf580529b8cdc66371 [4]: https://github.com/landlock-lsm/landlock-test-tools/pull/16 Tingmao Wang (3): landlock: walk parent dir without taking references selftests/landlock: Add fs_race_test Restart pathwalk on rename seqcount change security/landlock/fs.c | 133 ++++- .../testing/selftests/landlock/fs_race_test.c | 505 ++++++++++++++++++ 2 files changed, 617 insertions(+), 21 deletions(-) create mode 100644 tools/testing/selftests/landlock/fs_race_test.c base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21 -- 2.49.0