On Fri, May 2, 2025 at 2:34 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 30-04-25 22:50:23, Mateusz Guzik wrote:
> > Before I explain why the system call and how, I'm noting a significant
> > limitation upfront: in my proposal the system call is allowed to fail
> > with EAGAIN. It's not inherent, but I think it's the sane thing to do.
> > Why I think that's sensible and why it does not defeat the point is
> > explained later.
> >
> > Why the system call: realpath(3) is issued a lot for example by gcc
> > (mostly for header files). libc implements it as a series of
> > readlinks(!) and it unsurprisingly looks atrocious:
> > [pid 1096382] readlink("/usr", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/local", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/local/include", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/local/include/bits", 0x7fffbac84f90, 1023) = -1 ENOENT (No such file or directory)
> > [pid 1096382] readlink("/usr", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/include", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits/types", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits/types/FILE.h", 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> >
> > and so on. This converts one path lookup to N (by path component).
> > Not only that's terrible single-threaded, you may also notice all these
> > lookups bounce lockref-containing cachelines for every path component
> > in face of gccs running at the same time (and highly parallel
> > compilations are not rare, are they).
> >
> > One way to approach this is to construct the new path on the fly. The
> > problem with that is that it would require some rototilling and more
> > importantly is highly error prone (notably due to symlinks). This is
> > the bit I'm trying to avoid.
> >
> > A very pleasant way out is to instead walk the path forward, then
> > backward on the found dentry et voila -- all the complexity is handled
> > for you. There is however a catch: no forward progress guarantee.
>
> So AFAIU what you describe here is doing a path lookup and then calling
> d_path() on the result - actually prepend_path() as I'm glancing in your
> POC code.
>

Yeah, that's the gist.

> > rename seqlock is needed to guarantee correctness, otherwise if
> > someone renamed a dir as you were resolving the path forward, by the
> > time you walk it backwards you may get a path which would not be
> > accessible to you -- a result which is not possible with userspace
> > realpath.
>
> In presence of filesystem mutations paths are always unreliable, aren't
> they? I mean even with userspace realpath() implementation the moment the
> function call is returning the path the filesystem can be modified so that
> the path stops being valid. With kernel it is the same. So I don't see any
> strong reason to bother with handling parallel filesystem modifications.
> But maybe I'm missing some practically important case...
>

The concern is not that the result is stale, but that it was not
legitimately obtainable at any point by the caller doing the current
realpath walk.

Consider the following tree:

/foo/file
/bar

where foo is 755, bar is 700, and both are owned by root, while the
user issuing realpath has some other uid.

If root renames /foo/file to /bar/file while racing against realpath
/foo/file, there is a time window where the user finds the dentry and,
by the time they d_path it, the result is /bar/file. But they would
never get /bar/file with the current implementation.

-- 
Mateusz Guzik <mjguzik gmail.com>
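
To make the window above concrete, here is a toy userspace model of the /foo and /bar example. It is only a sketch and not the POC code: the Dentry class, lookup(), d_path(), rename(), realpath_model() and the rename_seq counter are all hypothetical stand-ins, and the permission check only looks at the "other" execute bit, on the assumption that the caller is neither owner nor in the group. It shows the unchecked forward-then-backward walk returning /bar/file, and a sequence-count comparison turning the same race into EAGAIN.

```python
import errno

rename_seq = 0  # hypothetical stand-in for a rename sequence counter


class Dentry:
    """Toy directory entry with a parent pointer (not the kernel struct)."""

    def __init__(self, name, parent=None, mode=0o755):
        self.name, self.parent, self.mode = name, parent, mode
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def may_search(d):
    # Assumption: caller is neither owner nor group, so only o+x matters.
    return bool(d.mode & 0o001)


def lookup(root, path):
    """Forward walk: check search permission on every directory traversed."""
    cur = root
    for comp in path.strip("/").split("/"):
        if not may_search(cur):
            raise PermissionError(errno.EACCES, "search denied", comp)
        cur = cur.children[comp]  # KeyError here would model ENOENT
    return cur


def d_path(d):
    """Backward walk over parent pointers; no permission checks at all."""
    parts = []
    while d.parent is not None:
        parts.append(d.name)
        d = d.parent
    return "/" + "/".join(reversed(parts))


def rename(d, new_parent):
    """Move d under new_parent and bump the sequence counter."""
    global rename_seq
    del d.parent.children[d.name]
    d.parent = new_parent
    new_parent.children[d.name] = d
    rename_seq += 1


def realpath_model(root, path, racer=None):
    """Forward walk, then backward walk, failing if a rename intervened."""
    seq = rename_seq
    found = lookup(root, path)
    if racer is not None:
        racer()  # simulate a concurrent rename landing in the window
    result = d_path(found)
    if rename_seq != seq:
        raise BlockingIOError(errno.EAGAIN, "rename raced with the walk")
    return result


root = Dentry("")
foo = Dentry("foo", root, 0o755)
bar = Dentry("bar", root, 0o700)
f = Dentry("file", foo, 0o644)

# Without the check: the caller ends up holding a path it could never walk.
found = lookup(root, "/foo/file")
rename(f, bar)
leaked = d_path(found)

# With the check: same race, but the caller is told to retry instead.
rename(f, foo)
quiet = realpath_model(root, "/foo/file")
try:
    realpath_model(root, "/foo/file", racer=lambda: rename(f, bar))
    raced = None
except BlockingIOError as e:
    raced = e.errno
```

In this model a plain lookup(root, "/bar/file") raises PermissionError for the unprivileged caller, which is why /bar/file counts as a result that was never legitimately obtainable rather than merely a stale one.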