Re: [RFF] realpathat system call

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 2, 2025 at 7:35 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> On Fri, May 2, 2025 at 2:34 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Wed 30-04-25 22:50:23, Mateusz Guzik wrote:
> > > Before I explain why the system call and how, I'm noting a significant
> > > limitation upfront: in my proposal the system call is allowed to fail
> > > with EAGAIN. It's not inherent, but I think it's the sane thing to do.
> > > Why I think that's sensible and why it does not defeat the point is
> > > explained later.
> > >
> > > Why the system call: realpath(3) is issued a lot for example by gcc
> > > (mostly for header files). libc implements it as a series of
> > > readlinks(!) and it unsurprisingly looks atrocious:
> > > [pid 1096382] readlink("/usr", 0x7fffbac84f90, 1023) = -1 EINVAL
> > > (Invalid argument)
> > > [pid 1096382] readlink("/usr/local", 0x7fffbac84f90, 1023) = -1 EINVAL
> > > (Invalid argument)
> > > [pid 1096382] readlink("/usr/local/include", 0x7fffbac84f90, 1023) =
> > > -1 EINVAL (Invalid argument)
> > > [pid 1096382] readlink("/usr/local/include/bits", 0x7fffbac84f90,
> > > 1023) = -1 ENOENT (No such file or directory)
> > > [pid 1096382] readlink("/usr", 0x7fffbac84f90, 1023) = -1 EINVAL
> > > (Invalid argument)
> > > [pid 1096382] readlink("/usr/include", 0x7fffbac84f90, 1023) = -1
> > > EINVAL (Invalid argument)
> > > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu",
> > > 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits",
> > > 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits/types",
> > > 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > > [pid 1096382] readlink("/usr/include/x86_64-linux-gnu/bits/types/FILE.h",
> > > 0x7fffbac84f90, 1023) = -1 EINVAL (Invalid argument)
> > >
> > > and so on. This converts one path lookup to N (by path component). Not
> > > only that's terrible single-threaded, you may also notice all these
> > > lookups bounce lockref-containing cachelines for every path component
> > > in face of gccs running at the same time (and highly parallel
> > > compilations are not rare, are they).
> > >
> > > One way to approach this is to construct the new path on the fly. The
> > > problem with that is that it would require some rototoiling and more
> > > importantly is highly error prone (notably due to symlinks). This is
> > > the bit I'm trying to avoid.
> > >
> > > A very pleasant way out is to instead walk the path forward, then
> > > backward on the found dentry et voila -- all the complexity is handled
> > > for you. There is however a catch: no forward progress guarantee.
> >
> > So AFAIU what you describe here is doing a path lookup and then calling
> > d_path() on the result - actually prepend_path() as I'm glancing in your
> > POC code.
> >
>
> Ye that's the gist.
>
> > > rename seqlock is needed to guarantee correctness, otherwise if
> > > someone renamed a dir as you were resolving the path forward, by the
> > > time you walk it backwards you may get a path which would not be
> > > accessible to you -- a result which is not possible with userspace
> > > realpath.
> >
> > In presence of filesystem mutations paths are always unreliable, aren't
> > they? I mean even with userspace realpath() implementation the moment the
> > function call is returning the path the filesystem can be modified so that
> > the path stops being valid. With kernel it is the same. So I don't see any
> > strong reason to bother with handling parallel filesystem modifications.
> > But maybe I'm missing some practically important case...
> >
>
> The concern is not that the result is stale, but that it was not
> legitimately obtainable at any point by the caller doing the current
> realpath walk.
>
> Consider the following tree:
> /foo/file
> /bar
>
> where foo is 755, bar is 700 and both are owned by root, while the
> user issuing realpath has some other uid
>
> if root renames /foo/file to /bar/file while racing against realpath
> /foo/file, there is a time window where the user will find the dentry
> and by the time they d_path the result is /bar/file. but they never
> would get /bar/file with the current implementation.
>

That said, if this is considered fine, the entire thing turns a
trivial patch for sure.

-- 
Mateusz Guzik <mjguzik gmail.com>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux