Re: [PATCH] fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2)

Mateusz Guzik <mjguzik@xxxxxxxxx> · Mon, 28 Apr 2025 17:50:55 +0200

On Mon, Apr 28, 2025 at 11:58 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Sat, Apr 26, 2025 at 09:30:25PM +0200, Mateusz Guzik wrote:
> > On Fri, Apr 25, 2025 at 3:33 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Fri 25-04-25 10:45:22, Christian Brauner wrote:
> > > > On Thu, Apr 24, 2025 at 05:45:17PM +0200, Mateusz Guzik wrote:
> > > > > On Thu, Apr 24, 2025 at 03:22:47PM +0200, Jan Kara wrote:
> > > > > > Currently, setxattrat(2) and getxattrat(2) are wrongly handling the
> > > > > > calls of the from setxattrat(AF_FDCWD, NULL, AT_EMPTY_PATH, ...) and
> > > > > > fail with -EBADF error instead of operating on CWD. Fix it.
> > > > > >
> > > > > > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> > > > > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > > > > > ---
> > > > > >  fs/xattr.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/xattr.c b/fs/xattr.c
> > > > > > index 02bee149ad96..fabb2a04501e 100644
> > > > > > --- a/fs/xattr.c
> > > > > > +++ b/fs/xattr.c
> > > > > > @@ -703,7 +703,7 @@ static int path_setxattrat(int dfd, const char __user *pathname,
> > > > > >           return error;
> > > > > >
> > > > > >   filename = getname_maybe_null(pathname, at_flags);
> > > > > > - if (!filename) {
> > > > > > + if (!filename && dfd >= 0) {
> > > > > >           CLASS(fd, f)(dfd);
> > > > > >           if (fd_empty(f))
> > > > > >                   error = -EBADF;
> > > > > > @@ -847,7 +847,7 @@ static ssize_t path_getxattrat(int dfd, const char __user *pathname,
> > > > > >           return error;
> > > > > >
> > > > > >   filename = getname_maybe_null(pathname, at_flags);
> > > > > > - if (!filename) {
> > > > > > + if (!filename && dfd >= 0) {
> > > > > >           CLASS(fd, f)(dfd);
> > > > > >           if (fd_empty(f))
> > > > > >                   return -EBADF;
> > > > >
> > > > > Is there any code which legitimately does not follow this pattern?
> > > > >
> > > > > With some refactoring getname_maybe_null() could handle the fd thing,
> > > > > notably return the NULL pointer if the name is empty. This could bring
> > > > > back the invariant that the path argument is not NULL.
> > > > >
> > > > > Something like this:
> > > > > static inline struct filename *getname_maybe_null(int fd, const char __user *name, int flags)
> > > > > {
> > > > >         if (!(flags & AT_EMPTY_PATH))
> > > > >                 return getname(name);
> > > > >
> > > > >         if (!name && fd >= 0)
> > > > >                 return NULL;
> > > > >         return __getname_maybe_null(fd, name);
> > > > > }
> > > > >
> > > > > struct filename *__getname_maybe_null(int fd, const char __user *pathname)
> > > > > {
> > > > >         char c;
> > > > >
> > > > >         if (fd >= 0) {
> > > > >                 /* try to save on allocations; loss on um, though */
> > > > >                 if (get_user(c, pathname))
> > > > >                         return ERR_PTR(-EFAULT);
> > > > >                 if (!c)
> > > > >                         return NULL;
> > > > >         }
> > > > >
> > > > >     /* we alloc suffer the allocation of the buffer. worst case, if
> > > > >      * the name turned empty in the meantime, we return it and
> > > > >      * handle it the old-fashioned way.
> > > > >      /
> > > > >         return getname_flags(pathname, LOOKUP_EMPTY);
> > > > > }
> > > > >
> > > > > Then callers would look like this:
> > > > > filename = getname_maybe_null(dfd, pathname, at_flags);
> > > > > if (!filename) {
> > > > >     /* fd handling goes here */
> > > > >     CLASS(fd, f)(dfd);
> > > > >     ....
> > > > >
> > > > > } else {
> > > > >     /* regular path handling goes here */
> > > > > }
> > > > >
> > > > >
> > > > > set_nameidata() would lose this branch:
> > > > > p->pathname = likely(name) ? name->name : "";
> > > > >
> > > > > and putname would convert IS_ERR_OR_NULL (which is 2 branches) into one,
> > > > > maybe like so:
> > > > > -       if (IS_ERR_OR_NULL(name))
> > > > > +       VFS_BUG_ON(!name);
> > > > > +
> > > > > +       if (IS_ERR(name))
> > > > >                 return;
> > > > >
> > > > > i think this would be an ok cleanup
> > > >
> > > > Not opposed, but please for -next and Jan's thing as a backportable fix,
> > > > please. Thanks!
> > >
> > > Exactly, I agree the code is pretty subtle and ugly. It shouldn't take
> > > several engineers to properly call a function to lookup a file :) So
> > > some cleanup and refactoring is definitely long overdue but for now I
> > > wanted some minimal fix which is easy to backport to stable.
> > >
> > > When we speak about refactoring: Is there a reason why user_path_at()
> > > actually doesn't handle NULL 'name' as empty like we do it in *xattrat()
> > > syscalls? I understand this will make all _at() syscalls accept NULL name
> > > with AT_EMPTY_PATH but is that a problem?
> >
> > Is there a benefit for doing it though?
> >
> > I think the entire AT_EMPTY_PATH and NULL thing is trainwreck which
> > needs to be reasonably contained instead. In particular the flag has
> > most regrettable semantics of requiring an actual path (the NULL thing
> > is a Linux extension) and being a nop if the path is not empty.
> >
> > The entire thing is a kludge for syscalls which don't have an fd-only
> > variant and imo was the wrong way to approach this (provide fd-only
> > variants instead), but it's too late now.
> >
> > user_path_at() always returns a path (go figure). Suppose it got
> > extended with the fuckery and some userspace started to rely on it.
> >
> > Part of the benefit of having a fd-based op and knowing it is fd-based
> > is that you know the inode itself is secured by liveness of the file
> > object. If the calling thread is a part of a single-threaded process,
> > then there is the extra benefit of eliding atomics on the file thing
> > (reducing single-threaded cost). If the thing is multi-threaded,
> > atomics are only done on the file (not the inode), which scales better
> > if other procs use a different file obj for the same inode.
> >
> > Or to put it differently, if user_path_at() keeps returning a path
> > like it does now *and* is relied on for AT_EMPTY_PATH fuckery, it is
> > going to impose extra overhead on its consumers.
> >
> > Suppose one will decide to combat it. Then the routine will have to
> > copy path from the file without refing it and return an indicator
> > what's needed -- path_put for a real path handling, fput for fd-only
> > in a multithreaded proc [but then also it will need to return the
> > found file obj] and nothing for a fd-only in a single-threaded proc.
> >
> > I think that's ugly af and completely unnecessary.
>
> I'm not going to debate AT_EMPTY_PATH with NULL again. This particular
> hedgehog can never be buggered at all (anymore).
>
> The fdsyscall() and fdatsyscall() is an ancient debate as well. In
> principle for most use-cases its possible to get away with openat(fd,
> path) and then most other system calls could very likely just fd-based.
>
> So one could argue "fsck fdatsyscall()s" and refuse to add them. That of
> course will ignore everyone who doesn't want to or cannot open the file
> they want to operate on which is not super common but common enough.
> O_PATH won't save them in all cases because they might need a file with
> proper file_operations not empty_fops set.
>
> The other thing is that this forces everyone to allocate a file for
> every operation they do and returning it to userspace and then closing
> it again. It's annoying and also very costly for a bunch of use-cases.
>
> Ok, so the other option is that we just merge fdsyscall()s whenever
> someone needs to really not be bothered with passing a path and we also
> merge fdatsyscall() whenever someone needs to be able to lookup. I
> personally hate this and I'm sure we'd get some questions form Linus why
> we always merge two variants.
>
> But ok we'd probably handle the fdsyscall()/fdatsyscall() split
> gracefully enough by separating pure fdsyscall() vfs_*() helpers and
> fdatsyscall() vfs_*() helpers and come up with a scheme that doesn't
> lead to too much fragementation in how we handle this.
>
> And that is at the core of the issue for me:
>
> (1) We try to reduce the number of helpers that we have internally as
>     much as possible.
> (2) We try to reduce the number of special paths that code can take as
>     much as possible.
>
> This is vital. It is a long-term survival and sanity question. Because
> we have again and again observed endless fragmentation in the number of
> helpers and number of special-cases. They will keep coming and someone
> needs to understand them all.
>
> The price is high, very very high in the long-term. Because if we don't
> pay close attention we suddenly end up with 10 helpers for the same
> thing, 5 of which inexplicably end up being exported to 15 random
> modules of which 5 abuse it. So now we need to clean this up - tree
> wide. Fun times.
>
> Same with special-cases.
>
> So yes, there's a trade-off where taking the additional hit of an atomic
> or refcount is done because it collapses a bunch of special-cases into a
> single case. And that may have an impact on some workloads. If that gets
> reported we always try to figure out an acceptable solution and we
> almost always do.
>
> Your work is actually a good example of this. You _should be_ (note the
> _should_) a pain in our sweet little behinds :) because in some sense a
> lot of your requests are "If we make this a special-case and add a tiny
> helper for it then we elide an atomic in this and that condition for the
> single-threaded use-case.". So you are always on the border of pushing
> against (1) and (2). That's fine and your work is great and needed and
> we seem to always fine a good way to make it acceptable.

Well it's not my call to make, moreover is not a good convo to have
over an e-mail either, so I'm just going to make few remarks and drop
it.

I don't out of control helpers for the syscalls are inherent to
special casing fd-only operation. In my experience dealing with VFS
$elsewhere what actually gets in the way of long term maintenance is
poor assert coverage and weird internal helpers (where the real magic
happens), not some entry point.

Also note my proposal above with refactoring getname_maybe_null
*reduces* special casing by whacking a corner case of a NULL name
landing in the real lookup.

That said, looking at this again, I think the cases which already
explicitly special case fd handling would end up cleaner than they are
right now with a helper taking care of it. I still disagree
user_path_at() should be that helper though (to not disturb the
current consumers).

so something like this perhaps (names not binding):
struct user_path_state {
    struct path *path;
    struct file *fp;
    int flags;
};

struct user_path_state ups = user_path_state_init();
err = user_path_lookup(....., &ups);
if (err)
    goto bad;
err = magic_work_based_on_path(ups->path, ....); /* path is secured,
maybe refed or maybe relying on the file obj */
user_path_cleanup(&ups); /* mandatory call to clean things up */
if (!err)
    err = copyout(.....);
return err;

then user_path_cleanup() knows what's up.

btw, in the name of maintenance, I think part of the problem is the
notorious NULL checks everywhere, which prevent asserting on the
consumer knowing what they are executing (for example in path_put).

-- 
Mateusz Guzik <mjguzik gmail.com>