Re: CAP_SYS_ADMIN restriction for passthrough fds

Aleksa Sarai <cyphar@xxxxxxxxxx> · Sat, 3 May 2025 00:09:32 +1000

On 2025-05-02, Allison Karlitskaya <lis@xxxxxxxxxx> wrote:
> hi,
> 
> Please excuse me if these are dumb questions.  I'm not great at this stuff. :)
> 
> In fuse_backing_open() there's a check with an interesting comment:
> 
>     /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */
>     res = -EPERM;
>     if (!fc->passthrough || !capable(CAP_SYS_ADMIN))
>         goto out;
> 
> I've done some research into this but I wasn't able to find any
> original discussion about what led to this, or about current plans to
> "relax" this restriction -- only speculation about it being a
> potential mechanism to "hide" open files.
> 
> It would be nice to have an official story about this, on the record.
> What's the concrete problem here, and what would it take to solve it?
> Are there plans?  Is help required?  Would it be possible to relax the
> check to having CAP_SYS_ADMIN in the userns which owns the mount (ie:
> ns_capable(...))?  What would it take to do that?  It would be
> wonderful to be able to use this inside of containers.
> 
> The most obvious guess about direction (based on the comment) is that
> we need to do something to make sure that fds that are registered with
> backing IDs remain visible in the output of `lsof` even after the
> original fd is closed?
> 
> Thanks in advance for any information you can give.  Even if the
> answer is "no, it's impossible" it would be great to have that on
> record.

My guess is that the issue is that we don't want an unprivileged process
to be able to create a file reference that cannot be found (with
something like lsof) and forcefully closed/killed by a sysadmin.
Otherwise you could end up with a DoS, with an admin being unable to
unmount a filesystem or otherwise figure out which process is holding
on to garbage.

My hot take is that this is already possible in several ways, though
admittedly the ones I can think of all require unprivileged user
namespaces. (You can create a bind-mount that is kept alive but not
visible to any user-space process. The simplest way is to do mounts and
chroot. Another is with open_tree().) Now, these won't block umount
outright but you'll get the same effect as umount -l, which can be a
problem.
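
For illustration, here is a minimal sketch of what the open_tree()
variant might look like from an unprivileged process. It assumes
unprivileged user namespaces are enabled. The helper name
detached_clone() is hypothetical, and the fallback #defines are only
there for older userspace headers. It demonstrates only how a detached
mount reference can be obtained, not whether any particular tool can
or cannot see it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Fallbacks for older userspace headers. 428 is the open_tree() number
 * on architectures using the unified syscall table (x86_64, arm64, ...).
 */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif

/*
 * Hypothetical helper: enter a new user and mount namespace (which gives
 * the caller CAP_SYS_ADMIN over the new mount namespace), then ask for a
 * detached, recursive copy of the mount tree at `path`.
 *
 * Returns a file descriptor for the detached tree, or -1 with errno set
 * (for example EPERM if unprivileged user namespaces are disabled).
 */
int detached_clone(const char *path)
{
	if (unshare(CLONE_NEWUSER | CLONE_NEWNS) < 0)
		return -1;

	/* The copy is attached to no mount namespace; the fd keeps it alive. */
	return (int)syscall(SYS_open_tree, AT_FDCWD, path,
			    OPEN_TREE_CLONE | AT_RECURSIVE | O_CLOEXEC);
}
```

A caller would do something like detached_clone("/mnt/foo") and simply
hold on to the returned fd. The cloned tree stays alive for as long as
that fd is open.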

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/