On 2025-08-08, Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> On Fri, Aug 08, 2025 at 03:22:58PM +0200, Christian Brauner wrote:
> > On Thu, Aug 07, 2025 at 02:01:15PM -0700, Josh Triplett wrote:
> > > I just discovered that opening a file with O_PATH gives an fd that works
> > > with
> > >
> > > utimensat(fd, "", times, O_EMPTY_PATH)
> > >
> > > but does *not* work with what futimens calls, which is:
> > >
> > > utimensat(fd, NULL, times, 0)
> >
> > It's in line with what we do for fchownat() and fchmodat2() iirc.
> > O_PATH as today is a broken concept imho. O_PATH file descriptors
> > should've never have gained the ability to meaningfully alter state. I
> > think it's broken that they can be used to change ownership or mode and
> > similar.
>
> In the absence of having O_PATH file descriptors, what would be the way
> to modify the properties of a symlink using race-free
> file-descriptor-based calls rather than filenames? AFAICT, there's no
> way to get a file descriptor corresponding to a symbolic link without
> using `O_PATH | O_NOFOLLOW`.

Yes, O_PATH|O_NOFOLLOW is the only way to get a file descriptor
referencing a symlink. However, depending on what property you were
talking about, doing

  fooat(parent_dirfd, "terminal-pathname-without-slashes", AT_SYMLINK_NOFOLLOW);

is probably sufficient for most programs, and I believe is the pattern
that Solaris was going for when they introduced *at(2) system calls.
Solaris does also have O_SEARCH, but I believe it's more restrictive
than O_PATH.

Yes, if you want to operate on a very specific inode, this approach
doesn't work if an attacker has write access to the parent directory.
But in my experience there are very few cases where you want to operate
on a very specific inode inside an attacker-controlled directory (most
of the time you just want to avoid being tricked into operating on stuff
outside the directory, and any inode inside the directory is fine --
which is what the above gives you).
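
The parent-dirfd pattern above can be sketched with utimensat(2)
standing in for fooat(). This is only an illustration under stated
assumptions: the helper name set_times_nofollow is hypothetical, and
the decision to reject ".", ".." and names containing a slash is an
assumption made here to keep the lookup inside the directory, not
something specified in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>      /* AT_SYMLINK_NOFOLLOW */
#include <string.h>     /* strchr(), strcmp() */
#include <sys/stat.h>   /* utimensat() */
#include <time.h>       /* struct timespec */

/*
 * Hypothetical helper: set the timestamps of the directory entry `name`
 * inside `parent_dirfd` without following a trailing symlink.
 *
 * times[0] is the access time, times[1] the modification time.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_times_nofollow(int parent_dirfd, const char *name,
                       const struct timespec times[2])
{
    /*
     * Assumption for this sketch: only a single path component is
     * accepted, so the lookup cannot walk out of the directory.
     */
    if (name == NULL || name[0] == '\0' || strchr(name, '/') != NULL ||
        strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        errno = EINVAL;
        return -1;
    }

    /*
     * AT_SYMLINK_NOFOLLOW makes the call act on the symlink itself
     * if `name` happens to be one, rather than on its target.
     */
    return utimensat(parent_dirfd, name, times, AT_SYMLINK_NOFOLLOW);
}
```

A caller would open the parent directory once (for example with
O_RDONLY|O_DIRECTORY) and pass that descriptor for every entry it
touches. As noted above, this does not pin a specific inode if someone
else can write to that directory; it only keeps the operation inside it.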

> It makes sense that a file descriptor for a symbolic link would be able
> to do inode operations but not file operations.

From a kernel developer's perspective, maybe. But what is a file
operation or an inode operation is not immediately obvious to user
space, and the in-kernel distinction really isn't an API that was
intended to be user-visible IMHO.

In general, when it comes to O_PATH some userspace programs would prefer
O_PATH to disallow modifying _any aspect_ of the file descriptor, so
that you can pass them to untrusted programs (like a real
capability-based system). This is no longer achievable on Linux today,
and the fact we keep poking more holes in O_PATH is making the situation
less and less tenable.

I _do_ want a better solution for this, but if we want to keep expanding
O_PATH then we really need to have some way for programs to opt-out of
those expansions. Then we can come up with a default set of allowed
operations on O_PATH that programs can adjust, which will finally break
up the binary nature of O_PATH.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/