This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten from more of a user's perspective (as well as fixing a few
critical mistakes).

Co-authored-by: David Howells <dhowells@xxxxxxxxxx>
Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
Co-authored-by: Christian Brauner <brauner@xxxxxxxxxx>
Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
---
 man/man2/open_tree.2 | 471 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 471 insertions(+)

diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
new file mode 100644
index 0000000000000000000000000000000000000000..07aac7616107d16d05cc71ba7db6aee35f3a9cc6
--- /dev/null
+++ b/man/man2/open_tree.2
@@ -0,0 +1,471 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH open_tree 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+open_tree \- open path or create detached mount object and attach to fd
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */"
+.B #include <sys/mount.h>
+.P
+.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR open_tree ()
+system call is part of
+the suite of file descriptor based mount facilities in Linux.
+.IP \[bu] 3
+If
+.I flags
+contains
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+creates a detached mount object
+which consists of a bind-mount of
+the path specified by
+.IR path .
+A new file descriptor
+associated with the detached mount object
+is then returned.
+The mount object is equivalent to a bind-mount
+that would be created by
+.BR mount (2)
+called with
+.BR MS_BIND ,
+except that it is tied to a file descriptor
+and is not mounted onto the filesystem.
+.IP
+As with file descriptors returned from
+.BR fsmount (2),
+the resultant file descriptor can then be used with
+.BR move_mount (2),
+.BR mount_setattr (2),
+or other such system calls to do further mount operations.
+This mount object will be unmounted and destroyed
+when the file descriptor is closed
+if it was not otherwise attached to a mount point
+by calling
+.BR move_mount (2).
+.IP \[bu]
+If
+.I flags
+does not contain
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+returns a file descriptor
+that is exactly equivalent to
+one produced by
+.BR openat (2)
+when called with the same
+.I dirfd
+and
+.IR path .
+.P
+In either case, the resultant file descriptor
+acts the same as one produced by
+.BR open (2)
+with
+.BR O_PATH ,
+meaning it can also be used as a
+.I dirfd
+argument to
+"*at()" system calls.
+.P
+As with "*at()" system calls,
+.BR open_tree ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR \%AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%AT_EMPTY_PATH ,
+then the file descriptor
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+may refer to any type of file,
+not just a directory.
+.P
+.I flags
+can be used to control aspects of the path lookup
+and properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B AT_EMPTY_PATH
+If
+.I path
+is an empty string, operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or from another
+.BR open_tree ()
+call).
+In this case,
+.I dirfd
+may refer to any type of file, not just a directory.
+If
+.I dirfd
+is
+.BR \%AT_FDCWD ,
+.BR open_tree ()
+will operate on the current working directory
+of the calling process.
+This flag is Linux-specific; define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B AT_NO_AUTOMOUNT
+Do not automount any automount points encountered
+while resolving
+.IR path .
+This allows you to create a handle to the automount point itself,
+rather than the location it would mount.
+This flag has no effect if the mount point has already been mounted over.
+This flag is Linux-specific; define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If
+.I path
+is a symbolic link, do not dereference it; instead,
+create either a handle to the link itself
+or a bind-mount of it.
+The resultant file descriptor is indistinguishable from one produced by
+.BR openat (2)
+with
+.BR \%O_PATH | O_NOFOLLOW .
+.TP
+.B OPEN_TREE_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B OPEN_TREE_CLONE
+Rather than creating an
+.BR openat (2)-style
+.B O_PATH
+file descriptor,
+create a bind-mount of
+.I path
+(akin to
+.IR "mount --bind" )
+as a detached mount object.
+In order to do this operation,
+the calling process must have the
+.B \%CAP_SYS_ADMIN
+capability.
+.TP
+.B AT_RECURSIVE
+Create a recursive bind-mount of the path
+(akin to
+.IR "mount --rbind" )
+as a detached mount object.
+This flag is only permitted in conjunction with
+.BR \%OPEN_TREE_CLONE .
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B \%AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist, or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B AT_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory, or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOSPC
+The "anonymous" mount namespace
+necessary to contain the
+.B \%OPEN_TREE_CLONE
+detached bind-mount mount object
+could not be allocated,
+as doing so would exceed
+the configured per-user limit on
+the number of mount namespaces in the current user namespace.
+(See also
+.BR namespaces (7).)
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+.I flags
+contains
+.B \%OPEN_TREE_CLONE
+but the calling process does not have the required
+.B CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit a07b20004793d8926f78d63eb5980559f7813404
+.\" commit 400913252d09f9cfb8cce33daee43167921fc343
+glibc 2.36.
+.SH NOTES
+.SS Anonymous mount namespaces
+The bind-mount mount objects created by
+.BR open_tree ()
+with
+.B \%OPEN_TREE_CLONE
+are not attached to the mount namespace of the calling process.
+Instead, each mount object is attached to
+a newly allocated "anonymous" mount namespace
+associated with the calling process.
+.P
+One of the side-effects of this is that
+(unlike bind-mounts created with
+.BR mount (2)),
+mount propagation
+(as described in
+.BR mount_namespaces (7))
+will not be applied to bind-mounts created by
+.BR open_tree ()
+until the bind-mount is attached with
+.BR move_mount (2),
+at which point the mount
+will be associated with the mount namespace
+where it was mounted
+and mount propagation will resume.
+.SH EXAMPLES
+The following examples show how
+.BR open_tree ()
+can be used in place of more traditional
+.BR mount (2)
+calls with
+.BR MS_BIND .
+.P
+.in +4n
+.EX
+int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First,
+a detached bind-mount mount object of
+.I /var
+is created and attached to the file descriptor
+.IR srcfd .
+Then, the mount object is attached to
+.I /mnt
+using
+.BR move_mount (2)
+with
+.B \%MOVE_MOUNT_F_EMPTY_PATH
+to request that the detached mount object attached to the file descriptor
+.I srcfd
+be moved (and thus attached) to
+.IR /mnt .
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/var", "/mnt", NULL, MS_BIND, NULL);
+.EE
+.in
+.P
+.B \%OPEN_TREE_CLONE
+can be combined with
+.B \%AT_RECURSIVE
+to create recursive detached bind-mount mount objects,
+which in turn can be attached to mount points
+to create recursive bind-mounts.
+.P
+.in +4n
+.EX
+int srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE | AT_RECURSIVE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+The above procedure is functionally equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/var", "/mnt", NULL, MS_BIND | MS_REC, NULL);
+.EE
+.in
+.P
+One of the primary benefits of using
+.BR open_tree ()
+and
+.BR move_mount (2)
+over the traditional
+.BR mount (2)
+is that operating with
+.IR dirfd -style
+file descriptors is far easier and more intuitive.
+.P
+.in +4n
+.EX
+int srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
+move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+The above procedure is roughly equivalent to
+the following mount operation using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/proc/self/fd/100", "/proc/self/fd/200/foo", NULL, MS_BIND, NULL);
+.EE
+.in
+.P
+In addition, you can use the file descriptor returned by
+.BR open_tree ()
+as the
+.I dirfd
+argument to any "*at()" system call:
+.P
+.in +4n
+.EX
+int dirfd, fd;
+\&
+dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
+fd = openat(dirfd, "passwd", O_RDONLY);
+fchmodat(dirfd, "shadow", 0000, 0);
+close(dirfd);
+close(fd);
+/* The bind-mount is now destroyed. */
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR mount_namespaces (7)
-- 
2.50.1
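
The EXAMPLES snippets in the page omit error checking for brevity. For anyone wanting to experiment with them, the following is a minimal sketch with the checks added; it is not part of the patch. It assumes the libc may predate glibc 2.36, so it goes through syscall(2) rather than the open_tree() wrapper and supplies fallback definitions for the flag constants. The helper names xopen_tree() and clone_tree() are hypothetical, and the fallback system call number 428 is an assumption that holds on most architectures but not on Alpha.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values as in the Linux UAPI headers. */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef SYS_open_tree
#define SYS_open_tree 428  /* Assumption: most architectures, not Alpha. */
#endif

/* Hypothetical helper: raw system call, same arguments as open_tree(). */
int
xopen_tree(int dirfd, const char *path, unsigned int flags)
{
    return (int) syscall(SYS_open_tree, dirfd, path, flags);
}

/*
 * Hypothetical helper: create a detached bind-mount of path
 * (recursive if requested).  Returns a file descriptor on success.
 * On failure, prints a diagnostic and returns -1 with errno preserved.
 */
int
clone_tree(const char *path, int recursive)
{
    unsigned int flags = OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC;
    int fd, saved;

    if (recursive)
        flags |= AT_RECURSIVE;

    fd = xopen_tree(AT_FDCWD, path, flags);
    if (fd == -1) {
        saved = errno;
        fprintf(stderr, "open_tree(%s): %s%s\n", path, strerror(saved),
                saved == EPERM ? " (OPEN_TREE_CLONE needs CAP_SYS_ADMIN)" : "");
        errno = saved;
    }
    return fd;
}
```

A caller would hand the descriptor returned by clone_tree() to move_mount(2) with MOVE_MOUNT_F_EMPTY_PATH, as in the page's examples, and close it afterwards. Per the page, closing it without attaching the mount destroys the detached bind-mount.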