This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten more from a user's perspective (as well as fixing a few
critical mistakes).

Co-developed-by: David Howells <dhowells@xxxxxxxxxx>
Co-developed-by: Christian Brauner <brauner@xxxxxxxxxx>
Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
---
 man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 319 insertions(+)

diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
new file mode 100644
index 000000000000..ad38ef0782be
--- /dev/null
+++ b/man/man2/fsopen.2
@@ -0,0 +1,319 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsopen 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsopen \- create a new filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fsopen ()
+system call is part of the suite of file descriptor based mount facilities in
+Linux.
+.P
+.BR fsopen ()
+creates a blank filesystem configuration context within the kernel
+for the filesystem named by
+.IR fsname ,
+puts the context into creation mode and attaches it to a file descriptor,
+which is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+A filesystem configuration context is an in-kernel representation of a pending
+transaction,
+containing a set of configuration parameters that are to be applied
+when creating a new instance of a filesystem
+(or modifying the configuration of an existing filesystem instance,
+such as when using
+.BR fspick (2)).
+.P
+After obtaining a filesystem configuration context with
+.BR fsopen (),
+the general workflow for operating on the context looks like the following:
+.IP (1) 5
+Pass the filesystem context file descriptor to
+.BR fsconfig (2)
+to specify any desired filesystem parameters.
+This may be done as many times as necessary.
+.IP (2)
+Pass the same filesystem context file descriptor to
+.BR fsconfig (2)
+with
+.B \%FSCONFIG_CMD_CREATE
+to create an instance of the configured filesystem.
+.IP (3)
+Pass the same filesystem context file descriptor to
+.BR fsmount (2)
+to create a new mount object for the root of the filesystem,
+which is then attached to a new file descriptor.
+This also places the filesystem context file descriptor into reconfiguration
+mode,
+similar to the mode produced by
+.BR fspick (2).
+.IP (4)
+Use the mount object file descriptor as a
+.I dirfd
+argument to "*at()" system calls;
+or attach the mount object to a mount point
+by passing the mount object file descriptor to
+.BR move_mount (2).
+.P
+A filesystem context will move between different modes throughout its
+lifecycle
+(such as the creation phase when created with
+.BR fsopen (),
+the reconfiguration phase when an existing filesystem instance is selected by
+.BR fspick (2),
+and the intermediate "needs-mount" phase between
+.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
+.BR \%FSCONFIG_CMD_CREATE
+and
+.BR fsmount (2)),
+which has an impact on what operations are permitted on the filesystem context.
+.P
+The file descriptor returned by
+.BR fsopen ()
+also acts as a channel for filesystem drivers to provide more comprehensive
+error, warning, and information messages
+than are normally provided through the standard
+.BR errno (3)
+interface for system calls.
+If an error occurs at any time during the workflow mentioned above,
+calling
+.BR read (2)
+on the filesystem context file descriptor will retrieve any ancillary
+information about the encountered errors.
+(See the "Message retrieval interface" section for more details on the message
+format.)
+.P
+.I flags
+can be used to control aspects of the creation of the filesystem configuration
+context file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSOPEN_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.RE
+.P
+A list of filesystems supported by the running kernel
+(and thus a list of valid values for
+.IR fsname )
+can be obtained from
+.IR /proc/filesystems .
+(See also
+.BR proc_filesystems (5).)
+.SS Message retrieval interface
+When doing operations on a filesystem configuration context,
+the filesystem driver may choose to provide ancillary information to userspace
+in the form of message strings.
+.P
+The filesystem context file descriptors returned by
+.BR fsopen ()
+and
+.BR fspick (2)
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+Each call to
+.BR read (2)
+will return a single message,
+prefixed to indicate its class:
+.RS
+.TP
+.B "e <message>"
+An error message was logged.
+This is usually associated with an error being returned from the corresponding
+system call which triggered this message.
+.TP
+.B "w <message>"
+A warning message was logged.
+.TP
+.B "i <message>"
+An informational message was logged.
+.RE
+.P
+Messages are removed from the queue as they are read.
+Note that the message queue has limited depth,
+so it is possible for messages to get lost.
+If there are no messages in the message queue,
+.BR read (2)
+will return no data and
+.I errno
+will be set to
+.BR \%ENODATA .
+If the
+.I buf
+argument to
+.BR read (2)
+is not large enough to contain the message,
+.BR read (2)
+will return no data and
+.I errno
+will be set to
+.BR \%EMSGSIZE .
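The read(2) behaviour described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not part of the patch itself: `msg_class()` and `drain_messages()` are hypothetical helper names, the 4096-byte buffer size is an assumption, and the handling of a trailing newline is purely defensive.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: name the class of one message from its prefix. */
const char *msg_class(const char *msg, size_t len)
{
    if (len < 2 || msg[1] != ' ')
        return "unknown";
    switch (msg[0]) {
    case 'e': return "error";
    case 'w': return "warning";
    case 'i': return "info";
    default:  return "unknown";
    }
}

/*
 * Hypothetical helper: print every queued message on fsfd to out.
 * Returns the number of messages printed, or -1 if read(2) fails with
 * anything other than ENODATA (for example EMSGSIZE).
 */
int drain_messages(int fsfd, FILE *out)
{
    char buf[4096];   /* assumed to be large enough for one message */
    int count = 0;

    for (;;) {
        /* Each successful read(2) yields exactly one message. */
        ssize_t n = read(fsfd, buf, sizeof(buf) - 1);

        if (n < 0)
            return errno == ENODATA ? count : -1;
        if (n == 0)   /* defensive: nothing was returned */
            return count;

        buf[n] = '\0';
        /* Drop a trailing newline, if the message carries one. */
        if (buf[n - 1] == '\n')
            buf[--n] = '\0';

        const char *cls = msg_class(buf, (size_t)n);
        /* Skip the two-character prefix only when it was recognised. */
        const char *text = strcmp(cls, "unknown") ? buf + 2 : buf;

        fprintf(out, "[%s] %s\n", cls, text);
        count++;
    }
}
```

A caller would typically invoke `drain_messages(fsfd, stderr)` right after a failed fsconfig(2) or fsmount(2) call, before closing the context file descriptor, since messages are removed from the queue as they are read.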
+.P
+If there are multiple filesystem context file descriptors referencing the same
+filesystem instance
+(such as if you call
+.BR fspick (2)
+multiple times for the same mount),
+each one gets its own independent message queue.
+This does not apply to file descriptors that were duplicated with
+.BR dup (2).
+.P
+Message strings will usually be prefixed by the filesystem driver that logged
+the message, though this may not always be the case.
+See the Linux kernel source code for details.
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EFAULT
+.I fsname
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+.I flags
+had an invalid flag set.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENODEV
+The filesystem named by
+.I fsname
+is not supported by the kernel.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
+glibc 2.36.
+.SH EXAMPLES
+To illustrate the workflow for creating a new mount,
+the following is an example of how to mount an
+.BR ext4 (5)
+filesystem stored on
+.I /dev/sdb1
+onto
+.IR /mnt .
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First, an ext4 configuration context is created and attached to the file
+descriptor
+.IR fsfd .
+Then, a series of parameters
+(such as the source of the filesystem)
+are provided using
+.BR fsconfig (2),
+followed by the filesystem instance being created with
+.BR \%FSCONFIG_CMD_CREATE .
+.BR fsmount (2)
+is then used to create a new mount object attached to the file descriptor
+.IR mntfd ,
+which is then attached to the intended mount point using
+.BR move_mount (2).
+.P
+The above procedure is functionally equivalent to the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
+      "ro,noatime,acl,user_xattr,iversion");
+.EE
+.in
+.P
+And here's an example of creating a mount object
+of an NFS server share
+and setting a Smack security module label.
+However, instead of attaching it to a mount point,
+the program uses the mount object directly
+to open a file from the NFS share.
+.P
+.in +4n
+.EX
+int fsfd, mntfd, fd;
+\&
+fsfd = fsopen("nfs", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
+fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
+.EE
+.in
+.P
+Unlike the previous example,
+this operation has no trivial equivalent with
+.BR mount (2),
+as it was not previously possible to create a mount object
+that is not attached to any mount point.
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
--
2.50.1