On 2025-08-07, Alejandro Colomar <alx@xxxxxxxxxx> wrote: > Hi Aleksa, > > On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote: > > This is loosely based on the original documentation written by David > > Howells and later maintained by Christian Brauner, but has been > > rewritten to be more from a user perspective (as well as fixing a few > > critical mistakes). > > > > Co-developed-by: David Howells <dhowells@xxxxxxxxxx> > > Co-developed-by: Christian Brauner <brauner@xxxxxxxxxx> > > Please use Co-authored-by. It's documented under CONTRIBUTING.d/: > > $ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer; > Trailer > Sign your patch with "Signed-off-by:". Read about the > "Developer's Certificate of Origin" at > <https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>. > When appropriate, other tags documented in that file, such as > "Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:" > can be added to the patch. We use "Co-authored-by:" instead of > "Co-developed-by:". Example: > > Signed-off-by: Alejandro Colomar <alx@xxxxxxxxxx> > > I think 'author' is more appropriate than 'developer' for documentation. > It is also more consistent with the Copyright notice, which assigns > copyright to the authors (documented in AUTHORS). And ironically, even > the kernel documentation about Co-authored-by talks about authorship > instead of development: > > Co-developed-by: states that the patch was co-created by > multiple developers; it is used to give attribution to > co-authors (in addition to the author attributed by the From: > tag) when several people work on a single patch. Sure, fixed. Can you also clarify whether CONTRIBUTING.d/patches/range-diff is required for submissions? I don't think b4 supports including it (and I really would prefer to not have to use raw git-send-email again just for man-pages -- b4 has so many benefits over raw git-send-email). Is the b4-style changelog I include in the cover-letter sufficient? I like to think of myself as a fairly prolific git user, but I don't think I've ever seen --range-diff= output in a git-send-email patch before... > > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx> > > --- > > man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 319 insertions(+) > > > > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2 > > new file mode 100644 > > index 000000000000..ad38ef0782be > > --- /dev/null > > +++ b/man/man2/fsopen.2 > > @@ -0,0 +1,319 @@ > > +.\" Copyright, the authors of the Linux man-pages project > > +.\" > > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft > > +.\" > > +.TH fsopen 2 (date) "Linux man-pages (unreleased)" > > +.SH NAME > > +fsopen \- create a new filesystem context > > +.SH LIBRARY > > +Standard C library > > +.RI ( libc ,\~ \-lc ) > > +.SH SYNOPSIS > > +.nf > > +.BR "#include <sys/mount.h>" > > +.P > > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");" > > +.fi > > +.SH DESCRIPTION > > +The > > +.BR fsopen () > > +system call is part of the suite of file descriptor based mount facilities in > > +Linux. > > +.P > > +.BR fsopen () > > +creates a blank filesystem configuration context within the kernel > > +for the filesystem named by > > +.IR fsname , > > +puts the context into creation mode and attaches it to a file descriptor, > > +which is then returned. > > +The calling process must have the > > +.B \%CAP_SYS_ADMIN > > +capability in order to create a new filesystem configuration context. > > +.P > > +A filesystem configuration context is an in-kernel representation of a pending > > +transaction, > > This page still needs semantic newlines. (Please review all pages > regarding that.) (In this specific sentence, I'd break after 'is'.) > > > +containing a set of configuration parameters that are to be applied > > +when creating a new instance of a filesystem > > +(or modifying the configuration of an existing filesystem instance, > > +such as when using > > +.BR fspick (2)). > > +.P > > +After obtaining a filesystem configuration context with > > +.BR fsopen (), > > +the general workflow for operating on the context looks like the following: > > +.IP (1) 5 > > +Pass the filesystem context file descriptor to > > +.BR fsconfig (2) > > +to specify any desired filesystem parameters. > > +This may be done as many times as necessary. > > +.IP (2) > > +Pass the same filesystem context file descriptor to > > Do we need to say "same"? I guess it's obvious. Or do you expect > any confusion if we don't? > > > +.BR fsconfig (2) > > +with > > +.B \%FSCONFIG_CMD_CREATE > > +to create an instance of the configured filesystem. > > +.IP (3) > > +Pass the same filesystem context file descriptor to > > +.BR fsmount (2) > > +to create a new mount object for the root of the filesystem, > > +which is then attached to a new file descriptor. > > +This also places the filesystem context file descriptor into reconfiguration > > +mode, > > +similar to the mode produced by > > +.BR fspick (2). > > +.IP (4) > > +Use the mount object file descriptor as a > > +.I dirfd > > +argument to "*at()" system calls; > > +or attach the mount object to a mount point > > +by passing the mount object file descriptor to > > +.BR move_mount (2). > > +.P > > +A filesystem context will move between different modes throughout its > > +lifecycle > > +(such as the creation phase when created with > > +.BR fsopen (), > > +the reconfiguration phase when an existing filesystem instance is selected by > > +.BR fspick (2), > > +and the intermediate "needs-mount" phase between > > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this. > > +.BR \%FSCONFIG_CMD_CREATE > > +and > > +.BR fsmount (2)), > > +which has an impact on what operations are permitted on the filesystem context. > > +.P > > +The file descriptor returned by > > +.BR fsopen () > > +also acts as a channel for filesystem drivers to provide more comprehensive > > +error, warning, and information messages > > Should we just say "diagnostic messages" to avoid explicitly mentioning > all the levels? > > > +than are normally provided through the standard > > +.BR errno (3) > > +interface for system calls. > > +If an error occurs at any time during the workflow mentioned above, > > +calling > > +.BR read (2) > > +on the filesystem context file descriptor will retrieve any ancillary > > +information about the encountered errors. > > +(See the "Message retrieval interface" section for more details on the message > > +format.) > > +.P > > +.I flags > > +can be used to control aspects of the creation of the filesystem configuration > > +context file descriptor. > > +A value for > > +.I flags > > +is constructed by bitwise ORing > > +zero or more of the following constants: > > +.RS > > +.TP > > +.B FSOPEN_CLOEXEC > > +Set the close-on-exec > > +.RB ( FD_CLOEXEC ) > > +flag on the new file descriptor. > > +See the description of the > > +.B O_CLOEXEC > > +flag in > > +.BR open (2) > > +for reasons why this may be useful. > > +.RE > > +.P > > +A list of filesystems supported by the running kernel > > +(and thus a list of valid values for > > +.IR fsname ) > > +can be obtained from > > +.IR /proc/filesystems . > > +(See also > > +.BR proc_filesystems (5).) > > +.SS Message retrieval interface > > +When doing operations on a filesystem configuration context, > > +the filesystem driver may choose to provide ancillary information to userspace > > +in the form of message strings. > > +.P > > +The filesystem context file descriptors returned by > > +.BR fsopen () > > +and > > +.BR fspick (2) > > +may be queried for message strings at any time by calling > > +.BR read (2) > > +on the file descriptor. > > +Each call to > > +.BR read (2) > > +will return a single message, > > +prefixed to indicate its class: > > +.RS > > +.TP > > +.B "e <message>" > > +An error message was logged. > > +This is usually associated with an error being returned from the corresponding > > +system call which triggered this message. > > +.TP > > +.B "w <message>" > > +A warning message was logged. > > +.TP > > +.B "i <message>" > > +An informational message was logged. > > +.RE > > +.P > > +Messages are removed from the queue as they are read. > > +Note that the message queue has limited depth, > > +so it is possible for messages to get lost. > > +If there are no messages in the message queue, > > +.B read(2) > > +will return no data and > > +.I errno > > +will be set to > > +.BR \%ENODATA . > > +If the > > +.I buf > > +argument to > > +.BR read (2) > > +is not large enough to contain the message, > > +.BR read (2) > > +will return no data and > > +.I errno > > +will be set to > > +.BR \%EMSGSIZE . > > +.P > > +If there are multiple filesystem context file descriptors referencing the same > > +filesystem instance > > +(such as if you call > > +.BR fspick (2) > > +multiple times for the same mount), > > +each one gets its own independent message queue. > > +This does not apply to file descriptors that were duplicated with > > +.BR dup (2). > > +.P > > +Messages strings will usually be prefixed by the filesystem driver that logged > > s/Messages/Message/ > > BTW, here, I'd break after 'prefixed', and then after the ','. > > > +the message, though this may not always be the case. > > +See the Linux kernel source code for details. > > +.SH RETURN VALUE > > +On success, a new file descriptor is returned. > > +On error, \-1 is returned, and > > +.I errno > > +is set to indicate the error. > > +.SH ERRORS > > +.TP > > +.B EFAULT > > +.I fsname > > +is NULL > > +or a pointer to a location > > +outside the calling process's accessible address space. > > +.TP > > +.B EINVAL > > +.I flags > > +had an invalid flag set. > > +.TP > > +.B EMFILE > > +The calling process has too many open files to create more. > > +.TP > > +.B ENFILE > > +The system has too many open files to create more. > > +.TP > > +.B ENODEV > > +The filesystem named by > > +.I fsname > > +is not supported by the kernel. > > +.TP > > +.B ENOMEM > > +The kernel could not allocate sufficient memory to complete the operation. > > +.TP > > +.B EPERM > > +The calling process does not have the required > > +.B \%CAP_SYS_ADMIN > > +capability. > > +.SH STANDARDS > > +Linux. > > +.SH HISTORY > > +Linux 5.2. > > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005 > > +glibc 2.36. > > +.SH EXAMPLES > > +To illustrate the workflow for creating a new mount, > > +the following is an example of how to mount an > > +.BR ext4 (5) > > +filesystem stored on > > +.I /dev/sdb1 > > +onto > > +.IR /mnt . > > +.P > > +.in +4n > > +.EX > > +int fsfd, mntfd; > > +\& > > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0); > > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0) > > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); > > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME); > > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); > > +.EE > > +.in > > +.P > > +First, an ext4 configuration context is created and attached to the file > > Here, I'd break after the ',', and if you need to break again, after > 'created'. > > > +descriptor > > +.IR fsfd . > > +Then, a series of parameters > > +(such as the source of the filesystem) > > +are provided using > > +.BR fsconfig (2), > > +followed by the filesystem instance being created with > > +.BR \%FSCONFIG_CMD_CREATE . > > +.BR fsmount (2) > > +is then used to create a new mount object attached to the file descriptor > > +.IR mntfd , > > +which is then attached to the intended mount point using > > +.BR move_mount (2). > > +.P > > +The above procedure is functionally equivalent to the following mount operation > > +using > > +.BR mount (2): > > +.P > > +.in +4n > > +.EX > > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME, > > + "ro,noatime,acl,user_xattr,iversion"); > > +.EE > > +.in > > +.P > > +And here's an example of creating a mount object > > +of an NFS server share > > +and setting a Smack security module label. > > +However, instead of attaching it to a mount point, > > +the program uses the mount object directly > > +to open a file from the NFS share. > > +.P > > +.in +4n > > +.EX > > +int fsfd, mntfd, fd; > > +\& > > +fsfd = fsopen("nfs", 0); > > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0); > > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0); > > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0); > > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0); > > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0); > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0); > > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); > > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV); > > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY); > > +.EE > > +.in > > +.P > > +Unlike the previous example, > > +this operation has no trivial equivalent with > > +.BR mount (2), > > +as it was not previously possible to create a mount object > > +that is not attached to any mount point. > > +.SH SEE ALSO > > +.BR fsconfig (2), > > +.BR fsmount (2), > > +.BR fspick (2), > > +.BR mount (2), > > +.BR mount_setattr (2), > > +.BR move_mount (2), > > +.BR open_tree (2), > > +.BR mount_namespaces (7) > > Other than those minor comments, the text LGTM. > > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
Attachment:
signature.asc
Description: PGP signature