Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api

Aleksa Sarai <cyphar@xxxxxxxxxx> · Thu, 7 Aug 2025 23:27:04 +1000

On 2025-08-07, Alejandro Colomar <alx@xxxxxxxxxx> wrote:
> Hi Aleksa,
> 
> On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> > This is loosely based on the original documentation written by David
> > Howells and later maintained by Christian Brauner, but has been
> > rewritten to be more from a user perspective (as well as fixing a few
> > critical mistakes).
> > 
> > Co-developed-by: David Howells <dhowells@xxxxxxxxxx>
> > Co-developed-by: Christian Brauner <brauner@xxxxxxxxxx>
> 
> Please use Co-authored-by.  It's documented under CONTRIBUTING.d/:
> 
> 	$ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
> 	    Trailer
> 		Sign your patch with "Signed-off-by:".  Read about the
> 		"Developer's Certificate of Origin" at
> 		<https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
> 		When appropriate, other tags documented in that file, such as
> 		"Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
> 		can be added to the patch.  We use "Co-authored-by:" instead of
> 		"Co-developed-by:".  Example:
> 
> 			Signed-off-by: Alejandro Colomar <alx@xxxxxxxxxx>
> 
> I think 'author' is more appropriate than 'developer' for documentation.
> It is also more consistent with the Copyright notice, which assigns
> copyright to the authors (documented in AUTHORS).  And ironically, even
> the kernel documentation about Co-developed-by talks about authorship
> instead of development:
> 
> 	Co-developed-by: states that the patch was co-created by
> 	multiple developers; it is used to give attribution to
> 	co-authors (in addition to the author attributed by the From:
> 	tag) when several people work on a single patch.

Sure, fixed.

Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
required for submissions? I don't think b4 supports including it (and I
really would prefer to not have to use raw git-send-email again just for
man-pages -- b4 has so many benefits over raw git-send-email). Is the
b4-style changelog I include in the cover-letter sufficient?

I like to think of myself as a fairly prolific git user, but I don't
think I've ever seen --range-diff= output in a git-send-email patch
before...

> > Signed-off-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > ---
> >  man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 319 insertions(+)
> > 
> > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> > new file mode 100644
> > index 000000000000..ad38ef0782be
> > --- /dev/null
> > +++ b/man/man2/fsopen.2
> > @@ -0,0 +1,319 @@
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +fsopen \- create a new filesystem context
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ,\~ \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.B "#include <sys/mount.h>"
> > +.P
> > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR fsopen ()
> > +system call is part of the suite of file descriptor based mount facilities in
> > +Linux.
> > +.P
> > +.BR fsopen ()
> > +creates a blank filesystem configuration context within the kernel
> > +for the filesystem named by
> > +.IR fsname ,
> > +puts the context into creation mode and attaches it to a file descriptor,
> > +which is then returned.
> > +The calling process must have the
> > +.B \%CAP_SYS_ADMIN
> > +capability in order to create a new filesystem configuration context.
> > +.P
> > +A filesystem configuration context is an in-kernel representation of a pending
> > +transaction,
> 
> This page still needs semantic newlines.  (Please review all pages
> regarding that.)  (In this specific sentence, I'd break after 'is'.)
> 
> > +containing a set of configuration parameters that are to be applied
> > +when creating a new instance of a filesystem
> > +(or modifying the configuration of an existing filesystem instance,
> > +such as when using
> > +.BR fspick (2)).
> > +.P
> > +After obtaining a filesystem configuration context with
> > +.BR fsopen (),
> > +the general workflow for operating on the context looks like the following:
> > +.IP (1) 5
> > +Pass the filesystem context file descriptor to
> > +.BR fsconfig (2)
> > +to specify any desired filesystem parameters.
> > +This may be done as many times as necessary.
> > +.IP (2)
> > +Pass the same filesystem context file descriptor to
> 
> Do we need to say "same"?  I guess it's obvious.  Or do you expect
> any confusion if we don't?
> 
> > +.BR fsconfig (2)
> > +with
> > +.B \%FSCONFIG_CMD_CREATE
> > +to create an instance of the configured filesystem.
> > +.IP (3)
> > +Pass the same filesystem context file descriptor to
> > +.BR fsmount (2)
> > +to create a new mount object for the root of the filesystem,
> > +which is then attached to a new file descriptor.
> > +This also places the filesystem context file descriptor into reconfiguration
> > +mode,
> > +similar to the mode produced by
> > +.BR fspick (2).
> > +.IP (4)
> > +Use the mount object file descriptor as a
> > +.I dirfd
> > +argument to "*at()" system calls;
> > +or attach the mount object to a mount point
> > +by passing the mount object file descriptor to
> > +.BR move_mount (2).
> > +.P
> > +A filesystem context will move between different modes throughout its
> > +lifecycle
> > +(such as the creation phase when created with
> > +.BR fsopen (),
> > +the reconfiguration phase when an existing filesystem instance is selected by
> > +.BR fspick (2),
> > +and the intermediate "needs-mount" phase between
> > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> > +.B \%FSCONFIG_CMD_CREATE
> > +and
> > +.BR fsmount (2)),
> > +which has an impact on what operations are permitted on the filesystem context.
> > +.P
> > +The file descriptor returned by
> > +.BR fsopen ()
> > +also acts as a channel for filesystem drivers to provide more comprehensive
> > +error, warning, and information messages
> 
> Should we just say "diagnostic messages" to avoid explicitly mentioning
> all the levels?
> 
> > +than are normally provided through the standard
> > +.BR errno (3)
> > +interface for system calls.
> > +If an error occurs at any time during the workflow mentioned above,
> > +calling
> > +.BR read (2)
> > +on the filesystem context file descriptor will retrieve any ancillary
> > +information about the encountered errors.
> > +(See the "Message retrieval interface" section for more details on the message
> > +format.)
> > +.P
> > +.I flags
> > +can be used to control aspects of the creation of the filesystem configuration
> > +context file descriptor.
> > +A value for
> > +.I flags
> > +is constructed by bitwise ORing
> > +zero or more of the following constants:
> > +.RS
> > +.TP
> > +.B FSOPEN_CLOEXEC
> > +Set the close-on-exec
> > +.RB ( FD_CLOEXEC )
> > +flag on the new file descriptor.
> > +See the description of the
> > +.B O_CLOEXEC
> > +flag in
> > +.BR open (2)
> > +for reasons why this may be useful.
> > +.RE
> > +.P
> > +A list of filesystems supported by the running kernel
> > +(and thus a list of valid values for
> > +.IR fsname )
> > +can be obtained from
> > +.IR /proc/filesystems .
> > +(See also
> > +.BR proc_filesystems (5).)
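Here is a minimal sketch of that lookup. It assumes the usual /proc/filesystems line format: an optional "nodev" tag, a tab, then the name. fs_is_supported() and fs_is_supported_by_kernel() are hypothetical helpers. A filesystem missing from the list may still work if the kernel can load its module on demand, so treat the result as a hint rather than a guarantee.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if fsname is listed in a /proc/filesystems-style stream,
 * 0 if it is not, or -1 on a read error.
 */
int fs_is_supported(FILE *fp, const char *fsname)
{
    char line[256];

    while (fgets(line, sizeof(line), fp) != NULL) {
        char *name = strchr(line, '\t');

        name = (name != NULL) ? name + 1 : line;  /* skip "nodev\t" or "\t" */
        name[strcspn(name, "\n")] = '\0';         /* drop the trailing newline */
        if (strcmp(name, fsname) == 0)
            return 1;
    }
    return ferror(fp) ? -1 : 0;
}

/* Convenience wrapper that checks the running kernel's list. */
int fs_is_supported_by_kernel(const char *fsname)
{
    FILE *fp = fopen("/proc/filesystems", "r");
    int ret;

    if (fp == NULL)
        return -1;
    ret = fs_is_supported(fp, fsname);
    fclose(fp);
    return ret;
}
```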
> > +.SS Message retrieval interface
> > +When doing operations on a filesystem configuration context,
> > +the filesystem driver may choose to provide ancillary information to userspace
> > +in the form of message strings.
> > +.P
> > +The filesystem context file descriptors returned by
> > +.BR fsopen ()
> > +and
> > +.BR fspick (2)
> > +may be queried for message strings at any time by calling
> > +.BR read (2)
> > +on the file descriptor.
> > +Each call to
> > +.BR read (2)
> > +will return a single message,
> > +prefixed to indicate its class:
> > +.RS
> > +.TP
> > +.B "e <message>"
> > +An error message was logged.
> > +This is usually associated with an error being returned from the corresponding
> > +system call which triggered this message.
> > +.TP
> > +.B "w <message>"
> > +A warning message was logged.
> > +.TP
> > +.B "i <message>"
> > +An informational message was logged.
> > +.RE
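Here is a small sketch that splits a message into its class and text, based on the documented "e ", "w ", and "i " prefixes. The enum and fs_msg_classify() are hypothetical names. Anything without a recognised prefix is reported as unknown.

```c
/* Hypothetical classes mirroring the documented message prefixes. */
enum fs_msg_class { FS_MSG_UNKNOWN, FS_MSG_ERROR, FS_MSG_WARNING, FS_MSG_INFO };

/*
 * Classify a message by its "e ", "w ", or "i " prefix.  *text is set to
 * the body after the prefix, or to the whole string if it is unrecognised.
 */
enum fs_msg_class fs_msg_classify(const char *msg, const char **text)
{
    *text = msg;
    if (msg[0] == '\0' || msg[1] != ' ')
        return FS_MSG_UNKNOWN;

    switch (msg[0]) {
    case 'e':
        *text = msg + 2;
        return FS_MSG_ERROR;
    case 'w':
        *text = msg + 2;
        return FS_MSG_WARNING;
    case 'i':
        *text = msg + 2;
        return FS_MSG_INFO;
    default:
        return FS_MSG_UNKNOWN;
    }
}
```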
> > +.P
> > +Messages are removed from the queue as they are read.
> > +Note that the message queue has limited depth,
> > +so it is possible for messages to get lost.
> > +If there are no messages in the message queue,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%ENODATA .
> > +If the
> > +.I buf
> > +argument to
> > +.BR read (2)
> > +is not large enough to contain the message,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%EMSGSIZE .
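Here is a sketch of draining the queue with the semantics described above:

- each read(2) returns one message;
- read(2) fails with ENODATA once the queue is empty;
- read(2) fails with EMSGSIZE if the buffer is too small.

fs_drain_messages() is a hypothetical helper. The 4096-byte buffer is an assumption, not a documented limit.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read every queued message from a filesystem context fd and write each one
 * to out on its own line.  Returns the number of messages read, or -1 on an
 * unexpected error (including EMSGSIZE if a message exceeds the buffer).
 */
int fs_drain_messages(int fsfd, FILE *out)
{
    char buf[4096];   /* assumed large enough for a single message */
    int count = 0;

    for (;;) {
        ssize_t n = read(fsfd, buf, sizeof(buf) - 1);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENODATA)
                return count;       /* the queue is empty */
            return -1;              /* e.g. EMSGSIZE */
        }
        if (n == 0)
            return count;           /* defensive: treat EOF as empty */

        buf[n] = '\0';              /* one read() yields one message */
        if (buf[n - 1] == '\n')
            buf[n - 1] = '\0';
        fprintf(out, "%s\n", buf);
        count++;
    }
}
```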
> > +.P
> > +If there are multiple filesystem context file descriptors referencing the same
> > +filesystem instance
> > +(such as if you call
> > +.BR fspick (2)
> > +multiple times for the same mount),
> > +each one gets its own independent message queue.
> > +This does not apply to file descriptors that were duplicated with
> > +.BR dup (2).
> > +.P
> > +Message strings will usually be prefixed by the filesystem driver that logged
> 
> s/Messages/Message/
> 
> BTW, here, I'd break after 'prefixed', and then after the ','.
> 
> > +the message, though this may not always be the case.
> > +See the Linux kernel source code for details.
> > +.SH RETURN VALUE
> > +On success, a new file descriptor is returned.
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EFAULT
> > +.I fsname
> > +is NULL
> > +or a pointer to a location
> > +outside the calling process's accessible address space.
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +had an invalid flag set.
> > +.TP
> > +.B EMFILE
> > +The per-process limit on the number of open file descriptors has been reached.
> > +.TP
> > +.B ENFILE
> > +The system-wide limit on the total number of open files has been reached.
> > +.TP
> > +.B ENODEV
> > +The filesystem named by
> > +.I fsname
> > +is not supported by the kernel.
> > +.TP
> > +.B ENOMEM
> > +The kernel could not allocate sufficient memory to complete the operation.
> > +.TP
> > +.B EPERM
> > +The calling process does not have the required
> > +.B \%CAP_SYS_ADMIN
> > +capability.
> > +.SH STANDARDS
> > +Linux.
> > +.SH HISTORY
> > +Linux 5.2.
> > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> > +glibc 2.36.
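For C libraries older than glibc 2.36, here is a sketch that invokes the system call through syscall(2) instead. raw_fsopen() is a hypothetical name. It relies on the headers defining SYS_fsopen and reports ENOSYS otherwise. In that setup, the FSOPEN_* flag values would need to come from <linux/mount.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical fallback: call fsopen(2) directly when no libc wrapper exists.
 * Fails with ENOSYS if the headers do not provide the syscall number.
 */
int raw_fsopen(const char *fsname, unsigned int flags)
{
#ifdef SYS_fsopen
    return (int) syscall(SYS_fsopen, fsname, flags);
#else
    (void) fsname;
    (void) flags;
    errno = ENOSYS;
    return -1;
#endif
}
```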
> > +.SH EXAMPLES
> > +To illustrate the workflow for creating a new mount,
> > +the following is an example of how to mount an
> > +.BR ext4 (5)
> > +filesystem stored on
> > +.I /dev/sdb1
> > +onto
> > +.IR /mnt .
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd;
> > +\&
> > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/sdb1", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > +.EE
> > +.in
> > +.P
> > +First, an ext4 configuration context is created and attached to the file
> 
> Here, I'd break after the ',', and if you need to break again, after
> 'created'.
> 
> > +descriptor
> > +.IR fsfd .
> > +Then, a series of parameters
> > +(such as the source of the filesystem)
> > +are provided using
> > +.BR fsconfig (2),
> > +followed by the filesystem instance being created with
> > +.BR \%FSCONFIG_CMD_CREATE .
> > +.BR fsmount (2)
> > +is then used to create a new mount object attached to the file descriptor
> > +.IR mntfd ,
> > +which is then attached to the intended mount point using
> > +.BR move_mount (2).
> > +.P
> > +The above procedure is functionally equivalent to the following mount operation
> > +using
> > +.BR mount (2):
> > +.P
> > +.in +4n
> > +.EX
> > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> > +      "ro,noatime,acl,user_xattr,iversion");
> > +.EE
> > +.in
> > +.P
> > +And here's an example of creating a mount object
> > +of an NFS server share
> > +and setting a Smack security module label.
> > +However, instead of attaching it to a mount point,
> > +the program uses the mount object directly
> > +to open a file from the NFS share.
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd, fd;
> > +\&
> > +fsfd = fsopen("nfs", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com:/pub/linux", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> > +.EE
> > +.in
> > +.P
> > +Unlike the previous example,
> > +this operation has no trivial equivalent with
> > +.BR mount (2),
> > +as it was not previously possible to create a mount object
> > +that is not attached to any mount point.
> > +.SH SEE ALSO
> > +.BR fsconfig (2),
> > +.BR fsmount (2),
> > +.BR fspick (2),
> > +.BR mount (2),
> > +.BR mount_setattr (2),
> > +.BR move_mount (2),
> > +.BR open_tree (2),
> > +.BR mount_namespaces (7)
> 
> Other than those minor comments, the text LGTM.
> 
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/