On Thu, Aug 21, 2025 at 11:05:28AM +0200, Amir Goldstein wrote: > On Thu, Aug 21, 2025 at 2:53 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote: > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > In preparation for iomap, move the passthrough-specific validation code > > back to passthrough.c and create a new Kconfig item for conditional > > compilation of backing.c. In the next patch, iomap will share the > > backing structures. > > > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > > --- > > fs/fuse/fuse_i.h | 14 ++++++ > > fs/fuse/fuse_trace.h | 35 ++++++++++++++++ > > fs/fuse/Kconfig | 4 ++ > > fs/fuse/Makefile | 3 + > > fs/fuse/backing.c | 106 +++++++++++++++++++++++++++++++++++++------------ > > fs/fuse/dev.c | 4 +- > > fs/fuse/inode.c | 4 +- > > fs/fuse/passthrough.c | 28 +++++++++++++ > > 8 files changed, 165 insertions(+), 33 deletions(-) > > > > <snip to the relevant parts> > > diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c > > index ddb23b7400fc72..c128bed95a76b8 100644 > > --- a/fs/fuse/backing.c > > +++ b/fs/fuse/backing.c > > @@ -102,46 +101,68 @@ int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map) > > if (!file) > > goto out; > > > > - backing_sb = file_inode(file)->i_sb; > > - res = -ELOOP; > > - if (backing_sb->s_stack_depth >= fc->max_stack_depth) > > - goto out_fput; > > - > > fb = kmalloc(sizeof(struct fuse_backing), GFP_KERNEL); > > res = -ENOMEM; > > if (!fb) > > - goto out_fput; > > + goto out_file; > > > > + /* fb now owns file */ > > fb->file = file; > > + file = NULL; > > fb->cred = prepare_creds(); > > refcount_set(&fb->count, 1); > > > > + /* > > + * Each _backing_open function should either: > > + * > > + * 1. Take a ref to fb if it wants the file and return 0. > > + * 2. Return 0 without taking a ref if the backing file isn't needed. > > + * 3. Return an errno explaining why it couldn't attach. > > + * > > + * If at least one subsystem bumps the reference count to open it, > > + * we'll install it into the index and return the index. If nobody > > + * opens the file, the error code will be passed up. EPERM is the > > + * default. > > + */ > > + passthrough_res = fuse_passthrough_backing_open(fc, fb); > > + > > + if (refcount_read(&fb->count) < 2) { > > + if (passthrough_res) > > + res = passthrough_res; > > + if (!res) > > + res = -EPERM; > > + goto out_fb; > > + } > > + > > res = fuse_backing_id_alloc(fc, fb); > > - if (res < 0) { > > - fuse_backing_free(fb); > > - fb = NULL; > > - } > > + if (res < 0) > > + goto out_fb; > > + > > + trace_fuse_backing_open(fc, res, fb); > > > > -out: > > pr_debug("%s: fb=0x%p, ret=%i\n", __func__, fb, res); > > - > > + fuse_backing_put(fb); > > return res; > > > > -out_fput: > > - fput(file); > > - goto out; > > +out_fb: > > + fuse_backing_free(fb); > > +out_file: > > + if (file) > > + fput(file); > > +out: > > + pr_debug("%s: ret=%i\n", __func__, res); > > + return res; > > } > > > > int fuse_backing_close(struct fuse_conn *fc, int backing_id) > > { > > - struct fuse_backing *fb = NULL; > > - int err; > > + struct fuse_backing *fb = NULL, *test_fb; > > + int err, passthrough_err; > > > > pr_debug("%s: backing_id=%d\n", __func__, backing_id); > > > > - /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ > > err = -EPERM; > > - if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) > > + if (!fc->passthrough) > > goto out; > > > > err = -EINVAL; > > @@ -149,12 +170,45 @@ int fuse_backing_close(struct fuse_conn *fc, int backing_id) > > goto out; > > > > err = -ENOENT; > > - fb = fuse_backing_id_remove(fc, backing_id); > > + fb = fuse_backing_lookup(fc, backing_id); > > if (!fb) > > goto out; > > > > + /* > > + * Each _backing_close function should either: > > + * > > + * 1. Release the ref that it took in _backing_open and return 0. > > + * 2. Don't release the ref if the backing file is busy, and return 0. > > + * 2. Return an errno explaining why it couldn't detach. > > + * > > + * If there are no more active references to the backing file, it will > > + * be closed and removed from the index. If there are still active > > + * references to the backing file other than the one we just took, the > > That does not look right. > The fuse_backing object can often outliive the backing_id mapping > 1. fuse server attached backing fd to backing id 1 > 2. fuse server opens a file with passthrough to backing id 1 > 3. fuse inode holds a refcount to the fuse_backing object > 4. fuse server closes backing id 1 mapping > 5. fuse server closes file, drops last reference to fuse_backing object Ah, I didn't account for backing files needing to outlive being registered in the index. Ok, my whole approach above is wrong. :) > IOW, fb->count is not about being in the index. > With your code the fuse server call in #4 above will end up leaving the > fuse_backing object in the index and after #5 it will remain a dangling > object in the index. > > TBH, I don't understand why we need any of the complexity > of two subsystems claiming the same fuse_backing object for two > different purposes. I decided I should explore your suggestion from v2: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiZTTEOs4HYD0vGi3XtihyDiQbDFXBCuGKoJyFPQv_+Lw@xxxxxxxxxxxxxx/ ...and it didn't occur to me that, well, there's plenty of device_id address space so if some weird server has to register the same fd twice for two subsystems to use it, that's completely ok. :) > Also, I think that an explicit statement from the server about the > purpose of the backing file is due (like your commit message implies) > This could be easily done with the backing open flags member: Hrm. That /would/ eliminate all the stupid {iomap,passthrough}_res juggling if you were only allowed to register a backing id with a single subsystem. Worst case, a hybrid iomap+passthrough fs ends up with the same file registered with multiple ids. Yeah, let's do that. > diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c > index c63990254649c..e5a675fca7505 100644 > --- a/fs/fuse/backing.c > +++ b/fs/fuse/backing.c > @@ -96,7 +96,7 @@ int fuse_backing_open(struct fuse_conn *fc, struct > fuse_backing_map *map) > goto out; > > res = -EINVAL; > - if (map->flags || map->padding) > + if (map->flags & ~FUSE_BACKING_VALID_FLAGS || map->padding) > goto out; > > file = fget_raw(map->fd); > @@ -127,8 +127,10 @@ int fuse_backing_open(struct fuse_conn *fc, > struct fuse_backing_map *map) > * opens the file, the error code will be passed up. EPERM is the > * default. > */ > - passthrough_res = fuse_passthrough_backing_open(fc, fb); > - iomap_res = fuse_iomap_backing_open(fc, fb); > + if (map->flags & FUSE_BACKING_IOMAP) > + iomap_res = fuse_iomap_backing_open(fc, fb); > + else > + passthrough_res = fuse_passthrough_backing_open(fc, fb); > > if (refcount_read(&fb->count) < 2) { > if (passthrough_res) > @@ -192,8 +194,10 @@ int fuse_backing_close(struct fuse_conn *fc, int > backing_id) > * references to the backing file other than the one we just took, the > * error code will be passed up. EBUSY is the default. > */ > - passthrough_err = fuse_passthrough_backing_close(fc, fb); > - iomap_err = fuse_iomap_backing_close(fc, fb); > + if (fb->bdev) > + iomap_err = fuse_iomap_backing_close(fc, fb); > + else > + passthrough_err = fuse_passthrough_backing_close(fc, fb); Yes, that's a lot better, thanks. :D > > if (refcount_read(&fb->count) > 1) { > if (passthrough_err) > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h > index 70b5530e587d4..ee81903ad2f98 100644 > --- a/include/uapi/linux/fuse.h > +++ b/include/uapi/linux/fuse.h > @@ -1148,6 +1148,10 @@ struct fuse_notify_retrieve_in { > uint64_t dummy4; > }; > > +/* basic file I/O functionality through iomap */ > +#define FUSE_BACKING_IOMAP (1 << 0) > +#define FUSE_BACKING_VALID_FLAGS (FUSE_BACKING_IOMAP) > + > struct fuse_backing_map { > int32_t fd; > uint32_t flags; > > > > + * error code will be passed up. EBUSY is the default. > > + */ > > + passthrough_err = fuse_passthrough_backing_close(fc, fb); > > + > > + if (refcount_read(&fb->count) > 1) { > > + if (passthrough_err) > > + err = passthrough_err; > > + if (!err) > > + err = -EBUSY; > > + goto out_fb; > > + } > > + > > + trace_fuse_backing_close(fc, backing_id, fb); > > + > > + err = -ENOENT; > > + test_fb = fuse_backing_id_remove(fc, backing_id); > > + if (!test_fb) > > + goto out_fb; > > + > > + WARN_ON(fb != test_fb); > > + pr_debug("%s: fb=0x%p, err=0\n", __func__, fb); > > + fuse_backing_put(fb); > > + return 0; > > +out_fb: > > fuse_backing_put(fb); > > - err = 0; > > out: > > pr_debug("%s: fb=0x%p, err=%i\n", __func__, fb, err); > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index dbde17fff0cda9..31d9f006836ac1 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > @@ -2623,7 +2623,7 @@ static long fuse_dev_ioctl_backing_open(struct file *file, > > if (!fud) > > return -EPERM; > > > > - if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) > > + if (!IS_ENABLED(CONFIG_FUSE_BACKING)) > > return -EOPNOTSUPP; > > > > if (copy_from_user(&map, argp, sizeof(map))) > > @@ -2640,7 +2640,7 @@ static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp) > > if (!fud) > > return -EPERM; > > > > - if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) > > + if (!IS_ENABLED(CONFIG_FUSE_BACKING)) > > return -EOPNOTSUPP; > > > > if (get_user(backing_id, argp)) > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > > index 9448a11c828fef..1f3f91981410aa 100644 > > --- a/fs/fuse/inode.c > > +++ b/fs/fuse/inode.c > > @@ -993,7 +993,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, > > fc->name_max = FUSE_NAME_LOW_MAX; > > fc->timeout.req_timeout = 0; > > > > - if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) > > + if (IS_ENABLED(CONFIG_FUSE_BACKING)) > > fuse_backing_files_init(fc); > > > > INIT_LIST_HEAD(&fc->mounts); > > @@ -1030,7 +1030,7 @@ void fuse_conn_put(struct fuse_conn *fc) > > WARN_ON(atomic_read(&bucket->count) != 1); > > kfree(bucket); > > } > > - if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) > > + if (IS_ENABLED(CONFIG_FUSE_BACKING)) > > fuse_backing_files_free(fc); > > call_rcu(&fc->rcu, delayed_release); > > } > > diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c > > index e0b8d885bc81f3..dfc61cc4bd21af 100644 > > --- a/fs/fuse/passthrough.c > > +++ b/fs/fuse/passthrough.c > > @@ -197,3 +197,31 @@ void fuse_passthrough_release(struct fuse_file *ff, struct fuse_backing *fb) > > put_cred(ff->cred); > > ff->cred = NULL; > > } > > + > > +int fuse_passthrough_backing_open(struct fuse_conn *fc, > > + struct fuse_backing *fb) > > +{ > > + struct super_block *backing_sb; > > + > > + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > This limitation is not specific to fuse passthrough. > While the fuse passthrough use case is likely to request many fuse > backing files, > the limitation is here to protect from malicious actors and the same ioctl used > by the iomap fuse server can just as well open many "lsof invisible" files, > so the limitation should be in the generic function. Hrmm. Well for iomap block devices I'm not as worried because (a) you sort of need privileges to open them, (b) there aren't that many block devices, and (c) to use fuse-iomap at all you need CAP_SYS_RAWIO. As for the invisibility problem, I wonder if I could just make /sys/block/XXX/holder have a symlink to the fuse mount? For fuse2fs we need to maintain the open fd to /dev/XXX, but I suppose that's not necessarily true for a fuse server. > > + > > + backing_sb = file_inode(fb->file)->i_sb; > > + if (backing_sb->s_stack_depth >= fc->max_stack_depth) > > + return -ELOOP; > > + > > + fuse_backing_get(fb); > > + return 0; > > +} > > + > > +int fuse_passthrough_backing_close(struct fuse_conn *fc, > > + struct fuse_backing *fb) > > +{ > > + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > + > > Probably this comment in upstream is not very accurate because there is no > harm done in closing the backing files, but sure for symmetry. > Same comment as above through, unless there are reasons to relax > CAP_SYS_ADMIN for file iomap, would leave this in the genetic code. <nod> > And then there is not much justification left for the close helpers IMO, > especially given that the implementation wrt removing from index is > incorrect, I would keep it simple: > > @@ -175,11 +177,19 @@ int fuse_backing_close(struct fuse_conn *fc, int > backing_id) > if (backing_id <= 0) > goto out; > > - err = -ENOENT; > - fb = fuse_backing_lookup(fc, backing_id); > - if (!fb) > + err = -EPERM; > + if (!capable(CAP_SYS_ADMIN)) > goto out; > > + err = -EBUSY; > + if (fb->bdev) > + goto out; > + > + fb = fuse_backing_id_remove(fc, backing_id); > + if (!fb) > + err = -ENOENT; > + goto out_fb; I'll think about this, though I don't know how much of the security checking would need to be relaxed to enable the completely locked down fuse4fs systemd service that I was imagining. My guess is that I'll have to establish iomap capability (aka CAP_SYS_RAWIO) when /dev/fuse is first opened by the mount helper, then the mount helper opens all the block devices required, and finally it passes all these fds into the contained service. <shrug> As I said, food for thought for v5 :) --D > + > > Thanks, > Amir. >