Re: [PATCH RFC v7 3/8] security: Export security_inode_init_security_anon for KVM guest_memfd

Shivank Garg <shivankg@xxxxxxx> · Thu, 8 May 2025 12:07:25 +0530

On 4/22/2025 10:55 PM, David Hildenbrand wrote:
> On 10.04.25 10:41, Christoph Hellwig wrote:
>> On Tue, Apr 08, 2025 at 11:23:57AM +0000, Shivank Garg wrote:
>>> KVM guest_memfd is implementing its own inodes to store metadata for
>>> backing memory using a custom filesystem. This requires the ability to
>>> initialize anonymous inode using security_inode_init_security_anon().
>>>
>>> As guest_memfd currently resides in the KVM module, we need to export this
>>> symbol for use outside the core kernel. In the future, guest_memfd might be
>>> moved to core-mm, at which point the symbols no longer would have to be
>>> exported. When/if that happens is still unclear.
>>
>> This really should be a EXPORT_SYMBOL_GPL, if at all.
>>
>> But you really should look into a new interface in anon_inode.c that
>> can be reused instead of duplicating anonymouns inode logic in kvm.ko.
> 
> I assume you mean combining the alloc_anon_inode()+
> security_inode_init_security_anon(), correct?
> 
> I can see mm/secretmem.c doing the same thing, so agreed that
> we're duplicating it.
> 
> 
> Regarding your other mail, I am also starting to wonder where/why
> we want security_inode_init_security_anon(). At least for
> mm/secretmem.c, it was introduced by:
> 
> commit 2bfe15c5261212130f1a71f32a300bcf426443d4
> Author: Christian Göttsche <cgzones@xxxxxxxxxxxxxx>
> Date:   Tue Jan 25 15:33:04 2022 +0100
> 
>     mm: create security context for memfd_secret inodes
>         Create a security context for the inodes created by memfd_secret(2) via
>     the LSM hook inode_init_security_anon to allow a fine grained control.
>     As secret memory areas can affect hibernation and have a global shared
>     limit access control might be desirable.
>         Signed-off-by: Christian Göttsche <cgzones@xxxxxxxxxxxxxx>
>     Signed-off-by: Paul Moore <paul@xxxxxxxxxxxxxx>
> 
> 
> In combination with Paul's review comment [1]
> 
> "
> This seems reasonable to me, and I like the idea of labeling the anon
> inode as opposed to creating a new set of LSM hooks.  If we want to
> apply access control policy to the memfd_secret() fds we are going to
> need to attach some sort of LSM state to the inode, we might as well
> use the mechanism we already have instead of inventing another one.
> "
> 
> 
> IIUC, we really only want security_inode_init_security_anon() when there
> might be interest to have global access control.
> 
> 
> Given that guest_memfd already shares many similarities with guest_memfd
> (e.g., pages not swappable/migratable) and might share even more in the future
> (e.g., directmap removal), I assume that we want the same thing for guest_memfd.
> 
> 
> Would something like the following seem reasonable? We should be adding some
> documentation for the new function, and I wonder if S_PRIVATE should actually
> be cleared for secretmem + guest_memfd (I have no idea what this "fs-internal" flag
> affects).
> 

Here's my understanding of S_PRIVATE flag:
1. S_PRIVATE tells the kernel that an inode is special and it should
skip the LSM permission checks (via IS_PRIVATE()):

For instance,
int security_inode_mknod(struct inode *dir, struct dentry *dentry,
                         umode_t mode, dev_t dev)
{
        if (unlikely(IS_PRIVATE(dir)))
                return 0;
        return call_int_hook(inode_mknod, dir, dentry, mode, dev);
}

2. In landlock LSM:
S_PRIVATE inodes are denied from opening file and getting path from fd
syscall: open_by_handle_at
calls do_handle_open -> handle_to_path -> get_path_from_fd

static int get_path_from_fd(const s32 fd, struct path *const path)
{
...
        /*
         * Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
         * pseudo filesystems that will never be mountable (e.g. sockfs,
         * pipefs).
         */
        if ((fd_file(f)->f_op == &ruleset_fops) ||
            (fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
            (fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
            d_is_negative(fd_file(f)->f_path.dentry) ||
            IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry)))
                return -EBADFD;

Using is_nouser_or_private() in is_access_to_paths_allowed() (allows accesses
for requests with a common path)

static bool is_access_to_paths_allowed() {
...
        if (is_nouser_or_private(path->dentry))
                return true;

3. S_PRIVATE skips security attribute initialization in SELinux:
security/selinux/hooks.c
sb_finish_set_opts(){
...
                if (inode) {
                        if (!IS_PRIVATE(inode))
                                inode_doinit_with_dentry(inode, NULL);

4. mm/shmem.c
/**
 * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
 *      kernel internal.  There will be NO LSM permission checks against the
 *      underlying inode.  So users of this interface must do LSM checks at a
 *      higher layer.  The users are the big_key and shm implementations.  LSM
 *      checks are provided at the key or shm level rather than the inode.
 * @name: name for dentry (to be seen in /proc/<pid>/maps)
 * @size: size to be set for the file
 * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
 */
struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned long flags)
{
        return __shmem_file_setup(shm_mnt, name, size, flags, S_PRIVATE);