Re: [PATCH v11 08/18] KVM: guest_memfd: Allow host to map guest_memfd pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 6 Jun 2025 at 10:55, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 06.06.25 11:30, Fuad Tabba wrote:
> > Hi David,
> >
> > On Fri, 6 Jun 2025 at 10:12, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 05.06.25 17:37, Fuad Tabba wrote:
> >>> This patch enables support for shared memory in guest_memfd, including
> >>> mapping that memory from host userspace.
> >>>
> >>> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> >>> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> >>> flag at creation time.
> >>>
> >>> Co-developed-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >>> ---
> >>
> >> [...]
> >>
> >>> +static bool kvm_gmem_supports_shared(struct inode *inode)
> >>> +{
> >>> +     u64 flags;
> >>> +
> >>> +     if (!IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
> >>> +             return false;
> >>> +
> >>> +     flags = (u64)inode->i_private;
> >>
> >> Can probably do above
> >>
> >> const u64 flags = (u64)inode->i_private;
> >>
> >
> > Ack.
> >
> >>> +
> >>> +     return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
> >>> +}
> >>> +
> >>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> >>> +{
> >>> +     struct inode *inode = file_inode(vmf->vma->vm_file);
> >>> +     struct folio *folio;
> >>> +     vm_fault_t ret = VM_FAULT_LOCKED;
> >>> +
> >>> +     if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> >>> +             return VM_FAULT_SIGBUS;
> >>> +
> >>> +     folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> >>> +     if (IS_ERR(folio)) {
> >>> +             int err = PTR_ERR(folio);
> >>> +
> >>> +             if (err == -EAGAIN)
> >>> +                     return VM_FAULT_RETRY;
> >>> +
> >>> +             return vmf_error(err);
> >>> +     }
> >>> +
> >>> +     if (WARN_ON_ONCE(folio_test_large(folio))) {
> >>> +             ret = VM_FAULT_SIGBUS;
> >>> +             goto out_folio;
> >>> +     }
> >>> +
> >>> +     if (!folio_test_uptodate(folio)) {
> >>> +             clear_highpage(folio_page(folio, 0));
> >>> +             kvm_gmem_mark_prepared(folio);
> >>> +     }
> >>> +
> >>> +     vmf->page = folio_file_page(folio, vmf->pgoff);
> >>> +
> >>> +out_folio:
> >>> +     if (ret != VM_FAULT_LOCKED) {
> >>> +             folio_unlock(folio);
> >>> +             folio_put(folio);
> >>> +     }
> >>> +
> >>> +     return ret;
> >>> +}
> >>> +
> >>> +static const struct vm_operations_struct kvm_gmem_vm_ops = {
> >>> +     .fault = kvm_gmem_fault_shared,
> >>> +};
> >>> +
> >>> +static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
> >>> +{
> >>> +     if (!kvm_gmem_supports_shared(file_inode(file)))
> >>> +             return -ENODEV;
> >>> +
> >>> +     if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
> >>> +         (VM_SHARED | VM_MAYSHARE)) {
> >>> +             return -EINVAL;
> >>> +     }
> >>> +
> >>> +     vma->vm_ops = &kvm_gmem_vm_ops;
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>>    static struct file_operations kvm_gmem_fops = {
> >>> +     .mmap           = kvm_gmem_mmap,
> >>>        .open           = generic_file_open,
> >>>        .release        = kvm_gmem_release,
> >>>        .fallocate      = kvm_gmem_fallocate,
> >>> @@ -428,6 +500,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> >>>        }
> >>>
> >>>        file->f_flags |= O_LARGEFILE;
> >>> +     allow_write_access(file);
> >>
> >> Why is that required?
> >>
> >> As the docs mention, it must be paired with a previous deny_write_access().
> >>
> >> ... and I don't find similar usage anywhere else.
> >
> > This is to address Gavin's concern [*] regarding MADV_COLLAPSE, which
> > isn't an issue until hugepage support is enabled. Should we wait until
> > we have hugepage support?
>
> If we keep this, we *definitely* need a comment why we do something
> nobody else does.
>
> But I don't think allow_write_access() would ever be the way we want to
> fence off MADV_COLLAPSE. :) Maybe AS_INACCESSIBLE or sth. like that
> could fence it off in file_thp_enabled().
>
> Fortunately, CONFIG_READ_ONLY_THP_FOR_FS might vanish at some point ...
> so I've been told.
>
> So if it's not done for secretmem for now or others, we also shouldn't
> be doing it for now I think.

I'll remove it.

Thanks!
/fuad

> --
> Cheers,
>
> David / dhildenb
>




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux