Fuad Tabba <tabba@xxxxxxxxxx> writes:

> This patch enables support for shared memory in guest_memfd, including
> mapping that memory from host userspace.
>
> This functionality is gated by the KVM_GMEM_SHARED_MEM Kconfig option,
> and enabled for a given instance by the GUEST_MEMFD_FLAG_SUPPORT_SHARED
> flag at creation time.
>
> Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>
> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
> Co-developed-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> ---
>  include/linux/kvm_host.h | 13 +++++++
>  include/uapi/linux/kvm.h |  1 +
>  virt/kvm/Kconfig         |  4 +++
>  virt/kvm/guest_memfd.c   | 73 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 91 insertions(+)
>
> [...]
>

Just want to call out here that I believe HWpoison handling (and
kvm_gmem_error_folio()) remains correct after this patch. Would still
appreciate a review of the following!

> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
> +{
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	struct folio *folio;
> +	vm_fault_t ret = VM_FAULT_LOCKED;
> +
> +	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> +		return VM_FAULT_SIGBUS;
> +
> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> +	if (IS_ERR(folio)) {
> +		int err = PTR_ERR(folio);
> +
> +		if (err == -EAGAIN)
> +			return VM_FAULT_RETRY;
> +
> +		return vmf_error(err);
> +	}
> +
> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> +		ret = VM_FAULT_SIGBUS;
> +		goto out_folio;
> +	}
> +
> +	if (!folio_test_uptodate(folio)) {
> +		clear_highpage(folio_page(folio, 0));
> +		kvm_gmem_mark_prepared(folio);
> +	}
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
> +
> +out_folio:
> +	if (ret != VM_FAULT_LOCKED) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}
> +
> +	return ret;
> +}
> +
> [...]

This ->fault() callback does not explicitly check for
folio_test_hwpoison(), but up the call tree, __do_fault() checks for
HWpoison.

If the folio is clean, the folio is removed from the filemap. Fault is
eventually retried and (hopefully) another non-HWpoison folio will be
faulted in.

If the folio is dirty, userspace gets a SIGBUS.

kvm_gmem_error_folio() calls kvm_gmem_invalidate_begin(), which only
unmaps KVM_FILTER_PRIVATE, but IIUC that's okay since after mmap is
introduced,

* non-Coco VMs will always zap KVM_DIRECT_ROOTS anyway so the HWpoison
  folio is still zapped from guest page tables
* Unmapping from host userspace page tables is handled in
  memory_failure(), so the next access will lead to a fault, which is
  handled using a SIGBUS in __do_fault()
* Coco VMs can only use guest_memfd for private pages, so there's no
  change there since private pages still get zapped.