On Mon, May 5, 2025 at 11:07 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> On Mon, May 05, 2025 at 10:08:24PM -0700, Vishal Annapurve wrote:
> > On Mon, May 5, 2025 at 5:56 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > Sorry for the late reply, I was on leave last week.
> > >
> > > On Tue, Apr 29, 2025 at 06:46:59AM -0700, Vishal Annapurve wrote:
> > > > On Mon, Apr 28, 2025 at 5:52 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > > > > So, we plan to remove folio_ref_add()/folio_put_refs() in future, only invoking
> > > > > folio_ref_add() in the event of a removal failure.
> > > >
> > > > In my opinion, the above scheme can be deployed with this series
> > > > itself. guest_memfd will not take away memory from TDX VMs without an
> > > I initially intended to add a separate patch at the end of this series to
> > > implement invoking folio_ref_add() only upon a removal failure. However, I
> > > decided against it since it's not a must before guest_memfd supports in-place
> > > conversion.
> > >
> > > We can include it in the next version if you think it's better.
> >
> > Ackerley is planning to send out a series for 1G Hugetlb support with
> > guest memfd soon, hopefully this week. Plus I don't see any reason to
> > hold extra refcounts in TDX stack so it would be good to clean up this
> > logic.
> >
> > >
> > > > invalidation. folio_ref_add() will not work for memory not backed by
> > > > page structs, but that problem can be solved in future possibly by
> > > With current TDX code, all memory must be backed by a page struct.
> > > Both tdh_mem_page_add() and tdh_mem_page_aug() require a "struct page *" rather
> > > than a pfn.
> > >
> > > > notifying guest_memfd of certain ranges being in use even after
> > > > invalidation completes.
> > > A curious question:
> > > To support memory not backed by page structs in future, is there any counterpart
> > > to the page struct to hold ref count and map count?
> > >
>
> > I imagine the needed support will match similar semantics as VM_PFNMAP
> > [1] memory. No need to maintain refcounts/map counts for such physical
> > memory ranges as all users will be notified when mappings are
> > changed/removed.
> So, it's possible to map such memory in both shared and private EPT
> simultaneously?

No, guest_memfd will still ensure that userspace can only fault in
shared memory regions in order to support CoCo VM use cases.

> >
> > Any guest_memfd range updates will result in invalidations/updates of
> > userspace, guest, IOMMU or any other page tables referring to
> > guest_memfd backed pfns. This story will become clearer once the
> > support for PFN range allocator for backing guest_memfd starts getting
> > discussed.
> Ok. It is indeed unclear right now to support such kind of memory.
>
> Up to now, we don't anticipate TDX will allow any mapping of VM_PFNMAP memory
> into private EPT until TDX connect.

There is a plan to use VM_PFNMAP memory for all guest_memfd
shared/private ranges, orthogonal to the TDX Connect use case. With TDX
Connect/SEV-TIO, the major difference would be that guest_memfd private
ranges will be mapped into IOMMU page tables.

Irrespective of whether/when VM_PFNMAP memory support lands, there have
been discussions [1] about not using page structs for private memory
ranges altogether, even with the hugetlb allocator, which would
simplify seamless merge/split of private hugepages to support memory
conversion.
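To make the "take a reference only on removal failure" idea concrete,
a rough sketch could look like the below. tdx_reclaim_private_page()
is a made-up stand-in for the actual removal/reclaim path;
folio_ref_add() and page_folio() are the existing APIs, everything
else is illustrative only:

static int tdx_remove_private_page(struct kvm *kvm, gfn_t gfn,
				   struct page *page)
{
	int err;

	/* Stand-in for the real unmap/reclaim sequence (SEAMCALLs). */
	err = tdx_reclaim_private_page(kvm, gfn, page);
	if (err) {
		/*
		 * Removal failed: take (and intentionally leak) a folio
		 * reference so the page is never handed back to the
		 * allocator while the TDX module may still be tracking it.
		 */
		folio_ref_add(page_folio(page), 1);
		return err;
	}

	/* Success: no long-term reference is needed at all. */
	return 0;
}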
So I think the general direction we should head towards is to stop
relying on refcounts for guest_memfd private ranges, and possibly on
page structs altogether. I think the series [2] that makes KVM work
better with PFNMAP'd physical memory is a step in the right direction
of not assuming page-struct-backed memory ranges for guest_memfd as
well.

[1] https://lore.kernel.org/all/CAGtprH8akKUF=8+RkX_QMjp35C0bU1zxGi4v1Zm5AWCw=8V8AQ@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-arm-kernel/20241010182427.1434605-1-seanjc@xxxxxxxxxx/

> And even in that scenario, the memory is only for private MMIO, so the backend
> driver is VFIO pci driver rather than guest_memfd.

Not necessarily. As I mentioned above, guest_memfd ranges will be
backed by VM_PFNMAP memory.

> >
> > [1] https://elixir.bootlin.com/linux/v6.14.5/source/mm/memory.c#L6543
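As a rough illustration of the fault-side rule (userspace can only
fault in shared ranges, and with VM_PFNMAP-style backing there is no
struct page refcount to take), something along these lines could work.
gmem_range_is_private() and gmem_pfn_at() are made-up helpers;
vmf_insert_pfn() is the existing API:

static vm_fault_t gmem_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);

	/* Private ranges are never mapped into host userspace. */
	if (gmem_range_is_private(inode, vmf->pgoff))
		return VM_FAULT_SIGBUS;

	/*
	 * Shared ranges fault in without taking any struct page
	 * reference; the pfn is inserted directly into the user
	 * page tables.
	 */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      gmem_pfn_at(inode, vmf->pgoff));
}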