"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> writes: > On Fri, 2025-07-11 at 13:12 +0800, Yan Zhao wrote: >> > Yan, is that your recollection? I guess the other points were that although >> > TDX >> I'm ok if KVM_BUG_ON() is considered loud enough to warn about the rare >> potential corruption, thereby making TDX less special. >> >> > doesn't need it today, for long term, userspace ABI around invalidations >> > should >> > support failure. But the actual gmem/kvm interface for this can be figured >> > out >> Could we elaborate what're included in userspace ABI around invalidations? > > Let's see what Ackerley says. > There's no specific invalidation command for ioctl but I assume you're referring to the conversion ioctl? There is a conversion ioctl planned for guest_memfd and the conversion ioctl can return an error. The process of conversion involves invalidating the memory that is to be converted, and for now, guest_memfd assumes unmapping is successful (like Yan says), but that can be changed. >> >> I'm a bit confused as I think the userspace ABI today supports failure >> already. >> >> Currently, the unmap API between gmem and KVM does not support failure. > > Great. I'm just trying to summarize the internal conversations. I think the > point was for a future looking user ABI, supporting failure is important. But we > don't need the KVM/gmem interface figured out yet. > I'm onboard here. So "do nothing" means if there is a TDX unmap failure, + KVM_BUG_ON() and hence the TD in question stops running, + No more conversions will be possible for this TD since the TD stops running. + Other TDs can continue running? + No refcounts will be taken for the folio/page where the memory failure happened. + No other indication (including HWpoison) anywhere in folio/page to indicate this happened. + To round this topic up, do we do anything else as part of "do nothing" that I missed? Is there any record in the TDX module (TDX module itself, not within the kernel)? 
I'll probably be okay with an answer like "won't know what will happen",
but just checking - what might happen if this page that had an unmap
failure gets reused? Suppose the KVM_BUG_ON() is noted but somehow we
couldn't get to the machine in time and the machine continues to serve,
and the memory is used by

1. Some other non-VM user, something else entirely, say a database?
2. Some new non-TDX VM?
3. Some new TD?

>>
>> In the future, we hope gmem can check if KVM allows a page to be
>> unmapped before triggering the actual unmap.