Sean Christopherson wrote: > On Thu, Aug 28, 2025, Ira Weiny wrote: > > Edgecombe, Rick P wrote: > > > On Thu, 2025-08-28 at 13:26 -0700, Sean Christopherson wrote: > > > > Me confused. This is pre-boot, not the normal fault path, i.e. blocking other > > > > operations is not a concern. > > > > > > Just was my recollection of the discussion. I found it: > > > https://lore.kernel.org/lkml/Zbrj5WKVgMsUFDtb@xxxxxxxxxx/ > > > > > > > > > > > If tdh_mr_extend() is too heavy for a non-preemptible section, then the current > > > > code is also broken in the sense that there are no cond_resched() calls. The > > > > vast majority of TDX hosts will be using non-preemptible kernels, so without an > > > > explicit cond_resched(), there's no practical difference between extending the > > > > measurement under mmu_lock versus outside of mmu_lock. > > > > > > > > _If_ we need/want to do tdh_mr_extend() outside of mmu_lock, we can and should > > > > still do tdh_mem_page_add() under mmu_lock. > > > > > > I just did a quick test and we should be on the order of <1 ms per page for the > > > full loop. I can try to get some more formal test data if it matters. But that > > > doesn't sound too horrible? > > > > > > tdh_mr_extend() outside MMU lock is tempting because it doesn't *need* to be > > > inside it. > > > > I'm probably not following this conversation, so stupid question: It > > doesn't need to be in the lock because user space should not be setting up > > memory and extending the measurement in an asynchronous way. Is that > > correct? > > No, from userspace's perspective ADD+MEASURE is fully serialized. ADD "needs" > to be under mmu_lock to guarantee consistency between the mirror EPT and the > "real" S-EPT entries. E.g. if ADD is done after the fact, then KVM can end up > with a PRESENT M-EPT entry but a corresponding S-EPT entry that is !PRESENT. > That causes a pile of problems because it breaks KVM's fundamental assumption > that M-EPT and S-EPT entries updated in lock-step. Ok yes, I think I worded my query incorrectly but this makes things clear. Thanks! > > TDH_MR_EXTEND doesn't have the same same consistency issue. If it fails, the > only thing that's left in a bad state is the measurement. That's obviously not > ideal either, but we can handle that by forcefully terminating the VM, without > opening up KVM to edge cases that would otherwise be impossible. > > > > But maybe a better reason is that we could better handle errors > > > outside the fault. (i.e. no 5 line comment about why not to return an error in > > > tdx_mem_page_add() due to code in another file). > > > > > > I wonder if Yan can give an analysis of any zapping races if we do that. > > > > When you say analysis, you mean detecting user space did something wrong > > and failing gracefully? Is that correct? > > More specifically, whether or not KVM can WARN without the WARN being user > triggerable. Kernel policy is that WARNs must not be triggerable absent kernel, > hardware, or firmware bugs. What we're trying to figure out is if there's a > flow that can be triggered by userspace (misbehving or not) that would trip a > WARN even if KVM is operating as expected. I'm pretty sure the answer is "no". > > Oh, and WARNing here is desirable, because it improves the chances of detecting > a fatal-to-the-VM bug, e.g. in KVM and/or in the TDX-Module. OK... In other areas of the kernel if the user misbehaves it is reasonable to fail an operation. I would think that being fatal to the VM would be fine if QEMU did not properly synchronize ADD, measurement, and finalize, for example. Am I wrong in that assumption? Ira