On Mon, Jun 16, 2025 at 01:46:42PM +0530, Aneesh Kumar K.V wrote:
> Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx> writes:
>
> > On Wed, Jun 04, 2025 at 07:07:18PM +0530, Aneesh Kumar K.V wrote:
> >> Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx> writes:
> >>
> >> > On Sun, Jun 01, 2025 at 04:15:32PM +0530, Aneesh Kumar K.V wrote:
> >> >> Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx> writes:
> >> >>
> >> >> > Add new IOCTLs to do TSM based TDI bind/unbind. These IOCTLs are
> >> >> > expected to be called by userspace when CoCo VM issues TDI bind/unbind
> >> >> > command to VMM. Specifically for TDX Connect, these commands are some
> >> >> > secure Hypervisor call named GHCI (Guest-Hypervisor Communication
> >> >> > Interface).
> >> >> >
> >> >> > The TSM TDI bind/unbind operations are expected to be initiated by a
> >> >> > running CoCo VM, which already has the legacy assigned device in place.
> >> >> > The TSM bind operation is to request VMM make all secure configurations
> >> >> > to support device work as a TDI, and then issue TDISP messages to move
> >> >> > the TDI to CONFIG_LOCKED or RUN state, waiting for guest's attestation.
> >> >> >
> >> >> > Do TSM Unbind before vfio_pci_core_disable(), otherwise will lead
> >> >> > device to TDISP ERROR state.
> >> >> >
> >> >>
> >> >> Any reason these need to be a vfio ioctl instead of iommufd ioctl?
> >> >> For ex: https://lore.kernel.org/all/20250529133757.462088-3-aneesh.kumar@xxxxxxxxxx/
> >> >
> >> > A general reason is, the device driver - VFIO should be aware of the
> >> > bound state, and some operations break the bound state. VFIO should also
> >> > know some operations on bound may crash kernel because of platform TSM
> >> > firmware's enforcement. E.g. zapping MMIO, because private MMIO mapping
> >> > in secure page tables cannot be unmapped before TDI STOP [1].
> >> >
> >> > Specifically, for TDX Connect, the firmware enforces MMIO unmapping in
> >> > S-EPT would fail if TDI is bound. For AMD there seems also some
> >> > requirement about this but I need Alexey's confirmation.
> >> >
> >> > [1] https://lore.kernel.org/all/aDnXxk46kwrOcl0i@yilunxu-OptiPlex-7050/
> >> >
> >>
> >> According to the TDISP specification (Section 11.2.6), clearing either
> >> the Bus Master Enable (BME) or Memory Space Enable (MSE) bits will cause
> >> the TDI to transition to an error state. To handle this gracefully, it
> >> seems necessary to unbind the TDI before modifying the BME or MSE bits.
> >
> > Yes. But now the suggestion is never let VFIO do unbind, instead VFIO
> > should block these operations when device is bound.
> >
> >>
> >> If I understand correctly, we also need to unmap the Stage-2 mapping due
> >> to the issue described in commit
> >> abafbc551fddede3e0a08dee1dcde08fc0eb8476. Are there any additional
> >> reasons we would want to unmap the Stage-2 mapping for the BAR (as done
> >> in vfio_pci_zap_and_down_write_memory_lock)?
> >
> > I think no more reason.
> >
> >>
> >> Additionally, with TDX, it appears that before unmapping the Stage-2
> >> mapping for the BAR, we should first unbind the TDI (ie, move it to the
> >> "unlock" state?) Is this step related to Section 11.2.6 of the TDISP spec,
> >> or is it driven by a different requirement?
> >
> > No, this is not device side TDISP requirement. It is host side
> > requirement to fix DMA silent drop issue. TDX enforces CPU S2 PT share
> > with IOMMU S2 PT (does ARM do the same?), so unmap CPU S2 PT in KVM equals
> > unmap IOMMU S2 PT.
> >
> > If we allow IOMMU S2 PT unmapped when TDI is running, host could fool
> > guest by just unmap some PT entry and suppress the fault event. Guest
> > thought a DMA write is successful but it is not and may cause
> > data integrity issue.
> >
>
> I am still trying to find more details here. How did the guest conclude
> DMA writing is successful?

Traditionally the VMM is the trusted entity. If no IOMMU fault is
reported, the guest assumes the DMA write succeeded.

> Guest would timeout waiting for DMA to complete

There is no *generic* mechanism to detect or wait for completion of a
single DMA write. In PCIe terms they are "posted" writes, so no
completion is returned to the requester.

Thanks,
Yilun

> if the host hides the interrupt delivery of failed DMA transfer?
>
> >
> > This is not a TDX specific problem, but different vendors have different
> > mechanisms for this. For TDX, firmware fails the MMIO unmap for S2. For
> > AMD, will trigger some HW protection called "ASID fence" [1]. Not sure
> > how ARM handles this?
> >
> > https://lore.kernel.org/all/aDnXxk46kwrOcl0i@yilunxu-OptiPlex-7050/
> >
> > Thanks,
> > Yilun
> >
>
> -aneesh
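
For illustration only, the suggestion quoted above (VFIO blocks the
operations that would break the bound state, rather than unbinding)
might look roughly like the sketch below. It is standalone C, not
kernel code: struct demo_vdev, demo_cmd_write() and the choice of
-EBUSY are hypothetical and exist only for this example. Only the
Command register bit values (MSE = 0x2, BME = 0x4) come from the PCI
spec.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI Command register bits, as defined by the PCI spec. */
#define CMD_MEMORY 0x2 /* Memory Space Enable (MSE) */
#define CMD_MASTER 0x4 /* Bus Master Enable (BME) */

/* Hypothetical per-device state; not the real vfio_pci_core_device. */
struct demo_vdev {
	bool tsm_bound; /* true between TSM bind and TSM unbind */
	uint16_t cmd;   /* shadow copy of the PCI Command register */
};

/*
 * Hypothetical filter for a write to the Command register.
 * While the TDI is bound, refuse any write that would clear MSE or BME,
 * since clearing either would move the TDI to the TDISP ERROR state.
 * The -EBUSY return value is an assumption made for this sketch.
 */
static int demo_cmd_write(struct demo_vdev *vdev, uint16_t new_cmd)
{
	/* Bits that are set now but would be cleared by the write. */
	uint16_t cleared = (uint16_t)(vdev->cmd & ~new_cmd);

	if (vdev->tsm_bound && (cleared & (CMD_MEMORY | CMD_MASTER)))
		return -EBUSY;

	vdev->cmd = new_cmd;
	return 0;
}
```

With this shape, a write that clears BME on a bound device is rejected
and the shadow register is left unchanged; the same write goes through
once the device is no longer bound.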