On Thu, Jul 31, 2025 at 07:07:17PM -0700, dan.j.williams@xxxxxxxxx wrote:
> Aneesh Kumar K.V (Arm) wrote:
> > Host:
> > step 1.
> > echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
> > echo vfio-pci > /sys/bus/pci/devices/${DEVICE}/driver_override
> > echo ${DEVICE} > /sys/bus/pci/drivers_probe
> >
> > step 2.
> > echo 1 > /sys/bus/pci/devices/$DEVICE/tsm/connect
>
> Just for my own understanding... presumably there is no ordering
> constraint for ARM CCA between step1 and step2, right? I.e. the connect
> state is independent of the bind state.
>
> In the v4 PCI/TSM scheme the connect command is now:
>
> echo $tsm_dev > /sys/bus/pci/devices/$DEVICE/tsm/connect

What does this do on the host? It seems to somehow prep the device for
VM assignment? It seems pretty strange that this lives in sysfs and is
not part of creating the vPCI function in the VM through VFIO and
iommufd.

Frankly, I'm nervous about making any uAPI whatsoever for the
hypervisor side at this point. I don't think we have enough of the
solution even in draft form. I'd really like your first merged TSM
series to only have uAPI for the guest side, where things are hopefully
closer to complete.

> > step 1:
> > echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
> >
> > step 2: Move the device to TDISP LOCK state
> > echo 1 > /sys/bus/pci/devices/${DEVICE}/tsm/lock
>
> Ok, so my stance has recently picked up some nuance here. As Jason
> mentions here:
>
> http://lore.kernel.org/20250410235008.GC63245@xxxxxxxx
>
> "However it works, it should be done before the driver is probed and
> remain stable for the duration of the driver attachment. From the
> iommu side the correct iommu domain, on the correct IOMMU instance to
> handle the expected traffic should be setup as the DMA API's iommu
> domain."

I think it is not just the DMA API: the MMIO registers may also move
location (from shared to protected IPA space, for example), meaning any
attached driver is completely wrecked.

> I agree with that up until the point where the implication is userspace
> control of the UNLOCKED->LOCKED transition. That transition requires
> enabling bus-mastering (BME),

Why? That's sad. BME should be controlled by the VM driver, not the
TSM, and it should be set only when a VM driver is probed against the
device in RUN state (see the probe sketch further down).

> and *then* locking the device. That means userspace is blindly
> hoping that the device is in a state where it will remain quiet on the
> bus between BME and LOCKED, and that the previous unbind left the device
> in a state where it is prepared to be locked again.

Yes, but we broadly assume this already in Linux. Drivers assume their
devices are quiet when they are first bound, and we expect that on
unbind a driver quiets the device before it is removed. So broadly I
think you can assume that a device with no driver is quiet, regardless
of BME.

> 2 potential ways to solve this, but open to other ideas:
>
> - Userspace only picks the iommu domain context for the device not the
>   lock state. Something like:
>
>   private > /sys/bus/pci/devices/${DEVICE}/tsm/domain
>
>   ...where the default is "shared" and from that point the device can
>   not issue DMA until a driver attaches. Driver controls
>   UNLOCKED->LOCKED->RUN.

What? Gross, no way can we let userspace control such intimate details
of the kernel. The kernel must auto-set this based on what T=x mode the
device driver binds into.
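To illustrate the BME point above: in the non-TDISP world bus mastering
is something the bound driver turns on for itself from probe, once it
is ready to accept DMA, and it is dropped again around unbind. A rough
sketch of that conventional pattern, with a hypothetical "foo" driver
but the standard PCI/DMA API calls:

#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int rc;

        rc = pcim_enable_device(pdev);
        if (rc)
                return rc;

        rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
        if (rc)
                return rc;

        /*
         * Bus mastering is enabled here, by the driver that owns the
         * device, not by a TSM or by userspace as a provisioning step.
         */
        pci_set_master(pdev);

        return 0;
}

Anything that forces BME on before a driver is bound is fighting that
model.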
> - Userspace is not involved in this transition and the dma mapping API
>   is updated to allow a driver to switch the iommu domain at runtime,
>   but only if the device has no outstanding mappings and the transition
>   can only happen from ->probe() context. Driver controls joining
>   secure-world-DMA and UNLOCKED->LOCKED->RUN.

I don't see why it is so complicated. The driver is unbound before the
device reaches T=1, so we expect the device to be quiet (bigger
problems if not). When the PCI core reaches T=1 it tells the DMA API to
reconfigure things for the unbound struct device. Then we bind a driver
as normal. The driver controls nothing.

All existing T=0 drivers "just work" with no source changes in T=1
mode; the DMA API magically hides the bounce buffering (see the sketch
at the end of this mail). Surely this should be the baseline target
functionality from a Linux perspective? So we should not have "driver
controls" statements at all. Userspace prepares the PCI device, the
driver probes onto a T=1 environment and just works.

> > step 3: Moves the device to TDISP RUN state
> > echo 1 > /sys/bus/pci/devices/${DEVICE}/tsm/accept
>
> This has the same concern from me about userspace being in control of
> BME. It feels like a departure from typical expectations.

It is; it is architecturally broken for BME to be controlled by the
TSM. BME is controlled by the guest OS driver only.

IMHO, if this is a real worry (and I don't think it is), then the right
answer is for physical BME to be turned on during locking while virtual
BME is left off. Virtual BME is created by the hypervisor/TSM telling
the IOMMU to block DMA. The guest OS should not participate in this
broken design: the hypervisor can set pBME automatically when the lock
request comes in, and the quality of vBME emulation is left up to the
implementation, but the implementation must provide at least a NOP vBME
once locked.

> Now, the nice thing about the scheme as proposed in this set is that
> userspace has all the time in the world between "lock" and "accept" to
> talk to a verifier.

Seems right to me. There should be NO trusted kernel driver bound until
the verifier accepts the attestation. Anything else allows unaccepted
devices to attack the kernel drivers. Few kernel drivers today treat
their HW interfaces as potentially hostile actors and defend against
them, so we should be very reluctant to bind drivers to anything.

Arguably a CC-secure kernel should have an allow list of audited secure
drivers that can autoprobe, and all other drivers must be approved by
userspace in some way, either through T=1 and attestation or some
customer-aware acceptance of the risk.
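To make the "existing T=0 drivers just work" point concrete, this is
the sort of unchanged driver-side DMA path I have in mind. foo_send()
is hypothetical; the DMA API calls are the standard ones, and whether
the mapping transparently bounces through shared SWIOTLB memory (T=0)
or maps private memory directly (T=1, device accepted) is decided by
how the core configured the DMA API for the device before the driver
ever bound:

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Hypothetical TX path: identical source for T=0 and T=1. */
static int foo_send(struct pci_dev *pdev, void *buf, size_t len)
{
        dma_addr_t dma;

        dma = dma_map_single(&pdev->dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(&pdev->dev, dma))
                return -ENOMEM;

        /* ... hand "dma" to the device and wait for it to finish ... */

        dma_unmap_single(&pdev->dev, dma, len, DMA_TO_DEVICE);
        return 0;
}

The driver cannot tell the difference and does not need to.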