Re: [PATCH v3 13/13] PCI/TSM: Add Guest TSM Support

<dan.j.williams@xxxxxxxxx> · Fri, 11 Jul 2025 16:04:27 -0700

Alexey Kardashevskiy wrote:
> 
> 
> On 4/6/25 08:26, Dan Williams wrote:
> > Alexey Kardashevskiy wrote:
> > [..]
> >>>>> +static int pci_tsm_accept(struct pci_dev *pdev)
> >>>>> +{
> >>>>> +	struct pci_tsm_pf0 *tsm = to_pci_tsm_pf0(pdev->tsm);
> >>>>> +	int rc;
> >>>>> +
> >>>>> +	struct mutex *lock __free(tsm_ops_unlock) = tsm_ops_lock(tsm);
> >>>>
> >>>> Add an empty line.
> >>>
> >>> I think we, as a community, are still figuring out the coding-style
> >>> around scope-based cleanup declarations, but I would argue, no empty
> >>> line required after mid-function variable declarations. Now, in this
> >>> case it is arguably not "mid-function", but all the other occurrences of
> >>> tsm_ops_lock() are checking the result on the immediate next line.
> >>
> >> Do not really care as much :)
> > 
> > Hey, what's kernel development without little side-arguments about
> > whitespace? Will leave it alone for now.
> 
> :)
> 
> >>>>> +	if (!lock)
> >>>>> +		return -EINTR;
> >>>>> +
> >>>>> +	if (tsm->state < PCI_TSM_CONNECT)
> >>>>> +		return -ENXIO;
> >>>>> +	if (tsm->state >= PCI_TSM_ACCEPT)
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * "Accept" transitions a device to the run state, it is only suitable
> >>>>> +	 * to make that transition from a known DMA-idle (no active mappings)
> >>>>> +	 * state.  The "driver detached" state is a coarse way to assert that
> >>>>> +	 * requirement.
> >>>>
> >>>> And then the userspace will modprobe the driver, which will enable BME
> >>>> and MSE which in turn will render the ERROR state, what is the plan
> >>>> here?
> >>>
> >>> Right, so the notifier proposal [1] gave me pause because of perceived
> >>> complexity and my general reluctance to rely on the magic of notifiers
> >>> when an explicit sequence is easier to read/maintain. The proposal is
> >>> that drivers switch to TDISP aware setup helpers that understand that
> >>> BME and MSE were handled at LOCK. Becauase it is not just
> >>> pci_enable_device() / pci_set_master() awareness that is needed but also
> >>> pci_disable_device() / pci_clear_master() flows that need consideration
> >>> for avoiding/handling the TDISP ERROR state.
> >>>
> >>> I.e. support for TDISP-unaware drivers is not a goal.
> >>
> >> So your plan it to modify driver to switch to the secure mode on the
> >> go? (not arguing, just asking for now)
> > 
> > Effectively, yes. In the non-TDISP case the driver handles the MSE+BME
> > transition, in the TDISP case the driver also effectively handles the
> > same as BME+MSE are superseded by the LOCKED state.
> > 
> > So TVM userspace is responsible for marking the device "accepted" and the
> > driver checks that state before enabling the device (LOCKED -> RUN).
> > 
> > This also allows for kernel debug overrides of the acceptance policy,
> > because, in the end, the Linux kernel trusts drivers. If the TVM owner
> > loads a driver that ignores the "accepted" bit, that is the owner's
> > prerogative. If the TVM owner does not trust a driver there are multiple
> > knobs under the TVM's control to mitigate that mistrust.
> > 
> >> The alternative is - let TSM do the attestation and acceptance and
> >> then "modprobe tdispawaredriver tdisp=on" and change the driver to
> >> assume that BME and MSE are already enabled.
> > 
> > My heartburn with that is that there is an indefinite amount of time
> > whereby a device is MSE + BME active without any driver to deal with the
> > consequences.
> 
> (out of curiosity) AMD can block DMA until the guest decides it is ready and enabled IOMMU for the device, cannot TDX do the same?

Circling back to this as I go to refresh this series...

I am not worried about the host protecting itself, I am worried about
diverging from the model where the device is expected to not be active
while a driver is detached. If the device is enabled to, for example,
trigger platform errors (platform self protection from unauthorized DMA
/ interrupts), that is a difference from the non-CC case.

So this is more about symmetry of behavior with the typical non-CC case
where PCI devices are not issuing or accepting bus cycles while a driver
is detached.

> And what is the consequence of MSE being enabled?

I think the problem is reversed with MSE, you *want* errors to happen
when the driver is detached which is what happens in the non-CC case.

> It is in the guest's best interest to avoid touching MMIO before
> things are set up. p2p DMA?

Yes.

> > For example, what if the device needs some form of reset /
> > re-initialization to quiet an engine or silence an interrupt that
> > immediately starts firing upon the LOCKED -> RUN transition.
> 
> The OS will have to ignore such interrupts, what is a problem with it?

I don't know, but I do know that it would a bug to fix in the non-CC
case. I.e. it is a problem that need not be introduced by maintaining
"driver is in control of MSE+BME activation" (LOCKED->RUN transition).

> > Userspace
> > is not in a good position to make judgements about the state of the
> > device outside of the Interface Report.
> > 
> >>> There are still details to work out like supporting drivers that want to
> >>> stay loaded over the UNLOCKED->LOCKED->RUN transitions, and whether the
> >>> "accept" UAPI triggers entry into "RUN" or still requires a driver to
> >>> perform that.
> > 
> > Yes, I now think entry into "RUN" needs to be a driver triggered event
> > to maintain parity with the safety of the non-TDISP case.
> > 
> > [..]
> >>>>> @@ -135,6 +141,8 @@ struct pci_tsm_guest_req_info {
> >>>>>      * @bind: establish a secure binding with the TVM
> >>>>>      * @unbind: teardown the secure binding
> >>>>>      * @guest_req: handle the vendor specific requests from TVM when bound
> >>>>> + * @accept: TVM-only operation to confirm that system policy allows
> >>>>> + *	    device to access private memory and be mapped with private mmio.
> >>>>>      *
> >>>>>      * @probe and @remove run in pci_tsm_rwsem held for write context. All
> >>>>>      * other ops run under the @pdev->tsm->lock mutex and pci_tsm_rwsem held
> >>>>> @@ -150,6 +158,7 @@ struct pci_tsm_ops {
> >>>>>     	void (*unbind)(struct pci_tdi *tdi);
> >>>>>     	int (*guest_req)(struct pci_dev *pdev,
> >>>>>     			 struct pci_tsm_guest_req_info *info);
> >>>>> +	int (*accept)(struct pci_dev *pdev);
> >>>>
> >>>>
> >>>> When I posted my v1, I got several comments to not put host and guest
> >>>> callbacks together which makes sense (as only really "connect" and
> >>>> "status" are shared, and I am not sure I like dual use of "connect")
> >>>> and so did I in v2 and yet you are pushing for one struct for all?
> >>>
> >>> Frankly, I missed that feedback and was focused on how to simply extend
> >>> PCI to understand TSM semantics.
> >>
> >> That was literally you (and I think someone else mentioned it too) ;)
> >>
> >> https://lore.kernel.org/all/66d7a10a4d621_3975294ac@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > 
> > Ugh, yes, it seems that joke: "debugging is a murder mystery where you
> > find out you were the killer the whole time." can also be true for patch
> > review.>>> "Lets not mix HV and VM hooks in the same ops without good reason" and
> >> I do not see a good reason here yet.
> >>
> >> More to the point, the host and guest have very little in common to
> >> have one ops struct for both and then deal with questions "do we
> >> execute the code related to PF0 in the guest", etc.
> > 
> > Now that is a problem independent of the ops unification question. The
> > 'struct pci_tsm_pf0' data-type should not be used for guest devices. I
> > will rework that to be a separate data-structure, but still keep
> > 'pci_tsm_ops' unified since the signatures are identical.
> > 
> >> My life definitely got easier with 2 separate structures and my split
> >> to virt/coco/...(tsm-host.c|tsm-guest.c|tsm.c) + pci/tsm.c.
> > 
> > Here is the reason my thinking evolved from that comment. A primary goal
> > of drivers/pci/tsm.c is to give one "Device Security" lifetime model to
> > the PCI core. That means TSM driver discovery (host or guest) lights up
> > TEE I/O capabilities in the PCI topology. That supports "pci_tsm_ops +
> > mode flag" vs separate registration mechanisms for different ops.
> > 
> > I also am not perceiving the need for guest-specific ops beyond
> > ->accept(), as part of what drove my reaction to that RFC proposal was
> > the quantity of proposed ops.
> 
> 
> But this is all the guest will ever need, why allow possibility of
> (not) dealing with IDE/DOE in the guest? We will end up with
> "host-connect" and "guest-connect" when talking about this, having 2
> types of bind (VFIO bind and TDI bind) is already confusing people
> whom I tell about this TSM business. And a global pointer, why... :(
> 
> "tvm_mode == !!tsm_ops->accept" - this kind of knob should really be
> compile-time imho.
> 
> Is it going to be one TSM driver for TDX host and guest, sharing
> measurable amount of code? I am definitely missing a bigger picture
> here. Thanks,

FWIW between this and Aneesh's comments about the different ops needed
for guest vs host side I am going to separate them in the next version.