Re: [PATCH v3 13/13] PCI/TSM: Add Guest TSM Support

Alexey Kardashevskiy <aik@xxxxxxx> · Tue, 10 Jun 2025 18:31:42 +1000






On 4/6/25 08:26, Dan Williams wrote:
Alexey Kardashevskiy wrote:
[..]
+static int pci_tsm_accept(struct pci_dev *pdev)
+{
+	struct pci_tsm_pf0 *tsm = to_pci_tsm_pf0(pdev->tsm);
+	int rc;
+
+	struct mutex *lock __free(tsm_ops_unlock) = tsm_ops_lock(tsm);

Add an empty line.

I think we, as a community, are still figuring out the coding-style
around scope-based cleanup declarations, but I would argue, no empty
line required after mid-function variable declarations. Now, in this
case it is arguably not "mid-function", but all the other occurrences of
tsm_ops_lock() are checking the result on the immediate next line.

Do not really care as much :)

Hey, what's kernel development without little side-arguments about
whitespace? Will leave it alone for now.

:)

+	if (!lock)
+		return -EINTR;
+
+	if (tsm->state < PCI_TSM_CONNECT)
+		return -ENXIO;
+	if (tsm->state >= PCI_TSM_ACCEPT)
+		return 0;
+
+	/*
+	 * "Accept" transitions a device to the run state, it is only suitable
+	 * to make that transition from a known DMA-idle (no active mappings)
+	 * state.  The "driver detached" state is a coarse way to assert that
+	 * requirement.

And then the userspace will modprobe the driver, which will enable BME
and MSE which in turn will render the ERROR state, what is the plan
here?

Right, so the notifier proposal [1] gave me pause because of perceived
complexity and my general reluctance to rely on the magic of notifiers
when an explicit sequence is easier to read/maintain. The proposal is
that drivers switch to TDISP aware setup helpers that understand that
BME and MSE were handled at LOCK. Becauase it is not just
pci_enable_device() / pci_set_master() awareness that is needed but also
pci_disable_device() / pci_clear_master() flows that need consideration
for avoiding/handling the TDISP ERROR state.

I.e. support for TDISP-unaware drivers is not a goal.

So your plan it to modify driver to switch to the secure mode on the
go? (not arguing, just asking for now)

Effectively, yes. In the non-TDISP case the driver handles the MSE+BME
transition, in the TDISP case the driver also effectively handles the
same as BME+MSE are superseded by the LOCKED state.

So TVM userspace is responsible for marking the device "accepted" and the
driver checks that state before enabling the device (LOCKED -> RUN).

This also allows for kernel debug overrides of the acceptance policy,
because, in the end, the Linux kernel trusts drivers. If the TVM owner
loads a driver that ignores the "accepted" bit, that is the owner's
prerogative. If the TVM owner does not trust a driver there are multiple
knobs under the TVM's control to mitigate that mistrust.

The alternative is - let TSM do the attestation and acceptance and
then "modprobe tdispawaredriver tdisp=on" and change the driver to
assume that BME and MSE are already enabled.

My heartburn with that is that there is an indefinite amount of time
whereby a device is MSE + BME active without any driver to deal with the
consequences.

(out of curiosity) AMD can block DMA until the guest decides it is ready and enabled IOMMU for the device, cannot TDX do the same?

And what is the consequence of MSE being enabled? It is in the guest's best interest to avoid touching MMIO before things are set up. p2p DMA?


For example, what if the device needs some form of reset /
re-initialization to quiet an engine or silence an interrupt that
immediately starts firing upon the LOCKED -> RUN transition.

The OS will have to ignore such interrupts, what is a problem with it?

Userspace
is not in a good position to make judgements about the state of the
device outside of the Interface Report.

There are still details to work out like supporting drivers that want to
stay loaded over the UNLOCKED->LOCKED->RUN transitions, and whether the
"accept" UAPI triggers entry into "RUN" or still requires a driver to
perform that.

Yes, I now think entry into "RUN" needs to be a driver triggered event
to maintain parity with the safety of the non-TDISP case.

[..]
@@ -135,6 +141,8 @@ struct pci_tsm_guest_req_info {
     * @bind: establish a secure binding with the TVM
     * @unbind: teardown the secure binding
     * @guest_req: handle the vendor specific requests from TVM when bound
+ * @accept: TVM-only operation to confirm that system policy allows
+ *	    device to access private memory and be mapped with private mmio.
     *
     * @probe and @remove run in pci_tsm_rwsem held for write context. All
     * other ops run under the @pdev->tsm->lock mutex and pci_tsm_rwsem held
@@ -150,6 +158,7 @@ struct pci_tsm_ops {
    	void (*unbind)(struct pci_tdi *tdi);
    	int (*guest_req)(struct pci_dev *pdev,
    			 struct pci_tsm_guest_req_info *info);
+	int (*accept)(struct pci_dev *pdev);


When I posted my v1, I got several comments to not put host and guest
callbacks together which makes sense (as only really "connect" and
"status" are shared, and I am not sure I like dual use of "connect")
and so did I in v2 and yet you are pushing for one struct for all?

Frankly, I missed that feedback and was focused on how to simply extend
PCI to understand TSM semantics.

That was literally you (and I think someone else mentioned it too) ;)

https://lore.kernel.org/all/66d7a10a4d621_3975294ac@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/

Ugh, yes, it seems that joke: "debugging is a murder mystery where you
find out you were the killer the whole time." can also be true for patch
review.>>> "Lets not mix HV and VM hooks in the same ops without good reason" and
I do not see a good reason here yet.

More to the point, the host and guest have very little in common to
have one ops struct for both and then deal with questions "do we
execute the code related to PF0 in the guest", etc.

Now that is a problem independent of the ops unification question. The
'struct pci_tsm_pf0' data-type should not be used for guest devices. I
will rework that to be a separate data-structure, but still keep
'pci_tsm_ops' unified since the signatures are identical.

My life definitely got easier with 2 separate structures and my split
to virt/coco/...(tsm-host.c|tsm-guest.c|tsm.c) + pci/tsm.c.

Here is the reason my thinking evolved from that comment. A primary goal
of drivers/pci/tsm.c is to give one "Device Security" lifetime model to
the PCI core. That means TSM driver discovery (host or guest) lights up
TEE I/O capabilities in the PCI topology. That supports "pci_tsm_ops +
mode flag" vs separate registration mechanisms for different ops.

I also am not perceiving the need for guest-specific ops beyond
->accept(), as part of what drove my reaction to that RFC proposal was
the quantity of proposed ops.


But this is all the guest will ever need, why allow possibility of (not) dealing with IDE/DOE in the guest? We will end up with "host-connect" and "guest-connect" when talking about this, having 2 types of bind (VFIO bind and TDI bind) is already confusing people whom I tell about this TSM business. And a global pointer, why... :(

"tvm_mode == !!tsm_ops->accept" - this kind of knob should really be compile-time imho.

Is it going to be one TSM driver for TDX host and guest, sharing measurable amount of code? I am definitely missing a bigger picture here. Thanks,


So today's "good reason" is the useful programming pattern of "push
complexity from core-to-leaf". Where the low-level TSM driver needs to
be "mode" aware for some operations.


--
Alexey