Re: [PATCH 3/7] device core: Introduce confidential device acceptance

<dan.j.williams@xxxxxxxxx> · Thu, 28 Aug 2025 13:07:50 -0700

Greg KH wrote:
> On Tue, Aug 26, 2025 at 08:52:55PM -0700, Dan Williams wrote:
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -98,6 +98,8 @@ struct driver_private {
> >   *	the device; typically because it depends on another driver getting
> >   *	probed first.
> >   * @async_driver - pointer to device driver awaiting probe via async_probe
> > + * @cc_accepted - track the TEE acceptance state of the device for deferred
> > + *	probing, MMIO mapping type, and SWIOTLB bypass for private memory DMA.
> >   * @device - pointer back to the struct device that this structure is
> >   * associated with.
> >   * @dead - This device is currently either in the process of or has been
> > @@ -115,6 +117,9 @@ struct device_private {
> >  	struct list_head deferred_probe;
> >  	const struct device_driver *async_driver;
> >  	char *deferred_probe_reason;
> > +#ifdef CONFIG_CONFIDENTIAL_DEVICES
> > +	bool cc_accepted;
> > +#endif
> >  	struct device *device;
> >  	u8 dead:1;
> 
> Why did you not just use another u8:1 at the end?  You kind of added a
> big hole in the structure that is created for every device :(

It was the concern for colliding bitfield updates. I will go audit if
all ->dead and ->cc_accepted updates can be guaranteed to be ordered by
the device_lock(), or otherwise put this bool in a better place in the
struct to not cause a hole.

> >  };
> > diff --git a/drivers/base/coco.c b/drivers/base/coco.c
> > new file mode 100644
> > index 000000000000..97c22d0e9247
> > --- /dev/null
> > +++ b/drivers/base/coco.c
> > @@ -0,0 +1,96 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> No copyright at the top?  Bold :)

Will fix.

Hanlon's razor v2, "never attribute to boldness..."

> > +#include <linux/device.h>
> > +#include <linux/dev_printk.h>
> > +#include <linux/lockdep.h>
> > +#include "base.h"
> > +
> > +/*
> > + * Confidential devices implement encrypted + integrity protected MMIO and have
> > + * the ability to issue DMA to encrypted + integrity protected System RAM. The
> > + * device_cc_*() helpers aid buses in setting the acceptance state, drivers in
> > + * preparing and probing the acceptance state, and other kernel subsystem in
> > + * augmenting behavior in the presence of accepted devices (e.g.
> > + * ioremap_encrypted()).
> > + */
> > +
> > +/**
> > + * device_cc_accept(): Mark a device as accepted for TEE operation
> 
> What does TEE mean here?  I feel you mix "confidential" and TEE a bunch.

The distinction in my mind of merely Confidential and TEE acceptance, is
that a Trusted Execution Environment implies a larger attestation
infrastructure beyond just "are the bits on the wire protected by AES
XTS". A TEE has a launch state attestation for the TVM a vTPM or other
Runtime Measurement Scheme to have a relying party validate that changes
to the TVM do not violate expectations of the TEE, and in this case
"PCIe Device Acceptance" is one more event for the TEE to validate with
a relying party. The confidential device mechanism is just a property
that allows the TEE to maintain its confidentiality and data integrity
assumptions.

I will include a form of that commentary in v2 because it is an
important distinction.

[..]
> > +/**
> > + * device_cc_probe(): Coordinate dynamic acceptance with a device driver
> > + * @dev: device to defer probing while acceptance pending
> > + *
> > + * Dynamically accepted devices may need a driver to perform initial
> > + * configuration to get the device into a state where it can be accepted. Use
> > + * this helper to exit driver probe at that partial device-init point and log
> > + * this TEE acceptance specific deferral reason.
> > + *
> > + * This is an exported helper for device drivers that need to coordinate device
> > + * configuration state and acceptance.
> > + */
> > +int device_cc_probe(struct device *dev)
> > +{
> > +	/*
> > +	 * See work_on_cpu() in local_pci_probe() for one reason why
> > +	 * lockdep_assert_held() can not be used here.
> > +	 */
> > +	WARN_ON_ONCE(!mutex_is_locked(&dev->mutex));
> 
> If not locked you just keep going?  Why not return an error?

It is ok to keep going because this is a warning that should only fire
on a kernel developer workstation when they have somehow messed up a
bus's probe implementation. It is more for documentation purposes like
lockdep_assert_held(). Maybe a lockdep_assert_remote_held() for this
work_on_cpu() case where the thread holding the lock is also in charge
of flushing work_on_cpu()?

Either way, this race fails safely. If the driver proceeds with falsely
believing the device is accepted the hardware will throw errors on MMIO
cycles, and fail to issue DMA. Failure in the other direction just means
the driver fall into deferred probing unnecessarily.