Re: [PATCH 6/7] samples/devsec: Introduce a "Device Security TSM" sample driver

Jason Gunthorpe <jgg@xxxxxxxxxx> · Fri, 29 Aug 2025 20:34:53 -0300

On Fri, Aug 29, 2025 at 01:00:09PM -0700, dan.j.williams@xxxxxxxxx wrote:
> Jason Gunthorpe wrote:
> > On Thu, Aug 28, 2025 at 02:38:14PM -0700, dan.j.williams@xxxxxxxxx wrote:
> > > > device_cc_probe() doesn't save anything, doesn't this just get into an
> > > > endless loop of EPROBE_DEFER? Usually the kernel will retry these
> > > > things during booting?
> > > 
> > > Hmm, no, deferred probing retriggers after a one-time boot timeout
> > > (extended by driver registration events) and after any device
> > > successfully completes probe.
> > 
> > So it is not "endless" but it is also not "single probe then wait till
> > accept". I'm not keen on using this mechanism, I think the things
> > people want to do in the T=0 mode are going to be time consuming and
> > repeatedly doing that time consuming step is not a good idea.
> 
> It would only ever run multiple times if the driver is built-in or
> loaded early, which is also mitigated by disabling autoprobe like you
> have below. So that problem is manageable.

There can be many TDISP devices loading after boot, so the deferred
probe could rerun many times while booting is still in progress. I'm
imagining that some real systems will have around 8-15 TDISP devices.
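A toy userspace model (not the driver core's actual code) makes the cost concrete. It applies only the retrigger rule quoted above, "any successful probe re-runs every deferred probe", and assumes, hypothetically, that each TDISP device defers on every attempt because it is not yet accepted, and that each attempt redoes the expensive T=0 preparation step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of deferred-probe retriggering. This is NOT the logic in
 * drivers/base/dd.c; it only applies the rule "a successful probe
 * re-runs all deferred probes" to a made-up boot ordering.
 */
struct toy_dev {
	bool tdisp;            /* defers on every attempt in this model */
	bool bound;
	unsigned int attempts; /* expensive prepare steps executed */
};

/* Returns true on a successful probe; TDISP devices always defer. */
static bool toy_probe(struct toy_dev *d)
{
	d->attempts++;
	if (d->tdisp)
		return false; /* stands in for -EPROBE_DEFER */
	d->bound = true;
	return true;
}

/* Probe in order; return total prepare steps run by TDISP devices. */
static unsigned int toy_boot(struct toy_dev *devs, size_t n)
{
	unsigned int total = 0;

	assert(devs || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!toy_probe(&devs[i]))
			continue;
		/* Success: retry every earlier device that is still deferred. */
		for (size_t j = 0; j < i; j++)
			if (devs[j].tdisp && !devs[j].bound)
				toy_probe(&devs[j]);
	}
	for (size_t i = 0; i < n; i++)
		if (devs[i].tdisp)
			total += devs[i].attempts;
	return total;
}
```

With 12 TDISP devices probed ahead of 20 ordinary ones, each TDISP device's preparation runs 21 times, 252 runs in total. The device counts are invented for illustration; the point is that the cost grows with the product of deferred devices and later successful probes.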

> In the past this idea has been met with "but but typical distro kernels
> have lots of built-in drivers that *may* be unsafe", and the answer is
> "yes, a VM image with a CC aware / specific kernel config is a
> requirement".

Yeah, possibly that is where this is going. Or at least someone on the
distro side is well teed up to propose some kind of pre-initrd
mechanism to mitigate this down the road and get back to single kernel
build.

> >    Maybe we also need a small kernel change to allow userspace to make
> >    drivers_autoprobe false for all future busses too.
> 
> I do think we need a mechanism to say, "no more dynamic device
> enumeration", but a coarse and future promise "no autoprobe of any bus"
> I fear is going to have a long tail of problems especially with design
> patterns like "faux_device" and "auxiliary_device".

For the moment I would probably just have userspace special case
those and automatically run the userspace probing sequence.
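The existing per-bus sysfs knobs are enough to sketch that userspace side today: writing `0` to `/sys/bus/<bus>/drivers_autoprobe` stops automatic binding for devices added afterwards, and writing a device name to `/sys/bus/<bus>/drivers_probe` asks the driver core to match and bind that one device, which is how internal busses like auxiliary could be special cased. The helper names and the `root` parameter (normally `"/sys"`, parameterized only so the sketch can run in a scratch directory) are made up; the "default off for all future busses" knob discussed above does not exist and is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
/* mkdtemp() and mkdir(), for exercising the sketch in a scratch dir */
#include <stdlib.h>
#include <sys/stat.h>

/* Write 'val' to <root>/bus/<bus>/<attr>; returns 0 or -1. */
static int bus_attr_write(const char *root, const char *bus,
			  const char *attr, const char *val)
{
	char path[512];
	int len, ok;
	FILE *f;

	assert(root && bus && attr && val);
	len = snprintf(path, sizeof(path), "%s/bus/%s/%s", root, bus, attr);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	/* sysfs reports store errors when the buffer is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Stop automatic driver binding for devices added to 'bus' from now on. */
static int disable_autoprobe(const char *root, const char *bus)
{
	return bus_attr_write(root, bus, "drivers_autoprobe", "0");
}

/* Explicitly run driver matching for one named device on 'bus'. */
static int probe_device(const char *root, const char *bus, const char *dev)
{
	return bus_attr_write(root, bus, "drivers_probe", dev);
}
```

For example, a udev rule helper could call `probe_device("/sys", "auxiliary", "mlx5_core.eth.0")` for kernel-internal devices while leaving physical PCI devices unbound; the device name there is illustrative.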

> As far as I understand, these CC environments do not immediately have
> secrets to protect at launch. Also, not sure how many are ready to
> validate the launch state of the TVM that early. 

It is not about secrets, it is about protecting the integrity of the
kernel - the software you intend to load secrets into. mlx5 is a 300k
LOC driver. I fully believe that an attacker pretending to be the
device can attack the driver and insert hostile code into the kernel
using this driver.

Because such an attack escapes measurement, it is completely
invisible. The only prevention is to control which parts of the kernel
the VMM side can reach and attack, by denying driver binding.

If secrets are then released to a kernel running hostile code, that
code can exfiltrate them back to the VMM.

It is the same argument we see MS making about Secure Boot: you have
to take steps to ensure that unmeasured code is never injected into
the system before you complete the boot and release the secrets.

From this viewpoint, any compromise that allows unmeasured code into
the boot chain is a security issue.

The same argument applies to T=1 devices. I imagine an attack where
the VM accepts a T=1 device, and it instantly DMAs all over the kernel
and effectively makes itself invisible to the verifier. Hopefully this
is prevented by measurements made by the TSM, but I don't know; it
seems scary.

However, I know there are alternative views. For instance that CC VM
users should just trust the CSP, trust their boot flow, trust their
provided VM kernels, trust their verifiers, and if you are already
agreeing to that trust then defending against a hostile VMM is silly.

Honestly, I don't know where the industry will end up; I see people on both
sides of this debate pushing for their perspective. I'd like the
kernel to be happy with a userspace that wants to trust the VMM and a
userspace that is untrusting and very paranoid.

> I think it is more a case of allow everything by default to start
> (whatever is in ACPI, and T=0 PCI devices). Later the relying party
> either says "no, you have enumerated devices that should not be
> there", or "yes, launch state looks good, lock device topology,
> proceed with the performance enhancement of converting some PCI
> TDISP devices to T=1 operation, here are your secrets".

This is really the "we trust the VMM" viewpoint from above, and
from a kernel perspective I think it is fine so long as userspace is
the one making the decision to work like that. I don't want to see the
kernel force the weakest security option onto the userspace.
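The "lock and validate" step in Dan's flow amounts to comparing what was enumerated against what the relying party expected. A minimal sketch, assuming a hypothetical manifest of PCI device names; a real verifier would compare measured device reports, not just names, and the function and struct names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Outcome of comparing enumerated devices against an expected manifest. */
struct topo_result {
	size_t unexpected; /* enumerated but not in the manifest */
	size_t missing;    /* in the manifest but not enumerated */
};

static bool in_list(const char *name, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, list[i]) == 0)
			return true;
	return false;
}

/* Hypothetical relying-party check over device names (e.g. PCI BDFs). */
static struct topo_result check_topology(const char *const *seen, size_t nseen,
					 const char *const *expect, size_t nexpect)
{
	struct topo_result r = { 0, 0 };

	assert(seen || nseen == 0);
	assert(expect || nexpect == 0);
	for (size_t i = 0; i < nseen; i++)
		if (!in_list(seen[i], expect, nexpect))
			r.unexpected++;
	for (size_t i = 0; i < nexpect; i++)
		if (!in_list(expect[i], seen, nseen))
			r.missing++;
	return r;
}

/* Only lock the topology and release secrets on an exact match. */
static bool topology_acceptable(struct topo_result r)
{
	return r.unexpected == 0 && r.missing == 0;
}
```

Reporting extra and missing devices separately leaves room for a policy that tolerates a missing device but never an unexpected one.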

IMHO the minimal issue here is what the kernel should do with a T=0
device that has TDISP capability.

We don't really want the kernel to autobind a driver in T=0 mode, that
is wasteful if we are going to unbind it, lock/run and then bind it
again.
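The lock/run sequence follows the TDISP interface states (CONFIG_UNLOCKED, CONFIG_LOCKED, RUN, ERROR). The simplified sketch below abbreviates the LOCK_INTERFACE_REQUEST, START_INTERFACE_REQUEST and STOP_INTERFACE_REQUEST messages, and models rejected requests as leaving the state unchanged; consult the PCIe TDISP specification for the exact rules. The `tdi_bind_allowed()` helper is hypothetical and encodes the goal that the driver binds once, in RUN, after the interface report is verified, instead of binding in T=0 and rebinding later.

```c
#include <assert.h>
#include <stdbool.h>

/* Interface states named in the PCIe TDISP specification (simplified). */
enum tdi_state { TDI_CONFIG_UNLOCKED, TDI_CONFIG_LOCKED, TDI_RUN, TDI_ERROR };

/* The state-changing requests used in the accept flow. */
enum tdi_req { TDI_LOCK_INTERFACE, TDI_START_INTERFACE, TDI_STOP_INTERFACE };

/* Apply a request; returns false (state unchanged) if it is invalid here. */
static bool tdi_request(enum tdi_state *s, enum tdi_req r)
{
	assert(s);
	switch (r) {
	case TDI_STOP_INTERFACE:
		*s = TDI_CONFIG_UNLOCKED; /* STOP is accepted from any state */
		return true;
	case TDI_LOCK_INTERFACE:
		if (*s != TDI_CONFIG_UNLOCKED)
			return false;
		*s = TDI_CONFIG_LOCKED;
		return true;
	case TDI_START_INTERFACE:
		if (*s != TDI_CONFIG_LOCKED)
			return false;
		*s = TDI_RUN;
		return true;
	}
	return false;
}

/* Security-relevant reconfiguration while locked or running is fatal. */
static void tdi_config_changed(enum tdi_state *s)
{
	assert(s);
	if (*s == TDI_CONFIG_LOCKED || *s == TDI_RUN)
		*s = TDI_ERROR;
}

/* Bind the driver only in RUN and only after the report was verified. */
static bool tdi_bind_allowed(enum tdi_state s, bool report_verified)
{
	return s == TDI_RUN && report_verified;
}
```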

So, IMHO, the bare minimum would be for the kernel to disable auto
binding for TDISP capable devices only and shoot out a udev event
signaling that userspace has to bind the device instead.

Let udev take it from there, and udev can then do whatever dance we
define.
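As a rough model of that bare minimum, the device-add decision reduces to one branch: TDISP-capable devices are never auto-bound and instead generate an event for udev. Everything here is hypothetical, including the struct, the function, and the `NEEDS_USERSPACE_BIND` key; real uevents carry NUL-separated KEY=VALUE pairs rather than the single string used for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical facts the bus core knows when a device is added. */
struct toy_pci_dev {
	const char *name;
	bool tdisp_capable;
};

enum add_action { ADD_AUTOBIND, ADD_NOTIFY_USERSPACE };

/*
 * Minimum policy proposed in the thread: never auto-bind a TDISP-capable
 * device, emit an event so udev decides; other devices are unchanged.
 */
static enum add_action on_device_add(const struct toy_pci_dev *d,
				     char *event, size_t len)
{
	assert(d && d->name && event && len > 0);
	if (!d->tdisp_capable) {
		event[0] = '\0'; /* no extra event, normal autobind */
		return ADD_AUTOBIND;
	}
	/* Hypothetical payload; the real key name is undecided. */
	snprintf(event, len, "ACTION=add DEVNAME=%s NEEDS_USERSPACE_BIND=1",
		 d->name);
	return ADD_NOTIFY_USERSPACE;
}
```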

Then we can have everything from a minimal security posture to a very
tight drivers_autoprobe situation, based on what userspace wants to
do.

> >      mlx5 is allowed to bind to a RUN device after measuring and
> >      verifying it, and never otherwise.
> 
> ...and if userspace binds mlx5 pre-RUN that is not the kernel's problem.
> I state that explicitly not for you, but because of the rejection of the
> "device filter" in-kernel mechanism previously.

Right. I am stating the system level goal, expecting that userspace is
in control and conforming to it. The kernel just has to not bypass the
policy choices userspace is making.

> >    Basically userspace policy is entirely in control if a device is
> >    "accepted" by the ccVM or not. The kernel won't auto bind
> >    a driver to a physical device. It would be driven off of
> >    uevents, I guess through new CC focused features in udev.
> 
> Yes, the only quibble is whether that "kernel won't bind" is more a
> "userspace shall lock and validate device topology" at a certain point
> in the boot flow. Userspace may need to be prepared for some unaccepted
> devices to bind before that point.

My argument is that "lock and validate" is a fine option, but the
kernel should be designed to allow the more secure option of "approve
every single driver bind". Userspace can pick, but the kernel should
be designed to support both.
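Both postures can be expressed as a single userspace decision function, provided the kernel never binds behind userspace's back. This is a sketch with invented names: "lock and validate" is permissive until the lock point and then admits only the validated set, while "approve every bind" admits nothing without an explicit per-device approval.

```c
#include <assert.h>
#include <stdbool.h>

enum cc_policy { POLICY_LOCK_AND_VALIDATE, POLICY_APPROVE_EACH_BIND };

/* Hypothetical facts about one pending driver bind. */
struct bind_req {
	bool after_lock;    /* request arrives after topology was locked */
	bool in_locked_set; /* device was part of the validated topology */
	bool approved;      /* userspace explicitly approved this bind */
};

static bool may_bind(enum cc_policy p, const struct bind_req *r)
{
	assert(r);
	switch (p) {
	case POLICY_LOCK_AND_VALIDATE:
		/* Permissive until the lock point, then validated set only. */
		return !r->after_lock || r->in_locked_set;
	case POLICY_APPROVE_EACH_BIND:
		/* Paranoid: nothing binds without a per-device decision. */
		return r->approved;
	}
	return false;
}
```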

> The kernel problems to solve are "accepted" flag and maybe documenting
> to driver writers / udev developers strategies to handle the "prepare"
> problem.

Yes, and maybe some small less-critical kernel items:

 - modules.alias includes the driver name
 - A way to default off drivers_autoprobe
 - A way for userspace to tell which busses are discovered from
   HW vs. internal to the kernel (aux, faux)
 - ccprepare_ drivers
 - A way to restrict built in drivers at initrd creation time

But I think each of these topics can be its own independent thing, and
I would send them alongside RFC patches for udev if that is how
things are going to go.
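On the first item: `modules.alias` today maps modalias globs to *module* names, in lines of the form `alias <glob> <module>`, and a bind-by-userspace flow would also want the name the driver registers on the bus. A minimal lookup sketch using POSIX `fnmatch()`, roughly the kind of wildcard match kmod performs; the function name is invented and the sample alias in the usage note is illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan "alias <glob> <module>" lines and copy the first module whose
 * glob matches 'modalias' into 'out'. Returns 0 on a match, -1 if none.
 * The result is a module name, not necessarily the bus driver name.
 */
static int alias_lookup(FILE *aliases, const char *modalias,
			char *out, size_t outlen)
{
	char line[1024], glob[768], module[256];

	assert(aliases && modalias && out && outlen > 0);
	while (fgets(line, sizeof(line), aliases)) {
		if (sscanf(line, "alias %767s %255s", glob, module) != 2)
			continue; /* skip comments and blank lines */
		if (fnmatch(glob, modalias, 0) == 0) {
			snprintf(out, outlen, "%s", module);
			return 0;
		}
	}
	return -1;
}
```

For instance, a line such as `alias pci:v000015B3d0000101Bsv*sd*bc*sc*i* mlx5_core` resolves the corresponding PCI modalias to `mlx5_core`, but nothing in the file says which `/sys/bus/pci/drivers/<name>/bind` to write to when the names differ.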

> For RAS I do still like the property of a driver that will field errors
> also having everything it needs to take a device from reset back to the
> ready-to-accept state. That can be solved later, and maybe the outcome 
> is "cc_prepare" is incompatible with "recovery".

Yeah, probably.

> > Sure, I think you should drop this patch from this series and have this
> > series focus only on creating an accepted struct device environment
> > that a driver can bind to and operate.
> 
> You mean drop the device_cc_probe() piece. The rest of patch is starting
> the work of a "accepted struct device environment" with a single flag
> that MMIO and DMA infrastructure can reference.

Yes, sorry, I forgot which patch this was :)
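A sketch of how MMIO and DMA infrastructure might consult such a flag, modelled in userspace with a hypothetical `cc_accepted` field (the patch's real field name is not given here). It assumes what CoCo guests do today for untrusted devices, bouncing DMA through shared memory, and that an accepted T=1 device may instead use private memory and private MMIO mappings, which is the performance gain of converting to T=1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single "accepted" flag on a device. */
struct toy_device {
	bool cc_accepted; /* set only once userspace accepts the device */
};

enum dma_path { DMA_BOUNCE_SHARED, DMA_DIRECT_PRIVATE };

/* Unaccepted (T=0) devices only see shared memory, so DMA is bounced. */
static enum dma_path choose_dma_path(const struct toy_device *dev)
{
	assert(dev);
	return dev->cc_accepted ? DMA_DIRECT_PRIVATE : DMA_BOUNCE_SHARED;
}

/* Map a device's MMIO as private only once the device is accepted. */
static bool mmio_map_private(const struct toy_device *dev)
{
	assert(dev);
	return dev->cc_accepted;
}
```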

> It is trivial for a driver to open code EPROBE_DEFER so
> device_cc_probe() is not putting any burden on the kernel besides
> documentation, but I will drop it for now.

Sort of, it also establishes a kind of uAPI that I think is best
avoided until things are a bit more mature..
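For comparison, open coding the wait in a driver keeps everything inside that driver's probe() and adds no new core interface. A userspace model, not the patch's `device_cc_probe()`: `EPROBE_DEFER` is defined locally with the kernel's internal value, and `cc_accepted` is a hypothetical field name.

```c
#include <assert.h>
#include <stdbool.h>

#define EPROBE_DEFER 517 /* same value as the kernel-internal errno */

/* Hypothetical stand-in for the device's "accepted" flag. */
struct toy_device {
	bool cc_accepted;
};

/* Defer until the device is accepted; the driver core retries later. */
static int toy_probe(struct toy_device *dev)
{
	assert(dev);
	if (!dev->cc_accepted)
		return -EPROBE_DEFER;
	/* normal T=1 initialisation would happen here */
	return 0;
}
```

The trade-off is the one discussed at the top of the thread: each deferral means another probe attempt later, so any expensive preparation should happen after the accepted check, not before it.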

Jason