On Fri, Aug 01, 2025 at 04:04:39PM -0700, Chris Li wrote:

> My philosophy is that the LUO PCI subsystem is for service of the PCI
> device driver. Ultimately it is the PCI device driver who decides what
> part of the config space they want to preserve or overwrite. The PCI
> layer is just there to facilitate that service.

I don't think this makes any sense at all. There is nothing the device
driver can contribute here.

> If you still think it is unjustifiable to have one test try to
> preserve all config space for liveupdate.

I do think it is unjustifiable, it is architecturally wrong. You
should only be preserving the absolute bare minimum of config space
bits and everything else should be rewritten by the next kernel in the
normal way.

This MSI is a prime example of a nonsensical outcome if you take the
position the config space should not be written to.

> > Only some config accesses are bad. Each and every "bad" one needs to be
> > clearly explained *why* it is bad and only then mitigated.
>
> That is exactly the reason why we have the conservative test that
> preserves every config space test as a starting point.

That is completely the opposite of what I said. Preserving everything
is giving up on the harder job of identifying which bits cannot be
changed, explaining why they can't be changed, and then mitigating
only those things.

> Another constraint is that the data center servers are dependent on
> the network device able to connect to the network appropriately. Take
> diorite NIC for example, if I try only preserving ATS/PASID did not
> finish the rest of liveupdate, the nic wasn't able to boot up and
> connect to the network all the way. Even if the test passes for the
> ATS part, the over test fails because the server is not back online. I
> can't include that test into the test dashboard, because it brings
> down the server. The only way to recover from that is rebooting the
> server, which takes a long time for a big server. I can only keep that
> non-passing test as my own private developing test, not the regression
> test set.

I have no idea what this is trying to say and it sounds like you also
can't explain exactly what is "wrong" and justify why things are being
preserved.

Again, your series should be starting simpler. Preserve the dumbest
simplest PCI configuration. Certainly no switches, P2P, ATS or PASID.

When that is working you can then add on more complex PCI features
piece by piece.

Jason