On Fri, 25 Apr 2025 16:12:26 +0200
Karolina Stolarek <karolina.stolarek@xxxxxxxxxx> wrote:

> On 25/04/2025 15:14, Jonathan Cameron wrote:
> > On Fri, 25 Apr 2025 12:32:10 +0200
> > Karolina Stolarek <karolina.stolarek@xxxxxxxxxx> wrote:
> >>
> >> It's possible that some of the nuances of this escaped me. I decided to
> >> pick up the series, as I saw "PCI Express bus error injection via GHES"
> >> script and thought it might be useful.
> >
> > With Mauro's series you can inject (on ARM64 virt) any CPER record you
> > like. That doesn't synchronize the wider state of the system though
> > so may not exercise everything (PCI registers etc not updated as it
> > is only injecting the record). Mostly it just works, as remarkably
> > few error handlers actually take the state of the components on which
> > the error is reported into account.
>
> OK, that means even if we manage to inject a PCIe error, AER wouldn't be
> able to look up the Source ID and other values it needs to report an
> error, which is not quite the solution I was looking for.

Isn't the source ID in the CPER record? (Device ID field) or do you mean
something else?

>
> > The aim is specifically to allow exercising FW first error handling
> > paths because it's a pain to get real systems that have firmware to inject
> > the full range of what the kernel etc need to handle.
>
> Does this include PCIe errors? If so, that probably doesn't make sense
> to try to test my patch on an actual system?

Ideally test it on a real system as well, but indeed the intent is to allow
testing of PCI errors on emulation.

>
> > x86 support for emulated injection is a work in progress (more of a mess wrt
> > to the different ways the event signaling is handled than it is on arm64).
> >
> > I did have an earlier version of that work wired up to the same
> > hooks as the native CXL error injection but I dropped it from my QEMU
> > CXL staging tree for now as it was a pain to rebase whilst Mauro was rapidly
> > revising the infrastructure. I'll bring it back when I get time.
>
> I understand, I saw some of your series while looking for ways to test
> my patch. Thank you very much for your work. As you can see, there are
> people actually looking forward to it :)

Great! I'll try and get back to wiring it all up again sometime soon.

Jonathan

>
>
> All the best,
> Karolina
>
> >
> > Jonathan
> >
> >>
> >>> Unfortunately there are some typos in the spec (FIRMWARE_FIRST,
> >>> FIRMWAREFIRST in 18.4), so it's a little hard to find all the
> >>> references.
> >>
> >> Thanks for the pointers, I'll take a look.
> >>
> >>> It's a long shot, but I added Yijun as a Dell contact that who might
> >>> have a pointer to someone who could possibly test GHES logging on a
> >>> Dell box with and without your patch so we could have a concrete
> >>> comparison of the dmesg log differences.
> >>
> >> Thank you very much. Let's see, maybe we'll get lucky :)
> >>
> >> All the best,
> >> Karolina
> >>
> >>>
> >>>>> If you can't produce actual logs for comparison, I think we can take
> >>>>> info from a sample log somebody has posted and synthesize what the
> >>>>> changes would be after this patch.
> >>>>
> >>>> I also found some logs at some point, mostly from 2021 and 2023, but I felt
> >>>> bad about mocking up the messages and tried to produce actual logs. If I
> >>>> can't find a way to get this working in two weeks, I'll revisit this idea.
> >>>>
> >>>> All the best,
> >>>> Karolina
> >>>>
> >>>> -------------------------------------------------------------
> >>>> [1] - https://lore.kernel.org/lkml/76824dfc6bb5dd23a9f04607a907ac4ccf7cb147.1740653898.git.mchehab+huawei@xxxxxxxxxx/
> >>
> >>
> >
> >