On Mon, 12 May 2025 15:09:09 +0100,
David Sauerwein <dssauerw@xxxxxxxxx> wrote:
>
> Hi Jing,
>
> After pulling this patch in via the v6.6.64 and v5.10.226 LTS releases, I see
> NULL pointer dereferences in some guests. The dereference happens in different
> parts of the kernel outside of the GIC driver (file systems, NVMe driver,
> etc.). The issue only appears once every few hundred DISCARDs / guest boots.
> Reverting the commit does fix the problem. I have seen multiple different guest
> kernel versions (4.14, 5.15) and distributions exhibit this issue.

Where is the guest stack trace?

> The issue looks like some kind of race. I think the guest re-uses the memory
> allocated for the ITT before the hypervisor is actually done with the DISCARD
> command, i.e. before it zeros the ITE. From what I can tell, the guest should
> wait for the command to finish via its_wait_for_range_completion(). I tried
> locking reads to its->cwriter in vgic_mmio_read_its_cwriter() and its->creadr
> in vgic_mmio_read_its_creadr() with its->cmd_lock in the hypervisor kernel, but
> that did not help. I also instrumented the guest kernel both via printk() and
> trace events. In both cases the issue disappears once the instrumentation is in
> place, so I'm not able to fully observe what is happening on the guest side.
>
> Do you have an idea of what might cause the issue?

I'm a bit sceptical of this analysis, because KVM makes no use of the
guest's owned memory outside of a save/restore event, and otherwise
shadows everything.

So what are you *exactly* doing here? Have you reproduced this with an
upstream, current KVM host?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.