Re: [Confusing Bug] A Long-running Syzkaller Docker Crashes Host System

On Wed, Apr 09, 2025, Zhiyu Zhang wrote:
> Dear Syzkaller Group and Linux Kernel Upstream,
> 
> I am writing to report an intermittent issue that appears when running
> Syzkaller inside a Docker container with privileged KVM access. The
> host system becomes unresponsive after prolonged fuzzing, and I hope
> your insights can help identify the root cause.
> 
> Environment Details:
> - Host Machine:
>     - OS: Ubuntu 20.04.6 LTS
>     - Kernel: x86_64 Linux 5.15.0-136-generic
>     - CPU: Intel Xeon Platinum 8268 @ 192×3.9GHz
> - Docker Container:
>     - Base Image: Ubuntu 22.04 (qgrain/kernel-fuzz:v1)
>     - Syzkaller Version: commit 4121cf9 (20250217)
>     - Startup Command: docker run -itd -p 29400:22 -v
> /PATH/KERNELS:/root/kernels --name NAME --privileged=true
> qgrain/kernel-fuzz:v1
> 
> After the fuzzing instances had been running for an extended period,
> the host system became completely inaccessible (e.g., SSH connections
> failed). Through IPMI, I observed the following repeated log messages
> on the virtual terminal:

You'll likely need some way to get more information about the state of the host
kernel when things go sideways.  E.g. force a crash and get a kdump.  Either that,
or hope you get lucky and capture an oops/panic.
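
As a rough sketch of what "some way to get more information" might look like:
the script below only reports whether a crash kernel is loaded and whether the
usual panic-on-hang sysctls are enabled, so a hang has a chance of turning into
a panic that kdump can capture.  The helper names (read_knob, kdump_readiness)
are made up for illustration, and which knobs exist depends on the host kernel
config, so missing files are reported rather than treated as errors.

```python
from pathlib import Path

# Knobs that turn a silent hang into a panic, which kdump can then capture.
# Availability depends on the kernel config (assumption: not all are present).
PANIC_KNOBS = [
    "/proc/sys/kernel/panic_on_oops",
    "/proc/sys/kernel/softlockup_panic",
    "/proc/sys/kernel/hardlockup_panic",
    "/proc/sys/kernel/hung_task_panic",
    "/proc/sys/kernel/unknown_nmi_panic",
]
# Reads 1 when a crash (kdump) kernel has been loaded via kexec.
CRASH_KERNEL_LOADED = "/sys/kernel/kexec_crash_loaded"


def read_knob(path, root="/"):
    """Return the integer value of a knob, or None if absent or unreadable."""
    try:
        return int((Path(root) / path.lstrip("/")).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def kdump_readiness(root="/"):
    """Map each path to its current value (None when not available)."""
    return {p: read_knob(p, root) for p in [CRASH_KERNEL_LOADED, *PANIC_KNOBS]}


if __name__ == "__main__":
    for path, value in kdump_readiness().items():
        state = "missing" if value is None else ("on" if value else "off")
        print(f"{state:8} {path}")
```

With a crash kernel loaded, writing "c" to /proc/sysrq-trigger forces a crash
by hand, which is a cheap way to confirm a dump actually gets written before
waiting days for the real hang.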

> [244053.888249] kvm [3867]: vcpu2, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.938264] kvm [3867]: vcpu3, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.960191] kvm [3867]: vcpu0, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244053.992411] kvm [3867]: vcpu1, guest pF: 0xffffffff813008ac
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop
> [244075.149293] kvm [3882]: vcpu3, guest pF: 0xffffffff81300744
> vmx_set_jsr: BTF ln_inA2_DEBUGTRLSR 0x2, nop

What is producing these messages?  It's not the upstream kernel.  If your system
is generating gobs of logging, it's entirely possible the logging itself is
causing problems.
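
One crude way to check whether log volume is even plausible as a factor is to
count messages per second from the printk timestamps.  The sketch below does
that; the function names are hypothetical, and SAMPLE is just the start of each
timestamped line from the report above, so it says nothing about the real rate
on the host.

```python
import re
from collections import Counter

# Matches the "[seconds.micros]" prefix that printk puts on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

# Timestamped lines copied from the quoted report (wrapped halves omitted).
SAMPLE = [
    "> [244053.888249] kvm [3867]: vcpu2, guest pF: 0xffffffff813008ac",
    "> [244053.938264] kvm [3867]: vcpu3, guest pF: 0xffffffff813008ac",
    "> [244053.960191] kvm [3867]: vcpu0, guest pF: 0xffffffff813008ac",
    "> [244053.992411] kvm [3867]: vcpu1, guest pF: 0xffffffff813008ac",
    "> [244075.149293] kvm [3882]: vcpu3, guest pF: 0xffffffff81300744",
]


def messages_per_second(lines):
    """Count log lines per whole second of kernel uptime."""
    counts = Counter()
    for line in lines:
        # Drop any email quote prefix before looking for the timestamp.
        match = TIMESTAMP.match(line.lstrip("> "))
        if match:
            counts[int(float(match.group(1)))] += 1
    return counts


def peak_rate(lines):
    """Return (second, count) for the busiest second, or None if no timestamps."""
    counts = messages_per_second(lines)
    return max(counts.items(), key=lambda kv: kv[1]) if counts else None


print(peak_rate(SAMPLE))  # (244053, 4)
```

Feeding it the full dmesg or journal output from a run would show whether the
host is seeing a handful of lines per VM boot or a sustained flood.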

> ...
> 
> Speculation on Possible Causes:
> - One possibility is that the long-term Syzkaller fuzzing workload has
> generated test cases that trigger an edge-case bug in the host KVM
> module. The repeated “guest pF” errors could indicate that a specific
> sequence of guest instructions is not being handled correctly.
> - Alternatively, prolonged high-load conditions from continuous
> fuzzing might have exposed an unhandled kernel or hardware bug related
> to virtualization—potentially in the CPU’s VMX or within the KVM
> module itself.
> 
> I apologize for the limited diagnostic information available at this
> time (I found nothing relevant to KVM in the system logs). The above
> speculation is preliminary, and I am unsure whether the root cause
> lies on the Syzkaller side or the kernel KVM side.
> 
> Thank you for your attention to this matter. I look forward to any
> suggestions or questions you may have.
> 
> Best regards,
> Zhiyu Zhang




