On Thu, Jul 24, 2025 at 04:22:15AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 24, 2025 at 10:14:36AM +0200, Stefano Garzarella wrote:
> > CCing Will

Thanks.

> > On Thu, 24 Jul 2025 at 09:48, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Jul 23, 2025 at 08:04:42AM -0700, Breno Leitao wrote:
> > > > Hello,
> > > >
> > > > I've seen a crash in linux-next for a while on my arm64 server, and
> > > > I decided to report.
> > > >
> > > > While running stress-ng on linux-next, I see the crash below.
> > > >
> > > > This is happening in a kernel configure with some debug options (KASAN,
> > > > LOCKDEP and KMEMLEAK).
> > > >
> > > > Basically running stress-ng in a loop would crash the host in 15-20
> > > > minutes:
> > > > # while (true); do stress-ng -r 10 -t 10; done
> > > >
> > > > From the early warning "virt_to_phys used for non-linear address",
> >
> > mmm, we recently added nonlinear SKBs support in vhost-vsock [1],
> > @Will can this issue be related?
>
> Good point.
>
> Breno, if bisecting is too much trouble, would you mind testing the commits
> c76f3c4364fe523cd2782269eab92529c86217aa
> and
> c7991b44d7b44f9270dec63acd0b2965d29aab43
> and telling us if this reproduces?

That's definitely worth doing, but we should be careful not to confuse
the "non-linear address" from the warning (which refers to virtual
addresses that lie outside of the linear mapping of memory, e.g. in the
vmalloc space) and "non-linear SKBs" which refer to SKBs with fragment
pages.

Breno -- when you say you've been seeing this "for a while", what's the
earliest kernel you know you saw it on?

> > > > I suppose corrupted data is at vq->nheads.
> > > >
> > > > Here is the decoded stack against 9798752 ("Add linux-next specific
> > > > files for 20250721")
> > > >
> > > >
> > > > [ 620.685144] [ T250731] VFIO - User Level meta-driver version: 0.3
> > > > [ 622.394448] [ T250254] ------------[ cut here ]------------
> > > > [ 622.413492] [ T250254] virt_to_phys used for non-linear address: 000000006e69fe64 (0xcfcecdcccbcac9c8)

So here's the bad (non-linear) pointer. Do you know if
0xcfcecdcccbcac9c8 correlates with the packet data that stress-ng is
generating? I wonder if we're somehow overflowing vq->iov[].

Will