On Fri, 13 Jun 2025 00:24:13 +0200, Ryan Lahfa wrote:
> On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
> > On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > > Hi Antony,
> > >
> > > I think the attached should fix it properly rather than working around it as
> > > the previous patch did. If you could give it a whirl?
> >
> > Yes this also fix the crash.
> >
> > Tested-by: Antony Antony <antony.antony@xxxxxxxxxxx>
>
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.

I'm seeing a problem in the same area - the symptom is slightly
different, but the location seems very similar. I'm also running a NixOS
image. When I mount a 9p filesystem in qemu with `cache=readahead` and
read a 12943-byte file, the guest does see a 12943-byte file, but only
the first 12288 bytes are populated: the rest are zero. This also
reproduces (most but not all of the time) on 6.16-rc7, but not on all
host machines I've tried.

After applying a simplified version of [1] (i.e. [2]), the problem does
not reproduce anymore. It seems something in `p9_client_read_once`
somehow leaves the iov_iter in an unhealthy state. It would be good to
understand exactly what, but I haven't been able to figure that out yet.

I have a smallish nix-based reproducer at [3], and a more involved setup
with a lot of logging enabled and a convenient way to attach gdb at [4].
You start the VM, run 'cat /repro/default.json' manually, and see if the
output looks 'truncated'.

Interestingly, the file is read in two p9 read calls: one of 12288 bytes
and one of 655 bytes. The first read is a zero-copy one, the second is
not zero-copy (because it is smaller than 1024 bytes). I've also tried
with a slightly larger version of the file, which is read as two
zero-copy reads, and I have not been able to reproduce the problem with
that.
From my (admittedly limited) understanding, the non-zerocopy code path
looks fine, though.

I hope this is helpful - I'd be happy to keep looking into this further,
but any help pointing me in the right direction would be much
appreciated :)

Kind regards,

Arnout

[1] https://lore.kernel.org/all/3327438.1729678025@xxxxxxxxxxxxxxxxxxxxxx/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging/kernel-use-copied-iov_iter.patch
[3] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/small-reproducer
[4] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging
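
For checking the symptom without eyeballing the `cat` output, something
along the following lines might help. This is only a sketch and is not
part of the reproducers at [3] or [4]: the names `find_zero_tail` and
`describe`, the script name, and the 4096-byte page size are assumptions
made for illustration, and it presumes Python is available in the guest.

```python
import sys

PAGE_SIZE = 4096  # assumption: 4 KiB pages in the guest


def find_zero_tail(data: bytes) -> int:
    """Return the offset at which the trailing run of zero bytes starts.

    Returns len(data) when the data does not end in a zero byte.
    """
    return len(data.rstrip(b"\x00"))


def describe(data: bytes) -> str:
    """Summarise whether the data ends in a zero-filled tail."""
    start = find_zero_tail(data)
    if start == len(data):
        return "no zero-filled tail"
    zeros = len(data) - start
    aligned = " (page-aligned)" if start % PAGE_SIZE == 0 else ""
    return f"zero-filled tail: {zeros} bytes starting at offset {start}{aligned}"


# Only runs when a path is given, e.g.:
#   python3 check_tail.py /repro/default.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(describe(f.read()))
```

With the sizes reported above, a reproduction should show up as a
655-byte zero tail starting at offset 12288, which is page-aligned if
pages are 4096 bytes. A file that legitimately ends in zero bytes would
also be flagged, so this is only meaningful for text files such as the
JSON one used here.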