Re: [REGRESSION] 9pfs issues on 6.12-rc1

Arnout Engelen <arnout@xxxxxxxx> · Sun, 10 Aug 2025 19:57:11 +0200

On Fri, 13 Jun 2025 00:24:13 +0200, Ryan Lahfa wrote:
> On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
> > On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > > Hi Antony,
> > > 
> > > I think the attached should fix it properly rather than working around it as
> > > the previous patch did.  If you could give it a whirl?
> > 
> > Yes, this also fixes the crash.
> > 
> > Tested-by: Antony Antony <antony.antony@xxxxxxxxxxx>
> 
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.

I'm seeing a problem in the same area - the symptom is slightly different,
but the location seems very similar. I'm also running a NixOS image.
When I mount a 9p filesystem in qemu with `cache=readahead` and read a
12943-byte file, the guest does see a 12943-byte file, but only the first
12288 bytes are populated: the rest are zero. This also reproduces (most
but not all of the time) on 6.16-rc7, but not on all host machines I've
tried.
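
A minimal sketch of how this symptom could be checked from inside the
guest, assuming the file itself never contains NUL bytes (true for a JSON
file). `zero_tail_offset` is a hypothetical helper name and is not part of
the reproducers linked below:

```python
from typing import Optional


def zero_tail_offset(data: bytes) -> Optional[int]:
    """Return the offset at which a run of zero bytes reaching EOF starts.

    Returns None when the data does not end in zero bytes, i.e. when the
    read does not show the symptom described above.
    """
    # Strip trailing NUL bytes; whatever remains is the populated prefix.
    populated = data.rstrip(b"\x00")
    if len(populated) == len(data):
        return None
    return len(populated)
```

For the corrupted read described above, calling this on the contents of the
12943-byte file should return 12288, and None on a host where the problem
does not reproduce.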

After applying a simplified version of [1] (i.e. [2]), the problem does not
reproduce anymore. It seems something in `p9_client_read_once` somehow
leaves the iov_iter in an unhealthy state. It would be good to understand
exactly what, but I haven't been able to figure that out yet.

I have a smallish nix-based reproducer at [3], and a more involved setup
with a lot of logging enabled and a convenient way to attach gdb at [4].
To use either one, start the VM, run 'cat /repro/default.json' manually, and
check whether the output looks 'truncated'.

Interestingly, the file is read in two p9 read calls: one of 12288 bytes and
one of 655 bytes. The first read is a zero-copy one, the second is not
zero-copy (because it is smaller than 1024 bytes). I've also tried a slightly
larger version of the file, which is read as two zero-copy reads, and I have not
been able to reproduce the problem with that. From my (admittedly limited)
understanding the non-zerocopy code path looks fine, though.
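
To make the arithmetic concrete, here is a small sketch that models the two
reads. It is only a model of the observed numbers, not kernel code: the
split at a page boundary with 4096-byte pages is an assumption chosen to
match the 12288/655 split, the treatment of a read of exactly 1024 bytes is
also an assumption, and `split_reads` is a hypothetical name:

```python
PAGE_SIZE = 4096     # assumption: 4 KiB pages in the guest
ZC_THRESHOLD = 1024  # reads smaller than this are described as not zero-copy


def split_reads(file_size):
    """Model the reads as (offset, length, zero_copy) tuples.

    Assumption: the page-aligned part of the file is read first and the
    remaining tail is read separately.
    """
    aligned = (file_size // PAGE_SIZE) * PAGE_SIZE
    reads = []
    if aligned:
        reads.append((0, aligned, aligned > ZC_THRESHOLD))
    tail = file_size - aligned
    if tail:
        reads.append((aligned, tail, tail > ZC_THRESHOLD))
    return reads


for offset, length, zero_copy in split_reads(12943):
    print(offset, length, "zero-copy" if zero_copy else "copied")
```

Under these assumptions the 12943-byte file gives a 12288-byte zero-copy
read followed by a 655-byte copied read, while a file with a tail larger
than 1024 bytes gives two zero-copy reads, matching the case that does not
reproduce.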

I hope this is helpful - I'd be happy to keep looking into this further,
but any help pointing me in the right direction would be much appreciated :)

Kind regards,

Arnout

[1] https://lore.kernel.org/all/3327438.1729678025@xxxxxxxxxxxxxxxxxxxxxx/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging/kernel-use-copied-iov_iter.patch
[3] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/small-reproducer
[4] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging