On Mon, May 12, 2025 at 04:43:05PM +0200, Christoph Hellwig wrote: > xfs_zone_gc_write_chunk writes out the data buffer read in earlier using > the same bio, and currenly looks at bv_offset for the offset into the > scratch folio for that. But commit 26064d3e2b4d ("block: fix adding > folio to bio") changed how bv_page and bv_offset are calculated for > adding larger folios, breaking this fragile logic. > > Switch to extracting the full physical address from the old bio_vec, > and calculate the offset into the folio from that instead. > > This fixes data corruption during garbage collection with heavy rockdsb > workloads. Thanks to Hans for tracking down the culprit commit during > long bisection sessions. > > Fixes: 26064d3e2b4d ("block: fix adding folio to bio") > Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") > Reported-by: Hans Holmberg <Hans.Holmberg@xxxxxxx> > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > --- > fs/xfs/xfs_zone_gc.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c > index 8c541ca71872..a045b1dedd68 100644 > --- a/fs/xfs/xfs_zone_gc.c > +++ b/fs/xfs/xfs_zone_gc.c > @@ -801,7 +801,8 @@ xfs_zone_gc_write_chunk( > { > struct xfs_zone_gc_data *data = chunk->data; > struct xfs_mount *mp = chunk->ip->i_mount; > - unsigned int folio_offset = chunk->bio.bi_io_vec->bv_offset; > + phys_addr_t bvec_paddr = > + bvec_phys(bio_first_bvec_all(&chunk->bio)); Hmm. I started wondering why you can't reuse chunk->scratch->offset in the bio_add_folio_nofail call below. I /think/ that's because xfs_zone_gc_start_chunk increments chunk->scratch->offset after adding the folio to the bio? bio_add_folio_nofail(bio, chunk->scratch->folio, chunk->len, chunk->scratch->offset); chunk->scratch->offset += chunk->len; And then we can attach the same scratch->folio to different read bios. Each bio gets a different offset within the folio, right? So xfs_zone_gc_write_chunk really does need to find the offset_in_folio from the read bio. And I guess that's why you use bvec_phys to figure that out (instead of, say, wasting space recording it again in the xfs_gc_bio), correct? If the answers to all my questions are 'yes' then I think I grok this sufficiently to say Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> --D > struct xfs_gc_bio *split_chunk; > > if (chunk->bio.bi_status) > @@ -816,7 +817,7 @@ xfs_zone_gc_write_chunk( > > bio_reset(&chunk->bio, mp->m_rtdev_targp->bt_bdev, REQ_OP_WRITE); > bio_add_folio_nofail(&chunk->bio, chunk->scratch->folio, chunk->len, > - folio_offset); > + offset_in_folio(chunk->scratch->folio, bvec_paddr)); > > while ((split_chunk = xfs_zone_gc_split_write(data, chunk))) > xfs_zone_gc_submit_write(data, split_chunk); > -- > 2.47.2 > >