在 2025/4/3 23:05, Matthew Wilcox 写道:
On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote:
Recently I hit a bug when developing the large folios support for btrfs.
That we call filemap_get_folios_contig(), then lock each returned folio.
(We also have a case where we unlock each returned folio)
However since a large folio can be returned several times in the batch,
this obviously makes a deadlock, as btrfs is trying to lock the same
folio more than once.
Sorry, what? A large folio should only be returned once. xas_next()
moves to the next folio. How is it possible that
filemap_get_folios_contig() returns the same folio more than once?
But that's exactly what I got from filemap_get_folios_contig():
lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336
end=819199(36864)
lock_delalloc_folios: r/i=5/260 found_folios=1
lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536)
lock_delalloc_folios: r/i=5/260 found_folios=8
lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144)
r/i is the root and inode number from btrfs, and you can completely
ignore it.
@locked_folio is the folio we're already holding a lock, the value
inside the brackets is the folio size.
@start and @end is the range we're searching for, the value inside the
brackets is the search range length.
The first iteration returns the current locked folio, and since the
range inside that folio is only 4K, thus it's only returned once.
The next 8 slots are all inside the same large folio at 786432,
resulting duplicated entries.
Then I looked into the caller of filemap_get_folios_contig() inside
mm/gup, and it indeed does the correct skip.
... that code looks wrong to me.
It looks like it's xas_find() is doing the correct skip by calling
xas_next_offset() -> xas_move_index() to skip the next one.
But the filemap_get_folios_contig() only calls xas_next() by increasing
the index, not really skip to the next folio.
Although I can be totally wrong as I'm not familiar with the xarray
internals at all.
However I totally agree the duplicated behavior (and the extra handling
of duplicated entries) looks very wrong.
Thanks,
Qu