On Fri, Apr 04, 2025 at 07:46:59AM +1030, Qu Wenruo wrote: > > > 在 2025/4/3 23:05, Matthew Wilcox 写道: > > On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote: > > > Recently I hit a bug when developing the large folios support for btrfs. > > > > > > That we call filemap_get_folios_contig(), then lock each returned folio. > > > (We also have a case where we unlock each returned folio) > > > > > > However since a large folio can be returned several times in the batch, > > > this obviously makes a deadlock, as btrfs is trying to lock the same > > > folio more than once. > > > > Sorry, what? A large folio should only be returned once. xas_next() > > moves to the next folio. How is it possible that > > filemap_get_folios_contig() returns the same folio more than once? > > But that's exactly what I got from filemap_get_folios_contig(): > > lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336 > end=819199(36864) > lock_delalloc_folios: r/i=5/260 found_folios=1 > lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536) > lock_delalloc_folios: r/i=5/260 found_folios=8 > lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144) > > r/i is the root and inode number from btrfs, and you can completely ignore > it. > > @locked_folio is the folio we're already holding a lock, the value inside > the brackets is the folio size. > > @start and @end is the range we're searching for, the value inside the > brackets is the search range length. > > The first iteration returns the current locked folio, and since the range > inside that folio is only 4K, thus it's only returned once. > > The next 8 slots are all inside the same large folio at 786432, resulting > duplicated entries. > > > > > > Then I looked into the caller of filemap_get_folios_contig() inside > > > mm/gup, and it indeed does the correct skip. > > > > ... that code looks wrong to me. > > It looks like it's xas_find() is doing the correct skip by calling > xas_next_offset() -> xas_move_index() to skip the next one. > > But the filemap_get_folios_contig() only calls xas_next() by increasing the > index, not really skip to the next folio. > > Although I can be totally wrong as I'm not familiar with the xarray > internals at all. Thanks for bringing this to my attention, it looks like this is due to a mistake during my folio conversion for this function. The xas_next() call tries to go to the next index, but if that index is part of a multi-index entry things get awkward if we don't manually account for the index shift of a large folio (which I missed). Can you please try out this attached patch and see if it gets rid of the duplicate problem? > However I totally agree the duplicated behavior (and the extra handling of > duplicated entries) looks very wrong. > > Thanks, > Qu