Re: Large folios and filemap_get_folios_contig()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 04, 2025 at 07:46:59AM +1030, Qu Wenruo wrote:
> 
> 
> 在 2025/4/3 23:05, Matthew Wilcox 写道:
> > On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote:
> > > Recently I hit a bug when developing the large folios support for btrfs.
> > > 
> > > That we call filemap_get_folios_contig(), then lock each returned folio.
> > > (We also have a case where we unlock each returned folio)
> > > 
> > > However since a large folio can be returned several times in the batch,
> > > this obviously makes a deadlock, as btrfs is trying to lock the same
> > > folio more than once.
> > 
> > Sorry, what?  A large folio should only be returned once.  xas_next()
> > moves to the next folio.  How is it possible that
> > filemap_get_folios_contig() returns the same folio more than once?
> 
> But that's exactly what I got from filemap_get_folios_contig():
> 
> lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336
> end=819199(36864)
> lock_delalloc_folios: r/i=5/260 found_folios=1
> lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536)
> lock_delalloc_folios: r/i=5/260 found_folios=8
> lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144)
> lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144)
> 
> r/i is the root and inode number from btrfs, and you can completely ignore
> it.
> 
> @locked_folio is the folio we're already holding a lock, the value inside
> the brackets is the folio size.
> 
> @start and @end is the range we're searching for, the value inside the
> brackets is the search range length.
> 
> The first iteration returns the current locked folio, and since the range
> inside that folio is only 4K, thus it's only returned once.
> 
> The next 8 slots are all inside the same large folio at 786432, resulting
> duplicated entries.
> 
> > 
> > > Then I looked into the caller of filemap_get_folios_contig() inside
> > > mm/gup, and it indeed does the correct skip.
> > 
> > ... that code looks wrong to me.
> 
> It looks like it's xas_find() is doing the correct skip by calling
> xas_next_offset() -> xas_move_index() to skip the next one.
> 
> But the filemap_get_folios_contig() only calls xas_next() by increasing the
> index, not really skip to the next folio.
> 
> Although I can be totally wrong as I'm not familiar with the xarray
> internals at all.

Thanks for bringing this to my attention, it looks like this is due to a
mistake during my folio conversion for this function. The xas_next()
call tries to go to the next index, but if that index is part of a
multi-index entry things get awkward if we don't manually account for the
index shift of a large folio (which I missed).

Can you please try out this attached patch and see if it gets rid of
the duplicate problem?

> However I totally agree the duplicated behavior (and the extra handling of
> duplicated entries) looks very wrong.
> 
> Thanks,
> Qu





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux