Re: Large folios and filemap_get_folios_contig()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2025/4/3 23:05, Matthew Wilcox 写道:
On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote:
Recently I hit a bug when developing the large folios support for btrfs.

That we call filemap_get_folios_contig(), then lock each returned folio.
(We also have a case where we unlock each returned folio)

However since a large folio can be returned several times in the batch,
this obviously makes a deadlock, as btrfs is trying to lock the same
folio more than once.

Sorry, what?  A large folio should only be returned once.  xas_next()
moves to the next folio.  How is it possible that
filemap_get_folios_contig() returns the same folio more than once?

But that's exactly what I got from filemap_get_folios_contig():

lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336 end=819199(36864)
lock_delalloc_folios: r/i=5/260 found_folios=1
lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536)
lock_delalloc_folios: r/i=5/260 found_folios=8
lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144)
lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144)

r/i is the root and inode number from btrfs, and you can completely ignore it.

@locked_folio is the folio we're already holding a lock, the value inside the brackets is the folio size.

@start and @end is the range we're searching for, the value inside the brackets is the search range length.

The first iteration returns the current locked folio, and since the range inside that folio is only 4K, thus it's only returned once.

The next 8 slots are all inside the same large folio at 786432, resulting duplicated entries.


Then I looked into the caller of filemap_get_folios_contig() inside
mm/gup, and it indeed does the correct skip.

... that code looks wrong to me.

It looks like it's xas_find() is doing the correct skip by calling xas_next_offset() -> xas_move_index() to skip the next one.

But the filemap_get_folios_contig() only calls xas_next() by increasing the index, not really skip to the next folio.

Although I can be totally wrong as I'm not familiar with the xarray internals at all.

However I totally agree the duplicated behavior (and the extra handling of duplicated entries) looks very wrong.

Thanks,
Qu




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux