On 4/24/2025 5:01 AM, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 10:35:06PM -0700, jane.chu@xxxxxxxxxx wrote:
On 4/23/2025 4:28 PM, Jason Gunthorpe wrote:
The flow of a single test run:
1. reserve virtual address space for (61440 * 2MB) via mmap with PROT_NONE
and MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE
2. mmap ((61440 * 2MB) / 12) from each of the 12 device-dax devices into the
reserved virtual address space sequentially to form a contiguous VA
space
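The two steps above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not Jane's actual test program. The helper names reserve_va_aligned() and map_sequential() are hypothetical. It assumes the reservation must be 2MB-aligned so that PMD mappings are possible, and that each device-dax fd is mapped MAP_SHARED. It also assumes MAP_FIXED over the PROT_NONE placeholder is how the chunks are placed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Step 1 (hypothetical helper): reserve len bytes of VA with PROT_NONE,
 * aligned to align (assumed 2MB for device-dax PMD mappings). Over-reserve
 * by align, then trim the unaligned head and the leftover tail. */
static void *reserve_va_aligned(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    char *p = mmap(NULL, len + align, PROT_NONE,
                   MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    char *aligned = (char *)(((uintptr_t)p + align - 1) & ~(uintptr_t)(align - 1));
    size_t head = (size_t)(aligned - p);
    size_t tail = align - head;
    if (head)
        munmap(p, head);
    if (tail)
        munmap(aligned + len, tail);
    return aligned;
}

/* Step 2 (hypothetical helper): map nfds files back to back into the
 * reservation, chunk bytes each. MAP_FIXED replaces the PROT_NONE
 * placeholder in place, so the result is one contiguous VA range.
 * Returns 0 on success, -1 on the first failure. */
static int map_sequential(void *base, const int *fds, size_t nfds, size_t chunk)
{
    for (size_t i = 0; i < nfds; i++) {
        char *addr = (char *)base + i * chunk;
        void *p = mmap(addr, chunk, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fds[i], 0);
        if (p == MAP_FAILED || p != addr) {
            perror("mmap");
            return -1;
        }
    }
    return 0;
}
```

In the test as described, the fds would come from opening the 12 device-dax character devices, with len = 61440 * 2MB and chunk = len / 12 (5120 * 2MB, still 2MB-aligned). The resulting range would then be handed to MR registration.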
Like, is there any chance that each of these 61440 VMAs is a single
2MB folio from device-dax, or could it be?
IIRC device-dax could not use folios until 6.15, so I'm assuming
it is not folios even if it is a PMD mapping?
I just ran the MR registration stress test on 6.15-rc3, much better!
What's changed? Is it folios for device-dax? None of the code in
ib_umem_get() has changed, though; it still loops through 'npages' doing
I don't know, it is kind of strange that it changed. If device-dax is
now using folios, then it does change the access pattern to the struct
page array somewhat; in particular, it moves all the writes to the head
page of the 2MB section, which may impact the caching?
6.15-rc3 is orders of magnitude better.
Agreed that device-dax's use of folios is likely the hero. I've yet to
check the code and bisect; maybe pin_user_pages_fast() adds folios to
page_list[] instead of 4K pages? If so, with a 511/512 size reduction in
page_list[], that could drastically improve the downstream call
performance in spite of the thrashing, that is, if thrashing is still there.
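To put numbers on the 511/512 figure: the following is a small sketch of the arithmetic only. It assumes, purely hypothetically and as the speculation above does, that page_list[] would hold one entry per 2MB folio instead of one per 4K page. It does not claim that pin_user_pages_fast() actually behaves this way. The entries() helper is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K 4096ULL
#define PMD_2M  (2ULL << 20)

/* One 2MB PMD-sized section covers 512 base pages, so collapsing each
 * section to a single entry drops 511 of every 512 entries. */
static_assert(PMD_2M / PAGE_4K == 512, "a 2MB section spans 512 4K pages");

/* Hypothetical helper: entries needed to describe len bytes when each
 * entry covers granule bytes (rounded up). */
static uint64_t entries(uint64_t len, uint64_t granule)
{
    return (len + granule - 1) / granule;
}
```

For the 61440 * 2MB (120GB) range in this test, entries(len, PAGE_4K) is 31,457,280 while entries(len, PMD_2M) is 61,440.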
I'll report my findings.
Thanks,
-jane
Jason