On Wed, 19 Mar 2025 09:47:05 -0600
Keith Busch <kbusch@xxxxxxxxxx> wrote:

> On Mon, Mar 17, 2025 at 04:53:47PM -0600, Alex Williamson wrote:
> > On Mon, 17 Mar 2025 16:30:47 -0600
> > Keith Busch <kbusch@xxxxxxxxxx> wrote:
> >
> > > On Mon, Mar 17, 2025 at 03:44:17PM -0600, Alex Williamson wrote:
> > > > On Wed, 12 Mar 2025 15:52:55 -0700
> > > > > @@ -679,6 +679,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > > > >
> > > > >  		if (unlikely(disable_hugepages))
> > > > >  			break;
> > > > > +		cond_resched();
> > > > >  	}
> > > > >
> > > > >  out:
> > > >
> > > > Hey Keith, is this still necessary with:
> > > >
> > > > https://lore.kernel.org/all/20250218222209.1382449-1-alex.williamson@xxxxxxxxxx/
> > >
> > > Thank you for the suggestion. I'll try to fold this into a build and
> > > see what happens. But from what I can tell, I'm not sure it will
> > > help. We're simply not getting large folios in this path and are
> > > dealing with individual pages, though it is a large contiguous range
> > > (~60GB, not necessarily aligned). Should we expect to only be dealing
> > > with PUD and PMD levels with these kinds of mappings?
> >
> > IME with QEMU, PMD alignment basically happens without any effort and
> > gets 90+% of the way there; PUD alignment requires a bit of work[1].
> >
> > > > This is currently in linux-next from the vfio next branch and should
> > > > pretty much eliminate any stalls related to DMA mapping MMIO BARs.
> > > > Also the code here has been refactored in next, so this doesn't apply
> > > > anyway, and if there is a resched still needed, this location would
> > > > only affect DMA mapping of memory, not device BARs. Thanks,
> > >
> > > Thanks for the heads-up. Regardless, it doesn't look like a bad place
> > > for a cond_resched(), but it may not trigger any CPU stall indicator
> > > outside this vfio fault path.
> >
> > Note that we already have a cond_resched() in vfio_iommu_map(), which
> > we'll hit any time we get a break in a contiguous mapping. We may hit
> > that regularly enough that it's not an issue for RAM mapping, but I've
> > certainly seen soft lockups when we have many GiB of contiguous pfnmaps
> > prior to the series above. Thanks,
>
> So far adding the additional patches has not changed anything. We've
> ensured we are using an address and length aligned to 2MB, but it sure
> looks like vfio's fault handler is only getting order-0 faults. I'm not
> finding anything immediately obvious about what we can change to get
> the desired higher-order behavior, though. Any other hints or
> information I could provide?

Since you mention folding in the changes, are you working on an upstream
kernel or a downstream backport? Huge pfnmap support was added in v6.12
via [1]. Without that you'd never see better than an order-0 fault. I
hope that's it, because with all the kernel pieces in place it should
"Just work". Thanks,

Alex

[1] https://lore.kernel.org/all/20240826204353.2228736-1-peterx@xxxxxxxxxx/
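
For context on the patch quoted above: the change simply drops a
voluntary preemption point into the page-pinning loop, so that pinning
many gigabytes one order-0 page at a time inside a single ioctl doesn't
hold the CPU long enough to trip the soft-lockup or RCU stall
detectors. A minimal sketch of that pattern, not the real
vfio_pin_pages_remote() (pin_one_page() here is a hypothetical helper
standing in for the actual pinning logic):

/*
 * Illustrative sketch only -- not the real vfio code.  A loop that
 * pins a huge range one order-0 page at a time never blocks, so on a
 * non-preemptible kernel it can run long enough to trigger
 * soft-lockup/RCU stall warnings.  cond_resched() gives the scheduler
 * a chance to run something else on each iteration.
 */
extern long pin_one_page(unsigned long vaddr);	/* hypothetical helper */

static long pin_range(unsigned long vaddr, unsigned long npages)
{
	unsigned long pinned;

	for (pinned = 0; pinned < npages; pinned++) {
		long ret = pin_one_page(vaddr + pinned * PAGE_SIZE);

		if (ret < 0)
			return ret;

		cond_resched();	/* voluntary preemption point */
	}

	return pinned;
}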
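
On the alignment question: for the fault handler to ever see a
PMD-level fault (order 9 with 4K pages on x86-64), the userspace
virtual address of the mapping must itself be 2MB-aligned, not just
its length. One common way to force that from userspace is to
over-reserve address space and place the real mapping inside it with
MAP_FIXED. The sketch below assumes a map_bar_aligned() helper and
treats 'fd'/'offset' as placeholders for a vfio region fd and mmap
offset; trimming the slack reservation and error handling are omitted:

#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

#define SZ_2M	(2UL << 20)

/*
 * Map 'len' bytes of a device region at a 2MB-aligned virtual address
 * by reserving extra address space with PROT_NONE, then placing the
 * real mapping inside it with MAP_FIXED.
 */
static void *map_bar_aligned(int fd, off_t offset, size_t len)
{
	size_t reserve = len + SZ_2M;
	uintptr_t base, aligned;

	base = (uintptr_t)mmap(NULL, reserve, PROT_NONE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if ((void *)base == MAP_FAILED)
		return MAP_FAILED;

	/* Round up to the next 2MB boundary inside the reservation. */
	aligned = (base + SZ_2M - 1) & ~(SZ_2M - 1);

	/* MAP_FIXED atomically replaces the PROT_NONE reservation. */
	return mmap((void *)aligned, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_FIXED, fd, offset);
}

Whether QEMU does exactly this is beside the point; any scheme that
yields a 2MB-aligned VA (and length) should allow huge pfnmap faults
on a kernel that has the v6.12 support Alex references in [1].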