On Mon, Mar 17, 2025 at 04:53:47PM -0600, Alex Williamson wrote:
> On Mon, 17 Mar 2025 16:30:47 -0600
> Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> > On Mon, Mar 17, 2025 at 03:44:17PM -0600, Alex Williamson wrote:
> > > On Wed, 12 Mar 2025 15:52:55 -0700
> > > > @@ -679,6 +679,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > > >
> > > >  		if (unlikely(disable_hugepages))
> > > >  			break;
> > > > +		cond_resched();
> > > >  	}
> > > >
> > > >  out:
> > >
> > > Hey Keith, is this still necessary with:
> > >
> > > https://lore.kernel.org/all/20250218222209.1382449-1-alex.williamson@xxxxxxxxxx/
> >
> > Thank you for the suggestion. I'll try to fold this into a build, and
> > see what happens. But from what I can tell, I'm not sure it will help.
> > We're simply not getting large folios in this path and dealing with
> > individual pages. Though it is a large contiguous range (~60GB, not
> > necessarily aligned). Should we expect to only be dealing with PUD and
> > PMD levels with these kinds of mappings?
>
> IME with QEMU, PMD alignment basically happens without any effort and
> gets 90+% of the way there, PUD alignment requires a bit of work[1].
>
> > > This is currently in linux-next from the vfio next branch and should
> > > pretty much eliminate any stalls related to DMA mapping MMIO BARs.
> > > Also the code here has been refactored in next, so this doesn't apply
> > > anyway, and if there is a resched still needed, this location would
> > > only affect DMA mapping of memory, not device BARs. Thanks,
> >
> > Thanks for the heads up. Regardless, it doesn't look like a bad place to
> > cond_resched(), but may not trigger any cpu stall indicator outside this
> > vfio fault path.
>
> Note that we already have a cond_resched() in vfio_iommu_map(), which
> we'll hit any time we get a break in a contiguous mapping. We may hit
> that regularly enough that it's not an issue for RAM mapping, but I've
> certainly seen soft lockups when we have many GiB of contiguous pfnmaps
> prior to the series above. Thanks,

So far adding the additional patches has not changed anything. We've
ensured we are using an address and length aligned to 2MB, but it sure
looks like vfio's fault handler is only getting order-0 faults. I'm not
finding anything immediately obvious about what we can change to get the
desired higher order behavior, though. Any other hints or information I
could provide?
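
To put numbers on the order-0 concern, here is a minimal userspace sketch,
not taken from vfio or from the code under test. The helper names
(range_is_2m_aligned, faults_needed) and the SZ_4K/SZ_2M values are
assumptions for illustration, with 4KB base pages and 2MB PMD mappings as on
x86-64. It checks that an address and length are both 2MB multiples and
counts how many faults a range would take at a given fault granule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes: 4KB base page and 2MB PMD mapping, as on x86-64. */
#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)

/* The mask test below is only valid for power-of-two sizes. */
static_assert((SZ_2M & (SZ_2M - 1)) == 0, "SZ_2M must be a power of two");

/* Hypothetical helper: true if both the start address and the length
 * are multiples of 2MB. */
static inline bool range_is_2m_aligned(uint64_t addr, uint64_t len)
{
	return ((addr | len) & (SZ_2M - 1)) == 0;
}

/* Hypothetical helper: number of faults needed to populate len bytes
 * when each fault maps one granule (rounded up). */
static inline uint64_t faults_needed(uint64_t len, uint64_t granule)
{
	return (len + granule - 1) / granule;
}
```

Under those assumptions, a 60GiB range takes faults_needed(60ULL << 30,
SZ_4K) = 15728640 order-0 faults, versus 30720 if every fault were
PMD-sized, which is roughly the gap between what we see and what we were
hoping for.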