> From: Vasant Hegde <vasant.hegde@xxxxxxx>
> Sent: Tuesday, May 20, 2025 4:39 PM
>
> Hi Nicolin,
>
>
> On 5/19/2025 11:44 PM, Nicolin Chen wrote:
> > On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
> >> Jason, Nicolin, Kevin,
> >>
> >>
> >> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
> >>> On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
> >>>> +/**
> >>>> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> >>>> + * @size: sizeof(struct iommu_hw_queue_alloc)
> >>>> + * @flags: Must be 0
> >>>> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> >>>> + * @type: One of enum iommu_hw_queue_type
> >>>> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
> >>>> + *         model
> >>>> + * @out_hw_queue_id: The ID of the new HW queue
> >>>> + * @base_addr: Base address of the queue memory in guest physical address space
> >>>> + * @length: Length of the queue memory in the guest physical address space
> >>>> + *
> >>>> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
> >>>> + * allows HW to access a guest queue memory described by @base_addr and @length.
> >>>> + * Upon success, the underlying physical pages of the guest queue memory will be
> >>>> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
> >>>> + * destroyed.
> >>>
> >>> Do we have way to make the pinning optional?
> >>>
> >>> As I understand AMD's system the iommu HW itself translates the
> >>> base_addr through the S2 page table automatically, so it doesn't need
> >>> pinned memory and physical addresses but just the IOVA.
> >>
> >> Correct. HW will translate GPA -> SPA automatically using below information.
> >>
> >> AMD IOMMU need special device ID to setup with GPA -> SPA mapping per VM.
> >> and its programmed in VF Control BAR (VFCntlMMIO Offset {16’b[GuestID],
> >> 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this
> >> address for GPA to SPA translation for buffers like command buffer.
> >>
> >> So HW will use Base address (GPA), head/tail pointer to get the offset from
> >> Base. Then it will use GPA -> SPA translation.
> >>
> >>
> >>>
> >>> Perhaps for this reason the pinning should be done with a function
> >>> call from the driver?
> >>
> >> We still need to make sure memory allocated for page is present in memory so
> >> that IOMMU HW can access it.
> >>
> >> Pinning at the time of guest boot is enough here -OR- do we need to increase
> >> reference in queue_alloc() path ?
> >
> > For NVIDIA's vCMDQ that reads host PA directly, pages should be
> > pinned once when stage 2 mappings are created for the guest RAM,
> > and iommu_hw_queue_alloc() should pin the pages again to prevent
> > the gPA from being unmapped in the stage 2 page table. Otherwise
> > it will be a security hole, as HW continues to read the unmapped
> > memory through physical address space.
> >
> > I understand that AMD Command Buffer also needs the S2 mappings
> > to be present in order to work correctly. But what happens if a
> > queue memory that isn't pinned (or even gets unmapped)? Will it
> > raise a translation fault v.s. HW reading the unmapped memory?
>
> If page is unmapped then stage 2 (Host page table) gets updated. IOMMU will not
> be able to find page and logs fault.
>

As long as the fault is contained only for the relevant queue, yes we don't
need another pinning from the driver.