On Sat, Mar 22, 2025 at 01:37:39AM +0800, Yi Liu wrote:
> On 2025/3/21 12:27, Nicolin Chen wrote:
> > On Thu, Mar 20, 2025 at 04:48:33PM -0700, Nicolin Chen wrote:
> > Reading this further, I found that Yi did report VFIO device cap
> > for PASID via a VFIO ioctl in the early versions but switched to
> > using the IOMMU_GET_HW_INFO since v3 (nearly a year ago). So, I
> > see that's a made decision.
> >
> > Given that our IOMMU_GET_HW_INFO defines this:
> > * Query an iommu type specific hardware information data from an iommu behind
> > * a given device that has been bound to iommufd. This hardware info data will
> > * be used to sync capabilities between the virtual iommu and the physical
> > * iommu, e.g. a nested translation setup needs to check the hardware info, so
> > * a guest stage-1 page table can be compatible with the physical iommu.
> >
> > max_pasid_log2 is something that fits well. But PCI device cap
> > still feels odd in that regard, as it repurposes the ioctl.
>
> PASID cap is a bit special. It should not be reported to user unless
> both iommu and device enabled it. So adding it in this hw_info ioctl
> is fine. It can avoid duplicate ioctls across userspace driver frameworks
> as well.

Yea, I get the convenience.

> > So, perhaps we should update the uAPI documentation and ask user
> > space to run IOMMU_GET_HW_INFO for every iommufd_device, because
> > the output out_capabilities may be different per iommufd_device,
> > even if both devices are correctly assigned to the same vIOMMU.
>
> since this is a per-device ioctl. userspace should expect difference
> and. Actually, the userspace e.g. vfio may just invoke this ioctl
> to know if the PASID cap instead of asking vIOMMU if we define it
> in the driver-specific part. This is much convenient.

An IOMMU's PASID cap is reported by max_pasid_log2 alone, isn't it?
Only the PCI layer that holds the VFIO device cares about these two PCI
device PASID caps, which will be reported in its emulated PCI_PASID_CAP
register.

Yes, this is a per-device ioctl. But we defined it to use the device
only as a bridge to get access to its IOMMU and return the IOMMU's
caps/infos. Now, we are reporting HW info about this bridge itself.
I think it repurposes the ioctl.

And honestly, "userspace should expect difference" isn't very fair.
A vIOMMU could have been initialized by the first given iommufd_device,
as it could have expected the IOMMU info from either the first device
or the second device to be consistent. Yet now, how is a vIOMMU to get
finalized given "userspace should expect difference"?

Certainly, I don't see an issue with these two PCI caps, since a vIOMMU
would be unlikely to integrate them in its registers, so long as we note
down clearly that these two "IOMMU_HW" caps come from the bridging
idev vs. the IOMMU HW.

Thanks
Nicolin