On 4/7/25 15:49, Chenyi Qiang wrote:
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated discard") highlighted that subsystems like VFIO may disable RAM block discard. However, guest_memfd relies on discard operations for page conversion between private and shared memory, potentially leading to stale IOMMU mapping issue when assigning hardware devices to confidential VMs via shared memory. To address this, it is crucial to ensure systems like VFIO refresh its IOMMU mappings. PrivateSharedManager is introduced to manage private and shared states in confidential VMs, similar to RamDiscardManager, which supports coordinated RAM discard in VFIO. Integrating PrivateSharedManager with guest_memfd can facilitate the adjustment of VFIO mappings in response to page conversion events. Since guest_memfd is not an object, it cannot directly implement the PrivateSharedManager interface. Implementing it in HostMemoryBackend is not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks have a memory backend while others do not. Notably, virtual BIOS RAMBlocks using memory_region_init_ram_guest_memfd() do not have a backend. To manage RAMBlocks with guest_memfd, define a new object named RamBlockAttribute to implement the RamDiscardManager interface. This object stores guest_memfd information such as shared_bitmap, and handles page conversion notification. The memory state is tracked at the host page size granularity, as the minimum memory conversion size can be one page per request. Additionally, VFIO expects the DMA mapping for a specific iova to be mapped and unmapped with the same granularity. Confidential VMs may perform partial conversions, such as conversions on small regions within larger regions. To prevent invalid cases and until cut_mapping operation support is available, all operations are performed with 4K granularity.
Just for your information, IOMMUFD plans to introduce the support for cut operation. The kickoff patch series is under discussion here: https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@xxxxxxxxxx/ This new cut support is expected to be exclusive to IOMMUFD and not directly available in the VFIO container context. The VFIO uAPI for map/ unmap is being superseded by IOMMUFD, and all new features will only be available in IOMMUFD.
Signed-off-by: Chenyi Qiang<chenyi.qiang@xxxxxxxxx>
<...>
+ +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr) +{ + uint64_t shared_bitmap_size; + const int block_size = qemu_real_host_page_size(); + int ret; + + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size; + + attr->mr = mr; + ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr)); + if (ret) { + return ret; + } + attr->shared_bitmap_size = shared_bitmap_size; + attr->shared_bitmap = bitmap_new(shared_bitmap_size);
Above introduces a bitmap to track the private/shared state of each 4KB page. While functional, for large RAM blocks managed by guest_memfd, this could lead to significant memory consumption. Have you considered an alternative like a Maple Tree or a generic interval tree? Both are often more memory-efficient for tracking ranges of contiguous states.
+ + return ret; +} + +void ram_block_attribute_unrealize(RamBlockAttribute *attr) +{ + g_free(attr->shared_bitmap); + memory_region_set_generic_state_manager(attr->mr, NULL); +}
Thanks, baolu