On 5/26/2025 5:01 PM, David Hildenbrand wrote: > On 20.05.25 12:28, Chenyi Qiang wrote: >> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated >> discard") highlighted that subsystems like VFIO may disable RAM block >> discard. However, guest_memfd relies on discard operations for page >> conversion between private and shared memory, potentially leading to >> stale IOMMU mapping issue when assigning hardware devices to >> confidential VMs via shared memory. To address this and allow shared >> device assignement, it is crucial to ensure VFIO system refresh its >> IOMMU mappings. >> >> RamDiscardManager is an existing interface (used by virtio-mem) to >> adjust VFIO mappings in relation to VM page assignment. Effectively page >> conversion is similar to hot-removing a page in one mode and adding it >> back in the other. Therefore, similar actions are required for page >> conversion events. Introduce the RamDiscardManager to guest_memfd to >> facilitate this process. >> >> Since guest_memfd is not an object, it cannot directly implement the >> RamDiscardManager interface. Implementing it in HostMemoryBackend is >> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks >> have a memory backend while others do not. Notably, virtual BIOS >> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a >> backend. >> >> To manage RAMBlocks with guest_memfd, define a new object named >> RamBlockAttribute to implement the RamDiscardManager interface. This >> object can store the guest_memfd information such as bitmap for shared >> memory, and handles page conversion notification. In the context of >> RamDiscardManager, shared state is analogous to populated and private >> state is treated as discard. The memory state is tracked at the host >> page size granularity, as minimum memory conversion size can be one page >> per request. Additionally, VFIO expects the DMA mapping for a specific >> iova to be mapped and unmapped with the same granularity. Confidential >> VMs may perform partial conversions, such as conversions on small >> regions within larger regions. To prevent such invalid cases and until >> cut_mapping operation support is available, all operations are performed >> with 4K granularity. >> >> Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx> >> --- >> Changes in v5: >> - Revert to use RamDiscardManager interface instead of introducing >> new hierarchy of class to manage private/shared state, and keep >> using the new name of RamBlockAttribute compared with the >> MemoryAttributeManager in v3. >> - Use *simple* version of object_define and object_declare since the >> state_change() function is changed as an exported function instead >> of a virtual function in later patch. >> - Move the introduction of RamBlockAttribute field to this patch and >> rename it to ram_shared. (Alexey) >> - call the exit() when register/unregister failed. (Zhao) >> - Add the ram-block-attribute.c to Memory API related part in >> MAINTAINERS. >> >> Changes in v4: >> - Change the name from memory-attribute-manager to >> ram-block-attribute. >> - Implement the newly-introduced PrivateSharedManager instead of >> RamDiscardManager and change related commit message. >> - Define the new object in ramblock.h instead of adding a new file. >> >> Changes in v3: >> - Some rename (bitmap_size->shared_bitmap_size, >> first_one/zero_bit->first_bit, etc.) >> - Change shared_bitmap_size from uint32_t to unsigned >> - Return mgr->mr->ram_block->page_size in get_block_size() >> - Move set_ram_discard_manager() up to avoid a g_free() in failure >> case. >> - Add const for the memory_attribute_manager_get_block_size() >> - Unify the ReplayRamPopulate and ReplayRamDiscard and related >> callback. >> >> Changes in v2: >> - Rename the object name to MemoryAttributeManager >> - Rename the bitmap to shared_bitmap to make it more clear. >> - Remove block_size field and get it from a helper. In future, we >> can get the page_size from RAMBlock if necessary. >> - Remove the unncessary "struct" before GuestMemfdReplayData >> - Remove the unncessary g_free() for the bitmap >> - Add some error report when the callback failure for >> populated/discarded section. >> - Move the realize()/unrealize() definition to this patch. >> --- >> MAINTAINERS | 1 + >> include/system/ramblock.h | 20 +++ >> system/meson.build | 1 + >> system/ram-block-attribute.c | 311 +++++++++++++++++++++++++++++++++++ >> 4 files changed, 333 insertions(+) >> create mode 100644 system/ram-block-attribute.c >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 6dacd6d004..3b4947dc74 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -3149,6 +3149,7 @@ F: system/memory.c >> F: system/memory_mapping.c >> F: system/physmem.c >> F: system/memory-internal.h >> +F: system/ram-block-attribute.c >> F: scripts/coccinelle/memory-region-housekeeping.cocci >> Memory devices >> diff --git a/include/system/ramblock.h b/include/system/ramblock.h >> index d8a116ba99..09255e8495 100644 >> --- a/include/system/ramblock.h >> +++ b/include/system/ramblock.h >> @@ -22,6 +22,10 @@ >> #include "exec/cpu-common.h" >> #include "qemu/rcu.h" >> #include "exec/ramlist.h" >> +#include "system/hostmem.h" >> + >> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute" >> +OBJECT_DECLARE_SIMPLE_TYPE(RamBlockAttribute, RAM_BLOCK_ATTRIBUTE) >> struct RAMBlock { >> struct rcu_head rcu; >> @@ -42,6 +46,8 @@ struct RAMBlock { >> int fd; >> uint64_t fd_offset; >> int guest_memfd; >> + /* 1-setting of the bitmap in ram_shared represents ram is shared */ > > That comment looks misplaced, and the variable misnamed. > > The commet should go into RamBlockAttribute and the variable should > likely be named "attributes". > > Also, "ram_shared" is not used at all in this patch, it should be moved > into the corresponding patch. I thought we only manage the private and shared attribute, so name it as ram_shared. And in the future if managing other attributes, then rename it to attributes. It seems I overcomplicated things. > >> + RamBlockAttribute *ram_shared; >> size_t page_size; >> /* dirty bitmap used during migration */ >> unsigned long *bmap; >> @@ -91,4 +97,18 @@ struct RAMBlock { >> ram_addr_t postcopy_length; >> }; >> +struct RamBlockAttribute { > > Should this actually be "RamBlockAttributes" ? Yes. To match with variable name "attributes", it can be renamed as RamBlockAttributes. > >> + Object parent; >> + >> + MemoryRegion *mr; > > > Should we link to the parent RAMBlock instead, and lookup the MR from > there? Good suggestion! It can also help to reduce the long arrow operation in ram_block_attribute_get_block_size(). > >