Ackerley Tng <ackerleytng@xxxxxxxxxx> writes:

> <snip>
>
> Here are some remaining issues/TODOs:
>
> 1. Memory error handling such as machine check errors have not been
>    implemented.
> 2. I've not looked into preparedness of pages, only zeroing has been
>    considered.
> 3. When allocating HugeTLB pages, if two threads allocate indices
>    mapping to the same huge page, the utilization in guest_memfd inode's
>    subpool may momentarily go over the subpool limit (the requested size
>    of the inode at guest_memfd creation time), causing one of the two
>    threads to get -ENOMEM. Suggestions to solve this are appreciated!
> 4. max_usage_in_bytes statistic (cgroups v1) for guest_memfd HugeTLB
>    pages should be correct but needs testing and could be wrong.
> 5. memcg charging (charge_memcg()) for cgroups v2 for guest_memfd
>    HugeTLB pages after splitting should be correct but needs testing and
>    could be wrong.
> 6. Page cache accounting: When a hugetlb page is split, guest_memfd will
>    incur page count in both NR_HUGETLB (counted at hugetlb allocation
>    time) and NR_FILE_PAGES stats (counted when split pages are added to
>    the filemap). Is this aligned with what people expect?
>

For people who might be testing this series with non-Coco VMs (heads up,
Patrick and Nikita!), this currently splits the folio as long as any
part of the huge folio's range is marked shared, which is probably
unnecessary?

IIUC core-mm doesn't support mapping at 1G, but from a cursory reading
it seems like the faulting function calling kvm_gmem_fault_shared()
might be able to map a 1G page at 4K.

Looks like we might need another flag like
GUEST_MEMFD_FLAG_SUPPORT_CONVERSION, which will gate initialization of
the shareability maple tree/xarray. If shareability is NULL for the
entire hugepage range, then no splitting will occur.

For Coco VMs, this should be safe, since if this flag is not set,
kvm_gmem_fault_shared() will never be able to fault (the shareability
value will be NULL).
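To make the proposed gating concrete, here is a rough userspace model of
the behavior described above. It is only a sketch, not kernel code:
struct gmem_model, the gmem_model_*() helpers and
MODEL_FLAG_SUPPORT_CONVERSION are hypothetical names, a flat byte array
stands in for the maple tree/xarray, and the huge page is shrunk to 512
small pages to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed GUEST_MEMFD_FLAG_SUPPORT_CONVERSION. */
#define MODEL_FLAG_SUPPORT_CONVERSION (1u << 0)

/* Model only: small pages per huge page (shrunk to keep the example small). */
#define MODEL_PAGES_PER_HUGE 512

enum { MODEL_PRIVATE = 0, MODEL_SHARED = 1 };

struct gmem_model {
	size_t nr_pages;
	/* Stand-in for the shareability tree; NULL if the flag was not set. */
	unsigned char *shareability;
};

/* Shareability tracking is only allocated when the flag is passed. */
static int gmem_model_init(struct gmem_model *g, size_t nr_pages,
			   unsigned int flags)
{
	g->nr_pages = nr_pages;
	g->shareability = NULL;
	if (flags & MODEL_FLAG_SUPPORT_CONVERSION) {
		/* calloc() zeroes the array, so every page starts private. */
		g->shareability = calloc(nr_pages, 1);
		if (!g->shareability)
			return -1;
	}
	return 0;
}

/* No shareability tracking means no splitting for this huge page. */
static bool gmem_model_needs_split(const struct gmem_model *g, size_t huge_idx)
{
	size_t start = huge_idx * MODEL_PAGES_PER_HUGE;
	size_t i;

	assert(start + MODEL_PAGES_PER_HUGE <= g->nr_pages);
	if (!g->shareability)
		return false;
	for (i = start; i < start + MODEL_PAGES_PER_HUGE; i++)
		if (g->shareability[i] == MODEL_SHARED)
			return true;
	return false;
}

/* Without tracking, a shared fault can never succeed. */
static bool gmem_model_can_fault_shared(const struct gmem_model *g, size_t idx)
{
	assert(idx < g->nr_pages);
	return g->shareability && g->shareability[idx] == MODEL_SHARED;
}
```

In this model, an inode created without the flag never reports that a
split is needed and never allows a shared fault, which matches the two
properties claimed above. How shared faults should work for non-Coco VMs
that leave the flag unset is not covered by this sketch.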
> Here are some optimizations that could be explored in future series:
>
> 1. Pages could be split from 1G to 2M first and only split to 4K if
>    necessary.
> 2. Zeroing could be skipped for Coco VMs if hardware already zeroes the
>    pages.
>
> <snip>
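As a back-of-the-envelope illustration of optimization 1, the sketch
below counts how many folios result from one 1G page under a full split
to 4K versus a staged split where only some 2M regions go down to 4K. It
assumes x86-64 page sizes (4K/2M/1G); the MODEL_* macros and function
names are made up for this example and are not from the series.

```c
/* Assumed x86-64 page sizes; names are illustrative only. */
#define MODEL_SZ_4K (4096UL)
#define MODEL_SZ_2M (512UL * MODEL_SZ_4K)
#define MODEL_SZ_1G (512UL * MODEL_SZ_2M)

/* Folios produced when a 1G page is split all the way down to 4K. */
static unsigned long folios_full_split(void)
{
	return MODEL_SZ_1G / MODEL_SZ_4K;
}

/*
 * Folios produced when a 1G page is first split to 2M and only
 * nr_2m_to_4k of those 2M regions are further split to 4K.
 */
static unsigned long folios_staged_split(unsigned long nr_2m_to_4k)
{
	unsigned long total_2m = MODEL_SZ_1G / MODEL_SZ_2M;

	/* Cannot split more 2M regions than the 1G page contains. */
	if (nr_2m_to_4k > total_2m)
		nr_2m_to_4k = total_2m;

	return (total_2m - nr_2m_to_4k) +
	       nr_2m_to_4k * (MODEL_SZ_2M / MODEL_SZ_4K);
}
```

With these sizes, a full split yields 262144 folios, while a staged
split with a single 2M region taken down to 4K yields 511 + 512 = 1023.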