On Thu, Aug 28, 2025 at 8:48 PM Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> Michał Cłapiński wrote:
> > On Tue, Jul 1, 2025 at 2:05 PM Michał Cłapiński <mclapinski@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Jun 25, 2025 at 11:16 PM Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
> > > >
> > > > Michal Clapinski wrote:
> > > > > This includes:
> > > > > 1. Splitting one e820 entry into many regions.
> > > > > 2. Conversion to devdax during boot.
> > > > >
> > > > > This change is needed for the hypervisor live update. VMs' memory will
> > > > > be backed by those emulated pmem devices. To support various VM shapes
> > > > > I want to create devdax devices at 1GB granularity similar to hugetlb.
> > > > > Also detecting those devices as devdax during boot speeds up the whole
> > > > > process. Conversion in userspace would be much slower which is
> > > > > unacceptable while trying to minimize
> > > >
> > > > Did you explore the NFIT injection strategy which Dan suggested?[1]
> > > >
> > > > [1] https://lore.kernel.org/all/6807f0bfbe589_71fe2944d@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/
> > > >
> > > > If so why did it not work?
> > >
> > > I'm new to all this so I might be off on some/all of the things.
> > >
> > > My issues with NFIT:
> > > 1. I can either go with custom bios or acpi nfit injection. Custom
> > > bios sounds rather aggressive to me and I'd prefer to avoid this. The
> > > NFIT injection is done via initramfs, right? If a system doesn't use
> > > initramfs at the moment, that would introduce another step in the boot
> > > process. One of the requirements of the hypervisor live update project
> > > is that the boot process has to be blazing fast and I'm worried
> > > introducing initramfs would go against this requirement.
> > > 2. If I were to create an NFIT, it would have to contain thousands of
> > > entries. That would have to be parsed on every boot. Again, I'm
> > > worried about the performance.
> > >
> > > Do you think an NFIT solution could be as fast as the simple command
> > > line solution?
> >
> > Hello,
> > just a follow up email. I'd like to receive some feedback on this.
>
> Apologies. I'm not keen on adding kernel parameters so I'm curious what
> you think about Mike's new driver?[1]

Hi Ira,

Mike's proposal and our use case are different. What we're proposing is
a way to automatically convert emulated PMEM into DAX/FSDAX during boot
and subdivide it into page-aligned chunks (e.g., 1G/2M). We have a
userspace agent that then manages these devdax devices, similar to how
HugeTLB pages are handled, allowing the chunks to be used in a cloud
environment to support guest memory for live updates.

To be clear, we are not trying to make the carved-out PMEM region
scalable. The hypervisor's memory allocation stays the same, and these
PMEM/DAX devices are used exclusively for running VMs. This approach
isn't intended for the general-purpose, scalable persistent memory use
case that Mike's driver addresses.

Pasha