On Tue, Mar 25, 2025 at 03:57:00PM +0000, Fuad Tabba wrote:
> Hi Vishal,
>
> On Fri, 21 Mar 2025 at 20:09, Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 18, 2025 at 9:20 AM Fuad Tabba <tabba@xxxxxxxxxx> wrote:
> > > ...
> > > +/*
> > > + * Callback function for __folio_put(), i.e., called once all references by the
> > > + * host to the folio have been dropped. This allows gmem to transition the state
> > > + * of the folio to shared with the guest, and allows the hypervisor to continue
> > > + * transitioning its state to private, since the host cannot attempt to access
> > > + * it anymore.
> > > + */
> > >  void kvm_gmem_handle_folio_put(struct folio *folio)
> > >  {
> > > -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> > > +	struct address_space *mapping;
> > > +	struct xarray *shared_offsets;
> > > +	struct inode *inode;
> > > +	pgoff_t index;
> > > +	void *xval;
> > > +
> > > +	mapping = folio->mapping;
> > > +	if (WARN_ON_ONCE(!mapping))
> > > +		return;
> > > +
> > > +	inode = mapping->host;
> > > +	index = folio->index;
> > > +	shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
> > > +	xval = xa_mk_value(KVM_GMEM_GUEST_SHARED);
> > > +
> > > +	filemap_invalidate_lock(inode->i_mapping);
> >
> > As discussed in the guest_memfd upstream, folio_put can happen from
> > atomic context [1], so we need a way to either defer the work outside
> > kvm_gmem_handle_folio_put() (which is very likely needed to handle
> > hugepages and merge operations) or ensure the logic runs only with
> > synchronization primitives that will not sleep.
>
> Thanks for pointing this out. For now, rather than deferring (which
> we'll come to when hugepages come into play), I think this would be

FWIW, with SNP, it's only possible to unsplit an RMP entry if the guest
cooperates with re-validating/re-accepting the memory at a higher order.
Currently, this guest support is not implemented in Linux. So, if we
were to opportunistically unsplit hugepages, we'd zap the mappings in
KVM, let them fault back in at a higher order so we could reduce TLB
misses, and then KVM would (via kvm_x86_call(private_max_mapping_level)(kvm, pfn))
find that the RMP entry is still split to 4K, and remap everything right
back to the 4K granularity it was at to begin with.

TDX seems to have a bit more flexibility in being able to
'unsplit'/promote private ranges back up to higher orders, so it could
potentially benefit from doing things opportunistically...

However, ideally the guest would just avoid unnecessarily carving up
ranges to begin with and pack all its shared mappings into smaller GPA
ranges. Then, all this unsplitting of huge pages could be completely
avoided until cleanup/truncate time. So maybe even for hugepages we
should just plan to do things this way, at least as a start?

> possible to resolve by ensuring we have exclusive access* to the folio
> instead, and using that to ensure that we can access the
> shared_offsets maps.
>
> * By exclusive access I mean either holding the folio lock, or knowing
> that no one else has references to the folio (which is the case when
> kvm_gmem_handle_folio_put() is called).
>
> I'll try to respin something in time for folks to look at it before
> the next sync.

Thanks for posting. I was looking at how to get rid of
filemap_invalidate_lock() from the conversion path, and having that
separate rwlock seems to resolve a lot of the potential races I was
looking at. I'm working on rebasing SNP 2MB support on top of your v7
series now.
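As an aside, a non-sleeping variant along the lines you describe might
look roughly like the sketch below. This is purely illustrative, based
on the snippet quoted above rather than your actual respin; it assumes
exclusive access makes filemap_invalidate_lock()/folio_lock()
unnecessary here, and that a GFP_ATOMIC xa_store() is acceptable:

/*
 * Sketch only: kvm_gmem_handle_folio_put() runs after the last host
 * reference is dropped, so we have exclusive access to the folio and
 * can skip the sleeping locks entirely. xa_store() takes its own
 * spinlock internally, and GFP_ATOMIC keeps any node allocation safe
 * from atomic context.
 */
void kvm_gmem_handle_folio_put(struct folio *folio)
{
	struct address_space *mapping = folio->mapping;
	struct xarray *shared_offsets;
	struct inode *inode;
	void *xval;

	if (WARN_ON_ONCE(!mapping))
		return;

	inode = mapping->host;
	shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
	xval = xa_mk_value(KVM_GMEM_GUEST_SHARED);

	/* Exclusive access: no one else can observe the transition. */
	kvm_gmem_restore_pending_folio(folio, inode);
	WARN_ON_ONCE(xa_err(xa_store(shared_offsets, folio->index, xval,
				     GFP_ATOMIC)));
}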
-Mike

> Cheers,
> /fuad
>
> > [1] https://elixir.bootlin.com/linux/v6.14-rc6/source/include/linux/mm.h#L1483
> >
> > > +	folio_lock(folio);
> > > +	kvm_gmem_restore_pending_folio(folio, inode);
> > > +	folio_unlock(folio);
> > > +	WARN_ON_ONCE(xa_err(xa_store(shared_offsets, index, xval, GFP_KERNEL)));
> > > +	filemap_invalidate_unlock(inode->i_mapping);
> > > }
> > > EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> > >
> > > --
> > > 2.49.0.rc1.451.g8f38331e32-goog