Re: [PATCH v2 10/20] KVM: s390: KVM page table management functions: walks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Sep 2025 14:56:59 +0200
Janosch Frank <frankja@xxxxxxxxxxxxx> wrote:

> On 9/10/25 8:07 PM, Claudio Imbrenda wrote:
> > Add page table management functions to be used for KVM guest (gmap)
> > page tables.
> > 
> > This patch adds functions to walk to specific table entries, or to
> > perform actions on a range of entries.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > ---
> >   arch/s390/kvm/dat.c | 351 ++++++++++++++++++++++++++++++++++++++++++++
> >   arch/s390/kvm/dat.h |  38 +++++
> >   2 files changed, 389 insertions(+)
> > 
> > diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> > index f26e3579bd77..fe93e1c07158 100644
> > --- a/arch/s390/kvm/dat.c
> > +++ b/arch/s390/kvm/dat.c
> > @@ -209,3 +209,354 @@ union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, g
> >   	WRITE_ONCE(*ptep, new);
> >   	return pgste;
> >   }
> > +
> > +/*
> > + * dat_split_pmd is assumed to be called with mmap_lock held in read or write mode
> > + */
> > +static int dat_split_pmd(union pmd *pmdp, gfn_t gfn, union asce asce)
> > +{
> > +	struct page_table *pt;
> > +	union pmd new, old;
> > +	union pte init;
> > +	int i;
> > +
> > +	old = READ_ONCE(*pmdp);
> > +
> > +	/* Already split, nothing to do */
> > +	if (!old.h.i && !old.h.fc)
> > +		return 0;
> > +
> > +	pt = dat_alloc_pt_noinit();
> > +	if (!pt)
> > +		return -ENOMEM;
> > +	new.val = virt_to_phys(pt);
> > +
> > +	while (old.h.i || old.h.fc) {
> > +		init.val = pmd_origin_large(old);
> > +		init.h.p = old.h.p;
> > +		init.h.i = old.h.i;
> > +		init.s.d = old.s.fc1.d;
> > +		init.s.w = old.s.fc1.w;
> > +		init.s.y = old.s.fc1.y;
> > +		init.s.sd = old.s.fc1.sd;
> > +		init.s.pr = old.s.fc1.pr;  
> 
> This looks horrible but I haven't found a better solution.

I know what you mean :)

> 
> > +		if (old.h.fc) {
> > +			for (i = 0; i < _PAGE_ENTRIES; i++)
> > +				pt->ptes[i].val = init.val | i * PAGE_SIZE;
> > +			/* no need to take locks as the page table is not installed yet */
> > +			dat_init_pgstes(pt, old.s.fc1.prefix_notif ? PGSTE_IN_BIT : 0);  
> 
> So, if the pmd had the IN bit, all PGSTEs will have it as well, right?

yes

> Why can't we notify and not copy this bit, so the notifier sets it for 
> the ptes which actually need it? Or does it happen later?

because the notifier is a gmap/kvm thing, and this function (the whole
dat.c actually) only deals with page tables.

Keeping things this way makes the code more structured and cleaner and
less convoluted.

The unnecessary prefix_notif bits will be cleared once they get
invalidated. This will happen very rarely, since most of the time a pmd
or pud prefix will not need to be split. The only case I can think of
right now is a weird interaction between huge pages and vsie.

In short: it only needs to be correct, does not need to be fast :)

> 
> > +		} else {
> > +			dat_init_page_table(pt, init.val, 0);
> > +		}
> > +
> > +		if (dat_pmdp_xchg_atomic(pmdp, old, new, gfn, asce))
> > +			return 0;
> > +		old = READ_ONCE(*pmdp);
> > +	}
> > +
> > +	dat_free_pt(pt);
> > +	return 0;
> > +}  





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux