On Thu, 11 Sep 2025 14:56:59 +0200 Janosch Frank <frankja@xxxxxxxxxxxxx> wrote: > On 9/10/25 8:07 PM, Claudio Imbrenda wrote: > > Add page table management functions to be used for KVM guest (gmap) > > page tables. > > > > This patch adds functions to walk to specific table entries, or to > > perform actions on a range of entries. > > > > Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> > > --- > > arch/s390/kvm/dat.c | 351 ++++++++++++++++++++++++++++++++++++++++++++ > > arch/s390/kvm/dat.h | 38 +++++ > > 2 files changed, 389 insertions(+) > > > > diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c > > index f26e3579bd77..fe93e1c07158 100644 > > --- a/arch/s390/kvm/dat.c > > +++ b/arch/s390/kvm/dat.c > > @@ -209,3 +209,354 @@ union pgste __dat_ptep_xchg(union pte *ptep, union pgste pgste, union pte new, g > > WRITE_ONCE(*ptep, new); > > return pgste; > > } > > + > > +/* > > + * dat_split_pmd is assumed to be called with mmap_lock held in read or write mode > > + */ > > +static int dat_split_pmd(union pmd *pmdp, gfn_t gfn, union asce asce) > > +{ > > + struct page_table *pt; > > + union pmd new, old; > > + union pte init; > > + int i; > > + > > + old = READ_ONCE(*pmdp); > > + > > + /* Already split, nothing to do */ > > + if (!old.h.i && !old.h.fc) > > + return 0; > > + > > + pt = dat_alloc_pt_noinit(); > > + if (!pt) > > + return -ENOMEM; > > + new.val = virt_to_phys(pt); > > + > > + while (old.h.i || old.h.fc) { > > + init.val = pmd_origin_large(old); > > + init.h.p = old.h.p; > > + init.h.i = old.h.i; > > + init.s.d = old.s.fc1.d; > > + init.s.w = old.s.fc1.w; > > + init.s.y = old.s.fc1.y; > > + init.s.sd = old.s.fc1.sd; > > + init.s.pr = old.s.fc1.pr; > > This looks horrible but I haven't found a better solution. I know what you mean :) > > > + if (old.h.fc) { > > + for (i = 0; i < _PAGE_ENTRIES; i++) > > + pt->ptes[i].val = init.val | i * PAGE_SIZE; > > + /* no need to take locks as the page table is not installed yet */ > > + dat_init_pgstes(pt, old.s.fc1.prefix_notif ? PGSTE_IN_BIT : 0); > > So, if the pmd had the IN bit, all PGSTEs will have it as well, right? yes > Why can't we notify and not copy this bit, so the notifier sets it for > the ptes which actually need it? Or does it happen later? because the notifier is a gmap/kvm thing, and this function (the whole dat.c actually) only deals with page tables. Keeping things this way makes the code more structured and cleaner and less convoluted. The unnecessary prefix_notif bits will be cleared once they get invalidated. This will happen very rarely, since most of the time a pmd or pud prefix will not need to be split. The only case I can think of right now is a weird interaction between huge pages and vsie. In short: it only needs to be correct, does not need to be fast :) > > > + } else { > > + dat_init_page_table(pt, init.val, 0); > > + } > > + > > + if (dat_pmdp_xchg_atomic(pmdp, old, new, gfn, asce)) > > + return 0; > > + old = READ_ONCE(*pmdp); > > + } > > + > > + dat_free_pt(pt); > > + return 0; > > +}