Ackerley Tng wrote:
> Refactor out hugetlb_alloc_folio() from alloc_hugetlb_folio(), which
> handles allocation of a folio and cgroup charging.
>
> Other than flags to control charging in the allocation process,
> hugetlb_alloc_folio() also has parameters for memory policy.
>
> This refactoring as a whole decouples the hugetlb page allocation from
> hugetlbfs, (1) where the subpool is stored at the fs mount, (2)
> reservations are made during mmap and stored in the vma, and (3) mpol
> must be stored at vma->vm_policy (4) a vma must be used for allocation
> even if the pages are not meant to be used by host process.
>
> This decoupling will allow hugetlb_alloc_folio() to be used by
> guest_memfd in later patches. In guest_memfd, (1) a subpool is created
> per-fd and is stored on the inode, (2) no vma-related reservations are
> used (3) mpol may not be associated with a vma since (4) for private
> pages, the pages will not be mappable to userspace and hence have to
> associated vmas.
>
> This could hopefully also open hugetlb up as a more generic source of
> hugetlb pages that are not bound to hugetlbfs, with the complexities
> of userspace/mmap/vma-related reservations contained just to
> hugetlbfs.
>
> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> Change-Id: I60528f246341268acbf0ed5de7752ae2cacbef93
> ---
>  include/linux/hugetlb.h |  12 +++
>  mm/hugetlb.c            | 192 ++++++++++++++++++++++------------
>  2 files changed, 118 insertions(+), 86 deletions(-)
>

[snip]

>
> +/**
> + * hugetlb_alloc_folio() - Allocates a hugetlb folio.
> + *
> + * @h: struct hstate to allocate from.
> + * @mpol: struct mempolicy to apply for this folio allocation.
> + * @ilx: Interleave index for interpretation of @mpol.
> + * @charge_cgroup_rsvd: Set to true to charge cgroup reservation.
> + * @use_existing_reservation: Set to true if this allocation should use an
> + *     existing hstate reservation.
> + *
> + * This function handles cgroup and global hstate reservations. VMA-related
> + * reservations and subpool debiting must be handled by the caller if necessary.
> + *
> + * Return: folio on success or negated error otherwise.
> + */
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct mempolicy *mpol,
> +                                  pgoff_t ilx, bool charge_cgroup_rsvd,
> +                                  bool use_existing_reservation)
> +{
> +        unsigned int nr_pages = pages_per_huge_page(h);
> +        struct hugetlb_cgroup *h_cg = NULL;
> +        struct folio *folio = NULL;
> +        nodemask_t *nodemask;
> +        gfp_t gfp_mask;
> +        int nid;
> +        int idx;
> +        int ret;
> +
> +        idx = hstate_index(h);
> +
> +        if (charge_cgroup_rsvd) {
> +                if (hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg))
> +                        goto out;

Why not just return here?

        return ERR_PTR(-ENOSPC);
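In context, something like this (untested sketch):

        if (charge_cgroup_rsvd) {
                if (hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg))
                        return ERR_PTR(-ENOSPC);
        }

AFAICT that goto is the only real user of the out: label, which becomes
relevant below.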
> +        }
> +
> +        if (hugetlb_cgroup_charge_cgroup(idx, nr_pages, &h_cg))
> +                goto out_uncharge_cgroup_reservation;
> +
> +        gfp_mask = htlb_alloc_mask(h);
> +        nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
> +
> +        spin_lock_irq(&hugetlb_lock);
> +
> +        if (use_existing_reservation || available_huge_pages(h))
> +                folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
> +
> +        if (!folio) {
> +                spin_unlock_irq(&hugetlb_lock);
> +                folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
> +                if (!folio)
> +                        goto out_uncharge_cgroup;
> +                spin_lock_irq(&hugetlb_lock);
> +                list_add(&folio->lru, &h->hugepage_activelist);
> +                folio_ref_unfreeze(folio, 1);
> +                /* Fall through */
> +        }
> +
> +        if (use_existing_reservation) {
> +                folio_set_hugetlb_restore_reserve(folio);
> +                h->resv_huge_pages--;
> +        }
> +
> +        hugetlb_cgroup_commit_charge(idx, nr_pages, h_cg, folio);
> +
> +        if (charge_cgroup_rsvd)
> +                hugetlb_cgroup_commit_charge_rsvd(idx, nr_pages, h_cg, folio);
> +
> +        spin_unlock_irq(&hugetlb_lock);
> +
> +        gfp_mask = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> +        ret = mem_cgroup_charge_hugetlb(folio, gfp_mask);
> +        /*
> +         * Unconditionally increment NR_HUGETLB here. If it turns out that
> +         * mem_cgroup_charge_hugetlb failed, then immediately free the page and
> +         * decrement NR_HUGETLB.
> +         */
> +        lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> +
> +        if (ret == -ENOMEM) {
> +                free_huge_folio(folio);
> +                return ERR_PTR(-ENOMEM);
> +        }
> +
> +        return folio;
> +
> +out_uncharge_cgroup:
> +        hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
> +out_uncharge_cgroup_reservation:
> +        if (charge_cgroup_rsvd)
> +                hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg);

I find the direct copy of the unwind logic from alloc_hugetlb_folio()
cumbersome and it seems like a good opportunity to clean it up; see the
sketch below.

> +out:
> +        folio = ERR_PTR(-ENOSPC);
> +        goto out;

Endless loop?
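Presumably this was meant to be a return rather than a goto back to the
same label. If you take the early return suggested above, the out: label
has no users left and the whole tail could collapse to something like
(again untested, and hopefully less of a direct copy of the
alloc_hugetlb_folio() unwind):

        return folio;

out_uncharge_cgroup:
        hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
out_uncharge_cgroup_reservation:
        if (charge_cgroup_rsvd)
                hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg);
        return ERR_PTR(-ENOSPC);

Ira

[snip]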