Ira Weiny <ira.weiny@xxxxxxxxx> writes: > Ackerley Tng wrote: >> Refactor dequeue_hugetlb_folio() and alloc_surplus_hugetlb_folio() to >> take mpol, nid and nodemask. This decouples allocation of a folio from >> a vma. >> >> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx> >> Change-Id: I890fb46fe8c6349383d8cf89befc68a4994eb416 >> --- >> mm/hugetlb.c | 64 ++++++++++++++++++++++++---------------------------- >> 1 file changed, 30 insertions(+), 34 deletions(-) >> > > [snip] > >> >> @@ -2993,6 +2974,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, >> int ret, idx; >> struct hugetlb_cgroup *h_cg = NULL; >> gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL; >> + struct mempolicy *mpol; >> + nodemask_t *nodemask; >> + gfp_t gfp_mask; >> + pgoff_t ilx; >> + int nid; >> >> idx = hstate_index(h); >> >> @@ -3032,7 +3018,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, >> >> subpool_reservation_exists = npages_req == 0; >> } >> - >> reservation_exists = vma_reservation_exists || subpool_reservation_exists; >> >> /* >> @@ -3048,21 +3033,30 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, >> goto out_subpool_put; >> } >> >> + mpol = get_vma_policy(vma, addr, h->order, &ilx); > > Why does the memory policy need to be acquired here instead of after the > cgroup charge? AFAICT this is not needed and would at least eliminate 1 > of the error conditions puts. > I was hoping that by taking this early, the upcoming refactoring out of hugetlb_alloc_folio() will look like a nice, clean removal of the middle of this function, leaving acquiring of the mpol and mpol_cond_put() in-place. In the next revision I'm splitting up the refactoring in this patch further so if this is still an issue in some number of revisions' time, I can fix this. >> + >> ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); >> - if (ret) >> + if (ret) { >> + mpol_cond_put(mpol); > ^^^^ > here > > All that said I think the use of some new cleanup macros could really help > a lot of this code. > I'm happy to try that out... > What do folks in this area of the kernel think of those? > not sure though. > Ira > > [snip]