On 4/19/25 12:32 AM, Jinjiang Tu wrote:
> When echo 0 > /proc/sys/vm/nr_hugepages is concurrent with freeing in-use
> huge pages to the huge page pool, some free huge pages may fail to be
> destroyed and accounted as surplus. The counts are like below:
>
>   HugePages_Total:    1024
>   HugePages_Free:     1024
>   HugePages_Surp:     1024
>
> When set_max_huge_pages() decrease the pool size, it first return free
> pages to the buddy allocator, and then account other pages as surplus.
> Between the two steps, the hugetlb_lock is released to free memory and
> require the hugetlb_lock again. If another process free huge pages to the
> pool between the two steps, these free huge pages will be accounted as
> surplus.
>
> Besides, Free surplus huge pages come from failing to restore vmemmap.
>
> Once the two situation occurs, users couldn't directly shrink the huge
> page pool via echo 0 > nr_hugepages, should use one of the two ways to
> destroy these free surplus huge pages:
> 1) echo $nr_surplus > nr_hugepages to convert the surplus free huge pages
> to persistent free huge pages first, and then echo 0 > nr_hugepages to
> destroy these huge pages.
> 2) allocate these free surplus huge pages, and will try to destroy them
> when freeing them.
>
> However, there is no documentation to describe it, users may be confused
> and don't know how to handle in such case. So update the documention.
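
As a rough illustration of workaround 1) above, here is a minimal Python
sketch that works out which values would be written to nr_hugepages from
/proc/meminfo-style counters. The helper names parse_hugepage_counts() and
plan_shrink_writes() are hypothetical, not kernel or library APIs. The sketch
only computes the values and does not touch /proc. It assumes, as the commit
message describes, that writing the surplus count first converts the surplus
free pages to persistent ones, and it cannot tell free surplus pages from
in-use ones.

```python
import re


def parse_hugepage_counts(meminfo_text):
    """Extract the HugePages_* counters from /proc/meminfo-style text.

    Returns a dict such as {"Total": 1024, "Free": 1024, "Surp": 1024}.
    """
    counts = {}
    pattern = r"^HugePages_(\w+):\s+(\d+)\s*$"
    for match in re.finditer(pattern, meminfo_text, re.MULTILINE):
        counts[match.group(1)] = int(match.group(2))
    return counts


def plan_shrink_writes(counts):
    """Return the values to write to nr_hugepages, in order.

    Hypothetical helper: if any pages are accounted as surplus, follow
    workaround 1) from the commit message (write the surplus count, then 0).
    Otherwise a single write of 0 is planned.
    """
    surplus = counts.get("Surp", 0)
    if surplus > 0:
        return [surplus, 0]
    return [0]


if __name__ == "__main__":
    # Counters taken from the example in the commit message.
    sample = (
        "HugePages_Total:    1024\n"
        "HugePages_Free:     1024\n"
        "HugePages_Surp:     1024\n"
    )
    for value in plan_shrink_writes(parse_hugepage_counts(sample)):
        print(f"echo {value} > /proc/sys/vm/nr_hugepages")
```

With the counters from the commit message, the planned writes are the surplus
count followed by 0, matching the two echo commands in workaround 1).
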
>
> Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx>
> ---
>  Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 67a941903fd2..0456cefae039 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>  increased sufficiently, or the surplus huge pages go out of use and are freed--
>  no more surplus huge pages will be allowed to be allocated.
>
> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
> +concurrent with freeing in-use huge pages to the huge page pool, leading to some
> +huge pages are still in the huge page pool and accounted as surplus. Besides,
> +When the feature of freeing unused vmemmap pages associated with each hugetlb page when
> +is enabled, free huge page may be accounted as surplus too. In such two cases, users
> +couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should

                                                                        but should

Also, please limit each line to <80 characters.

> +echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
> +persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
> +these huge pages. Another way to destroy is allocating these free surplus huge
> +pages and these huge pages will be tried to destroy when they are freed.
> +

But I don't see why this is a user problem to be solved by users...

>  With support for multiple huge page pools at run-time available, much of
>  the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>  sysfs.

-- 
~Randy