On Wed, May 28, 2025 at 8:04 AM Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > > > > On 2025/5/28 17:26, David Hildenbrand wrote: > > On 22.05.25 11:39, Baolin Wang wrote: > >> > >> > >> On 2025/5/21 18:23, Nico Pache wrote: > >>> On Tue, May 20, 2025 at 4:09 AM Baolin Wang > >>> <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > >>>> > >>>> Sorry for late reply. > >>>> > >>>> On 2025/5/17 14:47, Nico Pache wrote: > >>>>> On Thu, May 15, 2025 at 9:20 PM Baolin Wang > >>>>> <baolin.wang@xxxxxxxxxxxxxxxxx> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2025/5/15 11:22, Nico Pache wrote: > >>>>>>> khugepaged scans anons PMD ranges for potential collapse to a > >>>>>>> hugepage. > >>>>>>> To add mTHP support we use this scan to instead record chunks of > >>>>>>> utilized > >>>>>>> sections of the PMD. > >>>>>>> > >>>>>>> khugepaged_scan_bitmap uses a stack struct to recursively scan a > >>>>>>> bitmap > >>>>>>> that represents chunks of utilized regions. We can then determine > >>>>>>> what > >>>>>>> mTHP size fits best and in the following patch, we set this > >>>>>>> bitmap while > >>>>>>> scanning the anon PMD. A minimum collapse order of 2 is used as > >>>>>>> this is > >>>>>>> the lowest order supported by anon memory. > >>>>>>> > >>>>>>> max_ptes_none is used as a scale to determine how "full" an order > >>>>>>> must > >>>>>>> be before being considered for collapse. > >>>>>>> > >>>>>>> When attempting to collapse an order that has its order set to > >>>>>>> "always" > >>>>>>> lets always collapse to that order in a greedy manner without > >>>>>>> considering the number of bits set. > >>>>>>> > >>>>>>> Signed-off-by: Nico Pache <npache@xxxxxxxxxx> > >>>>>> > >>>>>> Sigh. You still haven't addressed or explained the issues I > >>>>>> previously > >>>>>> raised [1], so I don't know how to review this patch again... > >>>>> Can you still reproduce this issue? > >>>> > >>>> Yes, I can still reproduce this issue with today's (5/20) mm-new > >>>> branch. > >>>> > >>>> I've disabled PMD-sized THP in my system: > >>>> [root]# cat /sys/kernel/mm/transparent_hugepage/enabled > >>>> always madvise [never] > >>>> [root]# cat > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled > >>>> always inherit madvise [never] > >>>> > >>>> And I tried calling madvise() with MADV_COLLAPSE for anonymous memory, > >>>> and I can still see it collapsing to a PMD-sized THP. > >>> Hi Baolin ! Thank you for your reply and willingness to test again :) > >>> > >>> I didn't realize we were talking about madvise collapse-- this makes > >>> sense now. I also figured out why I could "reproduce" it before. My > >>> script was always enabling the THP settings in two places, and I only > >>> commented out one to test this. But this time I was doing more manual > >>> testing. > >>> > >>> The original design of madvise_collapse ignores the sysfs and > >>> collapses even if you have an order disabled. I believe this behavior > >>> is wrong, but by design. I spent some time playing around with madvise > >>> collapses with and w/o my changes. This is not a new thing, I > >>> reproduced the issue in 6.11 (Fedora 41), and I think its been > >>> possible since the inception of madvise collapse 3 years ago. I > >>> noticed a similar behavior on one of my RFC since it was "breaking" > >>> selftests, and the fix was to reincorporate this broken sysfs > >>> behavior. > >> > >> OK. Thanks for the explanation. > >> > >>> 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage > >>> collapse") > >>> "This call is independent of the system-wide THP sysfs settings, but > >>> will fail for memory marked VM_NOHUGEPAGE." > >>> > >>> The second condition holds true (and fails for VM_NOHUGEPAGE), but I > >>> dont know if we actually want madvise_collapse to be independent of > >>> the system-wide. > >> > >> This design principle surprised me a bit, and I failed to find the > >> reason in the commit log. I agree that "never should mean never," and we > >> should respect the THP/mTHP sysfs setting. Additionally, for the > >> 'shmem_enabled' sysfs interface controlled for shmem/tmpfs, THP collapse > >> can still be prohibited through the 'deny' configuration. The rules here > >> are somewhat confusing. > > > > I recall that we decided to overwrite "VM_NOHUGEPAGE", because the > > assumption is that the same app that triggered MADV_NOHUGEPAGE triggers > > the collapse. So the app decides on its own behavior. > > > > Similarly, allowing for collapsing in a VM without VM_HUGEPAGE in the > > "madvise" mode would be fine. > > > > But in the "never" case, we should just "never" collapse. > > OK. Let's fix the "never" case first. Thanks. Great, I will update that in the next version! >