On 30/05/2025 09:44, David Hildenbrand wrote:
> On 30.05.25 10:04, Ryan Roberts wrote:
>> On 29/05/2025 09:23, Baolin Wang wrote:
>>> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
>>> the system-wide anon/shmem THP sysfs settings, which means that even though
>>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
>>> attempt to collapse into an anon/shmem THP. This violates the rule we have
>>> agreed upon: never means never. This patch set will address this issue.
>>
>> This is a drive-by comment from me without having the previous context, but...
>>
>> Surely MADV_COLLAPSE *should* ignore the THP sysfs settings? It's a deliberate
>> user-initiated, synchronous request to use huge pages for a range of memory.
>> There is nothing *transparent* about it, it just happens to be implemented using
>> the same logic that THP uses.
>>
>> I always thought this was a deliberate design decision.
>
> If the admin said "never", then why should a user be able to overwrite that?

Well my interpretation would be that the admin is saying never *transparently*
give anyone any hugepages; on balance it does more harm than good for my
workloads. The toggle is called transparent_hugepage/enabled, after all.

Whereas MADV_COLLAPSE is deliberately applied to a specific region at an
opportune moment in time, presumably because the user knows that the region
*will* benefit and because that point in the execution is not sensitive to
latency. I see them as logically separate.

>
> The design decision I recall is that if VM_NOHUGEPAGE is set, we'll ignore that.
> Because that was set by the app itself (MADV_NOHUGEPAGE).

Hmm, ok. My instinct would have been the opposite; MADV_NOHUGEPAGE means "I
don't want the risk of latency spikes and memory bloat that THP can cause". Not
"ignore my explicit requests to MADV_COLLAPSE".

But if that decision was already taken and that's the current behavior then I
agree we have an inconsistency with respect to the sysfs control.

Perhaps we should be guided by real-world usage - AIUI there is a cloud that
disables THP at system level today (Google?). Is there any concern that there
are workloads in such environments that are using MADV_COLLAPSE today that
would then see a performance drop?