On Mon, Jun 09, 2025 at 01:12:25PM +0100, Usama Arif wrote:
> 
> I dont like it either :)
> 
> Pressed "Ctrl+enter" instead of "enter" by mistake which sent the email prematurely :)
> Adding replies to the rest of the comments in this email.

We've all been there :)

> 
> As I mentioned in reply to David now in [1], pageblock_nr_pages is not really
> 1 << PAGE_BLOCK_ORDER but is 1 << min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER) when
> THP is enabled.
> 
> It needs a better name, but I think the right approach is just to change
> pageblock_order as recommended in [2]
> 
> [1] https://lore.kernel.org/all/4adf1f8b-781d-4ab0-b82e-49795ad712cb@xxxxxxxxx/
> [2] https://lore.kernel.org/all/c600a6c0-aa59-4896-9e0d-3649a32d1771@xxxxxxxxx/

Replied there.

> >>> +{
> >>> +	return (1UL << min(thp_highest_allowable_order(), PAGE_BLOCK_ORDER));
> >>> +}
> >>> +
> >>>  static void set_recommended_min_free_kbytes(void)
> >>>  {
> >>>  	struct zone *zone;
> >>> @@ -2638,12 +2658,16 @@ static void set_recommended_min_free_kbytes(void)
> >>
> >> You provide a 'patchlet' in
> >> https://lore.kernel.org/all/a179fd65-dc3f-4769-9916-3033497188ba@xxxxxxxxx/
> >>
> >> That also does:
> >>
> >>  	/* Ensure 2 pageblocks are free to assist fragmentation avoidance */
> >> -	recommended_min = pageblock_nr_pages * nr_zones * 2;
> >> +	recommended_min = min_thp_pageblock_nr_pages() * nr_zones * 2;
> >>
> >> So comment here - this comment is now incorrect, this isn't 2 page blocks,
> >> it's 2 of 'sub-pageblock size as if page blocks were dynamically altered by
> >> always/madvise THP size'.
> >>
> >> Again, this whole thing strikes me as we're doing things at the wrong level
> >> of abstraction.
> >>
> >> And you're definitely now not helping avoid pageblock-sized
> >> fragmentation. You're accepting that you need less so... why not reduce
> >> pageblock size? :)
> 
> Yes agreed.
:)

> >>  	/*
> >>  	 * Make sure that on average at least two pageblocks are almost free
> >>  	 * of another type, one for a migratetype to fall back to and a
> >>
> >> ^ remainder of comment
> >>
> >>>  	 * second to avoid subsequent fallbacks of other types There are 3
> >>>  	 * MIGRATE_TYPES we care about.
> >>>  	 */
> >>> -	recommended_min += pageblock_nr_pages * nr_zones *
> >>> +	recommended_min += min_thp_pageblock_nr_pages() * nr_zones *
> >>>  			   MIGRATE_PCPTYPES * MIGRATE_PCPTYPES;
> >>
> >> This just seems wrong now and contradicts the comment - you're setting
> >> minimum pages based on migrate PCP types that operate at pageblock order
> >> but without reference to the actual number of page block pages?
> >>
> >> So the comment is just wrong now? 'make sure there are at least two
> >> pageblocks', well this isn't what you're doing is it? So why then are we
> >> making reference to PCP counts etc.?
> >>
> >> This seems like we're essentially just tuning these numbers somewhat
> >> arbitrarily to reduce them?
> >>
> >>>
> >>> -	/* don't ever allow to reserve more than 5% of the lowmem */
> >>> -	recommended_min = min(recommended_min,
> >>> -			      (unsigned long) nr_free_buffer_pages() / 20);
> >>> +	/*
> >>> +	 * Don't ever allow to reserve more than 5% of the lowmem.
> >>> +	 * Use a min of 128 pages when all THP orders are set to never.
> >>
> >> Why? Did you just choose this number out of the blue?
> 
> Mentioned this in the previous comment.

Ack

> >>
> >> Previously, on x86-64 with thp -> never on everything a pageblock order-9
> >> wouldn't this be a much higher value?
> >>
> >> I mean just putting '128' here is not acceptable. It needs to be justified
> >> (even if empirically with data to back it) and defined as a named thing.
> >>
> >>> +	 */
> >>> +	recommended_min = clamp(recommended_min, 128,
> >>> +				(unsigned long) nr_free_buffer_pages() / 20);
> >>> +
> >>>  	recommended_min <<= (PAGE_SHIFT-10);
> >>>
> >>>  	if (recommended_min > min_free_kbytes) {
> >>> diff --git a/mm/shmem.c b/mm/shmem.c
> >>> index 0c5fb4ffa03a..8e92678d1175 100644
> >>> --- a/mm/shmem.c
> >>> +++ b/mm/shmem.c
> >>> @@ -136,10 +136,10 @@ struct shmem_options {
> >>>  };
> >>>
> >>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>> -static unsigned long huge_shmem_orders_always __read_mostly;
> >>> -static unsigned long huge_shmem_orders_madvise __read_mostly;
> >>> -static unsigned long huge_shmem_orders_inherit __read_mostly;
> >>> -static unsigned long huge_shmem_orders_within_size __read_mostly;
> >>> +unsigned long huge_shmem_orders_always __read_mostly;
> >>> +unsigned long huge_shmem_orders_madvise __read_mostly;
> >>> +unsigned long huge_shmem_orders_inherit __read_mostly;
> >>> +unsigned long huge_shmem_orders_within_size __read_mostly;
> >>
> >> Again, we really shouldn't need to do this.
> 
> Agreed, for the RFC, I just did it similar to the anon ones when I got the build error
> trying to use these, but yeah a much better approach would be to just have a
> function in shmem that would return the largest shmem thp allowable order.

Ack, yeah it's fiddly but would be better this way.
> 
> >>
> >>>  static bool shmem_orders_configured __initdata;
> >>>  #endif
> >>>
> >>> @@ -516,25 +516,6 @@ static bool shmem_confirm_swap(struct address_space *mapping,
> >>>  	return xa_load(&mapping->i_pages, index) == swp_to_radix_entry(swap);
> >>>  }
> >>>
> >>> -/*
> >>> - * Definitions for "huge tmpfs": tmpfs mounted with the huge= option
> >>> - *
> >>> - * SHMEM_HUGE_NEVER:
> >>> - *	disables huge pages for the mount;
> >>> - * SHMEM_HUGE_ALWAYS:
> >>> - *	enables huge pages for the mount;
> >>> - * SHMEM_HUGE_WITHIN_SIZE:
> >>> - *	only allocate huge pages if the page will be fully within i_size,
> >>> - *	also respect madvise() hints;
> >>> - * SHMEM_HUGE_ADVISE:
> >>> - *	only allocate huge pages if requested with madvise();
> >>> - */
> >>> -
> >>> -#define SHMEM_HUGE_NEVER	0
> >>> -#define SHMEM_HUGE_ALWAYS	1
> >>> -#define SHMEM_HUGE_WITHIN_SIZE	2
> >>> -#define SHMEM_HUGE_ADVISE	3
> >>> -
> >>
> >> Again we really shouldn't need to do this, just provide some function from
> >> shmem that gives you what you need.
> >>
> >>>  /*
> >>>   * Special values.
> >>>   * Only can be set via /sys/kernel/mm/transparent_hugepage/shmem_enabled:
> >>> @@ -551,7 +532,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
> >>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>  /* ifdef here to avoid bloating shmem.o when not necessary */
> >>>
> >>> -static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
> >>> +int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
> >>
> >> Same comment.
> >>
> >>>  static int tmpfs_huge __read_mostly = SHMEM_HUGE_NEVER;
> >>>
> >>>  /**
> >>> --
> >>> 2.47.1