Re: [PATCH v3] mm/filemap: Allow arch to request folio size for exec memory

Zi Yan <ziy@xxxxxxxxxx> · Thu, 27 Mar 2025 20:07:23 -0400

On 27 Mar 2025, at 12:44, Matthew Wilcox wrote:

> On Thu, Mar 27, 2025 at 04:06:58PM +0000, Ryan Roberts wrote:
>> So let's special-case the read(ahead) logic for executable mappings. The
>> trade-off is performance improvement (due to more efficient storage of
>> the translations in iTLB) vs potential read amplification (due to
>> reading too much data around the fault which won't be used), and the
>> latter is independent of base page size. I've chosen 64K folio size for
>> arm64 which benefits both the 4K and 16K base page size configs and
>> shouldn't lead to any read amplification in practice since the old
>> read-around path was (usually) reading blocks of 128K. I don't
>> anticipate any write amplification because text is always RO.
>
> Is there not also the potential for wasted memory due to ELF alignment?
> Kalesh talked about it in the MM BOF at the same time that Ted and I
> were discussing it in the FS BOF.  Some coordination required (like
> maybe Kalesh could have mentioned it to me rather than assuming I'd be
> there?)
>
>> +#define arch_exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
>
> I don't think the "arch" really adds much value here.
>
> #define exec_folio_order()	get_order(SZ_64K)

How about AMD’s PTE coalescing, which does PTE compression at the
16KB or 32KB level? A 64KB folio covers four 16KB or two 32KB
regions, so at least it will not hurt AMD PTE coalescing. Starting
with 64KB across all architectures might make it simpler to see the
performance impact. Just a comment, no objection. :)

Best Regards,
Yan, Zi