Pankaj,

There seems to be quite a lot to work on here, and it seems rather speculative,
so can we respin as an RFC please?

Thanks! :)

On Mon, Jul 07, 2025 at 04:23:14PM +0200, Pankaj Raghav (Samsung) wrote:
> From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
>
> There are many places in the kernel where we need to zeroout larger
> chunks but the maximum segment we can zeroout at a time by ZERO_PAGE
> is limited by PAGE_SIZE.
>
> This concern was raised during the review of adding Large Block Size support
> to XFS[1][2].
>
> This is especially annoying in block devices and filesystems where we
> attach multiple ZERO_PAGEs to the bio in different bvecs. With multipage
> bvec support in block layer, it is much more efficient to send out
> larger zero pages as a part of a single bvec.
>
> Some examples of places in the kernel where this could be useful:
> - blkdev_issue_zero_pages()
> - iomap_dio_zero()
> - vmalloc.c:zero_iter()
> - rxperf_process_call()
> - fscrypt_zeroout_range_inline_crypt()
> - bch2_checksum_update()
> ...
>
> We already have huge_zero_folio that is allocated on demand, and it will be
> deallocated by the shrinker if there are no users of it left.
>
> At moment, huge_zero_folio infrastructure refcount is tied to the process
> lifetime that created it. This might not work for bio layer as the completions
> can be async and the process that created the huge_zero_folio might no
> longer be alive.
>
> Add a config option STATIC_PMD_ZERO_PAGE that will always allocate
> the huge_zero_folio via memblock, and it will never be freed.
>
> I have converted blkdev_issue_zero_pages() as an example as a part of
> this series.
>
> I will send patches to individual subsystems using the huge_zero_folio
> once this gets upstreamed.
>
> Looking forward to some feedback.
>
> [1] https://lore.kernel.org/linux-xfs/20231027051847.GA7885@xxxxxx/
> [2] https://lore.kernel.org/linux-xfs/ZitIK5OnR7ZNY0IG@xxxxxxxxxxxxx/
>
> Changes since v1:
> - Move from .bss to allocating it through memblock(David)
>
> Changes since RFC:
> - Added the config option based on the feedback from David.
> - Encode more info in the header to avoid dead code (Dave hansen
>   feedback)
> - The static part of huge_zero_folio in memory.c and the dynamic part
>   stays in huge_memory.c
> - Split the patches to make it easy for review.
>
> Pankaj Raghav (5):
>   mm: move huge_zero_page declaration from huge_mm.h to mm.h
>   huge_memory: add huge_zero_page_shrinker_(init|exit) function
>   mm: add static PMD zero page
>   mm: add largest_zero_folio() routine
>   block: use largest_zero_folio in __blkdev_issue_zero_pages()
>
>  block/blk-lib.c         | 17 +++++----
>  include/linux/huge_mm.h | 31 ----------------
>  include/linux/mm.h      | 81 +++++++++++++++++++++++++++++++++++++++++
>  mm/Kconfig              |  9 +++++
>  mm/huge_memory.c        | 62 +++++++++++++++++++++++--------
>  mm/memory.c             | 25 +++++++++++++
>  mm/mm_init.c            |  1 +
>  7 files changed, 173 insertions(+), 53 deletions(-)
>
>
> base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
> --
> 2.49.0
>