On Mon, Aug 11, 2025 at 11:52:11AM +0200, David Hildenbrand wrote:
> On 11.08.25 11:49, David Hildenbrand wrote:
> > On 11.08.25 11:43, Kiryl Shutsemau wrote:
> > > On Mon, Aug 11, 2025 at 10:41:08AM +0200, Pankaj Raghav (Samsung) wrote:
> > > > From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
> > > >
> > > > Many places in the kernel need to zero out larger chunks, but the
> > > > maximum segment we can zero out at a time by ZERO_PAGE is limited by
> > > > PAGE_SIZE.
> > > >
> > > > This concern was raised during the review of adding Large Block Size support
> > > > to XFS[2][3].
> > > >
> > > > This is especially annoying in block devices and filesystems where
> > > > multiple ZERO_PAGEs are attached to the bio in different bvecs. With multipage
> > > > bvec support in block layer, it is much more efficient to send out
> > > > larger zero pages as a part of single bvec.
> > > >
> > > > Some examples of places in the kernel where this could be useful:
> > > > - blkdev_issue_zero_pages()
> > > > - iomap_dio_zero()
> > > > - vmalloc.c:zero_iter()
> > > > - rxperf_process_call()
> > > > - fscrypt_zeroout_range_inline_crypt()
> > > > - bch2_checksum_update()
> > > > ...
> > > >
> > > > Usually huge_zero_folio is allocated on demand, and it will be
> > > > deallocated by the shrinker if there are no users of it left. At the moment,
> > > > huge_zero_folio infrastructure refcount is tied to the process lifetime
> > > > that created it. This might not work for bio layer as the completions
> > > > can be async and the process that created the huge_zero_folio might no
> > > > longer be alive. And, one of the main point that came during discussion
> > > > is to have something bigger than zero page as a drop-in replacement.
> > > >
> > > > Add a config option PERSISTENT_HUGE_ZERO_FOLIO that will always allocate
> > > > the huge_zero_folio, and disable the shrinker so that huge_zero_folio is
> > > > never freed.
> > > > This makes using the huge_zero_folio without having to pass any mm struct and does
> > > > not tie the lifetime of the zero folio to anything, making it a drop-in
> > > > replacement for ZERO_PAGE.
> > > >
> > > > I have converted blkdev_issue_zero_pages() as an example as a part of
> > > > this series. I also noticed close to 4% performance improvement just by
> > > > replacing ZERO_PAGE with persistent huge_zero_folio.
> > > >
> > > > I will send patches to individual subsystems using the huge_zero_folio
> > > > once this gets upstreamed.
> > > >
> > > > Looking forward to some feedback.
> > >
> > > Why does it need to be compile-time? Maybe whoever needs huge zero page
> > > would just call get_huge_zero_page()/folio() on initialization to get it
> > > pinned?
> >
> > That's what v2 did, and this way here is cleaner.
>
> Sorry, RFC v2 I think. It got a bit confusing with series names/versions.

Well, my worry is that keeping 2M permanently allocated can be a high tax
for smaller machines. Compile-time might be cleaner, but it has downsides:
once the option is enabled, every system running that kernel pays for it.

It is also not clear whether these users actually need a physical huge
zero page (HZP), or whether a virtual one is enough. Virtual is cheap.

--
Kiryl Shutsemau / Kirill A. Shutemov