On Wed, 20 Aug 2025 20:02:38 +0300
Mike Rapoport <rppt@xxxxxxxxxx> wrote:

> On Wed, Aug 20, 2025 at 10:00:50AM +0000, Shiju Jose wrote:
> > >-----Original Message-----
> > >From: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> > >Sent: 20 August 2025 09:54
> > >To: Mike Rapoport <rppt@xxxxxxxxxx>
> > >Cc: Shiju Jose <shiju.jose@xxxxxxxxxx>; rafael@xxxxxxxxxx; bp@xxxxxxxxx;
> > >akpm@xxxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
> > >linux-edac@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > >linux-doc@xxxxxxxxxxxxxxx; tony.luck@xxxxxxxxx; lenb@xxxxxxxxxx;
> > >leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx; mchehab@xxxxxxxxxx;
> > >Linuxarm <linuxarm@xxxxxxxxxx>; rientjes@xxxxxxxxxx;
> > >jiaqiyan@xxxxxxxxxx; Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
> > >naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
> > >somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
> > >duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
> > >wschwartz@xxxxxxxxxxxxxxxxxxx; wbs@xxxxxxxxxxxxxxxxxxxxxx;
> > >nifan.cxl@xxxxxxxxx; tanxiaofei <tanxiaofei@xxxxxxxxxx>; Zengtao (B)
> > ><prime.zeng@xxxxxxxxxxxxx>; Roberto Sassu <roberto.sassu@xxxxxxxxxx>;
> > >kangkang.shen@xxxxxxxxxxxxx; wanghuiqiang <wanghuiqiang@xxxxxxxxxx>
> > >Subject: Re: [PATCH v11 1/3] mm: Add support to retrieve physical address
> > >range of memory from the node ID
> > >
> > >On Wed, 20 Aug 2025 10:34:13 +0300
> > >Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > >
> > >> On Tue, Aug 19, 2025 at 05:54:20PM +0100, Jonathan Cameron wrote:
> > >> > On Tue, 12 Aug 2025 15:26:13 +0100
> > >> > <shiju.jose@xxxxxxxxxx> wrote:
> > >> >
> > >> > > From: Shiju Jose <shiju.jose@xxxxxxxxxx>
> > >> > >
> > >> > > In the numa_memblks, a lookup facility is required to retrieve the
> > >> > > physical address range of memory in a NUMA node. ACPI RAS2 memory
> > >> > > features are among the use cases.
> > >> > >
> > >> > > Suggested-by: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> > >> > > Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
> > >> >
> > >> > Looks fine to me. Mike, what do you think?
> > >>
> > >> I still don't see why we can't use existing functions like
> > >> get_pfn_range_for_nid() or memblock_search_pfn_nid().
> > >>
> > >> Or even node_start_pfn() and node_spanned_pages().
> > >
> > >Good point. No reason anyone would scrub this on memory that hasn't been
> > >hotplugged yet, so no need to use numa-memblk to get the info.
> > >I guess I was thinking of the wrong hammer :)
> > >
> > >I'm not sure node_spanned_pages() works though, as we must not include
> > >ranges that might be on another node as we'd give a wrong impression of what
> > >was being scrubbed.
>
> If nodes are not interleaved node_spanned_pages() would work, even if there
> are holes inside the node, like e.g. e820-reserved memory.
> So with non-interleaved nodes node_start_pfn() and either
> node_spanned_pages() or node_end_pfn() will give the node extents and they
> are faster than get_pfn_range_for_nid().
>
> If the nodes are interleaved, though, a single mem_base, mem_size are not
> enough for a node as there are a few contiguous ranges in that node, e.g.
>
> 0              4G              8G              12G          16G
> +-------------+ +-------------+ +-------------+ +-------------+
> |   node 0    | |   node 1    | |   node 0    | |   node 1    |
> +-------------+ +-------------+ +-------------+ +-------------+
>
> I didn't look into the details of the RAS2 driver, but isn't it something
> it should handle?

The aim here is that a query prior to setting a specific range returns
data for at least a range that the scrub controller covers and nothing
it doesn't. So just presenting the first chunk for a node is fine. There
is plenty of info for userspace to figure things out if it wants to
trigger a scrub on 8-12G in your example, but until it does we want to
return 0-4G for the default range.
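To make the span versus first-chunk distinction concrete, here is a small
userspace sketch in plain C. It is not kernel code and not what the RAS2
driver does. Everything in it is hypothetical: struct mem_blk,
example_blks, node_span() and node_first_chunk() are illustration-only
names, and the table just encodes the interleaved layout from the diagram
above. It assumes the blocks are sorted by start address and do not
overlap. node_span() mimics what a start-plus-spanned-size query would
report, while node_first_chunk() returns only the first contiguous range
owned by the node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

/* Hypothetical stand-in for one memory block; not a kernel type. */
struct mem_blk {
	uint64_t start;	/* inclusive physical start address */
	uint64_t end;	/* exclusive physical end address */
	int nid;	/* owning NUMA node */
};

/* Interleaved layout from the diagram, sorted by start address. */
static const struct mem_blk example_blks[] = {
	{  0 * GiB,  4 * GiB, 0 },
	{  4 * GiB,  8 * GiB, 1 },
	{  8 * GiB, 12 * GiB, 0 },
	{ 12 * GiB, 16 * GiB, 1 },
};
#define EXAMPLE_NR (sizeof(example_blks) / sizeof(example_blks[0]))

/* Whole extent of a node: lowest start to highest end, holes included. */
static bool node_span(const struct mem_blk *blks, size_t n, int nid,
		      uint64_t *start, uint64_t *end)
{
	bool found = false;

	assert(blks && start && end);
	for (size_t i = 0; i < n; i++) {
		if (blks[i].nid != nid)
			continue;
		if (!found || blks[i].start < *start)
			*start = blks[i].start;
		if (!found || blks[i].end > *end)
			*end = blks[i].end;
		found = true;
	}
	return found;
}

/* First contiguous range of a node: stop at a gap or at another node. */
static bool node_first_chunk(const struct mem_blk *blks, size_t n, int nid,
			     uint64_t *start, uint64_t *end)
{
	bool found = false;

	assert(blks && start && end);
	for (size_t i = 0; i < n; i++) {
		if (blks[i].nid != nid) {
			if (found)
				break;	/* another node's memory follows */
			continue;
		}
		if (!found) {
			*start = blks[i].start;
			*end = blks[i].end;
			found = true;
		} else if (blks[i].start == *end) {
			*end = blks[i].end;	/* adjacent block, same node */
		} else {
			break;	/* hole inside the node */
		}
	}
	return found;
}
```

For node 0 in that layout node_span() reports 0-12G, which takes in node
1's 4-8G, whereas node_first_chunk() reports 0-4G. The sketch also stops
at holes inside a node, which is more conservative than strictly needed.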

I hacked up some SRAT tables to give something like the above for testing.

>
> > >Should be able to use some combination of node_start_pfn() and maybe
> > >memblock_search_pfn_nid() to get it though (that also gets the nid we already
> > >know but meh, no real harm in that.)
> >
> > Thanks Mike and Jonathan.
> >
> > The following approaches were tried as you suggested, instead of newly proposed
> > nid_get_mem_physaddr_range().
> > Methods 1 to 3 give the same result as nid_get_mem_physaddr_range(), but
> > Method 4 gives a different value for the size.
>
> I believe that's because on x86 the node 0 is really scrambled because of
> e820/efi reservations that never make it to memblock.

Fun question of whether we should take any notice of those. It would
depend on whether anyone's scrub firmware gets confused if we scrub them
and they aren't backed by memory. If they are, we can rely on system
constraints refusing to scrub that stuff at an 'unsafe' level, and if we
set it higher than it otherwise would be, the only possibility is that we
see earlier error detections in those and have to deal with them.

Jonathan

>
> > Please advise which method should be used for the RAS2?
> >
> > Thanks,
> > Shiju
> >
>