Re: [PATCH v4 7/7] cxl/dax: Defer DAX consumption of SOFT RESERVED resources until after CXL region creation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alison,

Sorry I missed this email before sending out v5. Comments inline.

On 7/15/2025 9:27 AM, Alison Schofield wrote:
On Tue, Jun 03, 2025 at 10:19:49PM +0000, Smita Koralahalli wrote:
From: Nathan Fontenot <nathan.fontenot@xxxxxxx>

The DAX HMEM driver currently consumes all SOFT RESERVED iomem resources
during initialization. This interferes with the CXL driver’s ability to
create regions and trim overlapping SOFT RESERVED ranges before DAX uses
them.

To resolve this, defer the DAX driver's resource consumption if the
cxl_acpi driver is enabled. The DAX HMEM initialization skips walking the
iomem resource tree in this case. After CXL region creation completes,
any remaining SOFT RESERVED resources are explicitly registered with the
DAX driver by the CXL driver.

This sequencing ensures proper handling of overlaps and fixes hotplug
failures.

Hi Smita,

About the issue I first mentioned here [1]. The HMEM driver is not
waiting for region probe to finish. By the time region probe attempts
to hand off the memory to DAX, the memory is already marked as System RAM.

See 'case CXL_PARTMODE_RAM:' in cxl_region_probe(). The is_system_ram()
test fails so devm_cxl_add_dax_region() not possible.

This means that in appearance, just looking at /proc/iomem/, this
seems to have worked. There is no soft reserved and the dax and
kmem resources are child resources of the region resource. But they
were not set up by the region driver, hence no unregister callback
is triggered when the region is disabled.

I believe this should be resolved in v5. I see the following dmesg entries indicating that devm_cxl_add_dax_region() is being called successfully for all regions:

# dmesg | grep devm_cxl_add_dax_region
[ 40.730864] devm_cxl_add_dax_region: cxl_region region0: region0: register dax_region0 [ 40.756307] devm_cxl_add_dax_region: cxl_region region1: region1: register dax_region1 [ 43.689882] devm_cxl_add_dax_region: cxl_region region2: region2: register dax_region2

cat /proc/iomem

850000000-284fffffff : CXL Window 0
  850000000-284fffffff : region0
    850000000-284fffffff : dax0.0
      850000000-284fffffff : System RAM (kmem)
2850000000-484fffffff : CXL Window 1
  2850000000-484fffffff : region1
    2850000000-484fffffff : dax1.0
      2850000000-484fffffff : System RAM (kmem)
4850000000-684fffffff : CXL Window 2
  4850000000-684fffffff : region2
    4850000000-684fffffff : dax2.0
      4850000000-684fffffff : System RAM (kmem)

I suspect devm_cxl_add_dax_region() didn't execute in v4 because hmem_register_resource() (called from hmat.c) preemptively created hmem_active entries. These were consumed during walk_hmem_resources() and registered by hmem_register_device() before the CXL region probe could complete.

In v5, if CONFIG_CXL_ACPI is enabled, soft reserved resources are stored in hmem_deferred_active instead and hmem_register_resource() calls from hmat.c are disabled. This ensures DAX registration happens only through CXL drivers probing.


It appears like this:

c080000000-17dbfffffff : CXL Window 0
   c080000000-c47fffffff : region2
     c080000000-c47fffffff : dax0.0
       c080000000-c47fffffff : System RAM (kmem)

Now, to make the memory available for reuse, need to do:
# daxctl offline-memory dax0.0
# daxctl destroy-device --force dax0.0
# cxl disable-region 2
# cxl destroy-region 2

Whereas previously, did this:
# daxctl offline-memory dax0.0
# cxl disable-region 2
   After disabling region, dax device unregistered.
# cxl destroy-region 2

I haven’t yet tested this specific unregister flow with v5. I’ll verify and follow up if I see any issues. Meanwhile, please let me know if you still observe the same behavior or need additional debug info from my side.

Thanks
Smita

I do see that __cxl_region_softreserv_update() is not called until
after cxl_region_probe() completes, so that is waiting properly to
pick up the scraps. I'm actually not sure there would be any scraps
though, if the HMEM driver has already done it's thing. In my case
the Soft Reserved size is same as region, so I cannot tell what
would happen if that Soft Reserved had more capacity than the region.

If I do this: # CONFIG_DEV_DAX_HMEM is not set, works same as before,
which is as expected.

Let me know if I can try anything else out or collect more info.

--Alison


[1] https://lore.kernel.org/nvdimm/20250603221949.53272-1-Smita.KoralahalliChannabasappa@xxxxxxx/T/#m10c0eb7b258af7cd0c84c7ee2c417c055724f921



Co-developed-by: Nathan Fontenot <Nathan.Fontenot@xxxxxxx>
Signed-off-by: Nathan Fontenot <Nathan.Fontenot@xxxxxxx>
Co-developed-by: Terry Bowman <terry.bowman@xxxxxxx>
Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
---
  drivers/cxl/core/region.c | 10 +++++++++
  drivers/dax/hmem/device.c | 43 ++++++++++++++++++++-------------------
  drivers/dax/hmem/hmem.c   |  3 ++-
  include/linux/dax.h       |  6 ++++++
  4 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3a5ca44d65f3..c6c0c7ba3b20 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -10,6 +10,7 @@
  #include <linux/sort.h>
  #include <linux/idr.h>
  #include <linux/memory-tiers.h>
+#include <linux/dax.h>
  #include <cxlmem.h>
  #include <cxl.h>
  #include "core.h"
@@ -3553,6 +3554,11 @@ static struct resource *normalize_resource(struct resource *res)
  	return NULL;
  }
+static int cxl_softreserv_mem_register(struct resource *res, void *unused)
+{
+	return hmem_register_device(phys_to_target_node(res->start), res);
+}
+
  static int __cxl_region_softreserv_update(struct resource *soft,
  					  void *_cxlr)
  {
@@ -3590,6 +3596,10 @@ int cxl_region_softreserv_update(void)
  				    __cxl_region_softreserv_update);
  	}
+ /* Now register any remaining SOFT RESERVES with DAX */
+	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED, IORESOURCE_MEM,
+			    0, -1, NULL, cxl_softreserv_mem_register);
+
  	return 0;
  }
  EXPORT_SYMBOL_NS_GPL(cxl_region_softreserv_update, "CXL");
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 59ad44761191..cc1ed7bbdb1a 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,7 +8,6 @@
  static bool nohmem;
  module_param_named(disable, nohmem, bool, 0444);
-static bool platform_initialized;
  static DEFINE_MUTEX(hmem_resource_lock);
  static struct resource hmem_active = {
  	.name = "HMEM devices",
@@ -35,9 +34,7 @@ EXPORT_SYMBOL_GPL(walk_hmem_resources);
static void __hmem_register_resource(int target_nid, struct resource *res)
  {
-	struct platform_device *pdev;
  	struct resource *new;
-	int rc;
new = __request_region(&hmem_active, res->start, resource_size(res), "",
  			       0);
@@ -47,21 +44,6 @@ static void __hmem_register_resource(int target_nid, struct resource *res)
  	}
new->desc = target_nid;
-
-	if (platform_initialized)
-		return;
-
-	pdev = platform_device_alloc("hmem_platform", 0);
-	if (!pdev) {
-		pr_err_once("failed to register device-dax hmem_platform device\n");
-		return;
-	}
-
-	rc = platform_device_add(pdev);
-	if (rc)
-		platform_device_put(pdev);
-	else
-		platform_initialized = true;
  }
void hmem_register_resource(int target_nid, struct resource *res)
@@ -83,9 +65,28 @@ static __init int hmem_register_one(struct resource *res, void *data)
static __init int hmem_init(void)
  {
-	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
-			IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
-	return 0;
+	struct platform_device *pdev;
+	int rc;
+
+	if (!IS_ENABLED(CONFIG_CXL_ACPI)) {
+		walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
+				    IORESOURCE_MEM, 0, -1, NULL,
+				    hmem_register_one);
+	}
+
+	pdev = platform_device_alloc("hmem_platform", 0);
+	if (!pdev) {
+		pr_err("failed to register device-dax hmem_platform device\n");
+		return -1;
+	}
+
+	rc = platform_device_add(pdev);
+	if (rc) {
+		pr_err("failed to add device-dax hmem_platform device\n");
+		platform_device_put(pdev);
+	}
+
+	return rc;
  }
/*
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index 3aedef5f1be1..a206b9b383e4 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -61,7 +61,7 @@ static void release_hmem(void *pdev)
  	platform_device_unregister(pdev);
  }
-static int hmem_register_device(int target_nid, const struct resource *res)
+int hmem_register_device(int target_nid, const struct resource *res)
  {
  	struct device *host = &dax_hmem_pdev->dev;
  	struct platform_device *pdev;
@@ -124,6 +124,7 @@ static int hmem_register_device(int target_nid, const struct resource *res)
  	platform_device_put(pdev);
  	return rc;
  }
+EXPORT_SYMBOL_GPL(hmem_register_device);
static int dax_hmem_platform_probe(struct platform_device *pdev)
  {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a4ad3708ea35..5052dca8b3bc 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -299,10 +299,16 @@ static inline int dax_mem2blk_err(int err)
#ifdef CONFIG_DEV_DAX_HMEM_DEVICES
  void hmem_register_resource(int target_nid, struct resource *r);
+int hmem_register_device(int target_nid, const struct resource *res);
  #else
  static inline void hmem_register_resource(int target_nid, struct resource *r)
  {
  }
+
+static inline int hmem_register_device(int target_nid, const struct resource *res)
+{
+	return 0;
+}
  #endif
typedef int (*walk_hmem_fn)(int target_nid, const struct resource *res);
--
2.17.1






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux