On 4/29/25 11:19 AM, Hannes Reinecke wrote:
> On 4/28/25 09:39, Nilay Shroff wrote:
>>
>>
>> On 4/28/25 12:27 PM, Hannes Reinecke wrote:
>>> On 4/25/25 12:33, Nilay Shroff wrote:
>>>> Currently, a multipath head disk node is not created for single-ported
>>>> NVMe adapters or private namespaces. However, creating a head node in
>>>> these cases can help transparently handle transient PCIe link failures.
>>>> Without a head node, features like delayed removal cannot be leveraged,
>>>> making it difficult to tolerate such link failures. To address this,
>>>> this commit introduces nvme_core module parameter multipath_head_always.
>>>>
>>>> When this param is set to true, it forces the creation of a multipath
>>>> head node regardless NVMe disk or namespace type. So this option allows
>>>> the use of delayed removal of head node functionality even for single-
>>>> ported NVMe disks and private namespaces and thus helps transparently
>>>> handling transient PCIe link failures.
>>>>
>>>> By default multipath_head_always is set to false, thus preserving the
>>>> existing behavior. Setting it to true enables improved fault tolerance
>>>> in PCIe setups. Moreover, please note that enabling this option would
>>>> also implicitly enable nvme_core.multipath.
>>>>
>>>> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
>>>> ---
>>>>   drivers/nvme/host/multipath.c | 70 +++++++++++++++++++++++++++++++----
>>>>   1 file changed, 63 insertions(+), 7 deletions(-)
>>>>
>>> I really would model this according to dm-multipath where we have the
>>> 'fail_if_no_path' flag.
>>> This can be set for PCIe devices to retain the current behaviour
>>> (which we need for things like 'md' on top of NVMe) whenever the
>>> this flag is set.
>>>
>> Okay so you meant that when sysfs attribute "delayed_removal_secs"
>> under head disk node is _NOT_ configured (or delayed_removal_secs
>> is set to zero) we have internal flag "fail_if_no_path" is set to
>> true.
>> However in other case when "delayed_removal_secs" is set to
>> a non-zero value we set "fail_if_no_path" to false. Is that correct?
>>
> Don't make it overly complicated.
> 'fail_if_no_path' (and the inverse 'queue_if_no_path') can both be
> mapped onto delayed_removal_secs; if the value is '0' then the head
> disk is immediately removed (the 'fail_if_no_path' case), and if it's
> -1 it is never removed (the 'queue_if_no_path' case).
>
Yes, if the value of delayed_removal_secs is 0, the head is removed
immediately. However, if delayed_removal_secs is anything but zero
(i.e. greater than zero, since delayed_removal_secs is unsigned), the
head is removed only after delayed_removal_secs has elapsed without the
disk recovering from the transient link failure. Because the value is
unsigned there is no '-1' setting, so we never pin the head node
indefinitely.

> Question, though: How does it interact with the existing 'ctrl_loss_tmo'?
> Both describe essentially the same situation...
>
The delayed_removal_secs attribute is modeled for NVMe PCIe adapters, so
it doesn't really interact or interfere with ctrl_loss_tmo, which is a
fabrics controller option.

Thanks,
--Nilay
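The delayed_removal_secs semantics described in the reply (0 removes the head immediately, any positive value defers removal and lets a transient link failure recover, and no value pins the head forever) can be sketched as a small user-space state model. This is a hypothetical illustration only, not the nvme driver code: the struct head_model type and the last_path_gone(), path_back() and tick() functions are invented names, and time is simulated with one-second ticks instead of real kernel timers or delayed work.

```c
#include <stdbool.h>

/* Hypothetical model of a multipath head node (NOT kernel code). */
struct head_model {
    unsigned int delayed_removal_secs; /* unsigned: 0 means remove immediately */
    bool removal_pending;              /* removal timer is running */
    bool removed;                      /* head node has been removed */
    unsigned int elapsed_secs;         /* seconds since the last path vanished */
};

/* Hypothetical hook: the last path to the namespace has gone away. */
static void last_path_gone(struct head_model *h)
{
    if (h->delayed_removal_secs == 0) {
        /* 'fail_if_no_path'-like behaviour: remove the head right away. */
        h->removed = true;
        return;
    }
    /* Keep the head and start the (simulated) removal timer. */
    h->removal_pending = true;
    h->elapsed_secs = 0;
}

/* Hypothetical hook: a path reappeared, e.g. the PCIe link came back. */
static void path_back(struct head_model *h)
{
    /* Cancel a pending removal; a removed head is not resurrected here. */
    if (!h->removed)
        h->removal_pending = false;
}

/* Advance the simulated clock by one second. */
static void tick(struct head_model *h)
{
    if (!h->removal_pending)
        return;
    if (++h->elapsed_secs >= h->delayed_removal_secs) {
        /* The link did not recover in time: remove the head. */
        h->removal_pending = false;
        h->removed = true;
    }
}
```

For example, with delayed_removal_secs = 3, a path that returns after two ticks cancels the removal and the head survives, while three ticks without a path remove it. Since the field is unsigned, there is no way to express "never remove", which matches the point that the head is never pinned indefinitely.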