Re: [RFC PATCHv2 2/3] nvme: introduce multipath_head_always module param

On 4/29/25 08:24, Nilay Shroff wrote:
>
> On 4/29/25 11:19 AM, Hannes Reinecke wrote:
>> On 4/28/25 09:39, Nilay Shroff wrote:
>>>
>>> On 4/28/25 12:27 PM, Hannes Reinecke wrote:
>>>> On 4/25/25 12:33, Nilay Shroff wrote:
>>>>> Currently, a multipath head disk node is not created for single-ported
>>>>> NVMe adapters or private namespaces. However, creating a head node in
>>>>> these cases can help transparently handle transient PCIe link failures.
>>>>> Without a head node, features like delayed removal cannot be leveraged,
>>>>> making it difficult to tolerate such link failures. To address this,
>>>>> this commit introduces the nvme_core module parameter multipath_head_always.
>>>>>
>>>>> When this param is set to true, it forces the creation of a multipath
>>>>> head node regardless of the NVMe disk or namespace type. This option
>>>>> thus allows the delayed removal of the head node to be used even for
>>>>> single-ported NVMe disks and private namespaces, which helps
>>>>> transparently handle transient PCIe link failures.
>>>>>
>>>>> By default multipath_head_always is set to false, thus preserving the
>>>>> existing behavior. Setting it to true enables improved fault tolerance
>>>>> in PCIe setups. Note that enabling this option also implicitly
>>>>> enables nvme_core.multipath.
>>>>>
>>>>> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
>>>>> ---
>>>>>  drivers/nvme/host/multipath.c | 70 +++++++++++++++++++++++++++++++----
>>>>>  1 file changed, 63 insertions(+), 7 deletions(-)

>>>> I really would model this according to dm-multipath, where we have the
>>>> 'fail_if_no_path' flag.
>>>> This flag can be set for PCIe devices to retain the current behaviour
>>>> (which we need for things like 'md' on top of NVMe).

>>> Okay, so you mean that when the sysfs attribute "delayed_removal_secs"
>>> under the head disk node is _NOT_ configured (or delayed_removal_secs
>>> is set to zero), the internal flag "fail_if_no_path" is set to true.
>>> However, when "delayed_removal_secs" is set to a non-zero value, we
>>> set "fail_if_no_path" to false. Is that correct?

>> Don't make it overly complicated.
>> 'fail_if_no_path' (and the inverse 'queue_if_no_path') can both be
>> mapped onto delayed_removal_secs: if the value is '0' then the head
>> disk is immediately removed (the 'fail_if_no_path' case), and if it's
>> -1 it is never removed (the 'queue_if_no_path' case).

> Yes, if the value of delayed_removal_secs is 0 then the head is immediately
> removed. However, if the value of delayed_removal_secs is anything but zero
> (i.e. greater than zero, as delayed_removal_secs is unsigned) then the head
> is removed only after delayed_removal_secs has elapsed without the disk
> recovering from the transient link failure. We never pin the head node
> indefinitely.

>> Question, though: How does it interact with the existing 'ctrl_loss_tmo'?
>> Both describe essentially the same situation...

> The delayed_removal_secs is modeled for NVMe PCIe adapters, so it doesn't
> really interact or interfere with ctrl_loss_tmo, which is a fabrics
> controller option.

Not so sure here.
You _could_ expand the scope of ctrl_loss_tmo to PCI, too;
as most PCI devices will only ever have one controller, 'ctrl_loss_tmo'
will be identical to 'delayed_removal_secs'.

So I guess my question is: is there a value for fabrics to control
the lifetime of struct ns_head independently of the lifetime of the
controller?

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@xxxxxxx                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich