Re: What should we do about the nvme atomics mess?

On 7/7/25 7:48 PM, Christoph Hellwig wrote:
> Hi all,
> 
> I'm a bit lost on what to do about the sad state of NVMe atomic writes.
> 
> As a short reminder the main issues are:
> 
>  1) there is no flag on a command to request atomic (aka non-torn)
>     behavior; instead, writes adhering to the atomicity requirements will
>     never be torn, and writes not adhering to them can be torn at any time.
>     This differs from SCSI, where atomic writes have to be explicitly
>     requested and fail when they can't be satisfied
>  2) the original way to indicate the main atomicity limit is the AWUPF
>     field, which is in Identify Controller, but specified in logical
>     blocks, which only exist at a namespace layer.  This a) led to
>     various problems because the limit is a mess when namespaces have
>     different logical block sizes, and b) also causes additional
>     issues because NVMe allows it to be different for different
>     controllers in the same subsystem.
> 
> Commit 8695f060a029 added some sanity checks to deal with issue 2b,
> but we kept running into more issues with it.  Partially because
> the check wasn't quite correct, but also because we've gotten
> reports of controllers that change the AWUPF value when reformatting
> namespaces to deal with issue 2a.
> 
> And I'm a bit lost on what to do here.
> 
> We could:
> 
>  I.   revert the check and the subsequent fixup.  If you really want
>       to use the nvme atomics you'd better pray a lot anyway
>       due to issue 1)
>  II.  limit the check to multi-controller subsystems
>  III. don't allow atomics on controllers that only report AWUPF and
>       limit support to controllers that support the more sanely
>       defined NAWUPF
> 
> I guess for 6.16 we are limited to I. to bring us back to the previous
> state, but I have a really bad gut feeling about it given the really
> bad spec language and a lot of low quality NVMe implementations we're
> seeing these days.
> 
I believe there are multi-controller NVMe disks in the field (including the
one I have) that do not exhibit such inconsistencies, i.e., they report a
consistent AWUPF value across controllers and do not change it based on
namespace format. The NVMe specification requires the latter; quoting from
NVM-Command-Set-Specification-1.0e:

"The values (referencing AWUPF / AWUN) reported in the Identify Controller
data structure are valid across all namespaces with any supported namespace
format, forming a baseline value that is guaranteed not to change."

While the spec doesn’t explicitly require that AWUPF be consistent across
controllers within the same subsystem, it seems to be implied. That said,
I agree this should have been stated explicitly in the specification.

If vendors strictly adhered to the current spec, we likely wouldn't be
facing this issue. Still, given the current behavior, I also support
approach III. However, choosing this approach effectively penalizes vendors
who have implemented atomic write support correctly, that is, those who use
AWUPF to advertise atomic write capabilities, do not rely on NAWUPF, and
report a consistent AWUPF across controllers.

In my opinion, the proper long-term fix is to escalate this to the NVMe 
Technical Work Group (TWG) and propose a specification update that:

- Deprecates the use of AWUPF for advertising atomic write capabilities
- Mandates the use of NAWUPF instead

Once such a spec update is ratified, we can move forward with approach III.

Thanks,
--Nilay




