Re: [PATCH] block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work


 



On 6/25/25 2:35 AM, Bart Van Assche wrote:
> On 6/23/25 6:18 PM, Damien Le Moal wrote:
>> For a nicer solution, which is mostly DM-based, combine what I sent you to
>> force write BIOs to be split early for zoned DM devices together with the patch
>> [1], which I sent already but needs more work. This combination was tested by
>> Shin'ichiro and he could not reproduce the hang with both patches applied.
>>
>> [1] https://lore.kernel.org/dm-devel/20250611011340.92226-1-dlemoal@xxxxxxxxxx/
>>
>> As far as I can tell, dm-crypt is the only DM target driver supporting zones
>> that splits write operations "under the hood". But I will check again.
> 
> Hi Damien,
> 
> With both patches applied on top of Jens' for-next branch (2d5a3220c1f5
> ("Merge branch 'block-6.16' into for-next"), I can't reproduce the
> deadlock anymore. This is unexpected because the deadlock happens
> between the queue freezing mechanism and zwplug->bio_list. No
> matter how bios are split, if bios are queued faster than these are
> processed, one or more bios end up on zwplug->bio_list and this deadlock
> can happen.
> 
> Did I perhaps overlook or misunderstand something?

Yes, because you focused on the block layer when the actual issue is in DM.

Any zoned DM target that uses zone append emulation will use zone write
plugging. If in addition to this, the target driver uses
dm_accept_partial_bio() to internally split BIOs, it can happen that a BIO that
was plugged and issued from a zone write plug bio work is split using
dm_accept_partial_bio(). In this case, the remainder of the BIO is issued again
and thus there is a call to blk_queue_enter() which will block if a queue
freeze is ongoing. This blocking is in the zone write plug bio work, which
results in no forward progress: plugged BIOs are never unplugged and processed.
Here is your deadlock.
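
To make the sequence concrete, here is a toy user-space model in C. It is not
kernel code: every toy_* name is hypothetical, and the model simply assumes
that only the reissued remainder of a split BIO has to enter the queue again.
It only illustrates why the plug work makes no progress while a freeze is
ongoing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: all names below are hypothetical. */
struct toy_queue {
	bool frozen;          /* a queue freeze is ongoing */
	unsigned int plugged; /* BIOs waiting in the zone write plug */
};

enum toy_result { TOY_DONE, TOY_WOULD_BLOCK };

/* Stands in for blk_queue_enter(): cannot proceed during a freeze. */
static enum toy_result toy_queue_enter(const struct toy_queue *q)
{
	return q->frozen ? TOY_WOULD_BLOCK : TOY_DONE;
}

/*
 * Stands in for the zone write plug BIO work issuing one plugged BIO of
 * "sectors" sectors to a target that accepts at most "accept" sectors per
 * call (the dm_accept_partial_bio() pattern).
 */
static enum toy_result toy_plug_work(struct toy_queue *q,
				     unsigned int sectors,
				     unsigned int accept)
{
	assert(accept > 0);

	if (q->plugged == 0)
		return TOY_DONE;

	if (sectors > accept) {
		/* The remainder is issued again and must enter the queue. */
		if (toy_queue_enter(q) == TOY_WOULD_BLOCK)
			return TOY_WOULD_BLOCK; /* work stuck, nothing unplugged */
	}

	q->plugged--; /* the BIO was processed, the next one can be unplugged */
	return TOY_DONE;
}
```

With frozen set, a 16-sector BIO and an 8-sector accept size, toy_plug_work()
reports TOY_WOULD_BLOCK and the plugged count never decreases. An 8-sector BIO
goes through because the target does not split it.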

So the solution is to force a split to the DM device limits of any write BIO in
dm core, before the BIO is passed to the DM target map() function, *AND*
prevent the target driver from further splitting a write BIO using
dm_accept_partial_bio().
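
As a rough sketch of the "split early" half of this, again as a toy user-space
model in C and not the actual patch: the toy_* helpers are hypothetical, and a
single "limit" value in sectors stands in for the DM device limits, which in
reality involve more than one number.

```c
#include <assert.h>

/* Hypothetical helper: size in sectors of the next piece to hand to map(). */
static unsigned int toy_next_piece(unsigned int remaining, unsigned int limit)
{
	assert(limit > 0);
	return remaining < limit ? remaining : limit;
}

/*
 * Split a write of "sectors" sectors up front, before the target sees it.
 * Returns the number of pieces and stores the largest piece in *largest.
 * Since no piece exceeds "limit", the target never has a reason to split
 * a piece further.
 */
static unsigned int toy_split_early(unsigned int sectors, unsigned int limit,
				    unsigned int *largest)
{
	unsigned int pieces = 0;

	*largest = 0;
	while (sectors > 0) {
		unsigned int len = toy_next_piece(sectors, limit);

		if (len > *largest)
			*largest = len;
		sectors -= len;
		pieces++;
	}
	return pieces;
}
```

For a 20-sector write and a limit of 8 sectors, this yields three pieces
(8, 8 and 4 sectors), none larger than the limit.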

Only dm-crypt is affected by this. dm-flakey supports zoned targets and uses
dm_accept_partial_bio(), but it does not require zone append emulation, so it
does not use zone write plugging.

Sending clean patches in a short while. I tested with your zbd/013 reproducer
and all is good.

-- 
Damien Le Moal
Western Digital Research



