On 6/25/25 2:35 AM, Bart Van Assche wrote:
> On 6/23/25 6:18 PM, Damien Le Moal wrote:
>> For a nicer solution, which is mostly DM-based, combine what I sent you to
>> force write BIOs to be split early for zoned DM devices together with the patch
>> [1], which I sent already but needs more work. This combination was tested by
>> Shin'ichiro and he could not reproduce the hang with both patches applied.
>>
>> [1] https://lore.kernel.org/dm-devel/20250611011340.92226-1-dlemoal@xxxxxxxxxx/
>>
>> As far as I can tell, dm-crypt is the only DM target driver supporting zones
>> that splits write operations "under the hood". But I will check again.
>
> Hi Damien,
>
> With both patches applied on top of Jens' for-next branch (2d5a3220c1f5
> ("Merge branch 'block-6.16' into for-next"), I can't reproduce the
> deadlock anymore. This is unexpected because the deadlock happens
> between the queue freezing mechanism and zwplug->bio_list. No
> matter how bios are split, if bios are queued faster than these are
> processed, one or more bios end up on zwplug->bio_list and this deadlock
> can happen.
>
> Did I perhaps overlook or misunderstand something?

Yes, because you focused on the block layer when the actual issue is in DM.
Any zoned DM target that uses zone append emulation will use zone write
plugging. If in addition to this, the target driver uses
dm_accept_partial_bio() to internally split BIOs, it can happen that a BIO that
was plugged and issued from a zone write plug bio work is split using
dm_accept_partial_bio(). In this case, the remainder of the BIO is issued again
and thus there is a call to blk_queue_enter() which will block if a queue
freeze is ongoing. This blocking is in the zone write plug bio work, which
results in no forward progress: BIOs plugged are never unplugged and processed.
Here is your deadlock.

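The dependency cycle described above can be illustrated with a toy user-space
model. This is only a sketch under simplifying assumptions, not kernel code:
simulate(), its parameters and the sector counts are hypothetical, a plain
deque stands in for zwplug->bio_list, and a boolean stands in for an ongoing
queue freeze that would make a blk_queue_enter()-like call block.

```python
from collections import deque


def simulate(write_sizes, max_sectors, early_split, freeze_pending):
    """Toy model of the zone write plug BIO work (hypothetical, not kernel code).

    write_sizes:    sizes (in sectors) of the write BIOs that get plugged
    max_sectors:    largest write the target accepts without splitting
    early_split:    True if writes are split to max_sectors before plugging
    freeze_pending: True if a queue freeze is ongoing while the work runs
    Returns "deadlock" if the work would block forever, else "ok".
    """
    plugged = deque()  # stands in for zwplug->bio_list
    for size in write_sizes:
        if early_split:
            # Split before plugging, so the target never needs to split later.
            while size > max_sectors:
                plugged.append(max_sectors)
                size -= max_sectors
        plugged.append(size)

    # The plug work issues the plugged BIOs one at a time.
    while plugged:
        size = plugged.popleft()
        if size > max_sectors:
            # The target accepts only the first part (a
            # dm_accept_partial_bio()-like split); the remainder must be
            # resubmitted, which requires entering the queue again.
            if freeze_pending:
                # The work blocks here, so the remaining plugged BIOs are
                # never issued and the freeze can never complete.
                return "deadlock"
            plugged.appendleft(size - max_sectors)
    return "ok"


if __name__ == "__main__":
    print(simulate([8, 32], 16, early_split=False, freeze_pending=True))
    print(simulate([8, 32], 16, early_split=True, freeze_pending=True))
```

In this model the blocking path is reached only when the target-side split is
combined with a pending freeze; with the early split, the work never has to
re-enter the queue.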
So the solution is to force a split to the DM device limits of any write BIO in
dm core, before the BIO is passed to the DM target map() function, *AND*
prevent the target driver from further splitting a write BIO using
dm_accept_partial_bio(). Only dm-crypt is affected by this. dm-flakey supports
zoned targets and uses dm_accept_partial_bio() but it does not require zone
append emulation so does not use zone write plugging.

Sending clean patches in a short while. I tested with your zbd/013 reproducer
and all is good.

-- 
Damien Le Moal
Western Digital Research