Re: [PATCH v20 07/13] blk-zoned: Support pipelining of zoned writes

On 7/9/25 7:07 AM, Bart Van Assche wrote:
> Support pipelining of zoned writes if the block driver preserves the write
> order per hardware queue. Track per zone to which software queue writes
> have been queued. If zoned writes are pipelined, submit new writes to the
> same software queue as the writes that are already in progress. This
> prevents reordering by submitting requests for the same zone to different
> software or hardware queues.
> 
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Damien Le Moal <dlemoal@xxxxxxxxxx>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>

[...]

> @@ -764,14 +767,18 @@ static bool blk_zone_wplug_handle_reset_all(struct bio *bio)
>  static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk,
>  					      struct blk_zone_wplug *zwplug)
>  {
> +	int cpu;
> +
> +	lockdep_assert_held(&zwplug->lock);
> +
>  	/*
>  	 * Take a reference on the zone write plug and schedule the submission
>  	 * of the next plugged BIO. blk_zone_wplug_bio_work() will release the
>  	 * reference we take here.
>  	 */
> -	WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED));
>  	refcount_inc(&zwplug->ref);
> -	queue_work(disk->zone_wplugs_wq, &zwplug->bio_work);
> +	cpu = zwplug->from_cpu >= 0 ? zwplug->from_cpu : WORK_CPU_UNBOUND;
> +	queue_work_on(cpu, disk->zone_wplugs_wq, &zwplug->bio_work);

Please change this to a more readable form with an expanded "if". The local
variable "cpu" should probably be named "work_cpu" for clarity too.

>  }
>  
>  static inline void disk_zone_wplug_add_bio(struct gendisk *disk,
> @@ -932,7 +939,8 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug,
>  	 * We know such BIO will fail, and that would potentially overflow our
>  	 * write pointer offset beyond the end of the zone.
>  	 */
> -	if (disk_zone_wplug_is_full(disk, zwplug))
> +	if (!disk->queue->limits.driver_preserves_write_order
> +	    && disk_zone_wplug_is_full(disk, zwplug))

Writing to a zone that is full is an error, pipelining or not. So why do you
change this? This does not make sense.

>  		return false;
>  
>  	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> @@ -956,7 +964,8 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug,
>  		 * with a start sector not unaligned to the zone write pointer
>  		 * will fail.
>  		 */
> -		if (bio_offset_from_zone_start(bio) != zwplug->wp_offset)
> +		if (!disk->queue->limits.driver_preserves_write_order
> +		    && bio_offset_from_zone_start(bio) != zwplug->wp_offset)

Same here. This does not depend on pipelining: writes should have been received
in order and be aligned with the wp. So why change this condition?

>  			return false;
>  	}
>  

> @@ -1033,8 +1044,23 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs)
>  		return true;
>  	}
>  
> -	/* Otherwise, plug and submit the BIO. */
> -	zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
> +	if (dpwo && zwplug->from_cpu < 0) {
> +		/* No zoned writes are in progress. Select the current CPU. */
> +		zwplug->from_cpu = raw_smp_processor_id();
> +		goto plug;
> +	} else if (dpwo) {
> +		/*
> +		 * The block driver preserves the write order. Submit the bio
> +		 * from zwplug->from_cpu.
> +		 */
> +		goto plug;

Can you change this to:

	if (dpwo) {
		/*
		 * The block driver preserves the write order: if we do not
		 * have any writes in progress already, use the current CPU
		 * to submit the BIO. Otherwise, keep using zwplug->from_cpu.
		 */
		if (zwplug->from_cpu < 0)
			zwplug->from_cpu = raw_smp_processor_id();
		goto plug;
	}

	/*
	 * The block driver does not preserve the write order. Plug and
	 * submit the BIO.
	 */
	zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;

That is a lot simpler and easier to read.

> +	} else {
> +		/*
> +		 * The block driver does not preserve the write order. Plug and
> +		 * submit the BIO.
> +		 */
> +		zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED;
> +	}
>  
>  	spin_unlock_irqrestore(&zwplug->lock, flags);
>  

> @@ -1298,6 +1329,9 @@ static void blk_zone_submit_one_bio(struct blk_zone_wplug *zwplug)
>  	} else {
>  		blk_mq_submit_bio(bio);
>  	}
> +
> +	return disk->queue->limits.driver_preserves_write_order &&
> +		!need_resched();

I think we really need a helper for that
"disk->queue->limits.driver_preserves_write_order" test. But it would be even
better if you made this a queue feature.

Also, the test using need_resched() here really needs a comment explaining why
you look at that. I do not get it personally.

>  }
>  
>  static void blk_zone_wplug_bio_work(struct work_struct *work)
> @@ -1305,7 +1339,8 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
>  	struct blk_zone_wplug *zwplug =
>  		container_of(work, struct blk_zone_wplug, bio_work);
>  
> -	blk_zone_submit_one_bio(zwplug);
> +	while (blk_zone_submit_one_bio(zwplug))
> +		;

So patch 6 created that blk_zone_submit_one_bio() function by splitting it out,
but all you do here is this change. I really do not see how that is better. Let's
not do that and simply have the loop expanded here to something like:

	do {
		...
	} while (disk->queue->limits.driver_preserves_write_order &&
		 !need_resched());

>  
>  	/* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */
>  	disk_put_zone_wplug(zwplug);
> @@ -1831,6 +1866,7 @@ static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug,
>  	unsigned int zwp_zone_no, zwp_ref;
>  	unsigned int zwp_bio_list_size;
>  	unsigned long flags;
> +	int from_cpu;
>  
>  	spin_lock_irqsave(&zwplug->lock, flags);
>  	zwp_zone_no = zwplug->zone_no;
> @@ -1838,10 +1874,12 @@ static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug,
>  	zwp_ref = refcount_read(&zwplug->ref);
>  	zwp_wp_offset = zwplug->wp_offset;
>  	zwp_bio_list_size = bio_list_size(&zwplug->bio_list);
> +	from_cpu = zwplug->from_cpu;
>  	spin_unlock_irqrestore(&zwplug->lock, flags);
>  
> -	seq_printf(m, "%u 0x%x %u %u %u\n", zwp_zone_no, zwp_flags, zwp_ref,
> -		   zwp_wp_offset, zwp_bio_list_size);
> +	seq_printf(m, "zone_no %u flags 0x%x ref %u wp_offset %u bio_list_size %u from_cpu %d\n",
> +		   zwp_zone_no, zwp_flags, zwp_ref, zwp_wp_offset,
> +		   zwp_bio_list_size, from_cpu);
>  }
>  
>  int queue_zone_wplugs_show(void *data, struct seq_file *m)


-- 
Damien Le Moal
Western Digital Research



