Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE

Hi Ming,

I know this is an extreme case, but my only concern is the following scenario:

0) In our application the IO timeout may be set to a few minutes, because the IO goes over the network.
1) Let's assume we have 1 queue with QD=1.
2) The only IO is in the userspace application, but since we send IOs over the network it may be stuck due to connectivity issues.
3) The user tries to upgrade/stop the application, so we issue QUIESCE_DEV with an infinite timeout because we want to ensure it succeeds.
4) We are now stuck until the network connection recovers, or until our datapath somehow issues COMMIT_AND_FETCH back to the kernel so that we can get the ABORT later and QUIESCE_DEV can finish.
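
Steps 1) to 4) can be restated as a toy model. This is only a sketch of the situation as described in this message; it does not call any ublk or io_uring API, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: where each queue slot's command currently lives. */
enum slot_state {
    SLOT_IDLE_IN_KERNEL,  /* command parked in the driver, no IO attached */
    SLOT_HELD_BY_SERVER,  /* IO delivered to userspace, not yet committed */
};

/* Number of IOs the server still holds, i.e. not yet committed back. */
size_t slots_held_by_server(const enum slot_state *slots, size_t depth)
{
    assert(slots != NULL);
    size_t held = 0;
    for (size_t i = 0; i < depth; i++) {
        if (slots[i] == SLOT_HELD_BY_SERVER)
            held++;
    }
    return held;
}

/*
 * Assumption taken from the scenario above: the server only learns about
 * the quiesce through a command that is idle in the kernel and gets
 * completed with ABORT. With no idle slot, nothing reaches the server.
 */
bool server_can_see_abort(const enum slot_state *slots, size_t depth)
{
    assert(slots != NULL);
    for (size_t i = 0; i < depth; i++) {
        if (slots[i] == SLOT_IDLE_IN_KERNEL)
            return true;
    }
    return false;
}
```

With one queue, QD=1, and the single slot in SLOT_HELD_BY_SERVER, server_can_see_abort() returns false, which is the stuck case from step 4).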

The problem is that I don't want to wait for this IO until recovery, but on the other hand I don't want to complete the IO with an error to the user.
In this case I guess we could abort the application, but maybe it would be cleaner if QUIESCE_DEV required another SQE per queue (or something similar), so that the application can be notified this way.

In any case, it will be very rare in our setup for a queue to have no idle operation while the network is also down, but we try to be as complete as possible.
What do you think?

Thank you

________________________________________
From: Ming Lei <ming.lei@xxxxxxxxxx>
Sent: Thursday, June 12, 2025 12:15 PM
To: Yoav Cohen
Cc: Jens Axboe; linux-block@xxxxxxxxxxxxxxx; Uday Shankar; Caleb Sander Mateos
Subject: Re: [PATCH 2/3] ublk: add feature UBLK_F_QUIESCE

On Thu, Jun 12, 2025 at 08:17:49AM +0000, Yoav Cohen wrote:
> Hi Ming,
>
> Thank you very much. I managed to integrate the feature into our application, and it seems to work perfectly fine during our update tests.
> Just to double check: when UBLK_F_USER_RECOVERY & UBLK_F_USER_RECOVERY_REISSUE are set
> and QUIESCE_DEV is called, will any IO that is completed using COMMIT_AND_FETCH with a failure (i.e. result=-EIO) be retried after the recovery stage?
>

UBLK_F_USER_RECOVERY_REISSUE assumes all inflight IOs have failed, so
these IOs will be re-delivered to the ublk server after recovering to LIVE.

So you don't need to complete the IOs with -EIO to have them retried during
recovery. In short:

- if ublk server handles inflight IOs, complete them by sending COMMIT_AND_FETCH
with result before closing '/dev/ublkcN', and these IOs will not be re-issued
by driver after recovery

- otherwise, just ignore & discard inflight IOs, they all will be
re-issued by driver after recovery via UBLK_F_USER_RECOVERY_REISSUE.
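
The two cases above can be sketched as a per-IO decision made by the ublk server before it closes '/dev/ublkcN'. This is a minimal sketch, not the driver's or any library's API: struct inflight_io, the action enum, and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record the server keeps for each inflight request. */
struct inflight_io {
    int  tag;      /* request tag within the queue */
    bool handled;  /* true if the backend already finished this IO */
    int  result;   /* result to report when handled is true */
};

enum shutdown_action {
    ACTION_COMMIT,   /* send COMMIT_AND_FETCH with the result; not re-issued */
    ACTION_DISCARD,  /* ignore it; the driver re-issues it after recovery */
};

/* Pick the action for one inflight IO, following the two bullets above. */
enum shutdown_action shutdown_action_for(const struct inflight_io *io)
{
    assert(io != NULL);
    return io->handled ? ACTION_COMMIT : ACTION_DISCARD;
}

/* Count the IOs expected to be re-delivered once the device is LIVE again. */
size_t count_reissued(const struct inflight_io *ios, size_t n)
{
    assert(ios != NULL);
    size_t reissued = 0;
    for (size_t i = 0; i < n; i++) {
        if (shutdown_action_for(&ios[i]) == ACTION_DISCARD)
            reissued++;
    }
    return reissued;
}
```

Note that neither branch completes an IO with -EIO just to trigger a retry; an unhandled IO is simply left alone and relies on UBLK_F_USER_RECOVERY_REISSUE.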


Thanks,
Ming





