On 7/8/25 1:04 PM, Ming Lei wrote: > On Tue, Jul 08, 2025 at 10:42:00AM +0530, Nilay Shroff wrote: >> >> >> On 7/5/25 6:45 AM, Yu Kuai wrote: >>> Hi, >>> >>> 在 2025/07/02 15:30, Yu Kuai 写道: >>>> Hi, >>>> >>>> 在 2025/07/02 14:22, Nilay Shroff 写道: >>>>> >>>>> >>>>> On 7/2/25 8:02 AM, Ming Lei wrote: >>>>>> On Wed, Jul 02, 2025 at 09:12:09AM +0800, Yu Kuai wrote: >>>>>>> Hi, >>>>>>> >>>>>>> 在 2025/07/01 21:28, Nilay Shroff 写道: >>>>>>>> >>>>>>>> >>>>>>>> On 6/28/25 6:18 AM, Yu Kuai wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> 在 2025/06/27 19:04, Ming Lei 写道: >>>>>>>>>> I guess the patch in the following link may be simper, both two take >>>>>>>>>> similar approach: >>>>>>>>>> >>>>>>>>>> https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/ >>>>>>>>> >>>>>>>>> I this the above approach has concurrent problems if nbd_start_device >>>>>>>>> concurrent with nbd_start_device: >>>>>>>>> >>>>>>>>> t1: >>>>>>>>> nbd_start_device >>>>>>>>> lock >>>>>>>>> num_connections = 1 >>>>>>>>> unlock >>>>>>>>> t2: >>>>>>>>> nbd_add_socket >>>>>>>>> lock >>>>>>>>> config->num_connections++ >>>>>>>>> unlock >>>>>>>>> t3: >>>>>>>>> nbd_start_device >>>>>>>>> lock >>>>>>>>> num_connections = 2 >>>>>>>>> unlock >>>>>>>>> blk_mq_update_nr_hw_queues >>>>>>>>> >>>>>>>>> blk_mq_update_nr_hw_queues >>>>>>>>> //nr_hw_queues updated to 1 before failure >>>>>>>>> return -EINVAL >>>>>>>>> >>>>>>>> >>>>>>>> In the above case, yes I see that t1 would return -EINVAL (as >>>>>>>> config->num_connections doesn't match with num_connections) >>>>>>>> but then t3 would succeed to update nr_hw_queue (as both >>>>>>>> config->num_connections and num_connections set to 2 this >>>>>>>> time). Isn't it? If yes, then the above patch (from Ming) >>>>>>>> seems good. >>>>>>> >>>>>>> Emm, I'm confused, If you agree with the concurrent process, then >>>>>>> t3 update nr_hw_queues to 2 first and return sucess, later t1 update >>>>>>> nr_hw_queues back to 1 and return failure. >>>>>> >>>>>> It should be easy to avoid failure by simple retrying. >>>>>> >>>>> Yeah I think retry should be a safe bet here. >>>>> >>>> >>>> I really not sure about the retry, the above is just a scenario that I >>>> think of with a quick review, and there are still many concurrent >>>> scenarios that need to be checked, I'm kind of lost here. >>>> >>>> Except nbd_start_device() and nbd_add_socked(), I'm not confident >>>> other context that is synchronized with config_lock is not broken. >>>> However, I'm ok with the bet. >>>> >>>>> On another note, synchronizing nbd_start_device and nbd_add_socket >>>>> using nbd->task_setup looks more complex and rather we may use >>>>> nbd->pid to synchronize both. We need to move setting of nbd->pid >>>>> before we invoke blk_mq_update_nr_hw_queues in nbd_start_device. >>>>> Then in nbd_add_socket we can evaluate nbd->pid and if it's >>>>> non-NULL then we could assume that either nr_hw_queues update is in >>>>> progress or device has been setup and so return -EBUSY. I think >>>>> anyways updating number of connections once device is configured >>>>> would not be possible, so once nbd_start_device is initiated, we >>>>> shall prevent user adding more connections. If we follow this >>>>> approach then IMO we don't need to add retry discussed above. >>>> >>>> It's ok for me to forbit nbd_add_socked after nbd is configured, there >>>> is nowhere to use the added sock. And if there really are other contexts >>>> need to be synchronized, I think nbd->pid can be used as well. >>>> >>> >>> Do we have a conclusion now? Feel free to send the retry version, or let >>> me know if I should send a new synchronize version. >>> >> Personally, I prefer synchronizing nbd_start_device and nbd_add_socket >> using nbd->pid but I do agree retry version would also work. Having >> said that, lets wait for Ming's feedback as well. > > IMO simpler fix, especially in concept, is more effective for addressing > this kind of issue, with less chance to introduce regression. > > If no one objects, I may send the retry version. > Adding new socket/connection, after nr_hw_queue update is initiated or device is setup, is of no use. But I am also good with retry version. So please send your patch. Thanks, --Nilay