On Tue, Jul 01, 2025 at 03:48:41PM +0800, Zigit Zo wrote: > On 6/30/25 10:54 PM, Michael S. Tsirkin wrote: > > On Mon, Jun 30, 2025 at 10:50:55AM -0400, Michael S. Tsirkin wrote: > >> On Mon, Jun 30, 2025 at 05:51:09PM +0800, Zigit Zo wrote: > >>> This bug happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while > >>> the virtio-net driver is still probing with rtnl_lock() hold, this will > >>> cause a recursive mutex in netdev_notify_peers(). > >>> > >>> Fix it by skip acking the annouce in virtnet_config_changed_work() when > >>> probing. The annouce will still get done when ndo_open() enables the > >>> virtio_config_driver_enable(). > >> > >> I am not so sure it will be - while driver is not loaded, device does > >> not have to send interrupts, and there's no rule I'm aware of that says > >> we'll get one after DRIVER_OK. > > Yep, at first we're thinking that when the VIRTIO_NET_S_ANNOUNCE flag set, > we can always assure an interrupt has fired by VMM, to notify the driver > to do the announcement. > > But later we realized that the S_ANNOUNCE flag can be sent before the > driver's probing, and for QEMU seems to set the status flag regardless of > whether driver is ready, so the problem you're talking still may happens. > >> How about, we instead just schedule the work to do it later?I'm not sure if scheduling the work later will break df28de7b0050, the work > was being scheduled before that commit, and we have no much idea of why that > commit removes the schedule_work, we just keep it for safe... Well managing async things is always tricky. Direct call is safer. If you reintroduce it, you need to audit all call paths for safely. > Then, for plan A, we change the check_announce to schedule_announce, and if > that's true, we do another schedule_work to call virtnet_config_changed_work > again to finish the announcement, like > > if (v & VIRTIO_NET_S_ANNOUNCE) { > if (unlikely(schedule_announce)) > schedule_work(&vi->config_work); > else { > netdev_notify_peers(vi->dev); > virtnet_ack_link_announce(vi); > } > } > > >> > >> Also, there is another bug here. > >> If ndo_open did not run, we actually should not send any announcements. > >> > >> Do we care if carrier on is set on probe or on open? > >> If not, let's just defer this to ndo_open? > > > > Hmm yes I think we do, device is visible to userspace is it not? > > > > Hmm. We can keep the announce bit set in vi->status and on open, check > > it and then schedule a work to do the announcement. > > Okay, so there's a plan B, we save the bit and re-check it in ndo_open, like > > /* __virtnet_config_changed_work() */ > if (v & VIRTIO_NET_S_ANNOUNCE) { > vi->status |= VIRTIO_NET_S_ANNOUNCE; > if (unlikely(!check_announce)) > goto check_link; > > netdev_notify_peers(vi->dev); > virtnet_ack_link_announce(vi); > vi->status &= ~VIRTIO_NET_S_ANNOUNCE; > } > > /* virtnet_open() */ > if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) { > if (vi->status & VIRTIO_NET_S_LINK_UP) > netif_carrier_on(vi->dev); > if > if (vi->status & VIRTIO_NET_S_ANNOUNCE) > schedule_work(&vi->config_work); > virtio_config_driver_enable(vi->vdev); > } > > This is a dirty demo, any ideas are welcomed :) > > (I think in virtnet_open() we can make the S_LINK_UP being scheduled as well?)