From: Leon Romanovsky <leon@xxxxxxxxxx> Date: Thu, 10 Apr 2025 16:58:43 +0300 > On Thu, Apr 10, 2025 at 03:33:40PM +0200, Alexander Lobakin wrote: >> From: Leon Romanovsky <leon@xxxxxxxxxx> >> Date: Thu, 10 Apr 2025 16:27:06 +0300 >> >>> On Thu, Apr 10, 2025 at 02:58:28PM +0200, Larysa Zaremba wrote: >>>> On Thu, Apr 10, 2025 at 02:23:49PM +0300, Leon Romanovsky wrote: >>>>> On Thu, Apr 10, 2025 at 12:44:33PM +0200, Larysa Zaremba wrote: >>>>>> On Thu, Apr 10, 2025 at 11:21:37AM +0300, Leon Romanovsky wrote: >>>>>>> On Tue, Apr 08, 2025 at 02:47:51PM +0200, Larysa Zaremba wrote: >>>>>>>> From: Phani R Burra <phani.r.burra@xxxxxxxxx> >>>>>>>> >>>>>>>> Libeth will now support control queue setup and configuration APIs. >>>>>>>> These are mainly used for mailbox communication between drivers and >>>>>>>> control plane. >>>>>>>> >>>>>>>> Make use of the page pool support for managing controlq buffers. >>>>>>> >>>>>>> <...> >>>>>>> >>>>>>>> libeth-y := rx.o >>>>>>>> >>>>>>>> +obj-$(CONFIG_LIBETH_CP) += libeth_cp.o >>>>>>>> + >>>>>>>> +libeth_cp-y := controlq.o >>>>>>> >>>>>>> So why did you create separate module for it? >>>>>>> Now you have pci -> libeth -> libeth_cp -> ixd, with the potential races between ixd and libeth, am I right? >>>>>>> >>>>>> >>>>>> I am not sure what kind of races do you mean, all libeth modules themselves are >>>>>> stateless and will stay this way [0], all used data is owned by drivers. >>>>> >>>>> Somehow such separation doesn't truly work. There are multiple syzkaller >>>>> reports per-cycle where module A tries to access module C, which already >>>>> doesn't exist because it was proxied through module B. >>>> >>>> Are there similar reports for libeth and libie modules when iavf is enabled? >>> >>> To get such report, syzkaller should run on physical iavf, it looks like it doesn't. >>> Did I miss it here? >>> https://syzkaller.appspot.com/upstream/s/net >>> >>>> It is basically the same hierarchy. (iavf uses both libeth and libie, libie >>>> depends on libeth). >>>> >>>> I am just trying to understand, is this a regular situation or did I just mess >>>> smth up? >>> >>> My review comment was general one. It is almost impossible to review >>> this newly proposed architecture split for correctness. >>> >>>> >>>>> >>>>>> >>>>>> As for the module separation, I think there is no harm in keeping it modular. >>>>> >>>>> Syzkaller reports disagree with you. >>>>> >>>> >>>> Could you please share them? >>> >>> It is not an easy question to answer, because all these reports are complaining >>> about some wrong locking order or NULL-pointer access. You will never know if >>> it is because of programming or design error. >>> >>> As an approximate example, see commits a27c6f46dcec ("RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifier") >>> and f0df225d12fc ("RDMA/bnxt_re: Add sanity checks on rdev validity"). >>> At the first glance, they look unrelated to our discussion, however >>> they can serve as an example or races between deinit/disable paths in >>> parent module vs. child. >> >> Unrelated. At first, you were talking about module dependencies, now >> you're talking about struct device etc dependencies, which is a >> completely different thing. >> >> As already said, libeth is stateless, so the latter one can't happen. >> The former one is impossible at all. As long as at least 1 child module >> is loaded, you can't unload the parent. And load/unload is serialized, >> see module core code. > > It is not only module load/unload. It is bind/unbind, devlink operations > e.t.c, everything that can cause to reload driver in module C. You still mix module deps vs struct device/driver deps. Atomic modules are not anyhow tied to the latter, esp. given that the whole libeth is stateless and is just a bunch of helpers. Thanks, Olek