On Mon, 16 Jun 2025 at 13:47, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > On Mon, Jun 16, 2025 at 1:37 PM Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote: > > > > > > > > On 16.06.2025 14:18, Rafael J. Wysocki wrote: > > > On Mon, Jun 16, 2025 at 11:37 AM Claudiu Beznea > > > <claudiu.beznea@xxxxxxxxx> wrote: > > >> > > >> Hi, Rafael, > > >> > > >> On 13.06.2025 13:02, Rafael J. Wysocki wrote: > > >>> On Fri, Jun 13, 2025 at 9:39 AM Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote: > > >>>> > > >>>> Hi, Rafael, > > >>>> > > >>>> On 09.06.2025 22:59, Rafael J. Wysocki wrote: > > >>>>> On Sat, Jun 7, 2025 at 3:06 PM Jonathan Cameron <jic23@xxxxxxxxxx> wrote: > > >>>>>> > > >>>>>> On Fri, 6 Jun 2025 22:01:52 +0200 > > >>>>>> "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote: > > >>>>>> > > >>>>>> Hi Rafael, > > >>>>>> > > >>>>>>> On Fri, Jun 6, 2025 at 8:55 PM Dmitry Torokhov > > >>>>>>> <dmitry.torokhov@xxxxxxxxx> wrote: > > >>>>>>>> > > >>>>>>>> On Fri, Jun 06, 2025 at 06:00:34PM +0200, Rafael J. Wysocki wrote: > > >>>>>>>>> On Fri, Jun 6, 2025 at 1:18 PM Claudiu <claudiu.beznea@xxxxxxxxx> wrote: > > >>>>>>>>>> > > >>>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> > > >>>>>>>>>> > > >>>>>>>>>> The dev_pm_domain_attach() function is typically used in bus code alongside > > >>>>>>>>>> dev_pm_domain_detach(), often following patterns like: > > >>>>>>>>>> > > >>>>>>>>>> static int bus_probe(struct device *_dev) > > >>>>>>>>>> { > > >>>>>>>>>> struct bus_driver *drv = to_bus_driver(dev->driver); > > >>>>>>>>>> struct bus_device *dev = to_bus_device(_dev); > > >>>>>>>>>> int ret; > > >>>>>>>>>> > > >>>>>>>>>> // ... > > >>>>>>>>>> > > >>>>>>>>>> ret = dev_pm_domain_attach(_dev, true); > > >>>>>>>>>> if (ret) > > >>>>>>>>>> return ret; > > >>>>>>>>>> > > >>>>>>>>>> if (drv->probe) > > >>>>>>>>>> ret = drv->probe(dev); > > >>>>>>>>>> > > >>>>>>>>>> // ... > > >>>>>>>>>> } > > >>>>>>>>>> > > >>>>>>>>>> static void bus_remove(struct device *_dev) > > >>>>>>>>>> { > > >>>>>>>>>> struct bus_driver *drv = to_bus_driver(dev->driver); > > >>>>>>>>>> struct bus_device *dev = to_bus_device(_dev); > > >>>>>>>>>> > > >>>>>>>>>> if (drv->remove) > > >>>>>>>>>> drv->remove(dev); > > >>>>>>>>>> dev_pm_domain_detach(_dev); > > >>>>>>>>>> } > > >>>>>>>>>> > > >>>>>>>>>> When the driver's probe function uses devres-managed resources that depend > > >>>>>>>>>> on the power domain state, those resources are released later during > > >>>>>>>>>> device_unbind_cleanup(). > > >>>>>>>>>> > > >>>>>>>>>> Releasing devres-managed resources that depend on the power domain state > > >>>>>>>>>> after detaching the device from its PM domain can cause failures. > > >>>>>>>>>> > > >>>>>>>>>> For example, if the driver uses devm_pm_runtime_enable() in its probe > > >>>>>>>>>> function, and the device's clocks are managed by the PM domain, then > > >>>>>>>>>> during removal the runtime PM is disabled in device_unbind_cleanup() after > > >>>>>>>>>> the clocks have been removed from the PM domain. It may happen that the > > >>>>>>>>>> devm_pm_runtime_enable() action causes the device to be runtime-resumed. > > >>>>>>>>> > > >>>>>>>>> Don't use devm_pm_runtime_enable() then. > > >>>>>>>> > > >>>>>>>> What about other devm_ APIs? Are you suggesting that platform drivers > > >>>>>>>> should not be using devm_clk*(), devm_regulator_*(), > > >>>>>>>> devm_request_*_irq() and devm_add_action_or_reset()? Because again, > > >>>>>>>> dev_pm_domain_detach() that is called by platform bus_remove() may shut > > >>>>>>>> off the device too early, before cleanup code has a chance to execute > > >>>>>>>> proper cleanup. > > >>>>>>>> > > >>>>>>>> The issue is not limited to runtime PM. > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>>> If the driver specific runtime PM APIs access registers directly, this > > >>>>>>>>>> will lead to accessing device registers without clocks being enabled. > > >>>>>>>>>> Similar issues may occur with other devres actions that access device > > >>>>>>>>>> registers. > > >>>>>>>>>> > > >>>>>>>>>> Add devm_pm_domain_attach(). When replacing the dev_pm_domain_attach() and > > >>>>>>>>>> dev_pm_domain_detach() in bus probe and bus remove, it ensures that the > > >>>>>>>>>> device is detached from its PM domain in device_unbind_cleanup(), only > > >>>>>>>>>> after all driver's devres-managed resources have been release. > > >>>>>>>>>> > > >>>>>>>>>> For flexibility, the implemented devm_pm_domain_attach() has 2 state > > >>>>>>>>>> arguments, one for the domain state on attach, one for the domain state on > > >>>>>>>>>> detach. > > >>>>>>>>> > > >>>>>>>>> dev_pm_domain_attach() is not part driver API and I'm not convinced at > > >>>>>>>> > > >>>>>>>> Is the concern that devm_pm_domain_attach() will be [ab]used by drivers? > > >>>>>>> > > >>>>>>> Yes, among other things. > > >>>>>> > > >>>>>> Maybe naming could make abuse at least obvious to spot? e.g. > > >>>>>> pm_domain_attach_with_devm_release() > > >>>>> > > >>>>> If I'm not mistaken, it is not even necessary to use devres for this. > > >>>>> > > >>>>> You might as well add a dev_pm_domain_detach() call to > > >>>>> device_unbind_cleanup() after devres_release_all(). There is a slight > > >>>>> complication related to the second argument of it, but I suppose that > > >>>>> this can be determined at the attach time and stored in a new device > > >>>>> PM flag, or similar. > > >>>>> > > >>>> > > >>>> I looked into this solution. I've tested it for all my failure cases and > > >>>> went good. > > >>> > > >>> OK > > >>> > > >>>>> Note that dev->pm_domain is expected to be cleared by ->detach(), so > > >>>>> this should not cause the domain to be detached twice in a row from > > >>>>> the same device, but that needs to be double-checked. > > >>>> > > >>>> The genpd_dev_pm_detach() calls genpd_remove_device() -> > > >>>> dev_pm_domain_set(dev, NULL) which sets the dev->pm_domain = NULL. I can't > > >>>> find any other detach function in the current code base. > > >>> > > >>> There is also acpi_dev_pm_detach() which can be somewhat hard to find, > > >>> but it calls dev_pm_domain_set(dev, NULL) either. > > >>> > > >>>> The code I've tested for this solution is this one: > > >>>> > > >>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > >>>> index b526e0e0f52d..5e9750d007b4 100644 > > >>>> --- a/drivers/base/dd.c > > >>>> +++ b/drivers/base/dd.c > > >>>> @@ -25,6 +25,7 @@ > > >>>> #include <linux/kthread.h> > > >>>> #include <linux/wait.h> > > >>>> #include <linux/async.h> > > >>>> +#include <linux/pm_domain.h> > > >>>> #include <linux/pm_runtime.h> > > >>>> #include <linux/pinctrl/devinfo.h> > > >>>> #include <linux/slab.h> > > >>>> @@ -552,8 +553,11 @@ static void device_unbind_cleanup(struct device *dev) > > >>>> dev->dma_range_map = NULL; > > >>>> device_set_driver(dev, NULL); > > >>>> dev_set_drvdata(dev, NULL); > > >>>> - if (dev->pm_domain && dev->pm_domain->dismiss) > > >>>> - dev->pm_domain->dismiss(dev); > > >>>> + if (dev->pm_domain) { > > >>>> + if (dev->pm_domain->dismiss) > > >>>> + dev->pm_domain->dismiss(dev); > > >>>> + dev_pm_domain_detach(dev, dev->pm_domain->detach_power_off); > > >>> > > >>> I would do the "detach" before the "dismiss" to retain the current ordering. > > >> > > >> I applied on my local development branch all your suggestions except this > > >> one because genpd_dev_pm_detach() as well as acpi_dev_pm_detach() set > > >> dev->pm_domain = NULL. > > >> > > >> Due to this I would call first ->dismiss() then ->detach(), as initially > > >> proposed. Please let me know if you consider it otherwise. > > > > > > This is a matter of adding one more dev->pm_domain check AFAICS, but OK. > > > > I don't know all the subtleties around this, my concern with adding one > > more check on dev->pm_domain was that the > > dev->pm_domain->dismiss() will never be called if the ->detach() function > > will be called before ->dismiss() and it will set dev->pm_domain = NULL (as > > it does today (though genpd_dev_pm_detach() and acpi_dev_pm_detach())). > > > > For platform drivers that used to call dev_pm_domain_detach() in platform > > bus remove function, if I'm not wrong, the dev->pm_domain->dismiss() was > > never called previously. If that is a valid scenario, the code proposed in > > this thread will change the behavior for devices that have ->dismiss() > > implemented. > > ->dismiss() and ->detach() are supposed to be mutually exclusive, so > this should not be a problem either way and in practice so far the > only user of ->dismiss() has been acpi_lpss_pm_domain which doesn't do > ->detach() at all. May I ask if you know if there remains any real good reasons to keep the ->dismiss() callback around? Can't acpi_lpss_pm_domain() convert to use the ->detach() callback instead? Just thinking that it would be easier, but maybe it doesn't work. Kind regards Uffe