Re: [PATCH] PCI/sysfs: Ensure devices are powered for config reads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/21/2025 10:56 AM, Brian Norris wrote:
On Thu, Aug 21, 2025 at 08:54:52AM +0800, Ethan Zhao wrote:
On 8/21/2025 1:26 AM, Brian Norris wrote:
From: Brian Norris <briannorris@xxxxxxxxxx>

max_link_speed, max_link_width, current_link_speed, current_link_width,
secondary_bus_number, and subordinate_bus_number all access config
registers, but they don't check the runtime PM state. If the device is
in D3cold, we may see -EINVAL or even bogus values.
My understanding, if your device is in D3cold, returning of -EINVAL is
the right behavior.

That's not the guaranteed result though. Some hosts don't properly
return PCIBIOS_DEVICE_NOT_FOUND, for one. But also, it's racy -- because
we don't even try to hold a pm_runtime reference, the device could
possibly enter D3cold while we're in the middle of reading from it. If
you're lucky, that'll get you a completion timeout and an all-1's
result, and we'll return a garbage result.

So if we want to purposely not resume the device and retain "I can't
give you what you asked for" behavior, we'd at least need a
pm_runtime_get_noresume() or similar.
I understand you just want the stable result of these caps, meanwhile
you don't want the side effect either.>
Wrap these access in pci_config_pm_runtime_{get,put}() like most of the
rest of the similar sysfs attributes.

Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Brian Norris <briannorris@xxxxxxxxxx>
Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx>
---

   drivers/pci/pci-sysfs.c | 32 +++++++++++++++++++++++++++++---
   1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 5eea14c1f7f5..160df897dc5e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -191,9 +191,16 @@ static ssize_t max_link_speed_show(struct device *dev,
   				   struct device_attribute *attr, char *buf)
   {
   	struct pci_dev *pdev = to_pci_dev(dev);
+	ssize_t ret;
+
+	pci_config_pm_runtime_get(pdev);
This function would potentially change the power state of device,
that would be a complex process, beyond the meaning of
max_link_speed_show(), given the semantics of these functions (
max_link_speed_show()/max_link_width_show()/current_link_speed_show()/
....),
this cannot be done !

What makes this different than the 'config' attribute (i.e., "read
config register")? Why shouldn't that just return -EINVAL? I don't
really buy your reasoning -- "it's a complex process" is not a reason
It is a reason to know there is side effect to be taken into account.> not to do something. The user asked for the link speed; why not give it?
If the user wanted to know if the device was powered, they could check
the 'power_state' attribute instead.

(Side note: these attributes don't show up anywhere in Documentation/,
so it's also a bit hard to declare "best" semantics for them.)

To flip this question around a bit: if I have a system that aggressively
suspends devices when there's no recent activity, how am I supposed to
check what the link speed is? Probabilistically hammer the file while
hoping some other activity wakes the device, so I can find the small
windows of time where it's RPM_ACTIVE? Disable runtime_pm for the device
while I check?
Hold a PM reference by pci_config_pm_runtime_get() and then write some
data to the PCIe config space, no objection.

To know about the linkspeed etc capabilities/not status, how about
creating a cached version of these caps, no need to change their
power state.

If there is aggressive power saving requirement, and the polling
of these caps will make up wakeup/poweron bugs.


Thanks,
Ethan








Brian





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux