On Sun, Jul 13, 2025 at 12:05:18AM +0800, Hans Zhang wrote:
> On 2025/7/12 17:35, Manivannan Sadhasivam wrote:
> ...
> > > IMO the "someday" goal should be that we get rid of aspm_policy
> > > and enable all the available power saving states by default. We
> > > have sysfs knobs that administrators can use if necessary, and
> > > drivers or quirks can disable states if they need to work around
> > > hardware defects.
> >
> > Yeah, I think the default should be powersave and let the users
> > disable it for performance if they want.
>
> Perhaps I don't think so. At present, our company's testing team has
> tested quite a few NVMe SSDS. As far as I can remember, the SSDS
> from two companies have encountered problems and will hang directly
> when turned on. We have set CONFIG_PCIEASPM_POWERSAVE=y by default.
> When encountering SSDS from these two companies, we had to add
> "pcie_aspm.policy=default" in the cmdline, and then the boot worked
> normally. Currently, we do not have a PCIe protocol analyzer to
> analyze such issues. The current approach is to modify the cmdline.
> So I can't prove whether it's a problem with the Root Port of our
> SOC or the SSD device.

Have you reported these?

> Here I agree with Bjorn's statement that sometimes the EP is not
> necessarily very standard and there are no hardware issues.
> Personally, I think the default is default or performance. When
> users need to save power, they should then decide whether to
> configure it as powersave or powersupersave. Sometimes, if the EP
> device connected by the customer is perfect, they can turn it on to
> save power. But if the EP is not perfect, at least they will
> immediately know what caused the problem.

We should discover device defects as early as possible so we can add
quirks for them. Defaulting to ASPM being partly disabled means it
gets much less testing and users end up passing around "fixes" like
booting with "pcie_aspm.policy=default" or similar.

I do not want users to trip over a device that doesn't work and have
to look for workarounds on the web. I also think it's somewhat
irresponsible of us to consume more power than necessary. But as Mani
said, this would be a big change and might have to be done with a BIOS
date check or something to try to avoid regressions.

Bjorn