Re: [PATCH] ACPI: EC: Set ec_no_wakeup for Lenovo Go S

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

Here are tags for linking to your patch development to be picked up.

Link:
https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
Co-developed-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
Signed-off-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>


I don't believe that b4 will pick these up, so I will send out a v2 with
them and mark this patch as superceded in patchwork so that Rafael
doesn't have to pull everything out of this thread manually.

FTR I don't have permission on patchwork for linux-acpi.

I sent out v2 though.



And to avoid having this conversation again, there is another Legion
Go S [3] patch you nacked and froze the testing for, so you could go
on the manhunt for the real cause of this one. But it will probably be
needed and you will find that as you get TDP controls going. So if you
want me to prepare that in a timely manner, because that one actually
needs rewriting to be posted, now is the time to say so.

Can you please propose what you have in mind on the mailing lists to
discuss?  It's relatively expensive (in the unit of tech debt) to add
quirk infrastructure and so we need to make sure it is the right solution.

Derek is working on CPU coefficient tuning in a completely separate
driver.  If there are issues with that, I would generally prefer the
fixes to be in that driver.

CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
try to make sure the quirk can be potentially added there, or in any
driver*.

Yes things like fPPT, sPPT, STAPM, STT limits.


The idea is to rewrite the patch series to just add a simple delay
field on the s2idle quirk struct. Then the biggest delay wins and gets
placed in ->begin. We have been using that series for ~6 months now,
and it turns out that having a delay system for every call is quite
pointless. But there are also situations where you might have a device
such as the Z13 Folio which looks like a USB device but listens to
s2idle notifications through ACPI, so the hid subsystem might need to
be able to inject a small delay there.

So the "general" problem with injecting delays is they are typically not
scalable as they're usually empirically measured and there is no
handshake with the firmware.

Say for example the EC has some hardcoded value of 200ms to wait for
something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
if you put 175ms it might work sometimes.  You get some reports of this,
so you extend it to 200ms.  Great it works 100% of the time because the
old hardcoded value in the EC was 200ms.

Now say a new EC firmware comes out that for $REASONS changes it to
250ms.  Your old empirically measured value stops working, spend a bunch
of cycles debugging it, measure the new one.  You change it to 250ms,
but people with the old one have a problem now because the timing changed.

So now you have to add infrastructure to say what version of the
firmware gets what delay.

Then you find out there is another SKU of that model which needs a
different delay, so your complexity has ballooned.

What if all these "delays" were FW timeouts from failing to service an
interrupt?  Or what if they were a flow problem like the device expected
you to issue a STOP command before a RESET command?

So we need to be /incredibly careful/ with delays and 100% they are the
right answer to a problem.

I do get your points. In this case though we sideskirt through a lot
of the points because of where the delay is placed.

If the instrumentation is in-place, this delay happens before sleep
after the screen of the device has turned off (due to early DPMS), the
keyboard backlight has turned off (DIsplay off call), and the suspend
light pulses (Sleep Entry). So it does not affect device behavior and
you can be quite liberal. The user has left the device alone.

If the device needs e.g., 250ms you will not put 250ms, you will put
500ms. Still unsure, you bump it to 750ms. Also, even if the
manufacturer comes up with a new firmware that fixes this issue, you
can keep the delay for the life of the product, because keeping it
does not affect device behavior, and writing kernel patches takes time.

This is how I think about it, at least. A universal delay might be
needed eventually. But for now, limiting the scope to some devices and
seeing how that goes should be enough.

Antheas

My take is that "universal" delays are never popular.  IE hardware that
"previously" worked perfectly is now slower.  So if there /must/ be a
delay it should be as narrow as possible and justified.

Let me give you an example of another case I'm *actively considering* a
delay.

I have an OEM's system that if you enter and exit s0i3 too quickly you
can trigger the over voltage protection (OVP) feature of the VR module.
When OVP is tripped the system is forced off immediately. This *only
happens* on the VR module in that vendor's systems. "Normal" Linux
userspace suspend/resume can't trip it.  But connecting a dock "does"
trip it.

If you look on a scope you can see SLP_S3# pin is toggling faster than
spec says it should.  Naïvely you would say well the easy solution is to
add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
that does just that.

diff --git a/drivers/platform/x86/amd/pmc/pmc.c
b/drivers/platform/x86/amd/pmc/pmc.c
index e6124498b195f..97387ddb281e1 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
           struct smu_metrics table;
           int rc;

-       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
passed */
-       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
&table) &&
-           table.s0i3_last_entry_status)
-               usleep_range(10000, 20000);
+       if (!get_metrics_table(pdev, &table) &&
table.s0i3_last_entry_status) {
+               switch (pdev->cpu_id) {
+               /* CZN: Ensure that future s0i3 entry attempts at least
10ms passed */
+               case AMD_CPU_ID_CZN:
+                       usleep_range(10000, 20000);
+                       break;
+               /* PHX/HPT: Ensure enough time to avoid VR OVP */
+               case AMD_CPU_ID_PS:
+                       msleep(2500);
+                       break;
+               default:
+                       break;
+               }
+       }

This stops all the failures, but it also has an impact that any time the
EC SCI is raised for any reason (like plug in power adapter) the system
will take 2.5s to go back into s0i3.

Digging further - the intended behavior by the EC and BIOS was to wake
the system when the dock is connected.

That is the reason this happens is because the EC SCI is raised when the
dock is connected, but the Notify() the EC sent wasn't received by any
driver.  I've got a patch I'll be sending out soon that adds support for
the correct driver to wake up on this event.

This prevents the case of the OVP and now we don't *need* to penalize
everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
find out there are other ways to trip the problem I still have that
option though.

So you are talking about missing the AC/DC burst feature of Windows
here right? I do agree with you that yeah for most devices it is not
necessary.

No; I wasn't talking about that, my point was that timing delays are a
tempting to solution to a problem, but they're very often papering over
something else and a hint to dive deeper.

What I gleaned from what you said is that X manufacturer has a problem
due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
delay.

The intended behavior of AC/DC bursts is to fully wake up the kernel
for 5 seconds, and then sleep again. In windows, if a power supply is
connected, userspace wakes up too, and then the Windows power manager
sleeps the system again if there is no user activity for 5 seconds.
However, this should not affect device drivers, so we may consider it
optional on the Linux side until DEs get support for it and enable it
themselves I would say.

So in effect, AC/DC bursts are Windows' solution to problems like the
one you faced.

I am not saying penalize everyone. If I do make a patch for AC/DC it
will be device specific. But after a point, if random devices start
getting issues and the quirk list starts to grow, it might become
inevitable to force it for all of them.

I do get what you are saying with delays though. We had to merge one
of the initial SOF delay patch variants for the Steam Deck which
prevents audio crashing on resume, and that was definitely a bandage.


Maybe I'm failing at my search-engine-foo, could you point me at some docs about this AC/DC burst stuff?

[
And FTR unfortunately it's seeming that my proposal for the alternate wake source has negative side effects to other machines, so I'm going back to exploring a timing based quirk tied to SMBIOS data again too :( ]


But Microsoft guarantees 5 seconds. We already have the original Ally
unit which gets stuck in prochot due to this so it would be nice to
fix. For the Ally X I am unsure what Asus did but it stays awake for a
nice three seconds after you plug/unplug the charger so it has no
issues.

So if devices keep getting issues like we will have to eat it and do
AC/DC bursts with all of them.

And it is the same with entering s0i3 too fast. Some devices just need
a tiny amount of time to do whatever it is their manufacturer
programmed them to do after the Modern Standby notifications. For
handhelds, it is to turn off the controllers because XInput. Asus put
the fade animation so that takes 300ms and if you do it earlier the
controller gets cut before it saves its state and starts to do weird
RGB stuff. Other manufacturers typically do not malfunction but they
still use the notification.

Only MSI does not, but that controller is quirky before/after sleep
and they released a firmware update today saying something about
controller S3/S4 improvements so they probably do that too now, I need
to check.

For the Go S, it sets itself to 5W after sleep entry and turns off the
fan. A little delay went a long way in fixing the hang there, which I
suspect is due to aggressive tuning. But I do not know if you guys get
that. We did when we did the initial testing for it and carry the
delay now so I cannot tell you either way. So you should max out the
TDP, run stress -c 16, and make the device sleep 100-200 times to make
sure that is not an issue.

I do have a plan for trying to rework AC/DC bursts, but first the
s2idle ordering needs to be fixed and I need to rewrite the series for
that. The series we have for that works _fine_ so it is not a priority
to rework but it is not upstreamable in its current state so if you
need that (for the Go S) I need to know now.

For ACDC my idea would be after the reordering is done to have a quirk
that makes the kernel resume, fire the sleep exit notification, loop
for 5 (maybe 3?) seconds inside the device suspend section prior to
userspace resume, and then as long as a wakeup did not arrive restart
the suspend sequence to sleep again. I would also combine that with
the little s2idle wakeup device you made, so that userspace can enable
wakeups for that if it wants to do resume on dock connection. But that
has a lot of moving parts, including moving the DPMS action to happen
even earlier than your patch does and making sure display on/off does
not fire so that the keyboard backlight does not do weird stuff.

Antheas

I think a good start for what you're talking about would be to rebase
your series that reworked s2idle flow on 6.15 code (maybe it's a clean
rebase, idk) and then if/when all of us on LKML are happy with it we can
layer other concepts on top of that.

Yeah, I will try to do that. However, I have around 30 submitted
patches in the air right now, and we are about to add another 5-6 to
the list for the Claw. So it will probably be after a bunch of those
merge. For the interest of sustainability, if nothing else. So let's
put a dot on this and pick up the discussion again mid 6.16 in a month
or so.

Unless you need this series for the Go S, in which case I can try to
re-order stuff around. So, one of you should use the red light TDP
mode with an artificial load (or actual, such as a game) and see if
sleep works properly. Do that on battery.

Take your time and get to it when you get to it. I just want to make sure we build a clean foundation for big changes like you have in mind.


I would do at least 100 suspends with this, as most users do 50-70
suspends per reboot. I think I did around 300 to validate the Go S
quirk.

Antheas





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux