Re: [PATCH] ACPI: EC: Set ec_no_wakeup for Lenovo Go S

Mario Limonciello <superm1@xxxxxxxxxx> · Tue, 1 Apr 2025 10:24:09 -0500

On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@xxxxxxxxxx> wrote:

Here are tags for linking to your patch development to be picked up.

Link: https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
Co-developed-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
Signed-off-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>


I don't believe that b4 will pick these up, so I will send out a v2 with
them and mark this patch as superseded in patchwork so that Rafael
doesn't have to pull everything out of this thread manually.

FTR I don't have permission on patchwork for linux-acpi.

I sent out v2 though.



And to avoid having this conversation again, there is another Legion
Go S [3] patch you nacked and froze the testing for, so you could go
on the manhunt for the real cause of this one. But it will probably be
needed and you will find that as you get TDP controls going. So if you
want me to prepare that in a timely manner, because that one actually
needs rewriting to be posted, now is the time to say so.

Can you please propose what you have in mind on the mailing lists to
discuss?  It's relatively expensive (in the unit of tech debt) to add
quirk infrastructure and so we need to make sure it is the right solution.

Derek is working on CPU coefficient tuning in a completely separate
driver.  If there are issues with that, I would generally prefer the
fixes to be in that driver.

CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
try to make sure the quirk can be potentially added there, or in any
driver*.

Yes things like fPPT, sPPT, STAPM, STT limits.


The idea is to rewrite the patch series to just add a simple delay
field on the s2idle quirk struct. Then the biggest delay wins and gets
placed in ->begin. We have been using that series for ~6 months now,
and it turns out that having a delay system for every call is quite
pointless. But there are also situations where you might have a device
such as the Z13 Folio which looks like a USB device but listens to
s2idle notifications through ACPI, so the hid subsystem might need to
be able to inject a small delay there.
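
For illustration only, a minimal userspace sketch of the "biggest delay
wins" idea described above. The struct name, the delay_ms field, and the
helper are hypothetical; the actual s2idle quirk struct and the ->begin
hook are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical quirk entry; not the real kernel structure. */
struct s2idle_quirk {
	const char *name;       /* illustrative label only */
	unsigned int delay_ms;  /* assumed field: delay wanted before sleep */
};

/* Return the largest delay requested by any quirk, or 0 if there is none. */
static unsigned int s2idle_max_delay_ms(const struct s2idle_quirk *quirks,
					size_t n)
{
	unsigned int max = 0;
	size_t i;

	assert(quirks != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (quirks[i].delay_ms > max)
			max = quirks[i].delay_ms;
	return max;
}
```

With entries requesting 0, 500 and 250 ms, the helper returns 500, so a
single sleep of that length would cover every requester.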

So the "general" problem with injecting delays is they are typically not
scalable as they're usually empirically measured and there is no
handshake with the firmware.

Say for example the EC has some hardcoded value of 200ms to wait for
something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
if you put 175ms it might work sometimes.  You get some reports of this,
so you extend it to 200ms.  Great it works 100% of the time because the
old hardcoded value in the EC was 200ms.

Now say a new EC firmware comes out that for $REASONS changes it to
250ms.  Your old empirically measured value stops working, spend a bunch
of cycles debugging it, measure the new one.  You change it to 250ms,
but people with the old one have a problem now because the timing changed.

So now you have to add infrastructure to say what version of the
firmware gets what delay.

Then you find out there is another SKU of that model which needs a
different delay, so your complexity has ballooned.
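
To make the ballooning concrete, here is a hypothetical sketch of the
lookup such infrastructure would end up needing. The SKU names, firmware
version numbers, and the 300ms entry are invented for illustration; only
the 200ms and 250ms figures come from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule table; every name and version here is illustrative. */
struct ec_delay_rule {
	const char *sku;        /* model variant */
	unsigned int min_fw;    /* first EC firmware version the rule covers */
	unsigned int delay_ms;  /* empirically measured delay */
};

static const struct ec_delay_rule rules[] = {
	{ "SKU-A", 0, 200 },  /* original EC firmware */
	{ "SKU-A", 2, 250 },  /* newer firmware changed the timing */
	{ "SKU-B", 0, 300 },  /* another SKU of the same model */
};

/* Pick the delay for a SKU and firmware version; 0 means no rule matched. */
static unsigned int ec_delay_ms(const char *sku, unsigned int fw)
{
	unsigned int best_fw = 0, delay = 0;
	size_t i;

	assert(sku != NULL);
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct ec_delay_rule *r = &rules[i];

		if (strcmp(r->sku, sku) != 0 || r->min_fw > fw)
			continue;
		/* Prefer the rule with the newest applicable firmware floor. */
		if (r->min_fw >= best_fw) {
			best_fw = r->min_fw;
			delay = r->delay_ms;
		}
	}
	return delay;
}
```

Every new firmware drop or SKU adds a row, and each row is another
measured number with no handshake from the firmware behind it.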

What if all these "delays" were FW timeouts from failing to service an
interrupt?  Or what if they were a flow problem like the device expected
you to issue a STOP command before a RESET command?

So we need to be /incredibly careful/ with delays and 100% sure they are
the right answer to a problem.

I do get your points. In this case though we sidestep a lot of the
points because of where the delay is placed.

If the instrumentation is in-place, this delay happens before sleep
after the screen of the device has turned off (due to early DPMS), the
keyboard backlight has turned off (Display Off call), and the suspend
light pulses (Sleep Entry). So it does not affect device behavior and
you can be quite liberal. The user has left the device alone.

If the device needs e.g., 250ms you will not put 250ms, you will put
500ms. Still unsure, you bump it to 750ms. Also, even if the
manufacturer comes up with a new firmware that fixes this issue, you
can keep the delay for the life of the product, because keeping it
does not affect device behavior, and writing kernel patches takes time.

This is how I think about it, at least. A universal delay might be
needed eventually. But for now, limiting the scope to some devices and
seeing how that goes should be enough.

Antheas

My take is that "universal" delays are never popular, i.e., hardware that
"previously" worked perfectly is now slower.  So if there /must/ be a
delay it should be as narrow as possible and justified.

Let me give you an example of another case where I'm *actively
considering* a delay.

I have an OEM's system where, if you enter and exit s0i3 too quickly, you
can trigger the over voltage protection (OVP) feature of the VR module.
When OVP is tripped the system is forced off immediately. This *only 
happens* on the VR module in that vendor's systems. "Normal" Linux 
userspace suspend/resume can't trip it.  But connecting a dock "does" 
trip it.

If you look on a scope you can see the SLP_S3# pin toggling faster than
the spec says it should.  Naïvely you would say well the easy solution is to
add a delay somewhere so that SLP_S3# stays in spec.  I have a patch 
that does just that.

diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index e6124498b195f..97387ddb281e1 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
        struct smu_metrics table;
        int rc;

-       /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed */
-       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, &table) &&
-           table.s0i3_last_entry_status)
-               usleep_range(10000, 20000);
+       if (!get_metrics_table(pdev, &table) && table.s0i3_last_entry_status) {
+               switch (pdev->cpu_id) {
+               /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed */
+               case AMD_CPU_ID_CZN:
+                       usleep_range(10000, 20000);
+                       break;
+               /* PHX/HPT: Ensure enough time to avoid VR OVP */
+               case AMD_CPU_ID_PS:
+                       msleep(2500);
+                       break;
+               default:
+                       break;
+               }
+       }

This stops all the failures, but it also has an impact that any time the 
EC SCI is raised for any reason (like plug in power adapter) the system 
will take 2.5s to go back into s0i3.

Digging further - the intended behavior by the EC and BIOS was to wake 
the system when the dock is connected.

That is, the reason this happens is that the EC SCI is raised when the
dock is connected, but the Notify() the EC sent wasn't received by any
driver.  I've got a patch I'll be sending out soon that adds support for
the correct driver to wake up on this event.

This prevents the OVP case, and now we don't *need* to penalize everyone
by waiting 2.5s after EC SCI events before going back to s0i3.  If I
find out there are other ways to trip the problem I still have that
option though.