Re: [PATCH] ACPI: EC: Set ec_no_wakeup for Lenovo Go S

Antheas Kapenekakis <lkml@xxxxxxxxxxx> · Wed, 2 Apr 2025 22:37:25 +0200

On Wed, 2 Apr 2025 at 21:19, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
>
> On 4/1/2025 5:06 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>
> >> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> >>>>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>>>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>>>>>
> >>>>>>>>>> Link:
> >>>>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
> >>>>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>>>>>> them and mark this patch as superceded in patchwork so that Rafael
> >>>>>>>> doesn't have to pull everything out of this thread manually.
> >>>>>>
> >>>>>> FTR I don't have permission on patchwork for linux-acpi.
> >>>>>>
> >>>>>> I sent out v2 though.
> >>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And to avoid having this conversation again, there is another Legion
> >>>>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>>>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>>>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>>>>>> want me to prepare that in a timely manner, because that one actually
> >>>>>>>>> needs rewriting to be posted, now is the time to say so.
> >>>>>>>>
> >>>>>>>> Can you please propose what you have in mind on the mailing lists to
> >>>>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >>>>>>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>>>>>
> >>>>>>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>>>>>> driver.  If there are issues with that, I would generally prefer the
> >>>>>>>> fixes to be in that driver.
> >>>>>>>
> >>>>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>>>>>> try to make sure the quirk can be potentially added there, or in any
> >>>>>>> driver*.
> >>>>>>
> >>>>>> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>>>>>
> >>>>>>>
> >>>>>>> The idea is to rewrite the patch series to just add a simple delay
> >>>>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>>>>>> placed in ->begin. We have been using that series for ~6 months now,
> >>>>>>> and it turns out that having a delay system for every call is quite
> >>>>>>> pointless. But there are also situations where you might have a device
> >>>>>>> such as the Z13 Folio which looks like a USB device but listens to
> >>>>>>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>>>>>> be able to inject a small delay there.
> >>>>>>
> >>>>>> So the "general" problem with injecting delays is they are typically not
> >>>>>> scalable as they're usually empirically measured and there is no
> >>>>>> handshake with the firmware.
> >>>>>>
> >>>>>> Say for example the EC has some hardcoded value of 200ms to wait for
> >>>>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> >>>>>> if you put 175ms it might work sometimes.  You get some reports of this,
> >>>>>> so you extend it to 200ms.  Great it works 100% of the time because the
> >>>>>> old hardcoded value in the EC was 200ms.
> >>>>>>
> >>>>>> Now say a new EC firmware comes out that for $REASONS changes it to
> >>>>>> 250ms.  Your old empirically measured value stops working, spend a bunch
> >>>>>> of cycles debugging it, measure the new one.  You change it to 250ms,
> >>>>>> but people with the old one have a problem now because the timing changed.
> >>>>>>
> >>>>>> So now you have to add infrastructure to say what version of the
> >>>>>> firmware gets what delay.
> >>>>>>
> >>>>>> Then you find out there is another SKU of that model which needs a
> >>>>>> different delay, so your complexity has ballooned.
> >>>>>>
> >>>>>> What if all these "delays" were FW timeouts from failing to service an
> >>>>>> interrupt?  Or what if they were a flow problem like the device expected
> >>>>>> you to issue a STOP command before a RESET command?
> >>>>>>
> >>>>>> So we need to be /incredibly careful/ with delays and 100% they are the
> >>>>>> right answer to a problem.
> >>>>>
> >>>>> I do get your points. In this case though we sideskirt through a lot
> >>>>> of the points because of where the delay is placed.
> >>>>>
> >>>>> If the instrumentation is in-place, this delay happens before sleep
> >>>>> after the screen of the device has turned off (due to early DPMS), the
> >>>>> keyboard backlight has turned off (DIsplay off call), and the suspend
> >>>>> light pulses (Sleep Entry). So it does not affect device behavior and
> >>>>> you can be quite liberal. The user has left the device alone.
> >>>>>
> >>>>> If the device needs e.g., 250ms you will not put 250ms, you will put
> >>>>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> >>>>> manufacturer comes up with a new firmware that fixes this issue, you
> >>>>> can keep the delay for the life of the product, because keeping it
> >>>>> does not affect device behavior, and writing kernel patches takes time.
> >>>>>
> >>>>> This is how I think about it, at least. A universal delay might be
> >>>>> needed eventually. But for now, limiting the scope to some devices and
> >>>>> seeing how that goes should be enough.
> >>>>>
> >>>>> Antheas
> >>>>
> >>>> My take is that "universal" delays are never popular.  IE hardware that
> >>>> "previously" worked perfectly is now slower.  So if there /must/ be a
> >>>> delay it should be as narrow as possible and justified.
> >>>>
> >>>> Let me give you an example of another case I'm *actively considering* a
> >>>> delay.
> >>>>
> >>>> I have an OEM's system that if you enter and exit s0i3 too quickly you
> >>>> can trigger the over voltage protection (OVP) feature of the VR module.
> >>>> When OVP is tripped the system is forced off immediately. This *only
> >>>> happens* on the VR module in that vendor's systems. "Normal" Linux
> >>>> userspace suspend/resume can't trip it.  But connecting a dock "does"
> >>>> trip it.
> >>>>
> >>>> If you look on a scope you can see SLP_S3# pin is toggling faster than
> >>>> spec says it should.  Naïvely you would say well the easy solution is to
> >>>> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
> >>>> that does just that.
> >>>>
> >>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> index e6124498b195f..97387ddb281e1 100644
> >>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >>>> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
> >>>>            struct smu_metrics table;
> >>>>            int rc;
> >>>>
> >>>> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms
> >>>> passed */
> >>>> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev,
> >>>> &table) &&
> >>>> -           table.s0i3_last_entry_status)
> >>>> -               usleep_range(10000, 20000);
> >>>> +       if (!get_metrics_table(pdev, &table) &&
> >>>> table.s0i3_last_entry_status) {
> >>>> +               switch (pdev->cpu_id) {
> >>>> +               /* CZN: Ensure that future s0i3 entry attempts at least
> >>>> 10ms passed */
> >>>> +               case AMD_CPU_ID_CZN:
> >>>> +                       usleep_range(10000, 20000);
> >>>> +                       break;
> >>>> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> >>>> +               case AMD_CPU_ID_PS:
> >>>> +                       msleep(2500);
> >>>> +                       break;
> >>>> +               default:
> >>>> +                       break;
> >>>> +               }
> >>>> +       }
> >>>>
> >>>> This stops all the failures, but it also has an impact that any time the
> >>>> EC SCI is raised for any reason (like plug in power adapter) the system
> >>>> will take 2.5s to go back into s0i3.
> >>>>
> >>>> Digging further - the intended behavior by the EC and BIOS was to wake
> >>>> the system when the dock is connected.
> >>>>
> >>>> That is the reason this happens is because the EC SCI is raised when the
> >>>> dock is connected, but the Notify() the EC sent wasn't received by any
> >>>> driver.  I've got a patch I'll be sending out soon that adds support for
> >>>> the correct driver to wake up on this event.
> >>>>
> >>>> This prevents the case of the OVP and now we don't *need* to penalize
> >>>> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
> >>>> find out there are other ways to trip the problem I still have that
> >>>> option though.
> >>>
> >>> So you are talking about missing the AC/DC burst feature of Windows
> >>> here right? I do agree with you that yeah for most devices it is not
> >>> necessary.
> >>
> >> No; I wasn't talking about that, my point was that timing delays are a
> >> tempting to solution to a problem, but they're very often papering over
> >> something else and a hint to dive deeper.
> >
> > What I gleaned from what you said is that X manufacturer has a problem
> > due to missing AC/DC bursts in linux, where all AC/DC burst is is a 5s
> > delay.
> >
> > The intended behavior of AC/DC bursts is to fully wake up the kernel
> > for 5 seconds, and then sleep again. In windows, if a power supply is
> > connected, userspace wakes up too, and then the Windows power manager
> > sleeps the system again if there is no user activity for 5 seconds.
> > However, this should not affect device drivers, so we may consider it
> > optional on the Linux side until DEs get support for it and enable it
> > themselves I would say.
> >
> > So in effect, AC/DC bursts are Windows' solution to problems like the
> > one you faced.
> >
> > I am not saying penalize everyone. If I do make a patch for AC/DC it
> > will be device specific. But after a point, if random devices start
> > getting issues and the quirk list starts to grow, it might become
> > inevitable to force it for all of them.
> >
> > I do get what you are saying with delays though. We had to merge one
> > of the initial SOF delay patch variants for the Steam Deck which
> > prevents audio crashing on resume, and that was definitely a bandage.
> >
>
> Maybe I'm failing at my search-engine-foo, could you point me at some
> docs about this AC/DC burst stuff?

AC/DC Burst/AC/DC Burst Suppresed are the events in Sleep Study
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-sleepstudy

You can see those when running a sleep study and unplugging a
connector. I think suppressed is unplugging

Then here is the description for plugging in a charger
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-1

> The Windows power manager will turn on the display when the battery subsystem has indicated
> AC power has been connected. The GPIO interrupt for power source changes must cause the
> ACPI _PSR method under the power supply device to be executed. The power subsystem must
> wake the SoC any time the power source changes, including when the system is attached or
> removed from a dock that has a battery or AC power source. After AC power is connected,
> the display will remain on for five seconds, unless there is input to the system during this five-second window.

And here for unplugging:
https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/modern-standby-wake-sources#environmental-context-changes-2

> The GPIO interrupt for power source changes must cause the ACPI _PSR method under
> the power supply device to be executed. The power subsystem must wake the SoC any time
> the power source changes, including when the system is attached or removed from a dock
> that has a battery or AC power source.

I guess from the description it is not clear that the device stays on
for 5 seconds when unplugging, but from empirical testing I want to
say it does. It has been a while. I left 3 devices like an hour ago on
Windows and none of them managed to sleep, so I cannot verify this at
the moment though.

Antheas

> [
> And FTR unfortunately it's seeming that my proposal for the alternate
> wake source has negative side effects to other machines, so I'm going
> back to exploring a timing based quirk tied to SMBIOS data again too :( ]
>
> >>>
> >>> But Microsoft guarantees 5 seconds. We already have the original Ally
> >>> unit which gets stuck in prochot due to this so it would be nice to
> >>> fix. For the Ally X I am unsure what Asus did but it stays awake for a
> >>> nice three seconds after you plug/unplug the charger so it has no
> >>> issues.
> >>>
> >>> So if devices keep getting issues like we will have to eat it and do
> >>> AC/DC bursts with all of them.
> >>>
> >>> And it is the same with entering s0i3 too fast. Some devices just need
> >>> a tiny amount of time to do whatever it is their manufacturer
> >>> programmed them to do after the Modern Standby notifications. For
> >>> handhelds, it is to turn off the controllers because XInput. Asus put
> >>> the fade animation so that takes 300ms and if you do it earlier the
> >>> controller gets cut before it saves its state and starts to do weird
> >>> RGB stuff. Other manufacturers typically do not malfunction but they
> >>> still use the notification.
> >>>
> >>> Only MSI does not, but that controller is quirky before/after sleep
> >>> and they released a firmware update today saying something about
> >>> controller S3/S4 improvements so they probably do that too now, I need
> >>> to check.
> >>>
> >>> For the Go S, it sets itself to 5W after sleep entry and turns off the
> >>> fan. A little delay went a long way in fixing the hang there, which I
> >>> suspect is due to aggressive tuning. But I do not know if you guys get
> >>> that. We did when we did the initial testing for it and carry the
> >>> delay now so I cannot tell you either way. So you should max out the
> >>> TDP, run stress -c 16, and make the device sleep 100-200 times to make
> >>> sure that is not an issue.
> >>>
> >>> I do have a plan for trying to rework AC/DC bursts, but first the
> >>> s2idle ordering needs to be fixed and I need to rewrite the series for
> >>> that. The series we have for that works _fine_ so it is not a priority
> >>> to rework but it is not upstreamable in its current state so if you
> >>> need that (for the Go S) I need to know now.
> >>>
> >>> For ACDC my idea would be after the reordering is done to have a quirk
> >>> that makes the kernel resume, fire the sleep exit notification, loop
> >>> for 5 (maybe 3?) seconds inside the device suspend section prior to
> >>> userspace resume, and then as long as a wakeup did not arrive restart
> >>> the suspend sequence to sleep again. I would also combine that with
> >>> the little s2idle wakeup device you made, so that userspace can enable
> >>> wakeups for that if it wants to do resume on dock connection. But that
> >>> has a lot of moving parts, including moving the DPMS action to happen
> >>> even earlier than your patch does and making sure display on/off does
> >>> not fire so that the keyboard backlight does not do weird stuff.
> >>>
> >>> Antheas
> >>
> >> I think a good start for what you're talking about would be to rebase
> >> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> >> rebase, idk) and then if/when all of us on LKML are happy with it we can
> >> layer other concepts on top of that.
> >
> > Yeah, I will try to do that. However, I have around 30 submitted
> > patches in the air right now, and we are about to add another 5-6 to
> > the list for the Claw. So it will probably be after a bunch of those
> > merge. For the interest of sustainability, if nothing else. So let's
> > put a dot on this and pick up the discussion again mid 6.16 in a month
> > or so.
> >
> > Unless you need this series for the Go S, in which case I can try to
> > re-order stuff around. So, one of you should use the red light TDP
> > mode with an artificial load (or actual, such as a game) and see if
> > sleep works properly. Do that on battery.
>
> Take your time and get to it when you get to it.  I just want to make
> sure we build a clean foundation for big changes like you have in mind.
>
> >
> > I would do at least 100 suspends with this, as most users do 50-70
> > suspends per reboot. I think I did around 300 to validate the Go S
> > quirk.
> >
> > Antheas
>