Re: [PATCH] ACPI: EC: Set ec_no_wakeup for Lenovo Go S

On Tue, 1 Apr 2025 at 22:54, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
>
> On 4/1/2025 1:39 PM, Antheas Kapenekakis wrote:
> > On Tue, 1 Apr 2025 at 17:24, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>
> >> On 4/1/2025 10:03 AM, Antheas Kapenekakis wrote:
> >>> On Tue, 1 Apr 2025 at 16:09, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 4/1/2025 7:45 AM, Antheas Kapenekakis wrote:
> >>>>> On Tue, 1 Apr 2025 at 14:30, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>>> Here are tags for linking to your patch development to be picked up.
> >>>>>>>>
> >>>>>>>> Link:
> >>>>>>>> https://github.com/bazzite-org/patchwork/commit/95b93b2852718ee1e808c72e6b1836da4a95fc63
> >>>>>>>> Co-developed-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
> >>>>>>>> Signed-off-by: Antheas Kapenekakis <lkml@xxxxxxxxxxx>
> >>>>>>>
> >>>>>>
> >>>>>> I don't believe that b4 will pick these up, so I will send out a v2 with
> >>>>>> them and mark this patch as superseded in patchwork so that Rafael
> >>>>>> doesn't have to pull everything out of this thread manually.
> >>>>
> >>>> FTR I don't have permission on patchwork for linux-acpi.
> >>>>
> >>>> I sent out v2 though.
> >>>>
> >>>>>>
> >>>>>>>
> >>>>>>> And to avoid having this conversation again, there is another Legion
> >>>>>>> Go S [3] patch you nacked and froze the testing for, so you could go
> >>>>>>> on the manhunt for the real cause of this one. But it will probably be
> >>>>>>> needed and you will find that as you get TDP controls going. So if you
> >>>>>>> want me to prepare that in a timely manner, because that one actually
> >>>>>>> needs rewriting to be posted, now is the time to say so.
> >>>>>>
> >>>>>> Can you please propose what you have in mind on the mailing lists to
> >>>>>> discuss?  It's relatively expensive (in the unit of tech debt) to add
> >>>>>> quirk infrastructure and so we need to make sure it is the right solution.
> >>>>>>
> >>>>>> Derek is working on CPU coefficient tuning in a completely separate
> >>>>>> driver.  If there are issues with that, I would generally prefer the
> >>>>>> fixes to be in that driver.
> >>>>>
> >>>>> CPU coefficient tuning? If you mean the lenovo-wmi-driver, yes I will
> >>>>> try to make sure the quirk can be potentially added there, or in any
> >>>>> driver*.
> >>>>
> >>>> Yes things like fPPT, sPPT, STAPM, STT limits.
> >>>>
> >>>>>
> >>>>> The idea is to rewrite the patch series to just add a simple delay
> >>>>> field on the s2idle quirk struct. Then the biggest delay wins and gets
> >>>>> placed in ->begin. We have been using that series for ~6 months now,
> >>>>> and it turns out that having a delay system for every call is quite
> >>>>> pointless. But there are also situations where you might have a device
> >>>>> such as the Z13 Folio which looks like a USB device but listens to
> >>>>> s2idle notifications through ACPI, so the hid subsystem might need to
> >>>>> be able to inject a small delay there.
> >>>>
> >>>> So the "general" problem with injecting delays is they are typically not
> >>>> scalable as they're usually empirically measured and there is no
> >>>> handshake with the firmware.
> >>>>
> >>>> Say for example the EC has some hardcoded value of 200ms to wait for
> >>>> something.  IIRC the Linux timer infrastructure can be off by ~13%.  So
> >>>> if you put 175ms it might work sometimes.  You get some reports of this,
> >>>> so you extend it to 200ms.  Great it works 100% of the time because the
> >>>> old hardcoded value in the EC was 200ms.
> >>>>
> >>>> Now say a new EC firmware comes out that for $REASONS changes it to
> >>>> 250ms.  Your old empirically measured value stops working, spend a bunch
> >>>> of cycles debugging it, measure the new one.  You change it to 250ms,
> >>>> but people with the old one have a problem now because the timing changed.
> >>>>
> >>>> So now you have to add infrastructure to say what version of the
> >>>> firmware gets what delay.
> >>>>
> >>>> Then you find out there is another SKU of that model which needs a
> >>>> different delay, so your complexity has ballooned.
> >>>>
> >>>> What if all these "delays" were FW timeouts from failing to service an
> >>>> interrupt?  Or what if they were a flow problem like the device expected
> >>>> you to issue a STOP command before a RESET command?
> >>>>
> >>>> So we need to be /incredibly careful/ with delays and 100% sure they are the
> >>>> right answer to a problem.
> >>>
> >>> I do get your points. In this case though we sidestep a lot
> >>> of the points because of where the delay is placed.
> >>>
> >>> If the instrumentation is in-place, this delay happens before sleep
> >>> after the screen of the device has turned off (due to early DPMS), the
> >>> keyboard backlight has turned off (Display off call), and the suspend
> >>> light pulses (Sleep Entry). So it does not affect device behavior and
> >>> you can be quite liberal. The user has left the device alone.
> >>>
> >>> If the device needs e.g., 250ms you will not put 250ms, you will put
> >>> 500ms. Still unsure, you bump it to 750ms. Also, even if the
> >>> manufacturer comes up with a new firmware that fixes this issue, you
> >>> can keep the delay for the life of the product, because keeping it
> >>> does not affect device behavior, and writing kernel patches takes time.
> >>>
> >>> This is how I think about it, at least. A universal delay might be
> >>> needed eventually. But for now, limiting the scope to some devices and
> >>> seeing how that goes should be enough.
> >>>
> >>> Antheas
> >>
> >> My take is that "universal" delays are never popular.  IE hardware that
> >> "previously" worked perfectly is now slower.  So if there /must/ be a
> >> delay it should be as narrow as possible and justified.
> >>
> >> Let me give you an example of another case I'm *actively considering* a
> >> delay.
> >>
> >> I have an OEM's system that if you enter and exit s0i3 too quickly you
> >> can trigger the over voltage protection (OVP) feature of the VR module.
> >> When OVP is tripped the system is forced off immediately. This *only
> >> happens* on the VR module in that vendor's systems. "Normal" Linux
> >> userspace suspend/resume can't trip it.  But connecting a dock "does"
> >> trip it.
> >>
> >> If you look on a scope you can see SLP_S3# pin is toggling faster than
> >> spec says it should.  Naïvely you would say well the easy solution is to
> >> add a delay somewhere so that SLP_S3# stays in spec.  I have a patch
> >> that does just that.
> >>
> >> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
> >> index e6124498b195f..97387ddb281e1 100644
> >> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >> @@ -724,10 +724,20 @@ static void amd_pmc_s2idle_check(void)
> >>           struct smu_metrics table;
> >>           int rc;
> >>
> >> -       /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed */
> >> -       if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, &table) &&
> >> -           table.s0i3_last_entry_status)
> >> -               usleep_range(10000, 20000);
> >> +       if (!get_metrics_table(pdev, &table) && table.s0i3_last_entry_status) {
> >> +               switch (pdev->cpu_id) {
> >> +               /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed */
> >> +               case AMD_CPU_ID_CZN:
> >> +                       usleep_range(10000, 20000);
> >> +                       break;
> >> +               /* PHX/HPT: Ensure enough time to avoid VR OVP */
> >> +               case AMD_CPU_ID_PS:
> >> +                       msleep(2500);
> >> +                       break;
> >> +               default:
> >> +                       break;
> >> +               }
> >> +       }
> >>
> >> This stops all the failures, but it also has an impact that any time the
> >> EC SCI is raised for any reason (like plug in power adapter) the system
> >> will take 2.5s to go back into s0i3.
> >>
> >> Digging further - the intended behavior by the EC and BIOS was to wake
> >> the system when the dock is connected.
> >>
> >> That is, the reason this happens is that the EC SCI is raised when the
> >> dock is connected, but the Notify() the EC sent wasn't received by any
> >> driver.  I've got a patch I'll be sending out soon that adds support for
> >> the correct driver to wake up on this event.
> >>
> >> This prevents the case of the OVP and now we don't *need* to penalize
> >> everyone to wait 2.5s after EC SCI events and going back to s0i3.  If I
> >> find out there are other ways to trip the problem I still have that
> >> option though.
> >
> > So you are talking about missing the AC/DC burst feature of Windows
> > here right? I do agree with you that yeah for most devices it is not
> > necessary.
>
> No; I wasn't talking about that, my point was that timing delays are a
> tempting solution to a problem, but they're very often papering over
> something else and a hint to dive deeper.

What I gleaned from what you said is that manufacturer X has a problem
due to missing AC/DC bursts in Linux, where an AC/DC burst is essentially
just a 5s delay.

The intended behavior of AC/DC bursts is to fully wake up the kernel
for 5 seconds, and then sleep again. In Windows, if a power supply is
connected, userspace wakes up too, and then the Windows power manager
sleeps the system again if there is no user activity for 5 seconds.
However, this should not affect device drivers, so I would say we may
consider it optional on the Linux side until DEs get support for it and
enable it themselves.
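
To make the burst behavior concrete, here is a rough sketch of the
decision it boils down to. This is only an illustration: the names
(burst_decide, BURST_WINDOW_MS, enum burst_action) are made up for this
mail and are not existing kernel or Windows API.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not existing kernel API. */
#define BURST_WINDOW_MS 5000u /* the 5 second window described above */

enum burst_action {
	BURST_STAY_AWAKE,  /* still inside the window, keep waiting */
	BURST_RESUME_USER, /* user activity seen, do a full resume */
	BURST_RESUSPEND,   /* window expired quietly, go back to sleep */
};

/*
 * Decide what to do elapsed_ms after a power source change woke the
 * system. User activity always wins; otherwise the system stays up
 * until the window has passed and then suspends again.
 */
enum burst_action burst_decide(unsigned int elapsed_ms, bool user_activity)
{
	if (user_activity)
		return BURST_RESUME_USER;
	if (elapsed_ms < BURST_WINDOW_MS)
		return BURST_STAY_AWAKE;
	return BURST_RESUSPEND;
}
```

A caller would poll this while awake after the AC/DC event and restart
the suspend sequence once it returns BURST_RESUSPEND.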

So in effect, AC/DC bursts are Windows' solution to problems like the
one you faced.

I am not saying we should penalize everyone. If I do make a patch for
AC/DC it will be device specific. But after a point, if random devices
start getting issues and the quirk list starts to grow, it might become
inevitable to force it for all of them.

I do get what you are saying with delays though. We had to merge one
of the initial SOF delay patch variants for the Steam Deck which
prevents audio crashing on resume, and that was definitely a bandage.
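
For reference, the "delay field on the s2idle quirk struct, biggest
delay wins" idea from earlier in this thread could look roughly like
the sketch below. The struct layout and function name are hypothetical
(the real quirk struct may look different), and the values used when
calling it would be per-device placeholders.

```c
#include <stddef.h>

/* Hypothetical quirk entry; not the layout of any existing kernel struct. */
struct s2idle_quirk_sketch {
	const char *name;            /* device or driver that asked for it */
	unsigned int begin_delay_ms; /* 0 means no delay requested */
};

/*
 * Return the largest delay requested by any quirk in the list. The
 * result would be applied once, in ->begin, rather than once per call.
 */
unsigned int s2idle_begin_delay_ms(const struct s2idle_quirk_sketch *quirks,
				   size_t count)
{
	unsigned int delay = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (quirks[i].begin_delay_ms > delay)
			delay = quirks[i].begin_delay_ms;
	}
	return delay;
}
```

With this shape a device with no quirk pays nothing, and two drivers
asking for different delays on the same device do not stack.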

> >
> > But Microsoft guarantees 5 seconds. We already have the original Ally
> > unit which gets stuck in prochot due to this so it would be nice to
> > fix. For the Ally X I am unsure what Asus did but it stays awake for a
> > nice three seconds after you plug/unplug the charger so it has no
> > issues.
> >
> > So if devices keep getting issues like this we will have to eat it and do
> > AC/DC bursts with all of them.
> >
> > And it is the same with entering s0i3 too fast. Some devices just need
> > a tiny amount of time to do whatever it is their manufacturer
> > programmed them to do after the Modern Standby notifications. For
> > handhelds, it is to turn off the controllers because XInput. Asus put
> > the fade animation so that takes 300ms and if you do it earlier the
> > controller gets cut before it saves its state and starts to do weird
> > RGB stuff. Other manufacturers typically do not malfunction but they
> > still use the notification.
> >
> > Only MSI does not, but that controller is quirky before/after sleep
> > and they released a firmware update today saying something about
> > controller S3/S4 improvements so they probably do that too now, I need
> > to check.
> >
> > For the Go S, it sets itself to 5W after sleep entry and turns off the
> > fan. A little delay went a long way in fixing the hang there, which I
> > suspect is due to aggressive tuning. But I do not know if you guys get
> > that. We did when we did the initial testing for it and carry the
> > delay now so I cannot tell you either way. So you should max out the
> > TDP, run stress -c 16, and make the device sleep 100-200 times to make
> > sure that is not an issue.
> >
> > I do have a plan for trying to rework AC/DC bursts, but first the
> > s2idle ordering needs to be fixed and I need to rewrite the series for
> > that. The series we have for that works _fine_ so it is not a priority
> > to rework but it is not upstreamable in its current state so if you
> > need that (for the Go S) I need to know now.
> >
> > For ACDC my idea would be after the reordering is done to have a quirk
> > that makes the kernel resume, fire the sleep exit notification, loop
> > for 5 (maybe 3?) seconds inside the device suspend section prior to
> > userspace resume, and then as long as a wakeup did not arrive restart
> > the suspend sequence to sleep again. I would also combine that with
> > the little s2idle wakeup device you made, so that userspace can enable
> > wakeups for that if it wants to do resume on dock connection. But that
> > has a lot of moving parts, including moving the DPMS action to happen
> > even earlier than your patch does and making sure display on/off does
> > not fire so that the keyboard backlight does not do weird stuff.
> >
> > Antheas
>
> I think a good start for what you're talking about would be to rebase
> your series that reworked s2idle flow on 6.15 code (maybe it's a clean
> rebase, idk) and then if/when all of us on LKML are happy with it we can
> layer other concepts on top of that.

Yeah, I will try to do that. However, I have around 30 submitted
patches in the air right now, and we are about to add another 5-6 to
the list for the Claw. So it will probably be after a bunch of those
merge, in the interest of sustainability if nothing else. So let's
put a pin in this and pick up the discussion again mid-6.16, in a month
or so.

Unless you need this series for the Go S, in which case I can try to
re-order stuff around. So, one of you should use the red light TDP
mode with an artificial load (or actual, such as a game) and see if
sleep works properly. Do that on battery.

I would do at least 100 suspends with this, as most users do 50-70
suspends per reboot. I think I did around 300 to validate the Go S
quirk.

Antheas




