Re: [Patch 0/5] Support Autonomous Selection mode in cppc_cpufreq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi Sumit,

May I resend the patch 8 in [1] first? Because I really need this new
feature.

After that patch being merged, you can resend this series base on that,
change the paths of the sysfs files, add a new cppc_cpufreq instance or do
anything in that series. Then we can continue this discussion.

Is that all right?

Hi Sumit,

Please let me know if you are OK with it.


Hi Zhenglifeng,

Both the ways do the same job i.e. set CPC registers.

To move this work forward, I am Ok with adding sysfs entries under
cpufreq syfs node and not under acpi_cppc if the maintainers are fine.

I will later send my updated patch to add more entries under cpufreq
sysfs for updating more CPC registers as done in patch 8 in [1].

Thank you,
Sumit Gupta


On 2025/4/1 21:56, zhenglifeng (A) wrote:

Sorry for the delay.

On 2025/3/14 20:48, Sumit Gupta wrote:



There seems to be some quite fundamental disagreement on how this
should be done, so I'm afraid I cannot do much about it ATM.

Please agree on a common approach and come back to me when you are ready.

Sending two concurrent patchsets under confusingly similar names again
and again isn't particularly helpful.

Thanks!

Hi Rafael,

Thank you for looking into this.

Hi Lifeng,

As per the discussion, we can make the driver future extensible and
also can optimize the register read/write access.

I gave some thought and below is my proposal.

1) Pick 'Patch 1-7' from your patch series [1] which optimize API's
     to read/write a cpc register.

2) Pick my patches in [2]:
     - Patch 1-4: Keep all cpc registers together under acpi_cppc sysfs.
                  Also, update existing API's to read/write regs in batch.
     - Patch 5: Creates 'cppc_cpufreq_epp_driver' instance for booting
       all CPU's in Auto mode and set registers with right values.
       They can be updated after boot from sysfs to change hints to HW.
       I can use the optimized API's from [1] where required in [2].

Let me know if you are okay with this proposal.
I can also send an updated patch series with all the patches combined?

[1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@xxxxxxxxxx/
[2] https://lore.kernel.org/lkml/20250211103737.447704-1-sumitg@xxxxxxxxxx/

Regards,
Sumit Gupta


Hi Sumit,

Over the past few days, I've been thinking about your proposal and
scenario.

I think we both agree that PATCH 1-7 in [1] doesn't conflicts with [2], so
the rest of the discussion focuses on the differences between [2] and the
PATCH 8 in [1].

We both tried to support autonomous selection mode in cppc_cpufreq but on
different ways. I think the differences between these two approaches can be
summarized into three questions:

1. Which sysfs files to expose? I think this is not a problem, we can keep
all of them.

2. Where to expose these sysfs files? I understand your willing to keep all
cpc registers together under acpi_cppc sysfs. But in my opinion, it is more
suitable to expose them under cppc_cpufreq_attr, for these reasons:

    1) It may probably introduce concurrency and data consistency issues, as
I mentioned before.


As explained in previous reply, this will be solved with the ifdef
check to enable the attributes for only those CPUFREQ drivers which want
to use the generic nodes.
  e.g: '#ifdef CONFIG_ACPI_CPPC_CPUFREQ' for 'cppc_cpufreq'.

These CPC register read/write sysfs nodes are generic as per the ACPI
specification and without any vendor specific logic.
So, adding them in the lib file 'cppc_acpi.c'(CONFIG_ACPI_CPPC_LIB) will
avoid code duplication if a different or new ACPI based CPUFREQ driver
also wants to use them just by adding their macro check. Such ifdef check is also used in other places for attributes creation like below.
So, don't look like a problem.
  $ grep -A4 "acpi_cpufreq_attr\[" drivers/cpufreq/acpi-cpufreq.c
  static struct freq_attr *acpi_cpufreq_attr[] = {
     &freqdomain_cpus,
  #ifdef CONFIG_X86_ACPI_CPUFREQ_CPB
     &cpb,
  #endif

So in the future, we will see:

static struct attribute *cppc_attrs[] = {
      ...
#ifdef CONFIG_XXX
      &xxx.attr,
      &xxx.attr,
#endif
#ifdef CONFIG_XXX
      &xxx.attr,
#endif
#ifdef CONFIG_XXX
      &xxx.attr,
      ...
};

I think you are making things more complicated.


    2) The store functions call cpufreq_cpu_get() to get policy and update
the driver_data which is a cppc_cpudata. Only the driver_data in
cppc_cpufreq's policy is a cppc_cpudata! These operations are inappropriate
in cppc_acpi. This file currently provides interfaces for cpufreq drivers
to use. Reverse calls might mess up call relationships, break code
structures, and cause problems that are hard to pinpoint the root cause!


If we don't want to update the cpufreq policy from 'cppc_acpi.c' and only update it from within the cpufreq,    then this could be one valid
point to not add the write syfs nodes in 'cppc_acpi.c' lib file.

@Rafael, @Viresh : Do you have any comments on this?

I think updating cpufreq policy from 'cppc_acpi.c' should be forbidden.


    3) Difficult to extend. Different cpufreq drivers may have different
processing logic when reading from and writing to these CPC registers.
Limiting all sysfs here makes it difficult for each cpufreq driver to
extend. I think this is why there are only read-only interfaces under
cppc_attrs before.


We are updating the CPC registers as per the generic ACPI specification.
So, any ACPI based CPUFREQ driver can use these generic nodes to
read/write reg's until they have a vendor specific requirement or
implementation.
As explained above, If someone wants to update in different way and use
their own CPUFREQ driver then these generic attributes won't be created
due to the CPUFREQ driver macro check.
I think AMD and Intel are doing more than just reading/updating the registers. That's why they needed their driver specific implementations.

Adding a 'ifdef' is not a good way to solve these problems. Defining this
config does not necessarily mean that the cpufreq driver is cppc_cpufreq.


It means that only.
  ./drivers/cpufreq/Makefile:obj-$(CONFIG_ACPI_CPPC_CPUFREQ) += cppc_cpufreq.o

Compile this file does not mean that the cpufreq driver is cppc_cpufreq.
Driver registration may fail, and the actually loaded driver may be
another. It'll be dangerous to expose these sysfs files for users to update
registers' value in this case.


3. Is it necessary to add a new driver instance? [1] exposed the sysfs
files to support users dynamically change the auto selection mode of each
policy. Each policy can be operated seperately. It seems to me that if you
want to boot all CPUs in auto mode, it should be sufficient to set all
relevant registers to the correct values at boot time. I can't see why the
new instance is necessary unless you explain it further. Could you explain
more about why you add a new instance starting from answer these questions:

For a specific CPU, what is the difference between using the two instances
when auto_sel is 1? And what is the difference when auto_sel is 0?


Explained this in previous reply. Let me elaborate more.

For hundred's of CPU's, we don't need to explicitly set multiple sysfs
after boot to enable and configure Auto mode with right params. That's why an easy option is to pass boot argument or module param for enabling
and configuration.
A separate instance 'cppc_cpufreq_epp' of the 'cppc_cpufreq' driver is
added because policy min/max need to be updated to the min/max_perf
and not nominal/lowest nonlinear perf which is done by the default
init hook. Min_perf value can be lower than lowest nonlinear perf and Max_perf can be higher than nominal perf.
If some CPU is booted with epp instance and later the auto mode is disabled or min/max_perf is changed from sysfs then also the policy
min/max need to be updated accordingly.

Another is that in Autonomous mode the freq selection and setting is
done by HW. So, cpufreq_driver->target() hook is not needed.
These are few reasons which I am aware of as of now.
I think in future there can be more. Having a separate instance
reflecting a HW based Autonomous frequency selection will make it easy
for any future changes.

So CPUs will act totally differently under these two instance. But what if
I want part of the CPUs in HW mode and others in SW mode? Should I boot on
HW mode and set some policies' auto_set to false or the other way? It seems
like the effects of theses two approaches are completely different. In my
opinion, this new instance is more like a completely different driver than
cppc_cpufreq.


If it turns out that the new instance is necessary, I think we can reach a
common approach by adding this new cpufreq driver instance and place the
attributes in 'cppc_cpufreq_epp_attr', like amd-pstate did.

What do you think?

I initially thought about this but there was a problem.
What if we boot with non-epp instance which doesn't have these attributes and later want to enable Auto mode for few CPU's from sysfs.

That's the problem. CPUs can be set to Auto mode with or without this new
instance. So what's the point of it?



Best Regards,
Sumit Gupta










[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux