Re: [RFC PATCH v1] platform/x86: thinkpad_acpi: Add parameter to suppress invalid thermal sensors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Fri, Aug 22, 2025, at 7:54 AM, Armin Wolf wrote:
> Am 21.08.25 um 19:32 schrieb Marc Burkhardt:
>
>> Am 2025-08-20 00:03, schrieb Mark Pearson:
>>
>> Hi Mark,
>>
>> thanks for replying.
>>
>>> Hi Marc,
>>>
>>> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote:
>>>> While moving an existing Icinga installation to a Lenovo P15 20SU I 
>>>> came
>>>> across broken JSON output from a "sensors -Aj" command consumed by the
>>>> Icinga check_lm_sensors plugin. After fiddling around trying to build a
>>>> fix in either lm_sensors or Icinga I found out the error was rooted in
>>>> some sysfs file that was created but threw errors while being read. 
>>>> On my
>>>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates
>>>> sysfs entries like in my case "temp8_input" that fail when read, 
>>>> causing
>>>> the issue in user-space.
>>>>
>>>> This patch adds a module parameter (suppress_sensor) using
>>>> module_param_array() to allow users to specify a comma-separated 
>>>> list of
>>>> zero-based sensor indices to suppress sysfs file creation (e.g.
>>>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides
>>>> flexible configuration for any ThinkPad with similar issues and is 
>>>> working
>>>> out-of-the-box without additional models being marked for the quirk. 
>>>> The
>>>> parameter uses a fixed-size array based on 
>>>> TPACPI_MAX_THERMAL_SENSORS (16)
>>>> consistent with the driver’s thermal sensor handling (ie.
>>>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input).
>>>>
>>>> Logging via pr_info() reports the number of suppressed sensors at 
>>>> module
>>>> initialization, and pr_info() logs each suppressed sensor during sysfs
>>>> attribute creation. Invalid sensor indices are logged once via 
>>>> pr_warn()
>>>> to avoid repetitive warnings. Tested on a ThinkPad P15 with
>>>> suppress_sensor=3,7, confirming suppression of temp4_input and 
>>>> temp8_input
>>>> with no sysfs errors. Bounds checking for uncommon values is in 
>>>> place or
>>>> will be logged.
>>>>
>>>> The patch applies to the current
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>>> although
>>>> it was initially written for a 6.16.0 kernel.
>>>>
>>>> I look forward to any feedback on the patch and/or handling of 
>>>> submission.
>>>> Please CC: for now as I am not (yet) subscribed. Thank you.
>>>>
>>>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting>
>>>> ---
>>>> Notes:
>>>> I haven't posted on LKML or send a patch for over a decade now so
>>>> please forgive any possible mistakes I made regarding current coding
>>>> conventions or more generally in submitting this patch. The patch was
>>>> running for some time here with faulty sensors removed from sysfs 
>>>> and no
>>>> problems otherwise detected and was surely run through checkpatch.pl 
>>>> before
>>>> submission. get_maintainer.pl was helpful to find the hopefully right
>>>> people for CC:ing but I am otherweise totally unaware of any current
>>>> procedures or best-practices when it comes to submitting a patch.
>>>>
>>>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 
>>>> +++++++++++++++++++++++++++++
>>>>  1 file changed, 35 insertions(+)
>>>>
>>>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> index cc19fe520ea9..30ff01f87403 100644
>>>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
>>>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct {
>>>>      s32 temp[TPACPI_MAX_THERMAL_SENSORS];
>>>>  };
>>>>
>>>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS];
>>>> +static unsigned int suppress_sensor_count;
>>>> +
>>>> +static bool is_sensor_suppressed(int index)
>>>> +{
>>>> +    unsigned int i;
>>>> +    bool logged = false;
>>>> +
>>>> +    for (i = 0; i < suppress_sensor_count; i++) {
>>>> +        if (suppress_sensor[i] == index)
>>>> +            return true;
>>>> +
>>>> +        if (!logged &&
>>>> +            (suppress_sensor[i] < 0
>>>> +                || suppress_sensor[i] >= 
>>>> TPACPI_MAX_THERMAL_SENSORS)) {
>>>> +            pr_warn("Invalid sensor index %d in suppress_sensor\n",
>>>> +                suppress_sensor[i]);
>>>> +            logged = true;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>>  static const struct tpacpi_quirk thermal_quirk_table[] __initconst = {
>>>>      /* Non-standard address for thermal registers on some ThinkPads */
>>>>      TPACPI_Q_LNV3('R', '1', 'F', true),    /* L13 Yoga Gen 2 */
>>>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct
>>>> kobject *kobj,
>>>>
>>>>      int idx = sensor_attr->index;
>>>>
>>>> +    if (is_sensor_suppressed(idx)) {
>>>> +        pr_info("Sensor %d suppressed\n", idx);
>>>> +        return 0;
>>>> +    }
>>>> +
>>>>      switch (thermal_read_mode) {
>>>>      case TPACPI_THERMAL_NONE:
>>>>          return 0;
>>>> @@ -11653,6 +11682,9 @@ static void __init
>>>> thinkpad_acpi_init_banner(void)
>>>>              thinkpad_id.model_str,
>>>>              (thinkpad_id.nummodel_str) ?
>>>>                  thinkpad_id.nummodel_str : "unknown");
>>>> +
>>>> +    pr_info("Suppressing %d user-supplied sensor(s) via parameter
>>>> suppress_sensor\n",
>>>> +        suppress_sensor_count);
>>>>  }
>>>>
>>>>  /* Module init, exit, parameters */
>>>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental,
>>>>  module_param_named(debug, dbg_level, uint, 0);
>>>>  MODULE_PARM_DESC(debug, "Sets debug level bit-mask");
>>>>
>>>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, 
>>>> 0444);
>>>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to
>>>> suppress (e.g., 3,7)");
>>>> +
>>>>  module_param(force_load, bool, 0444);
>>>>  MODULE_PARM_DESC(force_load,
>>>>           "Attempts to load the driver even on a mis-identified 
>>>> ThinkPad when
>>>> true");
>>>
>>> The P15 is one of the Linux certified platforms...though it's a bit 
>>> older now.
>>>
>>> I'd be more interested in figuring out which sensors are returning an 
>>> error and figuring out how we address that. I have access to the FW 
>>> and platform team for questions (though this platform is a bit older 
>>> now, so if we need FW fixes that will be trickier). My gut feeling is 
>>> we shouldn't be creating sysfs entries if the sensors don't exist or 
>>> aren't accessible.
>>
>> That is what my patch does - it prevents creating the sysfs entries 
>> but not based on a check for validity of the sensor in code (as 
>> probably desired by Ilpo when I understand a previous mail correctly) 
>> but rather on a user-provided configuration via the new parameter. I 
>> reply to the other mail as well soon.
>>
> Such sensors are meant to be ignored using /etc/sensors3.conf (provided 
> by libsensors) unless the driver itself can
> automatically determine this by asking the platform firmware. I suggest 
> that you use this mechanism instead of adding
> additional module parameters.
>
> Thanks,
> Armin Wolf
>
> (I also CCed the hwmon mailing list as libsensors originally came from there)
>
>>>
>>> I do have a P15 so can check it out (I'm going to have to blow some 
>>> dust off it). If you've got the details on which sensors need 
>>> suppressing that would be useful. I have seen previously where it's 
>>> trying to access a GPU sensor on a UMA model.
>>
>> On my hardware it's sensor temp8_input which is unreadable at all und 
>> sensor temp4_input that has a constant value of 0, no matter how hot, 
>> cold or loud the machine is running. I am, however, able to monitor 
>> GPU temps via nvidia _and_ thinkpad ACPI. The values are mostly equal, 
>> differ a bit due to internal timing sometimes.
>>
>>>

I tried this on my P15, and I do get an error when the GPU sensor is accessed as it's not available (no Nvidia card on mine).

A suggestion (based a bit on Armin's suggestions): If the is_visible function is changed so if the sensor returns an error (or not available) then the sysfs entry isn't displayed. 
I think that would prevent errors/access issues from user space - at least it worked on my system.

Something like the below (I can send this up as a proper patch if it makes sense)

diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c
index cc19fe520ea9..075d15df183c 100644
--- a/drivers/platform/x86/lenovo/thinkpad_acpi.c
+++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c
@@ -6312,6 +6312,8 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
                                        to_sensor_dev_attr(dev_attr);
 
        int idx = sensor_attr->index;
+       s32 value;
+       int res;
 
        switch (thermal_read_mode) {
        case TPACPI_THERMAL_NONE:
@@ -6334,6 +6336,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj,
 
        }
 
+       /* Check if sensor is available */
+       res = thermal_get_sensor(idx, &value);
+       if ((res) || (value == TPACPI_THERMAL_SENSOR_NA))
+               return 0;
+
        return attr->mode;
 }

I think this would generally be useful for removing unwanted sensors without having to do extra steps?

Mark


_______________________________________________
ibm-acpi-devel mailing list
ibm-acpi-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/ibm-acpi-devel




[Index of Archives]     [Linux ACPI]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Photo]     [Yosemite Photos]     [Yosemite Advice]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux