Hi, On Fri, Aug 22, 2025, at 7:54 AM, Armin Wolf wrote: > Am 21.08.25 um 19:32 schrieb Marc Burkhardt: > >> Am 2025-08-20 00:03, schrieb Mark Pearson: >> >> Hi Mark, >> >> thanks for replying. >> >>> Hi Marc, >>> >>> On Mon, Aug 18, 2025, at 4:39 PM, Marc Burkhardt wrote: >>>> While moving an existing Icinga installation to a Lenovo P15 20SU I >>>> came >>>> across broken JSON output from a "sensors -Aj" command consumed by the >>>> Icinga check_lm_sensors plugin. After fiddling around trying to build a >>>> fix in either lm_sensors or Icinga I found out the error was rooted in >>>> some sysfs file that was created but threw errors while being read. >>>> On my >>>> Lenovo ThinkPad the default fallback to 8 temperature sensors creates >>>> sysfs entries like in my case "temp8_input" that fail when read, >>>> causing >>>> the issue in user-space. >>>> >>>> This patch adds a module parameter (suppress_sensor) using >>>> module_param_array() to allow users to specify a comma-separated >>>> list of >>>> zero-based sensor indices to suppress sysfs file creation (e.g. >>>> suppress_sensor=3,7). Instead of a model-specific quirk, this provides >>>> flexible configuration for any ThinkPad with similar issues and is >>>> working >>>> out-of-the-box without additional models being marked for the quirk. >>>> The >>>> parameter uses a fixed-size array based on >>>> TPACPI_MAX_THERMAL_SENSORS (16) >>>> consistent with the driver’s thermal sensor handling (ie. >>>> ibm_thermal_sensors_struct or sensor_dev_attr_thermal_temp_input). >>>> >>>> Logging via pr_info() reports the number of suppressed sensors at >>>> module >>>> initialization, and pr_info() logs each suppressed sensor during sysfs >>>> attribute creation. Invalid sensor indices are logged once via >>>> pr_warn() >>>> to avoid repetitive warnings. Tested on a ThinkPad P15 with >>>> suppress_sensor=3,7, confirming suppression of temp4_input and >>>> temp8_input >>>> with no sysfs errors. Bounds checking for uncommon values is in >>>> place or >>>> will be logged. >>>> >>>> The patch applies to the current >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >>>> although >>>> it was initially written for a 6.16.0 kernel. >>>> >>>> I look forward to any feedback on the patch and/or handling of >>>> submission. >>>> Please CC: for now as I am not (yet) subscribed. Thank you. >>>> >>>> Signed-off-by: Marc Burkhardt <marc.burkhardt@protoco.consulting> >>>> --- >>>> Notes: >>>> I haven't posted on LKML or send a patch for over a decade now so >>>> please forgive any possible mistakes I made regarding current coding >>>> conventions or more generally in submitting this patch. The patch was >>>> running for some time here with faulty sensors removed from sysfs >>>> and no >>>> problems otherwise detected and was surely run through checkpatch.pl >>>> before >>>> submission. get_maintainer.pl was helpful to find the hopefully right >>>> people for CC:ing but I am otherweise totally unaware of any current >>>> procedures or best-practices when it comes to submitting a patch. >>>> >>>> drivers/platform/x86/lenovo/thinkpad_acpi.c | 35 >>>> +++++++++++++++++++++++++++++ >>>> 1 file changed, 35 insertions(+) >>>> >>>> diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c >>>> b/drivers/platform/x86/lenovo/thinkpad_acpi.c >>>> index cc19fe520ea9..30ff01f87403 100644 >>>> --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c >>>> +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c >>>> @@ -6019,6 +6019,30 @@ struct ibm_thermal_sensors_struct { >>>> s32 temp[TPACPI_MAX_THERMAL_SENSORS]; >>>> }; >>>> >>>> +static int suppress_sensor[TPACPI_MAX_THERMAL_SENSORS]; >>>> +static unsigned int suppress_sensor_count; >>>> + >>>> +static bool is_sensor_suppressed(int index) >>>> +{ >>>> + unsigned int i; >>>> + bool logged = false; >>>> + >>>> + for (i = 0; i < suppress_sensor_count; i++) { >>>> + if (suppress_sensor[i] == index) >>>> + return true; >>>> + >>>> + if (!logged && >>>> + (suppress_sensor[i] < 0 >>>> + || suppress_sensor[i] >= >>>> TPACPI_MAX_THERMAL_SENSORS)) { >>>> + pr_warn("Invalid sensor index %d in suppress_sensor\n", >>>> + suppress_sensor[i]); >>>> + logged = true; >>>> + } >>>> + } >>>> + >>>> + return false; >>>> +} >>>> + >>>> static const struct tpacpi_quirk thermal_quirk_table[] __initconst = { >>>> /* Non-standard address for thermal registers on some ThinkPads */ >>>> TPACPI_Q_LNV3('R', '1', 'F', true), /* L13 Yoga Gen 2 */ >>>> @@ -6313,6 +6337,11 @@ static umode_t thermal_attr_is_visible(struct >>>> kobject *kobj, >>>> >>>> int idx = sensor_attr->index; >>>> >>>> + if (is_sensor_suppressed(idx)) { >>>> + pr_info("Sensor %d suppressed\n", idx); >>>> + return 0; >>>> + } >>>> + >>>> switch (thermal_read_mode) { >>>> case TPACPI_THERMAL_NONE: >>>> return 0; >>>> @@ -11653,6 +11682,9 @@ static void __init >>>> thinkpad_acpi_init_banner(void) >>>> thinkpad_id.model_str, >>>> (thinkpad_id.nummodel_str) ? >>>> thinkpad_id.nummodel_str : "unknown"); >>>> + >>>> + pr_info("Suppressing %d user-supplied sensor(s) via parameter >>>> suppress_sensor\n", >>>> + suppress_sensor_count); >>>> } >>>> >>>> /* Module init, exit, parameters */ >>>> @@ -11785,6 +11817,9 @@ MODULE_PARM_DESC(experimental, >>>> module_param_named(debug, dbg_level, uint, 0); >>>> MODULE_PARM_DESC(debug, "Sets debug level bit-mask"); >>>> >>>> +module_param_array(suppress_sensor, int, &suppress_sensor_count, >>>> 0444); >>>> +MODULE_PARM_DESC(suppress_sensor, "Comma-separated sensor indices to >>>> suppress (e.g., 3,7)"); >>>> + >>>> module_param(force_load, bool, 0444); >>>> MODULE_PARM_DESC(force_load, >>>> "Attempts to load the driver even on a mis-identified >>>> ThinkPad when >>>> true"); >>> >>> The P15 is one of the Linux certified platforms...though it's a bit >>> older now. >>> >>> I'd be more interested in figuring out which sensors are returning an >>> error and figuring out how we address that. I have access to the FW >>> and platform team for questions (though this platform is a bit older >>> now, so if we need FW fixes that will be trickier). My gut feeling is >>> we shouldn't be creating sysfs entries if the sensors don't exist or >>> aren't accessible. >> >> That is what my patch does - it prevents creating the sysfs entries >> but not based on a check for validity of the sensor in code (as >> probably desired by Ilpo when I understand a previous mail correctly) >> but rather on a user-provided configuration via the new parameter. I >> reply to the other mail as well soon. >> > Such sensors are meant to be ignored using /etc/sensors3.conf (provided > by libsensors) unless the driver itself can > automatically determine this by asking the platform firmware. I suggest > that you use this mechanism instead of adding > additional module parameters. > > Thanks, > Armin Wolf > > (I also CCed the hwmon mailing list as libsensors originally came from there) > >>> >>> I do have a P15 so can check it out (I'm going to have to blow some >>> dust off it). If you've got the details on which sensors need >>> suppressing that would be useful. I have seen previously where it's >>> trying to access a GPU sensor on a UMA model. >> >> On my hardware it's sensor temp8_input which is unreadable at all und >> sensor temp4_input that has a constant value of 0, no matter how hot, >> cold or loud the machine is running. I am, however, able to monitor >> GPU temps via nvidia _and_ thinkpad ACPI. The values are mostly equal, >> differ a bit due to internal timing sometimes. >> >>> I tried this on my P15, and I do get an error when the GPU sensor is accessed as it's not available (no Nvidia card on mine). A suggestion (based a bit on Armin's suggestions): If the is_visible function is changed so if the sensor returns an error (or not available) then the sysfs entry isn't displayed. I think that would prevent errors/access issues from user space - at least it worked on my system. Something like the below (I can send this up as a proper patch if it makes sense) diff --git a/drivers/platform/x86/lenovo/thinkpad_acpi.c b/drivers/platform/x86/lenovo/thinkpad_acpi.c index cc19fe520ea9..075d15df183c 100644 --- a/drivers/platform/x86/lenovo/thinkpad_acpi.c +++ b/drivers/platform/x86/lenovo/thinkpad_acpi.c @@ -6312,6 +6312,8 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj, to_sensor_dev_attr(dev_attr); int idx = sensor_attr->index; + s32 value; + int res; switch (thermal_read_mode) { case TPACPI_THERMAL_NONE: @@ -6334,6 +6336,11 @@ static umode_t thermal_attr_is_visible(struct kobject *kobj, } + /* Check if sensor is available */ + res = thermal_get_sensor(idx, &value); + if ((res) || (value == TPACPI_THERMAL_SENSOR_NA)) + return 0; + return attr->mode; } I think this would generally be useful for removing unwanted sensors without having to do extra steps? Mark _______________________________________________ ibm-acpi-devel mailing list ibm-acpi-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.sourceforge.net/lists/listinfo/ibm-acpi-devel