Hi,

Reply at end.

> -----Original Message-----
> From: Christian Loehle <christian.loehle@xxxxxxx>
> Sent: 26 March 2025 16:27
> To: King, Colin <colin.king@xxxxxxxxx>; Bart Van Assche
> <bvanassche@xxxxxxx>; Jens Axboe <axboe@xxxxxxxxx>; Rafael J. Wysocki
> <rafael@xxxxxxxxxx>; Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>;
> linux-block@xxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion prevention for
> fast I/O devices
>
> On 3/26/25 15:04, King, Colin wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Bart Van Assche <bvanassche@xxxxxxx>
> >> Sent: 23 March 2025 12:36
> >> To: King, Colin <colin.king@xxxxxxxxx>; Christian Loehle
> >> <christian.loehle@xxxxxxx>; Jens Axboe <axboe@xxxxxxxxx>; Rafael J.
> >> Wysocki <rafael@xxxxxxxxxx>; Daniel Lezcano
> >> <daniel.lezcano@xxxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx;
> >> linux-pm@xxxxxxxxxxxxxxx
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion
> >> prevention for fast I/O devices
> >>
> >> On 3/17/25 3:03 AM, King, Colin wrote:
> >>> This code is optional, one can enable it or disable it via the
> >>> config option. Also, even when it is built-in one can disable it by
> >>> writing 0 to the sysfs file
> >>> /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms
> >>
> >> I'm not sure we need even more configuration knobs in sysfs.
> >
> > It's useful for enabling / disabling the functionality, as well as some
> > form of tuning for slower I/O devices, so I think it is justifiable.
> >
> >> How are users
> >> expected to find this configuration option? How should they decide
> >> whether to enable or to disable it?
> >
> > I can send a V2 with some documentation if that's required.
> >
> >>
> >> Please take a look at this proposal and let me know whether this
> >> would solve the issue that you are looking into: "[LSF/MM/BPF Topic]
> >> Energy-Efficient I/O"
> >> (https://lore.kernel.org/linux-block/ad1018b6-7c0b-4d70-b845-c869287d3cf3@xxxxxxx/).
> >> The only disadvantage of this approach
> >> compared to the cpuidle patch is that it requires RPM (runtime power
> >> management) to be enabled. Maybe I should look into modifying the
> >> approach such that it does not rely on RPM.
> >
> > I've had a look, the scope of my patch is a bit wider. If my patch
> > gets accepted I'm going to also look at putting the psd call into
> > other devices (such as network devices) to also stop deep states while
> > these devices are busy. Since the code is very lightweight I was hoping
> > this was going to be relatively easy and simple to use in various
> > devices in the future.
>
> IMO this needs to be a lot more fine-grained then, both in terms of which
> devices or even IO is affected (Surely some IO is fine with at least *some*
> latency) but also how aggressive we are in blocking.
> Just looking at some common latency/residency of idle states out there I
> don't think it's reasonable to force polling for a 3-10ms (rounding up with
> the jiffie) period.

The current workaround used by a customer is to disable C6/C6P, so all
the CPUs are essentially in a non-low-power state all the time. The
opt-in solution provided in the patch gives nearly the same performance
and re-enables deeper C-states once the I/O has completed.

As I mentioned earlier, jiffies are used because that is low-touch and
very fast, with negligible impact on the I/O paths. Finer-grained timing
is a far more expensive operation and is a huge overhead on very fast
I/O devices.

Also, this is a user-configurable and tunable choice. Users can opt in
if they want to pay for the extra CPU overhead for a bit more I/O
performance.
If they don't want it, they don't need to enable it.

> Playing devil's advocate if the system is under some thermal/power pressure
> we might actually reduce throughput by burning so much power on this.
> This seems like the stuff that is easily convincing because it improves
> throughput and then taking care of power afterwards is really hard. :/
>

With the current workaround, the user is trying to get maximum bandwidth
and has disabled C6/C6P, so they are already keeping the system busy.
This solution will at least save power when I/O is idle.

Colin
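
For reference, the jiffies trade-off discussed above can be modelled in
userspace. This is only a sketch under stated assumptions, not the code
from the patch: the struct and function names are hypothetical, the 3 ms
timeout is an assumed value, and "ticks" stand in for jiffies. It shows
why a tick-based timeout gets rounded up to a whole tick, and how a
timeout of 0 would turn the mechanism off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names come from the patch. */

/* Round a millisecond timeout up to whole ticks, as a tick-based timer must. */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
    assert(hz > 0);
    return ((unsigned long)ms * hz + 999UL) / 1000UL;
}

/* Length of the window in ms after rounding up to whole ticks. */
static unsigned long effective_ms(unsigned int ms, unsigned int hz)
{
    return ms_to_ticks(ms, hz) * 1000UL / hz;
}

struct psd_model {
    unsigned long deadline;   /* tick at which deep states are allowed again */
    unsigned int timeout_ms;  /* 0 disables the mechanism (assumed semantics) */
    unsigned int hz;          /* ticks per second */
};

/* I/O path: a single store of "now + timeout", nothing finer grained. */
static void psd_model_io(struct psd_model *m, unsigned long now)
{
    if (m->timeout_ms)
        m->deadline = now + ms_to_ticks(m->timeout_ms, m->hz);
}

/* Idle path: block deep states while "now" is before the deadline.
 * The signed difference keeps the comparison valid across counter wrap. */
static bool psd_model_block_deep(const struct psd_model *m, unsigned long now)
{
    return m->timeout_ms && (long)(now - m->deadline) < 0;
}
```

With an assumed 3 ms timeout, effective_ms() gives 10 ms at HZ=100, 4 ms
at HZ=250 and 3 ms at HZ=1000, which is one way to arrive at the 3-10ms
range mentioned in the quoted text.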