On Mon, 2025-07-28 at 16:43 +0200, Neil Armstrong wrote:
> On 25/07/2025 16:16, André Draszik wrote:
> > Commit 3c7ac40d7322 ("scsi: ufs: core: Delegate the interrupt service
> > routine to a threaded IRQ handler") introduced a massive performance
> > drop for various workloads on UFSHC versions < 4 due to the extra
> > latency introduced by moving all of the IRQ handling into a threaded
> > handler. See below for a summary.
> >
> > To resolve this performance drop, move IRQ handling back into hardirq
> > context, but apply a time limit which, once expired, will cause the
> > remainder of the work to be deferred to the threaded handler.
> >
> > The above commit is trying to avoid undue delay of other subsystems'
> > interrupts while the UFS events are being handled. By limiting the
> > amount of time spent in hardirq context, we can still ensure that.
> >
> > The time limit itself was chosen because I have generally seen
> > interrupt handling complete within 20 usecs, with occasional
> > spikes of a couple of 100 usecs.
> >
> > This commit brings UFS performance roughly back to the original
> > performance, and should still avoid starvation of other subsystems
> > thanks to dealing with these spikes.
> >
> > fio results for 4k block size on Pixel 6, all values being the average
> > of 5 runs each:
> >
> > read / 1 job          original       after this commit
> >   min IOPS            4,653.60    2,704.40    3,902.80
> >   max IOPS            6,151.80    4,847.60    6,103.40
> >   avg IOPS            5,488.82    4,226.61    5,314.89
> >   cpu % usr               1.85        1.72        1.97
> >   cpu % sys              32.46       28.88       33.29
> >   bw MB/s                21.46       16.50       20.76
> >
> > read / 8 jobs         original       after this commit
> >   min IOPS           18,207.80   11,323.00   17,911.80
> >   max IOPS           25,535.80   14,477.40   24,373.60
> >   avg IOPS           22,529.93   13,325.59   21,868.85
> >   cpu % usr               1.70        1.41        1.67
> >   cpu % sys              27.89       21.85       27.23
> >   bw MB/s                88.10       52.10       84.48
> >
> > write / 1 job         original       after this commit
> >   min IOPS            6,524.20    3,136.00    5,988.40
> >   max IOPS            7,303.60    5,144.40    7,232.40
> >   avg IOPS            7,169.80    4,608.29    7,014.66
> >   cpu % usr               2.29        2.34        2.23
> >   cpu % sys              41.91       39.34       42.48
> >   bw MB/s                28.02       18.00       27.42
> >
> > write / 8 jobs        original       after this commit
> >   min IOPS           12,685.40   13,783.00   12,622.40
> >   max IOPS           30,814.20   22,122.00   29,636.00
> >   avg IOPS           21,539.04   18,552.63   21,134.65
> >   cpu % usr               2.08        1.61        2.07
> >   cpu % sys              30.86       23.88       30.64
> >   bw MB/s                84.18       72.54       82.62
>
> Thanks for this updated change, I'm running the exact same run on SM8650 to check the impact,
> and I'll report something comparable.

Btw, my complete command was (should probably have added that to the commit
message in the first place):

for rw in read write ; do
    echo "rw: ${rw}"
    for jobs in 1 8 ; do
        echo "jobs: ${jobs}"
        for it in $(seq 1 5) ; do
            fio --name=rand${rw} --rw=rand${rw} \
                --ioengine=libaio --direct=1 \
                --bs=4k --numjobs=${jobs} --size=32m \
                --runtime=30 --time_based --end_fsync=1 \
                --group_reporting --filename=/foo \
            | grep -E '(iops|sys=|READ:|WRITE:)'
            sleep 5
        done
    done
done

Cheers,
Andre'
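
To put a number on "roughly back to original performance", the avg IOPS rows from the tables above can be expressed as a percentage of the original value. The following is only a small sketch: the helper name pct_of and the workload labels are made up for illustration, and the figures are copied from the tables with the thousands separators removed.

```sh
#!/bin/sh
# pct_of VALUE BASELINE: print VALUE as a percentage of BASELINE (one decimal).
# Hypothetical helper, not part of the patch or of fio.
pct_of() {
    awk -v v="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * v / b }'
}

# Columns: workload, original, after 3c7ac40d7322, this commit (avg IOPS).
while read -r name orig after this ; do
    printf '%-12s after: %5s%%   this commit: %5s%%\n' \
        "$name" "$(pct_of "$after" "$orig")" "$(pct_of "$this" "$orig")"
done <<EOF
read/1job    5488.82  4226.61  5314.89
read/8jobs  22529.93 13325.59 21868.85
write/1job   7169.80  4608.29  7014.66
write/8jobs 21539.04 18552.63 21134.65
EOF
```

For read / 8 jobs, for example, this gives about 59% of the original avg IOPS after the threaded-IRQ commit and about 97% with this commit.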