Re: [PATCH v2 2/2] scsi: ufs: core: move some irq handling back to hardirq (with time limit)

Hi,

On 28/07/2025 17:19, Bart Van Assche wrote:
On 7/28/25 7:49 AM, André Draszik wrote:
Btw, my complete command was (should probably have added that
to the commit message in the first place):

for rw in read write ; do
     echo "rw: ${rw}"
     for jobs in 1 8 ; do
         echo "jobs: ${jobs}"
         for it in $(seq 1 5) ; do
             fio --name=rand${rw} --rw=rand${rw} \
                 --ioengine=libaio --direct=1 \
                 --bs=4k --numjobs=${jobs} --size=32m \
                 --runtime=30 --time_based --end_fsync=1 \
                 --group_reporting --filename=/foo \
             | grep -E '(iops|sys=|READ:|WRITE:)'
             sleep 5
         done
     done
done

Please run performance tests in recovery mode against a block
device (/dev/block/sd...) instead of running performance tests on
top of a filesystem. One possible approach for retrieving the block
device name is as follows:

adb shell readlink /dev/block/by-name/userdata

There may be other approaches for retrieving the name of the block
device associated with /data. Additionally, tuning for maximum
performance is useful because it eliminates impact from the process
scheduler on block device performance measurement. An extract from a
script that I use myself to measure block device performance on Pixel
devices is available below.
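
A minimal sketch of the suggestion above, assuming fio is available on the device and reusing the fio options from the earlier loop (minus --size and --end_fsync, which are assumptions about what makes sense on a raw block device). The helper name fio_blockdev_cmd and the device path /dev/block/sdX are hypothetical placeholders. The helper only prints the command line; note that a write test against the userdata partition destroys its contents.

```sh
#!/bin/sh
# fio_blockdev_cmd DEVICE RW JOBS
# Print a fio command line that targets a raw block device instead of a
# file on a filesystem. RW is "read" or "write"; JOBS is the job count.
fio_blockdev_cmd() {
    echo "fio --name=rand${2} --rw=rand${2}" \
         "--ioengine=libaio --direct=1 --bs=4k --numjobs=${3}" \
         "--runtime=30 --time_based --group_reporting" \
         "--filename=${1}"
}

# Example with a placeholder device name (hypothetical):
fio_blockdev_cmd /dev/block/sdX read 8

# On a real device the name could be resolved first, for instance:
#   dev=$(adb shell readlink /dev/block/by-name/userdata | tr -d '\r')
#   adb shell "$(fio_blockdev_cmd "$dev" read 8)"
```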

Of course, I did all that and ran the tests on the SM8650 QRD and HDK boards; one has
a UFS 3.1 device and the other a UFS 4.0 device.

Here's the raw data:

Board: sm8650-qrd
read / 1 job
                    v6.15               v6.16                   v6.16 + this commit
min IOPS            3,996.00            5,921.60                           3,424.80
max IOPS            4,772.80            6,491.20                           4,541.20
avg IOPS            4,526.25            6,295.31                           4,320.58
cpu % usr               4.62                2.96                               4.50
cpu % sys              21.45               17.88                              21.62
bw MB/s                18.54               25.78                              17.64

read / 8 jobs
                    v6.15               v6.16                   v6.16 + this commit
min IOPS           51,867.60           51,575.40                          45,257.00
max IOPS           67,513.60           64,456.40                          56,336.00
avg IOPS           64,314.80           62,136.76                          52,505.72
cpu % usr               3.98                3.72                               3.52
cpu % sys              16.70               17.16                              18.74
bw MB/s               263.60              254.40                             215.00

write / 1 job
                    v6.15               v6.16                   v6.16 + this commit
min IOPS            5,654.80            8,060.00                           5,730.80
max IOPS            6,720.40            8,852.00                           6,981.20
avg IOPS            6,576.91            8,579.81                           6,726.51
cpu % usr               7.48                3.79                               8.49
cpu % sys              41.09               23.27                              34.86
bw MB/s                26.96               35.16                              27.52

write / 8 jobs
                    v6.15               v6.16                   v6.16 + this commit
min IOPS           84,687.80           95,043.40                          74,799.60
max IOPS          107,620.80          113,572.00                          96,377.20
avg IOPS           97,910.86          105,927.38                          87,239.07
cpu % usr               5.43                4.38                               3.72
cpu % sys              21.73               20.29                              30.97
bw MB/s               400.80              433.80                             357.40

Board: sm8650-hdk
read / 1 job
                    v6.15               v6.16                   v6.16 + this commit
min IOPS            4,867.20            5,596.80                           4,242.80
max IOPS            5,211.60            5,970.00                           4,548.80
avg IOPS            5,126.12            5,847.93                           4,370.14
cpu % usr               3.83                2.81                               2.62
cpu % sys              18.29               13.44                              16.89
bw MB/s                20.98               17.88                              23.96

read / 8 jobs
                    v6.15               v6.16                   v6.16 + this commit
min IOPS           47,583.80           46,831.60                          47,671.20
max IOPS           58,913.20           59,442.80                          56,282.80
avg IOPS           53,609.04           44,396.88                          53,621.46
cpu % usr               3.57                3.06                               3.11
cpu % sys              15.23               19.31                              15.90
bw MB/s               219.40              219.60                             210.80

write / 1 job
                    v6.15               v6.16                   v6.16 + this commit
min IOPS            6,529.42            8,367.20                           6,492.80
max IOPS            7,856.92            9,244.40                           7,184.80
avg IOPS            7,676.21            8,991.67                           6,904.67
cpu % usr              10.17                7.98                               3.68
cpu % sys              37.55               34.41                              23.07
bw MB/s                31.44               28.28                              36.84

write / 8 jobs
                    v6.15               v6.16                   v6.16 + this commit
min IOPS           86,304.60           94,288.80                          78,433.60
max IOPS          105,670.80          110,373.60                          96,330.80
avg IOPS           97,418.81          103,789.76                          88,468.27
cpu % usr               4.98                3.27                               3.67
cpu % sys              21.45               30.85                              20.08
bw MB/s               399.00              362.40                             425.00

Assisted analysis gives:

IOPS (Input/Output Operations Per Second):
The v6.16 kernel shows a slight increase in average IOPS compared to v6.15 (43245.69 vs. 42144.88).
The v6.16+fix kernel significantly reduces average IOPS, dropping to 36946.17.

Bandwidth (MB/s):
The v6.16 kernel shows an increase in average bandwidth compared to v6.15 (180.72 MB/s vs. 172.59 MB/s).
The v6.16 with this commit significantly reduces average bandwidth, dropping to 151.32 MB/s.

Detailed Analysis:
Impact of v6.16 Kernel:
The v6.16 kernel introduces a minor improvement in IO performance compared to v6.15.
Both average IOPS and average bandwidth saw a small increase. This suggests that the v6.16
kernel might have introduced some optimizations that slightly improved overall IO performance.

Impact of the Fix:
The proposed fix appears to have a negative impact on both IOPS and bandwidth.
Both metrics show a substantial decrease compared to both v6.15 and v6.16.
This indicates that the fix might be detrimental to IO performance.
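
The aggregate figures quoted in the analysis look like plain arithmetic means over the eight per-test "avg IOPS" rows (both boards, read and write, 1 and 8 jobs). That reading is an assumption, not something stated in the analysis. A small sketch that recomputes the v6.16 figure from the tables above under that assumption (the helper name mean is made up for illustration):

```sh
#!/bin/sh
# mean: print the arithmetic mean of its arguments with two decimals.
mean() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# "avg IOPS" rows of the v6.16 column, copied from the tables above:
# sm8650-qrd read/1, read/8, write/1, write/8, then sm8650-hdk likewise.
v616_avg_iops="6295.31 62136.76 8579.81 105927.38 5847.93 44396.88 8991.67 103789.76"

# shellcheck disable=SC2086
mean $v616_avg_iops    # prints 43245.69
```

A mean over all eight tests is dominated by the 8-job results, so the per-test tables remain the better basis for comparing kernels.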

The threaded IRQ change did increase IOPS and bandwidth, and stopped starving interrupts.
This patch gives worse numbers than before the threaded IRQ change.

Neil


Best regards,

Bart.


optimize() {
     local clkgate_enable c d devfreq disable_cpuidle governor nomerges iostats
     local target_freq ufs_irq_path

     if [ "$1" = performance ]; then
         clkgate_enable=0
         devfreq=max
         disable_cpuidle=1
         governor=performance
         # Enable I/O statistics because the performance impact is low and
         # because fio reports the I/O statistics.
         iostats=1
         # Disable merging to make tests follow the fio arguments.
         nomerges=2
         target_freq=cpuinfo_max_freq
         persist_logs=false
     else
         clkgate_enable=1
         devfreq=min
         disable_cpuidle=0
         governor=sched_pixel
         iostats=1
         nomerges=0
         target_freq=cpuinfo_min_freq
         persist_logs=true
     fi

     for c in $(adb shell "echo /sys/devices/system/cpu/cpu[0-9]*"); do
         for d in $(adb shell "echo $c/cpuidle/state[1-9]*"); do
             adb shell "if [ -e $d ]; then echo $disable_cpuidle > $d/disable; fi"
         done
         adb shell "cat $c/cpufreq/cpuinfo_max_freq > $c/cpufreq/scaling_max_freq;
                    cat $c/cpufreq/${target_freq} > $c/cpufreq/scaling_min_freq;
                    echo ${governor} > $c/cpufreq/scaling_governor; true" \
             2>/dev/null
     done

     if [ "$(adb shell grep -c ufshcd /proc/interrupts)" = 1 ]; then
         # No MCQ or MCQ disabled. Make the fastest CPU core process UFS
         # interrupts.
         # shellcheck disable=SC2016
         ufs_irq_path=$(adb shell 'a=$(echo /proc/irq/*/ufshcd); echo ${a%/ufshcd}')
         adb shell "echo ${fastest_cpucore} > ${ufs_irq_path}/smp_affinity_list; true"
     else
         # MCQ is enabled. Distribute the completion interrupts over the
         # available CPU cores.
         local i=0
         local irqs
         irqs=$(adb shell "sed -n 's/:.*GIC.*ufshcd.*//p' /proc/interrupts")
         for irq in $irqs; do
             adb shell "echo $i > /proc/irq/$irq/smp_affinity_list; true"
             i=$((i+1))
         done
     fi

     for d in $(adb shell echo /sys/class/devfreq/*); do
         case "$d" in
             *gpu0)
                 continue
                 ;;
         esac
         local min_freq
         min_freq=$(adb shell "cat $d/available_frequencies |
             tr ' ' '\n' |
             sort -n |
             case $devfreq in
                 min) head -n1;;
                 max) tail -n1;;
             esac")
         adb shell "echo $min_freq > $d/min_freq"
         # shellcheck disable=SC2086
         if [ "$devfreq" = "max" ]; then
             echo "$(basename $d)/min_freq: $(adb shell cat $d/min_freq) <> $min_freq"
         fi
     done

     for d in $(adb shell echo /sys/devices/platform/*.ufs); do
         adb shell "echo $clkgate_enable > $d/clkgate_enable"
     done

     adb shell setprop logd.logpersistd.enable ${persist_logs}

     adb shell "for b in /sys/class/block/{sd[a-z],dm*}; do
             if [ -e \$b ]; then
             [ -e \$b/queue/iostats     ] && echo ${iostats}   >\$b/queue/iostats;
             [ -e \$b/queue/nomerges    ] && echo ${nomerges}  >\$b/queue/nomerges;
             [ -e \$b/queue/rq_affinity ] && echo 2            >\$b/queue/rq_affinity;
             [ -e \$b/queue/scheduler   ] && echo ${iosched}   >\$b/queue/scheduler;
             fi
         done; true"

     adb shell "grep -q '^[^[:blank:]]* /sys/kernel/debug' /proc/mounts || mount -t debugfs none /sys/kernel/debug"
}







