On Sat, Aug 02, 2025 at 06:20:20PM +0930, Qu Wenruo wrote:
>
>
> On 2025/8/2 18:12, Ojaswin Mujoo wrote:
> > On Fri, Jul 11, 2025 at 06:13:05PM +0930, Qu Wenruo wrote:
> > > [FAILURE]
> > > Test case btrfs/282 still fails on some setups:
> > >
> > >   output mismatch (see /opt/xfstests/results//btrfs/282.out.bad)
> > >   --- tests/btrfs/282.out	2025-06-27 22:00:35.000000000 +0200
> > >   +++ /opt/xfstests/results//btrfs/282.out.bad	2025-07-08 20:40:50.042410321 +0200
> > >   @@ -1,3 +1,4 @@
> > >    QA output created by 282
> > >    wrote 2147483648/2147483648 bytes at offset 0
> > >    XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > >   +scrub speed 2152038400 Bytes/s is not properly throttled, target is 1076019200 Bytes/s
> > >   ...
> > >   (Run diff -u /opt/xfstests/tests/btrfs/282.out /opt/xfstests/results//btrfs/282.out.bad to see the entire diff)
> > >
> > > [CAUSE]
> > > Checking the full output, it shows the scrub is running too fast:
> > >
> > >   Starting scrub on devid 1
> > >   scrub done for c45c8821-4e55-4d29-8172-f1bf30b7182c
> > >   Scrub started:    Tue Jul 8 20:40:47 2025
> > >   Status:           finished
> > >   Duration:         0:00:00	<<<
> > >   Total to scrub:   2.00GiB
> > >   Rate:             2.00GiB/s
> > >   Error summary:    no errors found
> > >
> > >   Starting scrub on devid 1
> > >   scrub done for c45c8821-4e55-4d29-8172-f1bf30b7182c
> > >   Scrub started:    Tue Jul 8 20:40:48 2025
> > >   Status:           finished
> > >   Duration:         0:00:01
> > >   Total to scrub:   2.00GiB
> > >   Rate:             2.00GiB/s
> > >   Error summary:    no errors found
> > >
> > > The original run takes less than 1 second, making the scrub rate
> > > calculation very unreliable, so it is no wonder the speed limit does
> > > not work properly.
> > >
> > > [FIX]
> > > Instead of using a fixed 2GiB file size, let the test create a filler
> > > for 4 seconds with direct IO, which more or less ensures the scrub
> > > will take 4 seconds to run.
> > >
> > > With 4 seconds of run time, the scrub rate can be calculated more or
> > > less reliably.
> > >
> > > Furthermore, since btrfs-progs currently only reports the scrub
> > > duration in seconds, enlarge the tolerance to +/- 25% to prevent
> > > problems caused by a 1 second difference, just to be extra safe.
> > >
> > > On my testing VM, the result looks like this:
> > >
> > >   Starting scrub on devid 1
> > >   scrub done for b542bdfb-7be4-44b3-add0-ad3621927e2b
> > >   Scrub started:    Fri Jul 11 09:13:31 2025
> > >   Status:           finished
> > >   Duration:         0:00:04
> > >   Total to scrub:   2.72GiB
> > >   Rate:             696.62MiB/s
> > >   Error summary:    no errors found
> > >
> > >   Starting scrub on devid 1
> > >   scrub done for b542bdfb-7be4-44b3-add0-ad3621927e2b
> > >   Scrub started:    Fri Jul 11 09:13:35 2025
> > >   Status:           finished
> > >   Duration:         0:00:08
> > >   Total to scrub:   2.72GiB
> > >   Rate:             348.31MiB/s
> > >   Error summary:    no errors found
> > >
> > > However, this exposed a new failure mode: if the storage is too fast,
> > > as in the original report, the initial 4 second write can fill the fs
> > > and exit early.
> > >
> > > In that case we have no other solution but to skip the test case.
> >
> > Hi Qu,
> >
> > Thanks for tuning the test, we have also been facing intermittent
> > failures on btrfs/282.
> >
> > I was just wondering, for faster devices, would it make sense to use
> > the io cgroup controller, e.g.:
> >
> >   echo "252:0 rbps=1048576 wbps=1048576" > /sys/fs/cgroup/io_limit/io.max
> >
> > to limit the throughput so we have >= 4s scrub runs.
> > Or does that also have some undesired effects like you mentioned for
> > dm_delay here [1]?
>
> If cgroup works, it will be the best solution: we can fix the throughput
> to 512MiB/s and use a 2GiB file to ensure 4s of scrub runtime.
>
> This will get rid of the speed test part.
>
> The only problem is that I'm not familiar with the cgroup infrastructure,
> so if you can enhance the test case to use cgroup, it would be awesome.
>
> Thanks,
> Qu

Hi Qu,

So the command I pasted above did help me limit the dd throughput to
1MB/s, so it does seem to be doing what it advertises :)

I don't know much about cgroups either, but sure, I can spend some time
understanding the io controller better and try to incorporate it into
the test. I will try to send a separate patch for it (a rough, untested
sketch of what I have in mind is at the end of this mail).

Thanks,
Ojaswin

> >
> > Regards,
> > Ojaswin
> >
> > [1] https://lore.kernel.org/all/103e1b45-19d9-4438-b70d-892757f695fc@xxxxxxx/
> >
> > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> > > ---
> > >
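
P.S. For reference, here is a rough sketch of the kind of cgroup v2
throttling I have in mind, assuming cgroup v2 is mounted at
/sys/fs/cgroup. It is untested with scrub itself (it may well turn out
that the IO issued by the scrub workers is not attributed to the cgroup
at all, which is part of what I need to check). The cgroup name
"btrfs282" is arbitrary, and $SCRATCH_DEV, $SCRATCH_MNT and
$BTRFS_UTIL_PROG are the usual fstests variables:

  # Throttle the scratch device to 512MiB/s via the cgroup v2 io
  # controller, so a 2GiB scrub should take roughly 4 seconds.
  cgdir=/sys/fs/cgroup/btrfs282
  devnum=$(lsblk -ndo MAJ:MIN "$SCRATCH_DEV" | tr -d ' ')

  # Enable the io controller for child cgroups and create our cgroup
  echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
  mkdir -p "$cgdir"

  # Cap both reads and writes on the scratch device
  echo "$devnum rbps=$((512 * 1024 * 1024)) wbps=$((512 * 1024 * 1024))" > "$cgdir/io.max"

  # Move this shell into the throttled cgroup and run the scrub from it
  echo $$ > "$cgdir/cgroup.procs"
  $BTRFS_UTIL_PROG scrub start -B "$SCRATCH_MNT"

If that throttles scrub the way it throttles dd, the test could rely on
a fixed 512MiB/s ceiling instead of guessing the device speed.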