On Sun, Jun 8, 2025 at 12:13 PM Anuj gupta <anuj1072538@xxxxxxxxx> wrote:
>
> On Sat, Jun 7, 2025 at 5:12 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > This series adds fuse iomap support for buffered writes and dirty folio
> > writeback. This is needed so that granular dirty tracking can be used
> > in fuse when large folios are enabled: if only a few bytes in a large
> > folio are dirty, only that smaller portion is written out instead of
> > the entire folio.
> >
> > To do this, a new iomap type, IOMAP_IN_MEM, is added that is more
> > generic and does not depend on the block layer. The parts of iomap
> > buffered io that depend on bios and CONFIG_BLOCK are moved to a
> > separate file, buffered-io-bio.c, so that filesystems that do not have
> > CONFIG_BLOCK set can use IOMAP_IN_MEM buffered io.
> >
> > This series was run through fstests with large folios enabled and
> > through some quick sanity checks on passthrough_hp with a) writing
> > 1 GB in 1 MB chunks and then going back and dirtying a few bytes in
> > each chunk, and b) writing 50 MB in 1 MB chunks and re-dirtying each
> > entire chunk over several runs. a) showed about a 40% speedup with
> > iomap support added and b) showed roughly the same performance.
> >
> > This patchset does not enable large folios yet. That will be sent out
> > in a separate future patchset.
> >
> >
> > Thanks,
> > Joanne
>
> Hi Joanne,
>
> I tried experimenting with your patch series to evaluate its impact. To
> measure the improvement, I enabled large folios for FUSE. In my setup,
> I observed a ~43% reduction in writeback time.
>
> Here's the script[1] I used to benchmark FUSE writeback performance
> based on the details you shared: it formats and mounts an XFS volume,
> runs the passthrough_hp FUSE daemon, writes 1 MB chunks to populate the
> file, and then issues 4-byte overwrites to test fine-grained writeback
> behavior.
>
> If I've missed anything or there's a better way to evaluate this, I'd
> really appreciate your input -- I'm still getting up to speed with FUSE
> internals.

Hi Anuj,

Thanks for testing it out locally and sharing the benchmarks. Your test
is pretty much the same as mine (except the underlying filesystem I used
for passthrough is ext4), and I saw roughly a 40% speedup as well for
buffered writes.

My main concern was whether the iomap overhead slows down writes that
don't need granular dirty tracking, but I didn't see any noticeable
performance difference when I tested that case by rewriting entire
chunks.
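For reference, the full-chunk case was essentially the same loop with
each 1 MB chunk rewritten in full, so granular dirty tracking has no
partial folio to skip. A rough sketch (paths borrowed from your script
below, not the exact commands I ran):

    FUSE_MNT="/tmp/fusefs"
    TEST_FILE="$FUSE_MNT/testfile"
    TOTAL_MB=50

    for ((i = 0; i < TOTAL_MB; i++)); do
        # conv=notrunc so each overwrite dirties the existing chunk
        # in place instead of truncating the file at the write offset.
        dd if=/dev/urandom bs=1M count=1 seek=$i conv=notrunc \
            of="$TEST_FILE" status=none
    done
    sync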
>
> [1]
>
> #!/bin/bash
> set -e
>
> DEVICE="/dev/nvme0n1"
> BACKING_MNT="/mnt"
> FUSE_MNT="/tmp/fusefs"
> CHUNK_MB=1
> TOTAL_MB=1024
> DIRTY_BYTES=4
> REPEATS=5
> LOGFILE="fuse_test_results.csv"
> DIR=$(date +"%H-%M-%S-%d-%m-%y")
>
> mkdir "$DIR"
> echo "$DIR created"
>
> mkdir -p "$BACKING_MNT" "$FUSE_MNT"
>
> echo "run,duration_seconds" > "$DIR/$LOGFILE"
>
> for run in $(seq 1 $REPEATS); do
>     echo "[Run $run] Formatting $DEVICE with XFS..."
>     mkfs.xfs -f "$DEVICE"
>
>     echo "[Run $run] Mounting XFS to $BACKING_MNT..."
>     mount "$DEVICE" "$BACKING_MNT"
>
>     echo "[Run $run] Starting passthrough_hp on $FUSE_MNT..."
>     ./passthrough_hp --nopassthrough "$BACKING_MNT" "$FUSE_MNT" &
>     sleep 2
>
>     echo "[Run $run] Dropping caches and syncing..."
>     sync
>     echo 3 > /proc/sys/vm/drop_caches
>
>     TEST_FILE="$FUSE_MNT/testfile_run${run}"
>
>     # Populate the file with 1 MB direct-IO chunks.
>     for ((i = 0; i < TOTAL_MB; i++)); do
>         dd if=/dev/urandom bs=1M count=1 oflag=direct seek=$i \
>             of="$TEST_FILE" status=none
>     done
>
>     START=$(date +%s.%N)
>     # Overwrite the last DIRTY_BYTES bytes of each 1 MB chunk.
>     # conv=notrunc keeps dd from truncating the file at the write
>     # offset, which would otherwise throw away all of the later chunks.
>     for ((i = 0; i < TOTAL_MB; i++)); do
>         offset=$((i * 1048576 + 1048572))
>         #offset=$((i * 1048576))
>         dd if=/dev/urandom bs=1 count=$DIRTY_BYTES conv=notrunc \
>             of="$TEST_FILE" seek=$offset status=none
>     done
>
>     # Unmounting flushes the dirty pages, so the timed window covers
>     # the overwrites plus their writeback.
>     fusermount -u "$FUSE_MNT"
>     umount "$BACKING_MNT"
>
>     END=$(date +%s.%N)
>     DURATION=$(echo "$END - $START" | bc)
>     echo "$run,$DURATION" >> "$DIR/$LOGFILE"
>     echo "[Run $run] Duration: ${DURATION}s"
> done
>
> echo "All runs complete. Results saved to $DIR/$LOGFILE."
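If you want to verify the granular writeback directly rather than just
timing it, one option is to sample the writeback counter in /proc/vmstat
around the timed section. A rough sketch (nr_written counts pages
written back system-wide, so it's noisy on a busy machine):

    before=$(awk '/^nr_written / {print $2}' /proc/vmstat)
    # ... 4-byte overwrite loop + fusermount/umount go here ...
    after=$(awk '/^nr_written / {print $2}' /proc/vmstat)
    echo "pages written back: $((after - before))"

With granular dirty tracking, the 4-byte overwrites should account for
far fewer written-back pages than whole-folio writeback would.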