Re: [PATCH] generic/551: prevent OOM when running on tmpfs with low memory

On Wed, Jun 18, 2025 at 03:00:12PM +0200, Daniel Gomez wrote:
> From: Daniel Gomez <da.gomez@xxxxxxxxxxx>
> 
> Running generic/551 on a tmpfs filesystem with less than 10 GB (ish)
> of RAM can lead to the system running out of memory, triggering the
> kernel's OOM killer and terminating the aio-dio-write-v process.
> 
> Fix generic/551 by substracting the amount of available memory allocated
                     ^^^^^^
                     subtracting

> for the tmpfs scratch device from the total available free memory.
> 
> Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Signed-off-by: Daniel Gomez <da.gomez@xxxxxxxxxxx>
> ---
> While integrating tmpfs support for xfstests in kdevops CI [1], we noticed
> that generic/551 could trigger the OOM killer when the scratch device
> is tmpfs, due to not properly accounting for available system memory.
> Fix the test for tmpfs by subtracting the memory allocated to the
> scratch tmpfs mount from the total available memory, ensuring the test
> runs within safe limits.
> 
> [1]
> https://lore.kernel.org/all/20250615-ci-workflow-v1-0-53b267cd2f0a@xxxxxxxxxxx/
> 
> These are the kernel oom-killer logs for generic/551 run on a system
> with less than 10G of RAM:
> 
> run fstests generic/551 at 2025-06-18 11:42:44
> aio-dio-write-v invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=250
> CPU: 5 UID: 0 PID: 1717 Comm: aio-dio-write-v Not tainted 6.16.0-rc2-00049-g52da431bf03b #10 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2025.02-6 04/08/2025
> {...}
> Tasks state (memory values in pages):
> [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> {...}
> [   1717]     0  1717   876600   875978   875945       33         0        7065600        0           250 aio-dio-write-v
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/fstests-generic-551.scope,task=aio-dio-write-v,pid=1717,uid=0
> Out of memory: Killed process 1717 (aio-dio-write-v) total-vm:3506400kB, anon-rss:3503780kB, file-rss:132kB, shmem-rss:0kB, UID:0 pgtables:6900kB oom_score_adj:250
> 
> Results collected with kdevops on the following tmpfs profiles
> (before/after the changes):
> 
> diff --git a/workflows/fstests/results/last-run/6.16.0-rc2-00049-g52da431bf03b/xunit_results.txt b/workflows/fstests/results/last-run/6.16.0-rc2-00049-g52da431bf03b/xunit_results.txt
> index 02a1f09..0229294 100644
> --- a/workflows/fstests/results/last-run/6.16.0-rc2-00049-g52da431bf03b/xunit_results.txt
> +++ b/workflows/fstests/results/last-run/6.16.0-rc2-00049-g52da431bf03b/xunit_results.txt
> @@ -1,21 +1,21 @@
>  KERNEL:    6.16.0-rc2-00049-g52da431bf03b
>  CPUS:      8
> 
> -tmpfs_noswap_huge_never: 1 tests, 1 failures, 9 seconds
> -  generic/551  Failed   8s
> -tmpfs_default: 1 tests, 1 failures, 5 seconds
> -  generic/551  Failed   4s
> -tmpfs_noswap_huge_within_size: 1 tests, 1 failures, 8 seconds
> -  generic/551  Failed   8s
> -tmpfs_huge_always: 1 tests, 1 failures, 11 seconds
> -  generic/551  Failed   11s
> -tmpfs_huge_within_size: 1 tests, 1 failures, 8 seconds
> -  generic/551  Failed   8s
> -tmpfs_noswap_huge_always: 1 tests, 1 failures, 6 seconds
> -  generic/551  Failed   6s
> -tmpfs_noswap_huge_advise: 1 tests, 1 failures, 8 seconds
> -  generic/551  Failed   7s
> -tmpfs_huge_advise: 1 tests, 1 failures, 8 seconds
> -  generic/551  Failed   7s
> -Totals: 8 tests, 0 skipped, 8 failures, 0 errors, 59s
> +tmpfs_noswap_huge_advise: 1 tests, 134 seconds
> +  generic/551  Pass     134s
> +tmpfs_noswap_huge_never: 1 tests, 141 seconds
> +  generic/551  Pass     141s
> +tmpfs_huge_advise: 1 tests, 142 seconds
> +  generic/551  Pass     142s
> +tmpfs_default: 1 tests, 139 seconds
> +  generic/551  Pass     139s
> +tmpfs_noswap_huge_always: 1 tests, 109 seconds
> +  generic/551  Pass     108s
> +tmpfs_noswap_huge_within_size: 1 tests, 116 seconds
> +  generic/551  Pass     115s
> +tmpfs_huge_within_size: 1 tests, 112 seconds
> +  generic/551  Pass     111s
> +tmpfs_huge_always: 1 tests, 145 seconds
> +  generic/551  Pass     145s
> +Totals: 8 tests, 0 skipped, 0 failures, 0 errors, 1035s
> ---
>  tests/generic/551 | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/generic/551 b/tests/generic/551
> index 4a7f0a638235e272ef55ffeb3b3e548707568379..6a7376d7a8e3580bee0a5c98eacdaf93c60c8d5c 100755
> --- a/tests/generic/551
> +++ b/tests/generic/551
> @@ -38,6 +38,7 @@ do_test()
>  	local truncsize
>  	local total_size=0
>  	local avail_mem=`_available_memory_bytes`
> +	[ "$FSTYP" = "tmpfs" ] && avail_mem=$((avail_mem - free_size_k * 1024))

This makes sense to me, but the new line deserves a comment. If there are
no other suggestions, I'll add the comment below when I merge this patch:

# To avoid OOM on tmpfs, subtract the amount of available memory allocated
# for the tmpfs
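
For anyone following along, the adjustment can be sketched in isolation as
below. This is only an illustration of the arithmetic, not code from
fstests: adjust_avail_mem is a hypothetical helper, the sizes are made-up
numbers, and free_size_k is assumed to hold the scratch filesystem's free
space in KiB, as its name suggests.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: mirrors the one-line change
# in the patch so the arithmetic can be tried on its own.
# Arguments: filesystem type, available memory in bytes,
#            scratch free space in KiB (assumed meaning of free_size_k).
adjust_avail_mem()
{
	local fstyp=$1
	local avail_mem=$2
	local free_size_k=$3

	# A tmpfs scratch device is backed by RAM, so the space the test may
	# still fill on it cannot also be counted as memory for its buffers.
	[ "$fstyp" = "tmpfs" ] && avail_mem=$((avail_mem - free_size_k * 1024))
	echo "$avail_mem"
}

# Illustrative numbers only: 8 GiB available memory, 4 GiB scratch space.
adjust_avail_mem tmpfs $((8 * 1024 * 1024 * 1024)) $((4 * 1024 * 1024))
# prints 4294967296
adjust_avail_mem xfs $((8 * 1024 * 1024 * 1024)) $((4 * 1024 * 1024))
# prints 8589934592
```

Note that the sketch does not clamp the result, so it goes negative if
the scratch space exceeds the available memory.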

Reviewed-by: Zorro Lang <zlang@xxxxxxxxxx>

>  
>  	# the number of AIO write operation
>  	num_oper=$((RANDOM % 64 + 1))
> 
> ---
> base-commit: b7680adf9ff7bdc962fb95b5cbd304abd3137b69
> change-id: 20250618-fix-tmpfs-generic-551-4c74b15d4c25
> 
> Best regards,
> -- 
> Daniel Gomez <da.gomez@xxxxxxxxxxx>
> 
> 




