Hi all,

While examining mm/filemap.c, I noticed that file->f_ra.mmap_miss does not
increase as expected under mmap-based random read workloads. As a result,
readahead is never disabled, even when it is clearly ineffective.

Test case: a 4GB file is mmap'd and randomly accessed in a KVM guest with
2GB RAM. The benchmark code is at the end of this mail.

I used the following bpftrace script to monitor readahead activity:

    kfunc:vmlinux:do_page_cache_ra
    {
        printf("size: %d start: %d mmap_miss: %d from %s\n",
               args->ractl->file->f_ra.size,
               args->ractl->file->f_ra.start,
               args->ractl->file->f_ra.mmap_miss,
               comm);
    }

The result is that mmap_miss remains low and readahead stays enabled.

This appears to be due to the logic in filemap_map_pages() (mm/filemap.c),
which treats the folios surrounding a faulted-in page as asynchronous hits
and subtracts them from mmap_miss:

    mmap_miss_saved = READ_ONCE(file->f_ra.mmap_miss);
    if (mmap_miss >= mmap_miss_saved)
        WRITE_ONCE(file->f_ra.mmap_miss, 0);
    else
        WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss_saved - mmap_miss);

This suppresses mmap_miss growth even when the faults are clearly
synchronous.

I commented out the above block, re-ran the test, and saw the benchmark time
drop from ~6200 ms to ~1500 ms, which indicates that readahead was being
wrongly retained.

Jan Kara previously mentioned a similar issue in [1]:

> I see, OK. But that's a (longstanding) bug in how mmap_miss is handled. Can
> you please test whether attached patches fix the trashing for you? At least
> now I can see mmap_miss properly increments when we are hitting uncached
> pages...

[1] https://lore.kernel.org/all/20240201173130.frpaqpy7iyzias5j@quack3/

So my questions are:

1. Is this mmap_miss suppression intentional?
2. Was the design intended to avoid false positives for disabling readahead?
3. Would it make sense to reclassify the "asynchronous hits" in
   filemap_map_pages() to exclude those resulting directly from the current
   fault?

The benchmark follows.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <time.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Flush dirty data, then drop the page cache (requires root). */
void clear_page_cache(void)
{
    sync();
    int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
    if (fd == -1) {
        perror("open");
        return;
    }
    if (write(fd, "3\n", 2) == -1) {
        perror("write");
    }
    close(fd);
}

/* Touch nr pseudo-random byte offsets in the mapping. */
void rand_read(const char *memblock, uint64_t size, uint64_t nr)
{
    for (uint64_t i = 0; i < nr; i++) {
        uint64_t pos = ((uint64_t)rand()) * rand() % size;
        /* The comparison keeps the load from being optimized away. */
        if (memblock[pos] == '7')
            printf("Magic number!\n");
    }
}

long long get_time_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename> [num_accesses]\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open file");
        return 1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        return 1;
    }
    if (sb.st_size == 0) {
        /* An empty file cannot be mapped and would make "% size" divide by zero. */
        fprintf(stderr, "%s is empty\n", argv[1]);
        return 1;
    }

    const char *memblock = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (memblock == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    uint64_t nr_access = (argc > 2) ? strtoull(argv[2], NULL, 10) : (512 * 1024);

    clear_page_cache();

    long long start = get_time_ms();
    rand_read(memblock, sb.st_size, nr_access);
    long long end = get_time_ms();

    printf("Rand Read Time: %lldms\n", end - start);
    return 0;
}

Reproduction steps:

1. Save the above code as randread.c
2. # gcc -O2 -o randread randread.c
3. # fallocate -l 4G testfile
4. # ./randread testfile 524288
5. Example output: Rand Read Time: 1400ms

Thanks,
Haoran Zhu