On Fri, Aug 01, 2025 at 04:41:46PM +0800, liuhuan01@xxxxxxxxxx wrote:
> From: liuh <liuhuan01@xxxxxxxxxx>
>
> When a directory contains billions subdirs, readdir() repeatedly
> got same data and goes to infinate loop.

FWIW, we don't support "billions of dirents in a directory" in XFS.
The max capacity is 1.43 billion dirents, and much less than that if
filenames are anything longer than the minimum length needed to
encode 1.43 billion unique names. e.g. if we limit ourselves to ascii
hex (e.g. for hash based filenames in an object store), we need 31
characters to encode enough entries to fill the entire directory data
space (32GB). At this point, the dirent size is 48 bytes (instead of
24 for the minimum length encoding), and so the maximum number of
entries ends up being around 700 million (32GB / 48 bytes). In this
case, we'd hit the looping problem at about 350 million entries into
the getdents operation.

The issue is that when we start filling the upper 16GB of the data
segment, the dataptr value exceeds 2^31 and that high bit is then
filtered off, even on 64 bit systems. IOWs, the problem is not
triggered by the number of entries, but by the amount of space being
consumed in the directory data segment.

Thing is, the kernel directory context uses a loff_t for the dirent
position (i.e. the readdir cookie). So, in the kernel, it is always
64 bits because:

	typedef long long	__kernel_loff_t;

And so the low level directory iteration code in XFS does not need to
truncate the dir_context->pos value to 31 bits, especially as the XFS
directory position is always a 32 bit value.

> @@ -491,9 +491,9 @@ xfs_dir2_leaf_getdents(
>  	 * All done. Set output offset value to current offset.
>  	 */
>  	if (curoff > xfs_dir2_dataptr_to_byte(XFS_DIR2_MAX_DATAPTR))
> -		ctx->pos = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
> +		ctx->pos = XFS_DIR2_MAX_DATAPTR;

I think that code is wrong to begin with: if curoff is beyond 32GB,
something is badly wrong with the directory structure. i.e. we've had
a directory data segment overrun. This can only happen if there's
been a corruption we haven't caught or some kind of bug was tripped
over. This condition should result in failing the operation and
returning -EFSCORRUPTED, not truncating the directory offset (there's
an untested sketch of what I mean further down)....

This also points out the big problem with the seekdir/telldir APIs on
32 bit systems. telldir returns a signed long for the dirent cookie,
and whilst the man page says:

	Application programs should treat this strictly as an opaque
	value, making no assumptions about its contents.

Despite this, the value of -1 (0xffffffff on 32 bit systems) is not
allowed to be used as the dir cookie on 32 bit systems, as this is
the indicator that telldir() uses to inform the application that it
encountered an error. Hence we cannot return XFS_DIR2_MAX_DATAPTR as
a valid file position during getdents on 32 bit systems, nor should
we accept it from seekdir() operations on directories.....

Similarly, seekdir() on a 32 bit system won't support cookie values
over 2^31 (i.e. negative 32 bit values) because XFS doesn't set
FOP_UNSIGNED_OFFSET for directory file objects. Hence seekdir() will
fail with -EINVAL as the offset gets interpreted as being less than
zero...

IOWs, 32 bit APIs are a mess w.r.t. unsigned 32 bit dir cookies, and
so the filtering of the high bit was an attempt to avoid those
issues. Using hundreds of millions of entries in a single directory
is pretty much always a bad idea, so this really hasn't been an issue
that anyone has cared about in production systems.
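To make the high bit filtering problem concrete, here's a quick
standalone demo (my own illustration - byte_to_dataptr() here is just
a stand-in mimicking the 8 byte alignment shift that
xfs_dir2_byte_to_dataptr() performs, it is not the kernel code):

#include <stdio.h>
#include <stdint.h>

/* Dirents are 8 byte aligned, so dataptr = byte offset >> 3. */
static uint32_t byte_to_dataptr(uint64_t by)
{
	return (uint32_t)(by >> 3);
}

int main(void)
{
	uint64_t lo = 5ULL << 30;		/* 5GB into the data segment */
	uint64_t hi = lo + (16ULL << 30);	/* same place, 16GB later */

	/* Both print 0x28000000 once the high bit is masked off. */
	printf("lo cookie = 0x%08x\n", byte_to_dataptr(lo) & 0x7fffffff);
	printf("hi cookie = 0x%08x\n", byte_to_dataptr(hi) & 0x7fffffff);
	return 0;
}

Two offsets 16GB apart collapse to the same cookie, so once that
masked ctx->pos gets fed back into the next getdents call, iteration
resumes 16GB behind where it actually was and the application sees
the same entries over and over again.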
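And on the telldir() side, the -1 ambiguity is visible straight from
application code - a minimal example (hypothetical, not XFS-specific):

#include <dirent.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
	DIR	*dirp = opendir(".");
	long	pos;

	if (!dirp)
		return 1;

	errno = 0;
	pos = telldir(dirp);

	/*
	 * On a 32 bit system, a perfectly valid dir cookie of
	 * 0xffffffff comes back as -1, indistinguishable from the
	 * error return without checking errno - and that check is
	 * fragile at best.
	 */
	if (pos == -1 && errno != 0)
		perror("telldir");

	closedir(dirp);
	return 0;
}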
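Going back to the hunk quoted above, this is roughly the direction I
mean - a completely untested sketch, and the error unwinding in
xfs_dir2_leaf_getdents() may well need to look different:

	/*
	 * All done. Set output offset value to current offset.
	 *
	 * A curoff beyond the end of the 32GB data segment means
	 * we've overrun the directory data, so fail the operation
	 * instead of truncating the cookie.
	 */
	if (curoff > xfs_dir2_dataptr_to_byte(XFS_DIR2_MAX_DATAPTR))
		return -EFSCORRUPTED;
	ctx->pos = xfs_dir2_byte_to_dataptr(curoff);

Note this also drops the 0x7fffffff masking entirely, which assumes
the 32 bit cookie handling gets moved up a level as suggested below.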
If we want to fix it, the handling of 32 bit offset overflows should
really be moved to a higher level (e.g. to xfs_file_readdir() and,
perhaps, new directory llseek functions). We probably should also be
configuring directory files on 32 bit systems with
FOP_UNSIGNED_OFFSET so that everything knows we are using the whole
32 bit cookie space with seekdir/telldir...

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx