On Fri, Aug 01, 2025 at 04:41:46PM +0800, liuhuan01@xxxxxxxxxx wrote:
> From: liuh <liuhuan01@xxxxxxxxxx>
>
> When a directory contains billions subdirs, readdir() repeatedly
> got same data and goes to infinate loop.

FWIW, we don't support "billions of dirents in a directory" in XFS.
The max capacity is 1.43 billion dirents, and much less than that if
filenames are anything longer than the minimum length needed to
encode 1.43 billion unique names. e.g. if we limit ourselves to ascii
hex (e.g. for hash based filenames in an object store), we need 31
characters to encode enough entries to fill the entire directory data
space (32GB). At this point, the dirent size is 48 bytes (instead of
24 for the minimum length encoding), and so the maximum number of
entries ends up being around 700 million (32GB / 48 bytes). In this
case, we'd hit the looping problem at about 350 million entries into
the getdents operation.

The issue is that when we start filling the upper 16GB of the data
segment, the dataptr value exceeds 2^31 and that high bit is then
filtered off, even on 64 bit systems. IOWs, the problem is not
triggered by the number of entries, but by the amount of space being
consumed in the directory data segment.

Thing is, the kernel directory context uses a loff_t for the dirent
position (i.e. the readdir cookie). So, in the kernel, it is always
64 bits because:

	typedef long long	__kernel_loff_t;

And so the low level directory iteration code in XFS does not need to
truncate the dir_context->pos value to 31 bits, especially as the XFS
directory position is always a 32 bit value.

> @@ -491,9 +491,9 @@ xfs_dir2_leaf_getdents(
>  	 * All done. Set output offset value to current offset.
>  	 */
>  	if (curoff > xfs_dir2_dataptr_to_byte(XFS_DIR2_MAX_DATAPTR))
> -		ctx->pos = XFS_DIR2_MAX_DATAPTR & 0x7fffffff;
> +		ctx->pos = XFS_DIR2_MAX_DATAPTR;

I think that code is wrong to begin with: if curoff is beyond 32GB,
something is badly wrong with the directory structure. i.e. we've had
a directory data segment overrun. This can only happen if there's
been a corruption we haven't caught or some kind of bug was tripped
over. This condition should result in failing the operation and
returning -EFSCORRUPTED, not truncating the directory offset (there's
an untested sketch of what I mean further down)....

This also points out the big problem with the seekdir/telldir APIs on
32 bit systems. telldir returns a signed long for the dirent cookie,
and whilst the man page says:

	Application programs should treat this strictly as an opaque
	value, making no assumptions about its contents.

Despite this, the value of -1 (0xffffffff on 32 bit systems) is not
allowed to be used as the dir cookie on 32 bit systems, as this is
the indicator that telldir() uses to inform the application that it
encountered an error. Hence we cannot return XFS_DIR2_MAX_DATAPTR as
a valid file position during getdents on 32 bit systems, nor should
we accept it from seekdir() operations on directories.....

Similarly, seekdir() on a 32 bit system won't support cookie values
over 2^31 (i.e. negative 32 bit values) because XFS doesn't set
FOP_UNSIGNED_OFFSET for directory file objects. Hence seekdir() will
fail with -EINVAL as the offset gets interpreted as being less than
zero...

IOWs, 32 bit APIs are a mess w.r.t. unsigned 32 bit dir cookies, and
so the filtering of the high bit was an attempt to avoid those
issues. Using hundreds of millions of entries in a single directory
is pretty much always a bad idea, so this really hasn't been an issue
that anyone has cared about in production systems.
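To make the high bit filtering problem concrete, here's a quick
standalone demo (my own illustration - byte_to_dataptr() here is just
a stand-in mimicking the 8 byte alignment shift that
xfs_dir2_byte_to_dataptr() performs, it is not the kernel code):

#include <stdio.h>
#include <stdint.h>

/* Dirents are 8 byte aligned, so dataptr = byte offset >> 3. */
static uint32_t byte_to_dataptr(uint64_t by)
{
	return (uint32_t)(by >> 3);
}

int main(void)
{
	uint64_t lo = 5ULL << 30;		/* 5GB into the data segment */
	uint64_t hi = lo + (16ULL << 30);	/* same place, 16GB later */

	/* Both print 0x28000000 once the high bit is masked off. */
	printf("lo cookie = 0x%08x\n", byte_to_dataptr(lo) & 0x7fffffff);
	printf("hi cookie = 0x%08x\n", byte_to_dataptr(hi) & 0x7fffffff);
	return 0;
}

Two offsets 16GB apart collapse to the same cookie, so once that
masked ctx->pos gets fed back into the next getdents call, iteration
resumes 16GB behind where it actually was and the application sees
the same entries over and over again.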
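And on the telldir() side, the -1 ambiguity is visible straight from
application code - a minimal example (hypothetical, not XFS-specific):

#include <dirent.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
	DIR	*dirp = opendir(".");
	long	pos;

	if (!dirp)
		return 1;

	errno = 0;
	pos = telldir(dirp);

	/*
	 * On a 32 bit system, a perfectly valid dir cookie of
	 * 0xffffffff comes back as -1, indistinguishable from the
	 * error return without checking errno - and that check is
	 * fragile at best.
	 */
	if (pos == -1 && errno != 0)
		perror("telldir");

	closedir(dirp);
	return 0;
}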
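Going back to the hunk quoted above, this is roughly the direction I
mean - a completely untested sketch, and the error unwinding in
xfs_dir2_leaf_getdents() may well need to look different:

	/*
	 * All done. Set output offset value to current offset.
	 *
	 * A curoff beyond the end of the 32GB data segment means
	 * we've overrun the directory data, so fail the operation
	 * instead of truncating the cookie.
	 */
	if (curoff > xfs_dir2_dataptr_to_byte(XFS_DIR2_MAX_DATAPTR))
		return -EFSCORRUPTED;
	ctx->pos = xfs_dir2_byte_to_dataptr(curoff);

Note this also drops the 0x7fffffff masking entirely, which assumes
the 32 bit cookie handling gets moved up a level as suggested below.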
If we want to fix it, the handling of 32 bit offset overflows should
really be moved to a higher level (e.g. to xfs_file_readdir() and,
perhaps, new directory llseek functions). We probably should also be
configuring directory files on 32 bit systems with
FOP_UNSIGNED_OFFSET so that everything knows we are using the whole
32 bit cookie space with seekdir/telldir...

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx