Recent Linux kernel commit breaks Gnulib test suite.

Hi,

Using the following testdir:

    $ git clone https://git.savannah.gnu.org/git/gnulib.git && cd gnulib
    $ ./gnulib-tool --create-testdir --dir testdir1 --single-configure `./gnulib-tool --list | grep acl`

I see the following result:

    $ cd testdir1 && ./configure && make check
    [...]
    FAIL: test-copy-acl.sh
    [...]
    FAIL: test-file-has-acl.sh

This occurs with these two kernels:

    $ uname -r
    6.14.9-300.fc42.x86_64
    $ uname -r
    6.14.8-300.fc42.x86_64

But with this kernel:

    $ uname -r
    6.14.6-300.fc42.x86_64

The result is:

    $ cd testdir1 && ./configure && make check
    [...]
    PASS: test-copy-acl.sh
    [...]
    PASS: test-file-has-acl.sh

Here is the test-suite.log from 6.14.9-300.fc42.x86_64:

    FAIL: test-copy-acl.sh
    ======================
    
    /home/collin/.local/src/gnulib/testdir1/gltests/test-copy-acl: preserving permissions for 'tmpfile2': Numerical result out of range
    FAIL test-copy-acl.sh (exit status: 1)
    
    FAIL: test-file-has-acl.sh
    ==========================
    
    file_has_acl("tmpfile0") returned no, expected yes
    FAIL test-file-has-acl.sh (exit status: 1)

To investigate further, I created the testdir again after applying the
following diff:

    diff --git a/tests/test-copy-acl.sh b/tests/test-copy-acl.sh
    index 061755f124..f9457e884f 100755
    --- a/tests/test-copy-acl.sh
    +++ b/tests/test-copy-acl.sh
    @@ -209,7 +209,7 @@ cd "$builddir" ||
       {
         echo "Simple contents" > "$2"
         chmod 600 "$2"
    -    ${CHECKER} "$builddir"/test-copy-acl${EXEEXT} "$1" "$2" || exit 1
    +    ${CHECKER} strace "$builddir"/test-copy-acl${EXEEXT} "$1" "$2" || exit 1
         ${CHECKER} "$builddir"/test-sameacls${EXEEXT} "$1" "$2" || exit 1
         func_test_same_acls                           "$1" "$2" || exit 1
       }

Then I ran the test from inside testdir1/gltests:

    $ ./test-copy-acl.sh
    [...]
    access("/etc/selinux/config", F_OK)     = 0
    openat(AT_FDCWD, "tmpfile0", O_RDONLY)  = 3
    fstat(3, {st_mode=S_IFREG|0610, st_size=16, ...}) = 0
    openat(AT_FDCWD, "tmpfile2", O_WRONLY)  = 4
    fchmod(4, 0610)                         = 0
    flistxattr(3, NULL, 0)                  = 17
    flistxattr(3, 0x7ffda3f6c900, 17)       = -1 ERANGE (Numerical result out of range)
    write(2, "/home/collin/.local/src/gnulib/t"..., 63/home/collin/.local/src/gnulib/testdir1/gltests/test-copy-acl: ) = 63
    write(2, "preserving permissions for 'tmpf"..., 37preserving permissions for 'tmpfile2') = 37
    write(2, ": Numerical result out of range", 31: Numerical result out of range) = 31
    write(2, "\n", 1
    )                       = 1
    exit_group(1)                           = ?
    +++ exited with 1 +++

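So, we get the required buffer size from 'flistxattr(3, NULL, 0)', which
returns 17, allocate a buffer, and then call it again with that same size,
'flistxattr(3, 0x7ffda3f6c900, 17)'. Since the buffer is exactly as large
as the kernel asked for, the second call shouldn't fail with ERANGE.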
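
For reference, the two-call pattern described above can be sketched roughly
as follows. This is only an illustration, not the libattr or Gnulib code:
the helper name list_xattrs_fd, the retry limit of 8, and the doubling of
the buffer on ERANGE are all assumptions of mine. Retrying on ERANGE is the
usual way to cope with the attribute list growing between the two calls; I
have not checked whether a larger buffer avoids the failure on the affected
kernels.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper (not from libattr or Gnulib): list the extended
   attribute names of FD into a freshly allocated buffer.
   On success, store the buffer in *NAMES_OUT (the caller frees it) and
   return the number of bytes used.  On failure, return -1 with errno
   set and *NAMES_OUT set to NULL.  */
ssize_t
list_xattrs_fd (int fd, char **names_out)
{
  *names_out = NULL;

  /* First call: a zero-sized buffer only asks for the required size.  */
  ssize_t size = flistxattr (fd, NULL, 0);
  if (size < 0)
    return -1;

  /* Avoid a zero-byte allocation when there are no attributes.  */
  size_t capacity = size > 0 ? (size_t) size : 1;

  /* Second call: fetch the names.  The retry limit (8) and the doubling
     on ERANGE are assumptions made for this sketch.  */
  for (int attempt = 0; attempt < 8; attempt++)
    {
      char *names = malloc (capacity);
      if (names == NULL)
        return -1;

      ssize_t used = flistxattr (fd, names, capacity);
      if (used >= 0)
        {
          *names_out = names;
          return used;
        }

      /* Preserve errno across free ().  */
      int saved_errno = errno;
      free (names);
      errno = saved_errno;
      if (saved_errno != ERANGE)
        return -1;

      capacity *= 2;
    }

  errno = ERANGE;
  return -1;
}
```

A caller would open the file, pass the descriptor to list_xattrs_fd, walk
the NUL-separated names in the returned buffer, and free it afterwards.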

To confirm, I replaced 'strace' with 'gdb --args'. Here is the result:

    (gdb) b qcopy_acl 
    Breakpoint 1 at 0x400a10: file qcopy-acl.c, line 84.
    (gdb) run
    Starting program: /home/collin/.local/src/gnulib/testdir1/gltests/test-copy-acl tmpfile0 tmpfile2
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    
    Breakpoint 1, qcopy_acl (src_name=src_name@entry=0x7fffffffd7c3 "tmpfile0", source_desc=source_desc@entry=3, 
        dst_name=dst_name@entry=0x7fffffffd7cc "tmpfile2", dest_desc=dest_desc@entry=4, mode=mode@entry=392) at qcopy-acl.c:84
    84	  ret = chmod_or_fchmod (dst_name, dest_desc, mode);
    (gdb) n
    90	  if (ret == 0)
    (gdb) n
    92	      ret = source_desc <= 0 || dest_desc <= 0
    (gdb) s
    attr_copy_fd (src_path=src_path@entry=0x7fffffffd7c3 "tmpfile0", src_fd=src_fd@entry=3, dst_path=dst_path@entry=0x7fffffffd7cc "tmpfile2", 
        dst_fd=dst_fd@entry=4, check=check@entry=0x4009b0 <is_attr_permissions>, ctx=ctx@entry=0x0) at libattr/attr_copy_fd.c:73
    73		if (check == NULL)
    (gdb) n
    76		size = flistxattr (src_fd, NULL, 0);
    (gdb) n
    77		if (size < 0) {
    (gdb) print size
    $1 = 17
    (gdb) n
    86		names = (char *) my_alloc (size+1);
    (gdb) n
    92		size = flistxattr (src_fd, names, size);
    (gdb) print errno
    $2 = 0
    (gdb) n
    93		if (size < 0) {
    (gdb) print size
    $3 = -1
    (gdb) print errno
    $4 = 34

On Linux, errno 34 is ERANGE, so gdb shows the same failure as strace: the
size query returns 17, and the second flistxattr call then fails.

After checking the Fedora kernel tags [1], I am fairly confident that
the regression was caused by this commit [2].

But I am not familiar enough with ACLs, SELinux, or the kernel to know
what the fix should be.

Adding the lists where this was discussed and some of the signers to CC,
since they will know better than me.

Collin

[1] https://gitlab.com/cki-project/kernel-ark
[2] https://github.com/torvalds/linux/commit/8b0ba61df5a1c44e2b3cf683831a4fc5e24ea99d



