Re: [PATCH v5 0/5] for-each-ref: introduce seeking functionality via '--start-after'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 17, 2025 at 12:26 PM Karthik Nayak <karthik.188@xxxxxxxxx> wrote:
>
> Kyle Lippincott <spectral@xxxxxxxxxx> writes:
>
> > On Wed, Jul 16, 2025 at 6:54 PM Jeff King <peff@xxxxxxxx> wrote:
> >>
> >> On Wed, Jul 16, 2025 at 06:19:32PM -0700, Kyle Lippincott wrote:
> >>
> >> > Unfortunately I can't provide great instructions for reproducing this
> >> > locally, because it relies on our internal build stack (which uses
> >> > blaze). Getting MemorySanitizer running can be quite annoying, though
> >> > you might not have any issues if this test doesn't invoke any third
> >> > party libraries (like zlib).
> >> >
> >> > I need to sign off for the night soon, but if this isn't sufficient
> >> > enough information to identify what's happening here, I can try to dig
> >> > deeper tomorrow. This run was executed on an import of upstream commit
> >> > 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc (Junio's merge of this
> >> > series)
> >>
> >> valgrind can often find the same issues as MSan without as much headache
> >> to get it running (the downside is that it is _way_ slower). And indeed:
> >>
> >>   git checkout 4ea3c74afd42a503b3e0d60e1fec33bc0431e7bc &&
> >>   make &&
> >>   (cd t && ./t6302-for-each-ref-filter.sh --valgrind-only=48)
> >>
> >> yields:
> >>
> >>   ==2177572== Conditional jump or move depends on uninitialised value(s)
> >>   ==2177572==    at 0x3BC380: cache_ref_iterator_advance (ref-cache.c:409)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3B6CC3: merge_ref_iterator_advance (iterator.c:179)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3A9770: files_ref_iterator_advance (files-backend.c:902)
> >>   ==2177572==    by 0x3B69D7: ref_iterator_advance (iterator.c:15)
> >>   ==2177572==    by 0x3B7457: do_for_each_ref_iterator (iterator.c:478)
> >>   ==2177572==    by 0x399B43: for_each_fullref_with_seek (ref-filter.c:2718)
> >>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
> >>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
> >>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
> >>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
> >>   ==2177572==  Uninitialised value was created by a heap allocation
> >>   ==2177572==    at 0x484BDD0: realloc (vg_replace_malloc.c:1801)
> >>   ==2177572==    by 0x44E941: xrealloc (wrapper.c:140)
> >>   ==2177572==    by 0x3BCAD9: cache_ref_iterator_begin (ref-cache.c:580)
> >>   ==2177572==    by 0x3A988A: files_ref_iterator_begin (files-backend.c:995)
> >>   ==2177572==    by 0x3A295E: refs_ref_iterator_begin (refs.c:1776)
> >>   ==2177572==    by 0x399AF6: for_each_fullref_with_seek (ref-filter.c:2710)
> >>   ==2177572==    by 0x399C09: for_each_fullref_in_pattern (ref-filter.c:2756)
> >>   ==2177572==    by 0x39B031: do_filter_refs (ref-filter.c:3263)
> >>   ==2177572==    by 0x39B2B7: filter_and_format_refs (ref-filter.c:3364)
> >>   ==2177572==    by 0x18C1D2: cmd_for_each_ref (for-each-ref.c:115)
> >>   ==2177572==    by 0x128C90: run_builtin (git.c:480)
> >>   ==2177572==    by 0x1290EB: handle_builtin (git.c:746)
> >>
> >> Bisecting doesn't tell us much, though (the first commit that introduces
> >> the test shows the problem). I didn't dig further than that.
> >>
> >> -Peff
> >
> > Thanks for that, that helped me a bit too as it provides more
> > information than I was getting out of MemorySanitizer (I suspect
> > MemorySanitizer was producing the information it just wasn't going to
> > stderr or something, or maybe I was missing a flag to get it to report
> > more).
> >
>
> Thanks both for raising the issue. Thanks Jeff for also the valgrind
> instructions.
>
> On a sidenote, was discussing this at work and Patrick also mentioned
> that we could try clang's MemorySanitizer. This seems to also be raising
> issues on master, so it was hard to find the exact output that valgrind
> was providing.
>
> $ git checkout master
> $ CC=clang meson setup --reconfigure memory_build . -Db_sanitize=memory
> $ cd memory_build
> $ meson test -i --test-args="-ix" t6302-for-each-ref-filter
> ...
> ==3275333==WARNING: MemorySanitizer: use-of-uninitialized-value
>     #0 0x557bd886f4bb in git_mkstemps_mode ../wrapper.c:487:27
>     #1 0x557bd886fb55 in git_mkstemp_mode ../wrapper.c:509:9
>     #2 0x557bd8100d1a in create_tmpfile ../object-file.c:736:7
>     #3 0x557bd80f1630 in start_loose_object_common ../object-file.c:781:7
>     #4 0x557bd80f5203 in write_loose_object ../object-file.c:881:7
>     #5 0x557bd80f4875 in write_object_file_flags ../object-file.c:1086:6
>     #6 0x557bd80f9f65 in write_object_file ../object-file.h:181:9
>     #7 0x557bd8101eb8 in index_mem ../object-file.c:1177:9
>     #8 0x557bd80f8bd5 in index_core ../object-file.c:1247:10
>     #9 0x557bd80f731d in index_fd ../object-file.c:1274:9
>     #10 0x557bd80f95e4 in index_path ../object-file.c:1295:7
>     #11 0x557bd831132d in add_to_index ../read-cache.c:771:7
>     #12 0x557bd8313cb1 in add_file_to_index ../read-cache.c:804:9
>     #13 0x557bd73f892c in add_files ../builtin/add.c:355:7
>     #14 0x557bd73f4752 in cmd_add ../builtin/add.c:578:18
>     #15 0x557bd7a38b6f in run_builtin ../git.c:480:11
>     #16 0x557bd7a31d54 in handle_builtin ../git.c:746:9
>     #17 0x557bd7a36644 in run_argv ../git.c:813:4
>     #18 0x557bd7a30e09 in cmd_main ../git.c:953:19
>     #19 0x557bd7a3ca01 in main ../common-main.c:9:11
>     #20 0x7f7e3f02a4d7 in __libc_start_call_main
> (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a4d7)
> (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>     #21 0x7f7e3f02a59a in __libc_start_main@GLIBC_2.2.5
> (/nix/store/g2jzxk3s7cnkhh8yq55l4fbvf639zy37-glibc-2.40-66/lib/libc.so.6+0x2a59a)
> (BuildId: f117ee0f586dfa828cbdd08e37393c8f04f6480a)
>     #22 0x557bd7352b34 in _start (git+0x5db34)
>
> Possibly something we need to look into cleaning up.

I also saw those msan issues when trying `make
CFLAGS=-fsanitize=memory CC=clang`, but not with Google's internal
msan build. I don't know which variable in wrapper.c:487 it's
complaining about - you'd think it'd be `letters`, but if it's `v`,
then that potentially comes from OpenSSL or some other library, and
that library would also need to be built with msan (which is why it's
such a pain to get msan builds working - EVERY library needs to be
built with memory sanitizer).

>
> > I'm not sure what the right fix would be; my guess is that the
> > fix would be to modify the places where we set levels_nr and
> > initialize the other fields in level to also set it to prefix_state
> > (around lines 488 and 527 in ref-cache.c); and indeed setting the
> > prefix_state to PREFIX_CONTAINS_DIR (the 0 value of the enum) makes
> > the test pass even under valgrind. Unfortunately without a much more
> > in-depth knowledge of the code and the enum values I can't
> > definitively state that those are the correct values. I can say that
> > setting it to PREFIX_WITHIN_DIR causes both additional valgrind
> > failures and test failures even without valgrind, but setting it to
> > PREFIX_EXCLUDES_DIR doesn't seem to be a problem. I also moved the
> > `if` around like 409 into the following if, because that was the only
> > time entry_prefix_state was used, I'd been thinking that maybe it
> > needed the check for entry->flag & REF_DIR prior to referencing
> > level->prefix_state, but that didn't resolve it on its own.
> >
> > I don't mind if anyone else picks up this fix and runs with it, but
> > I'm not comfortable sending this patch myself because I don't have
> > enough knowledge of this are of the code to know if it's right, just
> > that it fixes the issue we encountered, and I'm extremely overloaded
> > right now and can't get that knowledge nor see the patch through to
> > the end.
> >
>
> Thanks for taking a stab at this, your inference is correct. Let me
> clairfy some parts of it.
>
> So the 'ref-cache' iteration logic is used to provide iteration over
> loose refs (which consists of directories and entries). Anytime we come
> across a directory, we add it to the level variable, which acts as
> stack, when all entries under the current level are yielded, we pop the
> stack to obtain the next level to iterate. This ensures we iterate over
> all directories recursively.
>
> Before this series, the seek function was used to set the prefix for
> iteration, which meant we need to find the directory for matching the
> prefix and only iterate over that level and its subdirs. If the prefix
> provided was a directory like 'refs/heads/' then all refs under that
> would be yielded (PREFIX_CONTAINS_DIR). If the prefix was
> 'refs/heads/foo', then the level would be set to 'ref/heads/' with the
> PREFIX_WITHIN_DIR flag set since only some refs within the dir would
> match the prefix. Entries which didn't overlap the prefix are denoted by
> PREFIX_EXCLUDES_DIR.
>
> This series allows the seek function to set the cursor without setting
> the prefix, which is a requirement for pagination. So there is no need
> to set 'prefix_state' for this functionality. Which is why I didn't set
> it, since the default value of '0' (PREFIX_CONTAINS_DIR) would be the
> correct setting for all dirs. This causes the issue.
>
> So the only fix required would be
>
> diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> index 1d95b56d40..ceef3a2008 100644
> --- a/refs/ref-cache.c
> +++ b/refs/ref-cache.c
> @@ -527,6 +527,7 @@ static int cache_ref_iterator_seek(struct
> ref_iterator *ref_iterator,
>                                 level = &iter->levels[iter->levels_nr++];
>                                 level->dir = dir;
>                                 level->index = -1;
> +                               level->prefix_state = PREFIX_CONTAINS_DIR;
>                         } else {
>                                 /* reduce the index so the leaf node is iterated over */
>                                 if (cmp <= 0 && !slash)
>
>
> The other location (Line 488), is not needed because that is the root
> directory and the 'prefix_state' for it is set in
> 'cache_ref_iterator_set_prefix()' when the iterator begins.
>
> >
> > diff --git a/refs/ref-cache.c b/refs/ref-cache.c
> > index 1d95b56d40..24feb33fcb 100644
> > --- a/refs/ref-cache.c
> > +++ b/refs/ref-cache.c
> > @@ -391,7 +391,6 @@ static int cache_ref_iterator_advance(struct
> > ref_iterator *ref_iterator)
> >                         &iter->levels[iter->levels_nr - 1];
> >                 struct ref_dir *dir = level->dir;
> >                 struct ref_entry *entry;
> > -               enum prefix_state entry_prefix_state;
> >
> >                 if (level->index == -1)
> >                         sort_ref_dir(dir);
> > @@ -406,16 +405,17 @@ static int cache_ref_iterator_advance(struct
> > ref_iterator *ref_iterator)
> >
> >                 entry = dir->entries[level->index];
> >
> > -               if (level->prefix_state == PREFIX_WITHIN_DIR) {
> > -                       entry_prefix_state =
> > overlaps_prefix(entry->name, iter->prefix);
> > -                       if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > -                           (entry_prefix_state == PREFIX_WITHIN_DIR
> > && !(entry->flag & REF_DIR)))
> > -                               continue;
> > -               } else {
> > -                       entry_prefix_state = level->prefix_state;
> > -               }
> > -
> >                 if (entry->flag & REF_DIR) {
> > +                       enum prefix_state entry_prefix_state;
> > +                       if (level->prefix_state == PREFIX_WITHIN_DIR) {
> > +                               entry_prefix_state =
> > overlaps_prefix(entry->name, iter->prefix);
> > +                               if (entry_prefix_state == PREFIX_EXCLUDES_DIR ||
> > +                                   (entry_prefix_state ==
> > PREFIX_WITHIN_DIR && !(entry->flag & REF_DIR)))
> > +                                       continue;
> > +                       } else {
> > +                               entry_prefix_state = level->prefix_state;
> > +                       }
> > +
> >                         /* push down a level */
> >                         ALLOC_GROW(iter->levels, iter->levels_nr + 1,
> >                                    iter->levels_alloc);
> > @@ -489,6 +489,7 @@ static int cache_ref_iterator_seek(struct
> > ref_iterator *ref_iterator,
> >                 level = &iter->levels[0];
> >                 level->index = -1;
> >                 level->dir = dir;
> > +               level->prefix_state = PREFIX_EXCLUDES_DIR;      //
> > FIXME: PROBABLY NOT CORRECT
> >
> >                 /* Unset any previously set prefix */
> >                 FREE_AND_NULL(iter->prefix);
> > @@ -527,6 +528,7 @@ static int cache_ref_iterator_seek(struct
> > ref_iterator *ref_iterator,
> >                                 level = &iter->levels[iter->levels_nr++];
> >                                 level->dir = dir;
> >                                 level->index = -1;
> > +                               level->prefix_state =
> > PREFIX_EXCLUDES_DIR;      // FIXME: PROBABLY NOT CORRECT
> >                         } else {
> >                                 /* reduce the index so the leaf node
> > is iterated over */
> >                                 if (cmp <= 0 && !slash)





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux