On 8/21/2025 4:01 AM, Patrick Steinhardt wrote: > On Wed, Aug 20, 2025 at 03:42:11PM -0400, Derrick Stolee wrote: >> On 8/20/2025 3:02 PM, Junio C Hamano wrote: >>> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: >> The key integration point is the "pending" list operating a bit >> different from walking directly from tags or commits. I was trying >> to reproduce the issue from all of those other sources before unlocking >> the "singleton" nature of the problem, and failed to do so. >> >> The resolve-undo cache (REUC) is something that I had not tested >> previously. Adding "git rm --cached x/y" to the test in the previous >> case leads to the 'git fsck' call giving a "dangling blob" warning, >> so that could be an interesting way to strengthen the test. Thanks, > > I also wonder a bit about the future -- if we ever add a new source for > pending objects, would the author have to amend "path-walk.c" to take > this new pending source into account? I don't think the risk comes from new ways to add pending objects, but that the path-walk algorithm added an "optimization" without appropriately modifying how it handled the pending objects. New sources of pending objects shouldn't cause any issues, unless the revision API itself changed substantially. For example, in my draft for v2 I add changes to my test to account for the REUC in the repack. The new logic for the path-walk feature worked just fine. It does point out that 'git fsck' doesn't follow links in the REUC and so reports the blob as dangling. If we have a new source for pending objects in the context of pack-objects or repack, one would hope that tests would be added to demonstrate the behavior and then that could be double-checked with GIT_TEST_PACK_PATH_WALK=1. One problem in this situation is that the tests were not substantial enough for these sources. We are correcting for that now. > I guess the answer is "yes", which does make me feel a bit uneasy as > it is very easy to now corrupt the repository. Perhaps this has always been very easy to make a bug that leads to losing objects, but we're purposefully wary to touch the logic around the repacking processes. One feature that could provide increased confidence is some step that double-checks the object list before and after a step like this to make sure that no objects are dropped (or only the set we expect is dropped). Something like comparing the object lists in the pack-indexes before and after a repack. This might be too problematic to enable in all cases, but could be enabled in the test suite and certain critical places. Azure DevOps has something similar in its backend to prevent any change to the Git object database that can't be undone, but these steps happen asynchronously to "production data" so it may not be appropriate for all server architectures. Thanks, -Stolee