Hi, this patch series fixes races around locking the "packed-refs" file when auto-maintenance decides to repack it. This issue has been reported e.g. via [1] and [2]. The root cause is that git-gc(1) used to know to detach _after_ having repacked references. As such, callers wouldn't continue with their thing until we have already packed refs, and thus the race does not exist there. git-maintenance(1) didn't have the same split though, so this patch series retrofits that logic. The series is structured as follows: - Patches 1 and 2 do some light refactorings. - Patches 3 to 5 refactor how we set up the list of tasks to not rely on globals anymore. Instead, we now have a single source of truth for which tasks exactly will be run. - The remaining patches introduce the split of before/after-detach tasks and wire them up for "pack-refs", "reflog-expire" and "gc" tasks. Changes in v2: - A couple of commit message improvements. - Introduce `die(NULL)` to die with the correct exit code but no error message. This gets rid of some magic numbers. - Introduce an enum to discern the phases before and after detach. - Link to v1: https://lore.kernel.org/r/20250527-b4-pks-maintenance-ref-lock-race-v1-0-e1ceb2dea66e@xxxxxx Thanks! Patrick [1]: <CAJR-fbZ4X1+gN75m2dUvocR6NkowLOZ9F26cjBy8w1qd181OoQ@xxxxxxxxxxxxxx> [2]: <CANi7bVAkNc+gY1NoXfJuDRjxjZLTgL8Lfn8_ZmWsvLAoiLPkNg@xxxxxxxxxxxxxx> --- Patrick Steinhardt (12): builtin/gc: use designated field initializers for maintenance tasks builtin/gc: drop redundant local variable builtin/maintenance: centralize configuration of explicit tasks builtin/maintenance: mark "--task=" and "--schedule=" as incompatible builtin/maintenance: stop modifying global array of tasks builtin/maintenance: extract function to run tasks builtin/maintenance: fix typedef for function pointers builtin/maintenance: let tasks do maintenance before and after detach builtin/maintenance: fix locking race when packing refs and reflogs usage: allow dying without writing an error message builtin/gc: avoid global state in `gc_before_repack()` builtin/maintenance: fix locking race when handling "gc" task builtin/am.c | 4 +- builtin/checkout.c | 4 +- builtin/fetch.c | 2 +- builtin/gc.c | 394 +++++++++++++++++++++++++------------------- builtin/submodule--helper.c | 12 +- t/t7900-maintenance.sh | 19 ++- usage.c | 2 + 7 files changed, 250 insertions(+), 187 deletions(-) Range-diff versus v1: 1: 74fcc4e2251 = 1: 87df070a9e7 builtin/gc: use designated field initializers for maintenance tasks 2: 1cc513a7b0f = 2: e2acea10f7e builtin/gc: drop redundant local variable 3: be8c8a98892 = 3: 48a5e25c8bc builtin/maintenance: centralize configuration of explicit tasks 4: b19fa152c81 ! 4: 680b36e2fa6 builtin/maintenance: mark "--task=" and "--schedule=" as incompatible @@ Commit message The "--task=" option explicitly allows the user to say which maintenance tasks should be run, whereas "--schedule=" only respects the maintenance - strategy configured for a specific repository. As such, it is sensible - to accept both options at the same time. + strategy configured for a specific repository. As such, it is not + sensible to accept both options at the same time. Mark them as incompatible with one another. While at it, also convert the existing logic that marks "--auto" and "--schedule=" as incompatible 5: 8f692f30829 = 5: 9eaabb93edd builtin/maintenance: stop modifying global array of tasks 6: fc0ea110c01 = 6: ffee3ca3c6c builtin/maintenance: extract function to run tasks 7: b5821ef6cfe = 7: 66e1bb2111b builtin/maintenance: fix typedef for function pointers 8: 42a9210e445 ! 8: 4eed6a8dc9c builtin/maintenance: let tasks do maintenance before and after detach @@ Commit message the maintenance tasks are performed in the background. git-gc(1) has some special logic though to not perform _all_ housekeeping tasks in the background: both references and reflogs are still handled synchronously - ni the foreground. + in the foreground. This split exists because otherwise it may easily happen that git-gc(1) - keeps for the "packed-refs" file locked for an extended amount of time, + keeps the "packed-refs" file locked for an extended amount of time, where the next Git command that wants to modify any reference could now fail. This was especially important in the past, where git-gc(1) was still executed directly as part of our automatic maintenance: git-gc(1) @@ builtin/gc.c: typedef int (*maintenance_auto_fn)(struct gc_config *cfg); .auto_condition = rerere_gc_condition, }, }; -@@ builtin/gc.c: static const struct maintenance_task tasks[] = { + ++enum task_phase { ++ TASK_PHASE_BEFORE_DETACH, ++ TASK_PHASE_AFTER_DETACH, ++}; ++ static int maybe_run_task(const struct maintenance_task *task, struct repository *repo, struct maintenance_run_opts *opts, - struct gc_config *cfg) + struct gc_config *cfg, -+ int before) ++ enum task_phase phase) { ++ int before = (phase == TASK_PHASE_BEFORE_DETACH); + maintenance_task_fn fn = before ? task->before_detach : task->after_detach; + const char *region = before ? "maintenance before" : "maintenance"; int ret = 0; @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts free(lock_path); + for (size_t i = 0; i < opts->tasks_nr; i++) -+ if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, 1)) ++ if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, ++ TASK_PHASE_BEFORE_DETACH)) + result = 1; + /* Failure to daemonize is ok, we'll continue in foreground. */ @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts for (size_t i = 0; i < opts->tasks_nr; i++) - if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg)) -+ if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, 0)) ++ if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, ++ TASK_PHASE_AFTER_DETACH)) result = 1; rollback_lock_file(&lk); 9: 7859d3b9b4f = 9: 41ae51294e6 builtin/maintenance: fix locking race when packing refs and reflogs -: ----------- > 10: 2e1b4deb668 usage: allow dying without writing an error message 10: 18bce954787 ! 11: 7d9be688eb4 builtin/gc: avoid global state in `gc_before_repack()` @@ builtin/gc.c: int cmd_gc(int argc, - gc_before_repack(&opts, &cfg); /* dies on failure */ + if (gc_before_repack(&opts, &cfg) < 0) -+ exit(127); ++ die(NULL); delete_tempfile(&pidfile); /* 11: ff92709bf6c ! 12: 291498849b8 builtin/maintenance: fix locking race when handling "gc" task @@ builtin/gc.c: int cmd_gc(int argc, + } - if (gc_before_repack(&opts, &cfg) < 0) -- exit(127); +- die(NULL); - delete_tempfile(&pidfile); + if (gc_before_detach(&opts, &cfg) < 0) -+ exit(127); ++ die(NULL); + delete_tempfile(&pidfile); + } --- base-commit: 845c48a16a7f7b2c44d8cb137b16a4a1f0140229 change-id: 20250527-b4-pks-maintenance-ref-lock-race-11ae5d68e06f