[PATCH v2 00/12] builtin/maintenance: fix ref lock races when detaching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

this patch series fixes races around locking the "packed-refs" file when
auto-maintenance decides to repack it. This issue has been reported e.g.
via [1] and [2].

The root cause is that git-gc(1) used to know to detach _after_ having
repacked references. As such, callers wouldn't continue with their thing
until we have already packed refs, and thus the race does not exist
there. git-maintenance(1) didn't have the same split though, so this
patch series retrofits that logic.

The series is structured as follows:

  - Patches 1 and 2 do some light refactorings.

  - Patches 3 to 5 refactor how we set up the list of tasks to not rely
    on globals anymore. Instead, we now have a single source of truth
    for which tasks exactly will be run.

  - The remaining patches introduce the split of before/after-detach
    tasks and wire them up for "pack-refs", "reflog-expire" and "gc"
    tasks.

Changes in v2:
  - A couple of commit message improvements.
  - Introduce `die(NULL)` to die with the correct exit code but no error
    message. This gets rid of some magic numbers.
  - Introduce an enum to discern the phases before and after detach.
  - Link to v1: https://lore.kernel.org/r/20250527-b4-pks-maintenance-ref-lock-race-v1-0-e1ceb2dea66e@xxxxxx

Thanks!

Patrick

[1]: <CAJR-fbZ4X1+gN75m2dUvocR6NkowLOZ9F26cjBy8w1qd181OoQ@xxxxxxxxxxxxxx>
[2]: <CANi7bVAkNc+gY1NoXfJuDRjxjZLTgL8Lfn8_ZmWsvLAoiLPkNg@xxxxxxxxxxxxxx>

---
Patrick Steinhardt (12):
      builtin/gc: use designated field initializers for maintenance tasks
      builtin/gc: drop redundant local variable
      builtin/maintenance: centralize configuration of explicit tasks
      builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
      builtin/maintenance: stop modifying global array of tasks
      builtin/maintenance: extract function to run tasks
      builtin/maintenance: fix typedef for function pointers
      builtin/maintenance: let tasks do maintenance before and after detach
      builtin/maintenance: fix locking race when packing refs and reflogs
      usage: allow dying without writing an error message
      builtin/gc: avoid global state in `gc_before_repack()`
      builtin/maintenance: fix locking race when handling "gc" task

 builtin/am.c                |   4 +-
 builtin/checkout.c          |   4 +-
 builtin/fetch.c             |   2 +-
 builtin/gc.c                | 394 +++++++++++++++++++++++++-------------------
 builtin/submodule--helper.c |  12 +-
 t/t7900-maintenance.sh      |  19 ++-
 usage.c                     |   2 +
 7 files changed, 250 insertions(+), 187 deletions(-)

Range-diff versus v1:

 1:  74fcc4e2251 =  1:  87df070a9e7 builtin/gc: use designated field initializers for maintenance tasks
 2:  1cc513a7b0f =  2:  e2acea10f7e builtin/gc: drop redundant local variable
 3:  be8c8a98892 =  3:  48a5e25c8bc builtin/maintenance: centralize configuration of explicit tasks
 4:  b19fa152c81 !  4:  680b36e2fa6 builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
    @@ Commit message
     
         The "--task=" option explicitly allows the user to say which maintenance
         tasks should be run, whereas "--schedule=" only respects the maintenance
    -    strategy configured for a specific repository. As such, it is sensible
    -    to accept both options at the same time.
    +    strategy configured for a specific repository. As such, it is not
    +    sensible to accept both options at the same time.
     
         Mark them as incompatible with one another. While at it, also convert
         the existing logic that marks "--auto" and "--schedule=" as incompatible
 5:  8f692f30829 =  5:  9eaabb93edd builtin/maintenance: stop modifying global array of tasks
 6:  fc0ea110c01 =  6:  ffee3ca3c6c builtin/maintenance: extract function to run tasks
 7:  b5821ef6cfe =  7:  66e1bb2111b builtin/maintenance: fix typedef for function pointers
 8:  42a9210e445 !  8:  4eed6a8dc9c builtin/maintenance: let tasks do maintenance before and after detach
    @@ Commit message
         the maintenance tasks are performed in the background. git-gc(1) has
         some special logic though to not perform _all_ housekeeping tasks in the
         background: both references and reflogs are still handled synchronously
    -    ni the foreground.
    +    in the foreground.
     
         This split exists because otherwise it may easily happen that git-gc(1)
    -    keeps for the "packed-refs" file locked for an extended amount of time,
    +    keeps the "packed-refs" file locked for an extended amount of time,
         where the next Git command that wants to modify any reference could now
         fail. This was especially important in the past, where git-gc(1) was
         still executed directly as part of our automatic maintenance: git-gc(1)
    @@ builtin/gc.c: typedef int (*maintenance_auto_fn)(struct gc_config *cfg);
      		.auto_condition = rerere_gc_condition,
      	},
      };
    -@@ builtin/gc.c: static const struct maintenance_task tasks[] = {
    + 
    ++enum task_phase {
    ++	TASK_PHASE_BEFORE_DETACH,
    ++	TASK_PHASE_AFTER_DETACH,
    ++};
    ++
      static int maybe_run_task(const struct maintenance_task *task,
      			  struct repository *repo,
      			  struct maintenance_run_opts *opts,
     -			  struct gc_config *cfg)
     +			  struct gc_config *cfg,
    -+			  int before)
    ++			  enum task_phase phase)
      {
    ++	int before = (phase == TASK_PHASE_BEFORE_DETACH);
     +	maintenance_task_fn fn = before ? task->before_detach : task->after_detach;
     +	const char *region = before ? "maintenance before" : "maintenance";
      	int ret = 0;
    @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts
      	free(lock_path);
      
     +	for (size_t i = 0; i < opts->tasks_nr; i++)
    -+		if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, 1))
    ++		if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg,
    ++				   TASK_PHASE_BEFORE_DETACH))
     +			result = 1;
     +
      	/* Failure to daemonize is ok, we'll continue in foreground. */
    @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts
      
      	for (size_t i = 0; i < opts->tasks_nr; i++)
     -		if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg))
    -+		if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg, 0))
    ++		if (maybe_run_task(&tasks[opts->tasks[i]], r, opts, cfg,
    ++				   TASK_PHASE_AFTER_DETACH))
      			result = 1;
      
      	rollback_lock_file(&lk);
 9:  7859d3b9b4f =  9:  41ae51294e6 builtin/maintenance: fix locking race when packing refs and reflogs
 -:  ----------- > 10:  2e1b4deb668 usage: allow dying without writing an error message
10:  18bce954787 ! 11:  7d9be688eb4 builtin/gc: avoid global state in `gc_before_repack()`
    @@ builtin/gc.c: int cmd_gc(int argc,
      
     -		gc_before_repack(&opts, &cfg); /* dies on failure */
     +		if (gc_before_repack(&opts, &cfg) < 0)
    -+			exit(127);
    ++			die(NULL);
      		delete_tempfile(&pidfile);
      
      		/*
11:  ff92709bf6c ! 12:  291498849b8 builtin/maintenance: fix locking race when handling "gc" task
    @@ builtin/gc.c: int cmd_gc(int argc,
     +			}
      
     -		if (gc_before_repack(&opts, &cfg) < 0)
    --			exit(127);
    +-			die(NULL);
     -		delete_tempfile(&pidfile);
     +			if (gc_before_detach(&opts, &cfg) < 0)
    -+				exit(127);
    ++				die(NULL);
     +			delete_tempfile(&pidfile);
     +		}
      

---
base-commit: 845c48a16a7f7b2c44d8cb137b16a4a1f0140229
change-id: 20250527-b4-pks-maintenance-ref-lock-race-11ae5d68e06f





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux