On Fri, Mar 28, 2025 at 04:41:05PM +0100, Johannes Schindelin wrote:
> On Fri, 28 Mar 2025, Patrick Steinhardt wrote:
> > On Wed, Mar 26, 2025 at 01:20:12PM +0100, Johannes Schindelin wrote:
> > > I know that e.g. PostgreSQL used this undocumented function at least at
> > > some stage, but SQLite avoided it by introducing a simple poll strategy.
> > > We could also do that, but if there is already code in the reftable
> > > library that skips doing things if a `.lock` file exists, then doing the
> > > same if the `.lock` file cannot be created, too, should be a safe argument
> > > to make.
> >
> > I did stumble over the PostgreSQL patch at one point indeed, yeah.
> >
> > Thanks for the pointer to SQLite. It indeed has the following snippet:
> >
> >     #define winIoerrCanRetry1(a) (((a)==ERROR_ACCESS_DENIED)        || \
> >                                   ((a)==ERROR_SHARING_VIOLATION)    || \
> >                                   ((a)==ERROR_LOCK_VIOLATION)       || \
> >                                   ((a)==ERROR_DEV_NOT_EXIST)        || \
> >                                   ((a)==ERROR_NETNAME_DELETED)      || \
> >                                   ((a)==ERROR_SEM_TIMEOUT)          || \
> >                                   ((a)==ERROR_NETWORK_UNREACHABLE))
> >
> > The function gets used via `winRetryIoerr()`, which is used in various
> > I/O functions to retry the operation, including `winOpen()` to open or
> > create a file. And it indeed uses a rather simple polling system there
> > where it sleeps for 25ms up to 10 times.
> >
> > This certainly is something we could implement in `mingw_open()`: when
> > we see that `CreateFileW()` has returned any of the above errors we
> > simply retry the operation. It wouldn't fix the race itself, but it
> > would hopefully make it less likely to hit. If you would be okay with
> > such a solution I can implement it.
> >
> > Also, one thing to note: this problem isn't caused by the reftable
> > library, it's caused by the lockfile subsystem. So if we don't want to
> > do this in `mingw_open()`, any self-contained fix should go into the
> > lockfile system, not into the reftable library, because we may hit the
> > same symptoms anywhere else where we race around creation/deletion of a
> > lockfile. We just happen to hit this case in the reftable library
> > because the test is intentionally stress-testing and racing this code
> > path.
>
> As I mentioned, I had hoped that we could address this at another layer.
>
> But let's move forward with the `RtlGetLastNtStatus()` solution because,
> as you correctly pointed out, it is the only solution so far that lets Git
> determine precisely whether the underlying problem is a pending delete.
>
> I had only one remaining concern: If `RtlGetLastNtStatus()` has not yet
> been initialized, would we not potentially overwrite the last NTSTATUS
> while initializing it? And the answer I can give to myself is: unlikely.
> The `ntdll` is already loaded, so there won't be an update to the
> `NTSTATUS` there, likewise the `GetProcAddress()` call won't fail and
> hence also not update it.
>
> So let's go ahead with v2!

Great, thanks a lot for your expertise and guidance!

Patrick
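For illustration, the SQLite-style polling strategy discussed above (retry the open while it fails with one of the errors listed in `winIoerrCanRetry1()`, sleeping 25ms between attempts, at most 10 times) can be sketched in portable C roughly as follows. This is a minimal sketch under stated assumptions, not SQLite's or Git's code, and the thread settled on the `RtlGetLastNtStatus()` approach instead. The `io_error` enum, `retry_io()`, the sleep hook and `scripted_step()` are hypothetical names; the enum values only stand in for the Win32 `ERROR_*` codes that real code would read from `GetLastError()` after `CreateFileW()` fails.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the Win32 error codes listed in SQLite's
 * winIoerrCanRetry1(); real code would inspect GetLastError() instead.
 */
enum io_error {
	IO_OK = 0,
	IO_ACCESS_DENIED,
	IO_SHARING_VIOLATION,
	IO_LOCK_VIOLATION,
	IO_DEV_NOT_EXIST,
	IO_NETNAME_DELETED,
	IO_SEM_TIMEOUT,
	IO_NETWORK_UNREACHABLE,
	IO_FILE_NOT_FOUND /* example of an error that is never retried */
};

#define MAX_RETRIES 10    /* "up to 10 times" */
#define RETRY_DELAY_MS 25 /* "sleeps for 25ms" */

/* Same set of retryable errors as winIoerrCanRetry1(). */
static bool can_retry(enum io_error err)
{
	switch (err) {
	case IO_ACCESS_DENIED:
	case IO_SHARING_VIOLATION:
	case IO_LOCK_VIOLATION:
	case IO_DEV_NOT_EXIST:
	case IO_NETNAME_DELETED:
	case IO_SEM_TIMEOUT:
	case IO_NETWORK_UNREACHABLE:
		return true;
	default:
		return false;
	}
}

/* The operation being retried, e.g. a hypothetical CreateFileW() wrapper. */
typedef enum io_error (*io_op_fn)(void *ctx);

/* Sleep hook; a Windows build could wrap Sleep(). NULL skips sleeping. */
typedef void (*sleep_fn)(unsigned ms);

/*
 * Run `op`; while it fails with a retryable error, sleep and try again,
 * giving up after MAX_RETRIES retries. Stores the number of attempts in
 * *attempts (if non-NULL) and returns the result of the last attempt.
 */
static enum io_error retry_io(io_op_fn op, void *ctx, sleep_fn sleep_cb,
			      int *attempts)
{
	int retries = 0;
	enum io_error err;

	for (;;) {
		err = op(ctx);
		if (err == IO_OK || !can_retry(err) || retries >= MAX_RETRIES)
			break;
		if (sleep_cb)
			sleep_cb(RETRY_DELAY_MS);
		retries++;
	}
	if (attempts)
		*attempts = retries + 1;
	return err;
}

/*
 * Hypothetical scripted operation for exercising the loop: each call
 * returns the next result from `results`, repeating the last one.
 */
struct scripted_op {
	const enum io_error *results;
	size_t len;
	size_t calls;
};

static enum io_error scripted_step(void *ctx)
{
	struct scripted_op *s = ctx;
	size_t i = s->calls < s->len ? s->calls : s->len - 1;

	s->calls++;
	return s->results[i];
}
```

Passing `NULL` as the sleep hook skips the delays, which is convenient for exercising the loop; a Windows build might instead pass a small wrapper around `Sleep()`. With `scripted_step()` returning `IO_SHARING_VIOLATION` on every call, `retry_io()` gives up after 11 attempts: the initial one plus 10 retries. As noted in the thread, such polling only makes the race less likely to hit; it cannot tell a pending delete apart from other causes of the same error.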