Re: [PATCH] submodule: truncate the oid when fetching commits

On 2025-08-18 at 09:30:51, Michael Schroeder wrote:
> On Thu, Aug 14, 2025 at 10:16:24PM +0000, brian m. carlson wrote:
> > On 2025-08-14 at 15:06:32, Michael Schroeder wrote:
> > > If a submodule uses a different hash algorithm than used in
> > > the main repository, the recorded submodule commit is padded
> > > with zeros. This is usually not a problem as the default is to
> > > do submodule clones non-shallow and the commit can be found
> > > in the local objects.
> > 
> > This should not even work at all.  It may currently behave as you
> > suggest when the main repository is SHA-256 and the submodule is SHA-1,
> > but it will corrupt the data if the submodule is SHA-256 and the main
> > repository is SHA-1, since then the data will be truncated.
> 
> But it works, and I'm pretty sure people already use it. If you
> have a sha1 main repo and a sha256 submodule, git will truncate
> the commit when recording the gitlink. The checkout done by
> git submodule update will work as it does the normal prefix matching.

Unfortunately, that will break with the interoperability work.  The
protocol will learn to convert object IDs on the server side by
announcing the mapping.  When the object ID doesn't exist, the client
can't remap the object, so it will die and the process will fail.
By recording truncated or padded object IDs, you'll end up with a
repository that you can never use interoperability code on, ever,
without rewriting history.  There's no way around this problem because
we don't keep the object format in trees, so we can't distinguish
between a SHA-256 submodule that happens to end in 24 zeros and a SHA-1
submodule.
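
To make the two failure modes concrete, here is a minimal Python sketch.
It is not Git code: the helpers pad_to_sha256, truncate_to_sha1, and
remap, as well as the object IDs, are made up for illustration, and
remap is only an assumed stand-in for an exact-match translation step,
not the actual interoperability implementation.

```python
SHA1_HEX_LEN = 40    # SHA-1 object IDs: 20 bytes, 40 hex digits
SHA256_HEX_LEN = 64  # SHA-256 object IDs: 32 bytes, 64 hex digits


def pad_to_sha256(sha1_hex):
    """Zero-pad a SHA-1 ID to SHA-256 width (SHA-256 main repo, SHA-1 submodule)."""
    if len(sha1_hex) != SHA1_HEX_LEN:
        raise ValueError("expected a 40-digit SHA-1 object ID")
    return sha1_hex + "0" * (SHA256_HEX_LEN - SHA1_HEX_LEN)


def truncate_to_sha1(sha256_hex):
    """Cut a SHA-256 ID to SHA-1 width (SHA-1 main repo, SHA-256 submodule)."""
    if len(sha256_hex) != SHA256_HEX_LEN:
        raise ValueError("expected a 64-digit SHA-256 object ID")
    return sha256_hex[:SHA1_HEX_LEN]


def remap(oid_hex, mapping):
    """Hypothetical translation step: only full, exact object IDs can be mapped."""
    try:
        return mapping[oid_hex]
    except KeyError:
        raise LookupError(f"cannot remap object {oid_hex}") from None


# Case 1: a padded SHA-1 ID is indistinguishable from a SHA-256 ID that
# happens to end in 24 zeros, because the tree entry records no format.
sha1_submodule_oid = "ab" * 20
sha256_submodule_oid = "ab" * 20 + "0" * 24
print(pad_to_sha256(sha1_submodule_oid) == sha256_submodule_oid)

# Case 2: a truncated SHA-256 ID still works as a local prefix match,
# but an exact-match translation has nothing to look up.
full_oid = "cd" * 32
truncated = truncate_to_sha1(full_oid)
print(full_oid.startswith(truncated))

mapping = {full_oid: "ef" * 20}  # made-up SHA-256 -> SHA-1 entry
try:
    remap(truncated, mapping)
except LookupError as err:
    print("fatal:", err)
```

Both comparisons print True, and the final lookup fails: prefix matching
hides the problem locally, while anything that needs the full object ID
cannot recover it from the gitlink.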

The entire hash function transition has mandated exactly one object
format on disk and in data structures from the very beginning:

    This affects both object names and object content -- both the names
    of objects and all references to other objects within an object are
    switched to the new hash function.

I apologize that I didn't think about this problem and make the code die
in this case earlier, but it's not a supported configuration and it will
absolutely break in the future.  Sorry to be the bearer of bad news.

> > The proper way for this to work is that the SHA-1 version of the
> > repository stores submodules in their SHA-1 states and the SHA-256
> > version of the repository stores submodules in their SHA-256 states.
> 
> You mean by using "compatObjectFormat"? I couldn't make that work,
> but maybe I missed something. Anyway, I think this also will not
> work for shallow clones.

There is interoperability code only for loose objects now.  The code
that handles packs and interoperability between repositories is in a
branch on my remote.  It's work that I'm doing for a talk at Git Merge
and I will send it upstream when it's ready.

Right now, I'm working on shallow clones, and submodules are next.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
