On Sat, May 10, 2025 at 9:05 AM Gabriel Scherer
<Gabriel.Scherer@xxxxxxx> wrote:
>
> Dear git list,
>
> sparse-checkout interacts badly with symlinks within a git repository:
> if b/file is a symlink to a/file, and the user asks for a
> sparse-checkout with only b/, they get a dead link (b/file points to
> nothing).

That's what I'd expect.

> I initially assumed that replacing a file by a symlink to another file
> with the same content would not be observable by other users of the
> repository. This assumption is incorrect in presence of sparse checkouts.
>
> I would find it natural to have sparse-checkout "follow symlinks". When
> checking b/file as the user requests, git would notice that it is a
> symlink and do one of the following:
>
> 1. if the link target a/file is not in the specified sparse checkout
>    set, copy its content instead of creating a dead symlink
>    (Downside: this could lead to duplication if several in-checkout
>    files point to a/file.)

And the file would immediately show as modified in status, which seems
like a rather negative surprise. If someone does a `git add -u` or
similar, they'd convert the symlink to a regular file, which could be
another form of gotcha.

> 2. or add a/file to the sparse checkout set
>    (Note: simply checking it out silently is not enough as 'reapply'
>    would then drop it)

This is a solution that does not work with the default cone mode. It
may also surprise users who expected the sparse checkout rules to be
something entirely under their control.

Both solutions would also interact rather poorly with sparse indexes;
either looks to me like a bit of a foot-gun for them.

> Does this sound reasonable to you? Would you have recommendations on
> what the interface for such a feature should look like?
> - which of the alternatives above would you recommend?

Honestly, neither.
The problem isn't limited to symlinks; some examples:

  * you could have a script in one part of the checkout that tries to
    invoke a script in the other part
  * you could have a source code file in the non-sparse part that has a
    directive to include/import/require source code in the
    non-sparse-checkout

...and there are many other ways files could depend on others.
Symlinks are only special in that they require no programming or other
knowledge to determine that there is a dependency between files.

I'd rather continue to follow the expectation that users of sparse
checkouts need to determine the relevant set of dependencies and
determine which sparsity rules make sense in their repo. I suspect
that each repo might be somewhat special here, and thus each might have
their own tool for creating sparse-checkouts using repo-specific
knowledge (e.g. "I want moduleA plus whatever it depends upon") which
their repo-specific tool then translates into the appropriate set of
paths or patterns to use. symlinks would be just one of many kinds of
dependencies that such a tool would consider.

I understand that some repos might be big enough that users want to
use sparse-checkouts, but not big enough that one of the developers
wants to write such a tool. Still, I'd rather not attempt dependency
analysis in git[*], and instead require the users to do the dependency
analysis.

> - should this be enabled only by a new configuration or command-line
>   option (to which subcommand?), how would you name it?
>
> Thanks in advance
>
>
> ## More details on the use-case
>
> I'm trying to reduce the working directory size of a gigabyte-large git
> repository ( https://github.com/typst/packages ) which contains a
> substantial amount of duplicated files, by replacing duplicates by
> symlinks.
> The repository uses a continuous integration script to run automated
> tests on each proposed change, which uses sparse-checkout on only the
> directories listed as containing modified files. (The directories
> correspond to independent "packages" so it makes sense to check them
> separately.) This breaks when the modified directories contain symlinks
> to other, non-modified directories.

I know it's not quite what you want to hear, but I believe a better
solution here is to have your script check for the dependencies it
needs (via symlinks, in this case) and include those dependencies in
the sparse-checkout it creates.

Hope that helps,
Elijah

[*] I'll add a slight carve-out to this statement if there was a
git-specific way to declare dependencies that we can then parse. Such
a thing has been proposed before; see
https://lore.kernel.org/git/pull.627.git.1588857462.gitgitgadget@xxxxxxxxx/ .
However, multiple gotchas were identified that derailed that proposal,
so those would need some solutions. Even if we were to do that,
though, you'd still have to specify the dependency explicitly in some
additional file rather than just depending upon the symlink. Further,
that particular proposal would have only worked with cone mode which
goes against your specific request here.
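
For illustration only, here is a rough sketch of how a CI script might
do the symlink part of that dependency analysis itself. It is not part
of git and not something from the typst/packages CI; the function names
read_symlinks and dirs_with_symlink_deps are made up for this example.
It assumes cone mode, so it adds the directory containing each link
target rather than the target file, and it assumes paths and link
targets are valid UTF-8.

```python
import posixpath
import subprocess


def read_symlinks(rev="HEAD"):
    """Return {link_path: link_target} for every symlink recorded in `rev`.

    Reads tree objects rather than the working directory, so links that
    are not checked out are still found.
    """
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-z", rev],
        check=True, capture_output=True,
    ).stdout.decode()
    links = {}
    for entry in out.split("\0"):
        if not entry:
            continue
        # Each entry is "<mode> <type> <object>\t<path>".
        meta, path = entry.split("\t", 1)
        mode, _type, oid = meta.split()
        if mode == "120000":  # git's file mode for a symlink
            # The blob content of a symlink entry is the link target.
            links[path] = subprocess.run(
                ["git", "cat-file", "blob", oid],
                check=True, capture_output=True,
            ).stdout.decode()
    return links


def _covered(directory, dirs):
    """True if `directory` is one of `dirs` or lies underneath one of them."""
    return any(directory == d or directory.startswith(d + "/") for d in dirs)


def dirs_with_symlink_deps(wanted_dirs, symlinks):
    """Expand `wanted_dirs` with the directories their symlinks point into.

    Links are followed transitively: a directory pulled in as a dependency
    is scanned for symlinks as well. Absolute targets and targets that
    escape the repository are ignored.
    """
    result = {d.strip("/") for d in wanted_dirs}
    pending = list(result)
    while pending:
        prefix = pending.pop() + "/"
        for link, target in symlinks.items():
            if not link.startswith(prefix) or target.startswith("/"):
                continue
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(link), target))
            if resolved == ".." or resolved.startswith("../"):
                continue  # points outside the repository
            dep_dir = posixpath.dirname(resolved)
            # An empty dep_dir means the target sits at the top level,
            # so there is no directory to add for it.
            if not dep_dir or _covered(dep_dir, result):
                continue
            result.add(dep_dir)
            pending.append(dep_dir)
    return sorted(result)
```

The CI script would then pass the returned list to
`git sparse-checkout set` instead of the bare list of modified
directories, e.g. dirs_with_symlink_deps(["b"], read_symlinks()).
Other kinds of dependencies (scripts calling scripts, includes, and so
on) would still need repo-specific handling.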