Re: [BUG] Hard links to large files cause unexpected refresh

On 25/05/22 04:45PM, Roel Sengers wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
> 
> In my workflow there is a step which hard-links about 80GB of such files to
> locations where a 3rd party application can find them. However, after this
> hard-linking step, git status (or other git commands) hang for a long time.
> Note that making a copy of the file does not cause such slowdowns, which was
> surprising to me.
> 
> I am using Git with git-lfs for these large files, however I was able to
> reproduce this issue without LFS enabled.
> 
> The script below reproduces the environment which triggers the issue:
> 
> mkdir git-hardlink-test
> cd git-hardlink-test
> 
> git init
> echo ignore.bin > .gitignore
> git add .gitignore
> git commit -m 'Initial commit'
> 
> dd if=/dev/urandom of=file.bin bs=1M count=1000
> git add file.bin
> git commit -m 'Add file.bin'
> 
> # $ cp file.bin ignore.bin; time git status
> # On branch main
> # nothing to commit, working tree clean
> #
> # real	0m0,002s
> # user	0m0,000s
> # sys	0m0,002s
> 
> # $ ln file.bin ignore.bin; time git status
> # Refresh index: 100% (2/2), done.
> # On branch main
> # nothing to commit, working tree clean
> #
> # real	0m16,100s
> # user	0m15,700s
> # sys	0m0,255s
> 
> 
> What did you expect to happen? (Expected behavior)
> 
> After creating a hard link to a checked-in object, I expected the final git
> status to finish in a time that is barely noticeable.
> 
> The file itself is large, so having Git taking its time to refresh the state
> of the working directory would not be suspicious were it not for the fact
> that creating a copy of the same file does not suffer from the same
> performance penalty.

When git-status(1) is run, the index state is checked to see if any
changes have occurred. In the provided example, the contents of
"file.bin" remain unchanged, but creating the hard link does update
its metadata, such as the link count and the file's ctime. You can
see this by running stat(1) on "file.bin" before and after creating
"ignore.bin".
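A quick way to check this yourself is sketched below. It assumes GNU coreutils stat(1) (`-c` with `%h` for the link count and `%Z` for the ctime in whole seconds); BSD and macOS stat use different flags. It runs in a throwaway temporary directory and uses a tiny file instead of the 1000MB one from the report.

```shell
#!/bin/sh
# Show that hard-linking a file changes its link count and ctime
# while leaving its contents alone. Assumes GNU coreutils stat(1).
set -e

dir=$(mktemp -d)
cd "$dir"
printf 'some data\n' > file.bin

# %h = number of hard links, %Z = ctime (seconds since the epoch)
before=$(stat -c '%h %Z' file.bin)

# %Z has one-second resolution, so wait long enough that the
# ctime set by ln is guaranteed to show up as a different value
sleep 2
ln file.bin ignore.bin
after=$(stat -c '%h %Z' file.bin)

echo "links/ctime before ln: $before"
echo "links/ctime after ln:  $after"

cd /
rm -rf "$dir"
```

The link count goes from 1 to 2 and the ctime moves forward, even though nothing was written to "file.bin".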

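Git notices that the stat data cached in the index for "file.bin" (here,
its ctime) no longer matches the file on disk, so it refreshes the index
entry. That means re-reading and hashing the file to confirm its contents
are unchanged, which is slow for a file this large. When the file is
copied instead, "file.bin" itself is not modified, so its entry does not
need to be refreshed.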
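To see the mismatch Git reacts to, `git ls-files --debug` prints the stat data cached in the index for each entry, including a line of the form `ctime: <sec>:<nsec>`. Its documentation notes the format is meant for manual inspection and may change, so the parsing below is an assumption. The sketch uses a throwaway repository in a temporary directory, a small file, and GNU stat(1).

```shell
#!/bin/sh
# Compare the ctime Git cached in the index with the file's current
# ctime after creating a hard link. Assumes git and GNU coreutils stat(1).
set -e

dir=$(mktemp -d)
cd "$dir"
git init -q
printf 'some data\n' > file.bin
git add file.bin   # records file.bin's stat data, including ctime, in the index

sleep 2            # make the later ctime visibly different at 1s resolution
ln file.bin ignore.bin

# Extract the seconds part of the "ctime: <sec>:<nsec>" debug line
index_ctime=$(git ls-files --debug file.bin | sed -n 's/^ *ctime: *\([0-9]*\):.*/\1/p')
disk_ctime=$(stat -c '%Z' file.bin)

echo "ctime in index: $index_ctime"
echo "ctime on disk:  $disk_ctime"

cd /
rm -rf "$dir"
```

The on-disk ctime is now newer than the cached one, which is what makes the next git-status(1) refresh the entry.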

To work around this problem, you could set `core.trustCTime` to `false`,
which tells Git to ignore ctime differences between the index and the
working tree.
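A minimal sketch of applying the workaround follows. The temporary repository exists only so the example is self-contained; in practice you would run the `git config` line inside the affected repository (or use `--global` to apply it to all of your repositories).

```shell
#!/bin/sh
# Tell Git to ignore ctime differences between the index and the
# working tree. The throwaway repository is only for illustration.
set -e

dir=$(mktemp -d)
cd "$dir"
git init -q

# --local writes to this repository's .git/config only
git config --local core.trustCTime false

# Config keys are case-insensitive, so this reads back the same setting
value=$(git config --get core.trustctime)
echo "core.trustCTime = $value"

cd /
rm -rf "$dir"
```

With ctime ignored, Git still compares the other cached stat fields (mtime, size, inode number and so on), so real edits to a file are still noticed; the trade-off is that a change which only touches ctime goes unseen.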

-Justin



