Try to shrink sizes of large Git repository Back
Wow, recently I have found that the size of this project has become so large, which is upper than 300 Mb. That's terrible as it means that a lot of time has been cost when cloning, especially when I need to build this as a site with CI.
After some checking, I found that .git/objects/pack
was so large, which was a file used for recording any commit objects in the repository. Why so large? I have checked the 10 biggest commits I have committed before by running the following command:
git verify-pack -v .git/objects/pack/pack-xxx.idx | sort -k 3 -n | tail -10
And see what files are corresponding to this commit, by pasting SHA hash:
git rev-list --objects --all | grep xxx
Finally find that I have committed some large files and delete after, which should be recorded as a large object here. It is very dirty but Git has to record so that it can know the history. However, we do not really need those objects. How to shrink such sizes?
In a common way, we will use git filter-branch
to remove cache and garbage collect them by Git:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.mov' -- --all
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc --aggressive --prune=now
However, it is SLOW. Fortunately, bfg-repo-cleaner has worked great for this situation, which specify claim that it is 10~740x faster than using git filter-branch
.
-
Export Function:
# ~/.bash_aliases function bfg() { java -jar /mnt/e/bfg-1.13.1.jar $@ } export -f bfg
-
Call it under the folder:
cd repo && bfg --strip-blobs-bigger-than 100M .
-
Garbage collecting unused objects:
git reflog expire --expire=now --all && git gc --prune=now --aggressive
-
Update remote
git push