Add --blob-exec to run system commands for each blob #169
I'm interested in whether there are metrics that show how much faster this is than running the equivalent filter-branch, if anyone has those numbers.
Assuming that all files are replaced with new versions, this is my estimate of the cost for each. Given my experience with this branch (running on Linux) and with filter-branch, my view is that the savings are a function that depends on:
With BFG, the cost of the replacement is a function of (k1 * C + k2 * UF) / (k3 * SpeedFileSystem).
With filter-branch, this is what happens. For every commit:
For example, say a filter replaces the contents of every file in every revision. For revision 1 the contents are checked out. Let's say we have n1 files. This process basically makes the filter-branch cost include:
So BFG's cost is proportional to the average number of files modified per commit multiplied by the number of commits (basically, the number of unique blobs found in the repo), while the cost of filter-branch is proportional to the average number of files in the repo multiplied by the number of commits.

Let's say we can process 100 files per second, and we have 1M commits, with an average of 10k files in the repo and 10 files modified per commit (I am thinking of Linux here). Let us assume that the cost of checking out the files (filter-branch) and processing the commits (BFG and filter-branch) is negligible (it is not, but bear with me).

If my numbers are right, with filter-branch it would take (1M * 10k)/100 seconds = 10^8 seconds => about 1157 days => roughly 3 years. With BFG it would take (1M * 10)/100 seconds = 10^5 seconds => about 28 hours.

So, in conclusion: BFG processing time is O(#commits * #avgNumberOfFilesPerCommit), while filter-branch processing time is O(#commits * #avgNumberOfFilesInRepo).
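The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this comment: the function and variable names are made up for illustration, and it keeps the same simplifying assumptions (a flat 100 file operations per second, checkout and commit-processing costs ignored).

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def filter_branch_seconds(commits, avg_files_in_repo, ops_per_second):
    """Every commit touches every file in the tree (hypothetical cost model)."""
    return commits * avg_files_in_repo / ops_per_second


def bfg_seconds(commits, avg_files_modified_per_commit, ops_per_second):
    """Each unique blob is processed once (hypothetical cost model)."""
    return commits * avg_files_modified_per_commit / ops_per_second


# Numbers taken from the comment above (Linux-sized repository).
commits = 1_000_000
avg_files_in_repo = 10_000
avg_files_modified_per_commit = 10
ops_per_second = 100

fb = filter_branch_seconds(commits, avg_files_in_repo, ops_per_second)
bfg = bfg_seconds(commits, avg_files_modified_per_commit, ops_per_second)

print(f"filter-branch: {fb:.0f} s = {fb / SECONDS_PER_DAY:.0f} days")
print(f"BFG:           {bfg:.0f} s = {bfg / SECONDS_PER_HOUR:.1f} hours")
print(f"ratio:         {fb / bfg:.0f}x")
```

Under these assumptions the ratio is simply the average number of files in the repo divided by the average number of files modified per commit.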
May I be allowed to use and learn?
This is a rebase of Paul Draper's implementation of blob-exec:
#83
Is there any chance of merging this? I'm currently using it to replace tabs with spaces, exactly as Paul described in his use cases.
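For readers wondering what such a per-blob command might look like, here is a minimal sketch of a tabs-to-spaces filter in Python. It is not taken from this PR: the function names and the tab width of 4 are assumptions, and how `--blob-exec` actually hands blob contents to a command is not shown in this thread, so the sketch only operates on generic binary streams.

```python
import io


def expand_tabs(data: bytes, tabsize: int = 4) -> bytes:
    """Replace tabs with spaces, aligning to tab stops (tabsize is an assumption)."""
    return data.expandtabs(tabsize)


def filter_stream(src, dst, tabsize: int = 4) -> None:
    """Read a whole blob from a binary stream and write the converted bytes.

    Hypothetical helper: wiring it to stdin/stdout or to a file path depends
    on how the calling tool passes blobs to the command.
    """
    dst.write(expand_tabs(src.read(), tabsize))


# Usage example with in-memory streams standing in for a blob.
source = io.BytesIO(b"def f():\n\treturn 1\n")
target = io.BytesIO()
filter_stream(source, target)
print(target.getvalue().decode())
```

Working on bytes rather than text avoids guessing each blob's encoding, although a real filter would probably also want to skip binary files.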