Speed up git-annex operations
Turns out I was misled by git-annex v8's default configuration of

```
# .git/info/attributes
*  filter=annex
```

This is wildly slow. It means git-annex processes every single file on every single commit. It seems to have optimizations, enough to avoid rehashing unchanged files, but even just opening each file to check it is slow on a dataset this large.

In our application we don't want to annex every single file. That's painfully wasteful, and it's not how I set it up. It turns out I can do one better, though: by letting git-annex get its fingers on *only* the files we want to annex, and making sure git processes the rest directly, commit times are hugely improved. It isn't actually necessary for git-annex to see all the files, and it is happy to accept this `.gitattributes`. It only writes its overly greedy default to a clone's `.git/info/attributes` if there is no preexisting `.gitattributes`, presumably in an attempt to give a consistent user experience (at the hidden cost of performance).
kousu authored and jcohenadad committed Dec 10, 2020
1 parent 31e16ae commit 7faa5b1
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions .gitattributes
@@ -1,3 +1,2 @@
-* filter=annex annex.largefiles=nothing
-*.nii annex.largefiles=anything
-*.nii.gz annex.largefiles=anything
+*.nii filter=annex annex.largefiles=anything
+*.nii.gz filter=annex annex.largefiles=anything
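
One way to confirm that only the NIfTI patterns are routed through the annex filter is to ask git which `filter` attribute it resolves for a few paths. The sketch below is not part of the commit: it builds a scratch repository, writes the two rules kept by this change, and queries them with `git check-attr`. The file paths are hypothetical examples, and git-annex itself does not need to be installed for the check.

```shell
#!/bin/sh
set -eu

# Scratch repository; nothing here touches a real dataset.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# The two rules this commit leaves in .gitattributes.
cat > .gitattributes <<'EOF'
*.nii filter=annex annex.largefiles=anything
*.nii.gz filter=annex annex.largefiles=anything
EOF

# Hypothetical paths; check-attr does not require the files to exist.
git check-attr filter -- \
    sub-01/anat/T1w.nii.gz \
    sub-01/anat/T1w.json \
    README.md
```

With these rules, the `.nii.gz` path should report `filter: annex`, while the JSON sidecar and the README should report `filter: unspecified`, meaning git handles them directly without invoking git-annex.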
