Haskell implementation of a git remote helper to store git repositories on IPFS.
Reverse-engineered from the Go implementation, usable as both an executable and a library. Future development is going to focus on the latter.
- Build the git-remote-ipfs executable using either stack install or cabal v2-install (requires cabal-install >= 2.4.0.0).
- Make sure the executable is on your $PATH: export PATH=$HOME/.local/bin:$PATH for stack, or export PATH=$HOME/.cabal/bin:$PATH for cabal.
- Download and install the go-ipfs binary, and make sure the ipfs daemon is running.
To push a (branch of an) existing git repo to IPFS, using .git/config to keep track of pushes:
$ # Add a new remote
$ git remote add ipfs ipfs://
$ # Push master
$ git push ipfs master
$ # Inspect the IPFS path
$ git remote get-url ipfs
ipfs://ipfs/Qm....
Note that every push yields a new IPFS hash. The remote helper will rewrite the remote URL locally to keep track of the latest remote refs. To collaborate with other people (i.e. clone/pull from another machine), this URL needs to be communicated out-of-band. The output of git remote get-url can be used to git clone.
An alternative is to use IPNS, which provides a stable name the remote helper can update whenever the remote refs are updated:
$ repoid=$(ipfs key gen --type=ed25519 myrepo)
$ # Publish a pointer to the empty directory initially
$ ipfs name publish --key $repoid QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn
$ git remote add ipns ipfs://ipns/$repoid
Note that resolving and updating an IPNS name is rather slow on the main IPFS network. Also note that only the owner of the IPNS name (that is, the keypair) can update it.
A number of options regarding IPFS can be configured via git-config:
- ipfs.apiurl: The URL of the IPFS daemon. Default: http://localhost:5001
- ipfs.maxconnections: The maximum number of connections to open to the IPFS daemon. Default: 30
- ipfs.maxblocksize: The maximum block size (in bytes) supported by bitswap. Most users will not want to change this. Default: 2048000
The API URL can be overridden per-remote using the key remote.<remote name>.ipfsapiurl, e.g.:
$ git config remote.origin.ipfsapiurl http://127.0.0.2:5001
Additionally, if the environment variable IPFS_API_URL is set, it will be used instead of any git-config settings.
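The lookup order described above can be sketched as a small pure function. This is an illustration in Python, not the project's Haskell code: the function name resolve_api_url and the dictionary-based inputs are hypothetical, and treating an empty value as "unset" is an assumption.

```python
DEFAULT_API_URL = "http://localhost:5001"


def resolve_api_url(remote_name, git_config, environ):
    """Pick the IPFS API URL, highest precedence first.

    git_config and environ are plain dicts standing in for git-config
    and the process environment (an assumption made for this sketch).
    """
    candidates = [
        environ.get("IPFS_API_URL"),                         # environment wins
        git_config.get(f"remote.{remote_name}.ipfsapiurl"),  # per-remote key
        git_config.get("ipfs.apiurl"),                       # global key
    ]
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return DEFAULT_API_URL
```

For example, with ipfs.apiurl and remote.origin.ipfsapiurl both set, the per-remote value is returned for the remote origin, unless IPFS_API_URL is present in the environment.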
IPFS blocks can be created with the git-raw CID format, which allows IPFS to interpret the data as loose git objects. When created with sha1 as the (multihash) hash function, the block's CID corresponds to the SHA1 hash of the git object, i.e. one can be recovered from the other. Crucially, this allows the SHA1 references embedded in a loose git object (e.g. parents and tree of a commit) to be traversed given a head reference.
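The correspondence between a git SHA1 and a git-raw CID can be sketched as follows. This is an illustration in Python, not the project's Haskell code. It assumes a CIDv1 rendered in lowercase base32 (multibase prefix b) and the multicodec codes 0x78 for git-raw and 0x11 for sha1; the function names are hypothetical.

```python
import base64
import hashlib

SHA1_LENGTH = 20
# CIDv1 prefix: version 1, codec git-raw (0x78), multihash sha1 (0x11), digest length 20.
# All four values are below 0x80, so each fits in a single varint byte.
CID_PREFIX = bytes([0x01, 0x78, 0x11, SHA1_LENGTH])


def git_object_sha1(obj_type, content):
    """SHA1 of a loose git object: header "<type> <size>\\0" followed by the content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def sha1_to_cid(sha1_hex):
    """Wrap a git SHA1 into a base32 CIDv1 string."""
    digest = bytes.fromhex(sha1_hex)
    if len(digest) != SHA1_LENGTH:
        raise ValueError("expected a 20-byte SHA1 digest")
    encoded = base64.b32encode(CID_PREFIX + digest).decode("ascii")
    return "b" + encoded.lower().rstrip("=")


def cid_to_sha1(cid):
    """Recover the git SHA1 from a CID produced by sha1_to_cid."""
    if not cid.startswith("b"):
        raise ValueError("only base32 (multibase 'b') CIDs are handled here")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding stripped during encoding
    raw = base64.b32decode(body)
    if raw[:len(CID_PREFIX)] != CID_PREFIX:
        raise ValueError("not a git-raw/sha1 CIDv1")
    return raw[len(CID_PREFIX):].hex()
```

Because the prefix is constant, every CID built this way starts with baf4bcf, and the remaining characters encode nothing but the SHA1 digest, which is why the mapping is reversible.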
In order to obtain the head reference, IPLD links are created corresponding to the refs/heads directory hierarchy. Note that adding links to an IPFS object changes its hash: each push therefore results in a new object (CID), which must be retained in order to clone or pull.
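Why every push yields a new CID can be illustrated with a toy content-addressed object. The sketch below is in Python and deliberately does not reproduce IPFS's real object encoding: it hashes a JSON serialisation of a link table with SHA-256, and the names anchor_id and with_ref are hypothetical.

```python
import hashlib
import json


def anchor_id(links):
    """Toy content address: hash of the canonically serialised link table."""
    encoded = json.dumps(links, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def with_ref(links, ref, target):
    """Return a copy of the link table with one ref pointing at a new target."""
    updated = dict(links)
    updated[ref] = target
    return updated


before = {"refs/heads/master": "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"}
after = with_ref(before, "refs/heads/master", "4b825dc642cb6eb9a060e54bf8d69288fbee4904")

# The identifier is derived from the links, so moving a head changes it.
assert anchor_id(before) != anchor_id(after)
```

The same property holds for the real object: its identifier is a function of its links, so there is no way to update a head in place without producing a new identifier.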
Which git objects need to be pushed or fetched is determined via the git remote helper protocol and by inspecting the local git repo and the remote refs.
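One way to picture this selection is as a set difference over the object graph: everything reachable from the local heads, minus everything already reachable from the remote refs. The Python sketch below works on a toy graph (a dict from object id to the ids it references); it is not the project's Haskell code, the function names are hypothetical, and the actual helper may choose objects differently.

```python
def reachable(graph, roots):
    """All object ids reachable from roots by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        # Objects unknown to the graph simply have no outgoing references.
        stack.extend(graph.get(oid, ()))
    return seen


def objects_to_push(graph, local_heads, remote_heads):
    """Objects the remote is missing, given what its refs already reach."""
    return reachable(graph, local_heads) - reachable(graph, remote_heads)


# Toy history: commit c2 has parent c1; each commit references a tree, trees reference blobs.
graph = {
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t2": ["b1", "b2"],
    "t1": ["b1"],
}
print(sorted(objects_to_push(graph, ["c2"], ["c1"])))  # ['b2', 'c2', 't2']
```

Fetching is the mirror image: start from the remote refs and subtract what the local repo already has.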
- It is currently unclear how to keep track of the latest "anchor" object (the one linking to the most recent heads). The obvious solution is to use IPFS' native name resolution mechanism (IPNS), yet IPNS names have a very limited lifetime on the main IPFS network.
- IPFS blocks have a maximum size of 2MB. To work around this limitation, objects exceeding this limit are created as regular IPFS objects, linked back to the "anchor" object under the objects/ hierarchy. When fetching, those large objects are given precedence over blocks, so as to not stall forever attempting to fetch blocks which the network does not replicate.
- The approach to keep all git objects content-addressable in IPFS is nice conceptually, but terribly inefficient: regular git resorts to packfiles, which use delta encoding and compression in order to obtain a more space-efficient on-disk and wire format. There is, however, no global optimum of how to pack any given git repo, and in fact git re-packs occasionally, as it sees fit. It is thus unclear how to optimise git storage in a fully distributed setting lacking online coordination.
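The block size workaround from the list above can be sketched as two small decisions: how to store an object when pushing, and in which order to fetch. This is an illustration in Python, not the project's Haskell code. The names storage_kind and plan_fetch are hypothetical, and the set of large object ids is assumed to be known from the links under objects/.

```python
MAX_BLOCK_SIZE = 2048000  # default of ipfs.maxblocksize, in bytes


def storage_kind(size, max_block_size=MAX_BLOCK_SIZE):
    """Decide how an object of the given size (in bytes) is stored when pushing."""
    # Only objects exceeding the limit are stored as regular IPFS objects.
    return "large-object" if size > max_block_size else "block"


def plan_fetch(wanted, large_object_ids):
    """Split wanted object ids so that large objects are fetched before blocks."""
    large = [oid for oid in wanted if oid in large_object_ids]
    blocks = [oid for oid in wanted if oid not in large_object_ids]
    return large, blocks
```

Consulting the large objects first means the helper never asks the network for an oversized block that no peer would serve.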
The project employs an end-to-end test suite, which is disabled by default as it requires a running IPFS daemon. The preferred way to run it is against a local IPFS network, as this speeds up IPNS resolution considerably.
$ docker run --detach --rm --name=ipfs-test-network --publish 19301:5001 \
gcr.io/opensourcecoin/ipfs-test-network
$ IPFS_API_URL=http://127.0.0.1:19301 \
stack test --flag git-remote-ipfs:with-e2e-tests git-remote-ipfs