docs: implementation summary (libp2p#135)

* docs: implementation summary
* docs: implementation summary fixes
* docs: better GET description in impl summary
rootstock · Jul 8, 2019 · b36b871
1 parent a8f1df5
commit b36b871

Showing 3 changed files with 71 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,3 @@
-docs
 **/node_modules/
 **/*.log
 test/repo-tests*

diff --git a/README.md b/README.md
@@ -62,6 +62,10 @@ The libp2p-kad-dht module offers 3 APIs: Peer Routing, Content Routing and Peer
 
 `libp2p-kad-dht` provides a discovery service called `Random Walk` (random walks on the DHT to discover more nodes). It is accessible through `dht.randomWalk` and exposes the [Peer Discovery interface](https://github.com/libp2p/interface-peer-discovery).
 
+### Implementation Summary
+
+A [summary](docs/IMPL_SUMMARY.MD) of the algorithms and API for this implementation of Kademlia.
+
 ## Contribute
 
 Feel free to join in. All welcome. Open an [issue](https://github.com/libp2p/js-libp2p-ipfs/issues)!

diff --git a/docs/IMPL_SUMMARY.MD b/docs/IMPL_SUMMARY.MD
@@ -0,0 +1,67 @@
+# js-libp2p-kad-dht
+
+js-libp2p-kad-dht is a JavaScript implementation of the Kademlia DHT with some features of S/Kademlia. A "provider" node uses the DHT to advertise that it has a particular piece of content, and "querying" nodes will search the DHT for peers that have a particular piece of content. Content is modeled as a value that is identified by a key, where the key and value are Buffers.
+
+#### DHT Identifiers
+
+The DHT uses a sha2-256 hash for identifiers:
+- For peers the DHT identifier is the hash of the [PeerId][PeerId]
+- For content the DHT identifier is the hash of the key (e.g. a Block CID)
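
A minimal sketch of the mapping described above, assuming Node's built-in `crypto` module for the sha2-256 digest. The helper name `toDhtId` is hypothetical, and the module itself may compute the hash through a different library.

```javascript
const crypto = require('crypto')

// Hypothetical helper: maps the bytes of a peer id or content key to a DHT id
// by taking the sha2-256 digest, which is always 32 bytes (256 bits).
function toDhtId (bytes) {
  return crypto.createHash('sha256').update(bytes).digest()
}

const contentId = toDhtId(Buffer.from('greeting'))
console.log(contentId.length * 8) // 256
```

Because peers and content are hashed into the same 256-bit space, a content key can be compared directly against peer ids when searching for the nearest peers.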
+
+#### FIND_NODE
+
+`findPeer (PeerId):` [PeerInfo][PeerInfo]
+
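+The address space is so large (256 bits) that there are big gaps between DHT ids, and nodes frequently join and leave the DHT. No node can therefore hold a complete, current view of the network, so a lookup has to move toward the target step by step.
+
+To find a particular node: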
+- the `querying node` converts the [PeerId][PeerId] to a DHT id
+- the `querying node` sends a request to the nearest peers to that DHT id that it knows about
+- those peers respond with the nearest peers to the DHT id that they know about
+- the `querying node` sorts the responses and recursively queries the closest peers to the DHT id, continuing until it finds the node or it has queried all the closest peers.
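
The steps above can be sketched as a loop over an in-memory stand-in for the network. This is an illustration under stated assumptions, not the module's code: `findNode`, `network`, `known` and the limit `k` are hypothetical names, one-byte ids replace real 256-bit ids, and "nearest" is taken to mean the XOR distance that Kademlia uses.

```javascript
// XOR distance between two equal-length ids (the Kademlia metric).
function xorDistance (a, b) {
  const out = Buffer.alloc(a.length)
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i]
  return out
}

// Return a de-duplicated copy of `ids`, nearest to `target` first.
function sortByDistance (ids, target) {
  const unique = new Map(ids.map(id => [id.toString('hex'), id]))
  return Array.from(unique.values()).sort((x, y) =>
    Buffer.compare(xorDistance(x, target), xorDistance(y, target)))
}

// `network` is a hypothetical stand-in for real requests: it maps a peer's
// hex id to the ids that peer would return. `known` is the querying node's
// own list of peers, and `k` caps how many of the closest peers are queried.
function findNode (network, known, target, k = 3) {
  const queried = new Set()
  let candidates = sortByDistance(known, target)
  while (true) {
    const hit = candidates.find(id => id.equals(target))
    if (hit) return hit
    const next = candidates.slice(0, k).find(id => !queried.has(id.toString('hex')))
    if (!next) return null // all of the k closest known peers were queried
    queried.add(next.toString('hex'))
    const learned = network.get(next.toString('hex')) || []
    candidates = sortByDistance(candidates.concat(learned), target)
  }
}

const id = n => Buffer.from([n])
const network = new Map([
  ['08', []],
  ['0c', [id(0x0e)]],
  ['0e', [id(0x0f)]]
])
console.log(findNode(network, [id(0x08), id(0x0c)], id(0x0f)).toString('hex')) // 0f
```

Each round queries the closest peer not yet asked, so the candidate list converges on the target id. The lookup returns `null` once the `k` closest known peers have all been queried without success.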
+
+#### PUT
+
+`put (Key, Value)`
+
+To store a value in the DHT, the `provider node`:
+- converts the key to a DHT id
+- follows the "closest peers" algorithm described under FIND_NODE to find the nearest peers to the DHT id
+- sends the value to those nearest peers
+
+Note that a DHT node will only store values that are accepted by its "validators", configurable functions that validate the key/value pair so that the node can control what kind of content it stores (e.g. IPNS records).
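
As an illustration of the validator idea, the sketch below has a node consult a registry of validator functions before storing a value. Everything here is hypothetical: the `LocalStore` class, the `/greeting/` key prefix, and the rule that the first path segment of the key selects the validator are assumptions made for the example, not the module's configuration format.

```javascript
// Hypothetical registry: record namespace -> function that accepts or rejects.
const validators = new Map([
  // Accept only non-empty values under the made-up "/greeting/" prefix.
  ['greeting', (key, value) => value.length > 0]
])

// "/greeting/en" -> "greeting"; keys without a namespace yield undefined.
function namespaceOf (key) {
  return key.toString().split('/')[1]
}

class LocalStore {
  constructor (validators) {
    this.validators = validators
    this.records = new Map()
  }

  // Store the value only if a validator exists for the key and accepts it.
  put (key, value) {
    const validate = this.validators.get(namespaceOf(key))
    if (!validate || !validate(key, value)) return false
    this.records.set(key.toString('hex'), value)
    return true
  }
}

const store = new LocalStore(validators)
console.log(store.put(Buffer.from('/greeting/en'), Buffer.from('hello'))) // true
console.log(store.put(Buffer.from('/unknown/x'), Buffer.from('hello'))) // false
```

Rejecting keys that have no matching validator is one possible policy. It keeps a node from storing content of a kind it was never configured to handle.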
+
+#### GET
+
+`get (Key): [Value]`
+
+To retrieve a value from the DHT:
+- the `querying node` converts the key to a DHT id
+- the `querying node` follows the "closest peers" algorithm to find the nearest peers to the DHT id
+- at each iteration of the algorithm, if the peer has the value it responds with the value itself in addition to closer peers.
+
+Note that the value for a particular key is stored by many nodes, and these nodes receive `PUT` requests asynchronously, so it's possible that nodes may have distinct values for the same key. For example if node A `PUT`s the value `hello` to key `greeting` and node B concurrently `PUT`s the value `bonjour` to key `greeting`, some nodes close to the key `greeting` may receive `hello` first and others may receive `bonjour` first.
+
+Therefore a `GET` request to the DHT may collect distinct values (e.g. `hello` and `bonjour`) for a particular key from the nodes close to the key. The DHT has "selectors", configurable functions that choose the "best" value (for example, IPNS records include a sequence number, and the "best" value is the record with the highest sequence number).
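
A selector of the kind described above might look like the following sketch. The function name `highestSequence`, the record shape `{ seq, value }` and the sequence numbers are invented for the example. Returning the index of the winning record is an assumption about the interface, not a statement of the module's API.

```javascript
// Hypothetical selector: returns the index of the record with the highest
// sequence number. Ties keep the earliest record in the list.
function highestSequence (key, records) {
  let best = 0
  for (let i = 1; i < records.length; i++) {
    if (records[i].seq > records[best].seq) best = i
  }
  return best
}

// Distinct values collected for the same key from different nodes.
const collected = [
  { seq: 1, value: 'hello' },
  { seq: 2, value: 'bonjour' }
]
const best = collected[highestSequence('greeting', collected)]
console.log(best.value) // bonjour
```

Because every node applies the same deterministic rule, two queriers that collect the same set of records agree on the result regardless of the order in which the records arrived.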
+
+#### PROVIDE
+
+`provide (Key)`
+
+To advertise that it has the content for a particular key:
+- the `provider node` converts the key to a DHT id
+- the `provider node` follows the "closest peers" algorithm to find the nearest peers to the DHT id
+- the `provider node` sends a "provide" message to each of the nearest peers
+- each of the nearest peers saves the association between the "provider" peer and the key
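
The last step, where a peer saves the association between a provider and a key, can be pictured with a small in-memory store. `ProviderStore` and its methods are hypothetical names for this sketch, and peer ids are plain strings here for brevity.

```javascript
// Hypothetical per-node store: hex-encoded key -> set of provider peer ids.
class ProviderStore {
  constructor () {
    this.providers = new Map()
  }

  // Called when a "provide" message arrives for `key` from `peerId`.
  addProvider (key, peerId) {
    const k = key.toString('hex')
    if (!this.providers.has(k)) this.providers.set(k, new Set())
    this.providers.get(k).add(peerId)
  }

  // Returns the known providers for `key`, or an empty array.
  getProviders (key) {
    return Array.from(this.providers.get(key.toString('hex')) || [])
  }
}

const providerStore = new ProviderStore()
providerStore.addProvider(Buffer.from('greeting'), 'peerA')
providerStore.addProvider(Buffer.from('greeting'), 'peerA') // repeat is ignored
providerStore.addProvider(Buffer.from('greeting'), 'peerB')
console.log(providerStore.getProviders(Buffer.from('greeting')).length) // 2
```

Only the association is stored, not the content itself, so a repeated "provide" message from the same peer leaves the store unchanged.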
+
+#### FIND_PROVIDERS
+
+`findProviders (Key):` [[PeerInfo][PeerInfo]]
+
+To find providers for a particular key:
+- the `querying node` converts the key to a DHT id
+- the `querying node` follows the "closest peers" algorithm to find the nearest peers to the DHT id
+- at each iteration of the algorithm, if the peer knows which nodes are providing the value it responds with the provider nodes in addition to closer peers.
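
One round of that exchange could be sketched as follows, assuming each response carries both a list of providers and a list of closer peers. The names `makePeer`, `onGetProviders` and `collectRound` are hypothetical, and peers are modelled as plain objects rather than network connections.

```javascript
// Hypothetical peer: answers with the providers it knows for the key
// (possibly none) together with the closer peers it knows about.
function makePeer (providerMap, closerPeers) {
  return {
    onGetProviders (key) {
      return { providers: providerMap.get(key) || [], closerPeers }
    }
  }
}

// Ask every peer in this round and merge the answers without duplicates.
function collectRound (peers, key) {
  const providers = new Set()
  const closerPeers = new Set()
  for (const peer of peers) {
    const res = peer.onGetProviders(key)
    res.providers.forEach(p => providers.add(p))
    res.closerPeers.forEach(p => closerPeers.add(p))
  }
  return { providers: Array.from(providers), closerPeers: Array.from(closerPeers) }
}

const peerA = makePeer(new Map([['greeting', ['P1']]]), ['C'])
const peerB = makePeer(new Map([['greeting', ['P1', 'P2']]]), ['C', 'D'])
const round = collectRound([peerA, peerB], 'greeting')
console.log(round.providers) // [ 'P1', 'P2' ]
console.log(round.closerPeers) // [ 'C', 'D' ]
```

The closer peers returned by one round become the peers to query in the next, while the provider set accumulates across rounds.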
+
+[PeerId]: https://github.com/libp2p/js-peer-id
+[PeerInfo]: https://github.com/libp2p/js-peer-info