Skip to content

Latest commit

 

History

History
315 lines (194 loc) · 4.58 KB

README.md

File metadata and controls

315 lines (194 loc) · 4.58 KB

Scalable Secure Scuttlebutt

  • @dominictarr

  • scuttlebutt.nz


Scuttlebutt

"anti-entropy" gossip protocol, part of amazon dynamo

  • "flow control for anti-entropy protocols"
  • eventually consistent
  • used by within a trusted system

Secure Scuttlebutt

  • peer id is public key
  • data model signed hash chains
  • peers can relay other's messages
  • cannot insert, modify, or reorder messages
  • large objects in optional "attachments"

  • subscribe ("follow") model solves sybil attacks
  • provides social discovery
  • messages arrive in natural order - easy to build database on top of

Scalable Secure Scuttlebutt

  • optimized for bandwidth and latency
  • overhead proportional to feeds updated

part 1: network topology


Star Network

naive design: connect to everyone


return require('./demos').centralized()

Random Network

  • connect to N > 2 peers ~= fully connected
  • tradeoff between bandwidth & latency

return require('./demos').random()

Rules for gossip flooding

send new messages to everyone that didn't send it to you

(don't resend messages you already know!)


message passing diagram

message passing diagram


bandwidth complexity

O(messages*(spanning+2*redundant))

how many redundant connections

  • 1: 0%
  • 2: 66%
  • 3: 80%
  • 4: 86%
  • 5: 89%
  • 10: 95%

at 5 connections per peer, we use 9 times as much bandwidth

Spanning Tree Network

remove duplicate connections

  • when you receive an old message, disable that connection
  • bandwidth complexity now approaches O(messages)

return require('./demos').spanning()

trees are fragile

return require('./demos').fragile()

spanning tree

  • network partitions are really easy
  • we just added many points of failure!

Epidemic Broadcast Trees

  • (from paper with the same name)

  • best of flooding and spanning

  • we run a separate tree per feed


Eager and Lazy (aka push and pull)

  • connections have two modes: eager and lazy
  • switch redundant connections into lazy
  • eager mode pushes message immediately
  • lazy mode sends short note that message exists

  • if you receive a known message, ask them to be lazy
  • (if in lazy mode, just send a short note)
  • if you receive a note for a new message, switch connection back to eager

switching back to eager


state per feed

  • feed id (string)
  • local sequence (integer)
  • remote sequence (integer)
  • local request (integer)
  • remote request (integer)
  • local mode (boolean)
  • remote mode (boolean)

part 2: handshakes


basic vector clock

(as described in flow gossip paper)

Alice: {A: 1, B: 0, C: 3}

Bob:   {A: 0, B: 1, C: 3}

alice sends A1, Bob sends B1 handshake size is O(feeds)


skipping vector clock

Alice stores Bob's last vector clock

Alice (local): {A: 2, B: 2, C: 3}

Bob (stored):  {A: 0, B: 1, C: 3}

drops elements which havn't changed (Carol hasn't updated)


Note: Alice stores Bob's A:0 even though she sent A1. unless Bob acknowledges A1, Alice doesn't know for sure he has it.


so Alice sends:

Alice: {A: 2, B: 2}

Bob: {A: 1, B: 3}

Alice sends A2, Bob sends B3


if Bob then gossips with Carol

Alice: {}

Bob:   {C: 4}

Alice sends an empty clock, because she thinks nothing has changed

but Bob has an update from Carol

Alice: {C: 3}

and Bob sends C4

so, it costs one more message pass, but usually saves lots of requests.

note, we also save tracking this in memory


distribution of updates

  • power law is typical
  • a few active users
  • some moderate users
  • many occasional / infrequent users

  • basic scuttlebutt

    O(messages + feeds)

  • scalable scuttlebutt

    O(messages + updated_feeds)

    (updated_users is probably much smaller)

    reconnections are very cheap!


complexity overview

bandwidth: O(messages + updated_feeds)

latency: replication in 1 or 2 round trips


Part 3: Comparison to IPFS


  • replicate data without global consistency
  • secure scuttlebutt optimizes for chains
  • this is worst case for ipfs

  • ssb streams updates in order, per feed
  • ipfs has pointer to latest message then works backwards, one message at a time

  • ipfs better for arbitary partial datasets
  • ssb better for social applications
  • ssb: database vs ipfs: file system

Thank you!