Genesis milestone 12 - Chain Sync Jumping (#1033)
The main contribution of this PR is the implementation of ChainSync
Jumping (0e265c4), a mechanism to avoid overloading the honest peers in
the network when one or more peers connect to them for syncing. The
details of ChainSync Jumping are discussed in the [Jumping
module](https://github.com/IntersectMBO/ouroboros-consensus/blob/78e16d0/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/MiniProtocol/ChainSync/Client/Jumping.hs).

A range of supporting commits is also provided.

## More tests

* The time-limited leashing attack test is configured and enabled in
6c25310. This test targets the GDD governor implemented in #1015.
* A test showing that a node can resume syncing after being disconnected
for a while has been implemented in 5d3c200.
* There is a test checking that the GDD governor doesn't regret
disconnecting nodes as it learns of more headers (dbbc3ab). This was
included in the previous milestone, but we discovered later that it wasn't
checking as much as we intended.
* In da7a6db and 3220e7d there is a test for ChainSync Jumping that
ensures that a syncing node downloads headers from at most one peer when
there is no disagreement between the peers.
* There are various fixes to tests and documentation in d6f6ddf,
b0ed1c8, d70e604, cee466f, and dbbc3ab.
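
The ChainSync Jumping property mentioned above (headers come from at most one peer, and each header is downloaded once) can be sketched as a pure check over a list of download events. This is only an illustration: `DownloadEvent`, `headersDownloadedOnce`, `singleServingPeer` and `csjHappyPath` are hypothetical names, and the real test in this commit additionally ignores headers within `jumpSize` slots of the tip.

```haskell
import Data.List (nub)

-- | A hypothetical download event: which peer served which header.
-- Peers and headers are plain values here; the real test extracts them
-- from 'TraceDownloadedHeader' events in the trace.
type DownloadEvent peer hdr = (peer, hdr)

-- | No header was downloaded more than once.
headersDownloadedOnce :: Eq hdr => [DownloadEvent peer hdr] -> Bool
headersDownloadedOnce evs = length (nub hdrs) == length hdrs
  where
    hdrs = map snd evs

-- | Headers came from at most one peer. Using @<= 1@ rather than @== 1@
-- keeps the check true when no headers were downloaded at all.
singleServingPeer :: Eq peer => [DownloadEvent peer hdr] -> Bool
singleServingPeer evs = length (nub (map fst evs)) <= 1

-- | Both conditions together (assumed shape of the "happy path" check).
csjHappyPath :: (Eq peer, Eq hdr) => [DownloadEvent peer hdr] -> Bool
csjHappyPath evs = headersDownloadedOnce evs && singleServingPeer evs
```

For instance, `csjHappyPath [("a", 1), ("b", 2)]` is `False` because two different peers served headers, while an empty event list passes trivially.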

## GDD refinement

The GDD governor's ability to disconnect nodes has been extended in
dcb20f5. Compared to #1015, the GDD governor can now disconnect
peers in the following scenarios:
1. when a peer has a chain that can't be denser than another chain
served by another peer (before, we would disconnect the peer only if
there was a denser chain); or
2. when a peer has a chain of density 0 and it claims to have more
headers after the genesis window; or
3. when a peer sent a header for the last slot of the genesis window or
later, and loses the density comparison to another peer (before, we
would disconnect the peer only after it loses the density comparison to
a peer that sent more than k headers in the genesis window).

These changes help ChainSync Jumping make progress while downloading
headers from only two disagreeing peers (even if both are adversarial).
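
One way to read the difference in scenario 1 is as a change from a strict to a non-strict comparison of density bounds. The sketch below is an interpretation under simplifying assumptions, not the governor's actual code: `Candidate`, `knownHeaders`, `unknownSlots` and both predicates are hypothetical, and the real rule involves further conditions.

```haskell
-- | What a syncing node knows about a peer's candidate chain inside the
-- genesis window (hypothetical record, for illustration only).
data Candidate = Candidate
  { knownHeaders :: Int -- ^ headers already received within the window
  , unknownSlots :: Int -- ^ window slots the peer could still fill
  } deriving (Show, Eq)

-- | Density the candidate is guaranteed to have.
lowerBound :: Candidate -> Int
lowerBound = knownHeaders

-- | Density the candidate could reach in the best case.
upperBound :: Candidate -> Int
upperBound c = knownHeaders c + unknownSlots c

-- | Assumed earlier rule: disconnect @p@ only if @q@ is certainly denser.
certainlySparser :: Candidate -> Candidate -> Bool
certainlySparser p q = upperBound p < lowerBound q

-- | Assumed refined rule: disconnect @p@ as soon as it can no longer
-- become denser than @q@, which also covers ties.
cannotBeDenser :: Candidate -> Candidate -> Bool
cannotBeDenser p q = upperBound p <= lowerBound q
```

Under this reading, a peer whose best case merely ties with another peer's guaranteed density is disconnected by the refined rule but not by the earlier one.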

## Other changes

* A version of shrinking for honest message schedules has been
implemented in 9794292.
* 9572256 contains a fix to the implementation of Limit on Patience.
* 915238e implements a recording tracer that doesn't affect scheduling
in IOSim.
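
For readers unfamiliar with the idea, a recording tracer pairs a tracer with an action that returns everything traced so far. The sketch below shows that shape in plain `IO` with an `IORef`. It makes no attempt to reproduce the IOSim-specific behaviour of 915238e, and `Tracer` and `recordingTracerIO` are stand-ins defined here for illustration, not the library's definitions.

```haskell
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | A minimal stand-in for a tracer: an action that consumes an event.
newtype Tracer m a = Tracer { traceWith :: a -> m () }

-- | Create a tracer that records every event, together with an action
-- that returns the events recorded so far, oldest first.
recordingTracerIO :: IO (Tracer IO a, IO [a])
recordingTracerIO = do
  ref <- newIORef []
  let tracer = Tracer $ \ev -> atomicModifyIORef' ref (\evs -> (ev : evs, ()))
      getTrace = reverse <$> readIORef ref
  pure (tracer, getTrace)
```

A caller would bind the pair with `(tracer, getTrace) <- recordingTracerIO`, emit events with `traceWith tracer`, and read them back with `getTrace`.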
nbacquey authored May 29, 2024
2 parents 2c3ca84 + ce3b2c6 commit 18f43bd
Showing 43 changed files with 3,039 additions and 560 deletions.
@@ -0,0 +1,3 @@
### Breaking

- Implemented a first version of CSJ (ChainSync Jumping). (disabled by default)
@@ -229,6 +229,7 @@ test-suite consensus-test
Test.Consensus.Genesis.Setup.Classifiers
Test.Consensus.Genesis.Setup.GenChains
Test.Consensus.Genesis.Tests
Test.Consensus.Genesis.Tests.CSJ
Test.Consensus.Genesis.Tests.DensityDisconnect
Test.Consensus.Genesis.Tests.LoE
Test.Consensus.Genesis.Tests.LoP
@@ -244,6 +245,7 @@ test-suite consensus-test
Test.Consensus.PeerSimulator.ChainSync
Test.Consensus.PeerSimulator.Config
Test.Consensus.PeerSimulator.Handlers
Test.Consensus.PeerSimulator.NodeLifecycle
Test.Consensus.PeerSimulator.Resources
Test.Consensus.PeerSimulator.Run
Test.Consensus.PeerSimulator.ScheduledBlockFetchServer
@@ -257,8 +259,10 @@ test-suite consensus-test
Test.Consensus.PeerSimulator.Tests.Timeouts
Test.Consensus.PeerSimulator.Trace
Test.Consensus.PointSchedule
Test.Consensus.PointSchedule.NodeState
Test.Consensus.PointSchedule.Peers
Test.Consensus.PointSchedule.Shrinking
Test.Consensus.PointSchedule.Shrinking.Tests
Test.Consensus.PointSchedule.SinglePeer
Test.Consensus.PointSchedule.SinglePeer.Indices
Test.Consensus.PointSchedule.Tests
@@ -540,10 +540,11 @@ mkApps ::
-> ByteLimits bCS bBF bTX bKA
-> m ChainSyncTimeout
-> CsClient.ChainSyncLoPBucketConfig
-> CsClient.CSJConfig
-> ReportPeerMetrics m (ConnectionId addrNTN)
-> Handlers m addrNTN blk
-> Apps m addrNTN bCS bBF bTX bKA bPS NodeToNodeInitiatorResult ()
mkApps kernel Tracers {..} mkCodecs ByteLimits {..} genChainSyncTimeout lopBucketConfig ReportPeerMetrics {..} Handlers {..} =
mkApps kernel Tracers {..} mkCodecs ByteLimits {..} genChainSyncTimeout lopBucketConfig csjConfig ReportPeerMetrics {..} Handlers {..} =
Apps {..}
where
aChainSyncClient
@@ -572,6 +573,7 @@ mkApps kernel Tracers {..} mkCodecs ByteLimits {..} genChainSyncTimeout lopBucke
them
version
lopBucketConfig
csjConfig
$ \csState -> do
chainSyncTimeout <- genChainSyncTimeout
(r, trailing) <-
@@ -593,6 +595,7 @@ mkApps kernel Tracers {..} mkCodecs ByteLimits {..} genChainSyncTimeout lopBucke
, CsClient.idling = csvIdling csState
, CsClient.loPBucket = csvLoPBucket csState
, CsClient.setLatestSlot = csvSetLatestSlot csState
, CsClient.jumping = csvJumping csState
}
return (ChainSyncInitiatorResult r, trailing)

@@ -81,7 +81,7 @@ import Ouroboros.Consensus.Fragment.InFuture (CheckInFuture,
import qualified Ouroboros.Consensus.Fragment.InFuture as InFuture
import Ouroboros.Consensus.Ledger.Extended (ExtLedgerState (..))
import Ouroboros.Consensus.MiniProtocol.ChainSync.Client
(ChainSyncLoPBucketConfig (..))
(CSJConfig (..), ChainSyncLoPBucketConfig (..))
import qualified Ouroboros.Consensus.MiniProtocol.ChainSync.Client.InFutureCheck as InFutureCheck
import qualified Ouroboros.Consensus.Network.NodeToClient as NTC
import qualified Ouroboros.Consensus.Network.NodeToNode as NTN
@@ -252,6 +252,9 @@ data LowLevelRunNodeArgs m addrNTN addrNTC versionDataNTN versionDataNTC blk
-- | See 'CsClient.ChainSyncLoPBucketConfig'
, llrnChainSyncLoPBucketConfig :: ChainSyncLoPBucketConfig

-- | See 'CsClient.CSJConfig'
, llrnCSJConfig :: CSJConfig

-- | How to run the data diffusion applications
--
-- 'run' will not return before this does.
@@ -519,6 +522,7 @@ runWith RunNodeArgs{..} encAddrNtN decAddrNtN LowLevelRunNodeArgs{..} =
NTN.byteLimits
llrnChainSyncTimeout
llrnChainSyncLoPBucketConfig
llrnCSJConfig
(reportMetric Diffusion.peerMetricsConfiguration peerMetrics)
(NTN.mkHandlers nodeKernelArgs nodeKernel)

@@ -857,6 +861,7 @@ stdLowLevelRunNodeArgsIO RunNodeArgs{ rnProtocolInfo
{ llrnBfcSalt
, llrnChainSyncTimeout = fromMaybe Diffusion.defaultChainSyncTimeout srnChainSyncTimeout
, llrnChainSyncLoPBucketConfig = ChainSyncLoPBucketDisabled
, llrnCSJConfig = CSJDisabled
, llrnCustomiseHardForkBlockchainTimeArgs = id
, llrnGsmAntiThunderingHerd
, llrnKeepAliveRng
@@ -1058,6 +1058,7 @@ runThreadNetwork systemTime ThreadNetworkArgs
, idleTimeout = waitForever
})
CSClient.ChainSyncLoPBucketDisabled
CSClient.CSJDisabled
nullMetric
-- The purpose of this test is not testing protocols, so
-- returning constant empty list is fine if we have thorough
2 changes: 2 additions & 0 deletions ouroboros-consensus-diffusion/test/consensus-test/Main.hs
@@ -5,6 +5,7 @@ import qualified Test.Consensus.GSM (tests)
import qualified Test.Consensus.HardFork.Combinator (tests)
import qualified Test.Consensus.Node (tests)
import qualified Test.Consensus.PeerSimulator.Tests (tests)
import qualified Test.Consensus.PointSchedule.Shrinking.Tests (tests)
import qualified Test.Consensus.PointSchedule.Tests (tests)
import Test.Tasty
import Test.Util.TestEnv (defaultMainWithTestEnv,
@@ -25,5 +26,6 @@ tests =
, Test.Consensus.Genesis.Tests.tests
, testGroup "GSM" Test.Consensus.GSM.tests
, Test.Consensus.PeerSimulator.Tests.tests
, Test.Consensus.PointSchedule.Shrinking.Tests.tests
, Test.Consensus.PointSchedule.Tests.tests
]
@@ -79,7 +79,7 @@ mkTrunk btTrunk = BlockTree { btTrunk, btBranches = [] }

-- | Add a branch to an existing block tree.
--
-- PRECONDITION: The given fragment intersects with the trunk or its anchor.
-- Yields @Nothing@ if the given fragment does not intersect with the trunk or its anchor.
--
-- FIXME: we should enforce that the branch's prefix shares the same anchor as
-- the trunk.
@@ -94,7 +94,7 @@ addBranch branch bt = do
let btbFull = fromJust $ AF.join btbPrefix btbSuffix
pure $ bt { btBranches = BlockTreeBranch { .. } : btBranches bt }

-- | Same as @addBranch@ but assumes that the precondition holds.
-- | Same as @addBranch@ but calls 'error' if the former yields 'Nothing'.
addBranch' :: AF.HasHeader blk => AF.AnchoredFragment blk -> BlockTree blk -> BlockTree blk
addBranch' branch blockTree =
fromMaybe (error "addBranch': precondition does not hold") $ addBranch branch blockTree
@@ -35,7 +35,7 @@ import Test.QuickCheck
import Test.Util.Orphans.IOLike ()
import Test.Util.QuickCheck (forAllGenRunShrinkCheck)
import Test.Util.TestBlock (TestBlock)
import Test.Util.Tracer (recordingTracerTVar)
import Test.Util.Tracer (recordingTracerM)
import Text.Printf (printf)


@@ -56,7 +56,7 @@ runGenesisTest ::
RunGenesisTestResult
runGenesisTest schedulerConfig genesisTest =
runSimStrictShutdownOrThrow $ do
(recordingTracer, getTrace) <- recordingTracerTVar
(recordingTracer, getTrace) <- recordingTracerM
let tracer = if scDebug schedulerConfig then debugTracer else recordingTracer

traceLinesWith tracer $ prettyGenesisTest prettyPeersSchedule genesisTest
@@ -104,6 +104,8 @@ forAllGenesisTest generator schedulerConfig shrinker mkProperty =
classify (genesisWindowAfterIntersection cls) "Full genesis window after intersection" $
classify (adversaryRollback schCls) "An adversary did a rollback" $
classify (honestRollback schCls) "The honest peer did a rollback" $
classify (allAdversariesEmpty schCls) "All adversaries have empty schedules" $
classify (allAdversariesTrivial schCls) "All adversaries have trivial schedules" $
tabulate "Adversaries killed by LoP" [printf "%.1f%%" $ adversariesKilledByLoP resCls] $
tabulate "Adversaries killed by GDD" [printf "%.1f%%" $ adversariesKilledByGDD resCls] $
tabulate "Adversaries killed by Timeout" [printf "%.1f%%" $ adversariesKilledByTimeout resCls] $
@@ -199,16 +199,24 @@ resultClassifiers GenesisTest{gtSchedule} RunGenesisTestResult{rgtrStateView} =
data ScheduleClassifiers =
ScheduleClassifiers{
-- | There is an adversary that did a rollback
adversaryRollback :: Bool,
adversaryRollback :: Bool,
-- | The honest peer did a rollback
honestRollback :: Bool
honestRollback :: Bool,
-- | All adversaries have an empty schedule: the only way to disconnect them is
-- through network timeouts.
allAdversariesEmpty :: Bool,
-- | All adversaries have trivial schedules: they only have an initial state, and
-- do nothing afterwards.
allAdversariesTrivial :: Bool
}

scheduleClassifiers :: GenesisTestFull TestBlock -> ScheduleClassifiers
scheduleClassifiers GenesisTest{gtSchedule = schedule} =
ScheduleClassifiers
{ adversaryRollback
, honestRollback
, allAdversariesEmpty
, allAdversariesTrivial
}
where
hasRollback :: PeerSchedule TestBlock -> Bool
@@ -247,6 +255,15 @@ scheduleClassifiers GenesisTest{gtSchedule = schedule} =

honestRollback = value $ honest rollbacks

allAdversariesEmpty = all value $ others $ null <$> schedule

isTrivial :: PeerSchedule TestBlock -> Bool
isTrivial = \case
[] -> True
(t0, _):points -> all ((== t0) . fst) points

allAdversariesTrivial = all value $ others $ isTrivial <$> schedule

simpleHash ::
HeaderHash block ~ TestHash =>
ChainHash block ->
@@ -125,10 +125,11 @@ genChains genNumForks = do
gtSlotLength,
gtChainSyncTimeouts = chainSyncTimeouts gtSlotLength asc,
gtBlockFetchTimeouts = blockFetchTimeouts,
gtLoPBucketParams = LoPBucketParams { lbpCapacity = 10_000, lbpRate = 1_000 },
gtLoPBucketParams = LoPBucketParams { lbpCapacity = 100_000, lbpRate = 1_000 },
-- ^ REVIEW: Do we want to generate those randomly? For now, the chosen
-- values carry no special meaning. Someone needs to think about what values
-- would make for interesting tests.
gtCSJParams = CSJParams $ fromIntegral scg,
gtBlockTree = foldl' (flip BT.addBranch') (BT.mkTrunk goodChain) $ zipWith (genAdversarialFragment goodBlocks) [1..] alternativeChainSchemas,
gtSchedule = ()
}
@@ -1,5 +1,6 @@
module Test.Consensus.Genesis.Tests (tests) where

import qualified Test.Consensus.Genesis.Tests.CSJ as CSJ
import qualified Test.Consensus.Genesis.Tests.DensityDisconnect as GDD
import qualified Test.Consensus.Genesis.Tests.LoE as LoE
import qualified Test.Consensus.Genesis.Tests.LongRangeAttack as LongRangeAttack
@@ -9,7 +10,8 @@ import Test.Tasty

tests :: TestTree
tests = testGroup "Genesis tests"
[ GDD.tests
[ CSJ.tests
, GDD.tests
, LongRangeAttack.tests
, LoE.tests
, LoP.tests
@@ -0,0 +1,137 @@
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE NamedFieldPuns #-}

module Test.Consensus.Genesis.Tests.CSJ (tests) where

import Control.Monad (replicateM)
import Data.Containers.ListUtils (nubOrd)
import Data.Functor (($>))
import Data.List (nub)
import Data.Maybe (mapMaybe)
import Ouroboros.Consensus.Block (blockSlot, succWithOrigin)
import Ouroboros.Consensus.MiniProtocol.ChainSync.Client
    (TraceChainSyncClientEvent (..))
import Ouroboros.Consensus.Util.Condense (PaddingDirection (..),
    condenseListWithPadding)
import qualified Ouroboros.Network.AnchoredFragment as AF
import Test.Consensus.BlockTree (BlockTree (..))
import Test.Consensus.Genesis.Setup
import Test.Consensus.Genesis.Tests.Uniform (genUniformSchedulePoints)
import Test.Consensus.PeerSimulator.Run (SchedulerConfig (..),
    defaultSchedulerConfig)
import Test.Consensus.PeerSimulator.StateView (StateView (..))
import Test.Consensus.PeerSimulator.Trace (TraceEvent (..))
import Test.Consensus.PointSchedule
import Test.Consensus.PointSchedule.Peers (Peer (..), Peers (..),
    mkPeers)
import Test.Tasty
import Test.Tasty.QuickCheck
import Test.Util.Orphans.IOLike ()
import Test.Util.TestBlock (Header, TestBlock)
import Test.Util.TestEnv (adjustQuickCheckMaxSize)

tests :: TestTree
tests =
  adjustQuickCheckMaxSize (`div` 5) $
  testGroup
    "CSJ"
    [ testGroup "Happy Path"
      [ testProperty "synchronous" $ prop_happyPath True
      , testProperty "asynchronous" $ prop_happyPath False
      ]
    ]

-- | Test of the “happy path” scenario of ChainSync Jumping (CSJ).
--
-- This test features one chain (i.e. a block tree that is only a trunk) and only
-- honest peers and syncs the chain in question with CSJ enabled. What we expect
-- to observe is that one of the honest peers becomes the dynamo while the
-- others become jumpers. Because the jumpers will agree to all the jumps, the
-- whole syncing should happen with CSJ without objectors.
--
-- The final property is that headers should only ever be downloaded once and
-- only from one peer (the dynamo). This is true except when almost caught-up:
-- when the dynamo is caught-up, it gets disengaged and one of the jumpers takes
-- its place and starts serving headers. This might lead to duplication of
-- headers, but only in a window of @jumpSize@ slots near the tip of the chain.
--
-- The boolean differentiates between “synchronous” and “asynchronous”
-- scenarios. In a synchronous scenario, all the honest peers have the same
-- schedule: they serve the chain exactly in the same way. In the asynchronous
-- scenario, a random schedule is generated for each peer (but they still serve
-- the same chain).
prop_happyPath :: Bool -> Property
prop_happyPath synchronized =
  forAllGenesisTest
    ( do
        gt <- genChains $ pure 0
        honest <- genHonestSchedule gt
        numOthers <- choose (1, 3)
        otherHonests <- if synchronized
          then pure $ replicate numOthers honest
          else replicateM numOthers (genHonestSchedule gt)
        pure $ gt $> mkPeers honest otherHonests
    )
    ( defaultSchedulerConfig
      { scEnableCSJ = True
      , scEnableLoE = True
      , scEnableLoP = True
      }
    )
    ( -- NOTE: Shrinking makes the tests fail because the peers reject jumps
      -- because their TP is G. This makes them into objectors and they then
      -- start serving headers.
      \_ _ -> []
    )
    ( \gt StateView{svTrace} ->
        let
          -- The list of 'TraceDownloadedHeader' events that are not newer than
          -- jumpSize from the tip of the chain. These are the ones that we
          -- expect to see only once per header if CSJ works properly.
          headerDownloadEvents =
            mapMaybe
              (\case
                TraceChainSyncClientEvent pid (TraceDownloadedHeader hdr)
                  | not (isNewerThanJumpSizeFromTip gt hdr)
                  -> Just (pid, hdr)
                _ -> Nothing
              )
              svTrace
          receivedHeadersOnlyOnce = length (nub $ snd <$> headerDownloadEvents) == length headerDownloadEvents
          -- NOTE: If all the headers are newer than jumpSize from the tip, then
          -- 'headerDownloadEvents' is empty and the following condition would
          -- be violated if we used @==@.
          receivedHeadersFromOnlyOnePeer = length (nubOrd $ fst <$> headerDownloadEvents) <= 1
        in
          tabulate ""
            [ if headerDownloadEvents == []
                then "All headers may be downloaded twice (uninteresting test)"
                else "There exist headers that have to be downloaded exactly once"
            ] $
          counterexample
            ("Downloaded headers (except jumpSize slots near the tip):\n" ++
              ( unlines $ fmap (" " ++) $ zipWith
                  (\peer header -> peer ++ " | " ++ header)
                  (condenseListWithPadding PadRight $ fst <$> headerDownloadEvents)
                  (condenseListWithPadding PadRight $ snd <$> headerDownloadEvents)
              )
            )
            (receivedHeadersOnlyOnce && receivedHeadersFromOnlyOnePeer)
    )
  where
    -- | This might seem wasteful, as we discard generated adversarial schedules.
    -- It actually isn't, since we call it on trees that have no branches besides
    -- the trunk, so no adversaries are generated.
    genHonestSchedule :: GenesisTest TestBlock () -> Gen (PeerSchedule TestBlock)
    genHonestSchedule gt = do
      ps <- genUniformSchedulePoints gt
      pure $ value $ honest ps

    isNewerThanJumpSizeFromTip :: GenesisTestFull TestBlock -> Header TestBlock -> Bool
    isNewerThanJumpSizeFromTip gt hdr =
      let jumpSize = csjpJumpSize $ gtCSJParams gt
          tipSlot = AF.headSlot $ btTrunk $ gtBlockTree gt
          hdrSlot = blockSlot hdr
       in
        -- Sanity check: add @1 +@ after @>@ and watch the World burn.
        hdrSlot + jumpSize >= succWithOrigin tipSlot