code scratch
daniel-noland committed Dec 6, 2024
1 parent 5197a3b commit fd99fbd
Showing 15 changed files with 126 additions and 72 deletions.
70 changes: 34 additions & 36 deletions design-docs/src/mdbook/src/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -13,39 +13,37 @@
- [Design session](./dataplane/design-session.md)
- [Offloading the dataplane](./dataplane/offloading-plan.md)
- [Development Plan](./dataplane/development-plan.md)
- [Control plane dev-env](dataplane/tasks/control-plane-dev-env.md)
- [Create control plane image](dataplane/tasks/create-control-plane-image.md)
- [Zebra Plugin](dataplane/tasks/frr-plugin.md)
- [Dataplane / Control plane transport](dataplane/tasks/dataplane-control-plane-transport.md)
- [Dataplane / Control plane protocol](dataplane/tasks/dataplane-control-plane-protocol.md)
- [Dataplane / Control plane reconcile](dataplane/tasks/dataplane-control-plane-reconcile.md)
- [Gateway test env](dataplane/tasks/gateway-test-env.md)
- [Identify local traffic](dataplane/tasks/identify-local-traffic.md)
- [Configuration Persistence Investigation](dataplane/tasks/configuration-persistence-investigation.md)
- [Route manager](dataplane/tasks/route-manager.md)
- [Dataplane worker lifecycle](dataplane/tasks/dataplane-worker-lifecycle.md)
- [Telemetry (investigation)](dataplane/tasks/telemetry-investigation.md)
- [Telemetry (basic)](dataplane/tasks/telemetry-basic.md)
- [Telemetry (integration)](dataplane/tasks/telemetry-integration.md)
- [Configuration database schema](dataplane/tasks/config-db-schema.md)
- [Management plane - dataplane interaction](dataplane/tasks/management-plane-dataplane-interaction.md)
- [VXLAN tunnels](dataplane/tasks/vxlan-tunnels.md)
- [Underlay routing](dataplane/tasks/underlay-routing.md)
- [Management plane - dataplane interaction](dataplane/tasks/management-plane-dataplane-interaction.md)
- [Management plane - control plane interaction](dataplane/tasks/management-plane-control-plane-interaction.md)
- [VPC routing](dataplane/tasks/vpc-routing.md)
- [Rate limiting investigation](dataplane/tasks/rate-limiting-investigation.md)
- [VPC rate-limiting](dataplane/tasks/vpc-rate-limiting.md)
- [NAT44](dataplane/tasks/NAT44.md)
- [NAT66](dataplane/tasks/NAT66.md)
- [NAT64 (investigation)](dataplane/tasks/NAT64-investigation.md)
- [NAT64](dataplane/tasks/NAT64.md)
- [State sync (design)](dataplane/tasks/state-sync-design.md)
- [State sync (implementation)](dataplane/tasks/state-sync.md)
- [Public internet access](dataplane/tasks/public-internet-access.md)
- [Fault tolerance (implementation)](dataplane/tasks/fault-tolerance-implementation.md)
- [Fault tolerance (validation)](dataplane/tasks/fault-tolerance-validation.md)
- [Performance measurement](dataplane/tasks/performance-measurement.md)
- [Core pinning](dataplane/tasks/core-pinning.md)
- [One control plane daemon per container](dataplane/tasks/one-control-plane-daemon-per-container.md)
- [Programmatic Control of FRR](dataplane/tasks/programmatic-control-of-frr.md)
- [Configuration Persistence Investigation](./dataplane/tasks2/configuration-persistence-investigation.md)
- [Configuration database schema](./dataplane/tasks2/config-db-schema.md)
- [Control plane dev-env](./dataplane/tasks2/control-plane-dev-env.md)
- [Core pinning](./dataplane/tasks2/core-pinning.md)
- [Create control plane image](./dataplane/tasks2/create-control-plane-image.md)
- [Dataplane / Control plane protocol](./dataplane/tasks2/dataplane-control-plane-protocol.md)
- [Dataplane / Control plane transport](./dataplane/tasks2/dataplane-control-plane-transport.md)
- [Dataplane worker lifecycle](./dataplane/tasks2/dataplane-worker-lifecycle.md)
- [Fault tolerance (implementation)](./dataplane/tasks2/fault-tolerance-implementation.md)
- [Fault tolerance (validation)](./dataplane/tasks2/fault-tolerance-validation.md)
- [Gateway test env](./dataplane/tasks2/gateway-test-env.md)
- [Identify local traffic](./dataplane/tasks2/identify-local-traffic.md)
- [Management plane - control plane interaction](./dataplane/tasks2/management-plane-control-plane-interaction.md)
- [Management plane - dataplane interaction](./dataplane/tasks2/management-plane-dataplane-interaction.md)
- [NAT44](./dataplane/tasks2/NAT44.md)
- [NAT64 (investigation)](./dataplane/tasks2/NAT64-investigation.md)
- [NAT64](./dataplane/tasks2/NAT64.md)
- [NAT66](./dataplane/tasks2/NAT66.md)
- [One control plane daemon per container](./dataplane/tasks2/one-control-plane-daemon-per-container.md)
- [Performance measurement](./dataplane/tasks2/performance-measurement.md)
- [Programmatic Control of FRR](./dataplane/tasks2/programmatic-control-of-frr.md)
- [Public internet access](./dataplane/tasks2/public-internet-access.md)
- [Rate limiting investigation](./dataplane/tasks2/rate-limiting-investigation.md)
- [Route manager](./dataplane/tasks2/route-manager.md)
- [State sync (design)](./dataplane/tasks2/state-sync-design.md)
- [State sync (implementation)](./dataplane/tasks2/state-sync.md)
- [Telemetry (basic)](./dataplane/tasks2/telemetry-basic.md)
- [Telemetry (integration)](./dataplane/tasks2/telemetry-integration.md)
- [Telemetry (investigation)](./dataplane/tasks2/telemetry-investigation.md)
- [Underlay routing](./dataplane/tasks2/underlay-routing.md)
- [VPC rate-limiting](./dataplane/tasks2/vpc-rate-limiting.md)
- [VPC routing](./dataplane/tasks2/vpc-routing.md)
- [VXLAN tunnels](./dataplane/tasks2/vxlan-tunnels.md)
- [Zebra Plugin](./dataplane/tasks2/zebra-plugin.md)
9 changes: 4 additions & 5 deletions design-docs/src/mdbook/src/dataplane/design-session.md
@@ -149,7 +149,7 @@ skinparam linetype ortho
"frr_agent": { "text": "FRR agent", "url": "#frr-agent" },
"zebra": { "text": "zebra", "url": "https://docs.frrouting.org/en/latest/zebra.html" },
"routing_daemons": { "text": "routing daemons", "url": "#routing-daemons" },
"hh_plugin": { "text": "plugin", "url": "#zebra-plugin" },
"zebra_plugin": { "text": "Zebra\\nplugin", "url": "#zebra-plugin" },
"kernel": { "text": "kernel", "url": "https://en.wikipedia.org/wiki/Linux_kernel" },
"interface_manager": { "text": "interface manager", "url": "#interface-manager" },
"routing_manager": { "text": "routing manager", "url": "#routing-manager" },
@@ -186,7 +186,7 @@ $r(kernel)
$r(control_plane) {
$r(routing_daemons)
$r(zebra) {
$r(hh_plugin)
$r(zebra_plugin)
}
$r(frr_agent)
}
@@ -223,7 +223,7 @@ dataplane_model <--> nat_manager
dataplane_model <--> routing_manager
management_plane_interface -- dataplane_model
nat_manager <--> dataplane_workers
hh_plugin --- control_plane_interface : [[ https://en.wikipedia.org/wiki/Unix_domain_socket unix socket ]]
zebra_plugin --- control_plane_interface : [[ https://en.wikipedia.org/wiki/Unix_domain_socket unix socket ]]
routing_daemons <-> zebra
routing_manager <--> dataplane_workers
state_sync <-> sister_state_sync : [[ https://en.wikipedia.org/wiki/Remote_direct_memory_access rdma]]
@@ -288,7 +288,7 @@ Be afraid. Make Fredi fill in this section. But also be afraid.

This is a planned [zebra] plugin in the same spirit as [`fpm`](https://docs.frrouting.org/projects/dev-guide/en/latest/fpm.html#id1) or [`dataplane_fpm_nl`](https://docs.frrouting.org/projects/dev-guide/en/latest/fpm.html#dplane-fpm-nl).

The core idea is to have a plugin that can be dynamically loaded into `zebra` and will listen to the `zebra` event stream for updates.
The core idea is to have a plugin that can be dynamically loaded into [zebra] and will listen to the [zebra event stream](https://github.com/FRRouting/frr/blob/ee5a3456d34a756c70ad8856ab7be7bed75ee31c/zebra/zebra_dplane.h#L114-L217) for updates.
The plugin will then take those updates and push them into the dataplane agent, allowing the dataplane to react to route updates.
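
As a rough illustration of the hand-off described above, the following Rust sketch pushes one route update across a unix domain socket as a length-prefixed frame. The `RouteUpdate` struct, the text payload, and the framing scheme are assumptions made for this example only; the wire format is not defined here, and the real plugin would be loaded into zebra rather than written as standalone Rust.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical route update; the real message layout is not specified in this document.
#[derive(Debug, PartialEq)]
struct RouteUpdate {
    prefix: String,
    next_hop: String,
}

/// Send one update as a frame: a big-endian u32 length followed by "prefix next_hop".
fn send_update(stream: &mut UnixStream, update: &RouteUpdate) -> io::Result<()> {
    let payload = format!("{} {}", update.prefix, update.next_hop);
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    stream.write_all(&len.to_be_bytes())?;
    stream.write_all(payload.as_bytes())
}

/// Read exactly one frame and parse it back into a `RouteUpdate`.
fn recv_update(stream: &mut UnixStream) -> io::Result<RouteUpdate> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let mut payload = vec![0u8; u32::from_be_bytes(len_buf) as usize];
    stream.read_exact(&mut payload)?;
    let text = String::from_utf8(payload)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let (prefix, next_hop) = text
        .split_once(' ')
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed frame"))?;
    Ok(RouteUpdate {
        prefix: prefix.to_string(),
        next_hop: next_hop.to_string(),
    })
}
```

For local experiments, `UnixStream::pair()` returns two connected ends, so one end can stand in for the plugin and the other for the dataplane agent. Explicit framing is used because a stream socket does not preserve message boundaries.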

</section>
@@ -627,4 +627,3 @@ end alt


{{#include ../links.md}}

23 changes: 12 additions & 11 deletions design-docs/src/mdbook/src/dataplane/development-plan.md
@@ -28,14 +28,13 @@ digraph g {
cp_api_control_investigation [label=<<b>programmatic control of frr<br/>(investigation)</b>>, $urgent, href="$ptr/programmatic-control-of-frr.html", fontcolor=blue]
cp_dev_env [label="control plane\ndev env", href="$ptr/control-plane-dev-env.html", fontcolor=blue]
cp_image_creation [ label="Create control plane container image", href="$ptr/create-control-plane-image.html", fontcolor=blue]
dp_cp_reconciliation [ label="dp/cp reconcile", href="$ptr/dataplane-control-plane-reconcile.html", fontcolor=blue ]
dp_dev_env [label="dataplane dev env", $completed, href="../../build/index.html", fontcolor=blue]
dp_dp_state_sync [label="state sync\n(implementation)", $difficult, href="$ptr/state-sync.html", fontcolor=blue]
dp_dp_state_sync_design [label="state sync\n(design)", $urgent, href="$ptr/state-sync-design.html", fontcolor=blue]
dp_image_creation [label="dataplane image build", $completed]
fault_tolerance [label="fault tolerance (implementation)", href="$ptr/fault-tolerance-implementation.html", fontcolor=blue]
fault_tolerance_proof [label="fault tolerance (validation)", $difficult, href="$ptr/fault-tolerance-validation.html", fontcolor=blue]
frr_plugin_basic [ label="frr plugin\n(basic)", href="$ptr/frr-plugin.html", fontcolor=blue ]
zebra_plugin_basic [ label="zebra plugin\n(basic)", href="$ptr/zebra-plugin.html", fontcolor=blue ]
frr_programmatic_control [label=<<b>programmatic<br/>control of frr</b>>, $difficult, href="$ptr/programmatic-control-of-frr.html", fontcolor=blue]
gw_test_env [label="gateway test env", href="$ptr/gateway-test-env.html", fontcolor=blue]
investigate_config_persist [ label=<<b>configuration<br/>persistence<br/>(investigation)</b>>, $urgent, href="$ptr/configuration-persistence-investigation.html", fontcolor=blue ]
@@ -44,8 +43,8 @@ digraph g {
mp_dp_interaction [ label="management plane \ndataplane interaction", href="$ptr/management-plane-dataplane-interaction.html", fontcolor=blue]
nat64_investigation [label=<<b>NAT64 investigation</b>>, $urgent, href="$ptr/NAT64-investigation.html", fontcolor=blue]
performance_measurement [ label="measure performance", href="$ptr/performance-measurement.html", fontcolor=blue]
plugin_dp_proto [ label="plugin/dp protocol", $started, href="$ptr/dataplane-control-plane-protocol.html", fontcolor=blue]
plugin_dp_transport [ label="plugin/dp transport", $completed, href="$ptr/dataplane-control-plane-transport.html", fontcolor=blue]
plugin_dp_proto [ label="plugin/dataplane protocol", $started, href="$ptr/dataplane-control-plane-protocol.html", fontcolor=blue]
plugin_dp_transport [ label="plugin/dataplane transport", $completed, href="$ptr/dataplane-control-plane-transport.html", fontcolor=blue]
public_internet_access [label="public internet access", href="$ptr/public-internet-access.html", fontcolor=blue]
rate_limiting_investigation [label="rate limiting investigation", $completed]
routing_manager [label="routing manager", href="$ptr/route-manager.html", fontcolor=blue]
@@ -92,17 +91,16 @@ digraph g {
cp_dev_env -> gw_test_env
cp_image_creation -> cp_dev_env
cp_image_creation -> separate_cp_containers
dp_cp_reconciliation -> frr_plugin_basic
dp_dev_env -> gw_test_env
dp_image_creation -> dp_dev_env
gw_test_env -> frr_plugin_basic
frr_plugin_basic -> routing_manager
gw_test_env -> zebra_plugin_basic
zebra_plugin_basic -> routing_manager
config_db_schema -> mp_cp_interaction
config_db_schema -> mp_dp_interaction
local_traffic_ident -> frr_plugin_basic
local_traffic_ident -> zebra_plugin_basic
mp_dp_interaction -> vpc_routing
plugin_dp_proto -> dp_cp_reconciliation
plugin_dp_transport -> dp_cp_reconciliation
plugin_dp_proto -> zebra_plugin_basic
plugin_dp_transport -> zebra_plugin_basic
routing_manager -> underlay_routing
config_db_schema -> underlay_routing
vpc_routing -> vpc_nat44
@@ -128,12 +126,13 @@ digraph g {
}
@enddot
```

<figcaption>

> Graph of the engineering development plan.
> Each node on the graph represents a task or required function.
> No task can be _completed_ without all the other tasks which point to it.
>
>
> * Tasks shown in orange are points of higher uncertainty and risk.
> * Tasks shown in pink are points of expected higher difficulty.
> * Tasks shown in gray are already completed.
@@ -146,3 +145,5 @@ digraph g {
> [!WARNING]
> Tasks of high expected difficulty are different from tasks which we expect will be very time-consuming.
{{#include ../links.md}}
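
The rule in the figure caption (a task cannot be completed before every task pointing to it) is a topological ordering constraint. The sketch below, assuming the plan's edges are available as `(before, after)` pairs, uses Kahn's algorithm to produce one valid completion order. The function name `completion_order` is hypothetical and not part of the repository.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Return one valid completion order, or `None` if the edges contain a cycle.
/// An edge `(a, b)` means "a must be completed before b", like `a -> b` in the graph.
fn completion_order<'a>(edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    // Number of unfinished prerequisites per task.
    let mut blockers: BTreeMap<&str, usize> = BTreeMap::new();
    // Tasks unblocked (in part) by finishing the key task.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(before, after) in edges {
        blockers.entry(before).or_insert(0);
        *blockers.entry(after).or_insert(0) += 1;
        dependents.entry(before).or_default().push(after);
    }

    // Start with every task that has no prerequisites.
    let mut ready: VecDeque<&str> = blockers
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(&task, _)| task)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task);
        for &next in dependents.get(task).into_iter().flatten() {
            let n = blockers.get_mut(next).expect("every edge endpoint is registered");
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }

    // If some task was never released, the graph has a cycle.
    if order.len() == blockers.len() {
        Some(order)
    } else {
        None
    }
}
```

Feeding it edges such as `("gw_test_env", "zebra_plugin_basic")` and `("zebra_plugin_basic", "routing_manager")` yields an order in which the plugin task comes after its prerequisites and before the routing manager.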
@@ -12,7 +12,7 @@ Rather, this store needs to hold configuration data which is

## etcd

[`etcd`] is a reasonable choice because
[etcd] is a reasonable choice because

1. It is already in use in Kubernetes and is therefore likely to be well-maintained and tested.
2. We are already using and integrating with Kubernetes, so any flaws in `etcd` are likely to impact us anyway.
@@ -1,6 +1,6 @@
# Control plane dev-env

Create and document a development environment for the [`zebra`] [hedgehog plugin].
Create and document a development environment for the [zebra] [hedgehog plugin].

Requirements:

@@ -9,7 +9,6 @@ Requirements:
- **REQUIRE**: CI runs tests in dev-env container or,
- **IDEALLY**: tests run in a more minimal test-env container.

```yaml issue-meta
assign:
- @Fredi Raspall
```
## Likely dispatch

- [@Fredi-raspall]
2 changes: 1 addition & 1 deletion design-docs/src/mdbook/src/dataplane/tasks/core-pinning.md
@@ -1,4 +1,4 @@
# Core pinning

> [!NOTE]
> I think we can punt on this till the last minute!
> I think we can punt on this until the last minute!
@@ -16,10 +16,9 @@ We need to generate a docker image to run our control plane.

Both [@Fredi-raspall] and [@daniel-noland] have made some progress on this task and should sync up to get it over the line.

```yaml issue-meta
assign:
- @Fredi Raspall
```
## Likely dispatch

- [@Fredi-raspall]

[Lua scripting]: https://docs.frrouting.org/en/latest/scripting.html

@@ -2,7 +2,7 @@

It seems like we have all agreed on [unix domain sockets].

## Likely assignment
## Likely dispatch

* [@Fredi-raspall]
* coordinate with: [@daniel-noland]
5 changes: 0 additions & 5 deletions design-docs/src/mdbook/src/dataplane/tasks/frr-plugin.md

This file was deleted.

32 changes: 32 additions & 0 deletions design-docs/src/mdbook/src/dataplane/tasks/pick-a-datastore.md
@@ -0,0 +1,32 @@
# Pick a data store

We need to officially pick a data store for configuration information.

This data store _is not_ intended for storing "fast" state.
Rather, this store needs to hold configuration data which is

1. durable
2. atomic
3. strongly typed
4. immediately consistent

[`etcd`] is a reasonable choice because

1. It is already in use in Kubernetes and is therefore likely to be well-maintained and tested.
2. We are already using and integrating with Kubernetes, so any flaws in `etcd` are likely to impact us anyway.

I have used [`zookeeper`](https://zookeeper.apache.org/) in the past and *strongly recommend against it*.

I would also consider [`consul`](https://github.com/hashicorp/consul) but [the license](https://github.com/hashicorp/consul/blob/main/LICENSE) is *_not_* acceptable.

A newer entry in the space is [`nacos`](https://github.com/alibaba/nacos) but I think it is less well suited since it only seems to support eventual consistency.

The remaining option I know of is [`rqlite`]. _I have not used it,_ but it seems to be a reasonable option.

- has a supported [rust client](https://github.com/tomvoet/rqlite-rs) (and even a [sqlx](https://github.com/launchbadge/sqlx) client in the form of [sqlx-rqlite](https://crates.io/crates/sqlx-rqlite))
- [weak](https://rqlite.io/docs/api/read-consistency/#weak), [linearizable](https://rqlite.io/docs/api/read-consistency/#linearizable), and [strong](https://rqlite.io/docs/api/read-consistency/#strong) consistency models supported
- [transactions](https://rqlite.io/docs/api/api/#transactions) (though this seems less than ideal)

Thus, I think the real choice is between [`etcd`] and [`rqlite`].

That choice comes down to how much we value the functionality of sqlite (multiple indexes, referential integrity, strong schema) vs. the upsides of [etcd] (watches, battle tested, and more widely used).
@@ -2,5 +2,7 @@

Just rate limiting!

Explicitly not full QoS for the moment.
If we involve QoS in the MVP then we will have zero chance on this timeline.
Explicitly not full [QoS] for the moment.
If we involve [QoS] in the MVP then we will have zero chance on this timeline.

{{#include ../../links.md}}
24 changes: 24 additions & 0 deletions design-docs/src/mdbook/src/dataplane/tasks/zebra-plugin.md
@@ -0,0 +1,24 @@
# Zebra Plugin (basic)

The dataplane and control plane need to communicate with each other regarding

1. Full routing tables (for [state sync])
2. Route updates (i.e. differential updates)
3. Route offloading status (including failures)
4. Address assignments, to ensure the dataplane can configure [local delivery](./identify-local-traffic.md)

Keep in mind that route tables are, in general, notably more complex than a naive LPM trie, and may include features like:

1. [ECMP]/WCMP
2. [encapsulation rules](https://www.man7.org/linux/man-pages/man8/ip-route.8.html),
3. [nexthop groups](https://man7.org/linux/man-pages/man8/ip-nexthop.8.html),
4. multicast routes (this is unlikely to be important in the near term).

We only expect to support basic IPv4 and IPv6 LPM routes in the near term, but feature evolution should be accounted for in the design.
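
One way to picture the four kinds of traffic listed above is as a single message enum, with the dataplane applying full tables and differential updates to its own route state. This is only a sketch under stated assumptions: every type and variant name below (`Message`, `OffloadStatus`, `DataplaneState`, and so on) is hypothetical, next hops are reduced to a single address, and none of the more complex features (ECMP, encapsulation, nexthop groups) are modeled.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Address plus prefix length; a hypothetical stand-in for a real prefix type.
type Prefix = (IpAddr, u8);

/// Outcome of trying to offload a route (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum OffloadStatus {
    Offloaded,
    Failed(String),
}

/// The four categories from the list above, as illustrative variants.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    /// 1. Full routing table.
    FullTable(Vec<(Prefix, IpAddr)>),
    /// 2. Differential updates.
    RouteAdd { prefix: Prefix, next_hop: IpAddr },
    RouteDel { prefix: Prefix },
    /// 3. Offload status, reported back towards the control plane.
    OffloadReport { prefix: Prefix, status: OffloadStatus },
    /// 4. Address assignment, used to identify local delivery.
    AddressAssigned { address: IpAddr },
}

#[derive(Debug, Default)]
struct DataplaneState {
    routes: BTreeMap<Prefix, IpAddr>,
    local_addresses: Vec<IpAddr>,
}

impl DataplaneState {
    /// Apply one control-plane message to the dataplane's view of the world.
    fn apply(&mut self, msg: Message) {
        match msg {
            // A full table replaces whatever was there before.
            Message::FullTable(table) => self.routes = table.into_iter().collect(),
            Message::RouteAdd { prefix, next_hop } => {
                self.routes.insert(prefix, next_hop);
            }
            Message::RouteDel { prefix } => {
                self.routes.remove(&prefix);
            }
            // Reports flow in the other direction, so there is nothing to apply here.
            Message::OffloadReport { .. } => {}
            Message::AddressAssigned { address } => self.local_addresses.push(address),
        }
    }
}
```

Keeping the variants in an enum leaves room for later additions (for example a nexthop-group variant) without changing how existing messages are handled, which is in the spirit of the note above about accounting for feature evolution.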

## Likely dispatch

* [@Fredi-raspall]
* coordinate with: [@daniel-noland]

{{#include ../../links.md}}
4 changes: 4 additions & 0 deletions design-docs/src/mdbook/src/links.md
@@ -24,6 +24,7 @@
[LACP]: https://en.wikipedia.org/wiki/Link_aggregation#Link_Aggregation_Control_Protocol
[MySQL]: https://www.mysql.com/
[NAT]: https://en.wikipedia.org/wiki/Network_address_translation
[QoS]: https://en.wikipedia.org/wiki/Quality_of_service
[TiDB]: https://www.pingcap.com/
[TiKV]: https://tikv.org/
[VXLAN]: https://en.wikipedia.org/wiki/Virtual_Extensible_LAN
@@ -42,6 +43,8 @@
[cbindgen]: https://github.com/mozilla/cbindgen
[distributed SQL]: https://en.wikipedia.org/wiki/Distributed_SQL
[dpdk]: https://www.dpdk.org/
[etcd]: https://github.com/coreos/etcd
[etherparse]: https://github.com/JulianSchmid/etherparse
[frr]: https://frrouting.org/
[graphana]: https://grafana.com/
[kernel]: https://en.wikipedia.org/wiki/Linux_kernel
@@ -51,6 +54,7 @@
[network address translation]: https://en.wikipedia.org/wiki/Network_address_translation
[prometheus]: https://prometheus.io/
[protobuf]: https://protobuf.dev/
[rqlite]: https://rqlite.io/
[rte lcores]: https://doc.dpdk.org/api/rte__lcore_8h.html
[serde]: https://serde.rs/
[tracing]: https://docs.rs/tracing/latest/tracing/
1 change: 1 addition & 0 deletions justfile
@@ -488,6 +488,7 @@ mdbook *args="build":
--rm \
--init \
--volume "$(pwd):$(pwd)" \
--env HOME=/tmp \
--user "$(id -u):$(id -g)" \
--mount type=bind,source=/tmp/doc-env,target=/tmp \
--workdir "$(pwd)" \
2 changes: 1 addition & 1 deletion scripts/dpdk-sys.env
@@ -1,2 +1,2 @@
DPDK_SYS_BRANCH="main"
DPDK_SYS_COMMIT="362f54faf27e7b02148fe524492455e2f6762854"
DPDK_SYS_COMMIT="ae06e718b310c8ec8838d5198fc7cb5f9704125d"
