Planning progress
daniel-noland committed Nov 29, 2024
1 parent a97ca89 commit 2702ab5
Showing 7 changed files with 109 additions and 5 deletions.
5 changes: 4 additions & 1 deletion design-docs/src/mdbook/src/SUMMARY.md
Original file line number Diff line number Diff line change
@@ -25,4 +25,7 @@
- [Dataplane / Control plane reconcile](./dataplane/tasks2/dataplane-control-plane-reconcile.md)
- [Gateway test env](./dataplane/tasks2/gateway-test-env.md)
- [Identify local traffic](./dataplane/tasks2/identify-local-traffic.md)

- [Configuration Persistence Investigation](./dataplane/tasks2/configuration-persistence-investigation.md)
- [Route manager](./dataplane/tasks2/route-manager.md)
- [Dataplane worker lifecycle](./dataplane/tasks2/dataplane-worker-lifecycle.md)
- [Telemetry (basic)](./dataplane/tasks2/telemetry-basic.md)
8 changes: 4 additions & 4 deletions design-docs/src/mdbook/src/dataplane/development-plan.md
@@ -37,7 +37,7 @@ digraph g {
frr_plugin_basic [ label="frr plugin\n(basic)", href="$ptr/frr-plugin.html", fontcolor=blue ]
frr_programmatic_control [label=<<b>programmatic<br/>control of frr</b>>, $difficult]
gw_test_env [label="gateway test env", href="$ptr/gateway-test-env.html", fontcolor=blue]
investigate_config_persist [ label=<<b>configuration<br/>persistence<br/>(investigation)</b>>, color="orange", style=filled ]
investigate_config_persist [ label=<<b>configuration<br/>persistence<br/>(investigation)</b>>, $urgent, href="$ptr/configuration-persistence-investigation.html", fontcolor=blue ]
local_traffic_ident [ label="identify local traffic", href="$ptr/identify-local-traffic.html", fontcolor=blue]
mp_cp_interaction [ label="MP/CP interaction"]
mp_dp_interaction [ label="MP/DP interaction"]
@@ -47,9 +47,9 @@ digraph g {
plugin_dp_transport [ label="plugin/dp transport", $completed, href="$ptr/dataplane-control-plane-transport.html", fontcolor=blue]
public_internet_access [label="public internet access"]
rate_limiting_investigation [label="rate limiting investigation", $completed]
routing_manager [label="routing manager"]
routing_manager [label="routing manager", href="$ptr/route-manager.html", fontcolor=blue]
separate_cp_containers [ label="one cp daemon per container", $optional]
telemetry_basic [label="telemetry (basic)"]
telemetry_basic [label="telemetry (basic)", href="$ptr/telemetry-basic.html", fontcolor=blue]
telemetry_integrated [label="telemetry (integration)"]
telemetry_investigation [label="telemetry\n(investigation)", $completed]
vpc_nat44 [label="nat44"]
@@ -61,7 +61,7 @@ digraph g {
vxlan_decap_investigation [label="vxlan decap\n(investigation)", $completed]
vxlan_encap [label="vxlan encap"]
vxlan_encap_investigation [label="vxlan encap\n(investigation)", $completed]
worker_lifecycle [label="dp worker lifecycle"]
worker_lifecycle [label="dp worker lifecycle", href="$ptr/dataplane-worker-lifecycle.html", fontcolor=blue]
investigate_config_persist -> config_db_schema
dp_dp_state_sync_design -> dp_dp_state_sync
61 changes: 61 additions & 0 deletions design-docs/src/mdbook/src/dataplane/tasks2/configuration-persistence-investigation.md
@@ -0,0 +1,61 @@
# Configuration persistence

We need to officially pick a data store for configuration information.

This data store _is not_ intended for storing "fast" state.
Rather, this store needs to hold configuration data which is

1. durable
2. atomic
3. strongly typed
4. immediately consistent
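The four requirements above can be illustrated with a minimal in-memory sketch. Everything here is hypothetical (`ConfigStore`, `VpcConfig`, and `CasError` are illustrative names, not part of any chosen backend): values are a concrete Rust type, a batch of writes is applied all-or-nothing, readers always see the latest committed revision, and a writer holding a stale revision is rejected instead of silently overwriting newer data. Durability is deliberately not modeled; a real backend would have to persist each write before acknowledging it.

```rust
use std::collections::HashMap;

/// Hypothetical configuration record; a concrete type gives us "strongly typed".
#[derive(Clone, Debug, PartialEq)]
pub struct VpcConfig {
    pub vni: u32,
    pub mtu: u16,
}

/// Hypothetical error returned when a writer's view of the store is stale.
#[derive(Debug, PartialEq, Eq)]
pub enum CasError {
    Conflict { expected: u64, actual: u64 },
}

/// In-memory model of the semantics we want. NOT durable: nothing is persisted.
pub struct ConfigStore<T: Clone> {
    revision: u64,
    entries: HashMap<String, T>,
}

impl<T: Clone> ConfigStore<T> {
    pub fn new() -> Self {
        Self { revision: 0, entries: HashMap::new() }
    }

    /// Store-wide revision, bumped once per committed batch.
    pub fn revision(&self) -> u64 {
        self.revision
    }

    /// Reads always observe the latest committed batch (immediate consistency).
    pub fn get(&self, key: &str) -> Option<T> {
        self.entries.get(key).cloned()
    }

    /// Apply `writes` atomically, but only if no other batch was committed since
    /// the caller read `expected_revision` (optimistic concurrency control).
    pub fn commit(
        &mut self,
        expected_revision: u64,
        writes: Vec<(String, T)>,
    ) -> Result<u64, CasError> {
        if expected_revision != self.revision {
            // Reject before touching any entry, so the batch is all-or-nothing.
            return Err(CasError::Conflict { expected: expected_revision, actual: self.revision });
        }
        for (key, value) in writes {
            self.entries.insert(key, value);
        }
        self.revision += 1;
        Ok(self.revision)
    }
}
```

With [`etcd`], [TiKV], or [TiDB], the revision check would presumably map onto the store's own transaction or compare-and-swap mechanism rather than an in-process counter.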

## etcd

[`etcd`] is a reasonable choice because

1. it is already used by Kubernetes and is therefore likely to be well-maintained and tested, and
2. we are already using / integrating with Kubernetes, so any flaws in `etcd` are likely to impact us anyway.

I have used [`zookeeper`](https://zookeeper.apache.org/) in the past and *strongly recommend against it*.

I would otherwise consider [`consul`](https://github.com/hashicorp/consul), but [its license](https://github.com/hashicorp/consul/blob/main/LICENSE) is **not** acceptable.

A newer entry in the space is [`nacos`](https://github.com/alibaba/nacos), but I think it is less well suited, since it only seems to support eventual consistency.

## rqlite

_I have not used [`rqlite`],_ but it seems to be a reasonable (if young) option.
My biggest concern is that [transaction](https://rqlite.io/docs/api/api/#transactions) support seems _very_ weak: transactions appear to be limited to the statements sent in a single request.

- has a supported [Rust client](https://github.com/tomvoet/rqlite-rs) (and even a [sqlx](https://github.com/launchbadge/sqlx) client in the form of [sqlx-rqlite](https://crates.io/crates/sqlx-rqlite))
- [weak](https://rqlite.io/docs/api/read-consistency/#weak), [linearizable](https://rqlite.io/docs/api/read-consistency/#linearizable), and [strong](https://rqlite.io/docs/api/read-consistency/#strong) consistency models supported
- [transactions](https://rqlite.io/docs/api/api/#transactions) (this seems less than ideal, though)

## TiKV

[TiKV] seems like the **strongest near-term option** on the list.

I think its biggest advantage shows up if we _eventually_ want to switch to [TiDB].
That strategy allows us the most flexibility to use a "real" database in the future while using a "simple" KV database in the near term.

## TiDB

[TiDB] is a [MySQL] compatible [distributed SQL] database built on top of [TiKV].

What I find most striking about this database is its excellent documentation and its robust feature set (robust, all things considered):

- [Generated columns](https://docs.pingcap.com/tidb/dev/generated-columns)
- [JSON](https://docs.pingcap.com/tidb/dev/data-type-json)
- [Referential integrity](https://docs.pingcap.com/tidb/dev/foreign-key)
- [Transactions](https://docs.pingcap.com/tidb/dev/transaction-overview)
- [Views](https://docs.pingcap.com/tidb/dev/views)
- [Change data capture](https://docs.pingcap.com/tidb/stable/ticdc-overview)

## Summary

Thus, I think the real choice is between [`etcd`], [TiDB], and [TiKV].

That choice comes down to how much we value the functionality of SQL (multiple indexes, referential integrity, strong schema) vs. the upsides of KV databases (watches, more easily evolved schema).
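To make the "multiple indexes" side of the trade-off concrete, here is a sketch of what a second lookup path costs in a plain KV model. The key layout (`vpc/<name>` and `vpc-by-vni/<vni>`) and the `KvModel` type are hypothetical; a `BTreeMap` stands in for the KV store. The application has to maintain the index key itself and keep it in sync on every update.

```rust
use std::collections::BTreeMap;

/// Hypothetical key layout, for illustration only:
///   vpc/<name>       -> <vni>
///   vpc-by-vni/<vni> -> <name>   (hand-maintained secondary index)
pub struct KvModel {
    kv: BTreeMap<String, String>,
}

impl KvModel {
    pub fn new() -> Self {
        Self { kv: BTreeMap::new() }
    }

    /// In a real KV store, all of these writes must go in ONE transaction;
    /// otherwise a crash between them leaves the index pointing at stale data.
    /// VNI uniqueness is NOT enforced here (a SQL `UNIQUE` constraint would be).
    pub fn put_vpc(&mut self, name: &str, vni: u32) {
        // Drop the old index entry if this VPC previously had a different VNI.
        if let Some(old_vni) = self.kv.get(&format!("vpc/{name}")).cloned() {
            self.kv.remove(&format!("vpc-by-vni/{old_vni}"));
        }
        self.kv.insert(format!("vpc/{name}"), vni.to_string());
        self.kv.insert(format!("vpc-by-vni/{vni}"), name.to_string());
    }

    /// Primary lookup: name -> VNI.
    pub fn vni_of(&self, name: &str) -> Option<u32> {
        self.kv.get(&format!("vpc/{name}"))?.parse().ok()
    }

    /// Secondary lookup: VNI -> name, only correct if `put_vpc` kept it in sync.
    pub fn vpc_by_vni(&self, vni: u32) -> Option<&str> {
        self.kv.get(&format!("vpc-by-vni/{vni}")).map(String::as_str)
    }
}
```

In SQL the equivalent is roughly a `vni` column with a `UNIQUE` constraint, which the database indexes and enforces for us; the KV version gets watches and a looser schema in exchange for this bookkeeping.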

{{#include ../../links.md}}
25 changes: 25 additions & 0 deletions design-docs/src/mdbook/src/dataplane/tasks2/dataplane-worker-lifecycle.md
@@ -0,0 +1,25 @@
# Dataplane worker lifecycle

This is mostly a design task at this point.

Things which need to be worked out and documented:

1. communication pattern between workers
2. communication pattern between workers and the control plane
3. communication pattern between workers and the management plane
4. communication pattern between workers and the telemetry / monitoring subsystems

In each case, we need to consider

1. performance impact,
2. thread safety,
3. design simplicity,
4. transactionality,
5. extensibility.
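As one possible starting point for the control plane to worker direction (not a decided design), workers could receive immutable configuration snapshots over per-worker channels. `ConfigSnapshot`, `WorkerMsg`, and `run_workers` are hypothetical names. Sharing an `Arc` of a snapshot avoids locks on the read path (thread safety), and a worker only ever switches between complete snapshots (transactionality).

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical immutable configuration snapshot.
#[derive(Debug)]
pub struct ConfigSnapshot {
    pub generation: u64,
    pub mtu: u16,
}

/// Hypothetical messages the control plane may send to a worker.
pub enum WorkerMsg {
    Apply(Arc<ConfigSnapshot>),
    Shutdown,
}

/// Spawn `n` workers, publish `generations` snapshots to each, shut them down,
/// and return the last generation each worker applied (0 if none).
pub fn run_workers(n: usize, generations: u64) -> Vec<u64> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..n {
        let (tx, rx) = mpsc::channel::<WorkerMsg>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // The worker holds a reference to one complete, immutable snapshot;
            // it can never observe a half-applied configuration.
            let mut current: Option<Arc<ConfigSnapshot>> = None;
            for msg in rx {
                match msg {
                    WorkerMsg::Apply(snapshot) => current = Some(snapshot),
                    WorkerMsg::Shutdown => break,
                }
                // (packet processing using `current` would happen here)
            }
            current.map_or(0, |s| s.generation)
        }));
    }
    for generation in 1..=generations {
        // One allocation per snapshot; each worker gets a cheap reference-counted clone.
        let snapshot = Arc::new(ConfigSnapshot { generation, mtu: 1500 });
        for tx in &senders {
            tx.send(WorkerMsg::Apply(Arc::clone(&snapshot))).expect("worker alive");
        }
    }
    for tx in &senders {
        tx.send(WorkerMsg::Shutdown).expect("worker alive");
    }
    handles.into_iter().map(|h| h.join().expect("worker panicked")).collect()
}
```

A real worker would poll with `try_recv` between packet batches instead of blocking in the receive loop; blocking only keeps the sketch short. Whether per-worker channels perform better than a single shared, atomically swapped pointer is one of the performance questions listed above.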

## Likely dispatch

- primary: [@daniel-noland]
- sync with: [@sergeymatov]

{{#include ../../links.md}}
8 changes: 8 additions & 0 deletions design-docs/src/mdbook/src/dataplane/tasks2/route-manager.md
@@ -0,0 +1,8 @@
# Route manager

This is basically a big TODO.

For the moment, I would like to get a more precise feature definition from [@sergeymatov].

It is also important to align this task with the [dataplane worker lifecycle].

1 change: 1 addition & 0 deletions design-docs/src/mdbook/src/dataplane/tasks2/telemetry-basic.md
@@ -0,0 +1 @@
# Telemetry (basic)
6 changes: 6 additions & 0 deletions design-docs/src/mdbook/src/links.md
@@ -21,9 +21,14 @@
[IPsec]: https://en.wikipedia.org/wiki/IPsec
[IPv6 ND]: https://en.wikipedia.org/wiki/Neighbor_Discovery_Protocol
[LACP]: https://en.wikipedia.org/wiki/Link_aggregation#Link_Aggregation_Control_Protocol
[MySQL]: https://www.mysql.com/
[NAT]: https://en.wikipedia.org/wiki/Network_address_translation
[TiDB]: https://www.pingcap.com/
[TiKV]: https://tikv.org/
[`bfdd`]: https://docs.frrouting.org/en/latest/bfd.html
[`bgpd`]: https://docs.frrouting.org/en/latest/bgp.html
[`etcd`]: https://github.com/coreos/etcd
[`rqlite`]: https://rqlite.io/
[`zebra`]: https://docs.frrouting.org/en/latest/zebra.html
[bfdd]: https://docs.frrouting.org/en/latest/bfd.html
[bgpd]: https://docs.frrouting.org/en/latest/bgp.html
@@ -33,6 +38,7 @@
[bridge]: https://man7.org/linux/man-pages/man8/bridge.8.html
[capnproto]: https://capnproto.org/
[cbindgen]: https://github.com/mozilla/cbindgen
[distributed SQL]: https://en.wikipedia.org/wiki/Distributed_SQL
[frr]: https://frrouting.org/
[kernel]: https://en.wikipedia.org/wiki/Linux_kernel
[netlink]: https://en.wikipedia.org/wiki/Netlink
