Feat/improve traceroute #159

Merged
merged 47 commits into from
Aug 7, 2024
2b437c5
feat: Improve traceroute
niklastreml Jul 25, 2024
1c9b5e4
chore: add test setup
niklastreml Jul 25, 2024
9ad9535
fix: update unit test for new behaviour
niklastreml Jul 25, 2024
4ccae9a
feat: implement retry logic
niklastreml Jul 26, 2024
476f570
feat: show all attempted hops in api output
niklastreml Jul 26, 2024
b347c9b
feat: logging
niklastreml Jul 26, 2024
ced17bd
feat: check permissions for icmp
niklastreml Jul 26, 2024
01b76b6
fix: calm down linter
niklastreml Jul 26, 2024
84e92c7
refactor: tracerouteConfig struct for factory
niklastreml Jul 26, 2024
9297a1f
fix: unbuffered channel caused sparrow to hang on shut down when trac…
niklastreml Jul 26, 2024
8e450e2
feat: prometheus metrics
niklastreml Jul 26, 2024
b238797
testing: simplify test setup
niklastreml Jul 26, 2024
9aa6a8b
docs: update README.md
niklastreml Jul 26, 2024
e496418
docs: fix incorrect retry docs
niklastreml Jul 26, 2024
d44feca
docs: how to debug traceroute
niklastreml Jul 26, 2024
474a845
docs: add clean up section
niklastreml Jul 26, 2024
d977b6b
fix: initialize metrics in test
niklastreml Jul 26, 2024
8f69f80
fix: ignore some linter things
niklastreml Jul 26, 2024
4dae4b9
fix: linters are my friends
niklastreml Jul 26, 2024
906521c
fix: minHops now equals maxHops if no hop was able to connect
niklastreml Jul 26, 2024
20c714c
refactor: rename variables to make code more understandable
niklastreml Jul 30, 2024
7de61ba
docs: better comment for hops map
niklastreml Jul 30, 2024
cecea62
fix: correct channel buffer size
niklastreml Jul 30, 2024
bb2e5ac
refactor: remove channel from traceroute and simplify arguments
niklastreml Jul 30, 2024
d1379dc
refactor: split up traceroute function
niklastreml Jul 30, 2024
5d1cea9
fix: caps in json. All json output now starts with lowercase as god i…
niklastreml Jul 31, 2024
42c7123
tests: add some testcases
niklastreml Jul 31, 2024
9e163bb
feat(ci): e2e tests for traceroute
niklastreml Jul 31, 2024
b88ca55
fix(ci): snapshot
niklastreml Jul 31, 2024
501890a
fix(ci): install kathara to path
niklastreml Jul 31, 2024
bdfac86
feat(ci): retry ci
niklastreml Jul 31, 2024
872bf5a
feat(ci): kathara config
niklastreml Jul 31, 2024
92ce5f9
fix: flaky e2e test due to wrong socket reading logic
niklastreml Jul 31, 2024
5bf2d7a
chore: add dist to .gitignore
niklastreml Jul 31, 2024
fe4f631
feat: debug logs in ci
niklastreml Aug 5, 2024
4434492
chore: rename setupIcmpListener
niklastreml Aug 5, 2024
247730b
refactor: constants
niklastreml Aug 5, 2024
aa347ef
feat: unix package
niklastreml Aug 5, 2024
ba281ee
feat: ipv6
niklastreml Aug 5, 2024
63292f0
refactor: retry context
niklastreml Aug 5, 2024
6c5f24a
fix: correct log level in e2e tests
niklastreml Aug 5, 2024
b58d300
fix(ci): only run e2e test once on pr
niklastreml Aug 5, 2024
61c0564
feat(ci): only run on push events to prevent duplicate pipelines
niklastreml Aug 5, 2024
0a23dee
chore: lint markdown files
lvlcn-t Aug 5, 2024
e137d6a
fix: handle IPv6 icmpv6 time exceeded
lvlcn-t Aug 5, 2024
6c83857
refactor: inject more metadata into logs
niklastreml Aug 6, 2024
8fe8f33
Merge remote-tracking branch 'origin/main' into feat/improve-traceroute
niklastreml Aug 7, 2024
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
Expand Up @@ -2,7 +2,6 @@ name: Continuous Integration

on:
push:
pull_request:

permissions:
contents: write
Expand Down
55 changes: 55 additions & 0 deletions .github/workflows/e2e_checks.yml
@@ -0,0 +1,55 @@
name: E2E - Test checks

on:
  push:

permissions:
  contents: read

jobs:
  test_e2e:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          sudo add-apt-repository ppa:katharaframework/kathara
          sudo apt-get update
          sudo apt-get install -y jq kathara
      - name: Setup kathara
        run: |
          echo '{
            "image": "kathara/base",
            "manager_type": "docker",
            "terminal": "/usr/bin/xterm",
            "open_terminals": false,
            "device_shell": "/bin/bash",
            "net_prefix": "kathara",
            "device_prefix": "kathara",
            "debug_level": "INFO",
            "print_startup_log": true,
            "enable_ipv6": false,
            "last_checked": 1721834897.2415252,
            "hosthome_mount": false,
            "shared_mount": true,
            "image_update_policy": "Prompt",
            "shared_cds": 1,
            "remote_url": null,
            "cert_path": null,
            "network_plugin": "kathara/katharanp_vde"
          }' > ~/.config/kathara.conf

      - name: Build binary for e2e
        uses: goreleaser/goreleaser-action@v5
        with:
          version: latest
          args: build --single-target --clean --snapshot --config .goreleaser-ci.yaml

      - name: Run e2e tests
        run: |
          ./scripts/run_e2e_tests.sh
1 change: 0 additions & 1 deletion .github/workflows/end2end.yml
Expand Up @@ -4,7 +4,6 @@
name: End2End Testing
on:
push:
pull_request:

jobs:
end2end:
Expand Down
3 changes: 2 additions & 1 deletion .github/workflows/pre-commit.yml
@@ -1,6 +1,7 @@
name: pre-commit.ci

on: [push, pull_request]
on:
push:

jobs:
pre-commit:
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/test_sast.yml
Expand Up @@ -2,7 +2,6 @@ name: Test - SAST

on:
push:
pull_request:

permissions:
contents: read
Expand All @@ -21,4 +20,4 @@ jobs:
- name: Run Gosec Security Scanner
uses: securego/gosec@master
with:
args: ./...
args: ./...
1 change: 0 additions & 1 deletion .github/workflows/test_unit.yml
Expand Up @@ -2,7 +2,6 @@ name: Test - Unit

on:
push:
pull_request:

permissions:
contents: read
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -29,3 +29,4 @@ gen

# Temporary directory
.tmp/*
dist
96 changes: 87 additions & 9 deletions README.md
Expand Up @@ -33,7 +33,9 @@
- [DNS Metrics](#dns-metrics)
- [Check: Traceroute](#check-traceroute)
- [Example configuration](#example-configuration-3)
- [Required Capabilities](#required-capabilities)
- [Optional Capabilities](#optional-capabilities)
- [Traceroute Prometheus Metrics](#traceroute-prometheus-metrics)
- [Traceroute API Metrics](#traceroute-api-metrics)
- [API](#api)
- [Metrics](#metrics)
- [Prometheus Integration](#prometheus-integration)
Expand Down Expand Up @@ -504,7 +506,8 @@ dns:
| ---------------- | ----------------- | ---------------------------------------------------------------------------- |
| `interval` | `duration` | Interval to perform the Traceroute check. |
| `timeout` | `duration` | Timeout for every hop. |
| `retries` | `integer` | Number of times to retry the traceroute for a target, if it fails. |
| `retry.count`    | `integer`         | Number of retries for the traceroute check.                                  |
| `retry.delay`    | `duration`        | Initial delay between retries for the traceroute check.                      |
| `maxHops` | `integer` | Maximum number of hops to try before giving up. |
| `targets` | `list of objects` | List of targets to traceroute to. |
| `targets[].addr` | `string` | The address of the target to traceroute to. Can be an IP address or DNS name |
Expand All @@ -515,25 +518,100 @@ dns:
<!-- markdownlint-enable MD024 -->

```yaml
traceroute:
traceroute:
interval: 5s
timeout: 3s
retries: 3
maxHops: 8
retry:
count: 3
delay: 1s
maxHops: 30
targets:
- addr: 8.8.8.8
port: 53
- addr: www.google.com
port: 80
```
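
How `retry.count` and `retry.delay` interact can be pictured with a small shell sketch. It is illustrative only: the `retry` helper is hypothetical, and doubling the delay after each failed attempt is an assumption drawn from the word "initial" in the table, not confirmed Sparrow behavior.

```bash
#!/bin/bash

# retry COUNT DELAY CMD...: run CMD, retrying up to COUNT more times on failure.
# DELAY is the initial pause in whole seconds; it doubles after every failed
# attempt (assumed backoff strategy, see the note above).
retry() {
  local count=$1 delay=$2 attempt=0
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$count" ]; then
      return 1 # all retries used up
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
    delay=$((delay * 2))
  done
}
```

With `count: 3` and `delay: 1s`, a call such as `retry 3 1 some_command` would, under these assumptions, pause for 1, 2 and 4 seconds before giving up.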

#### Required Capabilities
#### Optional Capabilities

Sparrow does not need any extra permissions to run this check. Without them, however, some data, such as the
IP address of the hop that dropped a packet, will not be available. To enable this functionality, there are two options:

- Run sparrow as root:

```bash
sudo sparrow run --config config.yaml
```

- Allow sparrow to create raw sockets by assigning the `CAP_NET_RAW` capability to the sparrow binary:

```bash
sudo setcap 'cap_net_raw=ep' sparrow
```
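
To see which mode a given binary would run in, one can check for root or for the file capability before starting sparrow. The following is a minimal sketch, assuming `getcap` from libcap is installed. The helper names are hypothetical and are not taken from Sparrow's code.

```bash
#!/bin/bash

# has_cap_net_raw TEXT: succeeds if TEXT (the output of getcap) mentions
# cap_net_raw. Matching the bare name covers both the older
# "file = cap_net_raw+ep" and the newer "file cap_net_raw=ep" output styles.
has_cap_net_raw() {
  printf '%s\n' "$1" | grep -q 'cap_net_raw'
}

# can_use_raw_sockets BINARY: succeeds if we are root or BINARY carries the
# capability. Returns 1 when getcap is not available.
can_use_raw_sockets() {
  if [ "$(id -u)" -eq 0 ]; then
    return 0
  fi
  command -v getcap > /dev/null 2>&1 || return 1
  has_cap_net_raw "$(getcap "$1" 2> /dev/null)"
}
```

For example, `can_use_raw_sockets ./sparrow && echo "hop addresses available"` would print the message only when one of the two options is in effect.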

#### Traceroute Prometheus Metrics

- `sparrow_traceroute_check_duration_ms{target="google.com"} 43150`
  - Type: Gauge
  - Description: How long the last traceroute took for this target in total
- `sparrow_traceroute_minimum_hops{target="google.com"} 14`
  - Type: Gauge
  - Description: The minimum number of hops required to reach a target
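
To read one of these gauges from a script, the text exposition format can be filtered with `awk`. This is a rough sketch: the `metric_value` helper is hypothetical, and it assumes `target` is the only label on the series, as in the samples listed here.

```bash
#!/bin/bash

# metric_value NAME TARGET: reads Prometheus text format on stdin and prints
# the value of NAME{target="TARGET"}. Assumes a single "target" label.
metric_value() {
  awk -v series="$1{target=\"$2\"}" 'index($0, series) == 1 { print $2 }'
}
```

A possible invocation is `curl -s localhost:8080/metrics | metric_value sparrow_traceroute_minimum_hops google.com`, where the host and port are assumptions about the local setup.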

#### Traceroute API Metrics

To use this check, sparrow needs to be run with the `CAP_NET_RAW` capability or elevated privileges to be able to send raw packets.
Using the `CAP_NET_RAW` capability is recommended over running sparrow as sudo.
The traceroute check exposes additional data through its REST API that isn't available in Prometheus.
This data gives a more detailed breakdown of the trace and can be found at `/v1/metrics/traceroute`. It is
meant to be a JSON representation of traditional traceroute output:

```bash
sudo setcap 'cap_net_raw=ep' sparrow
$ traceroute -T -q 1 100.1.2.2
1 200.2.0.1 (200.2.0.1) 2 ms
2 11.0.0.34 (11.0.0.34) 5 ms
...
```

This is roughly equivalent to the following API output:

```json
{
  "data": {
    "100.1.2.2": {
      "MinHops": 1,
      "Hops": {
        "1": [
          {
            "Latency": 2,
            "Addr": {
              "IP": "200.2.0.1",
              "Port": 80,
              "Zone": ""
            },
            "Name": "",
            "Ttl": 1,
            "Reached": false
          }
        ],
        "2": [
          {
            "Latency": 5,
            "Addr": {
              "IP": "11.0.0.34",
              "Port": 80,
              "Zone": ""
            },
            "Name": "",
            "Ttl": 2,
            "Reached": false
          }
        ]
        ...
      }
    }
  },
  "timestamp": "2024-07-26T15:49:39.60760766+02:00"
}
```
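
Going the other way, the API response can be flattened back into traceroute-like lines with `jq`. This sketch assumes the lowercase key names (`hops`, `addr.ip`, `ttl`, `reached`) that the e2e test in this PR queries, which differ from the capitalized keys in the sample response. The `print_hops` helper is hypothetical.

```bash
#!/bin/bash

# print_hops TARGET: reads the /v1/metrics/traceroute response on stdin and
# prints one "ttl ip reached=<bool>" line per recorded attempt, ordered by hop.
print_hops() {
  jq -r --arg target "$1" '
    .data[$target].hops
    | to_entries
    | sort_by(.key | tonumber)
    | .[]
    | .value[]
    | "\(.ttl) \(.addr.ip) reached=\(.reached)"'
}
```

It could be used as `curl -s localhost:8080/v1/metrics/traceroute | print_hops 200.1.1.7`.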

## API
Expand Down
21 changes: 21 additions & 0 deletions e2e/traceroute/lab.conf
@@ -0,0 +1,21 @@
LAB_DESCRIPTION="A simple example showing how to configure static routes"
LAB_VERSION=2.0
LAB_AUTHOR="T. Caiazzi, G. Di Battista, M. Patrignani, M. Pizzonia, F. Ricci, M. Rimondini"
[email protected]
LAB_WEB=http://www.kathara.org/

r1[0]="A"
r1[1]="B"
r1[image]="kathara/base"

r2[0]="C"
r2[1]="B"
r2[image]="kathara/base"

pc1[0]="A"
pc1[image]="kathara/base"
pc1[env]="LOG_LEVEL=DEBUG"
pc1[env]="LOG_FORMAT=TEXT"

pc2[0]="C"
pc2[image]="kathara/base"
2 changes: 2 additions & 0 deletions e2e/traceroute/pc1.startup
@@ -0,0 +1,2 @@
ip address add 195.11.14.5/24 dev eth0
ip route add default via 195.11.14.1 dev eth0
3 changes: 3 additions & 0 deletions e2e/traceroute/pc2.startup
@@ -0,0 +1,3 @@
ip address add 200.1.1.7/24 dev eth0
ip route add default via 200.1.1.1 dev eth0
systemctl start apache2
3 changes: 3 additions & 0 deletions e2e/traceroute/r1.startup
@@ -0,0 +1,3 @@
ip address add 195.11.14.1/24 dev eth0
ip address add 100.0.0.9/30 dev eth1
ip route add 200.1.1.0/24 via 100.0.0.10 dev eth1
3 changes: 3 additions & 0 deletions e2e/traceroute/r2.startup
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
ip address add 200.1.1.1/24 dev eth0
ip address add 100.0.0.10/30 dev eth1
ip route add 195.11.14.0/24 via 100.0.0.9 dev eth1
25 changes: 25 additions & 0 deletions e2e/traceroute/shared/config.yaml
@@ -0,0 +1,25 @@
# DNS sparrow is exposed on
name: sparrow.caas-t21.telekom.de

# Selects and configures a loader to continuously fetch the checks' configuration at runtime
loader:
  # Defines which loader to use. Options: "file | http"
  type: file
  # The interval in which sparrow tries to fetch a new configuration
  # If this isn't set or set to 0, the loader will only retrieve the configuration once
  interval: 30s

  # Config specific to the file loader
  # The file loader is not intended for production use
  file:
    # Location of the file in the local filesystem
    path: /shared/config.yaml

traceroute:
  interval: 5s
  timeout: 3s
  retries: 3
  maxHops: 3
  targets:
    - addr: 200.1.1.7
      port: 80
2 changes: 2 additions & 0 deletions e2e/traceroute/shared/get_api.sh
@@ -0,0 +1,2 @@
curl -s localhost:8080/v1/metrics/traceroute > /shared/api.json
curl -s localhost:8080/metrics > /shared/prometheus.txt
74 changes: 74 additions & 0 deletions e2e/traceroute/test.sh
@@ -0,0 +1,74 @@
#!/bin/bash

EXIT_CODE=0

function cleanup() {
  kathara lclean
  # -f avoids interactive prompts and errors for files that were never created
  rm -f ./shared/api.json ./shared/prometheus.txt ./shared/sparrow
  exit "$EXIT_CODE"
}

function error() {
  echo "[ ERROR ]: $*"
  EXIT_CODE=1
}

function info() {
  echo "[ INFO ]: $*"
}

function success() {
  echo "[ SUCCESS ]: $*"
}

function check_prometheus_output() {
  if grep -q 'sparrow_traceroute_minimum_hops{target="200.1.1.7"} 3' ./shared/prometheus.txt; then
    success "The specific Prometheus output is present."
  else
    error "The specific Prometheus output is not present."
  fi
}

function check_api_output() {
  if jq -e '
    .data["200.1.1.7"].hops |
    .["1"][0].addr.ip == "195.11.14.1" and
    .["1"][0].reached == false and
    .["1"][0].ttl == 1 and
    .["2"][0].addr.ip == "100.0.0.10" and
    .["2"][0].reached == false and
    .["2"][0].ttl == 2 and
    .["3"][0].addr.ip == "200.1.1.7" and
    .["3"][0].reached == true and
    .["3"][0].ttl == 3 and
    .["3"][0].addr.port == 80' ./shared/api.json > /dev/null; then
    success "The API output matches the expected hops and conditions."
  else
    error "The API output does not match the expected hops and conditions."
    cat ./shared/api.json
  fi
}

trap cleanup EXIT

# Start the Kathará lab
kathara lstart

# Copy the binary into the shared folder
info "Using $SPARROW_BIN"
cp "$SPARROW_BIN" ./shared/sparrow

# Start Sparrow on pc1
kathara exec pc1 "/shared/sparrow run --config /shared/config.yaml" &

# Wait for 10 seconds to ensure Sparrow is up and running
sleep 10

# Curl the API of Sparrow
kathara exec pc1 "bash /shared/get_api.sh"

check_prometheus_output

check_api_output
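
The fixed `sleep 10` in this script trades speed for simplicity. A possible alternative is to poll until a readiness command succeeds or a timeout expires. The `wait_for` helper below is a hypothetical sketch and not part of this PR.

```bash
#!/bin/bash

# wait_for TIMEOUT CMD...: run CMD about once per second until it succeeds.
# Gives up and returns 1 after roughly TIMEOUT seconds.
wait_for() {
  local remaining=$1
  shift
  until "$@" > /dev/null 2>&1; do
    remaining=$((remaining - 1))
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
  done
}
```

In the lab it might be called as `wait_for 30 kathara exec pc1 "curl -sf localhost:8080/metrics"`, assuming `kathara exec` passes the exit status of the inner command through. Note that the traceroute check also needs to complete at least one run before the metrics contain data, so a readiness probe alone may not be sufficient.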