Operations (#22)
LeoBorai authored May 31, 2024
1 parent 174b802 commit 5d7a597
Showing 14 changed files with 379 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/concepts/batching.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 6
+sidebar_position: 10
slug: /concepts/batching
title: "Batching"
---
2 changes: 1 addition & 1 deletion docs/concepts/data-consistency.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 7
+sidebar_position: 11
slug: /concepts/data-consistency
title: "Data Consistency"
---
2 changes: 1 addition & 1 deletion docs/concepts/delivery-semantics.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 8
+sidebar_position: 12
slug: /concepts/delivery-semantics
title: "Delivery Semantics"
---
2 changes: 1 addition & 1 deletion docs/concepts/offsets.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 5
+sidebar_position: 9
slug: /concepts/offsets
title: "Offsets"
---
7 changes: 7 additions & 0 deletions docs/concepts/operations/_category_.json
@@ -0,0 +1,7 @@
{
"label": "Operations",
"position": 2,
"link": {
"type": "generated-index"
}
}
195 changes: 195 additions & 0 deletions docs/concepts/operations/data-retention.mdx
@@ -0,0 +1,195 @@
---
sidebar_position: 2
slug: /concepts/operations/data-retention
title: "Data Retention"
---

## Overview

Topic data is automatically pruned when **any** of the following criteria is true:
1. The partition size exceeds the configured max partition size.
2. The time elapsed since the last write to a segment has passed the configured retention time.

-> Data eviction operates at the segment level: if any of the above conditions is met, the entire segment is removed. Only historical (read-only) segments can be pruned. Data in the active segment is not evicted until that segment becomes a historical segment.
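The two criteria can be sketched as a small bash function. This is a hypothetical illustration of the rule as described on this page, not Fluvio's actual implementation: `is_evictable` is an invented name, sizes are plain bytes, and times are plain seconds. Reaching the retention time is treated as expired, which matches the day-7 example later on this page. When the size limit is exceeded, Fluvio deletes the oldest segment first; this function only reports whether a given segment meets a condition.

```bash
# Hypothetical sketch: does a segment meet an eviction condition?
# Arguments (bytes and seconds are assumptions for illustration):
#   $1 "true" if this is the active segment, otherwise "false"
#   $2 current partition size in bytes
#   $3 configured max partition size in bytes
#   $4 seconds since the last write to this segment
#   $5 configured retention time in seconds
is_evictable() {
  local is_active=$1 partition_bytes=$2 max_partition_bytes=$3
  local age_secs=$4 retention_secs=$5

  # The active segment is never pruned, whatever the other values are.
  if [ "$is_active" = "true" ]; then
    echo "no"
    return
  fi

  # Either condition alone is enough to evict a historical segment.
  if [ "$partition_bytes" -gt "$max_partition_bytes" ] ||
     [ "$age_secs" -ge "$retention_secs" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

For example, `is_evictable false 2500 3000 604800 604800` prints `yes`: the partition is under its size limit, but the segment is exactly 7 days (604800 seconds) old.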


## Configuring retention

Retention is configured per-topic at the time of topic creation with `fluvio topic create`.

```shell
$ fluvio topic create -h
Create a Topic with the given name

fluvio-stable topic create [FLAGS] [OPTIONS] <name>
[...]
--retention-time <time>
Retention time (round to seconds) Ex: '1h', '2d 10s', '7 days' (default)

--segment-size <bytes>
Segment size (by default measured in bytes) Ex: `2048`, '2 Ki', '10 MiB', `1 GB`

--max-partition-size <bytes>
Max partition size (by default measured in bytes) Ex: `2048`, '2 Ki', '10 MiB', `1 GB`

ARGS:
<name> The name of the Topic to create
```

## Retention time

Retention time duration can be provided as a free-form string. Check out the [`humantime` docs](https://docs.rs/humantime/2.1.0/humantime/fn.parse_duration.html) for the supported time suffixes.

The default retention time is `7 days`.

### Example retention configurations
* Delete old segments that are `6 hours` old

%copy first-line%
```bash
$ fluvio topic create test1 --retention-time "6h"
```

%copy first-line%
```bash
$ fluvio topic list
NAME   TYPE      PARTITIONS  REPLICAS  RETENTION TIME  STATUS                   REASON
test1  computed  1           1         6h              resolution::provisioned
```

* Delete old segments that are a day old

%copy first-line%
```bash
$ fluvio topic create test2 --retention-time "1d"
```

%copy first-line%
```bash
$ fluvio topic list
NAME   TYPE      PARTITIONS  REPLICAS  RETENTION TIME  STATUS                   REASON
test2  computed  1           1         1day            resolution::provisioned
```

* A very specific duration that is 1 day, 2 hours, 3 minutes and 4 seconds long

%copy first-line%
```bash
$ fluvio topic create test3 --retention-time "1d 2h 3m 4s"
```

%copy first-line%
```bash
$ fluvio topic list
NAME   TYPE      PARTITIONS  REPLICAS  RETENTION TIME  STATUS                   REASON
test3  computed  1           1         1day 2h 3m 4s   resolution::provisioned
```
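To check what a combined duration such as `1d 2h 3m 4s` adds up to, here is a hypothetical bash helper. It only understands the `d`, `h`, `m` and `s` suffixes attached directly to a number (for example `2h`), which is a small subset of what `humantime` accepts, so it is a sketch for reasoning about durations rather than a replacement for Fluvio's parser. `to_seconds` is an invented name.

```bash
# Hypothetical helper: add up a duration such as "1d 2h 3m 4s" in seconds.
# Only d (days), h (hours), m (minutes) and s (seconds) are handled, and each
# number must be attached to its suffix, e.g. "2h" rather than "2 hours".
to_seconds() {
  local total=0 token num unit
  for token in $1; do            # split the string on whitespace
    num=${token%%[!0-9]*}        # leading digits, e.g. "2" from "2h"
    unit=${token#"$num"}         # the rest, e.g. "h"
    if [ -z "$num" ]; then
      echo "missing number in: $token" >&2
      return 1
    fi
    num=$(( 10#$num ))           # force base 10 so "08" is not read as octal
    case $unit in
      d) total=$(( total + num * 86400 )) ;;
      h) total=$(( total + num * 3600 )) ;;
      m) total=$(( total + num * 60 )) ;;
      s) total=$(( total + num )) ;;
      *) echo "unsupported suffix in: $token" >&2; return 1 ;;
    esac
  done
  echo "$total"
}
```

For example, `to_seconds "1d 2h 3m 4s"` prints `93784` (86400 + 7200 + 180 + 4), while `to_seconds "7 days"` fails because the suffix is separated from its number.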

## Segment size

Produced records persist on the SPU in file chunks that cannot exceed the segment size.

If adding a new record to the active segment would make it exceed the segment size, the record is saved into a new segment.

Older segments remain available for consumption until they are pruned because an eviction condition is met.

Segment size can be provided as a free-form string. Check out the [`bytesize` docs](https://github.com/hyunsik/bytesize/)
for the supported size suffixes.

The default segment size is `1 GB`.
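The rollover rule above can be sketched in bash. `layout_segments` is a hypothetical helper, not part of Fluvio: it takes a segment size and a sequence of record sizes (all in bytes) and prints the size of each resulting segment, oldest first. It ignores record framing and index overhead, and it assumes a record larger than the segment size is still written to an empty segment; Fluvio's real behavior in that case may differ.

```bash
# Hypothetical helper: show how records fill fixed-size segments.
# usage: layout_segments SEGMENT_BYTES RECORD_BYTES...
# Prints each segment's size in bytes, oldest first; the last one is active.
layout_segments() {
  local segment_bytes=$1
  shift
  local active=0 rec
  local sizes=()
  for rec in "$@"; do
    # Roll over when this record would push a non-empty active segment
    # past the segment size.
    if [ "$active" -gt 0 ] && [ $(( active + rec )) -gt "$segment_bytes" ]; then
      sizes+=("$active")
      active=0
    fi
    active=$(( active + rec ))
  done
  sizes+=("$active")
  echo "${sizes[*]}"
}
```

For example, `layout_segments 100 50 50 10` prints `100 10`: the first two records exactly fill one segment, so the third starts a new one.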

### Example retention configurations
* 25 MB segment size w/ 7 day retention time

%copy first-line%
```bash
$ fluvio topic create test4 --segment-size 25000000
```

* 36 GB segment size w/ 12 hr retention time

%copy first-line%
```bash
$ fluvio topic create test5 --segment-size "36 GB" --retention-time 12h
```

## Max partition size

Fluvio tracks the disk space that a partition occupies on each **SPU** node, including the payload and all bookkeeping data.
If the partition size exceeds the max partition size, Fluvio triggers segment eviction, deleting the oldest segment first.
Size enforcement is `best-effort`: there may be time windows when the actual partition size exceeds the configured max partition size.
It is recommended to configure max partition sizes to cover no more than 80% of the disk size.
If the disk fills up before retention is triggered, the SPU stops accepting messages and the overall health of the system
may be compromised. The max partition size applies to every partition in a topic: if a topic has 3 partitions
and `--max-partition-size '10 GB'` is set, Fluvio limits the topic's total disk usage to 30 GB (on a best-effort basis).

Max partition size can be provided as a free-form string. Check out the [`bytesize` docs](https://github.com/hyunsik/bytesize/)
for the supported size suffixes.

The default max partition size is `100 GB`.
The max partition size must not be less than the segment size.
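The sizing guidance above is plain arithmetic. The hypothetical helpers below assume that all partitions on one SPU share a single disk and should together stay within 80% of it; `max_partition_bytes` and `topic_budget_bytes` are invented names, and all values are in bytes.

```bash
# Hypothetical: split 80% of an SPU's disk evenly across the partitions it hosts.
# usage: max_partition_bytes DISK_BYTES PARTITIONS_ON_SPU
max_partition_bytes() {
  local disk_bytes=$1 partitions=$2
  echo $(( disk_bytes * 80 / 100 / partitions ))
}

# Hypothetical: worst-case disk usage of a topic when every partition
# reaches its max partition size.
# usage: topic_budget_bytes PARTITIONS MAX_PARTITION_BYTES
topic_budget_bytes() {
  local partitions=$1 max_bytes=$2
  echo $(( partitions * max_bytes ))
}
```

For example, `topic_budget_bytes 3 10000000000` prints `30000000000`, the 30 GB from the three-partition case above, and `max_partition_bytes 100000000000 4` prints `20000000000` for a 100 GB disk hosting four partitions. Since `--max-partition-size` accepts a plain byte count, the result could be passed directly, e.g. `fluvio topic create my-topic --max-partition-size "$(max_partition_bytes 100000000000 4)"` (the topic name is hypothetical).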


### Example retention configurations

* 10 GB max partition size w/ 1 GB segment size (only 10 segments are allowed at any time)

%copy first-line%
```bash
$ fluvio topic create test6 --max-partition-size '10 GB' --segment-size '1 GB'
```

## Example data lifecycle

For a given topic with a retention of `7 days` using `1 GB` segments:

* Day 0: 2.5 GB is written (total topic data: 2.5 GB)

| Topic Segment # | Segment size | Days since last write |
|-----------------|--------------|-----------------------|
| 0 | 1 GB | 0 |
| 1 | 1 GB | 0 |
| 2 | 0.5 GB | N/A |

* Day 6: Another 2 GB is written (total topic data: 4.5 GB)

| Topic Segment # | Segment size | Days since last write |
|-----------------|--------------|-----------------------|
| 0 | 1 GB | 6 |
| 1 | 1 GB | 6 |
| 2 | 1 GB | 0 |
| 3 | 1 GB | 0 |
| 4 | 0.5 GB | N/A |

* Day 7: 2 segments from Day 0 are 7 days old. They are pruned (total topic data: 2.5 GB)

| Topic Segment # | Segment size | Days since last write |
|-----------------|--------------|-----------------------|
| 2 | 1 GB | 1 |
| 3 | 1 GB | 1 |
| 4 | 0.5 GB | N/A |

* Day 13: the 2 segments last written on Day 6 are now 7 days old. They are pruned (total topic data: 0.5 GB)

| Topic Segment # | Segment size | Days since last write |
|-----------------|--------------|-----------------------|
| 4 | 0.5 GB | N/A |

The newest segment is left alone and only begins to age once a new segment is being written to.
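The time-based walkthrough can be reproduced with a hypothetical bash helper. `surviving_segments` takes the current day, the retention in days, and one `id:last_write_day` pair per historical segment, and prints the segments that are kept. It counts whole days, treats a segment as expired once its age reaches the retention, and leaves out the active segment because it is never pruned.

```bash
# Hypothetical helper: which historical segments survive time-based retention?
# usage: surviving_segments TODAY RETENTION_DAYS ID:LAST_WRITE_DAY...
# Ages are whole days; a segment whose age reaches RETENTION_DAYS is pruned.
surviving_segments() {
  local today=$1 retention=$2
  shift 2
  local kept=()
  local entry id last
  for entry in "$@"; do
    id=${entry%%:*}      # text before the colon
    last=${entry#*:}     # text after the colon
    if [ $(( today - last )) -lt "$retention" ]; then
      kept+=("$id")
    fi
  done
  echo "${kept[*]}"
}
```

For example, `surviving_segments 7 7 0:0 1:0 2:6 3:6` prints `2 3`, matching the Day 7 table.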

For a given topic with a max partition size of `3 GB` and `1 GB` segments:
* 2.5 GB is written (total partition data: 2.5 GB)

| Topic Segment # | Segment size |
|-----------------|--------------|
| 0 | 1 GB |
| 1 | 1 GB |
| 2 | 0.5 GB |

* 600 MB is written. The total size becomes 3.1 GB, so the oldest segment is pruned (total partition data: 2.1 GB).

| Topic Segment # | Segment size |
|---------------------|--------------|
| 1 | 1 GB |
| 2 | 1 GB |
| 3 | 0.1 GB |
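The size-based example follows the same oldest-first rule. As a hypothetical bash sketch, `prune_by_size` takes the max partition size and the segment sizes oldest first (the last one being the active segment), all in the same unit, and drops the oldest segments until the total fits. It never drops the active segment and, unlike Fluvio, does not account for bookkeeping data.

```bash
# Hypothetical helper: oldest-first pruning against a max partition size.
# usage: prune_by_size MAX SIZE...   (sizes oldest first, same unit as MAX;
# the last size is the active segment)
# Prints the sizes of the segments that remain.
prune_by_size() {
  local max=$1
  shift
  local sizes=("$@")
  local total=0 s
  for s in "${sizes[@]}"; do
    total=$(( total + s ))
  done
  # Drop the oldest segment while over the limit, but never the active one.
  while [ "$total" -gt "$max" ] && [ "${#sizes[@]}" -gt 1 ]; do
    total=$(( total - sizes[0] ))
    sizes=("${sizes[@]:1}")
  done
  echo "${sizes[*]}"
}
```

With sizes in MB, `prune_by_size 3000 1000 1000 1000 100` prints `1000 1000 100`, matching the table above.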
58 changes: 58 additions & 0 deletions docs/concepts/operations/monitor.mdx
@@ -0,0 +1,58 @@
---
sidebar_position: 1
slug: /concepts/operations/monitor
title: "Monitor"
---

The following Kubernetes objects represent the state of the Fluvio cluster.

## Pods
`kubectl get pods` should show one pod for the SC and one pod for each SPU specified when installing Fluvio.

Example:

%copy first-line%
```bash
$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
fluvio-sc-6458d598d6-qq2td   1/1     Running   0          3m35s
fluvio-spg-main-0            1/1     Running   0          3m28s
```

## Services
`kubectl get svc` should show one public and one internal service for the SC and also one public and one internal service for each SPU.

Example:

%copy first-line%
```bash
$ kubectl get svc
NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)             AGE
fluvio-sc-internal   ClusterIP      10.96.41.31      <none>           9004/TCP            4m18s
fluvio-sc-public     LoadBalancer   10.107.219.124   10.107.219.124   9003:30947/TCP      4m18s
fluvio-spg-main      ClusterIP      None             <none>           9005/TCP,9006/TCP   4m11s
fluvio-spu-main-0    LoadBalancer   10.111.223.127   10.111.223.127   9005:30023/TCP      4m11s
```

## CRDs
Fluvio stores internal metadata in K8s custom resources. [Fluvio CRDs].

To verify the system state, you can compare the results of:

%copy%
```bash
kubectl get spugroups
kubectl get spu
kubectl get topics
kubectl get partitions
```

which should match, respectively, the results of:

%copy%
```bash
fluvio cluster spg list
fluvio cluster spu list
fluvio topic list
fluvio partition list
```
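To compare topics without reading both tables by eye, a hypothetical helper can extract the NAME column from each listing and diff them. This assumes that the topic custom resources are named after the topics and that both `kubectl get topics` and `fluvio topic list` print the name in the first column; `names_from_table` and `compare_topics` are invented names. No output from `compare_topics` means the two lists match.

```bash
# Hypothetical helper: print the first column of a table, minus the header row, sorted.
names_from_table() {
  awk 'NR > 1 { print $1 }' | sort
}

# Hypothetical helper: diff topic names known to Kubernetes and to Fluvio.
# Requires a kubectl context and a Fluvio profile pointing at the same cluster.
compare_topics() {
  diff <(kubectl get topics | names_from_table) \
       <(fluvio topic list | names_from_table)
}
```

The same pattern could be applied to the other pairs, though their first columns may not line up as directly (for example, partitions are keyed by topic and partition number).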
64 changes: 64 additions & 0 deletions docs/concepts/operations/troubleshooting.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
sidebar_position: 4
slug: /concepts/operations/troubleshooting
title: "Troubleshooting"
---

## Run cluster check

To diagnose abnormal behavior, a good first step is to run `fluvio cluster check`, which checks for common problems and misconfigurations.

If everything is configured properly, you should see a result like this:

%copy first-line%
```bash
$ fluvio cluster check
Running pre-startup checks...
✅ Kubernetes config is loadable
✅ Supported kubernetes version is installed
✅ Supported helm version is installed
✅ Can create service
✅ Can create customresourcedefinitions
✅ Can create secret
✅ Fluvio system charts are installed
🎉 All checks passed!
You may proceed with cluster startup
next: run `fluvio cluster start`
```

## Logs

To find errors, examine the logs of the following components:

### SC
%copy first-line%
```bash
kubectl logs -l app=fluvio-sc
```
### SPU
%copy first-line%
```bash
kubectl logs -l app=spu
```

## Handling Bugs

### Record logs and create a GitHub Issue

In the event of a bug in Fluvio, we would appreciate it if you could save the log output to a file and create a [GitHub Issue](https://github.com/infinyon/fluvio/issues/new?assignees=&labels=bug&template=bug_report.md&title=%5BBug%5D%3A).

### Reach out to the community

Join the Fluvio community on [Discord](https://discord.gg/zHsWBt5Z2n).

### Restart Pods

To attempt to recover from the bug, you can try restarting the K8s pods.

%copy%
```bash
kubectl delete pod -l app=fluvio-sc
kubectl delete pod -l app=spu
```

Fluvio pods are managed by either `Deployments` or `StatefulSets`, so deleting them automatically causes new pods to be started.