Merge pull request rook#9997 from BlaineEXE/upgrade-docs-for-1.9
Upgrade docs for 1.9
BlaineEXE authored Apr 6, 2022
2 parents 3c83c62 + 8d61950 commit 7adba6c
Showing 7 changed files with 94 additions and 86 deletions.
2 changes: 1 addition & 1 deletion Documentation/ceph-monitoring.md
@@ -10,7 +10,7 @@ indent: true
Each Rook Ceph cluster has some built in metrics collectors/exporters for monitoring with [Prometheus](https://prometheus.io/).

If you do not have Prometheus running, follow the steps below to enable monitoring of Rook. If your cluster already
contains a Prometheus instance, it will automatically discover Rooks scrape endpoint using the standard
contains a Prometheus instance, it will automatically discover Rook's scrape endpoint using the standard
`prometheus.io/scrape` and `prometheus.io/port` annotations.

> **NOTE**: This assumes that the Prometheus instance is searching all your Kubernetes namespaces for Pods with these annotations.
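
For orientation, a hedged sketch of what these annotations look like on a pod (the port value is an
assumption; 9283 is commonly the default port of the Ceph mgr Prometheus module):

```yaml
# Illustrative pod metadata only -- the annotation keys come from the text above,
# the port value is an assumed default, not something this commit defines.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9283"
```
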
4 changes: 2 additions & 2 deletions Documentation/ceph-pool-crd.md
@@ -204,8 +204,8 @@ stretched) then you will have 2 replicas per datacenter where each replica ends
* `name`: The name of Ceph pools is based on the `metadata.name` of the CephBlockPool CR. Some built-in Ceph pools
require names that are incompatible with K8s resource names. These special pools can be configured
by setting this `name` to override the name of the Ceph pool that is created instead of using the `metadata.name` for the pool.
Two pool names are supported: `device_health_metrics` and `.nfs`. See the example
[device health metrics pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-device-health-metrics.yaml).
Only the following pool names are supported: `device_health_metrics`, `.nfs`, and `.mgr`. See the example
[builtin mgr pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-builtin-mgr.yaml).

* `parameters`: Sets any [parameters](https://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values) listed to the given pool
* `target_size_ratio:` gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool, for more info see the [ceph documentation](https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size)
135 changes: 69 additions & 66 deletions Documentation/ceph-upgrade.md
@@ -18,7 +18,7 @@ We welcome feedback and opening issues!

## Supported Versions

This guide is for upgrading from **Rook v1.7.x to Rook v1.8.x**.
This guide is for upgrading from **Rook v1.8.x to Rook v1.9.x**.

Please refer to the upgrade guides from previous releases for supported upgrade paths.
Rook upgrades are only supported between official releases. Upgrades to and from `master` are not
@@ -27,6 +27,7 @@ supported.
For a guide to upgrade previous versions of Rook, please refer to the version of documentation for
those releases.

* [Upgrade 1.7 to 1.8](https://rook.io/docs/rook/v1.8/ceph-upgrade.html)
* [Upgrade 1.6 to 1.7](https://rook.io/docs/rook/v1.7/ceph-upgrade.html)
* [Upgrade 1.5 to 1.6](https://rook.io/docs/rook/v1.6/ceph-upgrade.html)
* [Upgrade 1.4 to 1.5](https://rook.io/docs/rook/v1.5/ceph-upgrade.html)
@@ -42,20 +43,17 @@ those releases.

## Breaking changes in this release

* The minimum Kubernetes version has changed to v1.16. You must update to at least Kubernetes version
v1.16 before upgrading Rook from v1.7 to v1.8.
* Helm charts now define default resource requests and limits for Rook-Ceph Pods. If you use Helm,
ensure you have defined an override for these in your `values.yaml` if you don't wish to use the
recommended defaults. Setting resource requests and limits could mean that Kubernetes will not
allow Pods to be scheduled in some cases. If sufficient resources are not available, you can
reduce or remove the requests and limits (a hedged `values.yaml` sketch follows this list).

* Rook v1.8 no longer supports Ceph Nautilus (14.2.x). Nautilus users must
[upgrade Ceph](#ceph-version-upgrades) to Octopus (15.2.x) or Pacific (16.2.x) before upgrading to
Rook v1.8.
* MDS liveness and startup probes are now configured by the CephFilesystem resource instead of
CephCluster. Upgrade instructions are [below](#mds-liveness-and-startup-probes).

* Rook's FlexVolume driver has been deprecated and removed in Rook v1.8. FlexVolume users must
migrate Rook-Ceph block storage PVCs to CSI before upgrading. A migration tool has been created
and is documented [here](https://rook.io/docs/rook/v1.7/flex-to-csi-migration.html).

* The location of example manifests has changed to reduce the amount of user typing needed and to be
easier to discover for new Rook users. `cluster/examples/kubernetes/ceph` manifests can now be
found in `deploy/examples`.
* Rook no longer deploys Prometheus rules from the operator. If you have been relying on Rook to
deploy prometheus rules in the past, please follow the upgrade instructions [below](#prometheus).
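
For the Helm resource-limits bullet above, a minimal `values.yaml` sketch, assuming the
`rook-ceph-cluster` chart layout where the CephCluster spec (including `resources`) is nested under
`cephClusterSpec`; the daemon keys and sizes are illustrative, not recommendations:

```yaml
# Hypothetical override -- adjust or remove entirely if you prefer the chart defaults.
cephClusterSpec:
  resources:
    mgr:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        memory: "1Gi"
    mon:
      requests:
        cpu: "500m"
        memory: "1Gi"
```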

## Considerations

@@ -71,23 +69,23 @@ With this upgrade guide, there are a few notes to consider:

Unless otherwise noted due to extenuating requirements, upgrades from one patch release of Rook to
another are as simple as updating the common resources and the image of the Rook operator. For
example, when Rook v1.8.1 is released, the process of updating from v1.8.0 is as simple as running
example, when Rook v1.9.1 is released, the process of updating from v1.9.0 is as simple as running
the following:

First get the latest common resources manifests that contain the latest changes for Rook v1.8.
First get the latest common resources manifests that contain the latest changes for Rook v1.9.
```sh
git clone --single-branch --depth=1 --branch v1.8.1 https://github.com/rook/rook.git
git clone --single-branch --depth=1 --branch v1.9.1 https://github.com/rook/rook.git
cd rook/deploy/examples
```

If you have deployed the Rook Operator or the Ceph cluster into a different namespace than
`rook-ceph`, see the [Update common resources and CRDs](#1-update-common-resources-and-crds)
section for instructions on how to change the default namespaces in `common.yaml`.

Then apply the latest changes from v1.8 and update the Rook Operator image.
Then apply the latest changes from v1.9 and update the Rook Operator image.
```console
kubectl apply -f common.yaml -f crds.yaml
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.1
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.1
```

As exemplified above, it is a good practice to update Rook-Ceph common resources from the example
@@ -107,7 +105,7 @@ The upgrade steps in this guide will clarify if Helm manages the step for you.
The `rook-ceph` helm chart upgrade performs the Rook upgrade.
The `rook-ceph-cluster` helm chart upgrade performs a [Ceph upgrade](#ceph-version-upgrades) if the Ceph image is updated.

## Upgrading from v1.7 to v1.8
## Upgrading from v1.8 to v1.9

**Rook releases from master are expressly unsupported.** It is strongly recommended that you use
[official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the
@@ -221,8 +219,8 @@ details on the health of the system, such as `ceph osd status`. See the
Rook will prevent the upgrade of the Ceph daemons if the health is in a `HEALTH_ERR` state.
If you desire to proceed with the upgrade anyway, you will need to set either
`skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true`
as described in the [cluster CR settings](https://rook.github.io/docs/rook/v1.8/ceph-cluster-crd.html#cluster-settings).
`skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true` as described in the
[cluster CR settings](ceph-cluster-crd.md#cluster-settings).
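
As a minimal sketch, these are top-level settings in the CephCluster spec (values shown only to
illustrate where they live; leave them unset or `false` unless you accept the risk described above):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Only set these to true if you explicitly want to bypass the upgrade health checks.
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
```
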
### **Container Versions**
@@ -265,9 +263,9 @@ Any pod that is using a Rook volume should also remain healthy:

## Rook Operator Upgrade Process

In the examples given in this guide, we will be upgrading a live Rook cluster running `v1.7.8` to
the version `v1.8.0`. This upgrade should work from any official patch release of Rook v1.7 to any
official patch release of v1.8.
In the examples given in this guide, we will be upgrading a live Rook cluster running `v1.8.8` to
the version `v1.9.0`. This upgrade should work from any official patch release of Rook v1.8 to any
official patch release of v1.9.

**Rook releases from `master` are expressly unsupported.** It is strongly recommended that you use
[official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the
@@ -291,7 +289,7 @@ by the Operator. Also update the Custom Resource Definitions (CRDs).

Get the latest common resources manifests that contain the latest changes.
```sh
git clone --single-branch --depth=1 --branch v1.8.0 https://github.com/rook/rook.git
git clone --single-branch --depth=1 --branch v1.9.0 https://github.com/rook/rook.git
cd rook/deploy/examples
```

@@ -312,18 +310,29 @@ kubectl apply -f common.yaml -f crds.yaml

#### **Updates for optional resources**

##### **Prometheus**

If you have [Prometheus monitoring](ceph-monitoring.md) enabled, follow the
step to upgrade the Prometheus RBAC resources as well.

```sh
kubectl apply -f deploy/examples/monitoring/rbac.yaml
```

If you use the `rook-ceph` operator Helm chart, you should also add `monitoring.enabled` to
your Helm values with two caveats:
- this is unnecessary if you deploy monitoring RBAC from `deploy/examples/monitoring/rbac.yaml`
- this is unnecessary if you use `rook-ceph-cluster` charts exclusively outside of the `rook-ceph`
operator namespace.
Rook no longer deploys Prometheus rules from the operator.

If you use the Helm chart `monitoring.enabled` value to deploy Prometheus rules, you may now
additionally use `monitoring.createPrometheusRules` to instruct Helm to deploy the rules. You may
alternately deploy the rules manually if you wish.

To see the latest information about manually deploying rules, see the
[Prometheus monitoring docs](ceph-monitoring.md#prometheus-alerts).
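
A minimal Helm values sketch, assuming both settings live under the chart's top-level `monitoring`
block as the text above implies:

```yaml
# Hypothetical values fragment -- keeps monitoring enabled and asks Helm to deploy the
# Prometheus rules now that the operator no longer creates them.
monitoring:
  enabled: true
  createPrometheusRules: true
```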

##### **MDS liveness and startup probes**

If you configure MDS probes in the CephCluster resource, copy them to the
[CephFilesystem `metadataServer` settings](ceph-filesystem-crd.md#metadata-server-settings) at this
point. Do not remove them from the CephCluster until after the Rook upgrade is fully complete.
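
A rough sketch of where the copied probe settings land; the probe field names under
`metadataServer` are assumptions based on the linked settings doc, so verify them there:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  # metadataPool and dataPools omitted for brevity in this sketch
  metadataServer:
    activeCount: 1
    # Probe blocks copied from the CephCluster CR; keep the originals in the CephCluster
    # until the Rook upgrade is fully complete, as noted above.
    livenessProbe:
      disabled: false
    startupProbe:
      disabled: false
```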

### **2. Update Ceph CSI versions**

@@ -339,31 +348,13 @@ details.

> Automatically updated if you are upgrading via the helm chart
The largest portion of the upgrade is triggered when the operator's image is updated to `v1.8.x`.
The largest portion of the upgrade is triggered when the operator's image is updated to `v1.9.x`.
When the operator is updated, it will proceed to update all of the Ceph daemons.

```sh
kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.0
kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.0
```

#### Admission controller
If you use the optional [Admission controller](admission-controller-usage.md), there are additional
updates during this step. The admission controller has been integrated inside the operator
instead of a separate deployment. This means that the webhook server certificates are now stored in
the operator, and the operator manifest must be updated to use the one provided in
`deploy/examples/operator.yaml`. If you are using Helm to manage the deployment, this is handled
automatically.

When updating the operator deployment with the latest example from Rook, there is risk of
overwriting changes if you have customized the operator deployment or the
`rook-ceph-operator-config` ConfigMap. We suggest that you remove the ConfigMap from `operator.yaml`
before moving on. Additionally, we encourage you to diff the current deployment and the latest one
to be sure any changes you may have made don't get overwritten. Required changes include the
`webhook-cert` volume/mount and `https-webhook` port, though there are some smaller changes as well.

Once you are sure any custom modifications to your operator deployment won't be overwritten, apply
the new `operator.yaml` with `kubectl apply -f deploy/examples/operator.yaml`.

### **4. Wait for the upgrade to complete**

Watch now in amazement as the Ceph mons, mgrs, OSDs, rbd-mirrors, MDSes and RGWs are terminated and
@@ -377,18 +368,18 @@ watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=
```

As an example, this cluster is midway through updating the OSDs. When all deployments report `1/1/1`
availability and `rook-version=v1.8.0`, the Ceph cluster's core components are fully updated.
availability and `rook-version=v1.9.0`, the Ceph cluster's core components are fully updated.

>```
>Every 2.0s: kubectl -n rook-ceph get deployment -o j...
>
>rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.8.0
>rook-ceph-mon-a req/upd/avl: 1/1/1 rook-version=v1.8.0
>rook-ceph-mon-b req/upd/avl: 1/1/1 rook-version=v1.8.0
>rook-ceph-mon-c req/upd/avl: 1/1/1 rook-version=v1.8.0
>rook-ceph-osd-0 req/upd/avl: 1// rook-version=v1.8.0
>rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.7.8
>rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.7.8
>rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.9.0
>rook-ceph-mon-a req/upd/avl: 1/1/1 rook-version=v1.9.0
>rook-ceph-mon-b req/upd/avl: 1/1/1 rook-version=v1.9.0
>rook-ceph-mon-c req/upd/avl: 1/1/1 rook-version=v1.9.0
>rook-ceph-osd-0 req/upd/avl: 1// rook-version=v1.9.0
>rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.8.8
>rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.8.8
>```
An easy check to see if the upgrade is totally finished is to check that there is only one
@@ -397,27 +388,28 @@ An easy check to see if the upgrade is totally finished is to check that there i
```console
# kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
This cluster is not yet finished:
rook-version=v1.7.8
rook-version=v1.8.0
rook-version=v1.8.8
rook-version=v1.9.0
This cluster is finished:
rook-version=v1.8.0
rook-version=v1.9.0
```
### **5. Verify the updated cluster**

At this point, your Rook operator should be running version `rook/ceph:v1.8.0`.
At this point, your Rook operator should be running version `rook/ceph:v1.9.0`.

Verify the Ceph cluster's health using the [health verification section](#health-verification).


## Ceph Version Upgrades

Rook v1.8 supports the following Ceph versions:
- Ceph Pacific 16.2.0 or newer
Rook v1.9 supports the following Ceph versions:
- Ceph Quincy v17.2.0 or newer
- Ceph Pacific v16.2.0 or newer
- Ceph Octopus v15.2.0 or newer

These are the only supported versions of Ceph. Rook v1.8 no longer supports Ceph Nautilus (14.2.x).
Nautilus users must upgrade Ceph to Octopus (15.2.x) or Pacific (16.2.x) before upgrading to Rook v1.8.
These are the only supported versions of Ceph. Rook v1.10 is planning to drop support for Ceph
Octopus (15.2.x), so please consider upgrading your Ceph cluster.
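
The Ceph version itself is selected by the Ceph image referenced in the CephCluster CR, so a Ceph
upgrade is a change to that image. A hedged fragment, with the tag purely as an example of a
supported release:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # Example tag only -- pick the Octopus, Pacific, or Quincy release you are targeting.
    image: quay.io/ceph/ceph:v16.2.7
```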

> **IMPORTANT: When an update is requested, the operator will check Ceph's status; if it is in `HEALTH_ERR`, it will refuse to do the upgrade.**
@@ -471,6 +463,17 @@ It's best to run `ceph config-key dump` again to verify references to

For more information, see https://github.com/rook/rook/issues/9185.

### **Rename CephBlockPool device_health_metrics pool when upgrading to Quincy v17**
In Ceph Quincy (v17), the `device_health_metrics` pool was renamed to `.mgr`. Ceph will perform this
migration automatically. If you do not use CephBlockPool to customize the configuration of the
`device_health_metrics` pool, you don't need to do anything further here.

If you do use CephBlockPool to customize the configuration of the `device_health_metrics` pool, you
will need to perform a few additional steps once the Ceph upgrade is complete. After the upgrade,
create a new CephBlockPool to configure the built-in `.mgr` pool. You can reference the example
[builtin mgr pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-builtin-mgr.yaml).
Also delete the old CephBlockPool that represents the `device_health_metrics` pool.
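
For reference, a minimal sketch of such a CephBlockPool, mirroring the `builtin-mgr` example this
commit adds to `cluster-test.yaml`; the failure domain and replica size are assumptions to adapt to
your cluster:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: builtin-mgr
  namespace: rook-ceph # namespace:cluster
spec:
  # Overrides the Ceph pool name so this CR manages the built-in .mgr pool.
  name: .mgr
  failureDomain: host # assumption -- match your topology
  replicated:
    size: 3 # assumption -- keep the replication you used for device_health_metrics
```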

### **Important consideration for CephNFS users**
Users of CephNFS need to take additional steps to upgrade Ceph versions. Please see the
[NFS documentation](ceph-nfs-crd.md#upgrading-from-ceph-v15-to-v16) for full details.
27 changes: 16 additions & 11 deletions PendingReleaseNotes.md
@@ -2,19 +2,24 @@

## Breaking Changes

* The mds liveness and startup probes are now configured by the filesystem CR instead of the cluster CR. To apply the mds probes, they need to be specified in the filesystem CR. See the [filesystem CR doc](Documentation/ceph-filesystem-crd.md#metadata-server-settings) for more details. See #9550
* In the helm charts, all Ceph components now have default values for the pod resources. The values can be modified or removed in values.yaml depending on cluster requirements.
* Prometheus rules are installed by the helm chart. If you were relying on the cephcluster setting `monitoring.enabled` to create the prometheus rules, they instead need to be enabled by setting `monitoring.createPrometheusRules` in the helm chart values.
* The `region` field for OBC Storage class is ignored, the RGW server always works with s3 client using `us-east-1` as region.

* The MDS liveness and startup probes are now configured by the CephFilesystem CR instead of the
CephCluster CR. To apply the MDS probes, they need to be specified in the CephFilesystem CR. See the
[CephFilesystem doc](Documentation/ceph-filesystem-crd.md#metadata-server-settings) for more details. See #9550
* In the Helm charts, all Ceph components now have default values for the pod resources. The values
can be modified or removed in values.yaml depending on cluster requirements.
* Prometheus rules are installed by the Helm chart. If you were relying on the CephCluster setting
`monitoring.enabled` to create the Prometheus rules, they now need to be enabled by setting
`monitoring.createPrometheusRules` in the Helm chart values.

## Features

* The number of mgr daemons for example clusters is increased to 2 from 1, resulting in a standby mgr daemon.
If the active mgr goes down, Ceph will update the passive mgr to be active, and rook will update all the services
with the label app=rook-ceph-mgr to direct traffic to the new active mgr.
* The number of mgr daemons for example clusters is increased to 2 from 1, resulting in a standby
mgr daemon. If the active mgr goes down, Ceph will update the passive mgr to be active, and rook
will update all the services with the label app=rook-ceph-mgr to direct traffic to the new active
mgr.
* Network encryption is configurable with settings in the CephCluster CR. Requires the 5.11 kernel or newer.
* Network compression is configurable with settings in the CephCluster CR. Requires Ceph Quincy (v17) or
newer; a hedged CephCluster sketch follows this list.
* Add support for custom ceph.conf for csi pods. See #9567
* Added and updated many Ceph prometheus rules, picked up from the ceph repo
* Added support for custom ceph.conf for csi pods. See #9567
* Added and updated many Ceph Prometheus rules, as recommended by the main Ceph project.
* Added service account rook-ceph-rgw for the RGW pods.
* Add support for rados namespace in a ceph blockpool. See #9733
* Added new RadosNamespace resource: create rados namespaces in a CephBlockPool. See #9733
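
For the mgr and network bullets above, a hedged CephCluster fragment; the `network.connections`
nesting and field names are assumptions to confirm against the CephCluster CRD docs:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  mgr:
    count: 2 # one active and one standby mgr, as described above
  network:
    connections:
      encryption:
        enabled: true # requires a 5.11+ kernel on the hosts
      compression:
        enabled: true # requires Ceph Quincy (v17) or newer
```
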
3 changes: 1 addition & 2 deletions deploy/examples/cluster-test.yaml
@@ -15,7 +15,6 @@ metadata:
data:
config: |
[global]
osd_pool_default_size = 1
mon_warn_on_pool_no_redundancy = false
bdev_flock_retry = 20
bluefs_buffered_io = false
@@ -55,7 +54,7 @@ spec:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: mgr
name: builtin-mgr
namespace: rook-ceph # namespace:cluster
spec:
name: .mgr