From 8d619501fe0ce501647fb1de424a54026d26c917 Mon Sep 17 00:00:00 2001 From: Blaine Gardner Date: Tue, 5 Apr 2022 12:35:57 -0600 Subject: [PATCH] docs: update docs for 1.9 release Update upgrade and supporting docs for release of Rook v1.9. Include pending release notes as part of this update. Signed-off-by: Blaine Gardner --- Documentation/ceph-monitoring.md | 2 +- Documentation/ceph-pool-crd.md | 4 +- Documentation/ceph-upgrade.md | 135 +++++++++--------- PendingReleaseNotes.md | 27 ++-- deploy/examples/cluster-test.yaml | 3 +- ...lth-metrics.yaml => pool-builtin-mgr.yaml} | 7 +- design/common/object-bucket.md | 2 +- 7 files changed, 94 insertions(+), 86 deletions(-) rename deploy/examples/{pool-device-health-metrics.yaml => pool-builtin-mgr.yaml} (74%) diff --git a/Documentation/ceph-monitoring.md b/Documentation/ceph-monitoring.md index 8ef8c7979e9f..d0050c1f00f3 100644 --- a/Documentation/ceph-monitoring.md +++ b/Documentation/ceph-monitoring.md @@ -10,7 +10,7 @@ indent: true Each Rook Ceph cluster has some built in metrics collectors/exporters for monitoring with [Prometheus](https://prometheus.io/). If you do not have Prometheus running, follow the steps below to enable monitoring of Rook. If your cluster already -contains a Prometheus instance, it will automatically discover Rooks scrape endpoint using the standard +contains a Prometheus instance, it will automatically discover Rook's scrape endpoint using the standard `prometheus.io/scrape` and `prometheus.io/port` annotations. > **NOTE**: This assumes that the Prometheus instances is searching all your Kubernetes namespaces for Pods with these annotations. diff --git a/Documentation/ceph-pool-crd.md b/Documentation/ceph-pool-crd.md index 5b3ee93eb58c..5e69eee7e939 100644 --- a/Documentation/ceph-pool-crd.md +++ b/Documentation/ceph-pool-crd.md @@ -204,8 +204,8 @@ stretched) then you will have 2 replicas per datacenter where each replica ends * `name`: The name of Ceph pools is based on the `metadata.name` of the CephBlockPool CR. Some built-in Ceph pools require names that are incompatible with K8s resource names. These special pools can be configured by setting this `name` to override the name of the Ceph pool that is created instead of using the `metadata.name` for the pool. - Two pool names are supported: `device_health_metrics` and `.nfs`. See the example - [device health metrics pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-device-health-metrics.yaml). + Only the following pool names are supported: `device_health_metrics`, `.nfs`, and `.mgr`. See the example + [builtin mgr pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-builtin-mgr.yaml). * `parameters`: Sets any [parameters](https://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values) listed to the given pool * `target_size_ratio:` gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool, for more info see the [ceph documentation](https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size) diff --git a/Documentation/ceph-upgrade.md b/Documentation/ceph-upgrade.md index c625c17ad61e..b2e211a6e6ef 100644 --- a/Documentation/ceph-upgrade.md +++ b/Documentation/ceph-upgrade.md @@ -18,7 +18,7 @@ We welcome feedback and opening issues! ## Supported Versions -This guide is for upgrading from **Rook v1.7.x to Rook v1.8.x**. +This guide is for upgrading from **Rook v1.8.x to Rook v1.9.x**. 
Please refer to the upgrade guides from previous releases for supported upgrade paths. Rook upgrades are only supported between official releases. Upgrades to and from `master` are not @@ -27,6 +27,7 @@ supported. For a guide to upgrade previous versions of Rook, please refer to the version of documentation for those releases. +* [Upgrade 1.7 to 1.8](https://rook.io/docs/rook/v1.8/ceph-upgrade.html) * [Upgrade 1.6 to 1.7](https://rook.io/docs/rook/v1.7/ceph-upgrade.html) * [Upgrade 1.5 to 1.6](https://rook.io/docs/rook/v1.6/ceph-upgrade.html) * [Upgrade 1.4 to 1.5](https://rook.io/docs/rook/v1.5/ceph-upgrade.html) @@ -42,20 +43,17 @@ those releases. ## Breaking changes in this release -* The minimum Kubernetes version has changed to v1.16. You must update to at least Kubernetes version - v1.16 before upgrading Rook from v1.7 to v1.8. +* Helm charts now define default resource requests and limits for Rook-Ceph Pods. If you use Helm, + ensure you have defined an override for these in your `values.yaml` if you don't wish to use the + recommended defaults. Setting resource requests and limits could mean that Kubernetes will not + allow Pods to be scheduled in some cases. If sufficient resources are not available, you can + reduce or remove the requests and limits. -* Rook v1.8 no longer supports Ceph Nautilus (14.2.x). Nautilus users must - [upgrade Ceph](#ceph-version-upgrades) to Octopus (15.2.x) or Pacific (16.2.x) before upgrading to - Rook v1.8. +* MDS liveness and startup probes are now configured by the CephFilesystem resource instead of + CephCluster. Upgrade instructions are [below](#mds-liveness-and-startup-probes). -* Rook's FlexVolume driver has been deprecated and removed in Rook v1.8. FlexVolume users must - migrate Rook-Ceph block storage PVCs to CSI before upgrading. A migration tool has been created - and is documented [here](https://rook.io/docs/rook/v1.7/flex-to-csi-migration.html). - -* The location of example manifests has changed to reduce the amount of user typing needed and to be - easier to discover for new Rook users. `cluster/examples/kubernetes/ceph` manifests can now be - found in `deploy/examples`. +* Rook no longer deploys Prometheus rules from the operator. If you have been relying on Rook to + deploy prometheus rules in the past, please follow the upgrade instructions [below](#prometheus). ## Considerations @@ -71,12 +69,12 @@ With this upgrade guide, there are a few notes to consider: Unless otherwise noted due to extenuating requirements, upgrades from one patch release of Rook to another are as simple as updating the common resources and the image of the Rook operator. For -example, when Rook v1.8.1 is released, the process of updating from v1.8.0 is as simple as running +example, when Rook v1.9.1 is released, the process of updating from v1.9.0 is as simple as running the following: -First get the latest common resources manifests that contain the latest changes for Rook v1.8. +First get the latest common resources manifests that contain the latest changes for Rook v1.9. ```sh -git clone --single-branch --depth=1 --branch v1.8.1 https://github.com/rook/rook.git +git clone --single-branch --depth=1 --branch v1.9.1 https://github.com/rook/rook.git cd rook/deploy/examples ``` @@ -84,10 +82,10 @@ If you have deployed the Rook Operator or the Ceph cluster into a different name `rook-ceph`, see the [Update common resources and CRDs](#1-update-common-resources-and-crds) section for instructions on how to change the default namespaces in `common.yaml`. 
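For reference, the namespace substitution that section describes can be sketched roughly as follows. This assumes `common.yaml` tags its namespaced fields with the `# namespace:operator` and `# namespace:cluster` comment markers used by the example manifests, and that `$ROOK_OPERATOR_NAMESPACE` and `$ROOK_CLUSTER_NAMESPACE` are set to your own namespaces; adapt the command to your deployment.

```sh
# Rough sketch: rewrite the default namespaces in common.yaml before applying it.
# Assumes the "# namespace:operator" / "# namespace:cluster" markers are present on
# every namespaced field, as in the Rook example manifests.
sed -i.bak \
  -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" \
  -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" \
  common.yaml
```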
-Then apply the latest changes from v1.8 and update the Rook Operator image. +Then apply the latest changes from v1.9 and update the Rook Operator image. ```console kubectl apply -f common.yaml -f crds.yaml -kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.1 +kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.1 ``` As exemplified above, it is a good practice to update Rook-Ceph common resources from the example @@ -107,7 +105,7 @@ The upgrade steps in this guide will clarify if Helm manages the step for you. The `rook-ceph` helm chart upgrade performs the Rook upgrade. The `rook-ceph-cluster` helm chart upgrade performs a [Ceph upgrade](#ceph-version-upgrades) if the Ceph image is updated. -## Upgrading from v1.7 to v1.8 +## Upgrading from v1.8 to v1.9 **Rook releases from master are expressly unsupported.** It is strongly recommended that you use [official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the @@ -221,8 +219,8 @@ details on the health of the system, such as `ceph osd status`. See the Rook will prevent the upgrade of the Ceph daemons if the health is in a `HEALTH_ERR` state. If you desired to proceed with the upgrade anyway, you will need to set either -`skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true` -as described in the [cluster CR settings](https://rook.github.io/docs/rook/v1.8/ceph-cluster-crd.html#cluster-settings). +`skipUpgradeChecks: true` or `continueUpgradeAfterChecksEvenIfNotHealthy: true` as described in the +[cluster CR settings](ceph-cluster-crd.md#cluster-settings). ### **Container Versions** @@ -265,9 +263,9 @@ Any pod that is using a Rook volume should also remain healthy: ## Rook Operator Upgrade Process -In the examples given in this guide, we will be upgrading a live Rook cluster running `v1.7.8` to -the version `v1.8.0`. This upgrade should work from any official patch release of Rook v1.7 to any -official patch release of v1.8. +In the examples given in this guide, we will be upgrading a live Rook cluster running `v1.8.8` to +the version `v1.9.0`. This upgrade should work from any official patch release of Rook v1.8 to any +official patch release of v1.9. **Rook release from `master` are expressly unsupported.** It is strongly recommended that you use [official releases](https://github.com/rook/rook/releases) of Rook. Unreleased versions from the @@ -291,7 +289,7 @@ by the Operator. Also update the Custom Resource Definitions (CRDs). Get the latest common resources manifests that contain the latest changes. ```sh -git clone --single-branch --depth=1 --branch v1.8.0 https://github.com/rook/rook.git +git clone --single-branch --depth=1 --branch v1.9.0 https://github.com/rook/rook.git cd rook/deploy/examples ``` @@ -312,6 +310,8 @@ kubectl apply -f common.yaml -f crds.yaml #### **Updates for optional resources** +##### **Prometheus** + If you have [Prometheus monitoring](ceph-monitoring.md) enabled, follow the step to upgrade the Prometheus RBAC resources as well. @@ -319,11 +319,20 @@ step to upgrade the Prometheus RBAC resources as well. 
 kubectl apply -f deploy/examples/monitoring/rbac.yaml
 ```

-If you use the `rook-ceph` operator Helm chart, you should also add `monitoring.enabled` to
-your Helm values with two caveats:
-- this is unnecessary if you deploy monitoring RBAC from `deploy/examples/monitoring/rbac.yaml`
-- this is unnecessary if you use `rook-ceph-cluster` charts exclusively outside of the `rook-ceph`
-  operator namespace.
+Rook no longer deploys Prometheus rules from the operator.
+
+If you use the Helm chart `monitoring.enabled` value to deploy Prometheus rules, you may now
+additionally use `monitoring.createPrometheusRules` to instruct Helm to deploy the rules. You may
+alternately deploy the rules manually if you wish.
+
+To see the latest information about manually deploying rules, see the
+[Prometheus monitoring docs](ceph-monitoring.md#prometheus-alerts).
+
+##### **MDS liveness and startup probes**
+
+If you configure MDS probes in the CephCluster resource, copy them to the
+[CephFilesystem `metadataServer` settings](ceph-filesystem-crd.md#metadata-server-settings) at this
+point. Do not remove them from the CephCluster until after the Rook upgrade is fully complete.

 ### **2. Update Ceph CSI versions**

@@ -339,31 +348,13 @@ details.

 > Automatically updated if you are upgrading via the helm chart

-The largest portion of the upgrade is triggered when the operator's image is updated to `v1.8.x`.
+The largest portion of the upgrade is triggered when the operator's image is updated to `v1.9.x`.
 When the operator is updated, it will proceed to update all of the Ceph daemons.

 ```sh
-kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.0
+kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.0
 ```

-#### Admission controller
-If you use the optional [Admission controller](admission-controller-usage.md), there are additional
-updates during this step. The admission controller has been integrated inside the operator
-instead of a separate deployment. This means that the webhook server certificates are now stored in
-the operator, and the operator manifest must be updated to use the one provided in
-`deploy/examples/operator.yaml`. If you are using Helm to manage the deployment, this is handled
-automatically.
-
-When updating the operator deployment with the latest example from Rook, there is risk of
-overwriting changes if you have customized the operator deployment or to the
-`rook-ceph-operator-config` ConfigMap. We suggest that you remove the ConfigMap from `operator.yaml`
-before moving on. Additionally, we encourage you to diff the current deployment and the latest one
-to be sure any changes you may have made don't get overwritten. Required changes include the
-`webhook-cert` volume/mount and `https-webhook` port, though there are some smaller changes as well.
-
-Once you are sure any custom modifications to your operator deployment won't be overwritten, apply
-the new `operator.yaml` with `kubectl apply -f deploy/examples/operator.yaml`.
-
 ### **4. Wait for the upgrade to complete**

 Watch now in amazement as the Ceph mons, mgrs, OSDs, rbd-mirrors, MDSes and RGWs are terminated and
@@ -377,18 +368,18 @@ watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=
 ```

 As an example, this cluster is midway through updating the OSDs. When all deployments report `1/1/1`
-availability and `rook-version=v1.8.0`, the Ceph cluster's core components are fully updated.
+availability and `rook-version=v1.9.0`, the Ceph cluster's core components are fully updated.

 >```
 >Every 2.0s: kubectl -n rook-ceph get deployment -o j...
 >
->rook-ceph-mgr-a         req/upd/avl: 1/1/1      rook-version=v1.8.0
->rook-ceph-mon-a         req/upd/avl: 1/1/1      rook-version=v1.8.0
->rook-ceph-mon-b         req/upd/avl: 1/1/1      rook-version=v1.8.0
->rook-ceph-mon-c         req/upd/avl: 1/1/1      rook-version=v1.8.0
->rook-ceph-osd-0         req/upd/avl: 1//        rook-version=v1.8.0
->rook-ceph-osd-1         req/upd/avl: 1/1/1      rook-version=v1.7.8
->rook-ceph-osd-2         req/upd/avl: 1/1/1      rook-version=v1.7.8
+>rook-ceph-mgr-a         req/upd/avl: 1/1/1      rook-version=v1.9.0
+>rook-ceph-mon-a         req/upd/avl: 1/1/1      rook-version=v1.9.0
+>rook-ceph-mon-b         req/upd/avl: 1/1/1      rook-version=v1.9.0
+>rook-ceph-mon-c         req/upd/avl: 1/1/1      rook-version=v1.9.0
+>rook-ceph-osd-0         req/upd/avl: 1//        rook-version=v1.9.0
+>rook-ceph-osd-1         req/upd/avl: 1/1/1      rook-version=v1.8.8
+>rook-ceph-osd-2         req/upd/avl: 1/1/1      rook-version=v1.8.8
 >```

 An easy check to see if the upgrade is totally finished is to check that there is only one
@@ -397,27 +388,28 @@ An easy check to see if the upgrade is totally finished is to check that there i

 ```console
 # kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
 This cluster is not yet finished:
-  rook-version=v1.7.8
-  rook-version=v1.8.0
+  rook-version=v1.8.8
+  rook-version=v1.9.0
 This cluster is finished:
-  rook-version=v1.8.0
+  rook-version=v1.9.0
 ```

 ### **5. Verify the updated cluster**

-At this point, your Rook operator should be running version `rook/ceph:v1.8.0`.
+At this point, your Rook operator should be running version `rook/ceph:v1.9.0`.

 Verify the Ceph cluster's health using the [health verification section](#health-verification).

 ## Ceph Version Upgrades

-Rook v1.8 supports the following Ceph versions:
-- Ceph Pacific 16.2.0 or newer
+Rook v1.9 supports the following Ceph versions:
+- Ceph Quincy v17.2.0 or newer
+- Ceph Pacific v16.2.0 or newer
 - Ceph Octopus v15.2.0 or newer

-These are the only supported versions of Ceph. Rook v1.8 no longer supports Ceph Nautilus (14.2.x).
-Nautilus users must upgrade Ceph to Octopus (15.2.x) or Pacific (16.2.x) before upgrading to Rook v1.8.
+These are the only supported versions of Ceph. Rook v1.10 is planning to drop support for Ceph
+Octopus (15.2.x), so please consider upgrading your Ceph cluster.

 > **IMPORTANT: When an update is requested, the operator will check Ceph's status, if it is in `HEALTH_ERR` it will refuse to do the upgrade.**

@@ -471,6 +463,17 @@ It's best to run `ceph config-key dump` again to verify references to
 See for more information, see here: https://github.com/rook/rook/issues/9185

+### **Rename CephBlockPool device_health_metrics pool when upgrading to Quincy v17**
+In Ceph Quincy (v17), the `device_health_metrics` pool was renamed to `.mgr`. Ceph will perform this
+migration automatically. If you do not use CephBlockPool to customize the configuration of the
+`device_health_metrics` pool, you don't need to do anything further here.
+
+If you do use CephBlockPool to customize the configuration of the `device_health_metrics` pool, you
+will need to take a few extra steps once the Ceph upgrade is complete. First, create a new
+CephBlockPool to configure the `.mgr` built-in pool. You can reference the example
+[builtin mgr pool](https://github.com/rook/rook/blob/{{ branchName }}/deploy/examples/pool-builtin-mgr.yaml).
+Then delete the old CephBlockPool that represents the `device_health_metrics` pool.
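For illustration, here is a minimal sketch of the new `.mgr` pool described above, mirroring the `pool-builtin-mgr.yaml` example added in this patch. The `failureDomain` and replica `size` shown are just that example's defaults; tune them to match the customizations you previously applied to `device_health_metrics`.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  # A Kubernetes resource name cannot start with a dot, so use a friendly name here...
  name: builtin-mgr
  namespace: rook-ceph # namespace:cluster
spec:
  # ...and override the pool name created in Ceph with the built-in ".mgr" name.
  name: .mgr
  failureDomain: host
  replicated:
    size: 3
```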
+
 
 ### **Important consideration for CephNFS users**
 Users of CephNFS need to take additional steps to upgrade Ceph versions. Please see the
 [NFS documentation](ceph-nfs-crd.md#upgrading-from-ceph-v15-to-v16) for full details.

diff --git a/PendingReleaseNotes.md b/PendingReleaseNotes.md
index fb87113bdebd..25a758ad42e0 100644
--- a/PendingReleaseNotes.md
+++ b/PendingReleaseNotes.md
@@ -2,19 +2,24 @@

 ## Breaking Changes

-* The mds liveness and startup probes are now configured by the filesystem CR instead of the cluster CR. To apply the mds probes, they need to be specified in the filesystem CR. See the [filesystem CR doc](Documentation/ceph-filesystem-crd.md#metadata-server-settings) for more details. See #9550
-* In the helm charts, all Ceph components now have default values for the pod resources. The values can be modified or removed in values.yaml depending on cluster requirements.
-* Prometheus rules are installed by the helm chart. If you were relying on the cephcluster setting `monitoring.enabled` to create the prometheus rules, they instead need to be enabled by setting `monitoring.createPrometheusRules` in the helm chart values.
-* The `region` field for OBC Storage class is ignored, the RGW server always works with s3 client using `us-east-1` as region.
-
+* The MDS liveness and startup probes are now configured by the CephFilesystem CR instead of the
+  CephCluster CR. To apply the MDS probes, they need to be specified in the CephFilesystem CR. See the
+  [CephFilesystem doc](Documentation/ceph-filesystem-crd.md#metadata-server-settings) for more details. See #9550
+* In the Helm charts, all Ceph components now have default values for the pod resources. The values
+  can be modified or removed in values.yaml depending on cluster requirements.
+* Prometheus rules are installed by the Helm chart. If you were relying on the CephCluster setting
+  `monitoring.enabled` to create the Prometheus rules, they now need to be enabled by setting
+  `monitoring.createPrometheusRules` in the Helm chart values.
+
 ## Features

-* The number of mgr daemons for example clusters is increased to 2 from 1, resulting in a standby mgr daemon.
-  If the active mgr goes down, Ceph will update the passive mgr to be active, and rook will update all the services
-  with the label app=rook-ceph-mgr to direct traffic to the new active mgr.
+* The number of mgr daemons for example clusters is increased to 2 from 1, resulting in a standby
+  mgr daemon. If the active mgr goes down, Ceph will update the passive mgr to be active, and Rook
+  will update all the services with the label app=rook-ceph-mgr to direct traffic to the new active
+  mgr.
 * Network encryption is configurable with settings in the CephCluster CR. Requires the 5.11 kernel or newer.
 * Network compression is configurable with settings in the CephCluster CR. Requires Ceph Quincy (v17) or newer.
-* Add support for custom ceph.conf for csi pods. See #9567
-* Added and updated many Ceph prometheus rules, picked up from the ceph repo
+* Added support for custom ceph.conf for CSI pods. See #9567
+* Added and updated many Ceph Prometheus rules, as recommended by the main Ceph project.
 * Added service account rook-ceph-rgw for the RGW pods.
-* Add support for rados namespace in a ceph blockpool. See #9733
+* Added new RadosNamespace resource: create rados namespaces in a CephBlockPool.
See #9733 diff --git a/deploy/examples/cluster-test.yaml b/deploy/examples/cluster-test.yaml index 4ebb046f5851..9384654bd0c9 100644 --- a/deploy/examples/cluster-test.yaml +++ b/deploy/examples/cluster-test.yaml @@ -15,7 +15,6 @@ metadata: data: config: | [global] - osd_pool_default_size = 1 mon_warn_on_pool_no_redundancy = false bdev_flock_retry = 20 bluefs_buffered_io = false @@ -55,7 +54,7 @@ spec: apiVersion: ceph.rook.io/v1 kind: CephBlockPool metadata: - name: mgr + name: builtin-mgr namespace: rook-ceph # namespace:cluster spec: name: .mgr diff --git a/deploy/examples/pool-device-health-metrics.yaml b/deploy/examples/pool-builtin-mgr.yaml similarity index 74% rename from deploy/examples/pool-device-health-metrics.yaml rename to deploy/examples/pool-builtin-mgr.yaml index e8becdbbda78..1affe0dd5d47 100644 --- a/deploy/examples/pool-device-health-metrics.yaml +++ b/deploy/examples/pool-builtin-mgr.yaml @@ -1,16 +1,17 @@ apiVersion: ceph.rook.io/v1 kind: CephBlockPool metadata: - # If the built-in Ceph pool for health metrics needs to be configured with alternate + # If the built-in Ceph pool used by the Ceph mgr needs to be configured with alternate # settings, create this pool with any of the pool properties. Create this pool immediately # with the cluster CR, or else some properties may not be applied when Ceph creates the # pool by default. - name: device-health-metrics + name: builtin-mgr namespace: rook-ceph # namespace:cluster spec: # The required pool name with underscores cannot be specified as a K8s resource name, thus we override # the pool name created in Ceph with this name property. - name: device_health_metrics + # The ".mgr" pool is called "device_health_metrics" in Ceph versions v16.x.y and below. + name: .mgr failureDomain: host replicated: size: 3 diff --git a/design/common/object-bucket.md b/design/common/object-bucket.md index 0afcaddd80e5..e81cbefdfa2c 100644 --- a/design/common/object-bucket.md +++ b/design/common/object-bucket.md @@ -2,7 +2,7 @@ ## Overview -An object store bucket is a container holding immutable objects. The Rook-Ceph [operator](https://github.com/yard-turkey/rook/blob/master/deploy/examples/operator.yaml) creates a controller which automates the provisioning of new and existing buckets. +An object store bucket is a container holding immutable objects. The Rook-Ceph [operator](https://github.com/rook/rook/blob/master/deploy/examples/operator.yaml) creates a controller which automates the provisioning of new and existing buckets. A user requests bucket storage by creating an _ObjectBucketClaim_ (OBC). Upon detecting a new OBC, the Rook-Ceph bucket provisioner does the following: