#########################################################################################
#########################################################################################
Scenario08 describes how to manage how storage is consumed, in this case the performance a volume is allowed to use.
Trident 21.01 introduced support for ONTAP QoS Policy Groups, in two forms:
- Policy Group: set a minimum and/or maximum throughput in IOPS or bandwidth (example: minimum 100 IOPS & maximum 1000 IOPS per volume)
- Adaptive Policy Group: same as above, but the policy is defined per unit of capacity (example: minimum 100 IOPS per TB, per volume)
💥
QoS is only supported with ONTAP 9.8.
If you want to test this feature with the Lab on Demand, you first need to upgrade ONTAP from 9.7 to 9.8.
This procedure is explained in this Addenda09.
💥
Setting up this feature is not the exciting part of this scenario. You actually want to see it working!
I am using here an application called DBench, which wraps FIO (a standard storage benchmarking tool).
This can be useful for some use cases, such as testing QoS. However, I would not necessarily recommend it to measure the maximum performance a storage platform can deliver, as that depends on many different parameters (node size, number of threads, network ports, etc.).
You can learn more about DBench here: https://github.com/leeliu/dbench, an app created by the company LogDNA.
However, the original image is no longer available on Docker Hub, so I have modified the definition to point to an alternative source.
Should this image also disappear, the DBench repository contains a Dockerfile you can use to build your own image.
Let's start by creating 3 different policies through REST API calls or ONTAP CLI:
- Policy Group#1: Maximum Throughput = 500 IOPS
- Policy Group#2: Maximum Throughput = 100 MBps
- Adaptive Policy Group: Peak = 50 IOPS / GB (which is 51200 IOPS / TB)
$ curl -X POST -ku admin:Netapp1! -H "accept: application/json" -H "Content-Type: application/json" -d '{
  "fixed": {
    "capacity_shared": false,
    "max_throughput_iops": 500
  },
  "name": "QoS_500iops",
  "svm": {
    "name": "nfs_svm",
    "uuid": "2829ebfb-4d6a-11e8-a5dc-005056b08451"
  }
}' "https://cluster1.demo.netapp.com/api/storage/qos/policies"
$ curl -X POST -ku admin:Netapp1! -H "accept: application/json" -H "Content-Type: application/json" -d '{
  "fixed": {
    "capacity_shared": false,
    "max_throughput_mbps": 100
  },
  "name": "QoS_100MBps",
  "svm": {
    "name": "nfs_svm",
    "uuid": "2829ebfb-4d6a-11e8-a5dc-005056b08451"
  }
}' "https://cluster1.demo.netapp.com/api/storage/qos/policies"
$ ssh 192.168.0.101 -l admin qos adaptive-policy-group create -policy-group aQoS -vserver nfs_svm -expected-iops 5IOPS/GB -peak-iops 50IOPS/GB -peak-iops-allocation allocated-space
Adaptive QoS allows you to set a rule on allocated or used space. I chose allocated space in order to better reflect the results we will get with DBench.
Let's make sure the 3 policies were indeed created:
$ curl -X GET -ku admin:Netapp1! "https://cluster1.demo.netapp.com/api/storage/qos/policies" -H "accept: application/json"
{
  "records": [
    {
      "uuid": "8bfdebbb-6605-11eb-b732-005056a46cf7",
      "name": "QoS_500iops"
    },
    {
      "uuid": "d6cc3f56-6603-11eb-b732-005056a46cf7",
      "name": "QoS_100MBps"
    },
    {
      "uuid": "dbafb8b4-6603-11eb-b732-005056a46cf7",
      "name": "aQoS"
    }
  ],
  "num_records": 3
}
For the benchmark, I am going to use one Trident backend (a Virtual Storage Pool with 3 different pools) & 3 different storage classes.
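The full backend_vsp_qos.yaml is in the repository; below is a minimal sketch of what such a definition could look like. The management LIF, secret name, and pool label names/values are assumptions, while qosPolicy and adaptiveQosPolicy are the Trident options that attach the ONTAP policies created above to each virtual pool:

apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-tbc-ontap-nas-qos
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 192.168.0.101          # assumption: management LIF used in the lab
  svm: nfs_svm
  credentials:
    name: tbc-ontap-nas-qos-secret      # assumption: secret holding the SVM credentials
  storage:
    - labels:
        qos: fixed-iops                 # assumption: label values are illustrative
      defaults:
        qosPolicy: QoS_500iops          # fixed policy created above (500 IOPS max)
    - labels:
        qos: fixed-bw
      defaults:
        qosPolicy: QoS_100MBps          # fixed policy created above (100 MBps max)
    - labels:
        qos: adaptive
      defaults:
        adaptiveQosPolicy: aQoS         # adaptive policy created above (50 IOPS/GB peak)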
$ kubectl create -n trident -f backend_vsp_qos.yaml
tridentbackendconfig.trident.netapp.io/backend-tbc-ontap-nas-qos created
$ kubectl create -f sc_qos1.yaml
storageclass.storage.k8s.io/sc-qos1 created
$ kubectl create -f sc_qos2.yaml
storageclass.storage.k8s.io/sc-qos2 created
$ kubectl create -f sc_qos3.yaml
storageclass.storage.k8s.io/sc-qos3 created
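Each storage class then selects one of the virtual pools through a label selector. A minimal sketch of what sc_qos1.yaml could contain, assuming the label names used in the backend sketch above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-qos1
provisioner: csi.trident.netapp.io
parameters:
  backendType: ontap-nas
  selector: "qos=fixed-iops"            # assumption: selects the pool attached to QoS_500iops
allowVolumeExpansion: true

sc-qos2 and sc-qos3 would follow the same pattern, each pointing at one of the other two pools.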
The next step consists of running a baseline with an RWX/NFS volume.
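dbench_baseline.yaml combines a PVC and a Kubernetes Job that runs DBench against it. A minimal sketch is shown below; the storage class name and the image reference are assumptions (the lab file points to the alternative image mentioned earlier):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbench-pvc-baseline
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: storage-class-nas    # assumption: a storage class with no QoS policy attached
  resources:
    requests:
      storage: 100Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench-baseline
spec:
  template:
    spec:
      containers:
        - name: dbench
          image: logdna/dbench           # placeholder: the original image is gone from Docker Hub, the lab uses an alternative
          imagePullPolicy: IfNotPresent
          env:
            - name: DBENCH_MOUNTPOINT    # directory the FIO tests write to
              value: /data
          volumeMounts:
            - name: dbench-pv
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: dbench-pv
          persistentVolumeClaim:
            claimName: dbench-pvc-baseline
  backoffLimit: 4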
$ kubectl create -f dbench_baseline.yaml
persistentvolumeclaim/dbench-pvc-baseline created
job.batch/dbench created
To see the results of this test, read the logs of the dbench job that is currently running. The benchmark takes a few minutes to complete, but you can follow its output live.
$ kubectl logs -f jobs/dbench-baseline
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 15.3k/10.9k. BW: 195MiB/s / 128MiB/s
Average Latency (usec) Read/Write: 785.78/704.16
Sequential Read/Write: 246MiB/s / 136MiB/s
Mixed Random Read/Write IOPS: 10.4k/3433
When the job is complete, you can delete it with the following command:
$ kubectl delete -f dbench_baseline.yaml
persistentvolumeclaim "dbench-pvc-baseline" deleted
job.batch "dbench-baseline" deleted
Let's now test the first policy (sc-qos1, capped at 500 IOPS). We will first start with a 100GB volume.
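dbench_qos1_100G.yaml presumably differs from the baseline only in the PVC; a sketch, assuming the storage class created earlier:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbench-pvc-qos1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: sc-qos1              # storage class mapped to the QoS_500iops policy
  resources:
    requests:
      storage: 100Gi

The other dbench_qos* files follow the same pattern, changing only the storage class and the requested size.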
$ kubectl create -f dbench_qos1_100G.yaml
persistentvolumeclaim/dbench-pvc-qos1 created
job.batch/dbench-qos1 created
$ kubectl logs -f jobs/dbench-qos1
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 495/495. BW: 31.4MiB/s / 31.4MiB/s
Average Latency (usec) Read/Write: 8002.64/8008.70
Sequential Read/Write: 30.1MiB/s / 31.6MiB/s
Mixed Random Read/Write IOPS: 374/120
$ kubectl delete -f dbench_qos1_100G.yaml
persistentvolumeclaim "dbench-pvc-qos1" deleted
job.batch "dbench-qos1" deleted
As expected, the benchmark does not go above the limit that was assigned to the volume.
Let's try with a bigger volume.
$ kubectl create -f dbench_qos1_200G.yaml
persistentvolumeclaim/dbench-pvc-qos1 created
job.batch/dbench-qos1 created
$ kubectl logs -f jobs/dbench-qos1
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 495/496. BW: 31.4MiB/s / 31.4MiB/s
Average Latency (usec) Read/Write: 8009.49/8008.97
Sequential Read/Write: 31.3MiB/s / 31.8MiB/s
Mixed Random Read/Write IOPS: 372/122
$ kubectl delete -f dbench_qos1_200G.yaml
persistentvolumeclaim "dbench-pvc-qos1" deleted
job.batch "dbench-qos1" deleted
The behavior is the same in both cases: whatever the size of the PVC, the QoS limit applies to the volume as a whole.
Let's now test the second policy (sc-qos2, capped at 100 MBps).
$ kubectl create -f dbench_qos2.yaml
persistentvolumeclaim/dbench-pvc-qos2 created
job.batch/dbench-qos2 created
$ kubectl logs -f jobs/dbench-qos2
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 16.4k/11.9k. BW: 100MiB/s / 100MiB/s
Average Latency (usec) Read/Write: 781.33/701.97
Sequential Read/Write: 100MiB/s / 101MiB/s
Mixed Random Read/Write IOPS: 11.4k/3690
$ kubectl delete -f dbench_qos2.yaml
persistentvolumeclaim "dbench-pvc-qos2" deleted
job.batch "dbench-qos2" deleted
Again, as expected, the benchmark stays within the limits set by the QoS policy.
If you were to use a bigger volume, you would end up with the same benchmark results.
Now for the Adaptive QoS policy (sc-qos3, 50 IOPS per GB). Let's first start with a 100GB volume.
$ kubectl create -f dbench_qos3_100G.yaml
persistentvolumeclaim/dbench-pvc-qos3 created
job.batch/dbench-qos3 created
$ kubectl logs -f jobs/dbench-qos3
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 4742/4742. BW: 256MiB/s / 143MiB/s
Average Latency (usec) Read/Write: 801.47/800.03
Sequential Read/Write: 300MiB/s / 107MiB/s
Mixed Random Read/Write IOPS: 3729/1264
$ kubectl delete -f dbench_qos3_100G.yaml
persistentvolumeclaim "dbench-pvc-qos3" deleted
job.batch "dbench-qos3" deleted
As we are using Adaptive QoS at 50 IOPS per GB, the ceiling scales with capacity: roughly 5,000 IOPS for the 100GB volume above, so doubling the capacity should provide twice the IOPS.
Let's run the same test with a 200GB volume (expected ceiling: about 10,000 IOPS).
$ kubectl create -f dbench_qos3_200G.yaml
persistentvolumeclaim/dbench-pvc-qos3 created
job.batch/dbench-qos3 created
$ kubectl logs -f jobs/dbench-qos3
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 9490/9489. BW: 251MiB/s / 138MiB/s
Average Latency (usec) Read/Write: 813.49/661.02
Sequential Read/Write: 250MiB/s / 146MiB/s
Mixed Random Read/Write IOPS: 7120/2369
$ kubectl delete -f dbench_qos3_200G.yaml
persistentvolumeclaim "dbench-pvc-qos3" deleted
job.batch "dbench-qos3" deleted
Point proven! The 200GB volume delivers roughly twice the IOPS of the 100GB one, exactly what the adaptive policy promises.