feat: use new property name for node count property + specify the core properties (node count, total/allocatable/available CPU/memory) (#810)

* Added print cols directives + Re-gen the manifests

* Use the new node count property name

* Revert irrelevant changes

* Added more changes

* Minor changes

* Fixes

* Minor fixes
michaelawyu authored May 14, 2024
1 parent dc7012f commit e3ff5b9
Showing 23 changed files with 240 additions and 202 deletions.
10 changes: 5 additions & 5 deletions cmd/memberagent/main.go
Original file line number Diff line number Diff line change
@@ -46,15 +46,15 @@ import (
workv1alpha1controller "go.goms.io/fleet/pkg/controllers/workv1alpha1"
fleetmetrics "go.goms.io/fleet/pkg/metrics"
"go.goms.io/fleet/pkg/propertyprovider"
"go.goms.io/fleet/pkg/propertyprovider/aks"
"go.goms.io/fleet/pkg/propertyprovider/azure"
"go.goms.io/fleet/pkg/utils"
"go.goms.io/fleet/pkg/utils/httpclient"
//+kubebuilder:scaffold:imports
)

const (
// The list of available property provider names.
aksPropertyProvider = "azure"
azurePropertyProvider = "azure"
)

var (
@@ -351,11 +351,11 @@ func Start(ctx context.Context, hubCfg, memberConfig *rest.Config, hubOpts, memb
// Set up a provider provider (if applicable).
var pp propertyprovider.PropertyProvider
switch {
case propertyProvider != nil && *propertyProvider == aksPropertyProvider:
klog.V(2).Info("setting up the AKS property provider")
case propertyProvider != nil && *propertyProvider == azurePropertyProvider:
klog.V(2).Info("setting up the Azure property provider")
// Note that the property provider, though initialized here, is not started until
// the specific instance wins the leader election.
pp = aks.New(region)
pp = azure.New(region)
default:
// Fall back to not using any property provider if the provided type is none or
// not recognizable.
19 changes: 15 additions & 4 deletions docs/concepts/PropertyProviderAndClusterProperties/README.md
@@ -45,8 +45,8 @@ in the form of Kubernetes conditions.

The Fleet member agent can run with or without a property provider. If a provider is not set up, or
the given provider fails to start properly, the agent will collect limited properties about
the cluster on its own, specifically the total and allocatable CPU and memory capacities of
the host member cluster.
the cluster on its own, specifically the node count, plus the total/allocatable
CPU and memory capacities of the host member cluster.

## Cluster properties

@@ -70,7 +70,7 @@ such as `cpu` and `memory`, and the usage information should consist of:
* Non-resource property: a metric about a member cluster, in the form of a key/value
pair; the key should be in the format of
[a Kubernetes label key](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set),
such as `kubernetes.azure.com/node-count`, and the value at this moment should be a sortable
such as `kubernetes-fleet.io/node-count`, and the value at this moment should be a sortable
numeric that can be parsed as
[a Kubernetes quantity](https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/).

@@ -87,7 +87,7 @@ status:
agentStatus: ...
conditions: ...
properties:
kubernetes.azure.com/node-count:
kubernetes-fleet.io/node-count:
observationTime: "2024-04-30T14:54:24Z"
value: "2"
...
@@ -106,3 +106,14 @@ status:
Note that conditions reported by the property provider (if any), would be available in the
`.status.conditions` array as well.

### Core properties

The following properties are considered core properties in Fleet, which should be supported
in all property provider implementations. Fleet agents will collect them even when no
property provider has been set up.

| Property Type | Name | Description |
| ------------- | ---- | ----------- |
| Non-resource property | `kubernetes-fleet.io/node-count` | The number of nodes in a cluster. |
| Resource property | `cpu` | The usage information (total, allocatable, and available capacity) of CPU resource in a cluster. |
| Resource property | `memory` | The usage information (total, allocatable, and available capacity) of memory resource in a cluster. |
6 changes: 3 additions & 3 deletions docs/howtos/property-based-scheduling.md
@@ -101,7 +101,7 @@ spec:
clusterSelectorTerms:
- propertySelector:
matchExpressions:
- name: "kubernetes.azure.com/node-count"
- name: "kubernetes-fleet.io/node-count"
operator: Ge
values:
- "5"
@@ -130,7 +130,7 @@ spec:
region: east
propertySelector:
matchExpressions:
- name: "kubernetes.azure.com/node-count"
- name: "kubernetes-fleet.io/node-count"
operator: Ge
values:
- "5"
@@ -225,7 +225,7 @@ spec:
- weight: 20
preference:
metricSorter:
name: kubernetes.azure.com/node-count
name: kubernetes-fleet.io/node-count
sortOrder: Descending
```
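
The `Ge` selectors on `kubernetes-fleet.io/node-count` in the hunks above can be read as "the observed value must be greater than or equal to the listed value". The Go sketch below illustrates that comparison only; it is not Fleet's scheduler code. `matchesGe` is a hypothetical helper, and it assumes both values are plain base-10 integers, whereas the documentation allows any value parseable as a Kubernetes quantity.

```go
package main

import (
	"fmt"
	"strconv"
)

// matchesGe reports whether the observed property value is greater than or
// equal to the selector value. Assumption: both are plain base-10 integers.
func matchesGe(observed, want string) (bool, error) {
	o, err := strconv.ParseInt(observed, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing observed value %q: %w", observed, err)
	}
	w, err := strconv.ParseInt(want, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing selector value %q: %w", want, err)
	}
	return o >= w, nil
}

func main() {
	// Hypothetical observed properties for one member cluster.
	properties := map[string]string{"kubernetes-fleet.io/node-count": "7"}

	ok, err := matchesGe(properties["kubernetes-fleet.io/node-count"], "5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cluster matches selector:", ok)
}
```

A cluster whose property is missing or not numeric yields an error here; how Fleet treats such clusters is not shown in this diff.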

Original file line number Diff line number Diff line change
@@ -463,6 +463,12 @@ func (r *Reconciler) updateResourceStats(ctx context.Context, imc *clusterv1beta
allocatableMemory.Add(*(node.Status.Allocatable.Memory()))
}

imc.Status.Properties = map[clusterv1beta1.PropertyName]clusterv1beta1.PropertyValue{
propertyprovider.NodeCountProperty: {
Value: fmt.Sprintf("%d", len(nodes.Items)),
ObservationTime: metav1.Now(),
},
}
imc.Status.ResourceUsage.Capacity = corev1.ResourceList{
corev1.ResourceCPU: capacityCPU,
corev1.ResourceMemory: capacityMemory,
Original file line number Diff line number Diff line change
@@ -18,6 +18,7 @@ import (

clusterv1beta1 "go.goms.io/fleet/apis/cluster/v1beta1"
"go.goms.io/fleet/pkg/controllers/work"
"go.goms.io/fleet/pkg/propertyprovider"
"go.goms.io/fleet/pkg/utils"
)

@@ -136,6 +137,7 @@ var _ = Describe("Test Internal Member Cluster Controller", Serial, func() {
Expect(updatedHealthCond.Reason).To(Equal(EventReasonInternalMemberClusterHealthy))

By("checking updated member cluster usage")
Expect(imc.Status.Properties[propertyprovider.NodeCountProperty].Value).ShouldNot(BeEmpty())
Expect(imc.Status.ResourceUsage.Allocatable).ShouldNot(BeNil())
Expect(imc.Status.ResourceUsage.Capacity).ShouldNot(BeNil())
Expect(imc.Status.ResourceUsage.ObservationTime).ToNot(Equal(metav1.Now()))
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ Licensed under the MIT license.
*/

// Package controllers feature a number of controllers that are in use
// by the AKS property provider.
// by the Azure property provider.
package controllers

import (
@@ -17,7 +17,7 @@ import (
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"

"go.goms.io/fleet/pkg/propertyprovider/aks/trackers"
"go.goms.io/fleet/pkg/propertyprovider/azure/trackers"
)

// NodeReconciler reconciles Node objects.
@@ -30,10 +30,10 @@ type NodeReconciler struct {
func (r *NodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
nodeRef := klog.KRef(req.Namespace, req.Name)
startTime := time.Now()
klog.V(2).InfoS("Reconciliation starts for node objects in the AKS property provider", "node", nodeRef)
klog.V(2).InfoS("Reconciliation starts for node objects in the Azure property provider", "node", nodeRef)
defer func() {
latency := time.Since(startTime).Milliseconds()
klog.V(2).InfoS("Reconciliation ends for node objects in the AKS property provider", "node", nodeRef, "latency", latency)
klog.V(2).InfoS("Reconciliation ends for node objects in the Azure property provider", "node", nodeRef, "latency", latency)
}()

// Retrieve the node object.
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ Licensed under the MIT license.
*/

// Package controllers feature a number of controllers that are in use
// by the AKS property provider.
// by the Azure property provider.
package controllers

import (
@@ -17,7 +17,7 @@ import (
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"

"go.goms.io/fleet/pkg/propertyprovider/aks/trackers"
"go.goms.io/fleet/pkg/propertyprovider/azure/trackers"
)

// TO-DO (chenyu1): this is a relatively expensive watcher, due to how frequent pods can change
@@ -35,10 +35,10 @@ type PodReconciler struct {
func (p *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
podRef := klog.KRef(req.Namespace, req.Name)
startTime := time.Now()
klog.V(2).InfoS("Reconciliation starts for pod objects in the AKS property provider", "pod", podRef)
klog.V(2).InfoS("Reconciliation starts for pod objects in the Azure property provider", "pod", podRef)
defer func() {
latency := time.Since(startTime).Milliseconds()
klog.V(2).InfoS("Reconciliation ends for pod objects in the AKS property provider", "pod", podRef, "latency", latency)
klog.V(2).InfoS("Reconciliation ends for pod objects in the Azure property provider", "pod", podRef, "latency", latency)
}()

// Retrieve the pod object.
Original file line number Diff line number Diff line change
@@ -3,8 +3,8 @@ Copyright (c) Microsoft Corporation.
Licensed under the MIT license.
*/

// Package aks features the AKS property provider for Fleet.
package aks
// Package azure features the Azure property provider for Fleet.
package azure

import (
"context"
@@ -24,36 +24,26 @@ import (

clusterv1beta1 "go.goms.io/fleet/apis/cluster/v1beta1"
"go.goms.io/fleet/pkg/propertyprovider"
"go.goms.io/fleet/pkg/propertyprovider/aks/controllers"
"go.goms.io/fleet/pkg/propertyprovider/aks/trackers"
"go.goms.io/fleet/pkg/propertyprovider/azure/controllers"
"go.goms.io/fleet/pkg/propertyprovider/azure/trackers"
)

const (
// A list of properties that the AKS property provider collects.
// A list of properties that the Azure property provider collects in addition to the
// Fleet required ones.

// NodeCountProperty is a property that describes the number of nodes in the cluster.
NodeCountProperty = "kubernetes.azure.com/node-count"
// PerCPUCoreCostProperty is a property that describes the average hourly cost of a CPU core in
// a Kubernetes cluster.
PerCPUCoreCostProperty = "kubernetes.azure.com/per-cpu-core-cost"
// PerGBMemoryCostProperty is a property that describes the average cost of one GB of memory in
// a Kubernetes cluster.
PerGBMemoryCostProperty = "kubernetes.azure.com/per-gb-memory-cost"

// The resource properties.
TotalCPUCapacityProperty = "resources.kubernetes-fleet.io/total-cpu"
AllocatableCPUCapacityProperty = "resources.kubernetes-fleet.io/allocatable-cpu"
AvailableCPUCapacityProperty = "resources.kubernetes-fleet.io/available-cpu"

TotalMemoryCapacityProperty = "resources.kubernetes-fleet.io/total-memory"
AllocatableMemoryCapacityProperty = "resources.kubernetes-fleet.io/allocatable-memory"
AvailableMemoryCapacityProperty = "resources.kubernetes-fleet.io/available-memory"

CostPrecisionTemplate = "%.3f"
)

const (
// The condition related values in use by the AKS property provider.
// The condition related values in use by the Azure property provider.

// PropertyCollectionSucceededConditionType is a condition type that indicates whether a
// property collection attempt has succeeded.
@@ -64,57 +54,57 @@ const (
PropertyCollectionFailedCostErrorMessageTemplate = "An error has occurred when collecting cost properties: %v"
)

// PropertyProvider is the AKS property provider for Fleet.
// PropertyProvider is the Azure property provider for Fleet.
type PropertyProvider struct {
// The trackers.
podTracker *trackers.PodTracker
nodeTracker *trackers.NodeTracker

// The region where the AKS property provider resides.
// The region where the Azure property provider resides.
//
// This is necessary as the pricing client requires that a region to be specified; it can
// be either specified by the user or auto-discovered from the AKS cluster.
region *string

// The controller manager in use by the AKS property provider; this field is mostly reserved for
// The controller manager in use by the Azure property provider; this field is mostly reserved for
// testing purposes.
mgr ctrl.Manager
}

// Verify that the AKS property provider implements the MetricProvider interface at compile time.
// Verify that the Azure property provider implements the MetricProvider interface at compile time.
var _ propertyprovider.PropertyProvider = &PropertyProvider{}

// Start starts the AKS property provider.
// Start starts the Azure property provider.
func (p *PropertyProvider) Start(ctx context.Context, config *rest.Config) error {
klog.V(2).Info("Starting AKS property provider")
klog.V(2).Info("Starting Azure property provider")

mgr, err := ctrl.NewManager(config, ctrl.Options{
Scheme: scheme.Scheme,
// Disable metric serving for the AKS property provider controller manager.
// Disable metric serving for the Azure property provider controller manager.
//
// Note that this will not stop the metrics from being collected and exported; as they
// are registered via a top-level variable as a part of the controller runtime package,
// which is also used by the Fleet member agent.
Metrics: metricsserver.Options{
BindAddress: "0",
},
// Disable health probe serving for the AKS property provider controller manager.
// Disable health probe serving for the Azure property provider controller manager.
HealthProbeBindAddress: "0",
// Disable leader election for the AKS property provider.
// Disable leader election for the Azure property provider.
//
// Note that for optimal performance, only the running instance of the Fleet member agent
// (if there are multiple ones) should have the AKS property provider enabled; this can
// be achieved by starting the AKS property provider only when an instance of the Fleet
// member agent wins the leader election. It should be noted that running the AKS property
// (if there are multiple ones) should have the Azure property provider enabled; this can
// be achieved by starting the Azure property provider only when an instance of the Fleet
// member agent wins the leader election. It should be noted that running the Azure property
// provider for multiple times will not incur any side effect other than some minor
// performance costs, as at this moment the AKS property provider observes data individually
// performance costs, as at this moment the Azure property provider observes data individually
// in a passive manner with no need for any centralized state.
LeaderElection: false,
})
p.mgr = mgr

if err != nil {
klog.ErrorS(err, "Failed to start AKS property provider")
klog.ErrorS(err, "Failed to start Azure property provider")
return err
}

@@ -132,7 +122,7 @@ func (p *PropertyProvider) Start(ctx context.Context, config *rest.Config) error
// once, the performance impact is negligible.
discoveredRegion, err := p.autoDiscoverRegionAndSetupTrackers(ctx, mgr.GetAPIReader())
if err != nil {
klog.ErrorS(err, "Failed to auto-discover region for the AKS property provider")
klog.ErrorS(err, "Failed to auto-discover region for the Azure property provider")
return err
}
p.region = discoveredRegion
@@ -152,7 +142,7 @@ func (p *PropertyProvider) Start(ctx context.Context, config *rest.Config) error
Client: mgr.GetClient(),
}
if err := nodeReconciler.SetupWithManager(mgr); err != nil {
klog.ErrorS(err, "Failed to start the node reconciler in the AKS property provider")
klog.ErrorS(err, "Failed to start the node reconciler in the Azure property provider")
return err
}

@@ -162,7 +152,7 @@ func (p *PropertyProvider) Start(ctx context.Context, config *rest.Config) error
Client: mgr.GetClient(),
}
if err := podReconciler.SetupWithManager(mgr); err != nil {
klog.ErrorS(err, "Failed to start the pod reconciler in the AKS property provider")
klog.ErrorS(err, "Failed to start the pod reconciler in the Azure property provider")
return err
}

@@ -173,7 +163,7 @@ func (p *PropertyProvider) Start(ctx context.Context, config *rest.Config) error
go func() {
// This call will block until the context exits.
if err := mgr.Start(ctx); err != nil {
klog.ErrorS(err, "Failed to start the AKS property provider controller manager")
klog.ErrorS(err, "Failed to start the Azure property provider controller manager")
}
}()

@@ -197,7 +187,7 @@ func (p *PropertyProvider) Collect(_ context.Context) propertyprovider.PropertyC

// Collect the non-resource properties.
properties := make(map[clusterv1beta1.PropertyName]clusterv1beta1.PropertyValue)
properties[NodeCountProperty] = clusterv1beta1.PropertyValue{
properties[propertyprovider.NodeCountProperty] = clusterv1beta1.PropertyValue{
Value: fmt.Sprintf("%d", p.nodeTracker.NodeCount()),
ObservationTime: metav1.Now(),
}
@@ -271,7 +261,7 @@ func (p *PropertyProvider) Collect(_ context.Context) propertyprovider.PropertyC

// autoDiscoverRegionAndSetupTrackers auto-discovers the region of the AKS cluster.
func (p *PropertyProvider) autoDiscoverRegionAndSetupTrackers(ctx context.Context, c client.Reader) (*string, error) {
klog.V(2).Info("Auto-discover region for the AKS property provider")
klog.V(2).Info("Auto-discover region for the Azure property provider")
// Auto-discover the region by listing the nodes.
nodeList := &corev1.NodeList{}
// List only one node to reduce performance impact (if supported).
@@ -311,12 +301,12 @@ func (p *PropertyProvider) autoDiscoverRegionAndSetupTrackers(ctx context.Contex
klog.Error(err)
return nil, err
}
klog.V(2).InfoS("Auto-discovered region for the AKS property provider", "region", nodeRegion)
klog.V(2).InfoS("Auto-discovered region for the Azure property provider", "region", nodeRegion)

return &nodeRegion, nil
}

// New returns a new AKS property provider using the default pricing provider, which is,
// New returns a new Azure property provider using the default pricing provider, which is,
// at this moment, an AKS Karpenter pricing client.
//
// If the region is unspecified at the time when this function is called, the provider
@@ -328,7 +318,7 @@ func New(region *string) propertyprovider.PropertyProvider {
}
}

// NewWithPricingProvider returns a new AKS property provider with the given
// NewWithPricingProvider returns a new Azure property provider with the given
// pricing provider.
//
// This is mostly used for allow plugging in of alternate pricing providers (one that