InferencePool Ownership #117

Open · danehans opened this issue Dec 19, 2024 · 4 comments

@danehans (Contributor) commented Dec 19, 2024:

According to the API proposal:

When a new InferencePool object is created, a new ext proc deployment is created.

Multiple controllers may exist that reconcile InferencePool objects, so a mechanism should exist that defines which controller is responsible for managing a given InferencePool object. For example, Gateway API defines gatewayclass.spec.controllerName:

	// ControllerName is the name of the controller that is managing Gateways of
	// this class. The value of this field MUST be a domain prefixed path.
	...
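
For comparison, a minimal sketch of what an analogous ownership field could look like on InferencePool, mirroring the GatewayClass pattern (a hypothetical illustration, not part of any proposal in this thread; the field name and example value are assumptions):

type InferencePoolSpec struct {
	// ControllerName is the name of the controller that manages this
	// InferencePool. As with GatewayClass, the value would be a domain
	// prefixed path, e.g. "example.com/inference-pool-controller".
	ControllerName string
}
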
@danehans (Contributor, Author):

cc: @robscott

@ahg-g (Contributor) commented Jan 2, 2025:

The current thinking is that an InferencePool is reconciled by a single extension deployment, and this doesn't require specifying a controllerName on the object, since the name of the InferencePool to reconcile can be passed as a parameter to the extension.

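To illustrate, a minimal sketch of an extension binary that takes the pool to reconcile as a parameter (assuming a Go binary and a command-line flag; the flag name is hypothetical):

package main

import (
	"flag"
	"log"
)

var poolName = flag.String("pool-name", "", "name of the InferencePool this extension reconciles")

func main() {
	flag.Parse()
	// The extension watches only the named pool, so the InferencePool
	// object itself does not need a controllerName field.
	log.Printf("reconciling InferencePool %q", *poolName)
}
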
I have a proposal for extending the InferencePool API with a configuration parameter that allows specifying the extension deployment that is supposed to reconcile it. The proposal looks like the following, but I will share it in a doc to make it easy to comment on:

type InferencePoolSpec struct {
	...

	// Selects and configures the endpoint picking algorithm to apply to the
	// requests sent to this pool.
	//
	// Only one of the following options can be set.

	// Extension configures an endpoint picker as an extension service.
	Extension *ExtensionConfig

	// Algorithm configures the endpoint picker by a name that the provider
	// understands and knows how to set up within its own gateway implementation.
	Algorithm *AlgorithmConfig
}

// Specifies a reference to the endpoint picking algorithm that the gateway
// must apply.
type AlgorithmConfig struct {
	// The name of the algorithm that the gateway should use.
	Name string
}

// Specifies how to instantiate and configure an extension that implements the
// endpoint picking algorithm.
type ExtensionConfig struct {
	// Specifies how the deployment for the extension gets configured.
	ExtensionDeployment
	// Configures the connection between the LB and the extension.
	ExtensionConnection
}

// Encapsulates options that configure the connection to the extension.
type ExtensionConnection struct {
	// Configures how the gateway handles the case when the extension is not
	// responsive. Defaults to FailClose.
	FailureMode ExtensionFailureMode
}

// Defines the options for how the gateway handles the case when the extension
// is not responsive.
type ExtensionFailureMode string

const (
	// The endpoint will be selected via the provider's configured LB algorithm.
	FailOpen ExtensionFailureMode = "FailOpen"
	// Requests should be dropped.
	FailClose ExtensionFailureMode = "FailClose"
)

// Encapsulates the parameters for how to instantiate the extension deployment.
type ExtensionDeployment struct {
	// A reference to a deployed extension.
	// <unresolved>
	// Whether or not ExtensionRef is required is still an open question.
	// </unresolved>
	ExtensionRef *ExtensionRef
}

// A reference to the extension deployment.
type ExtensionRef struct {
	// A selector for the pods that run the extension.
	Selector map[string]string
	// The port number on the pods running the extension. Defaults to 9002 if
	// not set.
	TargetPort int32
}

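A minimal usage sketch of the proposed types (illustrative values only, not from the proposal), configuring a pool with an extension-based endpoint picker:

pool := InferencePoolSpec{
	Extension: &ExtensionConfig{
		ExtensionDeployment: ExtensionDeployment{
			ExtensionRef: &ExtensionRef{
				Selector:   map[string]string{"app": "endpoint-picker"},
				TargetPort: 9002,
			},
		},
		ExtensionConnection: ExtensionConnection{
			FailureMode: FailClose,
		},
	},
	// Algorithm is left nil: only one of Extension or Algorithm may be set.
}
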
@danehans (Contributor, Author) commented Jan 3, 2025:

@ahg-g thanks for the feedback.

> The current thinking is that an InferencePool is reconciled by a single extension deployment

Do you see a potential use case for an extension deployment reconciling more than one InferencePool?

> The proposal looks like the following, but I will share it in a doc to make it easy to comment on:

Please do share it with me. I have a few thoughts on the snippet you shared above.

@ahg-g (Contributor) commented Jan 6, 2025:
