InferencePool Ownership #117

Open · danehans opened this issue Dec 19, 2024 · 4 comments

@danehans (Contributor) commented Dec 19, 2024:

According to the API proposal:

When a new InferencePool object is created, a new ext proc deployment is created.

Multiple controllers may exist that reconcile InferencePool objects, so a mechanism should exist that defines which controller is responsible for managing a given InferencePool object. For example, Gateway API defines gatewayclass.spec.controllerName:

	// ControllerName is the name of the controller that is managing Gateways of
	// this class. The value of this field MUST be a domain prefixed path.
	...
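
For comparison, a minimal sketch of what an analogous ownership field could look like on InferencePool, mirroring the GatewayClass pattern (a hypothetical illustration, not part of any proposal in this thread; the field name and example value are assumptions):

type InferencePoolSpec struct {
	// ControllerName is the name of the controller that manages this
	// InferencePool. As with GatewayClass, the value would be a domain
	// prefixed path, e.g. "example.com/inference-pool-controller".
	ControllerName string
}
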
@danehans (Contributor, Author):

cc: @robscott

@ahg-g (Contributor) commented Jan 2, 2025:

The current thinking is that an InferencePool is reconciled by a single extension deployment, and this doesn't require specifying a controllerName on the object, since the name of the InferencePool to reconcile can be passed as a parameter to the extension.

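To illustrate, a minimal sketch of an extension binary that takes the pool to reconcile as a parameter (assuming a Go binary and a command-line flag; the flag name is hypothetical):

package main

import (
	"flag"
	"log"
)

var poolName = flag.String("pool-name", "", "name of the InferencePool this extension reconciles")

func main() {
	flag.Parse()
	// The extension watches only the named pool, so the InferencePool
	// object itself does not need a controllerName field.
	log.Printf("reconciling InferencePool %q", *poolName)
}
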
I have a proposal for extending the InferencePool API with a configuration parameter that allows specifying the extension deployment that is supposed to reconcile it. The proposal looks like the following, but I will share it in a doc to make it easy to comment on:

type InferencePoolSpec struct {
	...

	// Selects and configures the endpoint picking algorithm to apply to the
	// requests sent to this pool.
	//
	// Only one of the following options can be set.

	// Extension configures an endpoint picker as an extension service.
	Extension *ExtensionConfig

	// Algorithm configures the endpoint picker by a name that the provider
	// understands and knows how to set up within its own gateway implementation.
	Algorithm *AlgorithmConfig
}

// Specifies a reference to the endpoint picking algorithm that the gateway
// must apply.
type AlgorithmConfig struct {
	// The name of the algorithm that the gateway should use.
	Name string
}

// Specifies how to instantiate and configure an extension that implements the
// endpoint picking algorithm.
type ExtensionConfig struct {
	// Specifies how the deployment for the extension gets configured.
	ExtensionDeployment
	// Configures the connection between the LB and the extension.
	ExtensionConnection
}

// Encapsulates options that configure the connection to the extension.
type ExtensionConnection struct {
	// Configures how the gateway handles the case when the extension is not
	// responsive. Defaults to FailClose.
	FailureMode ExtensionFailureMode
}

// Defines the options for how the gateway handles the case when the extension
// is not responsive.
type ExtensionFailureMode string

const (
	// The endpoint will be selected via the provider's configured LB algorithm.
	FailOpen ExtensionFailureMode = "FailOpen"
	// Requests should be dropped.
	FailClose ExtensionFailureMode = "FailClose"
)

// Encapsulates the parameters for how to instantiate the extension deployment.
type ExtensionDeployment struct {
	// A reference to a deployed extension.
	// <unresolved>
	// Whether or not ExtensionRef is required is still an open question.
	// </unresolved>
	ExtensionRef *ExtensionRef
}

// A reference to the extension deployment.
type ExtensionRef struct {
	// A selector for the pods that run the extension.
	Selector map[string]string
	// The port number on the pods running the extension. Defaults to 9002 if
	// not set.
	TargetPort int32
}

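A minimal usage sketch of the proposed types (illustrative values only, not from the proposal), configuring a pool with an extension-based endpoint picker:

pool := InferencePoolSpec{
	Extension: &ExtensionConfig{
		ExtensionDeployment: ExtensionDeployment{
			ExtensionRef: &ExtensionRef{
				Selector:   map[string]string{"app": "endpoint-picker"},
				TargetPort: 9002,
			},
		},
		ExtensionConnection: ExtensionConnection{
			FailureMode: FailClose,
		},
	},
	// Algorithm is left nil: only one of Extension or Algorithm may be set.
}
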
@danehans (Contributor, Author) commented Jan 3, 2025:

@ahg-g thanks for the feedback.

> The current thinking is that an InferencePool is reconciled by a single extension deployment

Do you see a potential use case for an extension deployment reconciling more than one InferencePool?

> The proposal looks like the following, but I will share it in a doc to make it easy to comment on:

Please do share it with me. I have a few thoughts on the snippet you shared above.

@ahg-g (Contributor) commented Jan 6, 2025:
