Morphling requires users to specify the ProfilingExperiment interface for configuration tuning, including:
- the ML model container (e.g., the Pod template),
- the performance objective function,
- the tunable configuration parameters with their types and search ranges,
- the sampling algorithm,
- the sampling budget.
```go
type ProfilingExperimentSpec struct {
	ServicePodTemplate corev1.PodTemplate  `json:"servicePodTemplate,omitempty"`
	Objective          ObjectiveSpec       `json:"objective,omitempty"`
	TunableParameters  []ParameterCategory `json:"tunableParameters,omitempty"`
	Algorithm          AlgorithmSpec       `json:"algorithm,omitempty"`
	MaxNumTrials       *int32              `json:"maxNumTrials,omitempty"`
}
```
The `ProfilingExperiment` workflow looks as follows:
- A user submits a `ProfilingExperiment` via an RPC or front-end UI interface, specifying the ML model, tunable configuration parameters, optimization objectives, and sampling budget.
- Within the sampling budget, Morphling iteratively communicates with the algorithm server to obtain the next configuration to sample.
- Morphling then starts a `Trial` to evaluate that sample.
- When performing a `Trial`, a model-serving inference instance (a `Deployment`) is launched, and its "readiness" is reported to trigger a client-side RPS stress-test `Job`.
- After the client `Job` completes, the measured peak RPS is stored in the DB.
- The `Trial` finishes, and the result is sent back to the `ProfilingExperiment`.
- The `ProfilingExperiment` completes when the sampling budget is reached.
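The iterative loop above can be sketched in Go. This is a minimal, self-contained illustration of the control flow, not Morphling's actual implementation: the names `AlgorithmClient`, `Suggestion`, `TrialResult`, `runTrial`, and `runExperiment` are hypothetical stand-ins for the algorithm server, the sampled configuration, and the `Trial` machinery (Deployment launch, stress-test `Job`, and DB lookup are collapsed into a benchmark callback).

```go
package main

// Suggestion is one sampled configuration proposed by the algorithm server
// (e.g., {"cpu": "2", "batchSize": "8"}).
type Suggestion map[string]string

// TrialResult records the measured objective (peak RPS) for one sample.
type TrialResult struct {
	Config  Suggestion
	PeakRPS float64
}

// AlgorithmClient abstracts the algorithm server that proposes the next
// configuration to sample, given the trial history so far.
type AlgorithmClient interface {
	NextSuggestion(history []TrialResult) Suggestion
}

// gridAlgo is a toy algorithm that cycles through candidate CPU settings.
type gridAlgo struct{ candidates []string }

func (g gridAlgo) NextSuggestion(history []TrialResult) Suggestion {
	return Suggestion{"cpu": g.candidates[len(history)%len(g.candidates)]}
}

// runTrial stands in for one Trial: launching the serving Deployment,
// waiting for readiness, running the client stress-test Job, and reading
// the measured peak RPS back from the DB.
func runTrial(cfg Suggestion, bench func(Suggestion) float64) TrialResult {
	return TrialResult{Config: cfg, PeakRPS: bench(cfg)}
}

// runExperiment iterates until the sampling budget (maxNumTrials) is spent
// and returns the best configuration found.
func runExperiment(algo AlgorithmClient, bench func(Suggestion) float64, maxNumTrials int) TrialResult {
	var history []TrialResult
	best := TrialResult{PeakRPS: -1}
	for i := 0; i < maxNumTrials; i++ {
		cfg := algo.NextSuggestion(history)
		res := runTrial(cfg, bench)
		history = append(history, res)
		if res.PeakRPS > best.PeakRPS {
			best = res
		}
	}
	return best
}
```

In the real controller the benchmark callback is asynchronous (the `Trial` is a separate custom resource reconciled by its own controller), but the budget-bounded suggest–evaluate–record loop is the same.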
The sequence diagram of the `ProfilingExperiment` workflow is shown as follows: