Feat/sagemaker llms #234
base: main
```diff
@@ -249,6 +249,11 @@ data "aws_ecr_lifecycle_policy_document" "expire_untagged_after_one_day" {
   }
 }

+resource "aws_ecr_repository" "sagemaker" {
```
@peter-woodcock identified that this can be removed. @isobel-daley-6point6, we can review.
```hcl
# Use the data source to get the bucket ARN from the bucket name
data "aws_s3_bucket" "sagemaker_default_bucket" {
  bucket = var.sagemaker_default_bucket_name
}
```
@peter-woodcock identified that this bucket is not defined as a resource. @isobel-daley-6point6, we can review.
```diff
@@ -274,38 +274,94 @@ variable "s3_prefixes_for_external_role_copy" {
   default = ["import-data", "export-data"]
 }

+variable "sagemaker_example_inference_image" { default = "" }
```
@peter-woodcock identified that this can be removed. @isobel-daley-6point6, we can review.
Overview
This PR introduces SageMaker asynchronous inference endpoints to Data Workspace. SageMaker asynchronous endpoints can be used to deploy self-hosted ML models (including those that require GPUs, like LLMs). Users of Data Workspace tools (Theia/Jupyter/VSCode) will be able to invoke these inference endpoints. They will not have permission to deploy new inference endpoints.
Feature Flags
The overall SageMaker functionality has been introduced behind a feature flag (set by `var.sagemaker_on`).

A model-specific feature flag has also been added. This can be used to easily turn models 'on' and 'off'. In this PR, there is only one model (`phi_2_3b`). Therefore there is one model-specific feature flag, set by `var.sagemaker_phi_2_3b`.

High Level Summary of Functionality
SageMaker model artefacts are stored in S3 (model weights) and ECR (dependencies and inference code). A SageMaker model is created using these artefacts. These are deployed behind a SageMaker asynchronous inference endpoint with autoscaling.
A user can invoke the asynchronous endpoint from Data Workspace Python tools using the `boto3` library. When the SageMaker inference endpoint is called, the request enters a backlog. This triggers SageMaker to provision the necessary infrastructure (an EC2 instance) to run the model. Once the model endpoint is available, the user's request is processed and the output is sent to a centralised SageMaker S3 bucket. Users of Data Workspace tools do not have access to this bucket. Instead, SNS triggers a Lambda function which copies the SageMaker output file from the centralised SageMaker bucket to the user's own Data Workspace file space. When no requests remain in the backlog, the infrastructure associated with the endpoint scales down.
Architecture Diagram
Implementation Details
SageMaker VPC
A new VPC has been created with a single private subnet. This VPC is used to host:
This VPC is peered with:
- the `main` VPC, to enable access to the SageMaker API and Runtime VPC endpoints
- the `notebooks` VPC, to allow users of Data Workspace tools access to the SageMaker asynchronous inference endpoints

New VPC Endpoints in main VPC
Two new VPC endpoints have been added to the `main` VPC:
- the SageMaker API endpoint
- the SageMaker Runtime endpoint (used to invoke endpoints via the `boto3` library)

These VPC endpoints have been placed in the `main` VPC as it is anticipated that services like data-flow will need to access them in the future.
SageMaker Asynchronous Inference Endpoints
The `sagemaker_llm_resource.tf` file calls a reusable module, `./modules/sagemaker_deployment`. This module enables setup of new SageMaker asynchronous inference endpoints. Each new asynchronous endpoint consists of the following resources:

SageMaker is granted permissions via the inference and execution roles to do the following:
Lambdas
Lambdas have been implemented to cover the following:
AWS Budgets
AWS Budgets has been set up to support tracking of costs relating to SageMaker.
Data Workspace Tools: User Permissions
Permissions have been added to the `notebook_task_execution` policy to allow: