diff --git a/README.md b/README.md
index 112b6a81ec..444fcf09cf 100644
--- a/README.md
+++ b/README.md
@@ -564,6 +564,10 @@ Platform usage.
 * [STS Job Manager](tools/sts-job-manager/) - A petabyte-scale bucket migration
   tool utilizing [Storage Transfer Service](https://cloud.google.com/storage-transfer-service)
+* [Vertex AI Endpoint Tester](tools/vertex-ai-endpoint-load-tester) - This
+  utility helps to methodically load test a variety of Vertex AI Endpoint
+  machine types, so that one can decide the right size on which to deploy an
+  ML model, given a sample request JSON and an estimate of the expected
+  queries per second.
 * [VM Migrator](tools/vm-migrator) - This utility automates migrating Virtual
   Machine instances within GCP. You can migrate VM's from one zone to another
   zone/region within the same project or different projects while retaining
diff --git a/tools/vertex-ai-endpoint-load-tester/README.md b/tools/vertex-ai-endpoint-load-tester/README.md
new file mode 100644
index 0000000000..c835615250
--- /dev/null
+++ b/tools/vertex-ai-endpoint-load-tester/README.md
@@ -0,0 +1,79 @@
+```
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+```
+
+# Vertex AI Endpoint Stress Tester
+
+go/vertex-endpoint-stress-tester
+
+## Introduction
+
+Vertex AI Endpoints are a great managed solution for deploying ML models at scale. Architecturally, Vertex AI Endpoints use GKE or similar infrastructure components in the background to provide seamless deployment and inference capabilities for any ML model, be it AutoML or custom.
+
+In some of our recent engagements, we have seen questions raised about the scalability of Vertex AI Endpoints. A sample notebook available on GitHub under the Google Cloud Platform account explains one of the many ways to check how much load a particular instance type can handle. However, it is not an automated solution that anyone from GCC can use with ease, and it involves tedious manual work: creating endpoints, deploying the ML model on them, load testing each VM type, and deleting everything afterwards. Given that the Vertex AI Endpoint service keeps growing and supports a wide variety of instance types, this procedure needs to be improved, so that anyone from GCC can easily deploy a given ML model on a series of endpoints of various sizes and check which one best suits the given workload, based on an estimate of how much traffic the ML model is expected to receive once it goes to production.
+
+This is where we propose our automated tool (proposed to be open sourced in the PSO GitHub and KitHub). Its objective is to automatically stress test one particular model over various endpoint configurations, with and without autoscaling, so that there is a data-driven approach to deciding the right sizing of the endpoint.
+
+## Assumptions
+
+1. The ML model is already built; this tool does not train it, but simply references it from BQML or the Vertex AI Model Registry.
+2. The deployed ML model can accept a valid JSON request as input and provide online predictions as output, preferably JSON.
+3. The user of this utility has at least one example JSON request file, placed in the [requests](requests/) folder. Please see the existing [example](requests/request_movie.json) for clarity.
+
+## How to Install & Run?
+
+Out of the box, the utility can be run from the command line. The best way to try it for the first time is to:
+
+1. Edit the [config](config/config.ini) file and select only 1 or 2 VM types.
+2. Place the request JSON file into the [requests](requests/) folder. Please see the existing [example](requests/request_movie.json) for reference.
+3. Run the utility as follows:
+
+```
+cd vertex-ai-endpoint-load-tester/
+gcloud auth login
+gcloud config set project PROJECT_ID
+python main.py
+```
+
+## Logging
+
+When run from the command line, all logs are printed to the console (STDOUT) for the user to validate; they are NOT stored anywhere else for historical reference.
+Hence we recommend installing this solution as a container and running it as a Cloud Run service or job (as applicable), so that all logs can be found in Cloud Logging.
+
+## Reporting/Analytics
+
+TODO: This is an open feature and will be added shortly.
+The idea is to use a Looker Studio dashboard to visualize the results of the load testing, so that they are easily consumable by anyone!
+
+## Troubleshooting
+
+1. Check that the user or service account running the job (on Cloud Run, for example) has the requisite IAM permissions.
+2. Ensure the [config](config/config.ini) file contains no typos or extraneous entries.
+3. Check the logs for any specific errors captured, to debug further.
+
+## Known Errors
+
+TODO
+
+## Roadmap
+
+In the future, this utility can be extended to LLMs or any other types of ML models.
+Further, the same approach can be extended to load test other GCP services, such as GKE, which are frequently used to deploy ML solutions.
+
+## Authors
+
+Ajit Sonawane - AI Engineer, Google Cloud
+Suddhasatwa Bhaumik - AI Engineer, Google Cloud
diff --git a/tools/vertex-ai-endpoint-load-tester/config/config.ini b/tools/vertex-ai-endpoint-load-tester/config/config.ini
new file mode 100644
index 0000000000..e5389483ab
--- /dev/null
+++ b/tools/vertex-ai-endpoint-load-tester/config/config.ini
@@ -0,0 +1,64 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Input configurations
+[config]
+
+# logging level
+log_level = INFO
+
+# deployed model ID
+MODEL_ID = 888526341522063360
+
+# the QPS rates to try
+RATE = [25, 50]
+
+# duration (in seconds) for which each test will be run
+DURATION = 10
+
+# BigQuery table to store results
+OUTPUT_BQ_TBL_ID = load_test_dataset.test9
+
+# project ID
+PROJECT = rare-signer-355918
+
+# region
+LOCATION = us-central1
+
+# amount of sleep time (in seconds) before
+# the endpoint is tested, after
+# the model is deployed
+TIMEOUT = 300
+
+# autoscaling details.
+MIN_NODES = 1
+MAX_NODES = 2
+
+# machine types to be used during testing;
+# needs to be a comma-separated list of VM types
+MACHINE_TYPES_LST = n1-standard-4,n1-standard-8
+
+# name of the request body file in the requests folder,
+# used for the POST calls to the endpoint under test.
+# Please do not enclose the file name in quotes.
+REQUEST_FILE = request_movie.json
+
+# Other machine types that can be listed in MACHINE_TYPES_LST, for example:
+# n1-standard-4, n1-standard-8, n1-standard-16, n1-standard-32, n1-standard-64,
+# n1-highmem-2, n1-highmem-4, n1-highmem-8, n1-highmem-16, n1-highmem-32,
+# n1-highcpu-2, n1-highcpu-4, n1-highcpu-8, n1-highcpu-16, n1-highcpu-32,
+# c3-standard-4, c3-standard-8, c3-standard-22, c3-standard-44, c3-standard-88, c3-standard-176
+
+# End.
\ No newline at end of file
diff --git a/tools/vertex-ai-endpoint-load-tester/extras/vegeta_12.8.4_linux_amd64.tar.gz b/tools/vertex-ai-endpoint-load-tester/extras/vegeta_12.8.4_linux_amd64.tar.gz
new file mode 100644
index 0000000000..4e9eba9046
Binary files /dev/null and b/tools/vertex-ai-endpoint-load-tester/extras/vegeta_12.8.4_linux_amd64.tar.gz differ
diff --git a/tools/vertex-ai-endpoint-load-tester/main.py b/tools/vertex-ai-endpoint-load-tester/main.py
new file mode 100644
index 0000000000..2232a0768f
--- /dev/null
+++ b/tools/vertex-ai-endpoint-load-tester/main.py
@@ -0,0 +1,211 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#
+# This script deploys a Vertex AI endpoint
+# and captures endpoint performance to BigQuery.
+#
+# Authors: ajitsonawane@,suddhasatwa@
+# Team: Google Cloud Consulting
+# Date: 25.01.2024
+
+# Imports
+import sys
+import logging
+import traceback
+import uuid
+import time
+import json
+from google.cloud import aiplatform
+
+from utils import utils
+
+# function to process requests to endpoint.
+def process(machine_type: str, latencies: list, log_level: str):
+    """
+    Deploys the model on the given machine type, creates an endpoint and measures latencies.
+    Takes the latencies list as input.
+    Calls the Vegeta utility to update latencies for each machine type.
+    Passes them to another utility to generate the full results.
+    Returns the results back.
+
+    Inputs:
+        machine_type: each type of machine to be tested.
+        latencies: list (usually empty) to collect results from Vegeta
+        log_level: level of logging.
+ + Outputs: + results: Combined results for each machine type. + """ + + # set logging setup + logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") + + # start logging. + logging.info("Reading configuration.") + + # read config. + config_data = utils.read_config("config/config.ini") + MODEL_ID = config_data["config"]["model_id"] # model ID + RATE = json.loads(config_data["config"]["rate"]) # the QPS rates to try + DURATION = str(config_data["config"]["duration"]) # duration for which tests will be ran + PROJECT = config_data["config"]["project"] # project ID + LOCATION = config_data["config"]["location"] # region + TIMEOUT = config_data["config"]["timeout"] # endpoint timeout + MIN_NODES = int(config_data["config"]["min_nodes"]) # min nodes for scaling + MAX_NODES = int(config_data["config"]["max_nodes"]) #max nodes for scaling + REQUEST_FILE = str(config_data["config"]["request_file"]) + + # deploy model on endpoint. + logging.info( + "Deploying endpoint on machine: %s for model: %s", machine_type, MODEL_ID) + try: + # create client for Vertex AI. + logging.info("Creating AI Platform object.") + aiplatform.init(project=PROJECT, location=LOCATION) + + # load the model from registry. + logging.info("Loading {} from Model registry.".format(MODEL_ID)) + model = aiplatform.Model(model_name=MODEL_ID) + + # generate random UUID + logging.info("Generating random UUID for endpoint creation.") + ep_uuid = uuid.uuid4().hex + display_name = f"ep_{machine_type}_{ep_uuid}" + + # create endpoint instance + logging.info("Creating endpoint instance.") + endpoint = aiplatform.Endpoint.create(display_name=display_name) + + # deploy endpoint on specific machine type + logging.info("Deploying model {} on endpoint {}".format(model, display_name)) + endpoint.deploy(model, min_replica_count=MIN_NODES, + max_replica_count=MAX_NODES, machine_type=machine_type) + + # Sleep for 5 minutes + # general best practice with Vertex AI Endpoints + logging.info("Sleeping for 5 minutes, for the endpoint to be ready!") + time.sleep(TIMEOUT) + + # Register latencies for predictions + logging.info("Calling utility to register the latencies.") + ret_code, latencies = utils.register_latencies(RATE, DURATION, endpoint, machine_type, endpoint.display_name, latencies, REQUEST_FILE, log_level) + if ret_code == 1: + logging.info("Latencies recorded for {}".format(machine_type)) + else: + logging.error("Error in recording latencies for {}".format(machine_type)) + sys.exit(1) + + # preprocess registered latencies + logging.info("Calling utility to prepare latencies for BigQuery.") + results = utils.log_latencies_to_bq(MODEL_ID, latencies, log_level) + if results: + logging.info("Latencies information processed successfully.") + else: + logging.error("Error in recording all latencies. Exiting.") + sys.exit(1) + + # Un-deploy endpoint + logging.info("Un-deploying endpoint: %s", endpoint.resource_name) + endpoint.undeploy_all() + + # Deleting endpoint + logging.info("Deleting endpoint: %s", endpoint.resource_name) + endpoint.delete() + + logging.info("Processing completed for machine: %s", machine_type) + + except Exception as ex: + logging.error(''.join(traceback.format_exception(etype=type(ex), + value=ex, tb=ex.__traceback__))) + + # return results. + return (results) + +# entrypoint function. +def main(): + """ Entrypoint """ + + # Read config. 
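+    # NOTE: utils.read_config (configparser) returns every value as a string,
+    # so numeric options such as duration, timeout, min_nodes and max_nodes
+    # must be cast to int before they are used in arithmetic or time.sleep().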
+ config_data = utils.read_config("config/config.ini") + MACHINE_TYPES_LST = config_data["config"]["machine_types_lst"].split(',') # List of machine types + LOG_LEVEL = config_data["config"]["log_level"] # level of logging. + OUTPUT_BQ_TBL_ID = config_data["config"]["output_bq_tbl_id"] # BigQuery table to store results + PROJECT = config_data["config"]["project"] # project ID + + # log setup. + logging.basicConfig(level=LOG_LEVEL, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") + + # start logging. + logging.info("Vertex Endpoint Stress Tester Utility.") + + # variables + logging.info("Prepping local variables.") + LATENCIES = [] + RESULTS = [] + + # record start time. + start = time.time() + + # loop through each machine type + # and process the records. + try: + for machine_type in MACHINE_TYPES_LST: + # log calling the utility + logging.info("Calling data processing utility.") + + # append the results from utility + RESULTS.extend(process(machine_type, LATENCIES, LOG_LEVEL)) + + # log end. + logging.info("Results utility completed.") + + # reset the latencies variable + LATENCIES = [] + except Exception as e: + # log error + logging.error("Got error while running load tests.") + logging.error(e) + # exit + sys.exit(1) + + # REMOVE + logging.info(len(LATENCIES)) + logging.info(len(RESULTS)) + + # write collected results to BigQuery + logging.info(" Writing data of load testing on machine type %s", machine_type) + bq_write_ret_code = utils.write_results_to_bq(RESULTS, OUTPUT_BQ_TBL_ID, PROJECT, LOG_LEVEL) + if bq_write_ret_code == 1: + # log success + logging.info("Successfully written data into BQ in {} table.".format(OUTPUT_BQ_TBL_ID)) + else: + # log error + logging.error("Errors in writing data into BigQuery. Exiting.") + # exit + sys.exit(1) + + # print the total time taken. + # this is for all machines. + logging.info(f"Total time taken for execution {time.time()-start}") + +# Call entrypoint +if __name__ == "__main__": + main() + +# End. diff --git a/tools/vertex-ai-endpoint-load-tester/requests/request_movie.json b/tools/vertex-ai-endpoint-load-tester/requests/request_movie.json new file mode 100644 index 0000000000..6b4f6ac992 --- /dev/null +++ b/tools/vertex-ai-endpoint-load-tester/requests/request_movie.json @@ -0,0 +1,20 @@ +{ + "instances": [ + { + "Id": 3837, + "name": "The", + "rating": "R", + "genre": "Comedy", + "year": 2000, + "released": "8/3/2001", + "director": "John", + "writer": "John", + "star": "Michael", + "country": "United", + "budget": 35524924.14, + "company": "Pictures", + "runtime": 104, + "data_cat": "TRAIN" + } + ] +} diff --git a/tools/vertex-ai-endpoint-load-tester/requirements.txt b/tools/vertex-ai-endpoint-load-tester/requirements.txt new file mode 100644 index 0000000000..bf6ccb651b --- /dev/null +++ b/tools/vertex-ai-endpoint-load-tester/requirements.txt @@ -0,0 +1,16 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
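+# Python dependencies of the load tester. Note that the actual load generation
+# is done by the bundled Vegeta binary under utils/ (invoked via
+# utils/vegeta_test.sh), which is not a pip-installable dependency.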
+google-cloud-aiplatform +pandas +pandas-gbq \ No newline at end of file diff --git a/tools/vertex-ai-endpoint-load-tester/utils/__init__.py b/tools/vertex-ai-endpoint-load-tester/utils/__init__.py new file mode 100644 index 0000000000..9f1deef125 --- /dev/null +++ b/tools/vertex-ai-endpoint-load-tester/utils/__init__.py @@ -0,0 +1 @@ +"""Utilities to run the Vertex AI Endpoint Stress Tester""" \ No newline at end of file diff --git a/tools/vertex-ai-endpoint-load-tester/utils/utils.py b/tools/vertex-ai-endpoint-load-tester/utils/utils.py new file mode 100644 index 0000000000..cc8a42e638 --- /dev/null +++ b/tools/vertex-ai-endpoint-load-tester/utils/utils.py @@ -0,0 +1,249 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# +# Set of utilities to +# assist running the +# load tester utility. +# +# Authors: ajitsonawane@,suddhasatwa@ +# Team: Google Cloud Consulting +# Date: 25.01.2024 +# + +# Imports +import logging +import datetime +import time +import os +import json +from subprocess import Popen, PIPE +from os.path import dirname, realpath +import configparser + +import pandas as pd + +# Utility function to store records +# in the provided BigQuery table +def write_results_to_bq(records, table, project, log_level): + """ + Writes results of load testing to BQ Table + in a specific format, like a dataframe. + + Input: + records: entries to be written in the target table + table: the table in BigQuery to write the results (format: dataset.table) + project: the GCP project where the table exists + log_level: level of logging + + Output: + ret_code: 1 for success, 0 for failure + + """ + + # set logging preferences + logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") + + # start log + logging.info("Writing output to BigQuery") + + # say how many results are being written + logging.info("Writing %d records in BigQuery table", len(records)) + + try: + # create dataframe with records + df = pd.json_normalize(records) + + + #setting the GOOGLE_CLOUD_PROJECT environment variable + os.environ['GOOGLE_CLOUD_PROJECT'] = str(project) + + + # write to BigQuery table + df.to_gbq(table, project_id=project, if_exists='append') + + # set return code + ret_code = 1 + + except Exception as e: + + # print error log + logging.error("Error while writing to BigQuery table") + logging.error(e) + + # set return code + ret_code = 0 + + # return the code to caller + return(ret_code) + +# Utility function to log endpoint latencies +# and return the dictionary +def log_latencies_to_bq(model_id, latencies, log_level): + """ + Preparing registered latencies to add to the BQ Table + + Input: + latencies: empty/existing dictionary + + Output: + updated_latency: updated list with records. 
+ """ + + # set logging preferences + logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") + + # start log + logging.info("Logging latencies for BQ insetions") + + # create result dict + RESULT = [] + + # for each value from input list + for record in latencies: + temp = {} + temp['min_latency_ms'] = record["latencies"]["min"] / 1000000 + temp['p50_latency_ms'] = record["latencies"]["50th"] / 1000000 + temp['p90_latency_ms'] = record["latencies"]["90th"] / 1000000 + temp['p95_latency_ms'] = record["latencies"]["95th"] / 1000000 + temp['p99_latency_ms'] = record["latencies"]["99th"] / 1000000 + temp['max_latency_ms'] = record["latencies"]["max"] / 1000000 + temp['mean_latency_ms'] = record["latencies"]["mean"] / 1000000 + temp['model_id'] = model_id + temp["duration_secs"] = record["duration"] + temp["qps"] = record["qps"] + temp["status_codes"] = str(record['status_codes']) + temp["errors"] = str(record['errors']) + temp["ep_display_name"] = record['ep_display_name'] + temp["record_ts"] = datetime.datetime.now() + RESULT.append(temp) + + # return list + return (RESULT) + +# Utility function to run the Vegeta Tool +# and record the observed latencies +def register_latencies(rate, duration, endpoint, machine_type, ep_display_name, latencies, request_file, log_level): + """ + Register latencies of the given endpoint + + Input: + rate: the QPS rates to be tried + duration: the duration of each load test for each QPS Rate + endpoint: the Vertex AI Endpoint ID + machine_type: type of VM of the endpoint + ep_display_name: endpoint display name, for logging + latencies: blank/existing set of latencies + log_level: level of logging + + Output: + ret_code: 1 for success, 0 for errors + latencies: list of latencies captured during the tests + + """ + + # log configuration + logging.basicConfig(level=log_level, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") + + # start log. + logging.info("Registering latencies for endpoint %s", endpoint.resource_name) + + # getting endpoint ID value + endpoint_id = endpoint.resource_name.split("/")[-1] + project_number = endpoint.resource_name.split("/")[1] + + #request file path + request_filepath = f"{dirname(dirname(realpath(__file__)))}/requests/{request_file}" + # send load test requests + for qps in rate: + + # get teh QPS value + qps = str(qps) + + # run a SHELL process + shell_process = Popen(['sh', 'utils/vegeta_test.sh', endpoint_id, duration, qps, project_number, request_filepath], stdout=PIPE, stderr=PIPE) + + # interim log. 
+ logging.info("Executing Vegata for endpoint: %s with QPS: %s", endpoint_id, qps) + + # start the shell process + stdout, stderr = shell_process.communicate() + + # timeout [optional] + time.sleep(100) + + # handle errors + if not stderr: + + # interim log + logging.info("Shell script execution completed") + + # store results + vegeta_result = json.loads(str(stdout, 'UTF-8')) + vegeta_result["duration"] = duration + vegeta_result["qps"] = qps + vegeta_result['ep_display_name'] = ep_display_name + vegeta_result['timestamp'] = datetime.datetime.now() + vegeta_result['machine_type'] = machine_type + + # interim: log results: optional + logging.info(vegeta_result) + + # append final results + latencies.append(vegeta_result) + ret_code = 1 + + else: + # display error + logging.error("Error while running Vegata load tests") + logging.error(stderr) + + # set return code value + ret_code = 0 + + # return code + list of latencies + return (ret_code, latencies) + +# utility function to read config file. +def read_config(config_file): + """Reads configuration data from a specified file. + + Args: + config_file (str): Path to the configuration file. + + Returns: + dict: A dictionary containing the configuration data. + """ + try: + # create object of config parser + # and read the data file + config = configparser.ConfigParser() + config.read(config_file) + + # create empty dict to store config. + data = {} + + # read/store all config + for section in config.sections(): + data[section] = {} + for option in config.options(section): + data[section][option] = config.get(section, option) + + # finally return the dict. + return data + + except Exception as e: + raise(e) + +# End. diff --git a/tools/vertex-ai-endpoint-load-tester/utils/vegeta b/tools/vertex-ai-endpoint-load-tester/utils/vegeta new file mode 100755 index 0000000000..2f95573cb0 Binary files /dev/null and b/tools/vertex-ai-endpoint-load-tester/utils/vegeta differ diff --git a/tools/vertex-ai-endpoint-load-tester/utils/vegeta_test.sh b/tools/vertex-ai-endpoint-load-tester/utils/vegeta_test.sh new file mode 100755 index 0000000000..1ae3f7671d --- /dev/null +++ b/tools/vertex-ai-endpoint-load-tester/utils/vegeta_test.sh @@ -0,0 +1,29 @@ +#!/bin/bash +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# declare input/other variables +ENDPOINT_ID=$1 +PROJECT_ID=$4 +REQUEST_FILE=$5 +DURATION=$2 +RATE=$3 +CURR_DIR="$(dirname "$0")" +TOKEN="$(gcloud auth application-default print-access-token)" +URL="https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict?access_token=${TOKEN}" + +# run vegeta tool for load test +echo "POST ${URL}" | "$CURR_DIR"/vegeta attack -header "Content-Type: application/json" -body "${REQUEST_FILE}" -duration="${DURATION}"s -rate="${RATE}" | "$CURR_DIR"/vegeta report -type=json + +# END
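+
+# Example invocation (a sketch with placeholder values; main.py normally supplies
+# these positional arguments as ENDPOINT_ID, DURATION, RATE, PROJECT, REQUEST_FILE):
+#   sh utils/vegeta_test.sh <ENDPOINT_ID> 10 25 <PROJECT_NUMBER> requests/request_movie.json
+# Note: the region is currently hard-coded to us-central1 in the URL above; adjust
+# it if the endpoint is deployed in a different location.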