Ccozi/temp logging fix (#2249)
* Benchmarking january results. (#2189)

* Benchmarking january results.

* Update to add MFE job definition files.

* Fix phi-2 paths.

* Update phi-2 model directory.

* Fix boolq phi-2 results path.

---------

Co-authored-by: Alex Kalita <[email protected]>

* Model card updated for whisper large (#2202)

* fix credential-less blob check (#2188)

* fix credential-less blob check

* add spec_version_upgrader

* update component versions

* add header and doc string.

* add more UT for spec version upgrader

* remove trailing whitespace

* add missing param.

* add null check for client_secret for adlsgen2 datastore

---------

Co-authored-by: Richard Li <[email protected]>

* upgrading the environment to latest pkgs (#2204)

* removing NC series from computes allow list (#2211)

* updating model specific defaults and finetune config for mistral model (#2209)

* add rai qa quality and safety eval flow (#2208)

* add rai qa quality and safety eval flow

* add test_config for rai qa quality & safety flow

* Check if secrets exist (#2217)

* Check if secrets exist

* update

* Update

* add batch allowlist for mistral base model (#2201)

* add batch allowlist for mistral base model

* format

* Fix olive-optimizer vul Jan new (#2200)

* Vulnerability fixes for python-sdk-v2 and model-management environment (#2216)

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* sdk v2

* new acpt env for torch2.1 and cuda12.1 (#2186)

* new env for cuda12.1

* updated

* update rai qa safety flow output format (#2226)

* update rai qa safety flow output format

* update rai qa quality & safety flow output format

* bump up component version and use azureml-rag 0.2.24.2 in environment (#2225)

* Update DBCopilot version (#2220)

* Preprocessor custom script fix (#2219)

* Replaced os.system with subprocess.check_output in dataset_preprocessor method that is used to run custom script.

* Replaced os.system with subprocess.check_output in dataset_preprocessor method that is used to run custom script.
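The switch described above replaces `os.system` (which discards the child process's output and returns only an exit status) with `subprocess.check_output`, so the custom script's stdout/stderr can be surfaced in the error message when it fails. A minimal sketch of the pattern; the helper name and `RuntimeError` wrapper are illustrative, not the repository's own:

```python
import subprocess
import sys


def run_custom_script(cmd: list) -> str:
    """Run a preprocessing command, capturing combined stdout/stderr.

    On a non-zero exit, CalledProcessError.output holds everything the
    script printed, which os.system would have thrown away.
    """
    try:
        return subprocess.check_output(
            cmd, stderr=subprocess.STDOUT, universal_newlines=True
        )
    except subprocess.CalledProcessError as e:
        raise RuntimeError(e.output.strip()) from e


print(run_custom_script([sys.executable, "-c", "print('preprocessed')"]).strip())
```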

* Fix llama-2-7b results for truthful-qa (#2229)

* stable diffusion XL base model support (#2233)

* basexl update

* wrapper updates

* format update

* Make sure we recover details

* Upgrade AML Benchmark components (#2236)

Co-authored-by: Sarthak Singhal <[email protected]>

* add gsq e2e test (#2231)

* Update inputs (#2239)

* Remove acs stuff in faiss pipeline (#2240)

* Add two promptflow models: count-cars and detect-defects (#2070)

* Add two promptflow models: count-cars and detect-defects

* Add ci test configs for count-cars and detect-defects

* Put "connection" into "inputs" for Azure OpenAI GPT-4 Turbo with Vision tool

---------

Co-authored-by: Zhi Zhou <[email protected]>

* Update DBCopilot promptflow (#2242)

* SystemLog: prefix logging

* Adding more detailed logging

* Ccozianu/rm bug fix (#2247)

* add more logs

* fix stdout logs

* Make sure we recover details

* Make sure we recover details (#2238)

* SystemLog: prefix logging

* Ccozi/temp logging fix (#2246)

* Make sure we recover details

* SystemLog: prefix logging

* Adding more detailed logging

---------

Co-authored-by: svaruag <[email protected]>

* Fixing typo

---------

Co-authored-by: arun-rajora <[email protected]>
Co-authored-by: Alex Kalita <[email protected]>
Co-authored-by: HrishikeshGeedMS <[email protected]>
Co-authored-by: Richard Li <[email protected]>
Co-authored-by: Richard Li <[email protected]>
Co-authored-by: pmanoj <[email protected]>
Co-authored-by: qusongms <[email protected]>
Co-authored-by: Ayush Mishra <[email protected]>
Co-authored-by: ym11369 <[email protected]>
Co-authored-by: savitamittal1 <[email protected]>
Co-authored-by: jingyizhu99 <[email protected]>
Co-authored-by: XiangRao <[email protected]>
Co-authored-by: Nivedita Mishra <[email protected]>
Co-authored-by: Ramu Vadthyavath <[email protected]>
Co-authored-by: sarthaks95 <[email protected]>
Co-authored-by: Sarthak Singhal <[email protected]>
Co-authored-by: Ilya Matiach <[email protected]>
Co-authored-by: jinzhaochang <[email protected]>
Co-authored-by: Zhi Zhou <[email protected]>
Co-authored-by: Zhi Zhou <[email protected]>
Co-authored-by: svaruag <[email protected]>
22 people authored Feb 2, 2024
1 parent 20ffbd7 commit 2d747f0
Showing 503 changed files with 34,373 additions and 702 deletions.
14 changes: 10 additions & 4 deletions .github/workflows/assets-validation.yaml
@@ -54,12 +54,18 @@ jobs:
         python-version: '>=3.8'
 
     - name: Log in to Azure
+      env:
+        # to use in condition
+        client_id: ${{ secrets.AZURE_CLIENT_ID }}
+        tenant_id: ${{ secrets.AZURE_TENANT_ID }}
+        subscription_id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
+      if: env.client_id != '' && env.tenant_id != ''
       uses: azure/login@v1
       with:
-        client-id: ${{ secrets.AZURE_CLIENT_ID }}
-        tenant-id: ${{ secrets.AZURE_TENANT_ID }}
-        subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
+        client-id: ${{ env.client_id }}
+        tenant-id: ${{ env.tenant_id }}
+        subscription-id: ${{ env.subscription_id }}
 
     - name: Install dependencies
       run: pip install -e $scripts_azureml_assets_dir
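The workflow hunk above copies the secrets into step-level environment variables and gates the Azure login step on them being non-empty, so the step is skipped on runs (such as forked-PR builds) that receive empty secrets. The guard itself is trivial and can be sketched with an illustrative helper:

```python
def should_attempt_login(client_id: str, tenant_id: str) -> bool:
    """Mirror the workflow guard: env.client_id != '' && env.tenant_id != ''."""
    return client_id != "" and tenant_id != ""


# Forked-PR runs typically receive empty secrets, so login is skipped there.
print(should_attempt_login("11111111-2222", "aaaa-bbbb"))  # True when both set
print(should_attempt_login("", "aaaa-bbbb"))               # False: no client id
```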
@@ -4,7 +4,7 @@ type: pipeline
 name: batch_benchmark_inference
 display_name: Batch Benchmark Inference
 description: Components for batch endpoint inference
-version: 0.0.4
+version: 0.0.5
 
 inputs:
   input_dataset:
@@ -149,7 +149,7 @@ jobs:
   # Preparer
   batch_inference_preparer:
     type: command
-    component: azureml:batch_inference_preparer:0.0.5
+    component: azureml:batch_inference_preparer:0.0.6
     inputs:
       input_dataset: ${{parent.inputs.input_dataset}}
       model_type: ${{parent.inputs.model_type}}
@@ -167,7 +167,7 @@
   # Inference
   endpoint_batch_score:
     type: parallel
-    component: azureml:batch_benchmark_score:0.0.4
+    component: azureml:batch_benchmark_score:0.0.5
     inputs:
       model_type: ${{parent.inputs.model_type}}
       online_endpoint_url: ${{parent.inputs.endpoint_url}}
@@ -199,7 +199,7 @@ jobs:
   # Reformat
   batch_output_formatter:
     type: command
-    component: azureml:batch_output_formatter:0.0.5
+    component: azureml:batch_output_formatter:0.0.6
     inputs:
       model_type: ${{parent.inputs.model_type}}
       batch_inference_output: ${{parent.jobs.endpoint_batch_score.outputs.mini_batch_results_out_directory}}
@@ -1,6 +1,6 @@
 $schema: http://azureml/sdk-2-0/ParallelComponent.json
 name: batch_benchmark_score
-version: 0.0.4
+version: 0.0.5
 display_name: Batch Benchmark Score
 is_deterministic: False
 type: parallel
@@ -4,7 +4,7 @@ type: command
 name: batch_inference_preparer
 display_name: Batch Inference Preparer
 description: Prepare the jsonl file and endpoint for batch inference component.
-version: 0.0.5
+version: 0.0.6
 
 inputs:
   input_dataset:
@@ -1,5 +1,5 @@
 name: batch_output_formatter
-version: 0.0.5
+version: 0.0.6
 display_name: Batch Output Formatter
 is_deterministic: True
 type: command
@@ -4,7 +4,7 @@ type: pipeline
 name: batch_benchmark_inference_claude
 display_name: Batch Benchmark Inference with claude support
 description: Components for batch endpoint inference
-version: 0.0.1
+version: 0.0.2
 
 inputs:
   input_dataset:
@@ -151,7 +151,7 @@ jobs:
   # Preparer
   batch_inference_preparer:
     type: command
-    component: azureml:batch_inference_preparer:0.0.4
+    component: azureml:batch_inference_preparer:0.0.6
     inputs:
       input_dataset: ${{parent.inputs.input_dataset}}
       model_type: ${{parent.inputs.model_type}}
@@ -168,7 +168,7 @@
   # Inference
   endpoint_batch_score:
     type: parallel
-    component: azureml:batch_benchmark_score:0.0.4
+    component: azureml:batch_benchmark_score:0.0.5
     inputs:
       model_type: ${{parent.inputs.model_type}}
       online_endpoint_url: ${{parent.inputs.endpoint_url}}
@@ -199,7 +199,7 @@ jobs:
   # Reformat
   batch_output_formatter:
     type: command
-    component: azureml:batch_output_formatter:0.0.4
+    component: azureml:batch_output_formatter:0.0.6
     inputs:
       model_type: ${{parent.inputs.model_type}}
       batch_inference_output: ${{parent.jobs.endpoint_batch_score.outputs.mini_batch_results_out_directory}}
@@ -4,7 +4,7 @@ type: command
 name: benchmark_result_aggregator
 display_name: Benchmark result aggregator
 description: Aggregate quality metrics, performance metrics and all of the metadata from the pipeline. Also add them to the root run.
-version: 0.0.3
+version: 0.0.4
 is_deterministic: false
 
 inputs:
@@ -4,7 +4,7 @@ type: command
 name: compute_performance_metrics
 display_name: Compute Performance Metrics
 description: Performs performance metric post processing using data from a model inference run.
-version: 0.0.1
+version: 0.0.2
 is_deterministic: true
 
 inputs:
@@ -4,7 +4,7 @@ type: command
 name: dataset_downloader
 display_name: Dataset Downloader
 description: Downloads the dataset onto blob store.
-version: 0.0.1
+version: 0.0.2
 
 inputs:
   dataset_name:
@@ -4,7 +4,7 @@ type: command
 name: dataset_preprocessor
 display_name: Dataset Preprocessor
 description: Dataset Preprocessor
-version: 0.0.1
+version: 0.0.2
 is_deterministic: true
 
 inputs:
2 changes: 1 addition & 1 deletion assets/aml-benchmark/components/dataset-sampler/spec.yaml
@@ -4,7 +4,7 @@ type: command
 name: dataset_sampler
 display_name: Dataset Sampler
 description: Samples a dataset containing JSONL file(s).
-version: 0.0.1
+version: 0.0.2
 
 inputs:
   dataset:
@@ -4,7 +4,7 @@ type: command
 name: inference_postprocessor
 display_name: Inference Postprocessor
 description: Inference Postprocessor
-version: 0.0.2
+version: 0.0.3
 is_deterministic: true
 
 inputs:
2 changes: 1 addition & 1 deletion assets/aml-benchmark/components/prompt_crafter/spec.yaml
@@ -6,7 +6,7 @@ display_name: Prompt Crafter
 description: This component is used to create prompts from a given dataset. From a
   given jinja prompt template, it will generate prompts. It can also create
   few-shot prompts given a few-shot dataset and the number of shots.
-version: 0.0.4
+version: 0.0.5
 is_deterministic: true
 
 inputs:
@@ -445,7 +445,7 @@ def main(
             AzureMLError.create(
                 BenchmarkUserError,
                 error_details=f"{retries_err_msg} Details: {BufferStore.get_all_data()}"
-            )
+            ))
     elif delete_managed_deployment:
         if not deployment_metadata:
             logger.info("Delete deployment using input parameters.")
@@ -6,9 +6,9 @@
 """DataPreprocessor class and runner."""
 
 import json
-import os
 import re
 import jinja2
+import subprocess
 
 from azureml._common._error_definition.azureml_error import AzureMLError
 from aml_benchmark.utils.exceptions import BenchmarkValidationException, BenchmarkUserException
@@ -144,13 +144,17 @@ def run(self) -> None:
         return
 
     def run_user_preprocessor(self) -> None:
-        """Prerpocessor run using custom template."""
+        """Preprocessor run using custom script."""
         try:
-            os.system(
-                f'python {self.user_preprocessor} --input_path {self.input_dataset} \
-                    --output_path {self.output_dataset}'
+            _ = subprocess.check_output(
+                f"python {self.user_preprocessor} --input_path {self.input_dataset} \
+                    --output_path {self.output_dataset}",
+                stderr=subprocess.STDOUT,
+                universal_newlines=True,
+                shell=True,
             )
-        except Exception as e:
+        except subprocess.CalledProcessError as e:
+            error_message = e.output.strip()
             raise BenchmarkUserException._with_error(
-                AzureMLError.create(BenchmarkUserError, error_details=e)
+                AzureMLError.create(BenchmarkUserError, error_details=error_message)
             )
@@ -142,6 +142,7 @@ def create_deployment(self):
         payload['properties']["versionUpgradeOption"] = "OnceNewDefaultVersionAvailable"
         payload['properties']["raiPolicyName"] = "Microsoft.Default"
         resp = self._call_endpoint(get_requests_session().put, self._aoai_deployment_url, payload=payload)
+        logger.info(f"Calling(PUT) {self._aoai_deployment_url} returned {resp.status_code} with content {resp.content}.")
         self._raise_if_not_success(resp)
         logger.info("Calling(PUT) {} returned {} with content {}.".format(
             self._aoai_deployment_url, resp.status_code, self._get_content_from_response(resp)))
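The logging change above records the deployment PUT response before the success check, so a failing call still leaves its status code and body in the logs (the pre-existing log line runs only after the check passes). The ordering can be sketched with stand-in names for the client and error type:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("deploy")


class DeploymentError(Exception):
    pass


def put_deployment(call, url, payload):
    """Log the response before checking it, so failures are never silent."""
    resp = call(url, payload)
    logger.info("Calling(PUT) %s returned %s with content %s.",
                url, resp["status"], resp["content"])
    if resp["status"] >= 400:  # stand-in for _raise_if_not_success
        raise DeploymentError(f"PUT {url} failed: {resp['status']}")
    return resp
```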
@@ -77,7 +77,7 @@ def model_version(self) -> str:
         finetuned_run = get_dependent_run(self.model_depend_step)
         ws = Run.get_context().experiment.workspace
         finetuned_run_id = self._get_model_registered_run_id(finetuned_run)
-        logger.info(f"Finetuned run id is {finetuned_run_id}")
+        logger.info(f"Searching for model in worskpace {ws} run_id={finetuned_run_id} is {self._model_name}")
         models = list(Model.list(ws, self._model_name, run_id=finetuned_run_id))
         if len(models) == 0:
             raise BenchmarkUserException._with_error(
3 changes: 2 additions & 1 deletion assets/common/environments/python-sdk-v2/context/Dockerfile
@@ -1,6 +1,6 @@
 FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:{{latest-image-tag}}
 
-RUN apt-get update -y && apt-get install libc-bin libc-dev-bin libc6 libc6-dev libcurl3-gnutls tar openssh-server openssh-client openssl curl -y
+RUN apt-get update -y && apt-get install binutils libssh-4 libsqlite3-0 libpam-modules linux-libc-dev libldap-common libldap-2.4-2 libc-bin libc-dev-bin libc6 libc6-dev libcurl3-gnutls libgnutls30 tar openssh-server openssh-client openssl curl -y
 
 WORKDIR /
 
@@ -17,3 +17,4 @@ RUN conda env create -p $CONDA_PREFIX -f conda_dependencies.yaml -q && \
     rm conda_dependencies.yaml && \
     conda run -p $CONDA_PREFIX pip cache purge && \
     conda clean -a -y
+
@@ -3,7 +3,7 @@ channels:
   - conda-forge
 dependencies:
   - python=3.8
-  - pip=21.2.4
+  - pip=23.3
   - pip:
     - azure-ai-ml=={{latest-pypi-version}}
     - azure-identity=={{latest-pypi-version}}
@@ -13,4 +13,5 @@ dependencies:
     - azureml-telemetry=={{latest-pypi-version}}
     - cryptography=={{latest-pypi-version}}
     - certifi=={{latest-pypi-version}}
-    - urllib3=={{latest-pypi-version}}
+    - urllib3=={{latest-pypi-version}}
+    - paramiko=={{latest-pypi-version}}
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_35_turbo_0301_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_35_turbo_0301_question_answering
 description: gpt-35-turbo-0301 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_35_turbo_0613_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_35_turbo_0613_question_answering
 description: gpt-35-turbo-0613 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_4_0314_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_4_0314_question_answering
 description: gpt-4-0314 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_4_0613_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_4_0613_question_answering
 description: gpt-4-0613 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_4_32k_0314_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_4_32k_0314_question_answering
 description: gpt-4-32k-0314 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_gpt_4_32k_0613_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_gpt_4_32k_0613_question_answering
 description: gpt-4-32k-0613 run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_13b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_13b_chat_question_answering
 description: llama-2-13b-chat run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_13b_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_13b_question_answering
 description: llama-2-13b run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_70b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_70b_chat_question_answering
 description: llama-2-70b-chat run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_70b_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_70b_question_answering
 description: llama-2-70b run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_7b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_7b_chat_question_answering
 description: llama-2-7b-chat run for boolq dataset
 dataset_family: boolq
@@ -1,6 +1,6 @@
 type: evaluationresult
 name: boolq_llama_2_7b_question_answering
-version: 1.0.1
+version: 1.0.2
 display_name: boolq_llama_2_7b_question_answering
 description: llama-2-7b run for boolq dataset
 dataset_family: boolq
@@ -0,0 +1,3 @@
+type: evaluationresult
+spec: spec.yaml
+categories: ["EvaluationResult"]
