Commit cb3cd43 (0 parents): showing 618 changed files with 899,900 additions and 0 deletions.
@@ -0,0 +1,4 @@
__pycache__/
*.py[cod]
*$py.class
*.pt
@@ -0,0 +1,60 @@
# Description

This repo contains a set of practice inference graphs implemented using the [Seldon Core inference graph](https://docs.seldon.io/projects/seldon-core/en/latest/graph/inference-graph.html). Inference graphs in the [seldon](./seldon/) folder are implemented using the [Seldon 1st-gen custom Python wrapper](https://docs.seldon.io/projects/seldon-core/en/latest/python/python_wrapping_docker.html), and pipelines in the [mlserver](./mlserver/) folder are implemented as [custom models](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html) on [MLServer](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html), Seldon's newer serving platform, combined with the Seldon inference graph.

NOTE: This repo is shared for learning purposes; some of the pipelines implemented here might not have real-world use cases, and they are not fully tested.

Pull requests, suggestions, and additions to the list of pipelines for future implementation are highly appreciated.
## Inference graphs implemented using 1st gen Seldon

Pipelines from [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285):

1. Cascade
2. Ensemble
3. Preprocess
4. Video Monitoring



and the following pipelines:



5. audio-qa: Audio to text -> Question Answering
6. audio-sent: Audio to text -> Sentiment Analysis
7. nlp: language identification -> translate French to English -> summarisation
8. sum-qa: Summarisation -> Question Answering
9. video: Object Detection -> Object Classification

Each node in these graphs follows the 1st-gen Seldon Python wrapper interface; see the sketch after this list.
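A minimal sketch of one such node, assuming the standard wrapper interface from the Seldon docs (the class name, model choice, and task here are illustrative, not the exact code in this repo):

```python
# Illustrative 1st-gen Seldon node: a plain Python class that the
# Seldon Python server loads after docker-wrapping (e.g. via
# `seldon-core-microservice Node`).
from transformers import pipeline

class Node:
    def __init__(self):
        # Assumption: each node loads one Hugging Face pipeline at startup.
        self.model = pipeline(
            "automatic-speech-recognition",
            model="facebook/s2t-small-librispeech-asr")

    def predict(self, X, features_names=None):
        # Seldon calls predict() with the request payload; the return value
        # becomes this node's response (and the next node's input).
        return self.model(X)
```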
## Inference graphs implemented using MLServer

1. audio-qa: Audio to text -> Question Answering
2. audio-sent: Audio to text -> Sentiment Analysis
3. nlp: language identification -> translate French to English -> summarisation
4. sum-qa: Summarisation -> Question Answering
5. video: Object Detection -> Object Classification

On this platform each node is an MLServer custom model; a sketch follows this list.
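A minimal sketch of an MLServer node, assuming the custom-model interface from the MLServer docs (the codec choices mirror the client script later in this commit; the class and model names are illustrative):

```python
import json

from mlserver import MLModel
from mlserver.codecs import NumpyRequestCodec
from mlserver.codecs.string import StringRequestCodec
from mlserver.types import InferenceRequest, InferenceResponse
from transformers import pipeline

class AudioQANode(MLModel):
    async def load(self) -> bool:
        # Assumption: one Hugging Face pipeline per node, loaded once.
        self.model = pipeline(
            "automatic-speech-recognition",
            model="facebook/s2t-small-librispeech-asr")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the V2 request into a numpy array, run the model, and
        # return the result encoded as a JSON string.
        X = NumpyRequestCodec.decode_request(payload)
        output = self.model(X)
        return StringRequestCodec.encode_response(
            self.name, [json.dumps(output)])
```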
## DockerHub

Pre-built container images are also available [here](https://hub.docker.com/u/sdghafouri). Therefore, if you are just trying the pipelines out, you can deploy the YAML files in this repo on your Kubernetes cluster as they are (e.g. with `kubectl apply -f <pipeline>.yaml`).
## Sources of Pipelines for Future Implementation

### Academic Papers
1. [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285)
2. [GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks](https://jeongseob.github.io/papers/kannan_eurosys19.pdf)
3. [FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees](https://ieeexplore.ieee.org/document/9804606)
4. [Rim: Offloading Inference to the Edge](https://dl.acm.org/doi/pdf/10.1145/3450268.3453521)
5. [Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines](https://web.stanford.edu/~faromero/llama.pdf)
6. [Scrooge: A Cost-Effective Deep Learning Inference System](https://dl.acm.org/doi/10.1145/3472883.3486993)
7. [Nexus: a GPU cluster engine for accelerating DNN-based video analysis](https://homes.cs.washington.edu/~arvind/papers/nexus.pdf)
8. [VideoEdge: Processing Camera Streams using Hierarchical Clusters](https://ieeexplore.ieee.org/document/8567661)
9. [Live Video Analytics at Scale with Approximation and Delay-Tolerance](https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang)
10. [Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing](https://www.usenix.org/conference/atc22/presentation/choi-seungbeom)

### Industrial Projects
1. [Clarifai Workflows](https://clarifai.com/clarifai/main/workflows)
2. [Facebook DLRM](https://github.com/facebookresearch/dlrm)

Please give a star if this repo helped you learn something new :)
@@ -0,0 +1,24 @@
# Source
Adapted from [clarifai-asr-sentiment](https://clarifai.com/clarifai/main/workflows/asr-sentiment)

A two-node pipeline.

List of available models per node:

Automatic speech recognition (node 1):
https://huggingface.co/models?pipeline_tag=automatic-speech-recognition

- facebook/s2t-small-librispeech-asr
- facebook/s2t-medium-librispeech-asr
- facebook/s2t-large-librispeech-asr
- facebook/wav2vec2-base-960h
- facebook/wav2vec2-large-960h

Question answering (node 2):
https://huggingface.co/models?pipeline_tag=question-answering

- deepset/roberta-base-squad2
- deepset/xlm-roberta-large-squad2
- distilbert-base-cased-distilled-squad
- deepset/xlm-roberta-base-squad2
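For a quick local check of this two-node chain, a minimal sketch using Hugging Face `transformers` (the model picks are examples from the lists above and the question is illustrative; the deployed pipeline wires the same two stages through the inference graph):

```python
from datasets import load_dataset
from transformers import pipeline

# Node 1: automatic speech recognition
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/s2t-small-librispeech-asr")

# Node 2: question answering
qa = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2")

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
audio = ds[0]["audio"]["array"]

# Chain the nodes: transcribe, then answer a question over the transcript.
transcript = asr(audio)["text"]
answer = qa(question="Who is the apostle of the middle classes?",
            context=transcript)
print(transcript)
print(answer)
```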
Empty file.
@@ -0,0 +1,225 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Audio Node" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
" from .autonotebook import tqdm as notebook_tqdm\n", | ||
"2022-10-08 18:21:50.411390: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n", | ||
"2022-10-08 18:21:50.411414: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n", | ||
"WARNING:datasets.builder:Reusing dataset librispeech_asr_demo (/home/cc/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr_demo/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)\n", | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/models/speech_to_text/modeling_speech_to_text.py:561: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').\n", | ||
" input_lengths = (input_lengths - 1) // 2 + 1\n", | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/generation_utils.py:1202: UserWarning: Neither `max_length` nor `max_new_tokens` have been set, `max_length` will default to 200 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n", | ||
" warnings.warn(\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[{'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'}]" | ||
] | ||
}, | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from transformers import pipeline\n", | ||
"from datasets import load_dataset\n", | ||
"import numpy as np\n", | ||
"\n", | ||
"ds = load_dataset(\n", | ||
" \"hf-internal-testing/librispeech_asr_demo\",\n", | ||
" \"clean\",\n", | ||
" split=\"validation\")\n", | ||
"\n", | ||
"input_data = ds[0][\"audio\"][\"array\"]\n", | ||
"batch = []\n", | ||
"for i in range(5):\n", | ||
" input_data = ds[i][\"audio\"][\"array\"]\n", | ||
" batch.append(input_data)\n", | ||
"# batch_test = np.vstack((input_data, input_data)) \n", | ||
"# for i in range(5):\n", | ||
"# batch_test = np.vstack((batch_test, input_data))\n", | ||
"\n", | ||
"task = \"automatic-speech-recognition\"\n", | ||
"model_name = 'facebook/s2t-small-librispeech-asr'\n", | ||
"batch_size = 5\n", | ||
"\n", | ||
"model = pipeline(\n", | ||
" task=task,\n", | ||
" model=model_name,\n", | ||
" batch_size=batch_size)\n", | ||
"\n", | ||
"res = model(batch)\n", | ||
"res" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(93680,)" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"batch[0].shape" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# QA Node" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 15, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight']\n", | ||
"- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", | ||
"- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", | ||
"Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']\n", | ||
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"{'score': 0.04677097871899605,\n", | ||
" 'start': 4,\n", | ||
" 'end': 20,\n", | ||
" 'answer': 'model conversion'}" | ||
] | ||
}, | ||
"execution_count": 15, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from transformers.pipelines import pipeline\n", | ||
"# from transformers.modeling_auto import AutoModelForQuestionAnswering\n", | ||
"# from transformers.tokenization_auto import AutoTokenizer\n", | ||
"\n", | ||
"model_name = \"distilbert-base-uncased\"\n", | ||
"\n", | ||
"# a) Get predictions\n", | ||
"nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\n", | ||
"QA_input = {\n", | ||
" 'question': 'Why is model conversion important?',\n", | ||
" 'context': 'nlp model conversion'\n", | ||
"}\n", | ||
"res = nlp(QA_input)\n", | ||
"res" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"a = np.vstack((input_data, input_data))\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"a = np.vstack((input_data, input_data)) \n", | ||
"for i in range(5):\n", | ||
" a = np.vstack((a, input_data))\n", | ||
" \n", | ||
" " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(7, 93680)" | ||
] | ||
}, | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"a.shape" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"interpreter": { | ||
"hash": "2465c4f56298bc06dbdad3e7519856d346ec0e9edf6ba2c905f0af711583810e" | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3.8.13 ('central')", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
63 changes: 63 additions & 0 deletions
mlserver/1-paper-audio-qa/seldon-core-version/client-async.py
@@ -0,0 +1,63 @@
import json
import threading
from pprint import PrettyPrinter

import requests
from datasets import load_dataset
from mlserver.codecs.string import StringRequestCodec
from mlserver.types import InferenceResponse

pp = PrettyPrinter(indent=4)

# Single-node InferLine-style pipeline exposed through the Seldon gateway.
gateway_endpoint = "localhost:32000"
deployment_name = 'audio-qa'
namespace = "default"
endpoint = f"http://{gateway_endpoint}/seldon/{namespace}/{deployment_name}/v2/models/infer"

batch_test = 100
responses = []

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo",
    "clean",
    split="validation")

input_data = ds[0]["audio"]["array"]

def send_requests():
    # Build a V2 inference request carrying the raw audio samples.
    payload = {
        "inputs": [
            {
                "name": "array_inputs",
                "shape": [1, len(input_data)],
                "datatype": "FP32",
                "data": input_data.tolist(),
                "parameters": {
                    "content_type": "np"
                }
            }
        ]
    }
    response = requests.post(endpoint, json=payload)
    responses.append(response)
    return response

# Fire all requests concurrently, one thread per request.
thread_pool = []

for i in range(batch_test):
    t = threading.Thread(target=send_requests)
    t.start()
    thread_pool.append(t)

for t in thread_pool:
    t.join()

# Parse each raw HTTP response into an MLServer InferenceResponse,
# decode the string payload, then load the JSON the pipeline returned.
inference_responses = list(map(
    lambda response: InferenceResponse.parse_raw(response.text), responses))
raw_jsons = list(map(
    lambda inference_response: StringRequestCodec.decode_response(
        inference_response), inference_responses))
outputs = list(map(
    lambda raw_json: json.loads(raw_json[0]), raw_jsons))

pp.pprint(outputs)
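Despite its file name, the client above fans out requests with threads. A genuinely asynchronous variant is sketched below, assuming `aiohttp` (an extra dependency, not used in this repo) and reusing `endpoint` and a `payload` dict built exactly as in `send_requests` above:

```python
import asyncio
import aiohttp

async def send_request(session, payload):
    # POST one V2 inference request and return the raw response body.
    async with session.post(endpoint, json=payload) as response:
        return await response.text()

async def run(payload, n_requests=100):
    async with aiohttp.ClientSession() as session:
        # All requests share one connection pool and a single event loop.
        tasks = [send_request(session, payload) for _ in range(n_requests)]
        return await asyncio.gather(*tasks)

# bodies = asyncio.run(run(payload))
# Each body parses with InferenceResponse.parse_raw, as above.
```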