Commit

first commit
saeid93 committed Oct 23, 2022
0 parents commit cb3cd43
Showing 618 changed files with 899,900 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
__pycache__/
*.py[cod]
*$py.class
*.pt
60 changes: 60 additions & 0 deletions README.md
@@ -0,0 +1,60 @@
# Description

This repo contains a set of practice inference graphs implemented using [Seldon Core inference graphs](https://docs.seldon.io/projects/seldon-core/en/latest/graph/inference-graph.html). Inference graphs in the [seldon](./seldon/) folder are implemented using Seldon's [1st-gen custom Python wrapper](https://docs.seldon.io/projects/seldon-core/en/latest/python/python_wrapping_docker.html), while pipelines in the [mlserver](./mlserver/) folder are implemented as [custom models](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html) on [MLServer](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html), Seldon's newer serving platform, combined with the Seldon inference graph.

NOTE: This repo is shared for learning purposes; some of the pipelines implemented here may not have real-world use cases, and they are not fully tested.

Pull requests, suggestions, and additions to the list of pipelines for future implementation are highly appreciated.

## Inference graphs implemented using 1st gen Seldon
Pipelines from [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285):
1. Cascade
2. Ensemble
3. Preprocess
4. Video Monitoring

![inferline](inferline.png)

and the following pipelines:

![other-pipelines](other-pipelines.png)

5. audio-qa: Audio to text -> Question Answering
6. audio-sent: Audio to text -> Sentiment Analysis
7. nlp: Language Identification -> French-to-English Translation -> Summarisation
8. sum-qa: Summarisation -> Question Answering
9. video: Object Detection -> Object Classification
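
Each node in these graphs is a Python class wrapped by Seldon's 1st-gen Python wrapper. As a rough, hypothetical sketch (the class and model below are illustrative, not the actual node code, which lives under [seldon](./seldon/)):

```python
# Hypothetical sketch of a 1st-gen Seldon node; the real implementations
# live under ./seldon/. Seldon wraps this class in an HTTP/gRPC server.
from transformers import pipeline


class QANode:
    def __init__(self):
        # Heavy initialisation runs once, when the container starts
        self.model = pipeline(
            task="question-answering",
            model="deepset/roberta-base-squad2")

    def predict(self, X, features_names=None):
        # Called for every request routed to this node; for a QA node,
        # X is a dict with "question" and "context" keys
        return self.model(X)
```

The wrapped class is then built into a container image (typically with s2i) and referenced from the SeldonDeployment graph definition.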

## Inference graphs implemented using MLServer

1. audio-qa: Audio to text -> Question Answering
2. audio-sent: Audio to text -> Sentiment Analysis
3. nlp: Language Identification -> French-to-English Translation -> Summarisation
4. sum-qa: Summarisation -> Question Answering
5. video: Object Detection -> Object Classification
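
Each MLServer node implements the same idea behind the V2 inference protocol. A minimal, hypothetical sketch of a custom MLServer model (the class and model names are illustrative; the actual node code lives under [mlserver](./mlserver/)):

```python
# Hypothetical sketch of a custom MLServer runtime for the ASR node;
# the real node implementations live under ./mlserver/.
import json

from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse
from mlserver.codecs import NumpyRequestCodec
from mlserver.codecs.string import StringRequestCodec
from transformers import pipeline


class ASRNode(MLModel):
    async def load(self) -> bool:
        # Load the model once, when MLServer starts this runtime
        self.model = pipeline(
            task="automatic-speech-recognition",
            model="facebook/s2t-small-librispeech-asr")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the V2 payload into a numpy array, run the model, and
        # return the result as a JSON string in a V2 response
        audio = NumpyRequestCodec.decode_request(payload)
        result = self.model(audio.flatten())
        return StringRequestCodec.encode_response(
            model_name=self.name, payload=[json.dumps(result)])
```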

## DockerHub
Pre-built container images are also available [here](https://hub.docker.com/u/sdghafouri). Therefore, if you are just trying things out, you can deploy the YAML files on your K8s cluster as they are.

## Sources of Pipelines for Future Implementation

### Academic Papers
1. [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285)
2. [GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks](https://jeongseob.github.io/papers/kannan_eurosys19.pdf)
3. [FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees](https://ieeexplore.ieee.org/document/9804606)
4. [Rim: Offloading Inference to the Edge](https://dl.acm.org/doi/pdf/10.1145/3450268.3453521)
5. [Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines](https://web.stanford.edu/~faromero/llama.pdf)
6. [Scrooge: A Cost-Effective Deep Learning Inference System](https://dl.acm.org/doi/10.1145/3472883.3486993)
7. [Nexus](https://homes.cs.washington.edu/~arvind/papers/nexus.pdf)
8. [VideoEdge: Processing Camera Streams using Hierarchical Clusters](https://ieeexplore.ieee.org/document/8567661)
9. [Live Video Analytics at Scale with Approximation and Delay-Tolerance](https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang)
10. [Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing](https://www.usenix.org/conference/atc22/presentation/choi-seungbeom)

### Industrial Projects
1. [Clarifai Workflows](https://clarifai.com/clarifai/main/workflows)
2. [Facebook DLRM](https://github.com/facebookresearch/dlrm)


Please give a star if this repo helped you learn something new :)
Binary file added inferline.png
24 changes: 24 additions & 0 deletions mlserver/1-paper-audio-qa/README.md
@@ -0,0 +1,24 @@
# Source
Adapted from [clarifai-asr-sentiment](https://clarifai.com/clarifai/main/workflows/asr-sentiment)

A two-node pipeline: automatic speech recognition -> question answering.

Available models per node:

Automatic speech recognition (node 1), from https://huggingface.co/models?pipeline_tag=automatic-speech-recognition:

- facebook/s2t-small-librispeech-asr
- facebook/s2t-medium-librispeech-asr
- facebook/s2t-large-librispeech-asr
- facebook/wav2vec2-base-960h
- facebook/wav2vec2-large-960h


Question answering (node 2), from https://huggingface.co/models?pipeline_tag=question-answering:

- deepset/roberta-base-squad2
- deepset/xlm-roberta-large-squad2
- distilbert-base-cased-distilled-squad
- deepset/xlm-roberta-base-squad2
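
For reference, the two nodes can be chained locally with plain transformers pipelines before wrapping them in MLServer. A rough sketch, using one model from each list above (the question text is illustrative):

```python
# Local sketch of the two-node chain: the ASR output feeds the QA context
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    task="automatic-speech-recognition",
    model="facebook/s2t-small-librispeech-asr")
qa = pipeline(
    task="question-answering",
    model="deepset/roberta-base-squad2")

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
audio = ds[0]["audio"]["array"]

transcript = asr(audio)["text"]  # node 1: speech -> text
answer = qa(question="Who is the apostle of the middle classes?",
            context=transcript)  # node 2: text -> answer
print(transcript, answer)
```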
Empty file.
225 changes: 225 additions & 0 deletions mlserver/1-paper-audio-qa/notebook-version.ipynb
@@ -0,0 +1,225 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Audio Node"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"2022-10-08 18:21:50.411390: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n",
"2022-10-08 18:21:50.411414: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n",
"WARNING:datasets.builder:Reusing dataset librispeech_asr_demo (/home/cc/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr_demo/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)\n",
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/models/speech_to_text/modeling_speech_to_text.py:561: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').\n",
" input_lengths = (input_lengths - 1) // 2 + 1\n",
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/generation_utils.py:1202: UserWarning: Neither `max_length` nor `max_new_tokens` have been set, `max_length` will default to 200 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/plain": [
"[{'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n",
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n",
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n",
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n",
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'}]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from transformers import pipeline\n",
"from datasets import load_dataset\n",
"import numpy as np\n",
"\n",
"ds = load_dataset(\n",
" \"hf-internal-testing/librispeech_asr_demo\",\n",
" \"clean\",\n",
" split=\"validation\")\n",
"\n",
"input_data = ds[0][\"audio\"][\"array\"]\n",
"batch = []\n",
"for i in range(5):\n",
" input_data = ds[i][\"audio\"][\"array\"]\n",
" batch.append(input_data)\n",
"# batch_test = np.vstack((input_data, input_data)) \n",
"# for i in range(5):\n",
"# batch_test = np.vstack((batch_test, input_data))\n",
"\n",
"task = \"automatic-speech-recognition\"\n",
"model_name = 'facebook/s2t-small-librispeech-asr'\n",
"batch_size = 5\n",
"\n",
"model = pipeline(\n",
" task=task,\n",
" model=model_name,\n",
" batch_size=batch_size)\n",
"\n",
"res = model(batch)\n",
"res"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(93680,)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"batch[0].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# QA Node"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight']\n",
"- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
{
"data": {
"text/plain": [
"{'score': 0.04677097871899605,\n",
" 'start': 4,\n",
" 'end': 20,\n",
" 'answer': 'model conversion'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from transformers.pipelines import pipeline\n",
"# from transformers.modeling_auto import AutoModelForQuestionAnswering\n",
"# from transformers.tokenization_auto import AutoTokenizer\n",
"\n",
"model_name = \"distilbert-base-uncased\"\n",
"\n",
"# a) Get predictions\n",
"nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\n",
"QA_input = {\n",
" 'question': 'Why is model conversion important?',\n",
" 'context': 'nlp model conversion'\n",
"}\n",
"res = nlp(QA_input)\n",
"res"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"a = np.vstack((input_data, input_data))\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"a = np.vstack((input_data, input_data)) \n",
"for i in range(5):\n",
" a = np.vstack((a, input_data))\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(7, 93680)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"interpreter": {
"hash": "2465c4f56298bc06dbdad3e7519856d346ec0e9edf6ba2c905f0af711583810e"
},
"kernelspec": {
"display_name": "Python 3.8.13 ('central')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Binary file added mlserver/1-paper-audio-qa/pipeline-audio-qa.png
63 changes: 63 additions & 0 deletions mlserver/1-paper-audio-qa/seldon-core-version/client-async.py
@@ -0,0 +1,63 @@
import json
import threading
from pprint import PrettyPrinter

import requests
from datasets import load_dataset
from mlserver.types import InferenceResponse
from mlserver.codecs.string import StringRequestCodec

pp = PrettyPrinter(indent=4)

# audio-qa pipeline behind the Seldon gateway (V2 inference protocol)
gateway_endpoint = "localhost:32000"
deployment_name = 'audio-qa'
namespace = "default"
endpoint = f"http://{gateway_endpoint}/seldon/{namespace}/{deployment_name}/v2/models/infer"

batch_test = 100  # number of concurrent requests to send
responses = []

ds = load_dataset(
"hf-internal-testing/librispeech_asr_demo",
"clean",
split="validation")

input_data = ds[0]["audio"]["array"]

def send_requests():
    # Build a V2 inference request carrying the raw audio waveform
    payload = {
        "inputs": [
            {
                "name": "array_inputs",
                "shape": [1, len(input_data)],
                "datatype": "FP32",
                "data": input_data.tolist(),
                "parameters": {
                    "content_type": "np"
                }
            }
        ]
    }
    response = requests.post(endpoint, json=payload)
    responses.append(response)
    return response

thread_pool = []

# Fire all requests concurrently, one thread per request
for i in range(batch_test):
    t = threading.Thread(target=send_requests)
    t.start()
    thread_pool.append(t)

# Wait for every in-flight request to finish
for t in thread_pool:
    t.join()

# Decode the V2 responses back into plain JSON outputs
inference_responses = list(map(
    lambda response: InferenceResponse.parse_raw(response.text), responses))
raw_jsons = list(map(
    lambda inference_response: StringRequestCodec.decode_response(
        inference_response), inference_responses))
outputs = list(map(
    lambda raw_json: json.loads(raw_json[0]), raw_jsons))

pp.pprint(outputs)
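
Despite its filename, the client above achieves concurrency with threads rather than asyncio. A genuinely asynchronous variant is sketched below using httpx (an assumption; httpx is not a dependency of this repo), keeping the same endpoint and V2 payload:

```python
# Sketch of an asyncio-based client: one coroutine per request instead of
# one thread per request. httpx is an assumption, not a repo dependency.
import asyncio

import httpx
from datasets import load_dataset

endpoint = "http://localhost:32000/seldon/default/audio-qa/v2/models/infer"

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
input_data = ds[0]["audio"]["array"]

payload = {
    "inputs": [
        {
            "name": "array_inputs",
            "shape": [1, len(input_data)],
            "datatype": "FP32",
            "data": input_data.tolist(),
            "parameters": {"content_type": "np"},
        }
    ]
}


async def main(n_requests: int = 100):
    # The event loop multiplexes all in-flight requests over one client
    async with httpx.AsyncClient(timeout=60.0) as client:
        responses = await asyncio.gather(
            *[client.post(endpoint, json=payload) for _ in range(n_requests)])
    print([r.status_code for r in responses])


asyncio.run(main())
```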