Commit cb3cd43 (0 parents): showing 618 changed files with 899,900 additions and 0 deletions.
@@ -0,0 +1,4 @@
__pycache__/
*.py[cod]
*$py.class
*.pt
@@ -0,0 +1,60 @@
# Description

This repo contains a set of practice inference graphs implemented using the [Seldon Core inference graph](https://docs.seldon.io/projects/seldon-core/en/latest/graph/inference-graph.html). Inference graphs in the [seldon](./seldon/) folder are implemented using the [Seldon 1st-gen custom Python wrapper](https://docs.seldon.io/projects/seldon-core/en/latest/python/python_wrapping_docker.html), and pipelines in the [mlserver](./mlserver/) folder are implemented as [custom models](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html) on [MLServer](https://mlserver.readthedocs.io/en/latest/examples/custom/README.html), Seldon's newer serving platform, combined with the Seldon inference graph.

NOTE: This repo is shared for learning purposes; some of the pipelines implemented here might not have real-world use cases, and they are not fully tested.

Pull requests, suggestions, and additions to the list of pipelines for future implementation are highly appreciated.
## Inference graphs implemented using 1st gen Seldon

Pipelines from [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285):

1. Cascade
2. Ensemble
3. Preprocess
4. Video Monitoring



and the following pipelines:



5. audio-qa: Audio to text -> Question Answering
6. audio-sent: Audio to text -> Sentiment Analysis
7. nlp: language identification -> translate French to English -> summarisation
8. sum-qa: Summarisation -> Question Answering
9. video: Object Detection -> Object Classification

Each node in these graphs follows the 1st-gen Seldon Python wrapper interface; see the sketch after this list.
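A minimal sketch of one such node, assuming the standard wrapper interface from the Seldon docs (the class name, model choice, and task here are illustrative, not the exact code in this repo):

```python
# Illustrative 1st-gen Seldon node: a plain Python class that the
# Seldon Python server loads after docker-wrapping (e.g. via
# `seldon-core-microservice Node`).
from transformers import pipeline

class Node:
    def __init__(self):
        # Assumption: each node loads one Hugging Face pipeline at startup.
        self.model = pipeline(
            "automatic-speech-recognition",
            model="facebook/s2t-small-librispeech-asr")

    def predict(self, X, features_names=None):
        # Seldon calls predict() with the request payload; the return value
        # becomes this node's response (and the next node's input).
        return self.model(X)
```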
## Inference graphs implemented using MLServer

1. audio-qa: Audio to text -> Question Answering
2. audio-sent: Audio to text -> Sentiment Analysis
3. nlp: language identification -> translate French to English -> summarisation
4. sum-qa: Summarisation -> Question Answering
5. video: Object Detection -> Object Classification

On this platform each node is an MLServer custom model; a sketch follows this list.
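A minimal sketch of an MLServer node, assuming the custom-model interface from the MLServer docs (the codec choices mirror the client script later in this commit; the class and model names are illustrative):

```python
import json

from mlserver import MLModel
from mlserver.codecs import NumpyRequestCodec
from mlserver.codecs.string import StringRequestCodec
from mlserver.types import InferenceRequest, InferenceResponse
from transformers import pipeline

class AudioQANode(MLModel):
    async def load(self) -> bool:
        # Assumption: one Hugging Face pipeline per node, loaded once.
        self.model = pipeline(
            "automatic-speech-recognition",
            model="facebook/s2t-small-librispeech-asr")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode the V2 request into a numpy array, run the model, and
        # return the result encoded as a JSON string.
        X = NumpyRequestCodec.decode_request(payload)
        output = self.model(X)
        return StringRequestCodec.encode_response(
            self.name, [json.dumps(output)])
```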
## DockerHub

Pre-built container images are also available [here](https://hub.docker.com/u/sdghafouri). Therefore, if you are just trying the pipelines out, you can deploy the YAML files in this repo on your Kubernetes cluster as they are (e.g. with `kubectl apply -f <pipeline>.yaml`).
## Sources of Pipelines for Future Implementation

### Academic Papers
1. [InferLine: latency-aware provisioning and scaling for prediction serving pipelines](https://dl.acm.org/doi/10.1145/3419111.3421285)
2. [GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks](https://jeongseob.github.io/papers/kannan_eurosys19.pdf)
3. [FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees](https://ieeexplore.ieee.org/document/9804606)
4. [Rim: Offloading Inference to the Edge](https://dl.acm.org/doi/pdf/10.1145/3450268.3453521)
5. [Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines](https://web.stanford.edu/~faromero/llama.pdf)
6. [Scrooge: A Cost-Effective Deep Learning Inference System](https://dl.acm.org/doi/10.1145/3472883.3486993)
7. [Nexus: a GPU cluster engine for accelerating DNN-based video analysis](https://homes.cs.washington.edu/~arvind/papers/nexus.pdf)
8. [VideoEdge: Processing Camera Streams using Hierarchical Clusters](https://ieeexplore.ieee.org/document/8567661)
9. [Live Video Analytics at Scale with Approximation and Delay-Tolerance](https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang)
10. [Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing](https://www.usenix.org/conference/atc22/presentation/choi-seungbeom)

### Industrial Projects
1. [Clarifai Workflows](https://clarifai.com/clarifai/main/workflows)
2. [Facebook DLRM](https://github.com/facebookresearch/dlrm)

Please give a star if this repo helped you learn something new :)
@@ -0,0 +1,24 @@
# Source
Adapted from [clarifai-asr-sentiment](https://clarifai.com/clarifai/main/workflows/asr-sentiment)

A two-node pipeline.

List of available models per node:

Automatic speech recognition (node 1):
https://huggingface.co/models?pipeline_tag=automatic-speech-recognition

- facebook/s2t-small-librispeech-asr
- facebook/s2t-medium-librispeech-asr
- facebook/s2t-large-librispeech-asr
- facebook/wav2vec2-base-960h
- facebook/wav2vec2-large-960h

Question answering (node 2):
https://huggingface.co/models?pipeline_tag=question-answering

- deepset/roberta-base-squad2
- deepset/xlm-roberta-large-squad2
- distilbert-base-cased-distilled-squad
- deepset/xlm-roberta-base-squad2
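For a quick local check of this two-node chain, a minimal sketch using Hugging Face `transformers` (the model picks are examples from the lists above and the question is illustrative; the deployed pipeline wires the same two stages through the inference graph):

```python
from datasets import load_dataset
from transformers import pipeline

# Node 1: automatic speech recognition
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/s2t-small-librispeech-asr")

# Node 2: question answering
qa = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2")

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
audio = ds[0]["audio"]["array"]

# Chain the nodes: transcribe, then answer a question over the transcript.
transcript = asr(audio)["text"]
answer = qa(question="Who is the apostle of the middle classes?",
            context=transcript)
print(transcript)
print(answer)
```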
Empty file.
@@ -0,0 +1,225 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Audio Node" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", | ||
" from .autonotebook import tqdm as notebook_tqdm\n", | ||
"2022-10-08 18:21:50.411390: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n", | ||
"2022-10-08 18:21:50.411414: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n", | ||
"WARNING:datasets.builder:Reusing dataset librispeech_asr_demo (/home/cc/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr_demo/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)\n", | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/models/speech_to_text/modeling_speech_to_text.py:561: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').\n", | ||
" input_lengths = (input_lengths - 1) // 2 + 1\n", | ||
"/home/cc/miniconda3/envs/central/lib/python3.8/site-packages/transformers/generation_utils.py:1202: UserWarning: Neither `max_length` nor `max_new_tokens` have been set, `max_length` will default to 200 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n", | ||
" warnings.warn(\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[{'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'},\n", | ||
" {'text': 'mister quilter is the apostle of the middle classes and we are glad to welcome his gospel'}]" | ||
] | ||
}, | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from transformers import pipeline\n", | ||
"from datasets import load_dataset\n", | ||
"import numpy as np\n", | ||
"\n", | ||
"ds = load_dataset(\n", | ||
" \"hf-internal-testing/librispeech_asr_demo\",\n", | ||
" \"clean\",\n", | ||
" split=\"validation\")\n", | ||
"\n", | ||
"input_data = ds[0][\"audio\"][\"array\"]\n", | ||
"batch = []\n", | ||
"for i in range(5):\n", | ||
" input_data = ds[i][\"audio\"][\"array\"]\n", | ||
" batch.append(input_data)\n", | ||
"# batch_test = np.vstack((input_data, input_data)) \n", | ||
"# for i in range(5):\n", | ||
"# batch_test = np.vstack((batch_test, input_data))\n", | ||
"\n", | ||
"task = \"automatic-speech-recognition\"\n", | ||
"model_name = 'facebook/s2t-small-librispeech-asr'\n", | ||
"batch_size = 5\n", | ||
"\n", | ||
"model = pipeline(\n", | ||
" task=task,\n", | ||
" model=model_name,\n", | ||
" batch_size=batch_size)\n", | ||
"\n", | ||
"res = model(batch)\n", | ||
"res" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(93680,)" | ||
] | ||
}, | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"batch[0].shape" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# QA Node" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 15, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight']\n", | ||
"- This IS expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", | ||
"- This IS NOT expected if you are initializing DistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", | ||
"Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']\n", | ||
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"{'score': 0.04677097871899605,\n", | ||
" 'start': 4,\n", | ||
" 'end': 20,\n", | ||
" 'answer': 'model conversion'}" | ||
] | ||
}, | ||
"execution_count": 15, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"from transformers.pipelines import pipeline\n", | ||
"# from transformers.modeling_auto import AutoModelForQuestionAnswering\n", | ||
"# from transformers.tokenization_auto import AutoTokenizer\n", | ||
"\n", | ||
"model_name = \"distilbert-base-uncased\"\n", | ||
"\n", | ||
"# a) Get predictions\n", | ||
"nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\n", | ||
"QA_input = {\n", | ||
" 'question': 'Why is model conversion important?',\n", | ||
" 'context': 'nlp model conversion'\n", | ||
"}\n", | ||
"res = nlp(QA_input)\n", | ||
"res" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"a = np.vstack((input_data, input_data))\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"a = np.vstack((input_data, input_data)) \n", | ||
"for i in range(5):\n", | ||
" a = np.vstack((a, input_data))\n", | ||
" \n", | ||
" " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(7, 93680)" | ||
] | ||
}, | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"a.shape" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"interpreter": { | ||
"hash": "2465c4f56298bc06dbdad3e7519856d346ec0e9edf6ba2c905f0af711583810e" | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3.8.13 ('central')", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
63 changes: 63 additions & 0 deletions
mlserver/1-paper-audio-qa/seldon-core-version/client-async.py
@@ -0,0 +1,63 @@
import json
import threading
from pprint import PrettyPrinter

import requests
from datasets import load_dataset
from mlserver.codecs.string import StringRequestCodec
from mlserver.types import InferenceResponse

pp = PrettyPrinter(indent=4)

# Single-node InferLine-style pipeline exposed through the Seldon gateway.
gateway_endpoint = "localhost:32000"
deployment_name = 'audio-qa'
namespace = "default"
endpoint = f"http://{gateway_endpoint}/seldon/{namespace}/{deployment_name}/v2/models/infer"

batch_test = 100
responses = []

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo",
    "clean",
    split="validation")

input_data = ds[0]["audio"]["array"]

def send_requests():
    # Build a V2 inference request carrying the raw audio samples.
    payload = {
        "inputs": [
            {
                "name": "array_inputs",
                "shape": [1, len(input_data)],
                "datatype": "FP32",
                "data": input_data.tolist(),
                "parameters": {
                    "content_type": "np"
                }
            }
        ]
    }
    response = requests.post(endpoint, json=payload)
    responses.append(response)
    return response

# Fire all requests concurrently, one thread per request.
thread_pool = []

for i in range(batch_test):
    t = threading.Thread(target=send_requests)
    t.start()
    thread_pool.append(t)

for t in thread_pool:
    t.join()

# Parse each raw HTTP response into an MLServer InferenceResponse,
# decode the string payload, then load the JSON the pipeline returned.
inference_responses = list(map(
    lambda response: InferenceResponse.parse_raw(response.text), responses))
raw_jsons = list(map(
    lambda inference_response: StringRequestCodec.decode_response(
        inference_response), inference_responses))
outputs = list(map(
    lambda raw_json: json.loads(raw_json[0]), raw_jsons))

pp.pprint(outputs)
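Despite its file name, the client above fans out requests with threads. A genuinely asynchronous variant is sketched below, assuming `aiohttp` (an extra dependency, not used in this repo) and reusing `endpoint` and a `payload` dict built exactly as in `send_requests` above:

```python
import asyncio
import aiohttp

async def send_request(session, payload):
    # POST one V2 inference request and return the raw response body.
    async with session.post(endpoint, json=payload) as response:
        return await response.text()

async def run(payload, n_requests=100):
    async with aiohttp.ClientSession() as session:
        # All requests share one connection pool and a single event loop.
        tasks = [send_request(session, payload) for _ in range(n_requests)]
        return await asyncio.gather(*tasks)

# bodies = asyncio.run(run(payload))
# Each body parses with InferenceResponse.parse_raw, as above.
```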