# From notebooks to operational ML pipelines

This repository provides the code base for the HSG CAS Big data & AI for managers session on how to operationalize machine learning models (October 2020).

## Pre-requisites

To run all the code provided, the following pre-requisites must be met:

- An environment for running Jupyter notebooks, e.g. Visual Studio Code, must be available. Also see "Working with Jupyter Notebooks in VS Code".
- An Azure ML Studio workspace must be available. Follow these instructions to create it. If you need a trial Azure subscription first, start here.
- Once your Azure ML workspace is provisioned, download the workspace `config.json` file and put it in the same folder that contains the notebooks. Here you can find more information on setting up a local development environment to work with Azure ML.
- To run ML pipelines remotely, you also need a service principal account, as shown in the "Service Principal Authentication" section in this notebook. The service principal is used in remote runs for authentication and authorization.
- To interact with Azure ML from your local Python environment, you must first install the azureml-sdk for Python. It can be installed like any other package via pip: type `!pip install azureml-sdk` in a cell of your Jupyter notebook and execute the cell. More information on the installation can be found here.
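
Before opening the notebooks, it can help to check that `config.json` is in place and complete. The following is a minimal sketch, not code from this repository: the helper names `check_workspace_config` and `connect` are hypothetical, and the sketch assumes the file holds the three keys the Azure portal normally writes (`subscription_id`, `resource_group`, `workspace_name`). Only `connect` needs the azureml-sdk, where it calls `Workspace.from_config`.

```python
import json
from pathlib import Path

# Keys assumed to be present in the config.json downloaded for the workspace.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path="config.json"):
    """Return the parsed config, or raise if the file is missing or incomplete."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; put it in the folder that contains the notebooks"
        )
    config = json.loads(config_path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return config


def connect(path="config.json"):
    """Connect to the workspace described by config.json (requires azureml-sdk)."""
    # Imported lazily so the file check above also works before the SDK is installed.
    from azureml.core import Workspace

    check_workspace_config(path)
    return Workspace.from_config(path=path)
```

In a notebook cell, `ws = connect()` would then return the workspace object, prompting for an interactive login if no other authentication is configured.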

| Notebook | Content |
| --- | --- |
| 0_register_dataset.ipynb | Registers the CSV file as a dataset in Azure ML. |
| 0_register_secret.ipynb | Registers the secret for the service principal in the Azure Key Vault so that it is not stored in clear text in a script. The pipeline runs then retrieve this secret and use it to authenticate and to fetch assets such as models and datasets from the Azure ML workspace. |
| 1_first_model.ipynb | Trains a first basic model locally using Python's sklearn library. |
| 2_experiment_tracking.ipynb | Re-runs the training process of the previous notebook, this time tracking model metrics in Azure ML experiments for traceability. To view them, open Azure ML Studio and click "Experiments" in the navigation bar on the left. |
| 3_predictions.ipynb | Sets up an ML pipeline for batch scoring. Loads the previously registered dataset and model and generates a CSV file with the model's predictions, which is then uploaded to the Azure ML storage. |
| 4_deploy_realtime.ipynb | Publishes the previously trained model as a REST endpoint in a Docker container instance. This enables real-time predictions from the model. |
| 5_test_ml_endpoint.http | Can be used to send new data via HTTP request to the model endpoint deployed in the previous notebook. Requires the "HTTP client" extension to be installed in VS Code. |
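
To illustrate what experiment tracking amounts to, here is a minimal sketch of logging metrics to an Azure ML run. It is not the code from 2_experiment_tracking.ipynb: the helpers `log_metrics` and `track_training` and the `train_fn` callback are hypothetical names introduced for illustration. The sketch assumes the azureml-sdk is installed and relies on `Experiment.start_logging`, `Run.log`, `Run.complete` and `Run.fail`.

```python
def log_metrics(run, metrics):
    """Log each metric to a run; `run` only needs a .log(name, value) method."""
    for name, value in metrics.items():
        run.log(name, float(value))


def track_training(workspace, experiment_name, train_fn):
    """Call train_fn() inside a tracked run and log the metrics it returns.

    train_fn is a hypothetical callback that trains a model and returns a
    dict mapping metric names to numbers.
    """
    # Imported lazily; requires azureml-sdk.
    from azureml.core import Experiment

    experiment = Experiment(workspace=workspace, name=experiment_name)
    run = experiment.start_logging()
    try:
        metrics = train_fn()
        log_metrics(run, metrics)
    except Exception as exc:
        # Mark the run as failed so it does not stay in the "Running" state.
        run.fail(error_details=str(exc))
        raise
    run.complete()
    return metrics
```

Keeping `log_metrics` separate from the SDK import means the logging logic can be tried out with a stand-in run object before touching the workspace.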

## Clean up

To avoid unwanted costs, make sure that the endpoint deployed with notebook #4 is deleted. To do so, in Azure ML Studio, click "Endpoints" in the navigation bar on the left. Here you will find the endpoint named german-credit-hsg. Select it and click the "Delete" button in the menu bar just above it.
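
The same clean-up can also be done from Python. This is a minimal sketch under assumptions, not part of the repository: `delete_endpoint` is a hypothetical helper, and its `webservice_cls` parameter exists only so the function can be tried with a stand-in class. With the azureml-sdk installed, it looks up the service by name through the SDK's `Webservice` class and calls its `delete` method.

```python
def delete_endpoint(workspace, name="german-credit-hsg", webservice_cls=None):
    """Delete the deployed endpoint with the given name and return that name."""
    if webservice_cls is None:
        # Imported lazily; requires azureml-sdk.
        from azureml.core.webservice import Webservice as webservice_cls

    # Constructing the class with a workspace and name retrieves the
    # existing service; an unknown name raises an SDK exception.
    service = webservice_cls(workspace=workspace, name=name)
    service.delete()
    return name
```

Given a workspace object `ws`, calling `delete_endpoint(ws)` would remove the endpoint created by notebook #4.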

When no compute resources are running, the Azure ML Studio workspace does not incur any costs.

## Disclaimer

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
