kubernetes-ml-ops
An introduction to machine learning model deployment operations using Python and Kubernetes.
Contributed by the Google Cloud community. Not official Google documentation.
A common pattern for deploying machine learning (ML) models (such as models trained with the scikit-learn or Keras packages for Python) into production environments is to expose them as RESTful API microservices hosted in Docker containers. These microservices can then be deployed to a cloud environment that handles everything required to maintain continuous availability.
Kubernetes is a container orchestration platform that provides a mechanism for defining entire microservice-based application deployment topologies and their service-level requirements for maintaining continuous availability.

Create a Google Cloud project

New accounts often come with a default project, but you will start by creating a new project to keep this work separate and easy to delete later. After creating the project, be sure to copy the project ID, which is usually different from the project name. For details, see How to find your project ID.

Open Cloud Shell and create a project directory

Open Cloud Shell by clicking the Activate Cloud Shell button in the navigation bar in the upper-right corner of the Cloud Console.
In Cloud Shell, use the following command to create a project directory:
```shell
mkdir py-flask-ml-rest-api
```

Containerizing a simple ML model scoring service using Flask and Docker

This tutorial uses the contents of the py-flask-ml-rest-api directory for demonstration purposes. This directory contains a simple Python ML model scoring REST API in the api.py module and a Dockerfile:
```
py-flask-ml-rest-api/
 | Dockerfile
 | api.py   # Needs to be altered according to your requirements and ML model
```

Defining the Flask service in the api.py module

This is a Python module that uses the Flask framework to define a web service (app) with a function (score) that executes in response to an HTTP request to a specific URL (or route).

api.py

```python
from flask import Flask, jsonify, make_response, request

app = Flask(__name__)


@app.route('/score', methods=['POST'])
def score():
    features = request.json['X']
    return make_response(jsonify({'score': features}))


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
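The score() function above simply echoes the features back, which is enough to exercise the deployment pipeline. In a real service it would apply a trained model to the features. A minimal sketch of what that might look like, using a hand-rolled linear scorer so the example stays self-contained (the weights and bias below are hypothetical placeholders, not a trained model):

```python
# A stand-in for a trained model: a linear scorer with fixed
# (hypothetical) coefficients. In a real service these would be
# loaded from a serialized model file when the app starts.
WEIGHTS = [0.5, 0.5]
BIAS = 1.0


def linear_score(features):
    """Return the linear combination of the features plus the bias term."""
    if len(features) != len(WEIGHTS):
        raise ValueError('expected %d features' % len(WEIGHTS))
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS

# Inside score(), the return line would then become:
#   return make_response(jsonify({'score': linear_score(features)}))
```

Swapping the echo for a call like this keeps the HTTP layer unchanged while the scoring logic evolves.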

Defining the Docker image with the Dockerfile

Dockerfile

```dockerfile
FROM python:3.6-slim
WORKDIR /usr/src/app
COPY . .
RUN pip install flask

EXPOSE 5000
CMD ["python", "api.py"]
```
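Installing Flask directly with pip is fine for this demo. Once the model code gains dependencies (scikit-learn, numpy, and so on), a common refinement is to list them in a requirements.txt and install from that, so builds stay reproducible and the dependency layer is cached. A hypothetical variant of the Dockerfile:

```dockerfile
FROM python:3.6-slim
WORKDIR /usr/src/app
# Copy and install dependencies first so this layer is cached
# until requirements.txt changes.
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

EXPOSE 5000
CMD ["python", "api.py"]
```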
In Cloud Shell, run the following command to build the Docker image:
```shell
$ docker build -t ml-k8s .
```

Pushing the Docker Image to Container Registry

```shell
$ docker tag ml-k8s [HOSTNAME]/[PROJECT-ID]/ml-k8s
$ docker push [HOSTNAME]/[PROJECT-ID]/ml-k8s
```
Replace [HOSTNAME] with a Container Registry hostname, such as gcr.io.
For more about Container Registry, see this quickstart.
When your Docker image is built and pushed to Container Registry, you are done containerizing the ML model.

Setting up and connecting to the Kubernetes cluster

1. Ensure that you have enabled the Google Kubernetes Engine API. You can enable an API in the Cloud Console.
2. Start a cluster:
```shell
$ gcloud container clusters create k8s-ml-cluster --num-nodes 3 --machine-type g1-small --zone us-west1-b
```
You may need to wait a few minutes for the cluster to be created.
3. Connect to the cluster:
```shell
$ gcloud container clusters get-credentials k8s-ml-cluster --zone us-west1-b --project [PROJECT_ID]
```
For more information, see Creating a Kubernetes cluster.

Deploying the containerized ML model to Kubernetes

The structure of the project that you create is as follows:
```
py-flask-ml-rest-api/
 | api.py
 | base/
 |   | namespace.yaml
 |   | deployment.yaml
 |   | service.yaml
 |   | kustomization.yaml
 | Dockerfile
```
Writing YAML files for Kubernetes can get repetitive and hard to manage, especially when there are multiple files and you need to execute them one by one. With the Kustomize utility, you can customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched.

Install Kustomize

```shell
curl -s https://api.github.com/repos/kubernetes-sigs/kustomize/releases |\
grep browser_download |\
grep linux |\
cut -d '"' -f 4 |\
grep /kustomize/v |\
sort | tail -n 1 |\
xargs curl -O -L && \
tar xzf ./kustomize_v*_linux_amd64.tar.gz && \
sudo mv kustomize /usr/bin/
```

Create namespace.yaml

Namespaces provide a scope for Kubernetes resources, carving up your cluster into smaller units.
File contents:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mlops
```

Create deployment.yaml

Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive.
File contents:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: score-app
    env: qa
  name: score-app
  namespace: mlops
spec:
  replicas: 2 # Create two Pods for the app
  selector:
    matchLabels:
      app: score-app
  template:
    metadata:
      labels:
        app: score-app
        env: qa
    spec:
      containers:
      - image: <DOCKER_IMAGE_NAME> # Docker image name that you pushed to Container Registry
        name: <CONTAINER_NAME> # Container name
        ports:
        - containerPort: 5000
          protocol: TCP
```
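The Deployment relies on Kubernetes to replace containers that crash, but by default Kubernetes has no way to know that the Flask app has stopped answering requests without exiting. A common refinement (not part of this tutorial's files) is to add a readiness probe to the container spec. A sketch, assuming you add a lightweight GET route such as a hypothetical /healthz handler to api.py:

```yaml
# Added under spec.template.spec.containers[0] in deployment.yaml.
# The /healthz path is a hypothetical route you would need to add
# to the Flask app; the probe fails until it returns HTTP 200.
readinessProbe:
  httpGet:
    path: /healthz
    port: 5000
  initialDelaySeconds: 5
  periodSeconds: 10
```

With this in place, the Service only routes traffic to Pods whose probe is passing.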

Create service.yaml

A Service is an abstract way to expose an application running on a set of Pods as a network service.
File contents:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: score-app
  labels:
    app: score-app
  namespace: mlops
spec:
  type: LoadBalancer
  ports:
  - port: 5000
    targetPort: 5000
  selector:
    app: score-app
```

Create kustomization.yaml

kubectl requires this file to be named kustomization.yaml, so use that name rather than kustomize.yaml.

File contents:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- deployment.yaml
- service.yaml
```
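The payoff of Kustomize comes when you need variants of the same manifests. For example, a hypothetical production overlay could raise the replica count without touching the base files (the directory layout and values here are illustrative, not part of this tutorial's project):

```yaml
# overlays/prod/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
patches:
- target:
    kind: Deployment
    name: score-app
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 4
```

You would then deploy the variant with kubectl apply --kustomize=overlays/prod/ while the base remains unchanged for QA.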

Deploy the app

After setting up these YAML files, you can deploy your app using this single command:
```shell
kubectl apply --kustomize=${PWD}/base/ --record=true
```
To confirm that the mlops namespace was created, use this command:
```shell
$ kubectl get ns
```
The output should look something like the following:
```
NAME              STATUS   AGE
default           Active   35m
kube-node-lease   Active   35m
kube-public       Active   35m
kube-system       Active   35m
mlops             Active   33s
staging           Active   34m
```
To see the status of the deployment, use this command:
```shell
$ kubectl get deployment -n mlops
```
The output should look something like the following:
```
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
score-app   2/2     2            2           100s
```
To see the status of the service, use this command:
```shell
$ kubectl get service -n mlops
```
The output should look something like the following:
```
NAME        TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
score-app   LoadBalancer   xx.xx.xx.xx   xx.xx.xx.xx   5000:xxxx/TCP   2m3s
```

Test the deployed model

```shell
curl http://[EXTERNAL_IP_ADDRESS]:5000/score \
  --request POST \
  --header "Content-Type: application/json" \
  --data '{"X": [1, 2]}'
```
The output should look something like the following:
```
{"score":[1,2]}
```
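The same request can be made from Python. A small sketch using only the standard library (the URL is a placeholder you would fill in with the external IP from the previous step):

```python
import json
import urllib.request


def build_score_payload(features):
    """Encode a feature vector as the JSON body the /score route expects."""
    return json.dumps({'X': features}).encode('utf-8')


def request_score(url, features):
    """POST features to the scoring service and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=build_score_payload(features),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example call (placeholder address):
# request_score('http://[EXTERNAL_IP_ADDRESS]:5000/score', [1, 2])
```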

Cleanup

With the GKE cluster running, you can create and delete resources with the kubectl command-line client. To remove the cluster itself, select the checkbox next to the cluster name on the Kubernetes Engine page of the Cloud Console and click Delete, or run gcloud container clusters delete k8s-ml-cluster --zone us-west1-b in Cloud Shell.