Deploy OpenThinker 7B on GCP: Best Practices for AI Model Hosting
Introduction
Deploying OpenThinker 7B on Google Cloud Platform (GCP) allows for scalable, secure, and cost-efficient hosting of the model. GCP provides several deployment options, including Google Kubernetes Engine (GKE), Cloud Run, and Compute Engine (GCE).
In this guide, we will focus on deploying OpenThinker 7B using Google Kubernetes Engine (GKE), which provides managed Kubernetes infrastructure for deploying and scaling containers.
Key Benefits of Deploying OpenThinker 7B on GCP
- Scalability: Auto-scaling for high-demand workloads
- Cost Optimization: Pay for compute resources as needed
- Managed Kubernetes: Simplifies deployment and scaling
- Security: Integrated IAM and VPC networking
Step 1: Prerequisites
Before starting, ensure you have:
- A Google Cloud account with billing enabled
- Google Cloud SDK (gcloud CLI) installed and authenticated
- Docker installed on your local machine
- A pre-built Docker image of OpenThinker 7B
- Kubernetes command-line tool (kubectl) installed
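Before moving on, it helps to confirm the CLIs above are actually on your PATH. A minimal sketch using a hypothetical `require` helper (pure POSIX shell, no cloud access needed):

```shell
#!/bin/sh
# Hypothetical helper: check whether a required CLI is installed.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the tools this guide relies on.
for tool in gcloud docker kubectl; do
  require "$tool" || MISSING=1
done
[ -z "$MISSING" ] || echo "Install the missing tools before continuing." >&2
```

Run it once before Step 2; any `missing:` line points at a tool you still need to install and authenticate.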
Step 2: Push the Docker Image to Google Container Registry (GCR)
Enable GCR API and Authenticate Docker
Enable Google Container Registry (GCR):
gcloud services enable containerregistry.googleapis.com
Authenticate Docker to push images to GCR:
gcloud auth configure-docker
Tag the Docker Image
Retrieve your GCP project ID:
gcloud config get-value project
Tag the image for GCR (replace <project-id> with your actual project ID):
docker tag openthinker-7b gcr.io/<project-id>/openthinker-7b:latest
Push the Image to GCR
docker push gcr.io/<project-id>/openthinker-7b:latest
Once completed, the image will be stored in Google Container Registry (GCR).
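The tag and push steps above share the same image reference, so it is easy to script them together. A sketch that assembles the reference once and, as a dry run, only prints the commands for review ("my-gcp-project" is a placeholder, not a real project ID):

```shell
#!/bin/sh
# Sketch: assemble the GCR image reference once, then reuse it.
# "my-gcp-project" is a placeholder; substitute the output of
# `gcloud config get-value project`.
PROJECT_ID="my-gcp-project"
LOCAL_IMAGE="openthinker-7b"
REMOTE_IMAGE="gcr.io/${PROJECT_ID}/${LOCAL_IMAGE}:latest"

# Dry run: print the commands instead of executing them.
echo "docker tag ${LOCAL_IMAGE} ${REMOTE_IMAGE}"
echo "docker push ${REMOTE_IMAGE}"
```

Dropping the `echo`s turns this into the real tag-and-push; keeping the image reference in one variable avoids the two commands drifting apart.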
Step 3: Create a GKE Cluster
We will use Google Kubernetes Engine (GKE) to deploy the model.
Enable GKE API
gcloud services enable container.googleapis.com
Create a GKE Cluster
gcloud container clusters create openthinker-cluster \
  --zone us-central1-a \
  --num-nodes 2 \
  --machine-type n1-standard-4
This command creates a 2-node cluster in us-central1-a using n1-standard-4 instances.
Connect to the Cluster
gcloud container clusters get-credentials openthinker-cluster --zone us-central1-a
Step 4: Deploy OpenThinker 7B on GKE
Create a Kubernetes Deployment YAML File
Create a new file called openthinker-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openthinker-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openthinker
  template:
    metadata:
      labels:
        app: openthinker
    spec:
      containers:
        - name: openthinker
          image: gcr.io/<project-id>/openthinker-7b:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              memory: "8Gi"
              cpu: "2"
Apply the Deployment
kubectl apply -f openthinker-deployment.yaml
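Production deployments usually also declare health probes so Kubernetes can withhold traffic from pods that are still loading the model and restart ones that hang. A minimal sketch, assuming the model server answers HTTP GET on / at port 11434 (the path is an assumption about the container image, not confirmed by it); merge these fields under the container entry in openthinker-deployment.yaml:

```yaml
# Sketch: probes for the openthinker container. The "/" path is an
# assumption about the model server's health endpoint.
readinessProbe:
  httpGet:
    path: /
    port: 11434
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /
    port: 11434
  initialDelaySeconds: 60
  periodSeconds: 30
```

The generous initial delays account for the time a 7B model needs to load before it can answer requests.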
Step 5: Expose the Deployment
To allow external access to OpenThinker 7B, create a Kubernetes Service.
Create a Service YAML File
Create a new file called openthinker-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: openthinker-service
spec:
  type: LoadBalancer
  selector:
    app: openthinker
  ports:
    - protocol: TCP
      port: 80
      targetPort: 11434
Apply the Service
kubectl apply -f openthinker-service.yaml
This command exposes OpenThinker via a LoadBalancer, which assigns a public IP.
Step 6: Verify Deployment
Check Running Pods
kubectl get pods
Ensure the pod's STATUS column shows Running; if it is stuck in Pending or CrashLoopBackOff, inspect it with kubectl describe pod.
Get the External IP
kubectl get service openthinker-service
Look for the EXTERNAL-IP field; it may show <pending> for a minute or two while GCP provisions the load balancer. Once assigned, you can access OpenThinker using:
curl http://<external-ip>
The exact response depends on the model server packaged in the image; a typical health response looks like:
{"message": "Model is up and running"}
Step 7: Scaling the Model (Optional)
To handle high traffic, increase the number of replicas:
Update the Replica Count
kubectl scale deployment openthinker-deployment --replicas=3
Enable Auto Scaling
kubectl autoscale deployment openthinker-deployment --cpu-percent=70 --min=1 --max=5
This creates a HorizontalPodAutoscaler that scales the deployment between 1 and 5 replicas, targeting 70% average CPU utilization.
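The kubectl autoscale command above can also be expressed declaratively, which keeps the scaling policy in version control alongside the other manifests. A sketch of the equivalent autoscaling/v2 resource (the name openthinker-hpa is an illustrative choice):

```yaml
# Sketch: declarative equivalent of the kubectl autoscale command.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openthinker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openthinker-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Apply it with kubectl apply -f, like the other manifests in this guide.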
Step 8: Cleaning Up Resources (If Needed)
To delete the Kubernetes deployment:
kubectl delete deployment openthinker-deployment
kubectl delete service openthinker-service
To delete the GKE cluster:
gcloud container clusters delete openthinker-cluster --zone us-central1-a
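The teardown steps can be wrapped in a small script; this sketch only echoes the commands (a dry run) so you can review them before running anything destructive. Deleting the Service before the cluster lets GCP deprovision the external load balancer cleanly:

```shell
#!/bin/sh
# Sketch: print the cleanup commands for review (dry run).
ZONE="us-central1-a"
for cmd in \
  "kubectl delete service openthinker-service" \
  "kubectl delete deployment openthinker-deployment" \
  "gcloud container clusters delete openthinker-cluster --zone ${ZONE}"
do
  echo "$cmd"
done
```

Remove the `echo` indirection (or pipe the output to sh) once you are sure you want the resources gone.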
Conclusion
Deploying OpenThinker 7B on Google Cloud Platform (GCP) using GKE allows for scalable, managed deployment. By leveraging Google Kubernetes Engine (GKE), Google Container Registry (GCR) and Load Balancers, the model runs efficiently with minimal manual intervention.
Ready to transform your business with our technology solutions? Contact us today to leverage our AI/ML expertise.