AI/ML

How to Deploy Kimi K2 on AWS ECS or EKS (Kubernetes) - Full Step-by-Step Guide

Option 1: Deploy Kimi K2 on AWS EKS (Elastic Kubernetes Service)

Prerequisites:

  • AWS account

  • IAM role with EKS and EC2 permissions

  • GPU-enabled EC2 instance types (e.g., p3.2xlarge). Note: Kimi K2 is a very large mixture-of-experts model; the full weights will not fit on a single 16 GB V100, so plan for multi-GPU instances (e.g., p4d) or a quantized variant

  • kubectl, eksctl, and Helm installed

  • Docker & Git installed locally

Step 1: Create an EKS Cluster with GPU Nodes

eksctl create cluster \
  --name kimi-k2-cluster \
  --region us-east-1 \
  --nodegroup-name gpu-nodes \
  --node-type p3.2xlarge \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 3 \
  --managed
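Depending on your eksctl version, the NVIDIA device plugin (which advertises `nvidia.com/gpu` capacity to the Kubernetes scheduler) may not be installed automatically. If the GPU resource does not appear on the nodes, apply the plugin DaemonSet manually; the version tag below is an example:

```shell
# Deploy the NVIDIA device plugin so Kubernetes can schedule GPU workloads
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Verify that each GPU node now advertises nvidia.com/gpu capacity
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu
```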

Step 2: Build and Push Docker Image for Kimi K2

git lfs install
git clone https://huggingface.co/moonshotai/Kimi-K2-Instruct
cd Kimi-K2-Instruct

Note: git-lfs is required here; without it the clone fetches only small pointer files instead of the actual model weights.

Dockerfile:

FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y python3 python3-pip git
RUN pip3 install torch torchvision transformers accelerate huggingface_hub
WORKDIR /app
COPY . .
EXPOSE 7860
CMD ["python3", "app.py"]
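The CMD above expects an app.py, which the Hugging Face model repository does not ship. A minimal sketch of such a server is shown below; it is hypothetical, and the transformers pipeline call assumes the container actually has enough GPU memory to load the weights:

```python
# Hypothetical app.py: a minimal HTTP inference server on port 7860.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_request(body: bytes) -> str:
    """Extract the prompt field from a JSON request body."""
    payload = json.loads(body.decode("utf-8"))
    return payload.get("prompt", "")

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = parse_request(self.rfile.read(length))
        # generate() is attached to the server object in main()
        reply = self.server.generate(prompt)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"completion": reply}).encode("utf-8"))

def main():
    # Heavy imports stay inside main() so the module imports without a GPU.
    from transformers import pipeline
    generator = pipeline("text-generation", model=".", trust_remote_code=True)
    server = HTTPServer(("0.0.0.0", 7860), InferenceHandler)
    server.generate = lambda p: generator(p, max_new_tokens=256)[0]["generated_text"]
    server.serve_forever()

if __name__ == "__main__":
    main()
```

The request/response shape here (a JSON `prompt` in, a JSON `completion` out) is an assumption; adapt it to whatever API contract you want the service to expose.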
Create the ECR repository and log in (the old aws ecr get-login command is deprecated in AWS CLI v2):

aws ecr create-repository --repository-name kimi-k2
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker build -t kimi-k2 .
docker tag kimi-k2:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
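The ECR image URI used in the tag and push commands follows a fixed pattern; a small helper (illustrative only, not part of any AWS SDK) makes the pieces explicit:

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build an Amazon ECR image URI: <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

print(ecr_image_uri("123456789012", "us-east-1", "kimi-k2"))
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest
```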

Step 3: Create Kubernetes YAML Deployment Files

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kimi-k2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kimi-k2
  template:
    metadata:
      labels:
        app: kimi-k2
    spec:
      containers:
      - name: kimi-k2
        image: <your-ecr-repo-url>
        resources:
          limits:
            nvidia.com/gpu: 1
        ports:
        - containerPort: 7860

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: kimi-k2-service
spec:
  type: LoadBalancer
  selector:
    app: kimi-k2
  ports:
    - protocol: TCP
      port: 80
      targetPort: 7860
Apply both manifests:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
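To scale the deployment automatically, a HorizontalPodAutoscaler can be layered on top of it. A minimal CPU-based sketch is below (GPU-aware scaling would need custom metrics, and each extra replica needs a free GPU node):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kimi-k2-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kimi-k2
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```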

Step 4: Access Kimi K2 on Public IP

kubectl get svc kimi-k2-service

Wait until the EXTERNAL-IP column is populated with the load balancer address; traffic sent to it on port 80 is forwarded to the container's port 7860.
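Once the load balancer address is assigned, the service can be exercised directly. The request payload below is illustrative; it must match whatever API your app.py actually exposes:

```shell
# Resolve the load balancer hostname for the service
LB=$(kubectl get svc kimi-k2-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Send a test prompt (payload shape depends on your app.py)
curl -s -X POST "http://$LB/" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, Kimi!"}'
```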

Option 2: Deploy Kimi K2 on AWS ECS (Elastic Container Service)

Prerequisites:

  • AWS CLI configured

  • IAM roles for ECS + ECR

  • Docker installed

  • ECS Fargate or EC2 cluster created

  • ECR repository created

Step 1: Build Docker Image

Build and push the image exactly as in Option 1, Step 2, then use the resulting ECR image URI below.

Step 2: Create ECS Task Definition

task-definition.json:

{
  "family": "kimi-k2-task",
  "containerDefinitions": [
    {
      "name": "kimi-k2",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/kimi-k2:latest",
      "memory": 30720,
      "cpu": 2048,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 7860,
          "hostPort": 7860
        }
      ],
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ]
    }
  ],
  "requiresCompatibilities": ["EC2"],
  "networkMode": "bridge",
  "cpu": "2048",
  "memory": "30720"
}
Register the task definition:

aws ecs register-task-definition --cli-input-json file://task-definition.json
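A mismatch between container-level and task-level cpu/memory values is a common cause of registration or placement failures. A small local sanity check (a hypothetical helper, not part of the AWS CLI) can catch such problems before registering:

```python
def check_task_definition(task: dict) -> list[str]:
    """Return a list of problems found in an ECS task definition dict."""
    problems = []
    task_cpu = int(task.get("cpu", 0))
    task_mem = int(task.get("memory", 0))
    for c in task.get("containerDefinitions", []):
        if c.get("cpu", 0) > task_cpu:
            problems.append(f"{c['name']}: container cpu exceeds task cpu")
        if c.get("memory", 0) > task_mem:
            problems.append(f"{c['name']}: container memory exceeds task memory")
        if not any(r.get("type") == "GPU" for r in c.get("resourceRequirements", [])):
            problems.append(f"{c['name']}: no GPU resource requirement declared")
    return problems

task = {
    "cpu": "2048",
    "memory": "30720",
    "containerDefinitions": [
        {"name": "kimi-k2", "cpu": 2048, "memory": 30720,
         "resourceRequirements": [{"type": "GPU", "value": "1"}]},
    ],
}
print(check_task_definition(task))  # prints [] (no problems found)
```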

Step 3: Run Task on ECS Cluster

aws ecs run-task \
  --cluster kimi-k2-cluster \
  --launch-type EC2 \
  --task-definition kimi-k2-task \
  --count 1

Final Thoughts

Deploying Kimi K2 on EKS or ECS gives you the power to scale open-source LLMs efficiently in the cloud.

Kubernetes allows for autoscaling, GPU scheduling, and production-grade LLM APIs, all while keeping you in control.

Need enterprise-grade deployment or DevOps help? Contact our AI DevOps experts at OneClick IT Consultancy.
