Deploy LLaMA 3.3 in Docker with Ollama on AWS EC2 - Step by Step Guide
Introduction
LLaMA 3.3 is a cutting-edge AI model designed for text generation. Deploying it in a Docker container on an AWS EC2 instance provides scalability, portability, and ease of maintenance.
Prerequisites
Before starting, ensure you have:
- An AWS EC2 instance (Ubuntu recommended) with Docker installed.
- SSH access to the instance.
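If Docker is not yet installed on the instance, Ubuntu's own package repository is the quickest route. A minimal sketch, assuming a stock Ubuntu AMI with the default `ubuntu` user:

```shell
# Install Docker from the Ubuntu repositories (docker.io package)
sudo apt update
sudo apt install -y docker.io

# Start the Docker daemon and enable it on boot
sudo systemctl enable --now docker

# Let the default ubuntu user run docker without sudo
# (takes effect after logging out and back in)
sudo usermod -aG docker ubuntu
```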
Step 1: SSH into Your EC2 Instance
To begin, connect to your EC2 instance using SSH:
ssh -i your-key.pem ubuntu@your-ec2-public-ip
Once logged in, update the system:
sudo apt update && sudo apt upgrade -y
Step 2: Start an Ollama Docker Container
Deploy the Ollama container with persistent storage:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
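Before moving on, it is worth confirming that the container came up and the Ollama API is listening. Assuming the port mapping above, a quick check from the host looks like:

```shell
# Confirm the ollama container is up
docker ps --filter name=ollama

# Ollama answers with plain text on its root endpoint;
# this should reply "Ollama is running"
curl http://localhost:11434
```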
Step 3: Enter the Container
Access the running container using:
docker exec -it ollama /bin/bash
Step 4: Fetch LLaMA 3.3 Model
Inside the container, download the LLaMA 3.3 model:
ollama pull llama3.3
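Note that the `llama3.3` tag pulls the 70B model, a download in the tens of gigabytes, so make sure the instance has ample disk space and memory before pulling. Once the download finishes, you can confirm the model is available locally:

```shell
# List models stored in the Ollama volume; llama3.3 should appear
# with its size and digest once the pull completes
ollama list
```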
Step 5: Running the Model
To start generating text, execute:
ollama run llama3.3
Test it with an example query:
>>> Explain quantum computing in simple terms.
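Beyond the interactive prompt, Ollama also exposes an HTTP API on port 11434, which is useful for scripting. A minimal sketch using the `/api/generate` endpoint, run from the host and assuming the port mapping from Step 2:

```shell
# Non-streaming generation request against the Ollama REST API;
# the JSON response contains the completion in its "response" field
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Explain quantum computing in simple terms.",
  "stream": false
}'
```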
Step 6: Deploy a Web UI for Interaction
For an intuitive interface, deploy Open WebUI:
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://<YOUR-IP>:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Now, open http://<YOUR-IP>:3000 in your browser to interact with LLaMA 3.3.
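For the browser to reach the UI, the EC2 security group must allow inbound traffic on port 3000. You can open the port in the AWS console, or with the AWS CLI; a sketch, where the security group ID and source IP are placeholders for your own values:

```shell
# Allow inbound TCP 3000, restricted to your own IP
# (sg-0123456789abcdef0 and 203.0.113.10 are placeholders)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 3000 \
  --cidr 203.0.113.10/32
```

Restricting the CIDR to a single address keeps the UI from being exposed to the whole internet.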
Conclusion
With Docker and Ollama, running LLaMA 3.3 on an AWS EC2 instance is simple and efficient. This setup ensures scalability and flexibility for AI-driven text generation.
Ready to elevate your business with cutting-edge AI and ML solutions? Contact us today to harness the power of our expert technology services and drive innovation.