Stop Paying for API Keys: Run DeepSeek-R1 on AWS Spot Instances for $0.15/Hour
If you are a DevOps engineer or developer, you know the pain: You want to run private, uncensored LLMs (like DeepSeek-R1 or Llama 3) for your internal tools, but the costs are prohibitive.
- OpenAI/Claude APIs: Fast but expensive at scale, and you send your private data to them.
- AWS On-Demand GPUs: A single `g4dn.xlarge` instance costs **~$0.52/hour** (~$380/month).
There is a third option that most “AI tutorials” ignore: AWS Spot Instances combined with “stateless” deployment.
In this guide, I will show you how to deploy a private, inference-ready DeepSeek-R1 (Distill-Llama-8B) API on AWS for roughly $0.15 – $0.18 per hour—a 70% discount—using Terraform and Ollama.
The Hardware: Why g4dn.xlarge?
For running 7B to 14B parameter models, you don’t need an H100. You just need enough VRAM to hold the model weights.
| Instance Type | GPU | VRAM | On-Demand Price | Spot Price (Est.) | Good For |
| --- | --- | --- | --- | --- | --- |
| g4dn.xlarge | NVIDIA T4 | 16 GB | $0.526/hr | **~$0.16/hr** | DeepSeek-R1 8B & 14B (perfect fit) |
| g5.xlarge | NVIDIA A10G | 24 GB | $1.006/hr | **~$0.35/hr** | DeepSeek-R1 32B |
| p3.2xlarge | NVIDIA V100 | 16 GB | $3.06/hr | ~$0.90/hr | Avoid (too expensive for aging hardware) |
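Spot prices move with demand and vary by region and Availability Zone, so treat the estimates above as ballpark figures. If you have the AWS CLI configured, a quick way to check the going rate yourself (swap in your own region) is:

```bash
# Current g4dn.xlarge spot prices per Availability Zone (us-east-1 as an example)
aws ec2 describe-spot-price-history \
  --instance-types g4dn.xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
  --output table \
  --region us-east-1
```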
Technical Note: The full DeepSeek-R1 is 671B parameters and requires a cluster. We will use the distilled 8B or 14B versions, which retain most of the reasoning capability but fit on a single T4 GPU. (Back-of-envelope: an 8B model quantized to roughly 4 bits per weight is about 8 × 10⁹ × 0.5 bytes ≈ 4-5 GB of weights, leaving plenty of the T4's 16 GB for the KV cache.)
The Architecture: Handling Spot Interruptions
Spot instances are cheap because AWS can reclaim them at any time with just 2 minutes' notice (a small watcher sketch for detecting that notice follows the list below).
Do not treat this like a pet server. Treat it like a container.
- No Persistent Root Volume: We don’t care if the OS is deleted.
- Startup Script (User Data): Every time the server boots, it automatically installs Ollama and pulls the model (the GPU drivers come pre-baked in the AMI).
- Terraform: To request the instance and handle the “Spot Request” logic.
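Since the whole design hinges on surviving that 2-minute warning, it helps to know how to detect it from inside the instance. Below is a minimal watcher sketch (not part of the deployment itself) that polls the EC2 instance metadata service; it assumes IMDSv2, which recent AMIs use by default:

```bash
#!/bin/bash
# Sketch: poll the spot interruption notice every 5 seconds.
# /spot/instance-action returns 404 until AWS schedules a reclaim,
# then returns a small JSON payload with the action and time.
while true; do
  TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "X-aws-ec2-metadata-token: $TOKEN" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action")
  if [ "$STATUS" = "200" ]; then
    echo "Spot interruption notice received - shutting down cleanly"
    systemctl stop ollama   # nothing else to persist in this stateless setup
    break
  fi
  sleep 5
done
```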
Step 1: The Terraform Script
Save this as `main.tf`. It requests a `g4dn.xlarge` Spot instance in your region, capped at a maximum price of $0.20/hour.
```terraform
provider "aws" {
  region = "us-east-1" # Change to your region
}

resource "aws_security_group" "llm_sg" {
  name        = "llm-security-group"
  description = "Allow SSH and Ollama API"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Lock this down to your IP in production!
  }

  ingress {
    from_port   = 11434
    to_port     = 11434
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Ollama API port
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_spot_instance_request" "llm_worker" {
  ami           = "ami-0a0e5d9c7acc336f1" # Ubuntu 22.04 Deep Learning AMI (drivers pre-installed); region-specific, see the tip below
  instance_type = "g4dn.xlarge"
  spot_price    = "0.20"                  # Max price you are willing to pay
  key_name      = "your-ssh-key-name"     # Create this in the AWS Console first!

  # Referencing the SG by name works in the default VPC; in a custom VPC use vpc_security_group_ids
  security_groups = [aws_security_group.llm_sg.name]

  wait_for_fulfillment = true
  spot_type            = "one-time"

  # The magic: this script runs on boot
  user_data = file("install_ollama.sh")

  tags = {
    Name = "DeepSeek-Spot-Instance" # Note: this tags the Spot request, not the launched instance
  }
}

output "instance_ip" {
  value = aws_spot_instance_request.llm_worker.public_ip
}
```
Critical Tip: Instead of using a blank Ubuntu AMI and struggling with NVIDIA drivers, use the AWS Deep Learning AMI (Ubuntu 22.04). It comes with CUDA pre-installed, saving you 20 minutes of boot time.
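One caveat: AMI IDs are region-specific, so the ID in the script above only resolves in us-east-1 and will eventually age out. A quick way to list recent Deep Learning AMI candidates in your own region (the name filter is an assumption; AWS tweaks AMI naming over time, so loosen the pattern if you get no hits):

```bash
# List the five most recent Ubuntu 22.04 Deep Learning AMIs in your region
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=*Deep Learning*Ubuntu 22.04*" \
  --query 'sort_by(Images, &CreationDate)[-5:].[ImageId,Name]' \
  --output table \
  --region us-east-1
```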
Step 2: The “User Data” Automation
Create a file named install_ollama.sh. This script runs automatically when the instance launches.
```bash
#!/bin/bash
# 1. Update and install basic tools
apt-get update
apt-get install -y curl

# 2. Install Ollama (the easiest way to run LLMs)
curl -fsSL https://ollama.com/install.sh | sh

# 3. Configure Ollama to listen on all interfaces (so you can access it remotely)
mkdir -p /etc/systemd/system/ollama.service.d
echo '[Service]
Environment="OLLAMA_HOST=0.0.0.0"' > /etc/systemd/system/ollama.service.d/override.conf
systemctl daemon-reload
systemctl restart ollama

# 4. Wait for Ollama to start, then pull the model
# We use "nohup" so the download continues even after this script exits
nohup bash -c 'sleep 10 && ollama pull deepseek-r1:8b' &
```
Step 3: Deploy & Verify
- Initialize Terraform: `terraform init`
- Apply: `terraform apply`
- Wait ~2 minutes. Terraform will output your `instance_ip`.
SSH into the box to check the progress:
```bash
ssh -i your-key.pem ubuntu@<INSTANCE_IP>
tail -f /var/log/cloud-init-output.log
```
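While you are on the box, two quick sanity checks confirm that the GPU is visible and that the model pull has finished (both tools are already there: `nvidia-smi` ships with the Deep Learning AMI, `ollama` was installed by the user-data script):

```bash
# Should show the NVIDIA T4 and its 16 GB of VRAM
nvidia-smi

# deepseek-r1:8b appears here only once the pull has completed
ollama list
```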
Once cloud-init has finished and `ollama list` shows the model, test the API from your local laptop:
```bash
curl http://<INSTANCE_IP>:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Write a python script to parse AWS CloudTrail logs",
  "stream": false
}'
```
If you get a JSON response with code, congratulations. You now have your own private DeepSeek API running for pennies per hour.
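In scripts you usually want just the generated text rather than the whole JSON envelope. Assuming you have `jq` installed locally, a small variation of the same call does the trick:

```bash
curl -s http://<INSTANCE_IP>:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Summarize what an AWS security group does in one sentence",
  "stream": false
}' | jq -r '.response'
```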
Optimization: How to Stop “Re-Downloading” the Model
The script above downloads the 5GB model every time the Spot instance starts. This wastes bandwidth and takes ~3 minutes.
The “Pro” Fix:
- Create a 100 GB EBS volume manually (in the same Availability Zone where your Spot instance runs; EBS volumes cannot cross AZs).
- Update your Terraform to attach this volume to the Spot instance, mount it at /var/lib/ollama, and point Ollama's model directory (OLLAMA_MODELS) at it, as sketched below.
- Now the model files persist. Even if AWS reclaims your Spot instance, the data stays on the EBS volume; when a new instance launches, Ollama finds the existing files and is ready almost instantly.
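Here is a minimal user-data sketch of that idea. It assumes the volume is already attached (on a g4dn instance an extra EBS volume usually shows up as a second NVMe device such as `/dev/nvme1n1`; confirm with `lsblk`), that Ollama is installed as in `install_ollama.sh`, and that you point Ollama's `OLLAMA_MODELS` variable at the mount:

```bash
#!/bin/bash
# Sketch: reuse a pre-created EBS volume for Ollama's model files.
DEVICE=/dev/nvme1n1        # assumption - verify the device name with lsblk
MOUNT=/var/lib/ollama

mkdir -p "$MOUNT"
# Format only on first use: blkid exits non-zero if the device has no filesystem yet
blkid "$DEVICE" || mkfs.ext4 "$DEVICE"
mount "$DEVICE" "$MOUNT"
chown -R ollama:ollama "$MOUNT"

# Point the Ollama service at the persistent model directory
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_MODELS=/var/lib/ollama"
EOF
systemctl daemon-reload
systemctl restart ollama

# First boot downloads the model onto the EBS volume; later boots find it and skip the pull
ollama pull deepseek-r1:8b
```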
Conclusion
By moving from On-Demand to Spot, you reduce your AI lab costs from roughly $380/month to roughly $110/month (if running 24/7), or about $0.15 for a quick one-hour experiment. For DevOps engineers, this is the most cost-effective way to experiment with private AI agents, log analysis, and code generation without leaking company data to public APIs.
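One last habit worth building: when an experiment is over, tear the stack down so the meter stops running, and double-check in the EC2 console that the instance is actually terminated (on older AWS provider versions, destroying a Spot request does not always kill an already-fulfilled instance).

```bash
# Cancels the Spot request and removes the security group
terraform destroy
```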