## TL;DR

Ollama makes running LLMs locally as easy as Docker: one command to download a model, one command to chat. Pull Llama 3.1, Mistral, Gemma, or Code Llama and chat privately, with no API keys, no cloud, and no data leaving your machine.
## Installation

```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Start the server (runs in the background)
ollama serve

# Verify the install
ollama --version
```
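Once `ollama serve` is running, the server listens on port 11434. A quick way to confirm it is up is to query the `/api/tags` endpoint, which returns the installed models. A minimal Python sketch using only the standard library (assumes the default port and a running server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def model_names(tags_json):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

def list_local_models():
    """Query the running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return model_names(resp.read())

if __name__ == "__main__":
    print(list_local_models())
```

If the call fails with a connection error, the server is not running; start it with `ollama serve`.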
## Quick Start

```bash
# Pull and run a model
ollama run llama3.1
# You're now chatting with Llama 3.1 locally!

# Pull specific models
ollama pull mistral        # 7B general purpose
ollama pull codellama      # Code generation
ollama pull gemma2         # Google's model
ollama pull phi3           # Microsoft's small model
ollama pull mixtral        # Mixture of experts
ollama pull llama3.1:70b   # Large model (needs 64GB+ RAM)
```
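Models can also be pulled programmatically through the server's `/api/pull` endpoint, which is handy for provisioning scripts. A sketch in Python, assuming a running server on the default port (the `model` field follows the current API docs; older versions used `name`):

```python
import json
import urllib.request

def build_pull_request(model):
    """Build the JSON body for POST /api/pull; "stream": false waits for completion."""
    return json.dumps({"model": model, "stream": False}).encode()

def pull_model(model):
    """Ask the local Ollama server to download a model."""
    req = urllib.request.Request(
        "http://localhost:11434/api/pull",
        data=build_pull_request(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # e.g. {"status": "success"} when done

if __name__ == "__main__":
    print(pull_model("phi3"))
```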
## API Usage

```bash
# Chat completion (OpenAI-compatible endpoint)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'

# Generate (simpler native endpoint; streams JSON lines unless "stream": false)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Write a haiku about coding", "stream": false}'
```
```bash
# List installed models
ollama list

# Show model details
ollama show llama3.1
```
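Because the chat endpoint is OpenAI-compatible, any HTTP client works and no SDK is required. A minimal Python sketch using only the standard library (assumes the server is running on the default port):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama port

def build_chat_request(model, messages, stream=False):
    """Build the JSON body for the OpenAI-compatible chat endpoint."""
    return json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode()

def chat(model, prompt):
    """Send one user message and return the assistant's reply text."""
    body = build_chat_request(model, [{"role": "user", "content": prompt}])
    req = urllib.request.Request(
        f"{OLLAMA_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("llama3.1", "Explain Docker in 3 sentences"))
```

The same compatibility means existing OpenAI client libraries can be pointed at `http://localhost:11434/v1` with any placeholder API key.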
Custom Models (Modelfile)
FROM llama3.1
SYSTEM """
You are a senior DevOps engineer. Answer questions about Docker,
Kubernetes, CI/CD, and infrastructure. Be concise and practical.
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
Build and run:
ollama create devops-assistant -f Modelfile
ollama run devops-assistant
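If you only need different sampling settings occasionally, the same parameters (`temperature`, `top_p`, `num_ctx`) can instead be passed per request through the native API's `options` field, without building a custom model. A sketch, assuming the default port and a running server:

```python
import json
import urllib.request

def build_generate_request(model, prompt, **options):
    """Build an /api/generate body; extra kwargs become sampling options."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,     # return one JSON object instead of a stream
        "options": options,  # e.g. temperature, top_p, num_ctx
    }).encode()

def generate(model, prompt, **options):
    """Run one prompt and return the generated text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=build_generate_request(model, prompt, **options),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate("llama3.1", "Write a haiku about coding",
                   temperature=0.7, top_p=0.9, num_ctx=4096))
```

A Modelfile is the better choice when the system prompt and parameters should travel with the model; per-request options are better for one-off experiments.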
## Popular Models

| Model | Size | Use Case |
|---|---|---|
| llama3.1 | 8B | General purpose |
| llama3.1:70b | 70B | High quality |
| mistral | 7B | Fast, general |
| mixtral | 47B | Mixture of experts |
| codellama | 7B-34B | Code generation |
| gemma2 | 9B-27B | Google's model |
| phi3 | 3.8B | Small, fast |
| deepseek-coder | 6.7B | Code |
## Summary
- Ollama runs LLMs locally with one command — no cloud, no API keys, full privacy
- Pull models like Docker images: ollama pull llama3.1
- OpenAI-compatible REST API on localhost:11434 for application integration
- GPU acceleration (NVIDIA, AMD, Apple Metal) used automatically when available
- Custom models via Modelfiles with system prompts and parameter tuning