TL;DR — Quick Summary

Ollama makes running LLMs locally as easy as Docker: one command to download a model, one command to chat. Pull Llama 3, Mistral, Gemma, or Code Llama and talk to it entirely on your own machine — no API keys, no cloud, and not a single byte sent off-device.

Installation

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Start the server (runs in background)
ollama serve

# Verify
ollama --version
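Besides `ollama --version`, you can verify the server itself: a plain GET on the root path of the default port (11434) returns an HTTP 200 while Ollama is running. A minimal Python health check using only the standard library — a sketch assuming the default port (it will differ if you changed OLLAMA_HOST):

```python
import urllib.request


def ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```

Handy as a startup guard in scripts that shell out to `ollama run`.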

Quick Start

# Pull and run a model
ollama run llama3.1
# You're now chatting with Llama 3.1 locally!

# Pull specific models
ollama pull mistral       # 7B general purpose
ollama pull codellama     # Code generation
ollama pull gemma2        # Google's model
ollama pull phi3          # Microsoft's small model
ollama pull mixtral       # Mixture of experts
ollama pull llama3.1:70b  # Large model (needs 64GB+ RAM)

API Usage

# Chat completion (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'

# Generate (simple)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Write a haiku about coding"}'

# List installed models
ollama list

# Show model details
ollama show llama3.1
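Because the /v1 endpoint speaks the OpenAI wire format, the curl call above translates directly into application code. A minimal Python sketch using only the standard library, assuming a server on the default port with llama3.1 already pulled:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_payload(model: str, prompt: str) -> dict:
    """Build a single-turn, OpenAI-style chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """Send one chat turn to Ollama and return the reply text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (with the server running):
#   print(chat("llama3.1", "Explain Docker in 3 sentences"))
```

Any OpenAI-compatible SDK works the same way — point its base URL at http://localhost:11434/v1 and use any string as the API key.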

Custom Models (Modelfile)

FROM llama3.1

SYSTEM """
You are a senior DevOps engineer. Answer questions about Docker,
Kubernetes, CI/CD, and infrastructure. Be concise and practical.
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

Build and run:

ollama create devops-assistant -f Modelfile
ollama run devops-assistant

Popular Models

Model           Size      Use Case
llama3.1        8B        General purpose
llama3.1:70b    70B       High quality
mistral         7B        Fast, general
mixtral         47B       Mixture of experts
codellama       7-34B     Code generation
gemma2          9B-27B    Google's model
phi3            3.8B      Small, fast
deepseek-coder  6.7B      Code generation

Summary

  • Ollama runs LLMs locally with one command — no cloud, no API keys, full privacy
  • Pull models like Docker images: ollama pull llama3.1
  • OpenAI-compatible REST API on localhost:11434 for application integration
  • GPU acceleration (NVIDIA, AMD, Apple Metal) used automatically when available
  • Custom models via Modelfiles with system prompts and parameter tuning