TL;DR — Quick Summary

Ollama makes running LLMs locally as easy as Docker: one command to download a model, one command to chat. Pull Llama 3, Mistral, Gemma, or Code Llama and talk to it entirely on your own machine — no API keys, no cloud, and not a single byte sent off-device.

Installation

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Start the server (runs in background)
ollama serve

# Verify
ollama --version
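Besides `ollama --version`, you can verify the server itself: a plain GET on the root path of the default port (11434) returns an HTTP 200 while Ollama is running. A minimal Python health check using only the standard library — a sketch assuming the default port (it will differ if you changed OLLAMA_HOST):

```python
import urllib.request


def ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```

Handy as a startup guard in scripts that shell out to `ollama run`.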

Quick Start

# Pull and run a model
ollama run llama3.1
# You're now chatting with Llama 3.1 locally!

# Pull specific models
ollama pull mistral       # 7B general purpose
ollama pull codellama     # Code generation
ollama pull gemma2        # Google's model
ollama pull phi3          # Microsoft's small model
ollama pull mixtral       # Mixture of experts
ollama pull llama3.1:70b  # Large model (needs 64GB+ RAM)

API Usage

# Chat completion (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'

# Generate (simple)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Write a haiku about coding"}'

# List installed models
ollama list

# Show model details
ollama show llama3.1
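Because the /v1 endpoint speaks the OpenAI wire format, the curl call above translates directly into application code. A minimal Python sketch using only the standard library, assuming a server on the default port with llama3.1 already pulled:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_payload(model: str, prompt: str) -> dict:
    """Build a single-turn, OpenAI-style chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """Send one chat turn to Ollama and return the reply text."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (with the server running):
#   print(chat("llama3.1", "Explain Docker in 3 sentences"))
```

Any OpenAI-compatible SDK works the same way — point its base URL at http://localhost:11434/v1 and use any string as the API key.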

Custom Models (Modelfile)

FROM llama3.1

SYSTEM """
You are a senior DevOps engineer. Answer questions about Docker,
Kubernetes, CI/CD, and infrastructure. Be concise and practical.
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

Build and run:

ollama create devops-assistant -f Modelfile
ollama run devops-assistant

Popular Models

Model           Size      Use Case
llama3.1        8B        General purpose
llama3.1:70b    70B       High quality
mistral         7B        Fast, general
mixtral         47B       Mixture of experts
codellama       7-34B     Code generation
gemma2          9B-27B    Google's model
phi3            3.8B      Small, fast
deepseek-coder  6.7B      Code generation

Summary

  • Ollama runs LLMs locally with one command — no cloud, no API keys, full privacy
  • Pull models like Docker images: ollama pull llama3.1
  • OpenAI-compatible REST API on localhost:11434 for application integration
  • GPU acceleration (NVIDIA, AMD, Apple Metal) used automatically when available
  • Custom models via Modelfiles with system prompts and parameter tuning