TL;DR — Quick Summary
Run OpenAI's Whisper speech-to-text model locally for free, private audio transcription. Covers CLI, Docker, GPU acceleration, Whisper.cpp for CPU, faster-whisper, and web UI options.
What Is Whisper?
Whisper is OpenAI’s open-source automatic speech recognition (ASR) model. It can transcribe audio in 99 languages, translate speech to English, and generate subtitles with accurate timestamps — all running locally on your own hardware, completely free and private.
Key features:
- 99 languages — transcripts in the original language or translated to English
- Multiple model sizes — from tiny (75MB) to large-v3 (3GB)
- Subtitle generation — SRT, VTT, and TSV formats with timestamps
- No API needed — runs entirely offline
- GPU acceleration — CUDA for NVIDIA GPUs
- Multiple implementations — Python, C++ (whisper.cpp), faster-whisper
Model Comparison
| Model | Size | VRAM | English WER | Speed (GPU) | Speed (CPU) |
|---|---|---|---|---|---|
| tiny | 75 MB | ~1 GB | 8.0% | ~32x realtime | ~10x realtime |
| base | 142 MB | ~1 GB | 5.7% | ~16x realtime | ~7x realtime |
| small | 466 MB | ~2 GB | 4.2% | ~6x realtime | ~2x realtime |
| medium | 1.5 GB | ~5 GB | 3.5% | ~2x realtime | ~0.5x realtime |
| large-v3 | 3 GB | ~6 GB | 2.9% | ~1x realtime | ~0.1x realtime |
Tip: For most use cases, `small` or `medium` offers the best accuracy vs. speed tradeoff. Use `large-v3` only when accuracy is critical.
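The tradeoff in the table can be captured in a small helper. This is an illustrative sketch only; the `pick_model` function and its thresholds are assumptions derived from the VRAM column above, not part of any Whisper API:

```python
# VRAM needed per model, from the comparison table above (GB).
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v3": 6}

def pick_model(vram_gb: float, accuracy_critical: bool = False) -> str:
    """Pick the largest model that fits in the given VRAM budget."""
    if accuracy_critical and vram_gb >= VRAM_GB["large-v3"]:
        return "large-v3"
    # Prefer medium/small for the best accuracy-vs-speed tradeoff.
    for model in ("medium", "small", "base", "tiny"):
        if vram_gb >= VRAM_GB[model]:
            return model
    return "tiny"  # fallback: smallest model, will run (slowly) anywhere

print(pick_model(4))                            # small
print(pick_model(8, accuracy_critical=True))    # large-v3
```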
Installation
Python (Original)
pip install openai-whisper
# Transcribe
whisper audio.mp3 --model base
# Transcribe with language detection and SRT output
whisper interview.wav --model small --output_format srt
# Translate to English
whisper audio_spanish.mp3 --model medium --task translate
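The CLI above wraps a Python API: `whisper.load_model(...).transcribe(...)` returns a dict whose `"segments"` each carry `start`, `end`, and `text`. A sketch of turning those segments into SRT text; the `segments_to_srt` and `srt_time` helpers are illustrative, not part of the package:

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render whisper-style segment dicts as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# With the real API this would be:
#   import whisper
#   result = whisper.load_model("base").transcribe("audio.mp3")
#   print(segments_to_srt(result["segments"]))
demo = [{"start": 0.0, "end": 2.4, "text": " Hello there."}]
print(segments_to_srt(demo))
```

(In practice `--output_format srt` does this for you; the sketch is useful when you want custom post-processing of segments.)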
faster-whisper (4x Faster)
pip install faster-whisper
python -c "
from faster_whisper import WhisperModel
model = WhisperModel('base', device='cuda', compute_type='float16')
segments, info = model.transcribe('audio.mp3', beam_size=5)
for segment in segments:
    print(f'[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}')
"
whisper.cpp (Best for CPU)
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp && make
# Download a model
bash models/download-ggml-model.sh base
# Transcribe (16-bit, 16 kHz mono WAV input required)
./main -m models/ggml-base.bin -f audio.wav -otxt -osrt
# Note: newer CMake-based builds name the binary build/bin/whisper-cli
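whisper.cpp is strict about input format; convert first with `ffmpeg -i in.mp3 -ar 16000 -ac 1 -c:a pcm_s16le out.wav`. A quick pre-flight check using only the standard library (the `is_whisper_cpp_ready` helper name is illustrative):

```python
import wave

def is_whisper_cpp_ready(path: str) -> bool:
    """Check that a WAV file is 16-bit, 16 kHz mono as whisper.cpp expects."""
    with wave.open(path, "rb") as w:
        return (
            w.getnchannels() == 1
            and w.getframerate() == 16000
            and w.getsampwidth() == 2  # 2 bytes per sample = 16-bit
        )

# Demo: write half a second of silence in the expected format, then check it.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setframerate(16000)
    w.setsampwidth(2)
    w.writeframes(b"\x00\x00" * 8000)

print(is_whisper_cpp_ready("demo.wav"))  # True
```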
Docker
docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
# The service then exposes a REST API (with Swagger UI) at http://localhost:9000
Web UI Options
For a browser-based interface:
| Project | Description | Docker |
|---|---|---|
| Whishper | Self-hosted UI for transcription and subtitle editing | pluja/whishper (Docker Compose) |
| Whisper ASR Webservice | REST API with Swagger UI | onerahmet/openai-whisper-asr-webservice |
| Subtitle Edit | Full editor with Whisper integration | Desktop app |
Common Use Cases
| Use Case | Model | Command |
|---|---|---|
| Meeting transcription | small | whisper meeting.mp3 --model small |
| Video subtitles | medium | whisper video.mp4 --model medium --output_format srt |
| Podcast transcription | base | whisper podcast.mp3 --model base --output_format txt |
| Translate foreign audio | medium | whisper foreign.mp3 --model medium --task translate |
| Batch process folder | base | for f in *.mp3; do whisper "$f" --model base; done |
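The batch one-liner above can also be driven from Python, which makes it easier to filter files or log failures. This sketch only builds the CLI invocations (the folder name and `build_commands` helper are illustrative), leaving execution to `subprocess.run`:

```python
from pathlib import Path

def build_commands(folder: str, model: str = "base", fmt: str = "srt"):
    """Build one whisper CLI invocation per .mp3 file in a folder."""
    return [
        ["whisper", str(p), "--model", model, "--output_format", fmt]
        for p in sorted(Path(folder).glob("*.mp3"))
    ]

# To actually run them:
#   import subprocess
#   for cmd in build_commands("podcasts"):
#       subprocess.run(cmd, check=True)
```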
Whisper vs. Cloud Speech-to-Text
| Aspect | Whisper (Local) | Google STT | AWS Transcribe |
|---|---|---|---|
| Cost | Free | $0.006-0.048/min | $0.024/min |
| Privacy | ✅ Data stays local | ❌ Cloud | ❌ Cloud |
| Offline | ✅ | ❌ | ❌ |
| Languages | 99 | 125+ | 100+ |
| Accuracy | Excellent | Excellent | Good |
| Custom vocabulary | ❌ | ✅ | ✅ |
| Real-time streaming | Limited | ✅ | ✅ |
| Speaker diarization | Via plugin | ✅ | ✅ |
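The cost row above translates directly into break-even arithmetic. A toy estimate, with the per-minute rate copied from the table (your own hardware and electricity costs are assumptions you would fill in):

```python
def cloud_cost(minutes_per_month: float, rate_per_min: float) -> float:
    """Monthly cloud transcription cost in dollars."""
    return minutes_per_month * rate_per_min

# 1,000 minutes/month at AWS Transcribe's $0.024/min (from the table above):
print(f"${cloud_cost(1000, 0.024):.2f}/month")  # $24.00/month
```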
Summary
Whisper gives you state-of-the-art speech-to-text transcription running locally, privately, and for free. Use the Python package for quick transcriptions, faster-whisper for GPU-accelerated performance, or whisper.cpp for efficient CPU-only operation.