Why Paperless-ngx?
Stacks of paper, scattered PDFs, endless email attachments — Paperless-ngx tames the chaos:
- OCR everything — Extract text from scans, photos, and PDFs.
- Auto-classify — Rules assign tags, correspondents, and types automatically.
- Full-text search — Find any document by its content in seconds.
- Email ingestion — Automatically import email attachments.
- Web UI — Modern, responsive dashboard with previews.
Prerequisites
- Docker with the Compose plugin (it provides the docker compose command used below).
- At least 1 GB RAM (Tesseract OCR is RAM-hungry).
- Storage space for your documents.
Step 1: Deploy with Docker Compose
# docker-compose.yml
version: "3"
services:
  paperless-redis:
    image: redis:7
    restart: always
  paperless-db:
    image: postgres:16
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme  # replace before deploying
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always
  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on:
      - paperless-redis
      - paperless-db
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-db
      PAPERLESS_DBPASS: changeme  # must match POSTGRES_PASSWORD above
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: changeme  # replace before deploying
      PAPERLESS_OCR_LANGUAGE: eng+spa
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    restart: always
volumes:
  pgdata:
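Start the stack from the directory that contains docker-compose.yml:
docker compose up -d
Then open http://your-server:8000 and log in with the admin credentials set in the compose file.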
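The first start can take a while because the database and the web process have to initialize. The following is a minimal sketch of a readiness check, assuming the web UI is reachable at the URL given above; the function names `is_up` and `wait_until_up` are hypothetical helpers, not part of Paperless-ngx.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the web process is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not ready yet.
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, probe=is_up) -> bool:
    """Call probe(url) until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_up("http://your-server:8000")` after `docker compose up -d` would block until the UI responds or roughly a minute has passed. The `probe` parameter exists only so the loop can be exercised without a network.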
Step 2: Consumption Workflow
Drop files into the consume/ directory:
cp invoice-2026-02.pdf /path/to/consume/
Paperless automatically:
- Detects the new file.
- Runs OCR (Tesseract) to extract text.
- Applies matching rules to assign tags/correspondent/type.
- Stores the original and creates a searchable archive version.
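If scans land in some other folder first (a scanner share, a downloads directory), the copy step can be scripted. This is a rough sketch under stated assumptions: the `SUFFIXES` set is an illustrative subset of file types, and `queue_for_consumption` is a hypothetical helper name; only the consume/ directory itself comes from the setup above.

```python
import shutil
from pathlib import Path

# Assumed subset of file types worth queueing; extend it to match your setup.
SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def queue_for_consumption(source_dir: str, consume_dir: str) -> list[str]:
    """Copy supported files from source_dir into consume_dir; return the copied names."""
    src, dst = Path(source_dir), Path(consume_dir)
    copied = []
    for path in sorted(src.iterdir()):
        # Skip subdirectories and anything with an unlisted extension.
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

For example, `queue_for_consumption("/path/to/scans", "/path/to/consume")` copies every matching file and leaves the originals in place, so nothing is lost if a document fails to process.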
Step 3: Auto-Tagging Rules
| Rule Type | Example | Use Case |
|---|---|---|
| Tag | Contains “electricity” → tag “Utilities” | Categorize by topic |
| Correspondent | Contains “Telmex” → correspondent “Telmex” | Identify sender |
| Document Type | Contains “factura” → type “Invoice” | Classify document kind |
| Storage Path | Year/Correspondent/ | Organize filesystem |
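To make the idea behind the rules in the table concrete, here is a small illustrative sketch of content-based matching. It is not Paperless-ngx's implementation: the mode names ("any", "all", "exact", "regex"), the `RULES` list, and the functions `matches` and `classify` are assumptions chosen for the example.

```python
import re


def matches(content: str, pattern: str, mode: str = "any") -> bool:
    """Return True if content satisfies pattern under the given mode."""
    text = content.lower()
    words = pattern.lower().split()
    if mode == "any":      # at least one word of the pattern appears
        return any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
    if mode == "all":      # every word of the pattern appears, in any order
        return all(re.search(rf"\b{re.escape(w)}\b", text) for w in words)
    if mode == "exact":    # the whole pattern appears as one phrase
        return re.search(rf"\b{re.escape(pattern.lower())}\b", text) is not None
    if mode == "regex":    # the pattern is a regular expression
        return re.search(pattern, content, re.IGNORECASE) is not None
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical rules mirroring the table above: (kind, pattern, mode, value).
RULES = [
    ("tag", "electricity", "any", "Utilities"),
    ("correspondent", "Telmex", "exact", "Telmex"),
    ("document_type", "factura", "any", "Invoice"),
]


def classify(content: str) -> dict[str, list[str]]:
    """Collect the value of every rule whose pattern matches the OCR text."""
    result: dict[str, list[str]] = {}
    for kind, pattern, mode, value in RULES:
        if matches(content, pattern, mode):
            result.setdefault(kind, []).append(value)
    return result
```

Matching is case-insensitive and word-bounded here, so "Factura" in a scanned invoice triggers the "factura" rule while a longer word that merely contains it does not.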
Step 4: Email Ingestion
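In current Paperless-ngx releases, IMAP ingestion is configured in the web UI (mail accounts and mail rules) rather than through environment variables. Create a mail account with these values:
IMAP server: imap.gmail.com
IMAP port: 993
Username: docs@example.com
Password: app-password
Then add a mail rule for that account that reads the INBOX folder and consumes attachments.
Paperless checks the account periodically and ingests attachments from messages that match the rule.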
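Before relying on ingestion, it can help to confirm that the mail server accepts the host, port, and credentials shown above. The sketch below uses Python's standard `imaplib`; `check_imap_login` is a hypothetical helper, and the `client_factory` parameter is only there so the logic can be tested without a real server.

```python
import imaplib


def check_imap_login(host: str, port: int, username: str, password: str,
                     folder: str = "INBOX", client_factory=imaplib.IMAP4_SSL) -> bool:
    """Return True if login and folder selection both succeed."""
    try:
        client = client_factory(host, port)
    except OSError:
        return False  # DNS failure, refused connection, or TLS error
    try:
        client.login(username, password)
        # Open the folder read-only so the check never changes message flags.
        status, _ = client.select(folder, readonly=True)
        return status == "OK"
    except imaplib.IMAP4.error:
        return False  # rejected credentials or a protocol-level failure
    finally:
        try:
            client.logout()
        except (imaplib.IMAP4.error, OSError):
            pass
```

With the example values, the call would be `check_imap_login("imap.gmail.com", 993, "docs@example.com", "app-password")`. A `False` result points at the account or network rather than at Paperless itself.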
Troubleshooting
| Problem | Solution |
|---|---|
| OCR produces garbage text | Set PAPERLESS_OCR_LANGUAGE to the languages in your documents, e.g. eng+spa+deu; languages not bundled with the image must also be listed in PAPERLESS_OCR_LANGUAGES so they get installed |
| Document stuck in “Processing” | Check container logs: docker compose logs paperless; a common cause is OCR failing on a corrupt file |
| Duplicate documents detected | Expected behavior: Paperless compares a hash of each file's content against existing documents and rejects exact duplicates |
| Search returns no results | Rebuild the search index: docker compose exec paperless document_index reindex |
| Email ingestion not working | Test the IMAP credentials manually; providers such as Gmail require an app-specific password instead of the account password |
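The duplicate check in the table rests on a simple idea: two files with identical bytes produce the same hash. The sketch below illustrates that idea for files on disk before they are dropped into consume/. SHA-256 is chosen here for illustration and is not necessarily the hash Paperless uses; `file_checksum` and `find_duplicates` are hypothetical helpers.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by checksum and keep only groups with more than one file."""
    groups: dict[str, list[str]] = {}
    for path in paths:
        groups.setdefault(file_checksum(path), []).append(path)
    return {digest: same for digest, same in groups.items() if len(same) > 1}
```

Note that a hash over file bytes only catches exact copies: the same page scanned twice yields two different files and would not be flagged by this approach.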
Summary
- Drop files into a folder — Paperless handles OCR and classification.
- Matching rules auto-tag documents by content, saving manual work.
- Full-text search finds any document in seconds.
- Email ingestion automates document input.