Why Paperless-ngx?

Stacks of paper, scattered PDFs, endless email attachments — Paperless-ngx tames the chaos:

  • OCR everything — Extract text from scans, photos, and PDFs.
  • Auto-classify — Rules assign tags, correspondents, and types automatically.
  • Full-text search — Find any document by its content in seconds.
  • Email ingestion — Automatically import email attachments.
  • Web UI — Modern, responsive dashboard with previews.

Prerequisites

  • Docker with the Compose plugin (the docker compose command used below).
  • At least 1 GB RAM (Tesseract OCR is RAM-hungry).
  • Storage space for your documents.

Step 1: Deploy with Docker Compose

# docker-compose.yml
version: "3"
services:
  paperless-redis:
    image: redis:7
    restart: always

  paperless-db:
    image: postgres:16
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    depends_on:
      - paperless-redis
      - paperless-db
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-db
      PAPERLESS_DBPASS: changeme  # must match POSTGRES_PASSWORD above
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: changeme
      PAPERLESS_OCR_LANGUAGE: eng+spa
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    restart: always

volumes:
  pgdata:

Start the stack:

docker compose up -d

Open the web UI at http://your-server:8000 and log in with the admin credentials set in the compose file.
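
The compose file uses "changeme" as a placeholder for both the database and the admin password. A minimal sketch for generating replacements with Python's standard library follows; the helper name generate_credentials is illustrative, not part of Paperless-ngx.

```python
import secrets


def generate_credentials(names, nbytes=32):
    """Return a dict mapping each variable name to a fresh random secret."""
    # token_urlsafe produces only letters, digits, '-' and '_'.
    return {name: secrets.token_urlsafe(nbytes) for name in names}


# The two settings the compose file leaves at "changeme".
creds = generate_credentials(["POSTGRES_PASSWORD", "PAPERLESS_ADMIN_PASSWORD"])
for name, value in creds.items():
    print(f"{name}={value}")
```

Paste the printed values into docker-compose.yml before the first start. Paperless must be given the same database password that Postgres is initialized with.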


Step 2: Consumption Workflow

Drop files into the consume/ directory:

cp invoice-2026-02.pdf /path/to/consume/

Paperless automatically:

  1. Detects the new file.
  2. Runs OCR (Tesseract) to extract text.
  3. Applies matching rules to assign tags/correspondent/type.
  4. Stores the original and creates a searchable archive version.
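
Large files copied over a slow link can sit half-written in consume/ for a while. As a precaution, the copy can be staged elsewhere and then renamed into place. The sketch below assumes a staging directory on the same filesystem as consume/, so the final step is a single rename; the function name and the staging directory are hypothetical.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(src, consume_dir, staging_dir):
    """Copy src into staging_dir, then rename it into consume_dir.

    staging_dir should be on the same filesystem as consume_dir so that
    os.replace is an atomic rename and the file appears fully written.
    An existing file with the same name in consume_dir is overwritten.
    """
    src = Path(src)
    consume_dir = Path(consume_dir)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    staged = staging_dir / src.name
    shutil.copy2(src, staged)       # slow part: happens outside consume/
    target = consume_dir / src.name
    os.replace(staged, target)      # fast part: one rename into consume/
    return target
```

For example, drop_into_consume("invoice-2026-02.pdf", "/path/to/consume", "/path/to/staging") leaves the source file in place and returns the path of the copy inside consume/.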

Step 3: Auto-Tagging Rules

Rule Type     | Example                                    | Use Case
Tag           | Contains “electricity” → tag “Utilities”   | Categorize by topic
Correspondent | Contains “Telmex” → correspondent “Telmex” | Identify sender
Document Type | Contains “factura” → type “Invoice”        | Classify document kind
Storage Path  | Year/Correspondent/                        | Organize filesystem
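
To make the idea concrete, here is a rough model of the rules in the table. It is not Paperless-ngx's implementation: it only models a case-insensitive "contains" match, and the Rule class, the "first matching rule wins" policy, and the storage_path helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    keyword: str  # text to look for, compared case-insensitively
    field: str    # "tags", "correspondent", or "document_type"
    value: str    # what to assign when the keyword is found


RULES = [
    Rule("electricity", "tags", "Utilities"),
    Rule("Telmex", "correspondent", "Telmex"),
    Rule("factura", "document_type", "Invoice"),
]


def classify(text, rules=RULES):
    """Return the metadata that the rules assign to the OCR'd text."""
    lowered = text.lower()
    result = {"tags": [], "correspondent": None, "document_type": None}
    for rule in rules:
        if rule.keyword.lower() not in lowered:
            continue
        if rule.field == "tags":
            result["tags"].append(rule.value)  # a document can carry many tags
        elif result[rule.field] is None:
            result[rule.field] = rule.value    # first matching rule wins
    return result


def storage_path(year, correspondent):
    """Build a Year/Correspondent/ style relative path."""
    return f"{year}/{correspondent or 'none'}/"
```

Calling classify("Factura de Telmex: electricity charges") assigns the tag "Utilities", the correspondent "Telmex", and the type "Invoice"; storage_path(2026, "Telmex") then gives "2026/Telmex/".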

Step 4: Email Ingestion

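Paperless-ngx fetches mail through mail accounts and mail rules, which are configured in the web UI rather than through environment variables. Add a mail account with these settings:

IMAP server: imap.gmail.com
IMAP port: 993
IMAP security: SSL
Username: docs@example.com
Password: app-password

Then add a mail rule for that account with the folder set to INBOX. Paperless periodically checks that folder for new emails with attachments and ingests them automatically.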
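
Before relying on ingestion, it can help to confirm that the IMAP credentials work outside Paperless. The sketch below uses Python's standard imaplib and opens the folder read-only, so no message is marked as read. The count_unseen name and its connect parameter (a hook for substituting a fake connection in tests) are illustrative.

```python
import imaplib


def count_unseen(host, port, username, password, folder="INBOX",
                 connect=imaplib.IMAP4_SSL):
    """Log in over IMAP with TLS and return the number of unseen messages."""
    conn = connect(host, port)
    try:
        # Raises imaplib.IMAP4.error if the credentials are rejected.
        conn.login(username, password)
        # readonly=True keeps message flags untouched.
        status, _ = conn.select(folder, readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot open folder {folder!r}")
        status, data = conn.search(None, "UNSEEN")
        if status != "OK":
            raise RuntimeError("search failed")
        # data[0] is a space-separated byte string of message numbers.
        return len(data[0].split())
    finally:
        conn.logout()
```

With the example values from this section, the call would be count_unseen("imap.gmail.com", 993, "docs@example.com", "app-password"). A login error here points to the credentials rather than to Paperless.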


Troubleshooting

Problem                        | Solution
OCR produces garbage text      | Set PAPERLESS_OCR_LANGUAGE to the languages in your documents, e.g. eng+spa+deu; languages not bundled with the image must also be listed in PAPERLESS_OCR_LANGUAGES so they get installed
Document stuck in “Processing” | Check the container logs with docker compose logs paperless; the usual cause is a Tesseract crash on a corrupt file
Duplicate documents detected   | Expected behavior: Paperless detects duplicates by content hash and rejects them
Search returns no results      | Rebuild the search index: docker compose exec paperless document_index reindex
Email ingestion not working    | Test the IMAP credentials manually; providers such as Gmail typically require an app-specific password
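
When search looks empty, querying the REST API directly shows whether the index returns anything at all. The sketch below assumes token authentication and the documents endpoint with a query parameter; the token value and helper names are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query):
    """Build a GET request for a full-text search against the documents API."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def search_count(base_url, token, query, timeout=10):
    """Return how many documents the API reports for the query."""
    request = build_search_request(base_url, token, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["count"]
```

For example, search_count("http://your-server:8000", "your-api-token", "electricity") should return a non-zero number once a matching document has been consumed. If it returns 0 for text you know is present, rebuild the index as described in the table.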

Summary

  • Drop files into a folder — Paperless handles OCR and classification.
  • Matching rules auto-tag documents by content, saving manual work.
  • Full-text search finds any document in seconds.
  • Email ingestion automates document input.