TL;DR — Quick Summary

Set up Grafana Loki for centralized log aggregation with Promtail. Configure LogQL queries, dashboards, alerting, and multi-tenant log pipelines in production.

Grafana Loki is a horizontally scalable log aggregation system built on the same principles as Prometheus — it indexes only labels, not the full log text, which makes it dramatically cheaper to operate than the ELK stack. This guide covers Loki’s architecture, installation via Docker Compose, Promtail configuration, LogQL query language, Grafana dashboard building, alerting, multi-tenancy, and production sizing.

Prerequisites

  • Docker and Docker Compose (or a Kubernetes cluster for Helm-based install).
  • Grafana 10+ (or Grafana Cloud) for visualization.
  • Basic familiarity with YAML configuration and command-line tools.

Loki Architecture

Loki splits its work across four main components:

  • Distributor — Receives log streams from Promtail/agents, validates labels, and fans out to ingesters via consistent hashing.
  • Ingester — Holds recent logs in memory (configurable chunk size), flushes compressed chunks to object storage, and maintains a WAL for crash safety.
  • Querier — Executes LogQL queries across both ingested (in-memory) and stored (object storage) chunks.
  • Compactor — Merges and deduplicates the index tables shipped by tsdb-shipper (or boltdb-shipper) and enforces retention policies.

In monolithic mode (the default for single-node installs), all four components run in a single binary — perfect for most teams starting out. When log volume exceeds ~50 GB/day, switch to microservices mode where each component scales independently.

The key design insight: Loki never builds an inverted index of log content. A query such as {job="app"} |= "OutOfMemoryError" uses the label index only to find the matching chunks, then runs a distributed grep across their contents. This trades query-time CPU for massive storage savings.
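To make the two-phase model concrete, here is an illustrative Python sketch (an analogy, not Loki's actual implementation): the label set selects chunks cheaply, and only then is the content filter applied as a brute-force scan.

```python
# Tiny in-memory "object store": chunks keyed by their label set.
# This mimics how Loki indexes only labels, never log content.
chunks = {
    (("job", "nginx"), ("env", "prod")): ["GET /health 200", "OutOfMemoryError in worker"],
    (("job", "app"), ("env", "prod")): ["OutOfMemoryError at startup"],
}

def query(selector: dict, needle: str) -> list:
    matches = []
    for labels, lines in chunks.items():
        label_map = dict(labels)
        # Phase 1: label index lookup (cheap, only labels are indexed).
        if any(label_map.get(k) != v for k, v in selector.items()):
            continue
        # Phase 2: content filter, a brute-force scan ("distributed grep").
        matches.extend(line for line in lines if needle in line)
    return matches

print(query({"job": "nginx"}, "OutOfMemoryError"))
# → ['OutOfMemoryError in worker'] — the app chunk was never scanned.
```

The selector prunes whole chunks before any log line is read, which is why narrow label selectors are the single biggest lever for query performance.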


Why Loki Instead of ELK?

| Feature           | Loki                       | Elasticsearch            |
|-------------------|----------------------------|--------------------------|
| Indexing model    | Labels only                | Full-text inverted index |
| Storage cost      | ~10x less                  | High (index + data)      |
| Query language    | LogQL (PromQL-like)        | Lucene / KQL             |
| Native Grafana    | Yes, first-class           | Via plugin               |
| Horizontal scale  | Yes (label sharding)       | Yes (shards)             |
| Schema changes    | None needed                | Re-index required        |
| Alerting          | Loki ruler + Alertmanager  | ElastAlert / SIEM        |
| Kubernetes-native | Yes (Helm chart)           | Yes (ECK operator)       |

Step 1: Install with Docker Compose

Create a project directory and the following docker-compose.yml:

version: "3.8"
services:
  loki:
    image: grafana/loki:3.0.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:3.0.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /run/log/journal:/run/log/journal:ro
      - /etc/machine-id:/etc/machine-id:ro  # required for journal scraping
      - /var/lib/docker/containers:/var/lib/docker/containers:ro  # docker log files scraped in Step 3
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin

volumes:
  loki-data:

Step 2: Configure Loki

Create loki-config.yaml:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /loki/wal

schema_config:
  configs:
    - from: 2024-01-01
      # Loki 3.x requires tsdb + schema v13 for new installs; the
      # boltdb-shipper shared_store option was removed in 3.0.
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index-cache
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem  # required when retention is enabled

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_length: 721h

Key settings to tune for production:

  • Set auth_enabled: true and pass X-Scope-OrgID headers for multi-tenancy.
  • Replace filesystem with s3 or gcs for cloud-native object storage.
  • Adjust retention_period per tenant via per_tenant_override_config.
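As an example of the object-storage swap, here is a hedged sketch for S3 (bucket name, region, and credential handling are placeholders; the ${...} expansion assumes Loki is started with -config.expand-env=true, and an IAM role can replace the static keys):

```yaml
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3        # chunks and index both land in the bucket
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  aws:
    region: us-east-1                            # placeholder
    bucketnames: my-loki-chunks                  # placeholder
    access_key_id: ${AWS_ACCESS_KEY_ID}          # or use an IAM role
    secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index-cache
```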

Step 3: Configure Promtail

Create promtail-config.yaml:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Systemd journal
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: [__journal__systemd_unit]
        target_label: unit
      # The journal exposes the hostname as metadata; a literal
      # placeholder in the labels block would be sent verbatim.
      - source_labels: [__journal__hostname]
        target_label: host

  # Nginx access logs
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          env: prod
          __path__: /var/log/nginx/access.log

  # Docker container logs via file
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - json:
          expressions:
            log: log
            stream: stream
      - output:
          source: log

Start the stack:

docker compose up -d
docker compose logs -f loki
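Once the stack is up, you can smoke-test ingestion by POSTing directly to Loki's push API. This Python sketch builds the documented payload shape (streams with a label set and [nanosecond-timestamp, line] pairs); the endpoint in the comment assumes the Compose setup above.

```python
import json
import time

def build_push_payload(labels: dict, lines: list) -> dict:
    """Body for POST /loki/api/v1/push: one stream per label set,
    values as [nanosecond-timestamp-string, log-line] pairs."""
    now_ns = str(time.time_ns())
    return {
        "streams": [
            {"stream": labels, "values": [[now_ns, line] for line in lines]}
        ]
    }

payload = build_push_payload({"job": "smoke-test"}, ["hello loki"])
print(json.dumps(payload))
# Send it with e.g.:
#   curl -H "Content-Type: application/json" \
#        -d "$(python3 push_payload.py)" http://localhost:3100/loki/api/v1/push
```

After pushing, the line should appear in Grafana Explore under {job="smoke-test"}.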

Step 4: LogQL Query Language

LogQL has two forms — log queries and metric queries.

Stream selectors (required, always first):

{job="nginx"}
{job="nginx", env="prod"}
{job=~"nginx|apache"}

Filter expressions (pipe after selector):

{job="nginx"} |= "error"
{job="nginx"} != "health_check"
{job="nginx"} |~ "5[0-9]{2}"

Parser expressions — extract fields from structured logs:

{job="app"} | json
{job="app"} | logfmt
{job="app"} | json | level="error"
{job="nginx"} | pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <_>`
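The pattern expression above can be approximated with an ordinary regex. This Python sketch (my regex approximation, not Loki's parser implementation) extracts the same fields from a sample access-log line:

```python
import re

# Rough regex equivalent of:
#   pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <_>`
LINE_RE = re.compile(
    r'(?P<ip>\S+) - - \[[^\]]*\] "(?P<method>\S+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

line = '192.0.2.10 - - [10/Jun/2024:12:00:00 +0000] "GET /api/users HTTP/1.1" 502 1234'
m = LINE_RE.match(line)
print(m.groupdict())
# → {'ip': '192.0.2.10', 'method': 'GET', 'uri': '/api/users', 'status': '502'}
```

In Loki, each extracted field becomes a label you can filter or aggregate on, e.g. | pattern ... | status=~"5..".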

Metric queries — turn logs into time-series:

rate({job="nginx"} |= "error" [5m])
count_over_time({job="app"} | json | level="error" [1h])
sum by (status) (rate({job="nginx"} | logfmt | status=~"5.." [5m]))
topk(5, sum by (uri) (rate({job="nginx"} | pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <_>` [10m])))
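As a sanity check on the semantics: count_over_time counts the entries inside the lookback window, and rate divides that count by the window length in seconds. A small illustrative sketch (deliberately simplified, ignoring Loki's per-step and per-stream evaluation):

```python
# Simplified semantics of count_over_time and rate over one window.

def count_over_time(timestamps, now, window_s):
    # Entries with now - window_s < t <= now fall inside the window.
    return sum(1 for t in timestamps if now - window_s < t <= now)

def rate(timestamps, now, window_s):
    return count_over_time(timestamps, now, window_s) / window_s

# One error line every 10 s over the last 5 minutes:
ts = [(i + 1) * 10 for i in range(30)]             # t = 10, 20, ..., 300
print(count_over_time(ts, now=300, window_s=300))  # → 30
print(rate(ts, now=300, window_s=300))             # → 0.1 errors/second
```

This is why the alert threshold "> 0.1" in Step 6 corresponds to more than one error line every ten seconds.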

Step 5: Grafana Data Source and Dashboards

  1. In Grafana, go to Connections → Data sources → Add data source → Loki.
  2. Set URL to http://loki:3100 (or your Loki address).
  3. Click Save & test — you should see “Data source connected.”

Explore panel tips:

  • Use Live tail (the lightning icon) to stream logs in real time — ideal for debugging deployments.
  • Click any log line to see log context — lines before and after the match across the same stream.
  • Switch to Metrics tab to visualize rate() queries as time-series panels.

Building a log dashboard:

  • Add a Logs panel with {job="nginx"} — set “Visualize logs” to show volume histogram above the log table.
  • Add a Time series panel with sum by (status) (rate({job="nginx"} | logfmt | status!="" [5m])) for HTTP status distribution.
  • Use Stat panels with count_over_time({job="app"} | json | level="error" [24h]) for error rate KPIs.

Step 6: Alerting with Loki Ruler

Add a ruler block to loki-config.yaml:

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

Create /loki/rules/fake/rules.yaml (when auth_enabled: false, Loki stores everything under the literal tenant ID fake):

groups:
  - name: app-alerts
    interval: 1m
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="app"} | json | level="error" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in app logs"
      # Recording rules also require ruler remote_write to be configured,
      # so the resulting samples can be pushed to a Prometheus-compatible store.
      - record: job:loki_errors:rate5m
        expr: |
          sum by (job) (rate({job=~".+"} |= "error" [5m]))
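For a recording rule to produce queryable metrics, the ruler must remote-write its samples somewhere. A hedged sketch of the extra ruler config (the Prometheus URL is a placeholder; the clients map shape assumes a recent Loki version):

```yaml
ruler:
  remote_write:
    enabled: true
    clients:
      default:
        url: http://prometheus:9090/api/v1/write  # placeholder target
```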

Loki vs. Alternatives Comparison

| Tool            | Storage model                   | Self-hosted | Cost   | Grafana native | Best for                               |
|-----------------|---------------------------------|-------------|--------|----------------|----------------------------------------|
| Loki            | Label index + compressed chunks | Yes         | Low    | Yes            | Teams already using Grafana/Prometheus |
| Elasticsearch   | Full-text inverted index        | Yes         | High   | Via plugin     | Full-text search, SIEM                 |
| Fluentd         | Agent only (no storage)         | Yes         | Free   | No             | Log routing and transformation         |
| Datadog Logs    | SaaS, proprietary               | No          | High   | No             | Enterprise, full observability         |
| CloudWatch Logs | SaaS (AWS)                      | No          | Medium | Via plugin     | AWS-native workloads                   |

Multi-Tenancy

Enable multi-tenancy by setting auth_enabled: true in Loki. Each request must include X-Scope-OrgID: <tenant-id>. Configure Promtail to send the header:

clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-backend

Set per-tenant limits in a per_tenant_override_config file:

overrides:
  team-backend:
    retention_period: 90d
    ingestion_rate_mb: 32
  team-frontend:
    retention_period: 14d
    ingestion_rate_mb: 8
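Loki only picks up that overrides file if the main config references it. A minimal sketch of the wiring (the path is a placeholder):

```yaml
limits_config:
  per_tenant_override_config: /etc/loki/overrides.yaml  # placeholder path
  per_tenant_override_period: 10s  # how often the file is re-read
```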

Docker Log Driver

Ship Docker container logs directly to Loki without Promtail using the Loki Docker plugin:

docker plugin install grafana/loki-docker-driver:3.0.0 \
  --alias loki --grant-all-permissions

Configure per-container in docker-compose.yml:

services:
  myapp:
    image: myapp:latest
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        loki-labels: "job=myapp,env=prod"
        loki-retries: "3"

Production Sizing Recommendations

| Log volume     | Architecture                            | Storage                 | RAM      |
|----------------|-----------------------------------------|-------------------------|----------|
| < 5 GB/day     | Monolithic, single node                 | Local filesystem        | 2 GB     |
| 5–50 GB/day    | Monolithic + S3/GCS                     | Object storage          | 4–8 GB   |
| 50–500 GB/day  | Microservices, 3 ingesters              | Object storage + TSDB   | 16–32 GB |
| > 500 GB/day   | Microservices + query-frontend sharding | Object storage, multi-AZ | 64 GB+  |

Always enable the WAL (ingester.wal.enabled: true) in production — it prevents data loss during ingester restarts. Use chunk caching (memcached or Redis) to reduce redundant object storage reads for frequently queried time ranges.
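A hedged sketch of wiring a memcached chunk cache into the Loki config (the hostname, SRV service name, and TTL are placeholders, and the key names assume the embedded cache config block):

```yaml
chunk_store_config:
  chunk_cache_config:
    memcached:
      expiration: 1h        # placeholder TTL for cached chunks
    memcached_client:
      host: memcached       # placeholder service hostname
      service: memcache     # DNS SRV service name used for discovery
      timeout: 500ms
```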


Troubleshooting

| Problem                            | Solution                                                                        |
|------------------------------------|---------------------------------------------------------------------------------|
| connection refused on push         | Verify the Loki container is running and port 3100 is reachable from Promtail   |
| context deadline exceeded queries  | Increase querier.query_timeout and split queries into shorter time ranges       |
| Missing logs from Docker           | Check Promtail positions.yaml for the container log path; verify volume mounts  |
| High ingester memory               | Reduce chunk_idle_period to flush chunks to storage sooner                      |
| Retention not working              | Ensure compactor.retention_enabled: true and limits_config.retention_period is set |
| Duplicate log entries              | Set Promtail positions.filename to a persistent path and avoid restarting with stale positions |

Summary

  • Label-only indexing makes Loki 10x cheaper than Elasticsearch for the same log volume.
  • Promtail ships logs from files, systemd journal, and Docker to Loki with label enrichment.
  • LogQL supports stream filtering, JSON/logfmt parsing, rate calculations, and aggregation.
  • Loki ruler evaluates LogQL alerts on a schedule and routes to Alertmanager.
  • Multi-tenancy is enabled with auth_enabled: true and X-Scope-OrgID headers.
  • For production, replace filesystem storage with S3/GCS and enable WAL + chunk caching.