TL;DR — Quick Summary
Set up Grafana Loki for centralized log aggregation with Promtail. Configure LogQL queries, dashboards, alerting, and multi-tenant log pipelines in production.
Grafana Loki is a horizontally scalable log aggregation system built on the same principles as Prometheus — it indexes only labels, not the full log text, which makes it dramatically cheaper to operate than the ELK stack. This guide covers Loki’s architecture, installation via Docker Compose, Promtail configuration, LogQL query language, Grafana dashboard building, alerting, multi-tenancy, and production sizing.
Prerequisites
- Docker and Docker Compose (or a Kubernetes cluster for Helm-based install).
- Grafana 10+ (or Grafana Cloud) for visualization.
- Basic familiarity with YAML configuration and command-line tools.
Loki Architecture
Loki splits its work across four main components:
- Distributor — Receives log streams from Promtail/agents, validates labels, and fans out to ingesters via consistent hashing.
- Ingester — Holds recent logs in memory (configurable chunk size), flushes compressed chunks to object storage, and maintains a WAL for crash safety.
- Querier — Executes LogQL queries across both ingested (in-memory) and stored (object storage) chunks.
- Compactor — Merges and deduplicates the boltdb-shipper index tables and enforces retention policies.
In monolithic mode (the default for single-node installs), all four components run in a single binary — perfect for most teams starting out. When log volume exceeds ~50 GB/day, switch to microservices mode where each component scales independently.
The key design insight: Loki never builds an inverted index of log content. A query for `|= "OutOfMemoryError"` does a distributed grep across the matching chunks. This trades query-time CPU for massive storage savings.
Why Loki Instead of ELK?
| Feature | Loki | Elasticsearch |
|---|---|---|
| Indexing model | Labels only | Full-text inverted index |
| Storage cost | ~10x less | High (index + data) |
| Query language | LogQL (PromQL-like) | Lucene / KQL |
| Native Grafana | Yes — first-class | Via plugin |
| Horizontal scale | Yes (label-sharding) | Yes (shards) |
| Schema changes | None needed | Re-index required |
| Alerting | Loki ruler + Alertmanager | ElastAlert / SIEM |
| Kubernetes-native | Yes (Helm chart) | Yes (ECK operator) |
Step 1: Install with Docker Compose
Create a project directory and the following `docker-compose.yml`:

```yaml
version: "3.8"

services:
  loki:
    image: grafana/loki:3.0.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:3.0.0
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml
      - /var/log:/var/log:ro
      - /run/log/journal:/run/log/journal:ro
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin

volumes:
  loki-data:
```
Step 2: Configure Loki
Create `loki-config.yaml`:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /loki/wal

schema_config:
  configs:
    - from: 2024-01-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_length: 721h
```
Key settings to tune for production:
- Set `auth_enabled: true` and pass `X-Scope-OrgID` headers for multi-tenancy.
- Replace `filesystem` with `s3` or `gcs` for cloud-native object storage.
- Adjust `retention_period` per tenant via `per_tenant_override_config`.
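To make the second point concrete, here is a sketch of a `storage_config` block pointing at S3 instead of the local filesystem. The bucket name and region are placeholders; the `${...}` expansion requires starting Loki with `-config.expand-env=true`:

```yaml
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/boltdb-cache
    shared_store: s3
  aws:
    bucketnames: my-loki-chunks        # placeholder bucket name
    region: us-east-1                  # placeholder region
    access_key_id: ${AWS_ACCESS_KEY_ID}
    secret_access_key: ${AWS_SECRET_ACCESS_KEY}
```

On most cloud setups you would use IAM roles or workload identity instead of static keys; the keys above are only shown for completeness.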
Step 3: Configure Promtail
Create `promtail-config.yaml`:

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Systemd journal
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        host: __HOSTNAME__
    relabel_configs:
      - source_labels: [__journal__systemd_unit]
        target_label: unit

  # Nginx access logs
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          env: prod
          __path__: /var/log/nginx/access.log

  # Docker container logs via file
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - json:
          expressions:
            log: log
            stream: stream
      - output:
          source: log
```
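Pipeline stages can also promote parsed fields to Loki labels. As a hedged sketch (assuming an application that emits JSON lines with a `level` field), stages like these would extract the field and attach it as a low-cardinality label:

```yaml
pipeline_stages:
  - json:
      expressions:
        level: level      # extract the "level" field from each JSON log line
  - labels:
      level:              # promote the extracted field to a Loki label
```

Only promote fields with a small, bounded set of values (like log level); high-cardinality labels such as request IDs explode the index and defeat Loki's cost model.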
Start the stack:
```shell
docker compose up -d
docker compose logs -f loki
```
Step 4: LogQL Query Language
LogQL has two forms — log queries and metric queries.
Stream selectors (required, always first):
```logql
{job="nginx"}
{job="nginx", env="prod"}
{job=~"nginx|apache"}
```
Filter expressions (pipe after selector):
```logql
{job="nginx"} |= "error"
{job="nginx"} != "health_check"
{job="nginx"} |~ "5[0-9]{2}"
```
Parser expressions — extract fields from structured logs:
```logql
{job="app"} | json
{job="app"} | logfmt
{job="app"} | json | level="error"
{job="nginx"} | pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <_>`
```
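After parsing, `line_format` can rewrite the displayed line from the extracted fields. A small sketch, assuming the app's JSON logs carry `level` and `msg` fields:

```logql
{job="app"} | json | line_format "{{.level}}: {{.msg}}"
```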
Metric queries — turn logs into time-series:
```logql
rate({job="nginx"} |= "error" [5m])
count_over_time({job="app"} | json | level="error" [1h])
sum by (status) (rate({job="nginx"} | logfmt | status=~"5.." [5m]))
topk(5, sum by (uri) (rate({job="nginx"}[10m])))
```
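Metric queries can also aggregate over a numeric field with `unwrap`. A sketch, assuming the access log is logfmt with `request_time` and `uri` fields (adapt the field names to your log format):

```logql
quantile_over_time(0.99, {job="nginx"} | logfmt | unwrap request_time [5m]) by (uri)
```

This computes p99 request latency per URI from the logs themselves, with no separate metrics pipeline.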
Step 5: Grafana Data Source and Dashboards
- In Grafana, go to Connections → Data sources → Add data source → Loki.
- Set URL to `http://loki:3100` (or your Loki address).
- Click Save & test — you should see “Data source connected.”
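If you manage Grafana declaratively, the same data source can be provisioned from a file instead of the UI. A sketch (the file path and URL are assumptions to adapt):

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
```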
Explore panel tips:
- Use Live tail (the lightning icon) to stream logs in real time — ideal for debugging deployments.
- Click any log line to see log context — lines before and after the match across the same stream.
- Switch to the Metrics tab to visualize `rate()` queries as time-series panels.
Building a log dashboard:
- Add a Logs panel with `{job="nginx"}` — set “Visualize logs” to show a volume histogram above the log table.
- Add a Time series panel with `sum by (status) (rate({job="nginx"} | logfmt | status!="" [5m]))` for HTTP status distribution.
- Use Stat panels with `count_over_time({job="app"} | json | level="error" [24h])` for error-count KPIs.
Step 6: Alerting with Loki Ruler
Add a `ruler` block to `loki-config.yaml`:

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
```
Create `/loki/rules/fake/rules.yaml` (the `fake` tenant directory is used when `auth_enabled: false`):
```yaml
groups:
  - name: app-alerts
    interval: 1m
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="app"} | json | level="error" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in app logs"
      - record: job:loki_errors:rate5m
        expr: |
          sum by (job) (rate({job=~".+"} |= "error" [5m]))
```
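On the Alertmanager side, a minimal route for the `severity: warning` label set by the rule above might look like the following sketch; the receiver name, channel, and webhook URL are all placeholders to replace with your own:

```yaml
route:
  receiver: default
  routes:
    - matchers:
        - severity="warning"
      receiver: team-slack            # hypothetical receiver name
receivers:
  - name: default
  - name: team-slack
    slack_configs:
      - channel: "#alerts"            # placeholder channel
        api_url: https://hooks.slack.com/services/PLACEHOLDER
```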
Loki vs. Alternatives Comparison
| Tool | Storage model | Self-hosted | Cost | Grafana native | Best for |
|---|---|---|---|---|---|
| Loki | Label index + compressed chunks | Yes | Low | Yes | Teams already using Grafana/Prometheus |
| Elasticsearch | Full-text inverted index | Yes | High | Via plugin | Full-text search, SIEM |
| Fluentd | Agent only (no storage) | Yes | Free | No | Log routing and transformation |
| Datadog Logs | SaaS, proprietary | No | High | No | Enterprise, full observability |
| CloudWatch Logs | SaaS (AWS) | No | Medium | Via plugin | AWS-native workloads |
Multi-Tenancy
Enable multi-tenancy by setting `auth_enabled: true` in Loki. Each request must then include an `X-Scope-OrgID: <tenant-id>` header. Configure Promtail to send the header:
```yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-backend
```
Set per-tenant limits in a `per_tenant_override_config` file:

```yaml
overrides:
  team-backend:
    retention_period: 90d
    ingestion_rate_mb: 32
  team-frontend:
    retention_period: 14d
    ingestion_rate_mb: 8
```
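Grafana must also send the tenant header when querying a multi-tenant Loki. In data source provisioning terms this is done with a custom HTTP header; a sketch, where the tenant id is an example:

```yaml
datasources:
  - name: Loki (team-backend)
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: team-backend   # example tenant id
```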
Docker Log Driver
Ship Docker container logs directly to Loki without Promtail using the Loki Docker plugin:
```shell
docker plugin install grafana/loki-docker-driver:3.0.0 \
  --alias loki --grant-all-permissions
```
Configure it per container in `docker-compose.yml`:

```yaml
services:
  myapp:
    image: myapp:latest
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        loki-labels: "job=myapp,env=prod"
        loki-retries: "3"
```
Production Sizing Recommendations
| Log volume | Architecture | Storage | RAM |
|---|---|---|---|
| < 5 GB/day | Monolithic, single node | Local filesystem | 2 GB |
| 5–50 GB/day | Monolithic + S3/GCS | Object storage | 4–8 GB |
| 50–500 GB/day | Microservices, 3 ingesters | Object storage + TSDB | 16–32 GB |
| > 500 GB/day | Microservices + Thanos ruler | Object storage, multi-AZ | 64 GB+ |
Always enable the WAL (`wal.enabled: true` under `ingester`) in production — it prevents data loss during ingester restarts. Use chunk caching (memcached or Redis) to reduce redundant object-storage reads for frequently queried time ranges.
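As a sketch, a memcached-backed chunk cache could be wired in like this; the host name assumes a `memcached` service reachable from Loki, and the exact placement of the cache block varies between Loki versions, so check the configuration reference for your release:

```yaml
chunk_store_config:
  chunk_cache_config:
    memcached:
      expiration: 1h          # how long cached chunks stay valid
      batch_size: 256
    memcached_client:
      host: memcached         # assumed service name
      service: memcached
      timeout: 500ms
```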
Troubleshooting
| Problem | Solution |
|---|---|
| `connection refused` on push | Verify the Loki container is running and port 3100 is reachable from Promtail |
| `context deadline exceeded` on queries | Increase `querier.query_timeout` and split queries into shorter time ranges |
| Missing logs from Docker | Check Promtail `positions.yaml` for the container log path; verify volume mounts |
| High ingester memory | Reduce `chunk_idle_period` to flush chunks to storage sooner |
| Retention not working | Ensure `compactor.retention_enabled: true` and `limits_config.retention_period` is set |
| Duplicate log entries | Point Promtail's `positions.filename` at a persistent path and avoid restarting with stale positions |
Summary
- Label-only indexing makes Loki 10x cheaper than Elasticsearch for the same log volume.
- Promtail ships logs from files, systemd journal, and Docker to Loki with label enrichment.
- LogQL supports stream filtering, JSON/logfmt parsing, rate calculations, and aggregation.
- Loki ruler evaluates LogQL alerts on a schedule and routes to Alertmanager.
- Multi-tenancy is enabled with `auth_enabled: true` and `X-Scope-OrgID` headers.
- For production, replace filesystem storage with S3/GCS and enable the WAL plus chunk caching.