Centralized logging becomes essential the moment you manage more than a handful of servers, containers, or microservices. Without it, debugging a production incident means SSH-ing into individual machines and grepping through scattered log files. Loki log aggregation with Promtail and Grafana — the PLG stack — solves this by shipping all your logs to a single queryable store while keeping resource costs dramatically lower than traditional solutions like Elasticsearch.
## Prerequisites
- A Linux server (Ubuntu 22.04 or Debian 12 recommended) with at least 2 GB RAM
- Docker 24+ and Docker Compose v2 installed
- Basic familiarity with Grafana dashboards (see Grafana Dashboards for Infrastructure Monitoring)
- Ports 3000 (Grafana), 3100 (Loki), and 9080 (Promtail) available
- Sudo access on the host machine
## Understanding the PLG Stack
The PLG stack consists of three components that work together as a lightweight observability pipeline:
**Promtail** is the log collection agent. It runs alongside your applications, tails log files or reads from the systemd journal, attaches labels (key-value metadata), and pushes log streams to Loki via HTTP.

**Loki** is the log aggregation backend. Unlike Elasticsearch, Loki does not index log content — it indexes only the labels attached by Promtail. The raw log lines are compressed and stored as chunks. This architecture makes Loki dramatically cheaper to run: a cluster handling gigabytes of logs per day can run on a few hundred megabytes of RAM.

**Grafana** provides the query and visualization layer. It connects to Loki as a data source and lets you write LogQL queries, build dashboards, and set up alerts — all in the same interface you use for Prometheus metrics (see Prometheus and Grafana Server Monitoring Setup).
The data flow is: Application → log file/journal → Promtail → Loki → Grafana.
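Because Loki indexes labels only, a stream's identity is its label set, never its content. A minimal sketch (label values invented) of how that identity works:

```python
# Sketch: a Loki stream is keyed by its sorted label set, not by log content.
def stream_key(labels: dict[str, str]) -> tuple[tuple[str, str], ...]:
    """Canonical stream identity: sorted (name, value) pairs."""
    return tuple(sorted(labels.items()))

a = stream_key({"job": "varlogs", "host": "web-01"})
b = stream_key({"host": "web-01", "job": "varlogs"})   # same labels, other order
c = stream_key({"job": "varlogs", "host": "web-02"})   # different host value

print(a == b)  # True: identical label sets map to one stream
print(a == c)  # False: any differing value creates a new stream
```

This is why label choice, not log volume, drives Loki's index size.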
## Installing Loki with Docker Compose

Create a working directory and the following `docker-compose.yml`:
```yaml
networks:
  logging:
    driver: bridge

volumes:
  loki_data:
  grafana_data:
  promtail_positions:

services:
  loki:
    image: grafana/loki:3.0.0
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yaml
    networks:
      - logging

  promtail:
    image: grafana/promtail:3.0.0
    container_name: promtail
    volumes:
      - ./promtail-config.yaml:/etc/promtail/promtail-config.yaml
      - /var/log:/var/log:ro
      - /run/log/journal:/run/log/journal:ro
      - /etc/machine-id:/etc/machine-id:ro
      - /var/run/docker.sock:/var/run/docker.sock  # needed by the Docker scrape job
      - promtail_positions:/tmp                    # keep positions.yaml across recreation
    command: -config.file=/etc/promtail/promtail-config.yaml -config.expand-env=true
    networks:
      - logging
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - logging
    depends_on:
      - loki
```
Create `loki-config.yaml` in the same directory:
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 30d

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
```
Start the stack:
```bash
docker compose up -d
docker compose ps
```
All three services should show healthy or running status within 30 seconds.
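Beyond `docker compose ps`, Loki exposes a `/ready` endpoint you can probe directly. A minimal sketch (host and port assume the compose file above; it returns False rather than raising when the stack is not up):

```python
from urllib.request import urlopen
from urllib.error import URLError

def loki_ready(base: str = "http://localhost:3100", timeout: float = 2.0) -> bool:
    """True if Loki's /ready endpoint answers HTTP 200, False on any failure."""
    try:
        with urlopen(f"{base}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print(loki_ready())
```

The same check works as a liveness probe in orchestrated deployments.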
## Configuring Promtail

Create `promtail-config.yaml`:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Systemd journal logs
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        host: ${HOSTNAME}
    relabel_configs:
      - source_labels: [__journal__systemd_unit]
        target_label: unit

  # File-based logs under /var/log
  - job_name: varlogs
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ${HOSTNAME}
          __path__: /var/log/*.log

  # Docker container logs (if Docker is running on the host)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        regex: "/(.*)"
        target_label: container
      - source_labels: [__meta_docker_container_label_com_docker_compose_service]
        target_label: service
```
The positions file tracks how far Promtail has read into each log file, preventing duplicate ingestion after restarts. Note that the `${HOSTNAME}` references are only substituted when Promtail runs with the `-config.expand-env=true` flag.
Restart Promtail to apply the configuration:
```bash
docker compose restart promtail
docker compose logs promtail --tail=20
```
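The positions file itself is a simple map from file path to byte offset. A sketch that parses that layout without a YAML library (the sample content is invented for illustration):

```python
# Sketch: parse a Promtail positions file ('path: "offset"' entries).
SAMPLE = """positions:
  /var/log/syslog: "104857"
  /var/log/auth.log: "2048"
"""

def parse_positions(text: str) -> dict[str, int]:
    """Return {file_path: byte_offset} from positions.yaml-style text."""
    offsets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith('"') or ": " not in line:
            continue  # skip the "positions:" header and blank lines
        path, _, raw = line.rpartition(": ")
        offsets[path] = int(raw.strip('"'))
    return offsets

print(parse_positions(SAMPLE))
```

If an offset ever exceeds the size of the file on disk (e.g. after log rotation without `copytruncate`), Promtail starts that file from the beginning.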
## Querying Logs with LogQL
LogQL is Loki’s query language, inspired by PromQL. Queries have two parts: a stream selector in curly braces and optional pipeline stages.
Basic stream selector — fetch all logs from the `varlogs` job:

```logql
{job="varlogs"}
```

Line filter — find lines containing "error":

```logql
{job="varlogs"} |= "error"
```

Regex filter — find lines matching a pattern:

```logql
{job="systemd-journal"} |~ "fail(ed)?"
```

Label filter after parsing — parse JSON logs and filter by level:

```logql
{job="varlogs"} | json | level="error"
```

Metric query — error rate over time:

```logql
rate({job="varlogs"} |= "error" [5m])
```

Count errors by host:

```logql
sum by (host) (count_over_time({job="varlogs"} |= "error" [1h]))
```
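The same LogQL works programmatically over Loki's HTTP query API. A hedged sketch that builds a `/loki/api/v1/query_range` request URL (the endpoint path is Loki's documented query API; the host and time range below are placeholder assumptions):

```python
from urllib.parse import urlencode

def build_query_url(base: str, logql: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a Loki query_range URL; timestamps are Unix nanoseconds."""
    params = urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

url = build_query_url(
    "http://localhost:3100",           # assumed Loki endpoint from the compose file
    '{job="varlogs"} |= "error"',
    1700000000_000000000,
    1700003600_000000000,
)
print(url)
```

Fetching this URL returns a JSON body whose `data.result` array holds one entry per matching stream.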
For more complex log parsing patterns, compare with the Journalctl Query and Analyze Linux System Logs approach for local-only scenarios.
## Setting Up Grafana Dashboards

Open Grafana at `http://your-server:3000` and log in with `admin` / `changeme`.
Add Loki as a data source:
- Go to Connections > Data Sources > Add data source
- Select Loki
- Set URL to `http://loki:3100`
- Click Save & Test — you should see "Data source connected and labels found"
Create a log exploration panel:
- Create a new dashboard and add a panel
- Select Loki as the data source
- Switch visualization to Logs
- Enter a LogQL query: `{job="varlogs"} |= "error"`
- Enable Deduplication and Wrap lines in panel options
Build a rate dashboard:
Add a Time series panel with the query:

```logql
sum by (host) (rate({job="varlogs"} |= "error" [5m]))
```
This gives you a real-time error rate graph broken down by server — the same pattern used in Docker Compose Multi-Container Applications observability setups.
## Alerting on Log Patterns
Loki includes a ruler component that evaluates LogQL metric queries on a schedule and fires alerts to Alertmanager.
Add the ruler configuration to `loki-config.yaml`:

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  enable_api: true
```
Create `/loki/rules/fake/rules.yaml` — `fake` is the tenant directory Loki uses when `auth_enabled: false` — and make sure it lands inside the `loki_data` volume mounted at `/loki`:
```yaml
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="varlogs"} |= "error" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected in system logs"
          description: "Error log rate has exceeded 0.1 lines/sec for 5 minutes."
      - alert: SSHAuthFailures
        expr: |
          sum(count_over_time({unit="sshd.service"} |= "Failed password" [10m])) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Multiple SSH authentication failures"
          description: "More than 10 SSH auth failures in 10 minutes — possible brute force."
```
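The HighErrorRate threshold logic can be sanity-checked offline. A simplified sketch of how a rate-over-5m expression combined with a `for: 5m` hold behaves (the per-minute error counts are invented):

```python
# Sketch of the HighErrorRate rule: the expression must stay above the
# threshold for the whole `for:` window before the alert fires.
def alert_fires(error_counts_per_min: list[int], threshold: float = 0.1,
                window_min: int = 5, hold_min: int = 5) -> bool:
    """True if rate (lines/sec over `window_min`) exceeds `threshold`
    for `hold_min` consecutive one-minute evaluations."""
    consecutive = 0
    for i in range(window_min, len(error_counts_per_min) + 1):
        window = error_counts_per_min[i - window_min:i]
        rate = sum(window) / (window_min * 60)  # lines per second
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= hold_min:
            return True
    return False

quiet = [1, 0, 2, 1, 0, 1, 0, 1, 2, 0, 1, 0]   # ~0.005 lines/sec: never fires
noisy = [60] * 15                              # 1 line/sec sustained: fires
print(alert_fires(quiet), alert_fires(noisy))
```

The `for:` hold is what keeps a single noisy minute from paging you.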
## Loki vs ELK vs Graylog
| Feature | Loki | ELK Stack | Graylog |
|---|---|---|---|
| Indexing strategy | Labels only | Full-text index | Full-text index |
| RAM usage (10 GB/day) | ~512 MB | 8–16 GB | 4–8 GB |
| Query language | LogQL | KQL / Lucene | Lucene-like search syntax |
| Grafana integration | Native | Plugin required | Plugin required |
| Setup complexity | Low | High | Medium |
| Horizontal scalability | Good (microservices mode) | Excellent | Good |
| Cost at scale | Very low | High | Medium |
| Best for | Kubernetes / container logs | Full-text search on logs | Compliance, SIEM |
Loki wins on cost and simplicity when your primary need is log aggregation and correlation with metrics. ELK wins when you need full-text search across log content. For a comparison with Elasticsearch setup, see Elasticsearch Setup for Log Analysis.
## Real-World Scenario
You manage 15 production servers running a mix of Nginx, PostgreSQL, and custom Python APIs. Incidents happen but root cause analysis takes hours because logs are spread across 15 machines. Here is how PLG solves this:
- Deploy Promtail on each server using a configuration management tool (Ansible/Puppet), pointing all instances at a single Loki endpoint.
- Add `host` and `env` labels so every log stream is tagged with the originating server and environment (production/staging).
- In Grafana, create a dashboard with a variable `$host` bound to the `host` label values. Now a single dashboard shows logs from all 15 servers, with a dropdown to filter by host.
- Add an alert rule that fires when any host produces more than 50 error lines per minute — you get notified in Slack before users report the issue.
- During incidents, use the Explore view to correlate logs and Prometheus metrics side-by-side: error spike in logs at 14:32 → CPU spike on the same host at 14:31.
This workflow replaces a 45-minute manual SSH investigation with a 3-minute Grafana drill-down.
## Gotchas and Edge Cases
**Label cardinality explosion.** Every unique label combination creates a separate log stream. Never use high-cardinality values (user IDs, UUIDs, request IDs) as labels. Keep labels to stable, low-count values: `host`, `job`, `env`, `service`.
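The impact is easy to quantify: worst-case stream count is the product of each label's distinct values. A quick sketch with made-up label sets:

```python
from math import prod

# Hypothetical label value counts; each unique combination is one Loki stream.
labels_good = {"host": 15, "job": 3, "env": 2}        # stable, low-count labels
labels_bad = {**labels_good, "request_id": 100_000}   # high-cardinality mistake

def stream_count(label_values: dict[str, int]) -> int:
    """Worst-case number of streams: product of distinct values per label."""
    return prod(label_values.values())

print(stream_count(labels_good))  # 90 streams: cheap
print(stream_count(labels_bad))   # 9,000,000 streams: index blow-up
```

High-cardinality values belong in the log line itself, where LogQL parsers like `| json` can still filter on them at query time.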
**Retention configuration.** The `retention_period` in `limits_config` requires the compactor to be enabled with `retention_enabled: true`. Without this, Loki stores logs indefinitely until the disk fills up.

**Chunk cache and memory.** Loki caches uncompressed chunks in memory during ingestion. On servers with less than 2 GB RAM, set `chunk_target_size: 1048576` and `max_chunk_age: 1h` to limit memory pressure.

**Promtail position file loss.** If the Promtail container is recreated without a persistent volume for `/tmp/positions.yaml`, it re-reads all log files from the start and ingests duplicates. Always mount the positions file on a persistent volume.

**Log ordering.** Loki requires that log lines within a single stream arrive in timestamp order. If your application writes logs with out-of-order timestamps, enable `unordered_writes: true` in Loki's `limits_config`.
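Out-of-order writes are easy to detect before they reach Loki. A sketch that checks per-stream timestamp monotonicity (the entries are invented):

```python
def out_of_order(entries: list[tuple[int, str]]) -> list[int]:
    """Return indices of log entries whose timestamp (ns) is older than
    the newest timestamp seen so far in the same stream."""
    bad, last = [], None
    for i, (ts, _line) in enumerate(entries):
        if last is not None and ts < last:
            bad.append(i)
        else:
            last = ts
    return bad

stream = [
    (1700000000_000000000, "starting up"),
    (1700000001_000000000, "ready"),
    (1700000000_500000000, "late flush from buffer"),  # older than previous
    (1700000002_000000000, "request handled"),
]
print(out_of_order(stream))  # [2]
```

Without `unordered_writes`, entry 2 would be rejected by the ingester while the rest of the batch is accepted.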
**Docker socket access.** The Docker SD config in Promtail requires access to the Docker socket, so `/var/run/docker.sock` must be mounted into the Promtail container, which must either run as root or be added to the `docker` group.
## Summary
- The PLG stack (Promtail + Loki + Grafana) provides centralized log aggregation at a fraction of the cost of ELK
- Loki indexes labels only, not log content — keeping storage and RAM usage minimal
- LogQL stream selectors (`{job="varlogs"}`) plus pipeline stages (`|= "error"`, `| json`) enable powerful log filtering
- Promtail scrapes the systemd journal, files, and Docker container logs with automatic label attachment
- Grafana’s Loki data source enables unified dashboards combining logs (Loki) and metrics (Prometheus)
- High-cardinality labels are the most common Loki performance mistake — keep label values stable and low-count
- Loki alerting rules use LogQL metric queries evaluated by the ruler component