Centralized logging becomes essential the moment you manage more than a handful of servers, containers, or microservices. Without it, debugging a production incident means SSH-ing into individual machines and grepping through scattered log files. Loki log aggregation with Promtail and Grafana — the PLG stack — solves this by shipping all your logs to a single queryable store while keeping resource costs dramatically lower than traditional solutions like Elasticsearch.
## Prerequisites
- A Linux server (Ubuntu 22.04 or Debian 12 recommended) with at least 2 GB RAM
- Docker 24+ and Docker Compose v2 installed
- Basic familiarity with Grafana dashboards (see Grafana Dashboards for Infrastructure Monitoring)
- Ports 3000 (Grafana), 3100 (Loki), and 9080 (Promtail) available
- Sudo access on the host machine
## Understanding the PLG Stack
The PLG stack consists of three components that work together as a lightweight observability pipeline:
**Promtail** is the log collection agent. It runs alongside your applications, tails log files or reads from the systemd journal, attaches labels (key-value metadata), and pushes log streams to Loki via HTTP.

**Loki** is the log aggregation backend. Unlike Elasticsearch, Loki does not index log content — it indexes only the labels attached by Promtail. The raw log lines are compressed and stored as chunks. This architecture makes Loki dramatically cheaper to run: a cluster handling gigabytes of logs per day can run on a few hundred megabytes of RAM.

**Grafana** provides the query and visualization layer. It connects to Loki as a data source and lets you write LogQL queries, build dashboards, and set up alerts — all in the same interface you use for Prometheus metrics (see Prometheus and Grafana Server Monitoring Setup).
The data flow is: Application → log file/journal → Promtail → Loki → Grafana.
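Because Loki indexes labels only, a stream's identity is its label set, never its content. A minimal sketch (label values invented) of how that identity works:

```python
# Sketch: a Loki stream is keyed by its sorted label set, not by log content.
def stream_key(labels: dict[str, str]) -> tuple[tuple[str, str], ...]:
    """Canonical stream identity: sorted (name, value) pairs."""
    return tuple(sorted(labels.items()))

a = stream_key({"job": "varlogs", "host": "web-01"})
b = stream_key({"host": "web-01", "job": "varlogs"})   # same labels, other order
c = stream_key({"job": "varlogs", "host": "web-02"})   # different host value

print(a == b)  # True: identical label sets map to one stream
print(a == c)  # False: any differing value creates a new stream
```

This is why label choice, not log volume, drives Loki's index size.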
## Installing Loki with Docker Compose

Create a working directory and the following `docker-compose.yml`:
```yaml
networks:
  logging:
    driver: bridge

volumes:
  loki_data:
  grafana_data:
  promtail_positions:

services:
  loki:
    image: grafana/loki:3.0.0
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yaml
    networks:
      - logging

  promtail:
    image: grafana/promtail:3.0.0
    container_name: promtail
    volumes:
      - ./promtail-config.yaml:/etc/promtail/promtail-config.yaml
      - /var/log:/var/log:ro
      - /run/log/journal:/run/log/journal:ro
      - /etc/machine-id:/etc/machine-id:ro
      - /var/run/docker.sock:/var/run/docker.sock  # needed by the Docker scrape job
      - promtail_positions:/tmp                    # keep positions.yaml across recreation
    command: -config.file=/etc/promtail/promtail-config.yaml -config.expand-env=true
    networks:
      - logging
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    volumes:
      - grafana_data:/var/lib/grafana
    networks:
      - logging
    depends_on:
      - loki
```
Create `loki-config.yaml` in the same directory:
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 30d

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
```
Start the stack:
```bash
docker compose up -d
docker compose ps
```
All three services should show healthy or running status within 30 seconds.
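Beyond `docker compose ps`, Loki exposes a `/ready` endpoint you can probe directly. A minimal sketch (host and port assume the compose file above; it returns False rather than raising when the stack is not up):

```python
from urllib.request import urlopen
from urllib.error import URLError

def loki_ready(base: str = "http://localhost:3100", timeout: float = 2.0) -> bool:
    """True if Loki's /ready endpoint answers HTTP 200, False on any failure."""
    try:
        with urlopen(f"{base}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print(loki_ready())
```

The same check works as a liveness probe in orchestrated deployments.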
## Configuring Promtail

Create `promtail-config.yaml`:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Systemd journal logs
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        host: ${HOSTNAME}
    relabel_configs:
      - source_labels: [__journal__systemd_unit]
        target_label: unit

  # File-based logs under /var/log
  - job_name: varlogs
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ${HOSTNAME}
          __path__: /var/log/*.log

  # Docker container logs (if Docker is running on the host)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        regex: "/(.*)"
        target_label: container
      - source_labels: [__meta_docker_container_label_com_docker_compose_service]
        target_label: service
```
The positions file tracks how far Promtail has read into each log file, preventing duplicate ingestion after restarts. Note that the `${HOSTNAME}` references are only substituted when Promtail runs with the `-config.expand-env=true` flag.
Restart Promtail to apply the configuration:
```bash
docker compose restart promtail
docker compose logs promtail --tail=20
```
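The positions file itself is a simple map from file path to byte offset. A sketch that parses that layout without a YAML library (the sample content is invented for illustration):

```python
# Sketch: parse a Promtail positions file ('path: "offset"' entries).
SAMPLE = """positions:
  /var/log/syslog: "104857"
  /var/log/auth.log: "2048"
"""

def parse_positions(text: str) -> dict[str, int]:
    """Return {file_path: byte_offset} from positions.yaml-style text."""
    offsets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith('"') or ": " not in line:
            continue  # skip the "positions:" header and blank lines
        path, _, raw = line.rpartition(": ")
        offsets[path] = int(raw.strip('"'))
    return offsets

print(parse_positions(SAMPLE))
```

If an offset ever exceeds the size of the file on disk (e.g. after log rotation without `copytruncate`), Promtail starts that file from the beginning.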
## Querying Logs with LogQL
LogQL is Loki’s query language, inspired by PromQL. Queries have two parts: a stream selector in curly braces and optional pipeline stages.
Basic stream selector — fetch all logs from the `varlogs` job:

```logql
{job="varlogs"}
```

Line filter — find lines containing "error":

```logql
{job="varlogs"} |= "error"
```

Regex filter — find lines matching a pattern:

```logql
{job="systemd-journal"} |~ "fail(ed)?"
```

Label filter after parsing — parse JSON logs and filter by level:

```logql
{job="varlogs"} | json | level="error"
```

Metric query — error rate over time:

```logql
rate({job="varlogs"} |= "error" [5m])
```

Count errors by host:

```logql
sum by (host) (count_over_time({job="varlogs"} |= "error" [1h]))
```
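The same LogQL works programmatically over Loki's HTTP query API. A hedged sketch that builds a `/loki/api/v1/query_range` request URL (the endpoint path is Loki's documented query API; the host and time range below are placeholder assumptions):

```python
from urllib.parse import urlencode

def build_query_url(base: str, logql: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a Loki query_range URL; timestamps are Unix nanoseconds."""
    params = urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

url = build_query_url(
    "http://localhost:3100",           # assumed Loki endpoint from the compose file
    '{job="varlogs"} |= "error"',
    1700000000_000000000,
    1700003600_000000000,
)
print(url)
```

Fetching this URL returns a JSON body whose `data.result` array holds one entry per matching stream.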
For more complex log parsing patterns, compare with the Journalctl Query and Analyze Linux System Logs approach for local-only scenarios.
## Setting Up Grafana Dashboards

Open Grafana at `http://your-server:3000` and log in with `admin` / `changeme`.
Add Loki as a data source:
- Go to Connections > Data Sources > Add data source
- Select Loki
- Set URL to `http://loki:3100`
- Click Save & Test — you should see "Data source connected and labels found"
Create a log exploration panel:
- Create a new dashboard and add a panel
- Select Loki as the data source
- Switch visualization to Logs
- Enter a LogQL query: `{job="varlogs"} |= "error"`
- Enable Deduplication and Wrap lines in panel options
Build a rate dashboard:
Add a Time series panel with the query:

```logql
sum by (host) (rate({job="varlogs"} |= "error" [5m]))
```
This gives you a real-time error rate graph broken down by server — the same pattern used in Docker Compose Multi-Container Applications observability setups.
## Alerting on Log Patterns
Loki includes a ruler component that evaluates LogQL metric queries on a schedule and fires alerts to Alertmanager.
Add the ruler configuration to `loki-config.yaml`:

```yaml
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://alertmanager:9093
  enable_api: true
```
Create `/loki/rules/fake/rules.yaml` — `fake` is the tenant directory Loki uses when `auth_enabled: false` — and make sure it lands inside the `loki_data` volume mounted at `/loki`:
```yaml
groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="varlogs"} |= "error" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected in system logs"
          description: "Error log rate has exceeded 0.1 lines/sec for 5 minutes."
      - alert: SSHAuthFailures
        expr: |
          sum(count_over_time({unit="sshd.service"} |= "Failed password" [10m])) > 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Multiple SSH authentication failures"
          description: "More than 10 SSH auth failures in 10 minutes — possible brute force."
```
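The HighErrorRate threshold logic can be sanity-checked offline. A simplified sketch of how a rate-over-5m expression combined with a `for: 5m` hold behaves (the per-minute error counts are invented):

```python
# Sketch of the HighErrorRate rule: the expression must stay above the
# threshold for the whole `for:` window before the alert fires.
def alert_fires(error_counts_per_min: list[int], threshold: float = 0.1,
                window_min: int = 5, hold_min: int = 5) -> bool:
    """True if rate (lines/sec over `window_min`) exceeds `threshold`
    for `hold_min` consecutive one-minute evaluations."""
    consecutive = 0
    for i in range(window_min, len(error_counts_per_min) + 1):
        window = error_counts_per_min[i - window_min:i]
        rate = sum(window) / (window_min * 60)  # lines per second
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= hold_min:
            return True
    return False

quiet = [1, 0, 2, 1, 0, 1, 0, 1, 2, 0, 1, 0]   # ~0.005 lines/sec: never fires
noisy = [60] * 15                              # 1 line/sec sustained: fires
print(alert_fires(quiet), alert_fires(noisy))
```

The `for:` hold is what keeps a single noisy minute from paging you.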
## Loki vs ELK vs Graylog
| Feature | Loki | ELK Stack | Graylog |
|---|---|---|---|
| Indexing strategy | Labels only | Full-text index | Full-text index |
| RAM usage (10 GB/day) | ~512 MB | 8–16 GB | 4–8 GB |
| Query language | LogQL | KQL / Lucene | Lucene-like search syntax |
| Grafana integration | Native | Plugin required | Plugin required |
| Setup complexity | Low | High | Medium |
| Horizontal scalability | Good (microservices mode) | Excellent | Good |
| Cost at scale | Very low | High | Medium |
| Best for | Kubernetes / container logs | Full-text search on logs | Compliance, SIEM |
Loki wins on cost and simplicity when your primary need is log aggregation and correlation with metrics. ELK wins when you need full-text search across log content. For a comparison with Elasticsearch setup, see Elasticsearch Setup for Log Analysis.
## Real-World Scenario
You manage 15 production servers running a mix of Nginx, PostgreSQL, and custom Python APIs. Incidents happen but root cause analysis takes hours because logs are spread across 15 machines. Here is how PLG solves this:
- Deploy Promtail on each server using a configuration management tool (Ansible/Puppet), pointing all instances at a single Loki endpoint.
- Add `host` and `env` labels so every log stream is tagged with the originating server and environment (production/staging).
- In Grafana, create a dashboard with a variable `$host` bound to the `host` label values. Now a single dashboard shows logs from all 15 servers, with a dropdown to filter by host.
- Add an alert rule that fires when any host produces more than 50 error lines per minute — you get notified in Slack before users report the issue.
- During incidents, use the Explore view to correlate logs and Prometheus metrics side-by-side: error spike in logs at 14:32 → CPU spike on the same host at 14:31.
This workflow replaces a 45-minute manual SSH investigation with a 3-minute Grafana drill-down.
## Gotchas and Edge Cases
**Label cardinality explosion.** Every unique label combination creates a separate log stream. Never use high-cardinality values (user IDs, UUIDs, request IDs) as labels. Keep labels to stable, low-count values: `host`, `job`, `env`, `service`.
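The impact is easy to quantify: worst-case stream count is the product of each label's distinct values. A quick sketch with made-up label sets:

```python
from math import prod

# Hypothetical label value counts; each unique combination is one Loki stream.
labels_good = {"host": 15, "job": 3, "env": 2}        # stable, low-count labels
labels_bad = {**labels_good, "request_id": 100_000}   # high-cardinality mistake

def stream_count(label_values: dict[str, int]) -> int:
    """Worst-case number of streams: product of distinct values per label."""
    return prod(label_values.values())

print(stream_count(labels_good))  # 90 streams: cheap
print(stream_count(labels_bad))   # 9,000,000 streams: index blow-up
```

High-cardinality values belong in the log line itself, where LogQL parsers like `| json` can still filter on them at query time.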
**Retention configuration.** The `retention_period` in `limits_config` requires the compactor to be enabled with `retention_enabled: true`. Without this, Loki stores logs indefinitely until the disk fills up.

**Chunk cache and memory.** Loki caches uncompressed chunks in memory during ingestion. On servers with less than 2 GB RAM, set `chunk_target_size: 1048576` and `max_chunk_age: 1h` to limit memory pressure.

**Promtail position file loss.** If the Promtail container is recreated without a persistent volume for `/tmp/positions.yaml`, it re-reads all log files from the start and ingests duplicates. Always mount the positions file on a persistent volume.

**Log ordering.** Loki requires that log lines within a single stream arrive in timestamp order. If your application writes logs with out-of-order timestamps, enable `unordered_writes: true` in Loki's `limits_config`.
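Out-of-order writes are easy to detect before they reach Loki. A sketch that checks per-stream timestamp monotonicity (the entries are invented):

```python
def out_of_order(entries: list[tuple[int, str]]) -> list[int]:
    """Return indices of log entries whose timestamp (ns) is older than
    the newest timestamp seen so far in the same stream."""
    bad, last = [], None
    for i, (ts, _line) in enumerate(entries):
        if last is not None and ts < last:
            bad.append(i)
        else:
            last = ts
    return bad

stream = [
    (1700000000_000000000, "starting up"),
    (1700000001_000000000, "ready"),
    (1700000000_500000000, "late flush from buffer"),  # older than previous
    (1700000002_000000000, "request handled"),
]
print(out_of_order(stream))  # [2]
```

Without `unordered_writes`, entry 2 would be rejected by the ingester while the rest of the batch is accepted.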
**Docker socket access.** The Docker SD config in Promtail requires access to the Docker socket, so `/var/run/docker.sock` must be mounted into the Promtail container, which must either run as root or be added to the `docker` group.
## Summary
- The PLG stack (Promtail + Loki + Grafana) provides centralized log aggregation at a fraction of the cost of ELK
- Loki indexes labels only, not log content — keeping storage and RAM usage minimal
- LogQL stream selectors (`{job="varlogs"}`) plus pipeline stages (`|= "error"`, `| json`) enable powerful log filtering
- Promtail scrapes the systemd journal, files, and Docker container logs with automatic label attachment
- Grafana’s Loki data source enables unified dashboards combining logs (Loki) and metrics (Prometheus)
- High-cardinality labels are the most common Loki performance mistake — keep label values stable and low-count
- Loki alerting rules use LogQL metric queries evaluated by the ruler component