[Architecture diagram: Grafana + Prometheus monitoring stack. Node Exporter (:9100) on servers web-01, web-02, and db-01 is scraped by Prometheus (:9090 — TSDB storage, PromQL engine, scrape targets, alert rules), which Grafana (:3000 — dashboards, panel editor, alerting engine, variables) queries. Panels cover CPU/memory (time series, gauge), disk/network (stat, bar gauge), and alerts routed to Slack, Email, PagerDuty, and Teams for SRE/DevOps, developers, and management. Collect → Store → Visualize → Alert — full observability pipeline.]

You cannot fix what you cannot see. When a server’s CPU spikes at 3 AM, a disk fills up silently, or a network interface starts dropping packets, you need dashboards that tell you exactly what happened, when, and where. Grafana combined with Prometheus gives you a production-grade monitoring stack that scales from a handful of servers to thousands of nodes — all with open-source tools.

This guide walks you through deploying the Prometheus-Grafana stack, building dashboards that surface the metrics that matter, writing effective PromQL queries, and configuring alerts that notify you before problems become outages.


Prerequisites

Before you begin, ensure you have:

  • Linux server (Ubuntu 22.04 LTS recommended) with at least 2 GB RAM
  • Docker and Docker Compose installed
  • Network access to the servers you want to monitor
  • Basic familiarity with YAML and the Linux terminal

Step 1: Deploy the Monitoring Stack

The fastest way to get Prometheus and Grafana running is with Docker Compose. This setup includes Prometheus, Grafana, and Node Exporter (for host metrics).

Create the Project Structure

mkdir -p monitoring/{prometheus,grafana}
cd monitoring

Docker Compose Configuration

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    restart: unless-stopped
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:v1.8.0
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

Prometheus Configuration

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node-exporter:9100'
        labels:
          instance: 'monitoring-server'
      # Add remote servers running Node Exporter
      # - targets: ['192.168.1.10:9100']
      #   labels:
      #     instance: 'web-01'
      # - targets: ['192.168.1.11:9100']
      #   labels:
      #     instance: 'db-01'
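Static targets work fine for a handful of hosts, but every change requires editing prometheus.yml. For larger fleets, Prometheus also supports file-based service discovery, which picks up target changes without a restart. A minimal sketch — the job name and the targets/ path are examples, and the target file would need to be mounted into the container (e.g., ./prometheus/targets:/etc/prometheus/targets in the Compose file):

```yaml
# prometheus/prometheus.yml — append under scrape_configs
  - job_name: 'node-file-sd'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/*.yml'   # example path
        refresh_interval: 1m                  # re-read target files every minute

# prometheus/targets/web.yml — example target file content:
# - targets: ['192.168.1.10:9100']
#   labels:
#     instance: 'web-01'
```

Each file holds a list of target groups with labels, so adding a server is just dropping a new entry into the directory.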

Start the Stack

docker compose up -d

Verify everything is running:

docker compose ps
# Check Prometheus targets: http://your-server:9090/targets
# Access Grafana: http://your-server:3000 (admin / changeme)

Step 2: Add Prometheus as a Data Source

  1. Open Grafana at http://your-server:3000
  2. Log in with admin / changeme
  3. Navigate to Connections > Data Sources > Add data source
  4. Select Prometheus
  5. Set the URL to http://prometheus:9090 (Docker internal DNS)
  6. Click Save & Test — you should see “Data source is working”

You can also provision data sources automatically by creating a YAML file:

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true

Mount this into Grafana by adding to your Docker Compose:

volumes:
  - ./grafana/provisioning:/etc/grafana/provisioning

Step 3: Essential PromQL Queries

PromQL (Prometheus Query Language) is how you extract meaningful data from Prometheus. Here are the essential queries for infrastructure monitoring.

CPU Metrics

# CPU usage percentage (across all cores)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU usage by mode (user, system, iowait)
avg by (instance, mode) (irate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100

# CPU load average (1 minute)
node_load1
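A note on the functions used here: irate() looks only at the last two samples in the range, so it reacts quickly to spikes but can look jittery over long time windows. For dashboards covering hours or days, rate() averages over the whole window and produces smoother graphs:

```
# Smoother CPU usage for longer time ranges (rate instead of irate)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

The same substitution applies to any of the irate() queries below when the graph covers a long range.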

Memory Metrics

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Available memory in GB
node_memory_MemAvailable_bytes / 1024 / 1024 / 1024

# Swap usage
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes * 100

Disk Metrics

# Disk usage percentage per mount point
(1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100

# Disk I/O throughput (read + write bytes/sec)
irate(node_disk_read_bytes_total[5m]) + irate(node_disk_written_bytes_total[5m])

# Disk I/O utilization
irate(node_disk_io_time_seconds_total[5m]) * 100

Network Metrics

# Network receive rate (bytes/sec)
irate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|br-.*"}[5m])

# Network transmit rate (bytes/sec)
irate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|br-.*"}[5m])

# Network errors
irate(node_network_receive_errs_total[5m]) + irate(node_network_transmit_errs_total[5m])

System Uptime

# Uptime in days
(time() - node_boot_time_seconds) / 86400

Step 4: Build the Dashboard

Create a New Dashboard

  1. Click + > New Dashboard
  2. Click Add visualization
  3. Select your Prometheus data source
  4. Enter a PromQL query in the query editor

Panel Types and When to Use Them

| Panel Type  | Best For               | Example                      |
|-------------|------------------------|------------------------------|
| Time Series | Metrics over time      | CPU usage, network bandwidth |
| Stat        | Single current value   | Uptime, total memory         |
| Gauge       | Value within a range   | Disk usage %, CPU temp       |
| Bar Gauge   | Comparing values       | Disk usage per mount         |
| Table       | Tabular data           | Top processes, server list   |
| Heatmap     | Distribution over time | Request latency buckets      |

Example: CPU Usage Panel

Create a Time Series panel with this configuration:

Query:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Panel settings:

  • Title: CPU Usage %
  • Unit: Percent (0-100)
  • Min: 0, Max: 100
  • Thresholds: Green (0-70), Yellow (70-85), Red (85-100)
  • Legend: {{instance}}

Example: Memory Gauge Panel

Create a Gauge panel:

Query:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Panel settings:

  • Title: Memory Usage
  • Unit: Percent (0-100)
  • Thresholds: Green (0-75), Orange (75-90), Red (90-100)

Step 5: Template Variables

Variables make dashboards dynamic. Instead of hardcoding server names, let users select from a dropdown.

Create an Instance Variable

  1. Go to Dashboard Settings > Variables > New

  2. Configure:

    • Name: instance
    • Type: Query
    • Data source: Prometheus
    • Query: label_values(node_uname_info, instance)
    • Multi-value: Enable
    • Include All option: Enable
  3. Click Apply

Use Variables in Queries

Replace hardcoded instance labels with $instance:

# CPU usage filtered by selected instance
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])) * 100)

# Memory for selected instance
(1 - (node_memory_MemAvailable_bytes{instance=~"$instance"} / node_memory_MemTotal_bytes{instance=~"$instance"})) * 100

Add a Job Variable

label_values(up, job)

This lets users filter by job (node-exporter, blackbox, or application-specific exporters).
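Variables can also be chained so that one dropdown filters the next. For example, an instance variable that only lists hosts belonging to the currently selected job (this assumes a $job variable exists on the dashboard):

```
label_values(node_uname_info{job=~"$job"}, instance)
```

When the user changes the job dropdown, Grafana re-runs this query and the instance dropdown updates to match.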


Step 6: Alerting in Grafana

Create an Alert Rule

  1. Navigate to Alerting > Alert Rules > New Alert Rule
  2. Define the rule:

High CPU Alert:

  • Rule name: High CPU Usage
  • Query: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
  • Condition: IS ABOVE 90
  • Evaluate every: 1m
  • For: 5m (alert fires after 5 minutes of sustained high CPU)

Low Disk Space Alert:

  • Rule name: Low Disk Space
  • Query: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay",mountpoint="/"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay",mountpoint="/"})) * 100
  • Condition: IS ABOVE 85
  • For: 10m
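Grafana-managed alerts are convenient to click together, but you can also define equivalent rules in Prometheus itself and keep them in version control. A sketch of a rule file — the file name, group name, and severity label are examples; the file is loaded by adding a rule_files entry to prometheus.yml and mounting it into the container:

```yaml
# prometheus/alert_rules.yml — example; reference it in prometheus.yml with:
# rule_files:
#   - '/etc/prometheus/alert_rules.yml'
groups:
  - name: node-alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m                      # sustained for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "CPU above 90% on {{ $labels.instance }}"
```

Note that Prometheus-evaluated rules need Alertmanager to deliver notifications, whereas Grafana-managed alerts use Grafana's own contact points.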

Configure Notification Channels

Set up a Slack notification contact point:

  1. Go to Alerting > Contact Points > New Contact Point
  2. Select Slack
  3. Configure the webhook URL from your Slack workspace
  4. Set the channel (e.g., #alerts-infra)

For email notifications:

# Add to Grafana environment in docker-compose.yml
environment:
  - GF_SMTP_ENABLED=true
  - GF_SMTP_HOST=smtp.company.com:587
  - GF_SMTP_USER=grafana@company.com
  - GF_SMTP_PASSWORD=smtp_password
  - GF_SMTP_FROM_ADDRESS=grafana@company.com

Dashboard JSON Export/Import

Save your dashboards as JSON for version control:

# Export via API
# (in recent Grafana versions, YOUR_API_KEY is a service account token)
curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  http://localhost:3000/api/dashboards/uid/YOUR_DASHBOARD_UID | \
  jq '.dashboard' > dashboard-infra.json

# Import via API
curl -s -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d "{\"dashboard\": $(cat dashboard-infra.json), \"overwrite\": true}" \
  http://localhost:3000/api/dashboards/db

Store dashboard JSON files in Git alongside your infrastructure code for reproducible monitoring setups.
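Instead of importing by hand, Grafana can also load dashboard JSON from disk at startup through a file provider, the same mechanism used for the data source above. A minimal sketch — the provider name and paths are examples:

```yaml
# grafana/provisioning/dashboards/default.yml
apiVersion: 1
providers:
  - name: 'default'
    type: file            # load dashboards from a directory on disk
    options:
      path: /var/lib/grafana/dashboards
```

Drop your exported JSON files into a local directory (e.g., ./grafana/dashboards) and mount it at the path above in the Compose file; Grafana loads every dashboard it finds there on startup.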


Troubleshooting Common Issues

Grafana Shows “No Data”

# Verify Prometheus is scraping targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Check if the metric exists in Prometheus
curl -s "http://localhost:9090/api/v1/query?query=up" | jq

# Verify the data source URL in Grafana
# Use http://prometheus:9090 (Docker DNS), NOT localhost

Node Exporter Not Appearing in Targets

# Check if Node Exporter is running
curl -s http://localhost:9100/metrics | head -5

# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload

# Check Prometheus logs
docker compose logs prometheus | tail -20

Dashboard Variables Not Populating

Ensure the variable query uses the correct metric name:

# Correct: use a metric that exists
label_values(node_uname_info, instance)

# Wrong: using a metric that hasn't been scraped yet
label_values(nonexistent_metric, instance)

Verify in Prometheus that node_uname_info returns results before using it in a variable query.

High Memory Usage by Prometheus

# Limit retention in prometheus.yml command args
command:
  - '--storage.tsdb.retention.time=15d'   # Reduce from 30d
  - '--storage.tsdb.retention.size=5GB'    # Add size limit

Dashboard Best Practices

  • Use rows to organize — group related panels (CPU, Memory, Disk, Network) into collapsible rows
  • Set meaningful thresholds — green/yellow/red indicators help users spot problems instantly
  • Use consistent units — standardize on bytes vs. GB, percent vs. ratio across all panels
  • Limit the time range — default to 6h or 12h; long ranges slow queries on large datasets
  • Add annotations — mark deployments, incidents, and maintenance windows on your dashboards
  • Use stat panels for overview — place key metrics (uptime, total servers, alerts firing) at the top
  • Export to JSON — version control your dashboards in Git alongside infrastructure code

Summary

A well-built Grafana dashboard turns raw Prometheus metrics into actionable visibility across your infrastructure. You’ve learned how to deploy the Prometheus-Grafana stack with Docker Compose, write PromQL queries for CPU, memory, disk, and network metrics, build dashboards with appropriate panel types, implement template variables for multi-server filtering, and configure alerting rules with notification channels.

Start with the essential system metrics covered here, then expand to application-specific exporters (MySQL, PostgreSQL, Redis, nginx) as your monitoring needs grow. The combination of Prometheus’s reliable metric collection and Grafana’s flexible visualization gives you a monitoring platform that scales from a home lab to enterprise infrastructure.