You cannot fix what you cannot see. When a server’s CPU spikes at 3 AM, a disk fills up silently, or a network interface starts dropping packets, you need dashboards that tell you exactly what happened, when, and where. Grafana combined with Prometheus gives you a production-grade monitoring stack that scales from a handful of servers to thousands of nodes — all with open-source tools.
This guide walks you through deploying the Prometheus-Grafana stack, building dashboards that surface the metrics that matter, writing effective PromQL queries, and configuring alerts that notify you before problems become outages.
Prerequisites
Before you begin, ensure you have:
- Linux server (Ubuntu 22.04 LTS recommended) with at least 2 GB RAM
- Docker and Docker Compose installed
- Network access to the servers you want to monitor
- Basic familiarity with YAML and the Linux terminal
Step 1: Deploy the Monitoring Stack
The fastest way to get Prometheus and Grafana running is with Docker Compose. This setup includes Prometheus, Grafana, and Node Exporter (for host metrics).
Create the Project Structure
```shell
mkdir -p monitoring/{prometheus,grafana}
cd monitoring
```
Docker Compose Configuration
```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    restart: unless-stopped
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:v1.8.0
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```
Prometheus Configuration
```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node-exporter:9100'
        labels:
          instance: 'monitoring-server'
      # Add remote servers running Node Exporter
      # - targets: ['192.168.1.10:9100']
      #   labels:
      #     instance: 'web-01'
      # - targets: ['192.168.1.11:9100']
      #   labels:
      #     instance: 'db-01'
```
Start the Stack
```shell
docker compose up -d
```
Verify everything is running:
```shell
docker compose ps
# Check Prometheus targets: http://your-server:9090/targets
# Access Grafana: http://your-server:3000 (admin / changeme)
```
Step 2: Add Prometheus as a Data Source
- Open Grafana at `http://your-server:3000`
- Log in with `admin` / `changeme`
- Navigate to Connections > Data Sources > Add data source
- Select Prometheus
- Set the URL to `http://prometheus:9090` (Docker internal DNS)
- Click Save & Test; you should see "Data source is working"
You can also provision data sources automatically by creating a YAML file:
```yaml
# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
```
Mount this into Grafana by adding to your Docker Compose:
```yaml
# under the grafana service, alongside the existing grafana_data mount
volumes:
  - grafana_data:/var/lib/grafana
  - ./grafana/provisioning:/etc/grafana/provisioning
```
Step 3: Essential PromQL Queries
PromQL (Prometheus Query Language) is how you extract meaningful data from Prometheus. Here are the essential queries for infrastructure monitoring.
CPU Metrics
```promql
# CPU usage percentage (across all cores)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU usage by mode (user, system, iowait)
avg by (instance, mode) (irate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100

# CPU load average (1 minute)
node_load1
```
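The idle-mode query can be puzzling at first: `node_cpu_seconds_total` is a counter of seconds spent in each CPU mode, so its per-second rate is the *fraction* of time spent in that mode, and subtracting the idle fraction from 100 gives usage. A small sketch with made-up counter samples shows the arithmetic:

```python
# Hypothetical counter samples for node_cpu_seconds_total{mode="idle"}
# on a single core, taken 60 seconds apart (values are made up).
t0, idle0 = 1000.0, 5400.0   # (timestamp, counter value in seconds)
t1, idle1 = 1060.0, 5442.0

# rate()/irate() compute the per-second increase of the counter. Since the
# counter itself counts seconds spent idle, the rate is the fraction of
# wall-clock time the CPU was idle.
idle_fraction = (idle1 - idle0) / (t1 - t0)   # 42 idle seconds / 60 elapsed

cpu_usage_pct = 100 - idle_fraction * 100
print(round(cpu_usage_pct, 1))  # 30.0
```

The `avg by (instance)` in the real query does the same thing per core and averages the fractions, which is why the result stays between 0 and 100 regardless of core count.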
Memory Metrics
```promql
# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Available memory in GB
node_memory_MemAvailable_bytes / 1024 / 1024 / 1024

# Swap usage
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes * 100
```
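Unlike the CPU query, these are plain gauge arithmetic with no rate involved. Here is the usage formula with hypothetical byte counts (the values are made up):

```python
# Hypothetical readings from the node_memory_* gauges, in bytes.
mem_total = 16 * 1024**3      # 16 GiB total RAM
mem_available = 4 * 1024**3   # 4 GiB available

# Same arithmetic as the PromQL expression above.
usage_pct = (1 - mem_available / mem_total) * 100
print(usage_pct)  # 75.0
```

`MemAvailable` is the right gauge to use rather than `MemFree`: it includes reclaimable page cache, so the result reflects memory actually available to applications instead of overstating pressure.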
Disk Metrics
```promql
# Disk usage percentage per mount point
(1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100

# Disk I/O rate (reads + writes per second)
irate(node_disk_read_bytes_total[5m]) + irate(node_disk_written_bytes_total[5m])

# Disk I/O utilization
irate(node_disk_io_time_seconds_total[5m]) * 100
```
Network Metrics
```promql
# Network receive rate (bytes/sec)
irate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*|br-.*"}[5m])

# Network transmit rate (bytes/sec)
irate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*|br-.*"}[5m])

# Network errors
irate(node_network_receive_errs_total[5m]) + irate(node_network_transmit_errs_total[5m])
```
System Uptime
```promql
# Uptime in days
(time() - node_boot_time_seconds) / 86400
```
Step 4: Build the Dashboard
Create a New Dashboard
- Click + > New Dashboard
- Click Add visualization
- Select your Prometheus data source
- Enter a PromQL query in the query editor
Panel Types and When to Use Them
| Panel Type | Best For | Example |
|---|---|---|
| Time Series | Metrics over time | CPU usage, network bandwidth |
| Stat | Single current value | Uptime, total memory |
| Gauge | Value within a range | Disk usage %, CPU temp |
| Bar Gauge | Comparing values | Disk usage per mount |
| Table | Tabular data | Top processes, server list |
| Heatmap | Distribution over time | Request latency buckets |
Example: CPU Usage Panel
Create a Time Series panel with this configuration:
Query:

```promql
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

Panel settings:

- Title: `CPU Usage %`
- Unit: `Percent (0-100)`
- Min: 0, Max: 100
- Thresholds: Green (0-70), Yellow (70-85), Red (85-100)
- Legend: `{{instance}}`
Example: Memory Gauge Panel
Create a Gauge panel:
Query:

```promql
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
```

Panel settings:

- Title: `Memory Usage`
- Unit: `Percent (0-100)`
- Thresholds: Green (0-75), Orange (75-90), Red (90-100)
Step 5: Template Variables
Variables make dashboards dynamic. Instead of hardcoding server names, let users select from a dropdown.
Create an Instance Variable
1. Go to Dashboard Settings > Variables > New
2. Configure:
   - Name: `instance`
   - Type: Query
   - Data source: Prometheus
   - Query: `label_values(node_uname_info, instance)`
   - Multi-value: Enable
   - Include All option: Enable
3. Click Apply
Use Variables in Queries
Replace hardcoded instance labels with `$instance`:

```promql
# CPU usage filtered by selected instance
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle", instance=~"$instance"}[5m])) * 100)

# Memory for selected instance
(1 - (node_memory_MemAvailable_bytes{instance=~"$instance"} / node_memory_MemTotal_bytes{instance=~"$instance"})) * 100
```
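The `=~` regex matcher (rather than `=`) is what makes Multi-value work: when several instances are selected, Grafana interpolates the variable as a pipe-joined alternation. A rough sketch of that interpolation (Grafana additionally regex-escapes special characters in each value, omitted here):

```python
# Values a user ticked in the dashboard's multi-select "instance" dropdown
# (hypothetical instance names).
selected = ["web-01", "db-01"]

# Grafana interpolates a multi-value variable used with =~ as a
# pipe-joined alternation, so the selector matches any selected instance.
regex_value = "|".join(selected)

query = f'node_memory_MemAvailable_bytes{{instance=~"{regex_value}"}}'
print(query)
# node_memory_MemAvailable_bytes{instance=~"web-01|db-01"}
```

With "Include All" enabled, selecting All interpolates a pattern matching every value, so the same query works for one server or the whole fleet.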
Add a Job Variable
Create a second query variable named `job` with the query:

```promql
label_values(up, job)
```
This lets users filter by monitor type (node-exporter, blackbox, application exporters).
Step 6: Alerting in Grafana
Create an Alert Rule
- Navigate to Alerting > Alert Rules > New Alert Rule
- Define the rule:
High CPU Alert:
- Rule name: High CPU Usage
- Query: `100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
- Condition: IS ABOVE 90
- Evaluate every: 1m
- For: 5m (alert fires after 5 minutes of sustained high CPU)
Low Disk Space Alert:
- Rule name: Low Disk Space
- Query: `(1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay",mountpoint="/"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay",mountpoint="/"})) * 100`
- Condition: IS ABOVE 85
- For: 10m
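The For duration is what separates a transient spike from a sustained problem: the condition must hold across every evaluation in the window before the rule moves from pending to firing, and a single healthy evaluation resets the clock. A toy simulation of that behavior (threshold and samples are made up):

```python
# One CPU-usage sample per 1-minute evaluation (hypothetical values).
samples = [95, 96, 93, 97, 94, 95]
THRESHOLD = 90        # "IS ABOVE 90"
FOR_EVALUATIONS = 5   # "For: 5m" at "Evaluate every: 1m"

consecutive = 0
fired = False
for value in samples:
    # Count consecutive breaches; any value back under the threshold resets.
    consecutive = consecutive + 1 if value > THRESHOLD else 0
    if consecutive >= FOR_EVALUATIONS:
        fired = True
print(fired)  # True: threshold breached for 5 straight evaluations
```

Had any single sample dipped below 90, the counter would reset and the alert would stay in the pending state, which is exactly why For suppresses noisy one-off spikes.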
Configure Notification Channels
Set up a Slack notification contact point:
- Go to Alerting > Contact Points > New Contact Point
- Select Slack
- Configure the webhook URL from your Slack workspace
- Set the channel (e.g., `#alerts-infra`)
For email notifications:
```yaml
# Add to the grafana service's environment in docker-compose.yml
environment:
  - GF_SMTP_ENABLED=true
  - GF_SMTP_HOST=smtp.company.com:587
  - GF_SMTP_USER=grafana@company.com
  - GF_SMTP_PASSWORD=smtp_password
  - GF_SMTP_FROM_ADDRESS=grafana@company.com
```
Dashboard JSON Export/Import
Save your dashboards as JSON for version control:
```shell
# Export via API
curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  http://localhost:3000/api/dashboards/uid/YOUR_DASHBOARD_UID | \
  jq '.dashboard' > dashboard-infra.json

# Import via API
curl -s -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d "{\"dashboard\": $(cat dashboard-infra.json), \"overwrite\": true}" \
  http://localhost:3000/api/dashboards/db
```
Store dashboard JSON files in Git alongside your infrastructure code for reproducible monitoring setups.
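When scripting imports, the POST body wraps the dashboard JSON in an envelope with an `overwrite` flag. A sketch of building that payload in Python; the `dashboard` dict here is a hypothetical stand-in for a real export loaded from `dashboard-infra.json`:

```python
import json

# Hypothetical exported dashboard (normally read from dashboard-infra.json).
dashboard = {"uid": "infra-overview", "title": "Infrastructure", "panels": []}

# Envelope for POST /api/dashboards/db: null out "id" so Grafana matches on
# the stable uid rather than a database-local id, and set overwrite so
# repeated imports update the existing dashboard.
payload = {
    "dashboard": {**dashboard, "id": None},
    "overwrite": True,
    "message": "Imported from Git",  # optional note shown in version history
}
body = json.dumps(payload)
```

Keeping `uid` stable across environments means panel links and alert references keep working wherever the dashboard is imported.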
Troubleshooting Common Issues
Grafana Shows “No Data”
```shell
# Verify Prometheus is scraping targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Check if the metric exists in Prometheus
curl -s "http://localhost:9090/api/v1/query?query=up" | jq

# Verify the data source URL in Grafana
# Use http://prometheus:9090 (Docker DNS), NOT localhost
```
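If you would rather script this check than eyeball jq output, the targets endpoint returns JSON you can filter directly. A sketch against an abbreviated, made-up response (real responses include more fields, such as `scrapeUrl` and `lastScrape`):

```python
import json

# Abbreviated sample of a /api/v1/targets response; values are made up.
response = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"job": "prometheus", "instance": "localhost:9090"},
       "health": "up", "lastError": ""},
      {"labels": {"job": "node-exporter", "instance": "monitoring-server"},
       "health": "down", "lastError": "connection refused"}
    ]
  }
}
""")

# Surface unhealthy targets and their scrape errors.
for t in response["data"]["activeTargets"]:
    if t["health"] != "up":
        print(t["labels"]["job"], "-", t["lastError"])
```

The `lastError` field usually names the root cause (connection refused, timeout, DNS failure), which narrows the fix faster than the dashboard alone.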
Node Exporter Not Appearing in Targets
```shell
# Check if Node Exporter is running
curl -s http://localhost:9100/metrics | head -5

# Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload

# Check Prometheus logs
docker compose logs prometheus | tail -20
```
Dashboard Variables Not Populating
Ensure the variable query uses the correct metric name:
```promql
# Correct: use a metric that exists
label_values(node_uname_info, instance)

# Wrong: using a metric that hasn't been scraped yet
label_values(nonexistent_metric, instance)
```
Verify in Prometheus that `node_uname_info` returns results before using it in a variable query.
High Memory Usage by Prometheus
```yaml
# Limit retention via the Prometheus command args in docker-compose.yml
# (these are startup flags, not prometheus.yml settings)
command:
  - '--storage.tsdb.retention.time=15d'  # Reduce from 30d
  - '--storage.tsdb.retention.size=5GB'  # Add size limit
```
Dashboard Best Practices
- Use rows to organize — group related panels (CPU, Memory, Disk, Network) into collapsible rows
- Set meaningful thresholds — green/yellow/red indicators help users spot problems instantly
- Use consistent units — standardize on bytes vs. GB, percent vs. ratio across all panels
- Limit the time range — default to 6h or 12h; long ranges slow queries on large datasets
- Add annotations — mark deployments, incidents, and maintenance windows on your dashboards
- Use stat panels for overview — place key metrics (uptime, total servers, alerts firing) at the top
- Export to JSON — version control your dashboards in Git alongside infrastructure code
Summary
A well-built Grafana dashboard turns raw Prometheus metrics into actionable visibility across your infrastructure. You’ve learned how to deploy the Prometheus-Grafana stack with Docker Compose, write PromQL queries for CPU, memory, disk, and network metrics, build dashboards with appropriate panel types, implement template variables for multi-server filtering, and configure alerting rules with notification channels.
Start with the essential system metrics covered here, then expand to application-specific exporters (MySQL, PostgreSQL, Redis, nginx) as your monitoring needs grow. The combination of Prometheus’s reliable metric collection and Grafana’s flexible visualization gives you a monitoring platform that scales from a home lab to enterprise infrastructure.