Why Prometheus + Grafana?

In DevOps, you can’t fix what you can’t see. Prometheus and Grafana form the gold standard for open-source infrastructure monitoring:

  • Prometheus scrapes metrics from your servers, containers, and applications at a configurable interval (15 seconds in this guide) and stores them as time-series data.
  • Grafana turns that data into rich, interactive dashboards with alerting capabilities.

Together, they give you complete visibility into CPU, memory, disk, network, application latency, error rates, and more.

Prerequisites

  • A Linux server (Ubuntu 22.04 or RHEL 9 recommended).
  • At least 2 GB RAM for a basic Prometheus + Grafana setup.
  • Root or sudo access.
  • Firewall rules allowing ports 9090 (Prometheus), 9100 (Node Exporter), and 3000 (Grafana).
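
As a rough sketch of the firewall prerequisite, the helper below opens the three ports listed above. It assumes an Ubuntu host that uses ufw; the function name open_monitoring_ports is made up for this example.

```shell
# Hypothetical helper: open the ports used in this guide with ufw.
# Assumes ufw is the active firewall (the default tool on Ubuntu).
open_monitoring_ports() {
  # 9090 = Prometheus, 9100 = Node Exporter, 3000 = Grafana
  for port in 9090 9100 3000; do
    sudo ufw allow "${port}/tcp"
  done
}
```

Run open_monitoring_ports once on the server. On RHEL 9 with firewalld, the equivalent would be sudo firewall-cmd --permanent --add-port=9090/tcp (repeated per port) followed by sudo firewall-cmd --reload.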

Step 1: Install Prometheus

Download and Configure

# Create a dedicated user
sudo useradd --no-create-home --shell /bin/false prometheus

# Download (check latest version at prometheus.io)
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.50.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.50.0.linux-amd64/promtool /usr/local/bin/

# Create config and data directories
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Configure prometheus.yml

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets:
        - '192.168.1.10:9100'
        - '192.168.1.11:9100'
        - '192.168.1.12:9100'
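
Before starting the service, it can help to validate the file with promtool, which was installed alongside Prometheus in the download step. A minimal sketch, assuming the config path used in this guide; the function name validate_prometheus_config is only illustrative.

```shell
# Hypothetical helper: validate a Prometheus config file with promtool.
# Defaults to the path used in this guide.
validate_prometheus_config() {
  config_file="${1:-/etc/prometheus/prometheus.yml}"
  # Fail early with a clear message if the file is missing or unreadable.
  if [ ! -r "$config_file" ]; then
    echo "Cannot read ${config_file}" >&2
    return 2
  fi
  # promtool exits non-zero when the config has syntax or semantic errors.
  promtool check config "$config_file"
}
```

Calling validate_prometheus_config with no arguments checks /etc/prometheus/prometheus.yml; a non-zero exit status means the file should be fixed before Prometheus is started.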

Create a systemd Service

# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --storage.tsdb.retention.time=30d
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus

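Verify: Navigate to http://your-server:9090/targets. The prometheus job should show as UP; the node targets stay DOWN until Node Exporter is installed in Step 2.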
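
The same check can be done from the command line through the Prometheus HTTP API (/api/v1/targets). The sketch below counts targets whose health is anything other than "up". It assumes the API returns compact JSON, which is what Prometheus emits by default; a JSON-aware tool such as jq would be more robust. The function name count_unhealthy_targets is made up for this example.

```shell
# Hypothetical helper: read /api/v1/targets JSON on stdin and print the
# number of targets whose health is not "up" (e.g. "down" or "unknown").
count_unhealthy_targets() {
  # Extract every "health":"..." pair, one per line, then count the
  # ones that are not "up". grep -c exits non-zero on a count of 0,
  # so "|| true" keeps the function's exit status clean.
  grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}
```

Usage: curl -s http://localhost:9090/api/v1/targets | count_unhealthy_targets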


Step 2: Install Node Exporter on Targets

Node Exporter exposes hardware and OS metrics (CPU, memory, disk, network) in Prometheus format.

# On EACH server you want to monitor
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
sudo mv node_exporter-*/node_exporter /usr/local/bin/

# Create /etc/systemd/system/node_exporter.service (same pattern as the Prometheus unit above), then start it
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
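
The unit file for Node Exporter is not spelled out above, so here is one possible version, modeled on the Prometheus unit. It assumes a dedicated node_exporter user created the same way as the prometheus user (sudo useradd --no-create-home --shell /bin/false node_exporter); that user and the function name node_exporter_unit are assumptions of this sketch, not part of the original steps.

```shell
# Hypothetical helper: print a minimal systemd unit for Node Exporter.
# Assumes the binary is in /usr/local/bin (as installed above) and that
# a "node_exporter" system user exists.
node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

To install it: node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service, then run sudo systemctl daemon-reload before enabling the service.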

The metrics are available at http://server-ip:9100/metrics


Step 3: Install Grafana

# Ubuntu/Debian
sudo apt install -y apt-transport-https software-properties-common
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install grafana -y
sudo systemctl enable --now grafana-server

Access Grafana at http://your-server:3000 (default login: admin / admin; Grafana prompts you to change the password on first login).

Add Prometheus as a Data Source

  1. Go to Settings (gear icon) → Data Sources → Add data source.
  2. Select Prometheus.
  3. Set the URL to http://localhost:9090 (or the Prometheus server IP).
  4. Click Save & Test. Grafana should report “Data source is working”.
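
The UI steps above can also be expressed as a Grafana provisioning file, which is handy when you rebuild servers often. The sketch below prints such a file. It assumes the default provisioning directory of the Debian/Ubuntu package (/etc/grafana/provisioning/datasources); the function name grafana_datasource_yaml and the file name prometheus.yaml are illustrative.

```shell
# Hypothetical helper: print a Grafana data source provisioning file
# that points at Prometheus. The URL defaults to the one used above.
grafana_datasource_yaml() {
  url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

Usage: grafana_datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml, then sudo systemctl restart grafana-server so Grafana picks up the file.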

Import a Dashboard

  1. Go to Dashboards → Import.
  2. Enter ID 1860 (Node Exporter Full) and click Load.
  3. Select your Prometheus data source and click Import.

You instantly get CPU, memory, disk, network, and filesystem visualizations for all your monitored servers.


Step 4: Configure Alerting

Prometheus Alert Rules

Create /etc/prometheus/alert_rules.yml:

groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is above 90% on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes * 100) < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space is below 15% on {{ $labels.instance }}"
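
A memory rule can follow the same pattern as the two rules above. The sketch below prints one, indented so it can be appended under the rules: list of the node_alerts group. The 90% threshold, the 5m duration, and the function name memory_alert_rule are assumptions chosen to mirror the CPU rule; the metric names are the ones Node Exporter exposes on Linux.

```shell
# Hypothetical helper: print a memory alert rule, pre-indented to sit
# under "rules:" in the node_alerts group of alert_rules.yml.
memory_alert_rule() {
  # Quoted heredoc delimiter so the shell leaves $labels untouched.
  cat <<'EOF'

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage is above 90% on {{ $labels.instance }}"
EOF
}
```

Usage, assuming the file ends with the DiskSpaceLow rule as shown above: memory_alert_rule | sudo tee -a /etc/prometheus/alert_rules.yml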

Reference the rules file in prometheus.yml, then restart Prometheus to load it:

rule_files:
  - "alert_rules.yml"
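
A typo in a rules file can stop Prometheus from starting, so it is worth validating the file before restarting. A minimal sketch using promtool check rules; the function name apply_alert_rules is made up for this example.

```shell
# Hypothetical helper: validate the rules file, and restart Prometheus
# only if validation succeeds.
apply_alert_rules() {
  rules_file="${1:-/etc/prometheus/alert_rules.yml}"
  if promtool check rules "$rules_file"; then
    sudo systemctl restart prometheus
  else
    echo "Rule validation failed; Prometheus was not restarted." >&2
    return 1
  fi
}
```

After apply_alert_rules succeeds, the rules should appear on the Alerts page of the Prometheus UI.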

Troubleshooting

Problem | Solution
Target shows “DOWN” | Check that the exporter is running, the firewall allows the port, and the /metrics endpoint is reachable
Grafana dashboard empty | Verify the data source URL, check the PromQL query, adjust the time range
Prometheus OOM crash | Reduce the number of scrape targets or series, increase the scrape interval, or add more RAM
“No data” in alert query | Test the PromQL expression directly in the Prometheus UI (/graph)
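
For the “DOWN” target case, a quick reachability test from the Prometheus server narrows things down. The sketch below assumes curl is installed; the function name check_exporter and the 5-second timeout are choices made for this example.

```shell
# Hypothetical helper: check whether an exporter's /metrics endpoint
# answers over HTTP. Port defaults to 9100 (Node Exporter).
check_exporter() {
  host="$1"
  port="${2:-9100}"
  # -f makes curl fail on HTTP errors; --max-time bounds the wait.
  if curl -fsS --max-time 5 "http://${host}:${port}/metrics" >/dev/null 2>&1; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable"
    return 1
  fi
}
```

Usage: check_exporter 192.168.1.10. If the endpoint answers locally on the target but not from the Prometheus server, the firewall is the likely culprit.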

Summary

  • Prometheus scrapes metrics; Grafana visualizes them.
  • Use Node Exporter on every server, and add each as a target in prometheus.yml.
  • Import dashboard 1860 for instant full-system visibility.
  • Configure alert rules for CPU, disk, and memory before you have an outage.