Why Prometheus + Grafana?
In DevOps, you can’t fix what you can’t see. Prometheus and Grafana together form one of the most widely used open-source stacks for infrastructure monitoring:
- Prometheus pulls (scrapes) metrics over HTTP from your servers, containers, and applications at a configurable interval (15 seconds in the configuration below) and stores them as time-series data.
- Grafana turns that data into rich, interactive dashboards with alerting capabilities.
Together, they give you visibility into CPU, memory, disk, and network usage and, for applications that expose metrics, into latency, error rates, and more.
Prerequisites
- A Linux server (Ubuntu 22.04 or RHEL 9 recommended).
- At least 2 GB RAM for a basic Prometheus + Grafana setup.
- Root or sudo access.
- Firewall rules allowing ports 9090 (Prometheus) and 3000 (Grafana) on the monitoring server, and port 9100 (Node Exporter) on each monitored host.
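Opening those ports depends on the distribution's firewall. As a minimal sketch, assuming ufw on Ubuntu and firewalld on RHEL, the helper below only prints the commands so you can review them before running anything; `firewall_cmds` is a hypothetical function name.

```shell
# Sketch: print (do not run) firewall commands for the given TCP ports.
# Assumes ufw (Ubuntu) or firewalld (RHEL); review the output before executing it.
firewall_cmds() {
  local tool="$1"
  shift
  local port
  case "$tool" in
    ufw)
      for port in "$@"; do
        echo "sudo ufw allow ${port}/tcp"
      done
      ;;
    firewalld)
      for port in "$@"; do
        echo "sudo firewall-cmd --permanent --add-port=${port}/tcp"
      done
      # Permanent firewalld rules only take effect after a reload
      echo "sudo firewall-cmd --reload"
      ;;
    *)
      echo "unsupported tool: $tool" >&2
      return 1
      ;;
  esac
}
```

For example, `firewall_cmds ufw 9090 3000` on the monitoring server and `firewall_cmds firewalld 9100` on a RHEL target; pipe the output to `sh` once it looks right.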
Step 1: Install Prometheus
Download and Configure
# Create a dedicated user
sudo useradd --no-create-home --shell /bin/false prometheus
# Download (check latest version at prometheus.io)
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.50.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.50.0.linux-amd64/promtool /usr/local/bin/
# Create config and data directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
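Before extracting, it is worth checking the tarball against a published checksum. A minimal sketch, assuming the release assets include a `sha256sums.txt` in the usual `<hash>  <filename>` format (confirm this on the release page) and that GNU `sha256sum` is available; `verify_sha256` is a hypothetical helper name.

```shell
# Sketch: check FILE against its entry in a sha256sums.txt-style list.
# Usage: verify_sha256 FILE SUMS_FILE   (returns 0 on match, 1 otherwise)
verify_sha256() {
  local file="$1" sums="$2" name expected actual
  name="$(basename "$file")"
  # Look up the hash listed for this file name (second whitespace-separated field)
  expected="$(awk -v f="$name" '$2 == f { print $1 }' "$sums")"
  if [ -z "$expected" ]; then
    echo "no checksum entry for $name in $sums" >&2
    return 1
  fi
  actual="$(sha256sum "$file" | awk '{ print $1 }')"
  [ "$expected" = "$actual" ]
}
```

For example, after downloading the assumed list from `https://github.com/prometheus/prometheus/releases/download/v2.50.0/sha256sums.txt`, run `verify_sha256 prometheus-2.50.0.linux-amd64.tar.gz sha256sums.txt && echo OK`.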
Configure prometheus.yml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets:
          - '192.168.1.10:9100'
          - '192.168.1.11:9100'
          - '192.168.1.12:9100'
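Adding servers means appending `host:9100` entries under the `node` job. If you keep the host list elsewhere, a small generator can print that job indented to sit under `scrape_configs`. A sketch; `render_node_job` and the `NODE_EXPORTER_PORT` override are hypothetical names, and 9100 is Node Exporter's default port.

```shell
# Sketch: print a 'node' scrape job listing HOST:PORT for each argument.
# The port defaults to 9100 and can be overridden via NODE_EXPORTER_PORT (hypothetical).
render_node_job() {
  local port="${NODE_EXPORTER_PORT:-9100}" host
  echo "  - job_name: 'node'"
  echo "    static_configs:"
  echo "      - targets:"
  for host in "$@"; do
    echo "          - '${host}:${port}'"
  done
}
```

`render_node_job 192.168.1.10 192.168.1.11 192.168.1.12` prints the node job for the three example hosts; paste it under `scrape_configs` and validate the file with `promtool check config /etc/prometheus/prometheus.yml`.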
Create a systemd Service
# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--storage.tsdb.retention.time=30d
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
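Verify: open http://your-server:9090/targets. The prometheus job should be UP; the node targets show DOWN until Node Exporter is running on them (Step 2).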
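On a headless server you can run a similar check from the shell. A sketch that polls an HTTP endpoint with curl until it returns a success status; it assumes curl is installed and uses Prometheus's `/-/ready` endpoint. `wait_for_http` is a hypothetical helper, and its retry count and delay are arbitrary defaults.

```shell
# Sketch: poll URL until it answers with a 2xx status, or give up.
# Usage: wait_for_http URL [TRIES=10] [DELAY_SECONDS=2]
wait_for_http() {
  local url="$1" tries="${2:-10}" delay="${3:-2}" i
  for ((i = 1; i <= tries; i++)); do
    # -f makes HTTP errors (4xx/5xx) count as failures
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $tries attempts: $url" >&2
  return 1
}
```

Run `wait_for_http http://localhost:9090/-/ready` right after `systemctl enable --now prometheus`; the same helper works against `http://server-ip:9100/metrics` once Node Exporter is running.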
Step 2: Install Node Exporter on Targets
Node Exporter exposes hardware and OS metrics (CPU, memory, disk, network) in Prometheus format.
# On EACH server you want to monitor
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
sudo mv node_exporter-*/node_exporter /usr/local/bin/
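# Create a dedicated user (e.g. node_exporter) and /etc/systemd/system/node_exporter.service,
# modeled on prometheus.service with ExecStart=/usr/local/bin/node_exporter
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter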
The metrics are available at http://server-ip:9100/metrics
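Step 2 leaves the Node Exporter unit file to you. One way to write it, modeled on prometheus.service; it assumes the binary path used above and a dedicated `node_exporter` user, a hypothetical account name you create yourself. The target directory is a parameter so you can inspect the output before installing it.

```shell
# Sketch: write node_exporter.service into DIR (default /etc/systemd/system).
# Assumes /usr/local/bin/node_exporter and an existing "node_exporter" user (hypothetical name).
write_node_exporter_unit() {
  local dir="${1:-/etc/systemd/system}"
  mkdir -p "$dir"
  # Quoted heredoc: the unit text is written verbatim
  cat > "$dir/node_exporter.service" <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

As root: `useradd --no-create-home --shell /bin/false node_exporter`, then `write_node_exporter_unit`, `systemctl daemon-reload`, and `systemctl enable --now node_exporter`.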
Step 3: Install Grafana
# Ubuntu/Debian
sudo apt install -y apt-transport-https software-properties-common
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install grafana -y
sudo systemctl enable --now grafana-server
Access Grafana at http://your-server:3000 (default login: admin / admin; Grafana prompts you to set a new password on first login).
Add Prometheus as a Data Source
- Go to Connections → Data sources → Add data source (in older Grafana versions: Settings (gear icon) → Data Sources).
- Select Prometheus.
- Set the URL to http://localhost:9090 (or the Prometheus server IP).
- Click Save & Test. Grafana should confirm that the data source is working.
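The same data source can also be defined in a file that Grafana loads at startup (provisioning), which helps when rebuilding servers. A sketch, assuming the apt package's default provisioning directory `/etc/grafana/provisioning/datasources`; the file name `prometheus.yaml` and the helper name are hypothetical choices.

```shell
# Sketch: write a Grafana data source provisioning file for Prometheus.
# Usage: write_prometheus_datasource [DIR] [PROMETHEUS_URL]
write_prometheus_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  # Unquoted heredoc so ${url} is substituted
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
EOF
}
```

Run it as root (for example from `sudo -i`), then `systemctl restart grafana-server` so Grafana picks up the file.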
Import a Dashboard
- Go to Dashboards → Import.
- Enter ID 1860 (Node Exporter Full) and click Load.
- Select your Prometheus data source, click Import.
You instantly get CPU, memory, disk, network, and filesystem visualizations for all your monitored servers.
Step 4: Configure Alerting
Prometheus Alert Rules
Create /etc/prometheus/alert_rules.yml:
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is above 90% on {{ $labels.instance }}"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes * 100) < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space is below 15% on {{ $labels.instance }}"
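Prometheus evaluates these rules and lists firing alerts at http://your-server:9090/alerts; sending notifications (email, Slack, and so on) additionally requires Alertmanager. Reference the rule file by adding a top-level rule_files key to prometheus.yml (relative paths resolve against the directory containing prometheus.yml), then restart Prometheus: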
rule_files:
  - "alert_rules.yml"
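The summary also recommends a memory alert, which the rule file above does not include. A sketch that writes one to its own rule file and runs `promtool check rules` when promtool is on the PATH; the file name `memory_rules.yml`, the group name, and the 90% threshold are assumptions to adjust.

```shell
# Sketch: write a memory alert rule file, then validate it if promtool is available.
# File name, group name, and the 90% threshold are assumptions.
write_memory_rules() {
  local file="${1:-/etc/prometheus/memory_rules.yml}"
  # Quoted heredoc keeps {{ $labels.instance }} literal for Prometheus templating
  cat > "$file" <<'EOF'
groups:
  - name: memory_alerts
    rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage is above 90% on {{ $labels.instance }}"
EOF
  # promtool ships in the Prometheus tarball installed in Step 1
  if command -v promtool > /dev/null 2>&1; then
    promtool check rules "$file"
  fi
}
```

Then add "memory_rules.yml" to the rule_files list next to "alert_rules.yml" and restart Prometheus.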
Troubleshooting
| Problem | Solution |
|---|---|
| Target shows “DOWN” | Check that the exporter is running, the firewall allows the port, and /metrics is reachable from the Prometheus server |
| Grafana dashboard empty | Verify the data source URL, check the PromQL query, and widen the time range |
| Prometheus OOM crash | Reduce the number of active series (fewer targets, drop high-cardinality metrics or labels) or add more RAM; shortening retention mainly saves disk, not memory |
| “No data” in alert query | Test the PromQL expression directly in the Prometheus UI (/graph) |
Summary
- Prometheus scrapes metrics; Grafana visualizes them.
- Use Node Exporter on every server, and add each one as a target in prometheus.yml.
- Import dashboard 1860 for instant full-system visibility.
- Configure alert rules for CPU, disk, and memory before you have an outage.