Figure: Prometheus + Grafana monitoring architecture (targets, Prometheus, Alertmanager, Grafana, browser)

Knowing what is happening on your servers is essential. Whether you manage a single VPS or a fleet of production machines, real-time visibility into CPU usage, memory consumption, disk I/O, and network traffic is what separates proactive infrastructure management from firefighting. Prometheus and Grafana together form one of the most widely adopted open-source monitoring stacks, used by organizations ranging from startups to large enterprises.

This guide walks you through the complete process: installing Prometheus and node_exporter to collect system metrics, setting up Grafana for visualization, writing PromQL queries, configuring alerting with Alertmanager, and monitoring Docker containers with cAdvisor. By the end, you will have a fully functional monitoring stack running on Ubuntu.

Prerequisites

Before you begin, make sure you have:

  • A server running Ubuntu 22.04 or 24.04 (desktop or server edition)
  • Terminal access with sudo privileges
  • At least 2 GB of RAM and 20 GB of free disk space
  • Ports 9090 (Prometheus), 9100 (node_exporter), 3000 (Grafana), 9093 (Alertmanager), and 8080 (cAdvisor, optional) available
  • Basic familiarity with the Linux command line and YAML syntax
  • Docker installed (optional, for cAdvisor section) — see our Docker installation guide

Note: All commands in this guide are for Ubuntu on the amd64 architecture. If you are running ARM64, adjust the download URLs to use the linux-arm64 binary variants.

What Are Prometheus and Grafana?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud in 2012. It collects metrics by scraping HTTP endpoints at regular intervals, stores them in a time-series database (TSDB), and provides a powerful query language called PromQL for analysis. Prometheus follows a pull-based model — it actively fetches metrics from your servers rather than waiting for data to be pushed.

Grafana is an open-source analytics and interactive visualization platform. It connects to data sources like Prometheus, Elasticsearch, InfluxDB, and PostgreSQL to render real-time dashboards with graphs, tables, heatmaps, and alerts. Grafana does not store the metric data itself; it queries the underlying data source on demand.

Together, they form a complete monitoring pipeline:

  1. Exporters (like node_exporter) expose metrics on HTTP endpoints
  2. Prometheus scrapes and stores those metrics
  3. Grafana queries Prometheus and displays the data visually
  4. Alertmanager receives the alerts Prometheus fires when alert rules detect a problem (for example, a metric crossing a threshold) and sends the notifications

Monitoring Architecture Overview

Understanding the data flow is critical before deploying the stack:

┌─────────────────┐     scrape      ┌───────────────┐     query     ┌──────────────┐
│  node_exporter  │────────────────>│  Prometheus   │<──────────────│   Grafana    │
│  :9100          │  /metrics       │  :9090        │               │   :3000      │
└─────────────────┘                 │               │               │              │
                                    │  TSDB         │               │  Dashboards  │
┌─────────────────┐     scrape      │  PromQL       │               │  Panels      │
│  cAdvisor       │────────────────>│  Alert Rules  │               │  Alerts      │
│  :8080          │  /metrics       │               │               └──────────────┘
└─────────────────┘                 └───────┬───────┘
                                            │ fire alerts
                                    ┌───────▼───────┐
                                    │ Alertmanager  │──> Email / Slack / PagerDuty
                                    │ :9093         │
                                    └───────────────┘

Prometheus uses a pull model: it periodically sends HTTP GET requests to configured targets (exporters) to fetch the latest metrics. Each exporter exposes a /metrics endpoint that returns data in Prometheus exposition format. This architecture means targets do not need to know about Prometheus — they simply serve their metrics when asked.
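To make the exposition format concrete, the Python sketch below parses a few lines like those node_exporter serves at /metrics. It is a simplified illustration, not a complete parser: it skips # HELP and # TYPE comments, assumes label values contain no } characters or escaped quotes, and ignores optional timestamps. The sample values and the parse_exposition name are made up for illustration; for real use, the prometheus_client Python package ships a proper parser (text_string_to_metric_families).

```python
import re

# Made-up sample in Prometheus text exposition format, as served at /metrics.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 890.12
"""

# Metric name, optional {label block}, then the value; a trailing timestamp is ignored.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN, +Inf
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```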

Installing Prometheus

Start by creating a dedicated system user and the necessary directories:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

Download a Prometheus release. The commands below use v2.53.0; check prometheus.io/download for newer versions and adjust the version number accordingly:

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
tar xvf prometheus-2.53.0.linux-amd64.tar.gz
cd prometheus-2.53.0.linux-amd64

Copy the binaries and configuration files:

sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo cp prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus

Verify the installation:

prometheus --version

You should see output displaying the version, build date, and Go version.

Configuring Prometheus

The main configuration file is /etc/prometheus/prometheus.yml. This YAML file defines global settings, scrape intervals, and target endpoints.

Create a clean configuration:

sudo nano /etc/prometheus/prometheus.yml

Replace the file contents with:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

Key configuration parameters:

  • scrape_interval: How often Prometheus scrapes each target (Prometheus defaults to 1m if this is unset; 15s is a common choice)
  • evaluation_interval: How often Prometheus evaluates recording and alerting rules (also 1m by default)
  • scrape_timeout: How long Prometheus waits for a scrape response before treating that scrape as failed (up = 0 for the target); it must not exceed scrape_interval
  • job_name: The value of the job label attached to every series scraped from the targets in this group
  • static_configs: A fixed list of target endpoints (host:port) to scrape
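These settings interact, and a little arithmetic shows how. The Python sketch below parses Prometheus-style durations (a simplified subset of the syntax, with units ms, s, m, h, d, w, and y), checks that scrape_timeout does not exceed scrape_interval (a combination Prometheus refuses to load), and estimates how many samples each series accumulates over the 30-day retention set in the systemd unit that follows. The helper names are illustrative, not part of Prometheus or promtool.

```python
import re

# Seconds per unit in Prometheus durations (y is 365 days).
UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                "h": 3600, "m": 60, "s": 1, "ms": 0.001}

# "ms" is listed before "m" so that "500ms" is not read as minutes.
DURATION_RE = re.compile(r"(\d+)(ms|y|w|d|h|m|s)")


def parse_duration(text):
    """Convert a duration such as "15s" or "1h30m" to seconds.

    Simplification: unlike Prometheus, unit order is not enforced.
    """
    parts = DURATION_RE.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def check_scrape_settings(scrape_interval, scrape_timeout):
    """Raise ValueError if the timeout is longer than the interval."""
    if parse_duration(scrape_timeout) > parse_duration(scrape_interval):
        raise ValueError("scrape_timeout must not exceed scrape_interval")


def samples_per_series(scrape_interval, retention):
    """Approximate number of samples one series keeps over the retention period."""
    return parse_duration(retention) // parse_duration(scrape_interval)


if __name__ == "__main__":
    check_scrape_settings("15s", "10s")       # the values used above: accepted
    print(samples_per_series("15s", "30d"))   # 2592000 s / 15 s = 172800
```

Multiplying that figure by the number of series, and by the 1 to 2 bytes per sample that the Prometheus storage documentation cites as typical, gives a rough disk estimate.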

Create a systemd service file for Prometheus:

sudo nano /etc/systemd/system/prometheus.service

Add the following unit definition:

[Unit]
Description=Prometheus Monitoring System
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d \
  --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Enable and start Prometheus:

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus

Prometheus should now be running and accessible at http://your-server-ip:9090.

Installing and Configuring node_exporter

node_exporter is the standard Prometheus exporter for hardware and OS-level metrics. It exposes CPU, memory, disk, filesystem, and network statistics on port 9100.

Create a dedicated user and download node_exporter:

sudo useradd --no-create-home --shell /bin/false node_exporter

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
tar xvf node_exporter-1.8.1.linux-amd64.tar.gz
sudo cp node_exporter-1.8.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create the systemd service:

sudo nano /etc/systemd/system/node_exporter.service

Add the following unit definition:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --collector.systemd \
  --collector.processes
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Enable and start node_exporter:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Verify that metrics are being exposed:

curl -s http://localhost:9100/metrics | head -20

You should see lines beginning with # HELP and # TYPE followed by metric names and values. Confirm that Prometheus is scraping node_exporter by navigating to http://your-server-ip:9090/targets — the node job should show a status of UP.
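The same target health information is available from Prometheus's HTTP API at /api/v1/targets, which is handy in scripts. A minimal Python sketch using only the standard library, assuming Prometheus listens on localhost:9090 as configured above; fetch_targets and summarize_targets are hypothetical helper names, and the parsing is kept separate from the HTTP call so it can be tried on a saved response.

```python
import json
import urllib.request

# Assumption: Prometheus listens on localhost:9090, as configured in this guide.
TARGETS_URL = "http://localhost:9090/api/v1/targets"


def fetch_targets(url=TARGETS_URL):
    """GET /api/v1/targets and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize_targets(body):
    """Return (job, instance, health, last_error) for every active target."""
    rows = []
    for target in body["data"]["activeTargets"]:
        labels = target["labels"]
        rows.append((labels.get("job", ""), labels.get("instance", ""),
                     target["health"], target.get("lastError", "")))
    return rows
```

On the server, [row for row in summarize_targets(fetch_targets()) if row[2] != "up"] returns only the unhealthy targets, together with the last scrape error Prometheus recorded for each.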

Installing Grafana

Add the official Grafana APT repository:

sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/

wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

Install and start Grafana:

sudo apt-get update
sudo apt-get install grafana -y
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Verify Grafana is running:

sudo systemctl status grafana-server

Grafana is now accessible at http://your-server-ip:3000. The default credentials are admin / admin. You will be prompted to change the password on first login.

Security: Change the default admin password immediately. In production, configure Grafana behind a reverse proxy with HTTPS. See our Nginx reverse proxy guide for instructions.

Creating Your First Dashboard

After logging into Grafana, add Prometheus as a data source:

  1. Navigate to Connections > Data Sources > Add data source
  2. Select Prometheus
  3. Set the URL to http://localhost:9090
  4. Click Save & Test — you should see “Successfully queried the Prometheus API”

Import the widely-used Node Exporter Full dashboard:

  1. Navigate to Dashboards > New > Import
  2. Enter dashboard ID 1860 (Node Exporter Full by rfmoz)
  3. Select your Prometheus data source
  4. Click Import

This dashboard provides comprehensive panels for CPU usage, memory utilization, disk I/O, network traffic, filesystem usage, and system load — all without writing a single query.

To create a custom panel:

  1. Click Add > Visualization on any dashboard
  2. In the query editor, enter a PromQL expression such as the following, which calculates the overall CPU usage percentage:

     100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

  3. Configure the panel title, legend, thresholds, and unit format
  4. Click Apply to save the panel
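What the CPU usage expression above computes can be reproduced by hand. In this Python sketch with made-up numbers, two snapshots of node_cpu_seconds_total{mode="idle"} are taken 60 seconds apart on a two-core machine. Each core's idle counter grows by one second per second of idle time, so its rate is the fraction of time that core was idle; averaging over cores and subtracting from 100 gives the usage percentage. The cpu_usage_percent name is hypothetical, and counter resets and the extrapolation that rate() performs are ignored here.

```python
def cpu_usage_percent(idle_before, idle_after, seconds):
    """100 - (average per-core idle fraction * 100), from two idle-counter snapshots.

    idle_before and idle_after map a cpu label (e.g. "0") to the value of
    node_cpu_seconds_total{mode="idle"} for that core.
    """
    idle_rates = [
        (idle_after[cpu] - idle_before[cpu]) / seconds  # fraction of time idle
        for cpu in idle_before
    ]
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


if __name__ == "__main__":
    before = {"0": 1000.0, "1": 2000.0}  # idle seconds at time t
    after = {"0": 1045.0, "1": 2030.0}   # idle seconds at t + 60s
    # core 0 was idle 75% of the time, core 1 50%: average 62.5% idle
    print(cpu_usage_percent(before, after, 60))  # prints 37.5
```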

PromQL Basics

PromQL (Prometheus Query Language) is a functional query language that lets you select, aggregate, and transform time-series data. Understanding PromQL is essential for building useful dashboards and alert rules.

Instant vectors return the most recent value for each time series:

node_memory_MemAvailable_bytes

Range vectors return values over a time window:

node_cpu_seconds_total[5m]

rate() calculates the per-second average rate of increase (essential for counters):

rate(node_cpu_seconds_total{mode="idle"}[5m])
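Counters only increase, except when the process exposing them restarts and the counter starts again from zero. The Python sketch below shows, in simplified form, how rate() copes with that: any drop between consecutive samples is treated as a reset, and the post-reset value is counted as new increase. Unlike the real function, it divides by the time between the first and last sample instead of extrapolating to the edges of the [5m] window, so its results can differ slightly from Prometheus's. The simple_rate name is illustrative.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A decrease between consecutive samples is treated as a counter reset:
    the counter restarted from zero, so the new value is all increase.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the range
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # 15s scrapes; the exporter restarted between t=30 and t=45.
    window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]
    print(simple_rate(window))  # (30 + 30 + 10 + 30) / 60 seconds
```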

Aggregation operators combine multiple time series:

# Average idle fraction (0 to 1) across all cores, per instance
avg without(cpu) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Total memory across all instances
sum(node_memory_MemTotal_bytes)

# Top 5 instances by CPU usage
topk(5, 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100))
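The by and without clauses differ only in which labels survive: by keeps just the listed labels, while without drops the listed labels and keeps the rest. This Python sketch models an instant vector as a list of (labels, value) pairs and implements both groupings plus topk. The function names and sample series are illustrative; real PromQL also drops the metric name when aggregating, which the sample data sidesteps by omitting it.

```python
from collections import defaultdict


def aggregate(vector, op, by=None, without=None):
    """Group (labels, value) pairs like PromQL 'op by(...)' or 'op without(...)'.

    With neither by nor without, everything falls into one group, as in sum(x).
    op receives the list of values in each group, e.g. sum or mean.
    """
    if by is not None and without is not None:
        raise ValueError("use either by or without, not both")
    groups = defaultdict(list)
    for labels, value in vector:
        if without is not None:
            kept = {k: v for k, v in labels.items() if k not in without}
        else:
            kept = {k: v for k, v in labels.items() if k in (by or ())}
        groups[tuple(sorted(kept.items()))].append(value)
    return [(dict(key), op(values)) for key, values in groups.items()]


def mean(values):
    """Arithmetic mean, the equivalent of PromQL's avg."""
    return sum(values) / len(values)


def topk(k, vector):
    """Keep the k (labels, value) pairs with the largest values."""
    return sorted(vector, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Per-core idle fractions, as from rate(node_cpu_seconds_total{mode="idle"}[5m])
    idle = [
        ({"instance": "a:9100", "cpu": "0", "mode": "idle"}, 0.9),
        ({"instance": "a:9100", "cpu": "1", "mode": "idle"}, 0.7),
        ({"instance": "b:9100", "cpu": "0", "mode": "idle"}, 0.2),
    ]
    print(aggregate(idle, mean, without=("cpu",)))  # keeps instance and mode
    print(aggregate(idle, mean, by=("instance",)))  # keeps only instance
    print(topk(1, aggregate(idle, mean, by=("instance",))))  # most idle instance
```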

Mathematical operations allow combining metrics:

# Memory usage percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage percentage
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
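Division between two instant vectors, as in the two percentage queries above, relies on vector matching: by default, each series on the left is paired with the series on the right whose labels are identical once the metric name is removed, and series without a partner are dropped from the result. A minimal Python sketch of that one-to-one matching; the helper names, label sets, and byte values are illustrative.

```python
def series(**labels):
    """Build a hashable label set (metric name already removed)."""
    return frozenset(labels.items())


def match_divide(left, right):
    """left / right with PromQL's default one-to-one vector matching.

    Both arguments map a label set to a sample value. Series whose label
    set has no exact partner on the other side are dropped.
    """
    return {labels: left[labels] / right[labels]
            for labels in left.keys() & right.keys()}


if __name__ == "__main__":
    available = {series(instance="a:9100", job="node"): 2e9,
                 series(instance="b:9100", job="node"): 6e9}
    total = {series(instance="a:9100", job="node"): 8e9}  # no series for b
    for labels, ratio in match_divide(available, total).items():
        # Same shape as (1 - MemAvailable / MemTotal) * 100
        print(dict(labels), (1 - ratio) * 100)
```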

Label filtering uses curly braces:

# Filter by specific instance
node_cpu_seconds_total{instance="server1:9100", mode!="idle"}

# Regex match
node_filesystem_avail_bytes{mountpoint=~"/|/home"}
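PromQL has four label matchers: = and != compare exact strings, while =~ and !~ use RE2 regular expressions that are fully anchored, so mountpoint=~"/|/home" matches exactly "/" or "/home" and not "/home/user". A series that lacks a label behaves as if that label were empty, which is why the cAdvisor queries later in this guide use name!="" to skip series without a container name. The sketch below imitates these rules with Python's re.fullmatch; Python's regex dialect is not identical to RE2, so treat the hypothetical label_matches helper as an illustration rather than a validator.

```python
import re


def label_matches(labels, name, op, pattern):
    """Evaluate one PromQL-style label matcher against a series' labels.

    A missing label is treated as an empty string, as in Prometheus.
    """
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")


if __name__ == "__main__":
    for mount in ["/", "/home", "/home/user"]:
        print(mount, label_matches({"mountpoint": mount}, "mountpoint", "=~", "/|/home"))
```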

Setting Up Alerting Rules

Prometheus evaluates alerting rules at the evaluation_interval defined in the global configuration. Once a rule's expression has returned results continuously for the duration set in the rule's for field, Prometheus fires the alert and sends it to Alertmanager, which handles deduplication, grouping, and routing to notification channels.
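The for field used in the rules below determines when an alert is actually sent. Prometheus tracks each alert as inactive, pending (the expression has started returning a result), or firing (it has kept returning a result for at least the for duration); if the expression stops matching, the alert returns to inactive. The Python sketch below is a simplified model of that lifecycle for a single alert, evaluated every 15 seconds to match the evaluation_interval set earlier. The AlertState class is illustrative, not Prometheus code, and it omits details such as resolved-alert notifications.

```python
INACTIVE, PENDING, FIRING = "inactive", "pending", "firing"


class AlertState:
    """Simplified lifecycle of one alert across rule evaluations."""

    def __init__(self, for_seconds):
        self.for_seconds = for_seconds  # the rule's "for" duration
        self.state = INACTIVE
        self.active_since = None        # when the expression first matched

    def evaluate(self, now, condition_true):
        """Update and return the state, given the expression result at `now`."""
        if not condition_true:
            self.state, self.active_since = INACTIVE, None
            return self.state
        if self.state == INACTIVE:
            self.state, self.active_since = PENDING, now
        if self.state == PENDING and now - self.active_since >= self.for_seconds:
            self.state = FIRING  # at this point Prometheus sends it to Alertmanager
        return self.state


if __name__ == "__main__":
    high_cpu = AlertState(for_seconds=300)  # for: 5m
    for t in range(0, 330, 15):             # one evaluation every 15s
        # pending before t=300, firing from t=300 on
        print(t, high_cpu.evaluate(t, condition_true=True))
```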

Create an alert rules file:

sudo nano /etc/prometheus/alert_rules.yml

Add the following rules:

groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 85% for more than 5 minutes (current value: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90% for more than 5 minutes (current value: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Root filesystem usage is above 85% (current value: {{ $value }}%)"

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes."

Validate the rules file:

promtool check rules /etc/prometheus/alert_rules.yml

Now install Alertmanager:

cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xvf alertmanager-0.27.0.linux-amd64.tar.gz

sudo useradd --no-create-home --shell /bin/false alertmanager
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo cp alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager

Configure Alertmanager:

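sudo nano /etc/alertmanager/alertmanager.yml

Add the following configuration, replacing the example addresses, credentials, and webhook URL with your own:
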
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'your-app-password'
        require_tls: true

  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
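With group_by: ['alertname', 'severity'], Alertmanager batches alerts that share both label values into one notification. group_wait delays the first notification for a new group so related alerts can join it, group_interval spaces out notifications about new alerts added to a group that has already been notified, and repeat_interval controls how often a still-firing group is re-sent. The Python sketch below models only the grouping step, with made-up alerts; group_alerts is a hypothetical name, and the timers are not modeled.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (label dicts) by the values of the group_by labels.

    Each bucket corresponds to one notification Alertmanager would send.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple((name, alert.get(name, "")) for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    firing = [
        {"alertname": "HighCPUUsage", "severity": "warning", "instance": "a:9100"},
        {"alertname": "HighCPUUsage", "severity": "warning", "instance": "b:9100"},
        {"alertname": "InstanceDown", "severity": "critical", "instance": "c:8080"},
    ]
    for key, members in group_alerts(firing, ["alertname", "severity"]).items():
        print(dict(key), [a["instance"] for a in members])
```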

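The route above sends every alert to email-notifications. The slack-notifications receiver is defined but receives nothing until you reference it, either by setting receiver: 'slack-notifications' on the route or by adding a child route under routes that matches the alerts you want in Slack.

Create the Alertmanager systemd service: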

sudo nano /etc/systemd/system/alertmanager.service

Add the following unit definition:

[Unit]
Description=Prometheus Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Enable and start Alertmanager:

sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager

Reload Prometheus to pick up the new alert rules (the ExecReload line in the unit file sends SIGHUP, so a full restart is not needed):

sudo systemctl reload prometheus

Verify your alerts at http://your-server-ip:9090/alerts and check Alertmanager at http://your-server-ip:9093.

Monitoring Docker Containers with cAdvisor

cAdvisor (Container Advisor) provides container-level resource usage and performance metrics. It runs as a Docker container itself and exposes metrics in Prometheus format.

Start cAdvisor:

docker run -d \
  --name=cadvisor \
  --restart=always \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:v0.49.1

Add cAdvisor as a Prometheus scrape target by appending this job to the scrape_configs list in /etc/prometheus/prometheus.yml, indented to match the existing jobs:

  - job_name: "cadvisor"
    static_configs:
      - targets: ["localhost:8080"]

Reload Prometheus (the /-/reload endpoint is available because the systemd unit passes --web.enable-lifecycle):

curl -X POST http://localhost:9090/-/reload

Useful cAdvisor PromQL queries:

# Container CPU usage
rate(container_cpu_usage_seconds_total{name!=""}[5m])

# Container memory usage
container_memory_usage_bytes{name!=""}

# Container network received bytes
rate(container_network_receive_bytes_total{name!=""}[5m])

# Container filesystem usage
container_fs_usage_bytes{name!=""}

Import Grafana dashboard ID 14282 (cAdvisor Exporter) for pre-built container monitoring panels.

Useful PromQL Queries Reference

Query | Description
up | Whether each target is reachable (1 = up, 0 = down)
rate(node_cpu_seconds_total{mode="idle"}[5m]) | Fraction of time each core is idle (0 to 1)
node_memory_MemAvailable_bytes / 1024 / 1024 | Available memory in MiB
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 | Memory usage percentage
rate(node_disk_read_bytes_total[5m]) | Disk read throughput (bytes/s per device)
rate(node_disk_written_bytes_total[5m]) | Disk write throughput (bytes/s per device)
rate(node_network_receive_bytes_total[5m]) | Incoming network traffic (bytes/s per interface)
rate(node_network_transmit_bytes_total[5m]) | Outgoing network traffic (bytes/s per interface)
node_filesystem_avail_bytes{mountpoint="/"} | Available space on the root filesystem (bytes)
node_load1, node_load5, node_load15 | 1-, 5-, and 15-minute load averages (three separate metrics)
rate(node_cpu_seconds_total{mode="iowait"}[5m]) | Fraction of time each core spends waiting on I/O (multiply by 100 for a percentage)
node_time_seconds - node_boot_time_seconds | System uptime in seconds
count by(instance) (node_cpu_seconds_total{mode="idle"}) | Number of CPU cores per instance
rate(container_cpu_usage_seconds_total{name!=""}[5m]) | Container CPU usage (in CPU cores)

Troubleshooting

Prometheus fails to start:

Check the configuration syntax:

promtool check config /etc/prometheus/prometheus.yml

Check the service logs:

sudo journalctl -u prometheus -f --no-pager

Common issues include YAML indentation errors, invalid scrape intervals, and file permission problems on /var/lib/prometheus.

Target shows as DOWN in Prometheus:

Verify the exporter is running:

sudo systemctl status node_exporter
curl -v http://localhost:9100/metrics

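Check the firewall. If Prometheus scrapes this exporter from another host, port 9100 must be reachable from that host; where possible, allow only the Prometheus server's address (replace the placeholder PROMETHEUS_IP with it):

sudo ufw status
sudo ufw allow from PROMETHEUS_IP to any port 9100 proto tcp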

Grafana cannot connect to Prometheus:

Ensure Prometheus is listening on the correct address. If Grafana and Prometheus are on the same server, use http://localhost:9090. Check connectivity:

curl 'http://localhost:9090/api/v1/query?query=up'

No data in Grafana panels:

Check the dashboard's time range selector: if it covers a period before Prometheus started collecting data, or one older than the retention window, the panels will be empty. Also confirm the Prometheus data source is selected in the panel query editor.

High memory usage by Prometheus:

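Prometheus memory usage grows mainly with the number of active time series. Reduce it by scraping fewer exporters or by dropping unneeded metrics with metric_relabel_configs. Check the current series count and the metrics contributing the most series: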

curl http://localhost:9090/api/v1/status/tsdb
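The response to that request includes headStats.numSeries and a seriesCountByMetricName list. The Python sketch below, using only the standard library, fetches the endpoint (assuming Prometheus on localhost:9090, as configured earlier) and extracts the largest contributors; fetch_tsdb_status and top_series are hypothetical helper names, and the parsing is separate from the HTTP call so it can be tried on a saved response.

```python
import json
import urllib.request

# Assumption: Prometheus listens on localhost:9090, as configured in this guide.
TSDB_STATUS_URL = "http://localhost:9090/api/v1/status/tsdb"


def fetch_tsdb_status(url=TSDB_STATUS_URL):
    """GET /api/v1/status/tsdb and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def top_series(body, limit=5):
    """Return (total head series, [(metric name, series count), ...])."""
    data = body["data"]
    by_metric = sorted(data["seriesCountByMetricName"],
                       key=lambda item: item["value"], reverse=True)
    return (data["headStats"]["numSeries"],
            [(item["name"], item["value"]) for item in by_metric[:limit]])
```

Metrics near the top of the list returned by top_series(fetch_tsdb_status()) are the first candidates for a metric_relabel_configs drop rule.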

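Lowering the retention time mainly reduces disk usage rather than memory, but it helps if disk space is also tight. Change the flag in the ExecStart line of /etc/systemd/system/prometheus.service, then run sudo systemctl daemon-reload and sudo systemctl restart prometheus:

--storage.tsdb.retention.time=15d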

Alertmanager not sending notifications:

Test the Alertmanager configuration:

amtool check-config /etc/alertmanager/alertmanager.yml

Send a test alert:

amtool alert add alertname=TestAlert severity=critical --alertmanager.url=http://localhost:9093

Summary

You now have a complete monitoring stack running on your server: Prometheus collecting metrics from node_exporter and cAdvisor, Grafana rendering real-time dashboards, and Alertmanager delivering notifications when things go wrong. This setup gives you deep visibility into system health, container performance, and resource utilization.

Key takeaways from this guide:

  • Prometheus scrapes metrics from targets at regular intervals and stores them in a time-series database
  • node_exporter provides system-level metrics (CPU, memory, disk, network)
  • Grafana visualizes metrics with customizable dashboards and panels
  • PromQL is the query language for selecting, filtering, and aggregating metrics
  • Alertmanager routes alerts to email, Slack, PagerDuty, or other notification channels
  • cAdvisor exposes container-level resource metrics for Docker environments

For the foundation this monitoring stack runs on, make sure your server is properly secured following our Linux server security checklist. If you have not yet set up Docker for the cAdvisor section, follow our Docker installation guide on Ubuntu.

As your infrastructure grows, explore Prometheus federation for multi-server setups, Grafana Loki for log aggregation, and Thanos or Cortex for long-term storage and high availability.