Elasticsearch Setup for Log Analysis: Installation and Configuration
Elasticsearch is a distributed search and analytics engine built on Apache Lucene, designed for horizontal scalability and near-real-time search. When combined with Logstash (or Filebeat) for collection and Kibana for visualization, it forms the ELK Stack — one of the most widely deployed solutions for centralized log analysis, infrastructure monitoring, and security event management.
This guide walks through setting up a production-ready Elasticsearch environment for log analysis on Ubuntu Linux, covering everything from initial installation through cluster security and lifecycle management.
Prerequisites
Before you begin, ensure you have:
- Ubuntu Server 22.04 LTS or 24.04 LTS with at least 8 GB RAM (16 GB recommended).
- Root or sudo access.
- Sufficient disk space: Estimate your daily log volume and multiply by your retention period. SSD storage is strongly recommended for Elasticsearch data.
- Open ports: 9200 (Elasticsearch HTTP), 9300 (Elasticsearch transport), 5601 (Kibana), 5044 (Logstash Beats input).
Check available memory and disk:
free -h
df -h /
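As a back-of-the-envelope sizing check, multiply daily ingest by retention and replication; the numbers below are assumptions to replace with your own measurements:

```shell
# Rough storage estimate: daily volume x retention x (1 + replicas),
# plus ~15% headroom for segment merges and overhead.
daily_gb=20        # assumed daily log volume in GB
retention_days=30  # assumed retention period
replicas=1         # one replica per primary shard
raw=$(( daily_gb * retention_days * (1 + replicas) ))
total=$(( raw + raw * 15 / 100 ))
echo "Plan for roughly ${total} GB of cluster storage"
```

With these example numbers the estimate comes out to about 1.4 TB, which is why SSD capacity planning belongs at the start of the project rather than after the first disk-full alert.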
Installing Elasticsearch
Add the Elastic Repository
Import the Elastic GPG key and add the repository:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Install and Start Elasticsearch
sudo apt update
sudo apt install elasticsearch -y
During installation, Elasticsearch 8.x generates a superuser password and enrollment tokens. Save the output — you’ll need the password for initial setup:
The generated password for the elastic built-in superuser is: <password>
Configure Elasticsearch to start on boot:
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
Before starting, configure the cluster settings.
Cluster Configuration
Edit the main Elasticsearch configuration file:
sudo nano /etc/elasticsearch/elasticsearch.yml
Single-Node Development Setup
For a single-node deployment (development or small trusted environments; TLS over HTTP is disabled below for simplicity, which is acceptable only on networks you control):
cluster.name: log-analysis
node.name: es-node-01
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false
Multi-Node Production Cluster
For a three-node cluster:
Node 1 (es-node-01):
cluster.name: log-analysis
node.name: es-node-01
node.roles: [master, data, ingest]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.1.50
http.port: 9200
discovery.seed_hosts: ["192.168.1.50", "192.168.1.51", "192.168.1.52"]
cluster.initial_master_nodes: ["es-node-01", "es-node-02", "es-node-03"]
Nodes 2 and 3 use the same configuration with their respective node.name and network.host values.
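Rather than hand-editing three nearly identical files, the per-node configs can be generated from a template. A minimal sketch using the example names and IPs above; it writes to a scratch directory instead of /etc/elasticsearch so you can inspect the output before copying it into place:

```shell
# Generate an elasticsearch.yml per node; only node.name and network.host vary.
outdir=$(mktemp -d)
i=1
for ip in 192.168.1.50 192.168.1.51 192.168.1.52; do
  name=$(printf 'es-node-%02d' "$i")
  cat > "$outdir/$name.yml" << EOF
cluster.name: log-analysis
node.name: $name
node.roles: [master, data, ingest]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: $ip
http.port: 9200
discovery.seed_hosts: ["192.168.1.50", "192.168.1.51", "192.168.1.52"]
cluster.initial_master_nodes: ["es-node-01", "es-node-02", "es-node-03"]
EOF
  i=$((i + 1))
done
ls "$outdir"
```

Copy each generated file to /etc/elasticsearch/elasticsearch.yml on its node.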
Start Elasticsearch
sudo systemctl start elasticsearch
Verify it’s running:
curl -u elastic:YourPassword -X GET "localhost:9200"
Expected response:
{
"name": "es-node-01",
"cluster_name": "log-analysis",
"cluster_uuid": "...",
"version": {
"number": "8.17.0",
"build_flavor": "default",
"build_type": "deb"
},
"tagline": "You Know, for Search"
}
Check cluster health:
curl -u elastic:YourPassword "localhost:9200/_cluster/health?pretty"
A green status means all primary and replica shards are allocated.
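For scripted monitoring, the status field can be pulled out of the health response with standard tools. A minimal sketch that parses a captured sample response; in practice you would pipe the live curl output in instead:

```shell
# Sample _cluster/health response (captured; normally:
#   curl -s -u elastic:YourPassword localhost:9200/_cluster/health)
health='{"cluster_name":"log-analysis","status":"green","number_of_nodes":3}'
status=$(printf '%s' "$health" | grep -o '"status":"[a-z]*"' | cut -d'"' -f4)
echo "cluster status: $status"
[ "$status" = "green" ] || echo "WARNING: cluster is $status" >&2
```

Dropped into cron with a real curl call, this is enough for a crude green/yellow/red alert before you have Kibana alerting configured.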
Installing and Configuring Kibana
Install Kibana from the same repository:
sudo apt install kibana -y
Configure Kibana:
sudo nano /etc/kibana/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
server.name: "kibana-server"
elasticsearch.hosts: ["http://192.168.1.50:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "YourKibanaPassword"
Set the kibana_system user password in Elasticsearch:
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u kibana_system -i
Start Kibana:
sudo systemctl enable kibana
sudo systemctl start kibana
Access Kibana at http://your-server:5601 and log in with the elastic user.
Setting Up Logstash
Install Logstash:
sudo apt install logstash -y
Creating a Pipeline Configuration
Logstash pipelines define how data flows from input through filters to output. By default, Logstash concatenates every file in /etc/logstash/conf.d into a single pipeline, so filters and outputs should be wrapped in conditionals to keep event types from landing in the wrong index. Create a pipeline configuration for syslog data:
sudo nano /etc/logstash/conf.d/syslog.conf
input {
beats {
port => 5044
}
}
filter {
if [fields][log_type] == "syslog" {
grok {
match => {
"message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"
}
}
date {
match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
mutate {
remove_field => [ "message" ]
}
}
}
output {
if [fields][log_type] == "syslog" {
elasticsearch {
hosts => ["http://192.168.1.50:9200"]
user => "elastic"
password => "YourPassword"
index => "syslog-%{+YYYY.MM.dd}"
}
}
}
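Before restarting Logstash over a pattern typo, the field extraction can be sanity-checked offline with GNU grep against a sample line. This is only a rough stand-in; grok's SYSLOGTIMESTAMP and DATA patterns are more forgiving than this hand-written regex:

```shell
# Sample syslog line and a PCRE approximation of the grok pattern above
# (requires GNU grep with -P support, standard on Ubuntu).
line='Jan  5 10:15:22 web-server-01 sshd[1234]: Failed password for root from 10.0.0.5'
# Skip timestamp and hostname, then capture the program name before the [pid]:
prog=$(printf '%s\n' "$line" | grep -oP '^\w{3}\s+\d+ [\d:]+ \S+ \K[^\[:]+')
# Everything after the first ": " is the message body.
msg=$(printf '%s\n' "$line" | grep -oP ': \K.*')
echo "program=$prog"
echo "message=$msg"
```

If the regex fails on your real log lines, the grok pattern likely needs adjusting too.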
Nginx Access Log Pipeline
sudo nano /etc/logstash/conf.d/nginx.conf
Because all files in conf.d share one pipeline, events for this filter arrive through the beats input on port 5044 already defined in syslog.conf; a second input block is not needed.
filter {
if [fields][log_type] == "nginx_access" {
grok {
match => {
"message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status_code} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:user_agent}"'
}
}
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
geoip {
source => "client_ip"
target => "geoip"
}
useragent {
source => "user_agent"
target => "ua"
}
mutate {
convert => {
"status_code" => "integer"
"bytes" => "integer"
}
}
}
}
output {
if [fields][log_type] == "nginx_access" {
elasticsearch {
hosts => ["http://192.168.1.50:9200"]
user => "elastic"
password => "YourPassword"
index => "nginx-access-%{+YYYY.MM.dd}"
}
}
}
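The combined-log layout that the grok pattern expects can be eyeballed with plain awk on a sample line. Default whitespace splitting means quoted fields span several columns, so this is only a rough positional check, not a parser:

```shell
# Sample nginx combined-format access line.
line='203.0.113.9 - alice [05/Jan/2026:10:15:22 +0000] "GET /index.html HTTP/1.1" 404 512 "-" "curl/8.5.0"'
# With default splitting, field 9 is the status code and field 10 the byte count.
status=$(printf '%s\n' "$line" | awk '{print $9}')
bytes=$(printf '%s\n' "$line" | awk '{print $10}')
echo "status=$status bytes=$bytes"
```

If your nginx uses a custom log_format, both this field numbering and the grok pattern above must change to match it.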
Start Logstash:
sudo systemctl enable logstash
sudo systemctl start logstash
Configuring Filebeat on Source Servers
Install Filebeat on each server whose logs you want to collect:
sudo apt install filebeat -y
Configure Filebeat:
sudo nano /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: filestream
id: syslog
enabled: true
paths:
- /var/log/syslog
- /var/log/auth.log
fields:
log_type: syslog
- type: filestream
id: nginx-access
enabled: true
paths:
- /var/log/nginx/access.log
fields:
log_type: nginx_access
output.logstash:
hosts: ["192.168.1.50:5044"]
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
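If some sources emit multi-line events (Java stack traces, for example), filestream inputs can fold continuation lines into a single event with a multiline parser. A sketch of an extra input; the path, log_type, and continuation pattern are assumptions to adapt:

```yaml
- type: filestream
  id: app-logs
  paths:
    - /var/log/myapp/app.log   # hypothetical application log path
  fields:
    log_type: app
  parsers:
    - multiline:
        type: pattern
        pattern: '^\s'         # lines starting with whitespace...
        match: after           # ...are appended to the previous event
```

Without this, each stack-trace line arrives as its own document and is useless to search.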
Start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
Verify logs are flowing:
curl -u elastic:YourPassword "localhost:9200/_cat/indices?v&s=index"
You should see syslog-* and nginx-access-* indices appearing.
Index Lifecycle Management (ILM)
ILM automates index management to prevent unbounded disk growth. Create a policy that moves indices through hot, warm, and delete phases:
curl -u elastic:YourPassword -X PUT "localhost:9200/_ilm/policy/logs-policy" \
-H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_primary_shard_size": "30gb",
"max_age": "1d"
},
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
},
"set_priority": {
"priority": 50
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}'
This policy rolls an index over once a primary shard reaches 30 GB or the index is a day old, moves it to warm storage 7 days later (shrinking to one shard and merging segments for efficiency), and deletes it after 30 days; note that min_age in the warm and delete phases is measured from rollover, not index creation. One caveat: the rollover action only works when writes go through a write alias or a data stream. With the date-suffixed index names used in the Logstash outputs above, rollover will fail and indices will not advance past the hot phase, so either remove the rollover action for date-based indices or point Logstash at a rollover alias instead.
Apply ILM to an Index Template
curl -u elastic:YourPassword -X PUT "localhost:9200/_index_template/logs-template" \
-H 'Content-Type: application/json' -d'
{
"index_patterns": ["syslog-*", "nginx-access-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy",
"index.lifecycle.rollover_alias": "logs"
}
}
}'
JVM and System Tuning
JVM Heap Size
Set the heap to 50% of available RAM, up to 31 GB:
sudo nano /etc/elasticsearch/jvm.options.d/heap.options
-Xms8g
-Xmx8g
Both values must be equal to prevent heap resizing during operation.
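The 8 GB above assumes a 16 GB machine; a small script can derive the right value on any host (Linux only, since it reads /proc/meminfo):

```shell
# Compute heap = min(RAM / 2, 31 GB) and print the jvm.options lines.
ram_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
heap=$(( ram_kb / 1024 / 1024 / 2 ))   # half of RAM, in whole GB
[ "$heap" -gt 31 ] && heap=31          # stay under the compressed-oops ceiling
[ "$heap" -lt 1 ] && heap=1            # floor for tiny test VMs
printf -- '-Xms%dg\n-Xmx%dg\n' "$heap" "$heap"
```

Redirect the output into /etc/elasticsearch/jvm.options.d/heap.options to apply it.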
OS-Level Tuning
Elasticsearch requires several kernel parameter adjustments:
# Increase virtual memory map count
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.d/99-elasticsearch.conf
sudo sysctl -p /etc/sysctl.d/99-elasticsearch.conf
# Disable swap (Elasticsearch should never swap)
sudo swapoff -a
# Remove swap entries from /etc/fstab to persist across reboots
Set file descriptor limits. Elasticsearch runs as a systemd service, so limits in /etc/security/limits.conf do not apply to it; use a systemd override instead (the Debian package already ships LimitNOFILE=65535, so this step is only needed if you want different values):
sudo mkdir -p /etc/systemd/system/elasticsearch.service.d
sudo tee /etc/systemd/system/elasticsearch.service.d/limits.conf > /dev/null << 'EOF'
[Service]
LimitNOFILE=65535
LimitNPROC=4096
EOF
sudo systemctl daemon-reload
Restart Elasticsearch after making these changes:
sudo systemctl restart elasticsearch
Security Hardening
Firewall Rules
Restrict Elasticsearch and Kibana ports to trusted networks only:
sudo ufw allow from 192.168.1.0/24 to any port 9200 comment "Elasticsearch HTTP"
sudo ufw allow from 192.168.1.0/24 to any port 9300 comment "Elasticsearch transport"
sudo ufw allow from 192.168.1.0/24 to any port 5601 comment "Kibana"
sudo ufw allow from 192.168.1.0/24 to any port 5044 comment "Logstash Beats"
Create Dedicated Users
Avoid using the elastic superuser for daily operations. Create role-specific users:
# Create a read-only user for Kibana dashboards
curl -u elastic:YourPassword -X POST "localhost:9200/_security/user/dashboard_viewer" \
-H 'Content-Type: application/json' -d'
{
"password": "ViewerPass123",
"roles": ["viewer"],
"full_name": "Dashboard Viewer"
}'
Enable Audit Logging
# Add to elasticsearch.yml
xpack.security.audit.enabled: true
xpack.security.audit.logfile.events.include: ["authentication_failed", "access_denied", "connection_denied"]
Monitoring and Troubleshooting
Cluster Health
# Overall health
curl -u elastic:YourPassword "localhost:9200/_cluster/health?pretty"
# Node stats
curl -u elastic:YourPassword "localhost:9200/_nodes/stats?pretty" | head -50
# Index sizes
curl -u elastic:YourPassword "localhost:9200/_cat/indices?v&s=store.size:desc"
# Shard allocation
curl -u elastic:YourPassword "localhost:9200/_cat/shards?v&s=index"
Common Issues
Cluster status is yellow: This usually means replica shards are unassigned, often because you have a single-node cluster. For single-node setups, set replicas to zero on existing indices (and set number_of_replicas: 0 in your index template so new daily indices inherit it):
curl -u elastic:YourPassword -X PUT "localhost:9200/_settings" \
-H 'Content-Type: application/json' -d'{"index.number_of_replicas": 0}'
Elasticsearch won’t start — bootstrap checks failed: Check the logs:
sudo journalctl -u elasticsearch --no-pager | tail -30
Common causes: insufficient vm.max_map_count, heap size not set, or file descriptor limits too low.
Logstash pipeline not processing: Test the pipeline configuration:
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/syslog.conf
High disk usage — indices not being deleted: Verify ILM is active:
curl -u elastic:YourPassword "localhost:9200/_ilm/policy/logs-policy?pretty"
curl -u elastic:YourPassword "localhost:9200/syslog-*/_ilm/explain?pretty"
Useful Kibana Queries (KQL)
Once your data is flowing, use Kibana’s Discover tab with these queries:
# Find all 5xx errors in Nginx logs
status_code >= 500
# Search for authentication failures
syslog_program: "sshd" AND syslog_message: "Failed password"
# Find logs from a specific host
host.name: "web-server-01" AND syslog_program: "kernel"
# Time-scoped queries
@timestamp >= "2026-01-05T00:00:00" AND @timestamp < "2026-01-06T00:00:00"
Summary
A well-configured ELK Stack transforms scattered log files across dozens of servers into a searchable, visualizable, and alertable centralized system. The key points from this guide:
- Install Elasticsearch, Kibana, and Logstash from the official Elastic repository to keep versions synchronized.
- Use single-node discovery for development; deploy at least three nodes for production resilience.
- Deploy Filebeat on every source server as a lightweight shipper; use Logstash for parsing and enrichment.
- Configure ILM policies from day one to automate index rollover and deletion — retroactive cleanup is painful.
- Set JVM heap to 50% of RAM (max 31 GB) and disable swap entirely.
- Restrict network access with firewall rules and create role-based users instead of sharing the superuser credentials.
- Monitor cluster health daily and set up Kibana alerts for disk usage, cluster status changes, and log volume anomalies.
With these foundations in place, you can expand your deployment to handle application logs, security events, metrics, and traces — building a comprehensive observability platform from a solid Elasticsearch core.