TL;DR — Quick Summary
Vector is a Rust-powered observability pipeline by Datadog. Collect logs, metrics, and traces from any source and route them to any sink with VRL transforms.
Vector is a high-performance observability data pipeline written in Rust, maintained by Datadog. It collects logs, metrics, and traces from any source, transforms them with VRL (Vector Remap Language), and routes them to any sink — all in a single statically linked binary that uses a fraction of the memory consumed by Logstash or Fluentd. This guide covers the full Vector stack: architecture, installation, configuration, VRL transforms, deployment patterns, delivery guarantees, and a production pipeline that collects Nginx logs and routes them to Loki and ClickHouse.
Prerequisites
- Linux server (Ubuntu 22.04+, Debian 12+, RHEL 9+) or macOS for local testing.
- Docker or Kubernetes cluster (optional, for container-based deployments).
- Basic familiarity with YAML configuration.
- Target sinks running and accessible (Loki, Elasticsearch, ClickHouse, S3, etc.).
Vector Architecture
Vector processes observability data as a directed acyclic graph (DAG) of sources → transforms → sinks.
- Sources — ingest data from files, journals, sockets, HTTP, Kafka, cloud APIs.
- Transforms — parse, enrich, filter, route, aggregate, or deduplicate events.
- Sinks — deliver data to storage, search, metrics, and alerting systems.
Under the hood, Vector achieves its throughput through several Rust-native design choices:
- Zero-copy parsing — log lines are parsed in-place without extra heap allocations.
- Adaptive concurrency — Vector auto-tunes the number of in-flight requests to each sink using an AIMD algorithm, preventing overload without manual tuning.
- End-to-end acknowledgments — a log line is not deleted from its source (or buffer) until all configured sinks confirm receipt.
- Disk buffers — events overflow to disk when the in-memory queue fills, surviving process restarts and preventing loss during sink outages.
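The AIMD idea behind adaptive concurrency is easy to sketch. The following is an illustrative Python model of a generic AIMD loop, not Vector's actual implementation (which also weighs round-trip-time trends):

```python
# Illustrative AIMD (additive-increase, multiplicative-decrease) model of
# adaptive concurrency. NOT Vector's real implementation -- just the core idea:
# grow the in-flight request limit by 1 on success, halve it on backpressure.

def next_limit(current: int, sink_overloaded: bool,
               max_limit: int = 1024) -> int:
    if sink_overloaded:
        # Multiplicative decrease: back off quickly when the sink pushes back
        # (HTTP 429/503, timeouts, rising latency).
        return max(1, current // 2)
    # Additive increase: probe for more capacity one request at a time.
    return min(max_limit, current + 1)

limit = 1
history = []
for overloaded in [False, False, False, True, False, False]:
    limit = next_limit(limit, overloaded)
    history.append(limit)

print(history)  # [2, 3, 4, 2, 3, 4]
```

The sawtooth pattern (climb slowly, drop sharply) is the same dynamic TCP congestion control uses, which is why a sink recovers headroom quickly after a latency spike.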
Vector supports two event types internally: log (structured key-value map with a timestamp) and metric (named value with optional tags and a kind — counter, gauge, histogram, set).
Installation
apt (Debian/Ubuntu)
curl -1sLf 'https://repositories.timber.io/public/vector/cfg/setup/bash.deb.sh' \
| sudo -E bash
sudo apt install -y vector
sudo systemctl enable --now vector
yum/dnf (RHEL/Fedora/Rocky)
curl -1sLf 'https://repositories.timber.io/public/vector/cfg/setup/bash.rpm.sh' \
| sudo -E bash
sudo yum install -y vector
sudo systemctl enable --now vector
Docker
docker run -d \
--name vector \
-v /var/log:/var/log:ro \
-v $(pwd)/vector.yaml:/etc/vector/vector.yaml:ro \
timberio/vector:latest-alpine
Kubernetes DaemonSet via Helm
helm repo add vector https://helm.vector.dev
helm repo update
helm install vector vector/vector \
--namespace vector \
--create-namespace \
--values vector-values.yaml
A minimal vector-values.yaml for DaemonSet mode:
role: "Agent"
customConfig:
  data_dir: /vector-data-dir
  sources:
    kubernetes_logs:
      type: kubernetes_logs
  sinks:
    loki:
      type: loki
      inputs: ["kubernetes_logs"]
      endpoint: "http://loki.monitoring:3100"
      labels:
        namespace: "{{ kubernetes.pod_namespace }}"
Homebrew (macOS)
brew tap vectordotdev/brew
brew install vector
Configuration
Vector reads vector.yaml (or vector.toml, vector.json) from /etc/vector/ by default. The top-level structure:
data_dir: /var/lib/vector  # disk buffer and state storage

sources:
  my_source:
    type: file
    include: ["/var/log/nginx/*.log"]

transforms:
  my_transform:
    type: remap
    inputs: ["my_source"]
    source: |
      . = parse_json!(string!(.message))

sinks:
  my_sink:
    type: loki
    inputs: ["my_transform"]
    endpoint: "http://loki:3100"
    labels:
      job: nginx
Every component has a unique string key. inputs wires components together — a transform's or sink's inputs lists the sources and transforms it reads from, which is how Vector builds the DAG.
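Because `inputs` defines the graph edges, a misspelled component name breaks the topology. The following sketch (a hypothetical helper, not Vector code) mirrors the wiring check that `vector validate` performs conceptually:

```python
# Conceptual sketch of topology validation: every entry in a component's
# `inputs` must name a defined source or transform. This mirrors what
# `vector validate` checks conceptually; it is not Vector's implementation.

def check_wiring(config: dict) -> list:
    producers = set(config.get("sources", {})) | set(config.get("transforms", {}))
    consumers = {**config.get("transforms", {}), **config.get("sinks", {})}
    errors = []
    for name, component in consumers.items():
        for inp in component.get("inputs", []):
            # Route transforms expose named outputs like "router.errors";
            # only the part before the dot must be a defined component.
            if inp.split(".")[0] not in producers:
                errors.append(f"{name}: unknown input {inp!r}")
    return errors

config = {
    "sources": {"nginx_access": {"type": "file"}},
    "transforms": {"parse": {"type": "remap", "inputs": ["nginx_access"]}},
    "sinks": {"loki": {"type": "loki", "inputs": ["parse_nginx"]}},  # typo!
}
print(check_wiring(config))  # ["loki: unknown input 'parse_nginx'"]
```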
Sources Reference
| Source | Type key | Description |
|---|---|---|
| File tailing | file | Tail log files with glob patterns, tracks position |
| systemd journal | journald | Read from journald with unit filters |
| Docker containers | docker_logs | Collect from Docker daemon socket |
| Kubernetes pods | kubernetes_logs | Native k8s log collection with pod metadata |
| Syslog UDP/TCP | syslog | Receive RFC 3164 / RFC 5424 syslog messages |
| HTTP server | http_server | Expose an HTTP endpoint to receive pushed logs |
| Kafka | kafka | Consume from one or more Kafka topics |
| StatsD | statsd | Receive StatsD metrics over UDP |
| Host metrics | host_metrics | Collect CPU, memory, disk, network from the host |
| Internal metrics | internal_metrics | Vector’s own throughput and error counters |
sources:
  nginx_access:
    type: file
    include: ["/var/log/nginx/access.log"]
    read_from: beginning

  journal:
    type: journald
    include_units: ["nginx.service", "postgresql.service"]

  k8s:
    type: kubernetes_logs
    auto_partial_merge: true
VRL — Vector Remap Language
VRL is the transform language at Vector’s core: an expression-based language with compile-time type and error checking, designed to be fast, safe, and readable. Every VRL program receives the event as . (dot); whatever . contains when the program finishes becomes the output event.
Parsing JSON
transforms:
  parse_json:
    type: remap
    inputs: ["nginx_access"]
    source: |
      . = parse_json!(string!(.message))
The ! suffix on a function call means “abort this event’s program if the call fails.” With drop_on_error: true and reroute_dropped: true set on the remap transform, failed events are routed to the component’s dropped output instead of continuing downstream, preventing bad data from poisoning sinks.
Parsing Syslog
transforms:
  parse_syslog:
    type: remap
    inputs: ["syslog_in"]
    source: |
      parsed = parse_syslog!(string!(.message))
      .severity = parsed.severity
      .facility = parsed.facility
      .appname = parsed.appname
      .message = parsed.message
Field Access, Coercion, and String Interpolation
source: |
  .level = upcase(string!(.level))
  .response_time_ms = to_int!(.response_time) * 1000
  .url = "https://" + string!(.host) + string!(.path)

  # Interpolation works on local variables holding strings
  app = string!(.appname)
  msg = string!(.message)
  .log_line = "{{ app }}: {{ msg }}"
Conditional Logic and PII Redaction
source: |
  # Coerce once so the comparisons below are infallible
  status = to_int(.status_code) ?? 0
  if status >= 500 {
    .severity = "error"
  } else if status >= 400 {
    .severity = "warn"
  } else {
    .severity = "info"
  }

  # Redact credit card numbers from the message field
  .message = redact(string!(.message), filters: [r'\b(?:\d[ -]*?){13,16}\b'])

  # Remove a field entirely
  del(.internal_trace_id)
Regex Parsing for Nginx Combined Log Format
transforms:
  parse_nginx:
    type: remap
    inputs: ["nginx_access"]
    source: |
      parsed, err = parse_regex(.message, r'^(?P<remote_addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d+) (?P<bytes>\d+)')
      if err == null {
        . = merge(., parsed)
        .status = to_int!(.status)
        .bytes = to_int!(.bytes)
        del(.message)
      }
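The named-capture pattern can be sanity-checked outside Vector before deploying. Here is a quick Python check of the same regex against a sample line (Python's `(?P<name>...)` group syntax matches VRL's here):

```python
import re

# Same named-capture pattern as the VRL parse_regex call above,
# verified against a sample Nginx combined-format log prefix.
NGINX_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<bytes>\d+)'
)

line = '127.0.0.1 - - [23/Mar/2026:10:00:00 +0000] "GET /api/health HTTP/1.1" 200 42'
fields = NGINX_RE.match(line).groupdict()
fields["status"] = int(fields["status"])   # VRL does this with to_int!
print(fields["method"], fields["path"], fields["status"])  # GET /api/health 200
```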
Transforms Reference
| Transform | Purpose |
|---|---|
| remap | Full VRL transform — parse, enrich, filter, mutate |
| filter | Drop events that don’t match a VRL condition |
| route | Split the stream into named lanes by condition |
| aggregate | Combine multiple events into one (batching, windowing) |
| dedupe | Drop duplicate events within a time window |
| sample | Keep a configurable percentage of events |
| reduce | Merge a series of partial events into one complete event |
| metric_to_log | Convert metrics to log events for debugging |
| log_to_metric | Derive counters/gauges from log fields |
Route Transform — Send Errors and Requests Separately
transforms:
  router:
    type: route
    inputs: ["parse_nginx"]
    route:
      errors: (to_int(.status) ?? 0) >= 500
      slow: (to_int(.response_time_ms) ?? 0) > 1000

sinks:
  loki_errors:
    type: loki
    inputs: ["router.errors"]
    endpoint: "http://loki:3100"
    labels:
      stream: errors

  loki_slow:
    type: loki
    inputs: ["router.slow"]
    endpoint: "http://loki:3100"
    labels:
      stream: slow_requests

  loki_all:
    type: loki
    inputs: ["router._unmatched"]
    endpoint: "http://loki:3100"
    labels:
      stream: access
Sinks Reference
| Sink | Type key | Use Case |
|---|---|---|
| Loki | loki | Grafana log storage |
| Elasticsearch / OpenSearch | elasticsearch | Full-text search and analytics |
| ClickHouse | clickhouse | High-volume analytical log storage |
| S3 / R2 | aws_s3 | Long-term archival |
| Kafka | kafka | Re-publish to downstream consumers |
| Prometheus Exporter | prometheus_exporter | Expose metrics on /metrics |
| Prometheus Remote Write | prometheus_remote_write | Push metrics to Mimir/Thanos |
| Datadog Logs | datadog_logs | Send to Datadog ingestion |
| Splunk HEC | splunk_hec_logs | Send to Splunk HTTP Event Collector |
| Console | console | Local debugging (stdout) |
| HTTP | http | Generic webhook / custom receiver |
Buffering and Delivery Guarantees
By default, Vector uses in-memory buffers (fastest, lost on crash). For production:
sinks:
  clickhouse_sink:
    type: clickhouse
    inputs: ["parse_nginx"]
    endpoint: "http://clickhouse:8123"
    database: logs
    table: nginx_access
    buffer:
      type: disk
      max_size: 268435456  # 256 MB
      when_full: block     # or "drop_newest"
    acknowledgements:
      enabled: true
With acknowledgements.enabled: true, Vector waits for the sink to confirm receipt before advancing the read position in the source. Combined with disk buffers, this achieves durable at-least-once delivery.
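That invariant can be modeled simply: the source checkpoint advances only after every sink confirms the batch, so a crash replays unacknowledged events rather than losing them. A toy Python model (not Vector code), assuming sinks signal acknowledgment with a boolean:

```python
# Toy model of end-to-end acknowledgment: the read checkpoint advances
# only after ALL sinks ack the batch. Not Vector code -- just the invariant.

class Source:
    def __init__(self, events):
        self.events = events
        self.checkpoint = 0  # next unread offset; persisted in data_dir

    def read_batch(self, n):
        return self.events[self.checkpoint:self.checkpoint + n]

    def ack(self, n):
        self.checkpoint += n  # safe to advance: every sink confirmed receipt

def deliver(source, sinks, batch_size=2):
    while batch := source.read_batch(batch_size):
        if all(sink(batch) for sink in sinks):  # each sink returns True on ack
            source.ack(len(batch))
        else:
            break  # a sink failed: checkpoint stays put, batch is retried later

received = []
ok_sink = lambda batch: received.extend(batch) or True
src = Source(["e1", "e2", "e3"])
deliver(src, [ok_sink])
print(src.checkpoint, received)  # 3 ['e1', 'e2', 'e3']
```

Note the replay side of the trade-off: because the checkpoint never moves past an unacknowledged batch, a crash between delivery and acknowledgment re-sends the batch, which is exactly why this is at-least-once rather than exactly-once.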
Unit Testing Transforms
Vector has a built-in test runner. Define tests in the same vector.yaml:
tests:
  - name: "parse_nginx parses 200 request correctly"
    inputs:
      - insert_at: parse_nginx
        type: log
        log_fields:
          message: '127.0.0.1 - - [23/Mar/2026:10:00:00 +0000] "GET /api/health HTTP/1.1" 200 42'
    outputs:
      - extract_from: parse_nginx
        conditions:
          - type: vrl
            source: |
              assert_eq!(.status, 200)
              assert_eq!(.method, "GET")
              assert_eq!(.path, "/api/health")
Run tests before deploying:
vector test /etc/vector/vector.yaml
Monitoring Vector
# Real-time component throughput (like top, but for Vector)
vector top
# Validate configuration without starting
vector validate /etc/vector/vector.yaml
# Generate a GraphViz topology diagram
vector graph /etc/vector/vector.yaml | dot -Tsvg > topology.svg
Expose Vector’s own metrics to Prometheus:
sources:
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 15

sinks:
  prometheus:
    type: prometheus_exporter
    inputs: ["internal_metrics"]
    address: "0.0.0.0:9598"
Key metrics to alert on: vector_component_errors_total, vector_buffer_byte_size, vector_component_received_events_total, vector_component_sent_events_total.
Production Pipeline: Nginx Logs → Loki + ClickHouse
data_dir: /var/lib/vector

sources:
  nginx_raw:
    type: file
    include: ["/var/log/nginx/access.log"]
    read_from: beginning

transforms:
  parse_nginx:
    type: remap
    inputs: ["nginx_raw"]
    source: |
      parsed, err = parse_regex(.message,
        r'^(?P<remote_addr>\S+) - (?P<user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d+) (?P<bytes_sent>\d+) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"')
      if err == null {
        . = merge(., parsed)
        .status = to_int!(.status)
        .bytes_sent = to_int!(.bytes_sent)
        .timestamp = now()
        del(.message)
      }

  enrich:
    type: remap
    inputs: ["parse_nginx"]
    source: |
      .host = get_hostname!()
      status = to_int(.status) ?? 0
      if status >= 500 {
        .level = "error"
      } else if status >= 400 {
        .level = "warn"
      } else {
        .level = "info"
      }

  router:
    type: route
    inputs: ["enrich"]
    route:
      errors: .level == "error"

sinks:
  loki_errors:
    type: loki
    inputs: ["router.errors"]
    endpoint: "http://loki:3100"
    labels:
      job: nginx
      stream: errors
      host: "{{ host }}"
    buffer:
      type: disk
      max_size: 134217728  # 128 MB
    acknowledgements:
      enabled: true

  loki_access:
    type: loki
    inputs: ["router._unmatched"]
    endpoint: "http://loki:3100"
    labels:
      job: nginx
      stream: access
    buffer:
      type: memory
      max_events: 50000

  clickhouse_access:
    type: clickhouse
    inputs: ["router._unmatched", "router.errors"]
    endpoint: "http://clickhouse:8123"
    database: logs
    table: nginx_access
    batch:
      max_bytes: 10485760  # 10 MB
      timeout_secs: 10
    buffer:
      type: disk
      max_size: 268435456  # 256 MB
    acknowledgements:
      enabled: true
Vector vs Alternatives
| Feature | Vector | Fluentd | Fluent Bit | Logstash | Filebeat | Promtail | Grafana Alloy |
|---|---|---|---|---|---|---|---|
| Language | Rust | Ruby | C | JVM | Go | Go | Go |
| Memory usage | ~20 MB | ~200 MB | ~1 MB | ~512 MB | ~50 MB | ~30 MB | ~60 MB |
| Throughput | Very High | Medium | High | Medium | High | Medium | High |
| Transform language | VRL | Fluentd DSL | Lua | Ruby/Grok | Minimal | Stages | Alloy DSL |
| Metrics pipeline | Yes | Plugin | Limited | Plugin | No | No | Yes |
| Disk buffers | Native | Plugin | Yes | Native | Native | No | Yes |
| Unit testing | Built-in | No | No | No | No | No | Limited |
| Kubernetes native | Yes | Yes | Yes | No | Yes | Yes | Yes |
| License | MPL-2.0 | Apache 2.0 | Apache 2.0 | SSPL | Elastic | Apache 2.0 | Apache 2.0 |
Gotchas and Edge Cases
- VRL `!` vs error tuples — `parse_json!(.message)` aborts the event’s program on failure, while `parsed, err = parse_json(.message)` captures the error so you can handle it and continue. Choose intentionally.
- File source position tracking — Vector stores file offsets (checkpoints) in `data_dir`. Deleting this directory causes re-ingestion of existing log files.
- Kubernetes log rotation — tune `glob_minimum_cooldown_ms` to avoid missing events during log rotation in high-throughput pods.
- ClickHouse data types — Vector serializes events as JSON; predefine your ClickHouse table schema and set `skip_unknown_fields: true` to avoid ingestion errors on unexpected fields.
- Loki label cardinality — avoid using high-cardinality fields (user IDs, request IDs) as Loki labels; use structured metadata instead.
- Adaptive concurrency with slow sinks — if a sink has high latency, Vector will automatically reduce concurrency; set `request.concurrency` to a fixed integer to override.
Summary
- Vector is a Rust-based observability pipeline with a sources → transforms → sinks DAG model.
- VRL handles all parsing, enrichment, filtering, and routing in a safe, testable language.
- Deployment modes: agent (per host), aggregator (centralized), sidecar (Kubernetes pod).
- Disk buffers + acknowledgments provide durable at-least-once delivery with no data loss on crashes.
- Unit tests in `vector.yaml` validate transforms before production deployment.
- `vector top` gives real-time per-component throughput; the internal Prometheus endpoint enables dashboarding.
- Vector handles logs and metrics in a single pipeline, replacing Fluentd, Logstash, Filebeat, and Promtail simultaneously.