TL;DR — Quick Summary

DuckDB is an in-process SQL analytics database. Query CSV, Parquet, and JSON files directly with SQL — no server, no setup. The SQLite of analytics.

DuckDB is SQL analytics without the server. Query CSV files with millions of rows, analyze Parquet datasets, and run complex aggregations — all from a single binary or a Python import. No database server, no setup, no configuration.

Installation

# macOS
brew install duckdb

# Python
pip install duckdb

# Node.js
npm install duckdb

# CLI binary
# Download from https://duckdb.org/docs/installation/

# Start interactive CLI
duckdb

Querying Files Directly

-- Query CSV
SELECT * FROM 'sales.csv' LIMIT 10;

-- Query with aggregation
SELECT category, SUM(revenue), COUNT(*)
FROM 'sales.csv'
GROUP BY category
ORDER BY SUM(revenue) DESC;

-- Query Parquet (much faster for large data)
SELECT * FROM 'data.parquet' WHERE date > '2025-01-01';

-- Glob patterns — query all files
SELECT * FROM 'logs/*.csv';
SELECT * FROM 'data/year=*/month=*/*.parquet';

-- Query JSON
SELECT * FROM read_json_auto('events.json');

-- Query remote files
SELECT * FROM 'https://example.com/data.csv';

Python Integration

import duckdb

# Query CSV directly
result = duckdb.sql("SELECT * FROM 'sales.csv'").df()

# Query pandas DataFrame
import pandas as pd
df = pd.read_csv('large_data.csv')
result = duckdb.sql("SELECT category, AVG(price) FROM df GROUP BY category").df()

# Persistent database
con = duckdb.connect('my_analytics.db')
con.sql("CREATE TABLE sales AS SELECT * FROM 'sales.csv'")
con.sql("SELECT * FROM sales WHERE revenue > 1000").show()

ETL Pipelines

-- CSV to Parquet (huge compression + speed gain)
COPY (SELECT * FROM 'raw_data.csv') TO 'optimized.parquet' (FORMAT PARQUET);

-- Combine multiple CSVs into one Parquet
COPY (SELECT * FROM 'logs/*.csv') TO 'all_logs.parquet' (FORMAT PARQUET);

-- Export to JSON
COPY (SELECT * FROM analysis) TO 'results.json' (FORMAT JSON);

Comparison

FeatureDuckDBSQLitePostgreSQLpandas
TypeOLAPOLTPOLTP/OLAPLibrary
ServerNoNoYesNo
Query CSVDirectNoCOPYread_csv
Query ParquetDirectNoExtensionread_parquet
ColumnarYesNoNoYes
SQL SupportFullFullFullLimited
Speed (analytics)FastSlowFastModerate
Language bindingsManyManyManyPython

Summary

  • DuckDB runs analytical SQL queries without a server — embed in any application
  • Query CSV, Parquet, JSON, and Excel files directly with standard SQL
  • Columnar storage and vectorized execution for fast analytical queries
  • Python, R, Node.js, Java, and many more language bindings
  • Perfect for ETL pipelines, data exploration, and ad-hoc analytics