TL;DR — Quick Summary
DuckDB is an in-process SQL analytics database. Query CSV, Parquet, and JSON files directly with SQL — no server, no setup. The SQLite of analytics.
DuckDB is SQL analytics without the server. Query CSV files with millions of rows, analyze Parquet datasets, and run complex aggregations — all from a single binary or a Python import. No database server, no setup, no configuration.
Installation
# macOS
brew install duckdb
# Python
pip install duckdb
# Node.js
npm install duckdb
# CLI binary
# Download from https://duckdb.org/docs/installation/
# Start interactive CLI
duckdb
Querying Files Directly
-- Query CSV
SELECT * FROM 'sales.csv' LIMIT 10;
-- Query with aggregation
SELECT category, SUM(revenue), COUNT(*)
FROM 'sales.csv'
GROUP BY category
ORDER BY SUM(revenue) DESC;
-- Query Parquet (much faster for large data)
SELECT * FROM 'data.parquet' WHERE date > '2025-01-01';
-- Glob patterns — query all files
SELECT * FROM 'logs/*.csv';
SELECT * FROM 'data/year=*/month=*/*.parquet';
-- Query JSON
SELECT * FROM read_json_auto('events.json');
-- Query remote files
SELECT * FROM 'https://example.com/data.csv';
Python Integration
import duckdb
# Query CSV directly
result = duckdb.sql("SELECT * FROM 'sales.csv'").df()
# Query pandas DataFrame
import pandas as pd
df = pd.read_csv('large_data.csv')
result = duckdb.sql("SELECT category, AVG(price) FROM df GROUP BY category").df()
# Persistent database
con = duckdb.connect('my_analytics.db')
con.sql("CREATE TABLE sales AS SELECT * FROM 'sales.csv'")
con.sql("SELECT * FROM sales WHERE revenue > 1000").show()
ETL Pipelines
-- CSV to Parquet (huge compression + speed gain)
COPY (SELECT * FROM 'raw_data.csv') TO 'optimized.parquet' (FORMAT PARQUET);
-- Combine multiple CSVs into one Parquet
COPY (SELECT * FROM 'logs/*.csv') TO 'all_logs.parquet' (FORMAT PARQUET);
-- Export to JSON
COPY (SELECT * FROM analysis) TO 'results.json' (FORMAT JSON);
Comparison
| Feature | DuckDB | SQLite | PostgreSQL | pandas |
|---|---|---|---|---|
| Type | OLAP | OLTP | OLTP/OLAP | Library |
| Server | No | No | Yes | No |
| Query CSV | Direct | No | COPY | read_csv |
| Query Parquet | Direct | No | Extension | read_parquet |
| Columnar | Yes | No | No | Yes |
| SQL Support | Full | Full | Full | Limited |
| Speed (analytics) | Fast | Slow | Fast | Moderate |
| Language bindings | Many | Many | Many | Python |
Summary
- DuckDB runs analytical SQL queries without a server — embed in any application
- Query CSV, Parquet, JSON, and Excel files directly with standard SQL
- Columnar storage and vectorized execution for fast analytical queries
- Python, R, Node.js, Java, and many more language bindings
- Perfect for ETL pipelines, data exploration, and ad-hoc analytics