Abstract


  • “SQLite for analytics”: lightweight, embedded, great for data scientists, local OLAP

Vectors vs Rows


  • Vectorized execution: processes data in chunks of ~2k values per column (STANDARD_VECTOR_SIZE, 2048 by default)
  • Why it’s fast: scans, filters, and aggregates run on batches instead of one row at a time
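The idea can be shown with a toy pure-Python sketch: the column is scanned, filtered, and aggregated one fixed-size chunk at a time instead of row by row. `VECTOR_SIZE` mirrors DuckDB's default, but the function and layout here are illustrative, not DuckDB's actual code.

```python
from array import array

VECTOR_SIZE = 2048  # mirrors DuckDB's default STANDARD_VECTOR_SIZE

def sum_where_gt(values: array, threshold: float) -> float:
    """Scan + filter + aggregate, one vector (chunk) of the column at a time."""
    total = 0.0
    for start in range(0, len(values), VECTOR_SIZE):
        chunk = values[start:start + VECTOR_SIZE]  # one vector of the column
        # The whole chunk is filtered and aggregated in a tight loop,
        # which keeps the data hot in cache and is easy to auto-vectorize.
        total += sum(v for v in chunk if v > threshold)
    return total

col = array("d", range(10_000))
print(sum_where_gt(col, 9_995.0))  # -> 39990.0 (9996 + 9997 + 9998 + 9999)
```

In a real engine the inner loop runs over typed native arrays, so the per-chunk work compiles down to a handful of cache-friendly instructions rather than per-row interpretation overhead.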

When it’s less ideal: row-at-a-time access patterns — if a query needs most columns of every row (e.g. point lookups), columnar batching buys little.

Some mitigations:

  • Late materialization & selection vectors → only fetch extra columns for rows that survive filters.
  • Vectorized operators → gather multiple columns in cache-friendly batches.
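These two mitigations can be sketched together: filter one column first to build a selection vector (the indices of surviving rows), then gather the other columns only for those rows. The function names and two-pass structure are illustrative, not DuckDB's API.

```python
def filter_to_selection(col, predicate):
    """First pass: build a selection vector (indices of surviving rows)
    by touching only the filter column."""
    return [i for i, v in enumerate(col) if predicate(v)]

def late_materialize(selection, *other_cols):
    """Second pass: gather the remaining columns only for selected rows."""
    return [tuple(col[i] for col in other_cols) for i in selection]

price = [5, 50, 7, 99, 3]
name  = ["a", "b", "c", "d", "e"]
qty   = [10, 20, 30, 40, 50]

sel = filter_to_selection(price, lambda p: p > 10)   # -> [1, 3]
rows = late_materialize(sel, name, qty)              # -> [('b', 20), ('d', 40)]
print(sel, rows)
```

If the filter is selective, most rows never pay the cost of fetching `name` and `qty` at all — that is the payoff of late materialization.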

SIMD Acceleration


  • SIMD within vectors: one instruction processes many values at once (AVX2/AVX-512, ARM NEON).
  • DuckDB largely relies on tight, branch-free C++ loops that compilers can auto-vectorize — for comparisons, masks, null handling, and aggregates — rather than pervasive hand-written intrinsics.
  • Tricks used: bitmask filters, bit-packed decoding, branchless comparisons, null-aware ops.
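The "bitmask filter + branchless comparison" trick can be mimicked in Python: each comparison contributes one bit to a mask with no data-dependent branch in the loop body, and the mask is then used to select rows (a cheap stand-in for SIMD masked operations). This is a conceptual sketch, not how DuckDB's C++ kernels are written.

```python
def compare_gt_bitmask(values, threshold):
    """Branchless comparison: each value sets one bit of a mask.
    (v > threshold) evaluates to 0/1, so there is no if/else per element."""
    mask = 0
    for i, v in enumerate(values):
        mask |= (v > threshold) << i
    return mask

def apply_mask(values, mask):
    """Use the bitmask as a filter, like a SIMD masked gather would."""
    return [v for i, v in enumerate(values) if (mask >> i) & 1]

vals = [3, 8, 1, 9, 5]
m = compare_gt_bitmask(vals, 4)       # bits set at positions 1, 3, 4
print(bin(m), apply_mask(vals, m))    # -> 0b11010 [8, 9, 5]
```

Avoiding branches matters because a mispredicted branch per element can cost more than the comparison itself; a bitmask turns the filter into straight-line arithmetic.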

Embedded Design & Threads


  • Embedded = no client/server overhead: no sockets or IPC → query start-up is near-zero.
  • Queries are split into tasks that run across a shared thread pool → parallelism within a single query, not just “one thread per query.”
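A minimal sketch of the task model, assuming a simple SUM query: the work is decomposed into per-chunk tasks that a pool of worker threads picks up. The chunk size and decomposition here are illustrative, not DuckDB's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 2048  # illustrative task granularity

data = list(range(10_000))

def partial_sum(start):
    # One task: aggregate a single chunk of the column.
    return sum(data[start:start + CHUNK])

with ThreadPoolExecutor(max_workers=4) as pool:
    # The "query" (a SUM) becomes independent per-chunk tasks that the
    # pool schedules across worker threads, then the partials are combined.
    total = sum(pool.map(partial_sum, range(0, len(data), CHUNK)))

print(total)  # -> 49995000, same answer as a single-threaded sum
```

Because tasks are small and independent, a single heavy query can saturate all cores, and many light queries can share the same pool without oversubscribing threads.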

Storage Formats


  • CSV: slowest to query — row-oriented text that must be parsed on every read (good only for ad-hoc exchange).
  • Parquet: columnar, compressed, supports predicate/column pushdown → best for data lakes.
  • .duckdb: native persistent format for local DBs.
  • Arrow/IPC: zero-copy interchange with other tools.
  • With Parquet/.duckdb → read only the needed columns and row groups.
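Why column + row-group pruning helps can be shown with a toy "columnar file": columns are stored separately and split into row groups, each carrying min/max stats. The dict layout, stats shape, and `scan` function are all illustrative — not the Parquet or .duckdb format.

```python
# Toy columnar layout: each column is a list of row groups (two groups here),
# with per-group min/max stats on the filter column.
table = {
    "price": [[5, 7], [50, 99]],
    "name":  [["a", "c"], ["b", "d"]],
}
stats = [{"price": (5, 7)}, {"price": (50, 99)}]  # per-row-group (min, max)

def scan(columns, pred_col, lo):
    """Read only the requested columns, and skip any row group whose
    max <= lo (predicate pushdown). CSV would force a full parse instead."""
    out = []
    for g, st in enumerate(stats):
        if st[pred_col][1] <= lo:      # row-group pruning via stats
            continue
        for i, v in enumerate(table[pred_col][g]):
            if v > lo:
                out.append(tuple(table[c][g][i] for c in columns))
    return out

print(scan(["name", "price"], "price", 10))  # -> [('b', 50), ('d', 99)]
```

The first row group (prices 5–7) is never read at all, and no column outside the projection is touched — the same effect DuckDB gets from Parquet's column chunks and row-group statistics.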