Abstract


  • “SQLite for analytics”: lightweight, embedded, great for data scientists, local OLAP

Vectors vs Rows


  • Vectorized execution: processes data in chunks of ~2k values per column (STANDARD_VECTOR_SIZE, 2048 by default)
  • Why it’s fast: scans, filters, and aggregates run on batches instead of one row at a time
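The idea can be shown with a toy pure-Python sketch: the column is scanned, filtered, and aggregated one fixed-size chunk at a time instead of row by row. `VECTOR_SIZE` mirrors DuckDB's default, but the function and layout here are illustrative, not DuckDB's actual code.

```python
from array import array

VECTOR_SIZE = 2048  # mirrors DuckDB's default STANDARD_VECTOR_SIZE

def sum_where_gt(values: array, threshold: float) -> float:
    """Scan + filter + aggregate, one vector (chunk) of the column at a time."""
    total = 0.0
    for start in range(0, len(values), VECTOR_SIZE):
        chunk = values[start:start + VECTOR_SIZE]  # one vector of the column
        # The whole chunk is filtered and aggregated in a tight loop,
        # which keeps the data hot in cache and is easy to auto-vectorize.
        total += sum(v for v in chunk if v > threshold)
    return total

col = array("d", range(10_000))
print(sum_where_gt(col, 9_995.0))  # -> 39990.0 (9996 + 9997 + 9998 + 9999)
```

In a real engine the inner loop runs over typed native arrays, so the per-chunk work compiles down to a handful of cache-friendly instructions rather than per-row interpretation overhead.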

When it’s less ideal: row-at-a-time access patterns — if a query needs most columns of every row (e.g. point lookups), columnar batching buys little.

Some mitigations:

  • Late materialization & selection vectors → only fetch extra columns for rows that survive filters.
  • Vectorized operators → gather multiple columns in cache-friendly batches.
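These two mitigations can be sketched together: filter one column first to build a selection vector (the indices of surviving rows), then gather the other columns only for those rows. The function names and two-pass structure are illustrative, not DuckDB's API.

```python
def filter_to_selection(col, predicate):
    """First pass: build a selection vector (indices of surviving rows)
    by touching only the filter column."""
    return [i for i, v in enumerate(col) if predicate(v)]

def late_materialize(selection, *other_cols):
    """Second pass: gather the remaining columns only for selected rows."""
    return [tuple(col[i] for col in other_cols) for i in selection]

price = [5, 50, 7, 99, 3]
name  = ["a", "b", "c", "d", "e"]
qty   = [10, 20, 30, 40, 50]

sel = filter_to_selection(price, lambda p: p > 10)   # -> [1, 3]
rows = late_materialize(sel, name, qty)              # -> [('b', 20), ('d', 40)]
print(sel, rows)
```

If the filter is selective, most rows never pay the cost of fetching `name` and `qty` at all — that is the payoff of late materialization.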

SIMD Acceleration


  • SIMD within vectors: one instruction processes many values at once (AVX2/AVX-512, ARM NEON).
  • DuckDB largely relies on tight, branch-free C++ loops that compilers can auto-vectorize — for comparisons, masks, null handling, and aggregates — rather than pervasive hand-written intrinsics.
  • Tricks used: bitmask filters, bit-packed decoding, branchless comparisons, null-aware ops.
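The "bitmask filter + branchless comparison" trick can be mimicked in Python: each comparison contributes one bit to a mask with no data-dependent branch in the loop body, and the mask is then used to select rows (a cheap stand-in for SIMD masked operations). This is a conceptual sketch, not how DuckDB's C++ kernels are written.

```python
def compare_gt_bitmask(values, threshold):
    """Branchless comparison: each value sets one bit of a mask.
    (v > threshold) evaluates to 0/1, so there is no if/else per element."""
    mask = 0
    for i, v in enumerate(values):
        mask |= (v > threshold) << i
    return mask

def apply_mask(values, mask):
    """Use the bitmask as a filter, like a SIMD masked gather would."""
    return [v for i, v in enumerate(values) if (mask >> i) & 1]

vals = [3, 8, 1, 9, 5]
m = compare_gt_bitmask(vals, 4)       # bits set at positions 1, 3, 4
print(bin(m), apply_mask(vals, m))    # -> 0b11010 [8, 9, 5]
```

Avoiding branches matters because a mispredicted branch per element can cost more than the comparison itself; a bitmask turns the filter into straight-line arithmetic.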

Embedded Design & Threads


  • Embedded = no client/server overhead: no sockets or IPC → query start-up is near-zero.
  • Queries are split into tasks that run across a shared thread pool → parallelism within a single query, not just “one thread per query.”
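A minimal sketch of the task model, assuming a simple SUM query: the work is decomposed into per-chunk tasks that a pool of worker threads picks up. The chunk size and decomposition here are illustrative, not DuckDB's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 2048  # illustrative task granularity

data = list(range(10_000))

def partial_sum(start):
    # One task: aggregate a single chunk of the column.
    return sum(data[start:start + CHUNK])

with ThreadPoolExecutor(max_workers=4) as pool:
    # The "query" (a SUM) becomes independent per-chunk tasks that the
    # pool schedules across worker threads, then the partials are combined.
    total = sum(pool.map(partial_sum, range(0, len(data), CHUNK)))

print(total)  # -> 49995000, same answer as a single-threaded sum
```

Because tasks are small and independent, a single heavy query can saturate all cores, and many light queries can share the same pool without oversubscribing threads.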

Storage Formats


  • CSV: slowest to query — row-oriented text that must be parsed on every read (good only for ad-hoc exchange).
  • Parquet: columnar, compressed, supports predicate/column pushdown → best for data lakes.
  • .duckdb: native persistent format for local DBs.
  • Arrow/IPC: zero-copy interchange with other tools.
  • With Parquet/.duckdb → read only the needed columns and row groups.
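Why column + row-group pruning helps can be shown with a toy "columnar file": columns are stored separately and split into row groups, each carrying min/max stats. The dict layout, stats shape, and `scan` function are all illustrative — not the Parquet or .duckdb format.

```python
# Toy columnar layout: each column is a list of row groups (two groups here),
# with per-group min/max stats on the filter column.
table = {
    "price": [[5, 7], [50, 99]],
    "name":  [["a", "c"], ["b", "d"]],
}
stats = [{"price": (5, 7)}, {"price": (50, 99)}]  # per-row-group (min, max)

def scan(columns, pred_col, lo):
    """Read only the requested columns, and skip any row group whose
    max <= lo (predicate pushdown). CSV would force a full parse instead."""
    out = []
    for g, st in enumerate(stats):
        if st[pred_col][1] <= lo:      # row-group pruning via stats
            continue
        for i, v in enumerate(table[pred_col][g]):
            if v > lo:
                out.append(tuple(table[c][g][i] for c in columns))
    return out

print(scan(["name", "price"], "price", 10))  # -> [('b', 50), ('d', 99)]
```

The first row group (prices 5–7) is never read at all, and no column outside the projection is touched — the same effect DuckDB gets from Parquet's column chunks and row-group statistics.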