Top recent DuckDB news, how-tos and comparisons

DuckDB Storage Engine for MariaDB. When the Sea Lion Learns to Quack.
2026-06-09
MariaDB has introduced a new DuckDB storage engine that allows users to run high-speed analytical queries directly on the same server as transactional data. This integration enables users to perform complex joins between different data types without needing separate systems or ETL pipelines. It is designed for hybrid transactional and analytical processing, offering faster performance for large-scale data analysis while maintaining a familiar SQL interface.
The tiniest logging stack: Fluent Bit, Parquet and DuckDB
2026-06-04
The article describes a lightweight logging stack using Fluent Bit, Parquet files, and DuckDB for small environments. It explains how to efficiently store logs in S3, use Hive partitioning for better performance, and query them using SQL in Grafana. The author also highlights techniques like buffering, aggregation, and compaction to manage file sizes and costs.
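As a rough illustration of the Hive partitioning mentioned above, the sketch below builds partitioned object keys for log batches and checks partition values from the path alone. The `logs` prefix, the `year`/`month`/`day` partition columns, and both helper names are assumptions made for this example, not details taken from the article.

```python
from datetime import datetime, timezone


def hive_key(ts: datetime, batch_id: str, prefix: str = "logs") -> str:
    """Build a Hive-style object key: <prefix>/year=YYYY/month=MM/day=DD/<batch>.parquet."""
    ts = ts.astimezone(timezone.utc)  # normalise so partitions follow UTC days
    return (
        f"{prefix}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
        f"{batch_id}.parquet"
    )


def partition_filter_matches(key: str, **wanted: str) -> bool:
    """Decide from the path alone whether a file belongs to the wanted partitions."""
    parts = dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
    return all(parts.get(col) == val for col, val in wanted.items())


key = hive_key(datetime(2026, 6, 4, 12, 0, tzinfo=timezone.utc), "batch-0001")
print(key)
print(partition_filter_matches(key, year="2026", month="06"))
```

Because the partition values live in the path, a query engine can skip whole directories without opening any file. In DuckDB this layout is typically read with `read_parquet(..., hive_partitioning = true)`.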
New DuckDB-Iceberg Features in v1.5.3
2026-05-29
DuckDB version 1.5.3 introduces several new features for Iceberg tables, including MERGE INTO support and the ability to use ALTER TABLE for schema evolution. The update also adds support for bucket and truncate partition transforms and the latest Iceberg v3 specifications. These improvements allow users to perform more complex data operations and manage metadata more efficiently within the DuckLake ecosystem.
A Double Shot of DuckDB: Vector Similarity Search and Quack
2026-05-28
This article explores DuckDB's new vector similarity search (VSS) and the Quack protocol for client-server communication. The author demonstrates how these features can work together to route and search data across different database instances. The post includes a practical example using image embeddings to show how DuckDB can handle complex, distributed data tasks.
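To make the idea of vector similarity search concrete, here is a minimal brute-force sketch in plain Python: it ranks stored embeddings by cosine similarity to a query vector. It is not how the DuckDB extension is implemented. The tiny two-dimensional "embeddings" and the function names are made up for illustration, and real image embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, items, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings keyed by image name.
embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
for name, score in top_k([1.0, 0.0], embeddings):
    print(name, round(score, 3))
```

A dedicated similarity index exists to avoid this full scan over every stored vector, usually by accepting approximate results.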
Why MotherDuck refuses to fork DuckDB
2026-05-27
MotherDuck's AI lead explains why the startup chooses to collaborate with the DuckDB Foundation instead of forking the open-source project. The company shares data from its large database fleet to help improve the core software for everyone. This partnership lets MotherDuck grow commercially while supporting the open-source community.
Benchmarking Vortex File Format vs. Parquet, CSV vs. DuckDB, Polars, Datafusion
2026-05-25
The article compares the new Vortex file format against popular formats like Parquet and CSV using Python tools. While Vortex shows promising speed for scans and random access, the author notes that its current Python integrations are still early and can be unreliable. Ultimately, the author suggests that Vortex may not yet be ready for production due to these integration issues.
Test-Driving the Lance Lakehouse Format in DuckDB
2026-05-21
Lance is an open lakehouse format designed for AI workloads, integrating seamlessly with DuckDB to enable fast vector and hybrid search directly from SQL without leaving the analytical workflow. This partnership allows users to query Lance datasets using familiar SQL interfaces while adding capabilities for AI-oriented access patterns such as vector similarity and full-text searches. The recent benchmark demonstrates that Lance outperforms traditional formats like Parquet, especially in cold runs for tasks involving vector and hybrid searches, making it a suitable alternative for multimodal datasets with embeddings and large binary data.
DuckDB 1.5.3: Not an Ordinary Patch Release
2026-05-20
DuckDB version 1.5.3 has been released, with an emphasis on extension upgrades rather than minor bug fixes. Key highlights include the introduction of Quack as a core extension for client-server communication, enhanced support for the Iceberg data format with new features like MERGE INTO and better partitioned table operations, and improvements in AWS integration for IAM role authentication. Additionally, the httpfs extension now respects the HTTP_PROXY environment variable.
Optimising DuckDB performance on large EC2 instances
2026-05-18
This article explores optimizing DuckDB performance on large EC2 instances for data engineering workloads. It highlights that while using many threads can sometimes degrade performance due to inefficiencies and memory bandwidth limits, certain queries benefit from full core utilization. Key recommendations include experimenting with thread settings, leveraging instance store SSDs or high EBS throughput, and utilizing the AWS S3 CRT transfer client for faster uploads/downloads. The findings suggest that DuckDB's performance on large instances is highly dependent on workload type and configuration.
Five LLM agents play Werewolf in-browser, each with a private DuckDB
2026-05-15
This article presents a demonstration of multi-agent information asymmetry enforced at the database layer using DuckDB-WASM inside Web Workers for each agent in a Werewolf game. A gateway worker mediates data sharing between agents with scoped tokens, ensuring that only authorized information is exchanged. The setup allows local inference without server-side processing or supports hosted APIs with provided keys. The architecture emphasizes enforcing information asymmetry through database schema design rather than application code, enabling private reasoning and public statements to remain separate throughout the game.
Quack and VGI: Two Approaches to Bringing RPC to DuckDB
2026-05-12
Quack and VGI are two approaches to enable remote procedure calls (RPC) with DuckDB. Quack is a new binary RPC protocol that turns DuckDB into both a client and server, allowing queries across networks through simple SQL attachments. It uses a concise set of message types for connection, query preparation, data fetching, and disconnection. VGI, or Vector Gateway Interface, extends DuckDB to embed remote functions written in any language supporting Apache Arrow IPC within DuckDB queries, acting as a bridge between DuckDB and external services or models. While both aim at remote computation integration, Quack focuses on federating other DuckDB instances with minimal setup, whereas VGI offers broader compatibility by abstracting the remote function's implementation into an RPC interface. The two projects diverge in wire formats, session cursor management, and deployment strategies (e.g., subprocess vs HTTP transport), catering to different scalability and flexibility needs.
Quack: The DuckDB Client-Server Protocol
2026-05-12
DuckDB has introduced the Quack protocol, enabling multiple DuckDB instances to communicate in a client-server setup. This allows concurrent writing and expands DuckDB's usability beyond its initial niche of an in-process database for interactive analytics. Quack is built on HTTP, supports authentication and authorization through extensible callbacks, and performs well in bulk data transfer and small write operations. Future plans include integration with DuckLake and enhancements to transaction scalability and protocol extension capabilities.
DUCKDB MONTHLY #41: DUCKDB INTERNALS COURSE, FTS WALKTHROUGH, AND A SATELLITE PIPELINE WITH H3 + PARQUET
2026-05-11
The May DuckDB Monthly newsletter, authored by Simon Späti, highlights recent developments and educational content within the DuckDB ecosystem. Key features include Torsten's comprehensive 15-week internals course at the University of Tübingen, Pete's practical guide on implementing full-text search for large email datasets, and Mark's pipeline converting satellite data into H3-cell heatmaps using ZSTD-compressed Parquet files. Additionally, Adam introduces an open-source extension for automatic column-level data lineage, and DuckDB 1.5.0 "Variegata" is released with improvements in CLI ergonomics and native support for semi-structured and geospatial data types. Upcoming events include a meetup in San Francisco on May 21st and The Dive at the Snowflake Summit from June 1-3.