Top recent DuckDB news, how-tos and comparisons
2025-11-19
The article introduces encrypted database files in DuckDB, covering secure data storage, the deployment models encryption enables, and its performance impact. It outlines the encryption approach, presents benchmarks, and discusses the implications for data security and operational efficiency, including integration with cloud storage and the finding that encryption carries no significant performance penalty.
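As a minimal sketch of what working with an encrypted database file looks like from Python: the `ATTACH ... (ENCRYPTION_KEY ...)` option is the mechanism introduced in DuckDB 1.4+, but the file name and key below are placeholders, and you should check the release notes for your version.

```python
import duckdb

con = duckdb.connect()  # in-memory session used to attach the encrypted file

# Create/open an encrypted database file; the key is supplied at ATTACH time.
# In practice the key should come from a secret manager, not a literal.
con.execute("ATTACH 'sales_encrypted.duckdb' AS enc (ENCRYPTION_KEY 'replace-with-real-key')")

# Data written into the attached database is encrypted on disk.
con.execute("CREATE OR REPLACE TABLE enc.orders AS SELECT 1 AS id, 42.0 AS amount")
print(con.execute("SELECT count(*) FROM enc.orders").fetchone())

con.execute("DETACH enc")
```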
2025-11-14
The article discusses integrating large language models (LLMs) and retrieval-augmented generation (RAG) into DuckDB. It introduces FlockMTL, an extension that makes LLMs callable from within SQL queries. FlockMTL provides new functions for data analysis and supports hybrid search by combining structured and unstructured data, aiming to simplify complex data processing tasks for developers.
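FlockMTL is distributed as a DuckDB community extension; the sketch below shows only the install/load step, with the LLM call left as an illustrative placeholder, since the exact function signatures (model and prompt configuration) depend on the FlockMTL release and are documented in the extension itself.

```python
import duckdb

con = duckdb.connect()

# FlockMTL ships via the DuckDB community extension repository.
con.execute("INSTALL flockmtl FROM community")
con.execute("LOAD flockmtl")

# Illustrative only: FlockMTL exposes functions such as llm_complete /
# llm_filter for running prompts over rows; the argument shape (model,
# prompt, input columns) varies by FlockMTL version, so consult its docs.
# con.execute("SELECT llm_complete(...) FROM reviews")
```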
2025-11-13
The text provides an overview of building an NFL prediction model, detailing its features, technical architecture, and lessons learned. Key points include the model's accuracy (57% across ten weeks), integration of advanced techniques like ELO ratings, hot simulations, and QB VALUE adjustments, and the use of modern tools such as DuckDB and dbt. It highlights the shift from traditional data pipelines to efficient Parquet-based workflows, emphasizes the importance of testing to avoid technical debt, and discusses the democratization of data analytics through accessible tools. The model's future includes enhancements like weather modeling and live win probability tracking, demonstrating how individual developers can now achieve professional-grade results with modern tooling.
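The Parquet-first workflow the post describes boils down to querying raw files directly with DuckDB instead of loading them into a warehouse first; a rough sketch, where the file paths and column names are hypothetical stand-ins for play-by-play data:

```python
import duckdb

# Query a directory of Parquet files directly -- no load step, no server.
# Paths and columns are placeholders for whatever play-by-play data you have.
weekly_epa = duckdb.sql("""
    SELECT season, week, posteam AS team, avg(epa) AS avg_epa
    FROM read_parquet('data/pbp_*.parquet')
    GROUP BY season, week, posteam
    ORDER BY season, week
""")
print(weekly_epa.df().head())
```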
2025-11-12
DuckDB 1.4.2, a long-term support release, has been released with bug fixes, performance improvements, and new features like Iceberg support and logging tools. The update includes security patches for encryption vulnerabilities and optional profiling tools to help users analyze query performance. It also adds Vortex file format support and addresses several crashes and errors reported by users.
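To try the kind of query profiling the release notes refer to, the long-standing profiling settings (or a plain EXPLAIN ANALYZE) can be used from Python; the setting names below are the documented ones, but double-check them against the 1.4.2 docs.

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t AS SELECT range AS i FROM range(1000000)")

# Simplest option: EXPLAIN ANALYZE prints the operator tree with timings.
print(con.execute("EXPLAIN ANALYZE SELECT sum(i) FROM t").fetchall())

# Alternatively, write a JSON profile to disk for other tooling to consume.
con.execute("SET enable_profiling = 'json'")
con.execute("SET profiling_output = 'profile.json'")
con.execute("SELECT count(*) FROM t WHERE i % 7 = 0")
```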
2025-11-12
The article examines how single-node frameworks like DuckDB, Polars, and Daft perform on large lakehouse datasets with modest memory footprints. It compares their execution times and shows that these tools can process the data efficiently without distributed computing. The author also covers their integration with Delta Lake and the potential of combining DuckDB and Polars for optimal performance. The main takeaway is that single-node solutions can be viable alternatives to traditional distributed systems like Spark, offering simplicity and cost-effectiveness.
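One concrete way to combine the tools discussed: let DuckDB's delta extension scan the Delta table and hand the result to Polars for further transformation. A sketch assuming the delta extension is available and a local table path (the path, columns, and filter are hypothetical):

```python
import duckdb
import polars as pl  # noqa: F401  (needed for the .pl() conversion)

con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Scan a Delta Lake table with DuckDB and aggregate there...
rel = con.sql("""
    SELECT customer_id, sum(amount) AS total
    FROM delta_scan('./sales_delta/')
    WHERE order_date >= DATE '2025-01-01'
    GROUP BY customer_id
""")

# ...then continue in Polars without copying through pandas.
df = rel.pl()
print(df.sort("total", descending=True).head(10))
```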
2025-11-05
Bauplan has migrated from DuckDB to DataFusion, achieving significant performance improvements, especially in query latency. The transition involved addressing several technical challenges, including case-sensitivity issues, integration problems with Iceberg, and performance optimizations. The move also aligns with their ongoing efforts to enhance Iceberg compatibility and metadata caching, with collaborations like LiquidCache. The migration was a success, improving both system performance and team collaboration.
2025-10-24
This article discusses a new approach to data discovery for libraries and digital humanities projects, which uses static hosting and in-browser tools to reduce costs and maintenance. The Data.gov Archive Search project demonstrates how dynamic data querying can be done without dedicated servers, using technologies like DuckDB-Wasm. This method allows users to search and filter large datasets directly in their browser, offering a low-cost, sustainable solution. The project highlights the benefits of this approach, including reduced technical overhead and long-term accessibility for cultural heritage and academic collections.
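The in-browser piece is DuckDB-Wasm, but the underlying idea, querying static files over HTTP with no server-side query engine, can be sketched from Python with the httpfs extension; the URL and columns below are placeholders, not the project's actual dataset.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# The "server" is just static file hosting; DuckDB fetches only the byte
# ranges of the Parquet file it needs to answer the query.
rel = con.sql("""
    SELECT title, agency
    FROM read_parquet('https://example.org/archive/catalog.parquet')
    WHERE title ILIKE '%census%'
    LIMIT 20
""")
print(rel.df())
```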
2025-10-24
This blog post discusses the integration of Java Flight Recorder (JFR) data with DuckDB for efficient analysis. The author outlines the structure of JFR data, the design of a DuckDB schema to store it, and the process of importing JFR files into DuckDB. Additionally, it covers the use of DuckDB for querying and analyzing JFR data, including performance considerations and potential future enhancements. The post also mentions the author's ongoing work on JEP Candidate 435 to add a new profiling API to OpenJDK.
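One low-effort way to get JFR events into DuckDB (not necessarily the author's schema) is to dump a recording to JSON with the JDK's jfr tool and load it with read_json; the event type is illustrative, and the JSON layout produced by jfr can vary across JDK versions.

```python
import subprocess
import duckdb

# Dump execution samples from a recording to JSON with the JDK's jfr tool.
with open("samples.json", "w") as out:
    subprocess.run(
        ["jfr", "print", "--json", "--events", "jdk.ExecutionSample", "recording.jfr"],
        stdout=out, check=True,
    )

con = duckdb.connect("jfr.duckdb")
# jfr's JSON wraps events under recording.events; unnest into one row per event.
con.execute("""
    CREATE OR REPLACE TABLE execution_samples AS
    SELECT unnest(recording.events, recursive := true)
    FROM read_json_auto('samples.json')
""")
print(con.execute("SELECT count(*) FROM execution_samples").fetchone())
```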
2025-10-22
The article discusses how GizmoEdge, a distributed SQL engine, successfully processed a trillion-row dataset in under five seconds, outperforming expectations. It highlights the use of a 1,000-worker cluster on Azure, with each worker running DuckDB and orchestrated through Kubernetes. The system's performance was demonstrated through two key queries, showcasing its ability to handle massive data quickly. The article also mentions GizmoSQL, another product from GizmoData, which completed the same challenge using a single server instance.
2025-10-22
The text discusses how DuckDB, through the DuckPGQ extension, enables efficient graph analysis for detecting complex patterns and hidden paths, such as ownership cycles in financial data. It compares the simplicity of SQL/PGQ's visual syntax with traditional recursive CTE methods, highlighting the usability and performance benefits of DuckDB's approach. The analysis is conducted entirely within DuckDB, leveraging its vectorized engine for high performance without requiring data export or external systems.
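A rough sketch of the SQL/PGQ style the post contrasts with recursive CTEs, using the DuckPGQ community extension; the tables, property graph, and labels here are hypothetical, and the exact clause syntax should be checked against the DuckPGQ docs.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL duckpgq FROM community")
con.execute("LOAD duckpgq")

# Hypothetical ownership data: companies and who-owns-whom edges.
con.execute("CREATE TABLE companies(id BIGINT, name VARCHAR)")
con.execute("CREATE TABLE ownerships(owner_id BIGINT, owned_id BIGINT, pct DOUBLE)")
con.execute("INSERT INTO companies VALUES (1, 'A'), (2, 'B'), (3, 'C')")
con.execute("INSERT INTO ownerships VALUES (1, 2, 0.6), (2, 3, 0.7)")

con.execute("""
    CREATE PROPERTY GRAPH ownership_graph
    VERTEX TABLES (companies LABEL company)
    EDGE TABLES (
        ownerships SOURCE KEY (owner_id) REFERENCES companies (id)
                   DESTINATION KEY (owned_id) REFERENCES companies (id)
                   LABEL owns
    )
""")

# Visual pattern syntax instead of a recursive CTE: two-hop ownership chains.
print(con.sql("""
    FROM GRAPH_TABLE (ownership_graph
        MATCH (a:company)-[o1:owns]->(b:company)-[o2:owns]->(c:company)
        COLUMNS (a.name AS owner, c.name AS indirectly_owned)
    )
""").df())
```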
2025-10-19
The article describes an experiment where DuckDB was tested at a 10 TB scale. The author used a high-performance setup with 64 cores and 512 GB RAM to push DuckDB to its limits. They found that DuckDB can handle datasets well beyond available memory, but performance depends on efficient disk usage. The key takeaway is that spilling to local disk improves performance, and while DuckDB holds up well at this scale, distributed engines remain better suited to extremely large datasets.
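The "use local disk for spilling" takeaway maps to a pair of settings: cap the memory budget and point temp_directory at fast local storage. A sketch, where the paths, limits, and dataset are illustrative rather than the author's exact setup:

```python
import duckdb

con = duckdb.connect("bench.duckdb")

# Give DuckDB an explicit memory budget and a fast local scratch area so
# larger-than-memory operators (joins, sorts, aggregations) spill to local
# disk instead of failing or thrashing.
con.execute("SET memory_limit = '400GB'")
con.execute("SET temp_directory = '/mnt/nvme/duckdb_tmp'")
con.execute("SET threads = 64")

# A larger-than-memory aggregation over Parquet will now spill as needed.
con.execute("""
    CREATE OR REPLACE TABLE daily_totals AS
    SELECT l_shipdate, sum(l_extendedprice) AS revenue
    FROM read_parquet('/data/lineitem/*.parquet')
    GROUP BY l_shipdate
""")
```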
2025-10-18
The text provides a comprehensive guide on building an AI agent to interact with a MotherDuck database. It outlines the steps from setting up the connection, implementing the agent with query tools, validating queries, to testing and refining the system. The guide emphasizes schema exploration, query validation, and performance optimization while addressing common issues such as invalid SQL, wrong table usage, and query inefficiencies.
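The building blocks the guide relies on, connecting, exploring the schema, and validating SQL before running it, look roughly like this in Python; the database, schema, and table names are placeholders, and a MOTHERDUCK_TOKEN environment variable is assumed for authentication.

```python
import duckdb

# 'md:<database>' connects to MotherDuck; auth comes from MOTHERDUCK_TOKEN.
con = duckdb.connect("md:my_analytics_db")

# Schema exploration: give the agent the tables and columns it may use.
tables = con.execute("SELECT table_schema, table_name FROM information_schema.tables").fetchall()
columns = con.execute("DESCRIBE my_schema.orders").fetchall()

def validate_sql(sql: str) -> str | None:
    """Return an error message if the query fails to plan, else None."""
    try:
        con.execute(f"EXPLAIN {sql}")  # plans the query without executing it
        return None
    except duckdb.Error as exc:
        return str(exc)

print(validate_sql("SELECT count(*) FROM my_schema.orders"))
```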
2025-10-13
The article compares Exasol's distributed MPP architecture with DuckDB, highlighting Exasol's performance advantages even in single-node scenarios due to its parallel processing and efficient execution engine. It argues that Exasol's design allows full utilization of hardware resources, while DuckDB's single-node architecture limits its scalability. The discussion also covers differences in query optimization, memory usage, and concurrency capabilities, emphasizing Exasol's strengths in enterprise environments with high user loads.
2025-10-13
The article provides an overview of different streaming patterns and their applications, focusing on how DuckDB can be integrated into these patterns. It discusses the Materialized View Pattern, Streaming Engine Pattern, and Streaming Database Pattern, highlighting DuckDB's capabilities in handling high-throughput data and its potential in these architectures. The article also touches on the challenges and trade-offs of each approach, emphasizing the importance of choosing the right solution based on specific needs.
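A stripped-down sketch of the materialized-view-style pattern described above: append micro-batches of events into DuckDB and periodically rebuild the aggregate that downstream readers query. The event source, schema, and refresh cadence are all hypothetical.

```python
import random
import time
from datetime import datetime

import duckdb

con = duckdb.connect("events.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS events(ts TIMESTAMP, user_id BIGINT, amount DOUBLE)")

def fetch_from_stream():
    # Placeholder for a real consumer (Kafka, Kinesis, ...): emit fake rows.
    return [(datetime.now(), random.randint(1, 100), random.random() * 50) for _ in range(100)]

def refresh_view():
    # "Materialized view": recompute the aggregate readers actually query.
    con.execute("""
        CREATE OR REPLACE TABLE revenue_per_minute AS
        SELECT date_trunc('minute', ts) AS minute, sum(amount) AS revenue
        FROM events
        GROUP BY 1
    """)

for _ in range(3):  # toy loop; a real pipeline would run continuously or be event-driven
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", fetch_from_stream())
    refresh_view()
    time.sleep(1)

print(con.sql("SELECT * FROM revenue_per_minute ORDER BY minute").df())
```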
2025-10-09
The article discusses the evolution of DuckDB extensions, their availability across different versions, and the dynamic nature of the community-driven ecosystem. It highlights how extensions are added or removed with each release, influenced by factors like maintainer activity, compatibility, and community adoption, notes the benefits of using DuckDB extensions, and invites readers to subscribe for updates.
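To see which extensions a given DuckDB build knows about, and whether they are installed or loaded, the duckdb_extensions() table function covers most of what the post describes; installing from the community repository is a single statement (the extension name shown is just an example).

```python
import duckdb

con = duckdb.connect()

# List extensions known to this DuckDB version and their install/load state.
print(con.sql("""
    SELECT extension_name, installed, loaded, description
    FROM duckdb_extensions()
    ORDER BY extension_name
""").df())

# Community-maintained extensions are installed from the community repository:
# con.execute("INSTALL duckpgq FROM community")
# con.execute("LOAD duckpgq")
```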