Top recent DuckDB news, how-tos and comparisons
2025-11-19
The article introduces encrypted database files in DuckDB, covering secure data storage, the deployment models encryption enables, and its performance impact. It outlines the encryption approach, presents benchmarks, and discusses the implications for data security and operational efficiency, including integration with cloud storage and the finding that encryption carries no significant performance penalty.
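As a minimal sketch of what working with an encrypted database file looks like from Python: the `ATTACH ... (ENCRYPTION_KEY ...)` option is the mechanism introduced in DuckDB 1.4+, but the file name and key below are placeholders, and you should check the release notes for your version.

```python
import duckdb

con = duckdb.connect()  # in-memory session used to attach the encrypted file

# Create/open an encrypted database file; the key is supplied at ATTACH time.
# In practice the key should come from a secret manager, not a literal.
con.execute("ATTACH 'sales_encrypted.duckdb' AS enc (ENCRYPTION_KEY 'replace-with-real-key')")

# Data written into the attached database is encrypted on disk.
con.execute("CREATE OR REPLACE TABLE enc.orders AS SELECT 1 AS id, 42.0 AS amount")
print(con.execute("SELECT count(*) FROM enc.orders").fetchone())

con.execute("DETACH enc")
```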
2025-11-14
The article discusses integrating large language models (LLMs) and retrieval-augmented generation (RAG) into DuckDB. It introduces FlockMTL, an extension that makes LLMs callable from within SQL queries. FlockMTL provides new functions for data analysis and supports hybrid search by combining structured and unstructured data, aiming to simplify complex data processing tasks for developers.
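FlockMTL is distributed as a DuckDB community extension; the sketch below shows only the install/load step, with the LLM call left as an illustrative placeholder, since the exact function signatures (model and prompt configuration) depend on the FlockMTL release and are documented in the extension itself.

```python
import duckdb

con = duckdb.connect()

# FlockMTL ships via the DuckDB community extension repository.
con.execute("INSTALL flockmtl FROM community")
con.execute("LOAD flockmtl")

# Illustrative only: FlockMTL exposes functions such as llm_complete /
# llm_filter for running prompts over rows; the argument shape (model,
# prompt, input columns) varies by FlockMTL version, so consult its docs.
# con.execute("SELECT llm_complete(...) FROM reviews")
```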
2025-11-13
The text provides an overview of building an NFL prediction model, detailing its features, technical architecture, and lessons learned. Key points include the model's accuracy (57% across ten weeks), integration of advanced techniques like ELO ratings, hot simulations, and QB VALUE adjustments, and the use of modern tools such as DuckDB and dbt. It highlights the shift from traditional data pipelines to efficient Parquet-based workflows, emphasizes the importance of testing to avoid technical debt, and discusses the democratization of data analytics through accessible tools. The model's future includes enhancements like weather modeling and live win probability tracking, demonstrating how individual developers can now achieve professional-grade results with modern tooling.
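The Parquet-first workflow the post describes boils down to querying raw files directly with DuckDB instead of loading them into a warehouse first; a rough sketch, where the file paths and column names are hypothetical stand-ins for play-by-play data:

```python
import duckdb

# Query a directory of Parquet files directly -- no load step, no server.
# Paths and columns are placeholders for whatever play-by-play data you have.
weekly_epa = duckdb.sql("""
    SELECT season, week, posteam AS team, avg(epa) AS avg_epa
    FROM read_parquet('data/pbp_*.parquet')
    GROUP BY season, week, posteam
    ORDER BY season, week
""")
print(weekly_epa.df().head())
```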
2025-11-12
DuckDB 1.4.2, a long-term support release, has been released with bug fixes, performance improvements, and new features like Iceberg support and logging tools. The update includes security patches for encryption vulnerabilities and optional profiling tools to help users analyze query performance. It also adds Vortex file format support and addresses several crashes and errors reported by users.
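To try the kind of query profiling the release notes refer to, the long-standing profiling settings (or a plain EXPLAIN ANALYZE) can be used from Python; the setting names below are the documented ones, but double-check them against the 1.4.2 docs.

```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t AS SELECT range AS i FROM range(1000000)")

# Simplest option: EXPLAIN ANALYZE prints the operator tree with timings.
print(con.execute("EXPLAIN ANALYZE SELECT sum(i) FROM t").fetchall())

# Alternatively, write a JSON profile to disk for other tooling to consume.
con.execute("SET enable_profiling = 'json'")
con.execute("SET profiling_output = 'profile.json'")
con.execute("SELECT count(*) FROM t WHERE i % 7 = 0")
```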
2025-11-12
The article examines how single-node frameworks like DuckDB, Polars, and Daft perform on large lakehouse datasets with modest memory footprints. It compares their execution times and shows that these tools can process the data efficiently without distributed computing. The author also covers their integration with Delta Lake and the potential of combining DuckDB and Polars for optimal performance. The main takeaway is that single-node solutions can be viable alternatives to traditional distributed systems like Spark, offering simplicity and cost-effectiveness.
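One concrete way to combine the tools discussed: let DuckDB's delta extension scan the Delta table and hand the result to Polars for further transformation. A sketch assuming the delta extension is available and a local table path (the path, columns, and filter are hypothetical):

```python
import duckdb
import polars as pl  # noqa: F401  (needed for the .pl() conversion)

con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Scan a Delta Lake table with DuckDB and aggregate there...
rel = con.sql("""
    SELECT customer_id, sum(amount) AS total
    FROM delta_scan('./sales_delta/')
    WHERE order_date >= DATE '2025-01-01'
    GROUP BY customer_id
""")

# ...then continue in Polars without copying through pandas.
df = rel.pl()
print(df.sort("total", descending=True).head(10))
```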
2025-11-05
Bauplan has migrated from DuckDB to DataFusion, achieving significant performance improvements, especially in query latency. The transition involved addressing several technical challenges, including case-sensitivity issues, integration problems with Iceberg, and performance optimizations. The move also aligns with their ongoing efforts to enhance Iceberg compatibility and metadata caching, with collaborations like LiquidCache. The migration was a success, improving both system performance and team collaboration.
2025-10-24
This article discusses a new approach to data discovery for libraries and digital humanities projects, which uses static hosting and in-browser tools to reduce costs and maintenance. The Data.gov Archive Search project demonstrates how dynamic data querying can be done without dedicated servers, using technologies like DuckDB-Wasm. This method allows users to search and filter large datasets directly in their browser, offering a low-cost, sustainable solution. The project highlights the benefits of this approach, including reduced technical overhead and long-term accessibility for cultural heritage and academic collections.
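The in-browser piece is DuckDB-Wasm, but the underlying idea, querying static files over HTTP with no server-side query engine, can be sketched from Python with the httpfs extension; the URL and columns below are placeholders, not the project's actual dataset.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# The "server" is just static file hosting; DuckDB fetches only the byte
# ranges of the Parquet file it needs to answer the query.
rel = con.sql("""
    SELECT title, agency
    FROM read_parquet('https://example.org/archive/catalog.parquet')
    WHERE title ILIKE '%census%'
    LIMIT 20
""")
print(rel.df())
```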
2025-10-24
This blog post discusses the integration of Java Flight Recorder (JFR) data with DuckDB for efficient analysis. The author outlines the structure of JFR data, the design of a DuckDB schema to store it, and the process of importing JFR files into DuckDB. Additionally, it covers the use of DuckDB for querying and analyzing JFR data, including performance considerations and potential future enhancements. The post also mentions the author's ongoing work on JEP Candidate 435 to add a new profiling API to OpenJDK.
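One low-effort way to get JFR events into DuckDB (not necessarily the author's schema) is to dump a recording to JSON with the JDK's jfr tool and load it with read_json; the event type is illustrative, and the JSON layout produced by jfr can vary across JDK versions.

```python
import subprocess
import duckdb

# Dump execution samples from a recording to JSON with the JDK's jfr tool.
with open("samples.json", "w") as out:
    subprocess.run(
        ["jfr", "print", "--json", "--events", "jdk.ExecutionSample", "recording.jfr"],
        stdout=out, check=True,
    )

con = duckdb.connect("jfr.duckdb")
# jfr's JSON wraps events under recording.events; unnest into one row per event.
con.execute("""
    CREATE OR REPLACE TABLE execution_samples AS
    SELECT unnest(recording.events, recursive := true)
    FROM read_json_auto('samples.json')
""")
print(con.execute("SELECT count(*) FROM execution_samples").fetchone())
```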
2025-10-22
The article discusses how GizmoEdge, a distributed SQL engine, successfully processed a trillion-row dataset in under five seconds, outperforming expectations. It highlights the use of a 1,000-worker cluster on Azure, with each worker running DuckDB and orchestrated through Kubernetes. The system's performance was demonstrated through two key queries, showcasing its ability to handle massive data quickly. The article also mentions GizmoSQL, another product from GizmoData, which completed the same challenge using a single server instance.
2025-10-22
The text discusses how DuckDB, through the DuckPGQ extension, enables efficient graph analysis for detecting complex patterns and hidden paths, such as ownership cycles in financial data. It compares the simplicity of SQL/PGQ's visual syntax with traditional recursive CTE methods, highlighting the usability and performance benefits of DuckDB's approach. The analysis is conducted entirely within DuckDB, leveraging its vectorized engine for high performance without requiring data export or external systems.
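A rough sketch of the SQL/PGQ style the post contrasts with recursive CTEs, using the DuckPGQ community extension; the tables, property graph, and labels here are hypothetical, and the exact clause syntax should be checked against the DuckPGQ docs.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL duckpgq FROM community")
con.execute("LOAD duckpgq")

# Hypothetical ownership data: companies and who-owns-whom edges.
con.execute("CREATE TABLE companies(id BIGINT, name VARCHAR)")
con.execute("CREATE TABLE ownerships(owner_id BIGINT, owned_id BIGINT, pct DOUBLE)")
con.execute("INSERT INTO companies VALUES (1, 'A'), (2, 'B'), (3, 'C')")
con.execute("INSERT INTO ownerships VALUES (1, 2, 0.6), (2, 3, 0.7)")

con.execute("""
    CREATE PROPERTY GRAPH ownership_graph
    VERTEX TABLES (companies LABEL company)
    EDGE TABLES (
        ownerships SOURCE KEY (owner_id) REFERENCES companies (id)
                   DESTINATION KEY (owned_id) REFERENCES companies (id)
                   LABEL owns
    )
""")

# Visual pattern syntax instead of a recursive CTE: two-hop ownership chains.
print(con.sql("""
    FROM GRAPH_TABLE (ownership_graph
        MATCH (a:company)-[o1:owns]->(b:company)-[o2:owns]->(c:company)
        COLUMNS (a.name AS owner, c.name AS indirectly_owned)
    )
""").df())
```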
2025-10-19
The article describes an experiment where DuckDB was tested at a 10 TB scale. The author used a high-performance setup with 64 cores and 512 GB RAM to push DuckDB to its limits. They found that DuckDB can handle datasets well beyond available memory, but performance depends on efficient disk usage. The key takeaway is that spilling to local disk improves performance, and while DuckDB holds up well at this scale, distributed engines remain better suited to extremely large datasets.
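The "use local disk for spilling" takeaway maps to a pair of settings: cap the memory budget and point temp_directory at fast local storage. A sketch, where the paths, limits, and dataset are illustrative rather than the author's exact setup:

```python
import duckdb

con = duckdb.connect("bench.duckdb")

# Give DuckDB an explicit memory budget and a fast local scratch area so
# larger-than-memory operators (joins, sorts, aggregations) spill to local
# disk instead of failing or thrashing.
con.execute("SET memory_limit = '400GB'")
con.execute("SET temp_directory = '/mnt/nvme/duckdb_tmp'")
con.execute("SET threads = 64")

# A larger-than-memory aggregation over Parquet will now spill as needed.
con.execute("""
    CREATE OR REPLACE TABLE daily_totals AS
    SELECT l_shipdate, sum(l_extendedprice) AS revenue
    FROM read_parquet('/data/lineitem/*.parquet')
    GROUP BY l_shipdate
""")
```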
2025-10-18
The text provides a comprehensive guide on building an AI agent to interact with a MotherDuck database. It outlines the steps from setting up the connection, implementing the agent with query tools, validating queries, to testing and refining the system. The guide emphasizes schema exploration, query validation, and performance optimization while addressing common issues such as invalid SQL, wrong table usage, and query inefficiencies.
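The building blocks the guide relies on, connecting, exploring the schema, and validating SQL before running it, look roughly like this in Python; the database, schema, and table names are placeholders, and a MOTHERDUCK_TOKEN environment variable is assumed for authentication.

```python
import duckdb

# 'md:<database>' connects to MotherDuck; auth comes from MOTHERDUCK_TOKEN.
con = duckdb.connect("md:my_analytics_db")

# Schema exploration: give the agent the tables and columns it may use.
tables = con.execute("SELECT table_schema, table_name FROM information_schema.tables").fetchall()
columns = con.execute("DESCRIBE my_schema.orders").fetchall()

def validate_sql(sql: str) -> str | None:
    """Return an error message if the query fails to plan, else None."""
    try:
        con.execute(f"EXPLAIN {sql}")  # plans the query without executing it
        return None
    except duckdb.Error as exc:
        return str(exc)

print(validate_sql("SELECT count(*) FROM my_schema.orders"))
```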
2025-10-13
The article compares Exasol's distributed MPP architecture with DuckDB, highlighting Exasol's performance advantages even in single-node scenarios due to its parallel processing and efficient execution engine. It argues that Exasol's design allows full utilization of hardware resources, while DuckDB's single-node architecture limits its scalability. The discussion also covers differences in query optimization, memory usage, and concurrency capabilities, emphasizing Exasol's strengths in enterprise environments with high user loads.
2025-10-13
The article provides an overview of different streaming patterns and their applications, focusing on how DuckDB can be integrated into these patterns. It discusses the Materialized View Pattern, Streaming Engine Pattern, and Streaming Database Pattern, highlighting DuckDB's capabilities in handling high-throughput data and its potential in these architectures. The article also touches on the challenges and trade-offs of each approach, emphasizing the importance of choosing the right solution based on specific needs.
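A stripped-down sketch of the materialized-view-style pattern described above: append micro-batches of events into DuckDB and periodically rebuild the aggregate that downstream readers query. The event source, schema, and refresh cadence are all hypothetical.

```python
import random
import time
from datetime import datetime

import duckdb

con = duckdb.connect("events.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS events(ts TIMESTAMP, user_id BIGINT, amount DOUBLE)")

def fetch_from_stream():
    # Placeholder for a real consumer (Kafka, Kinesis, ...): emit fake rows.
    return [(datetime.now(), random.randint(1, 100), random.random() * 50) for _ in range(100)]

def refresh_view():
    # "Materialized view": recompute the aggregate readers actually query.
    con.execute("""
        CREATE OR REPLACE TABLE revenue_per_minute AS
        SELECT date_trunc('minute', ts) AS minute, sum(amount) AS revenue
        FROM events
        GROUP BY 1
    """)

for _ in range(3):  # toy loop; a real pipeline would run continuously or be event-driven
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", fetch_from_stream())
    refresh_view()
    time.sleep(1)

print(con.sql("SELECT * FROM revenue_per_minute ORDER BY minute").df())
```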
2025-10-09
The article discusses the evolution of DuckDB extensions, their availability across different versions, and the dynamic nature of the community-driven ecosystem. It highlights how extensions are added or removed with each release, influenced by factors like maintainer activity, compatibility, and community adoption, notes the benefits of using DuckDB extensions, and invites readers to subscribe for updates.
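To see which extensions a given DuckDB build knows about, and whether they are installed or loaded, the duckdb_extensions() table function covers most of what the post describes; installing from the community repository is a single statement (the extension name shown is just an example).

```python
import duckdb

con = duckdb.connect()

# List extensions known to this DuckDB version and their install/load state.
print(con.sql("""
    SELECT extension_name, installed, loaded, description
    FROM duckdb_extensions()
    ORDER BY extension_name
""").df())

# Community-maintained extensions are installed from the community repository:
# con.execute("INSTALL duckpgq FROM community")
# con.execute("LOAD duckpgq")
```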