Top recent ClickHouse news, how-tos and comparisons

ClickHouse gets lazier (and faster): Introducing lazy materialization
2025-04-22
This post explains how lazy materialization in ClickHouse optimizes I/O operations and improves query performance. It details the process of filtering data through primary indexing, PREWHERE clauses, and lazy reading of columns to minimize unnecessary data processing.
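The filtering pipeline described above can be sketched in a query. This is an illustrative example against a hypothetical `hits` table, not code from the post:

```sql
-- Hypothetical example: cheap filter columns are evaluated first; with lazy
-- materialization, heavy columns such as `Body` are only read for the rows
-- that survive the filters and the LIMIT.
SELECT EventTime, URL, Body
FROM hits
PREWHERE CounterID = 62         -- cheap column filtered before other columns are read
WHERE EventDate = '2025-04-22'  -- remaining filter
ORDER BY EventTime DESC
LIMIT 10;                       -- lazy reading defers `Body` until these 10 rows are known
```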
ClickHouse vs StarRocks vs Presto vs Trino vs Apache Spark: Comparing Analytics Engines
2025-04-17
The comparison provides a detailed analysis of five popular analytics engines: Apache Spark, Trino, PrestoDB, StarRocks, and ClickHouse. Each engine is evaluated based on features like query performance, storage formats, SQL support, community and commercial support, and use cases.
Lessons learned from 5 years operating huge ClickHouse clusters: Part II
2025-04-16
This text is a detailed guide on managing and monitoring ClickHouse, a column-oriented database management system. It covers setting up alerts, understanding system tables, managing materialized views, handling table deletions, and other best practices for keeping a cluster performant and healthy. The author emphasizes staying ahead of issues like memory leaks, segfaults, and high numbers of simultaneous queries, and suggests tools and resources for monitoring ClickHouse clusters, including Altinity's material on setup and management. Key points include:
- Setting up alerts for critical metrics (e.g., max simultaneous queries, connectivity issues)
- Understanding system tables like `query_log`, `processes`, and `part_log`
- Managing materialized views carefully to avoid memory issues
- Being cautious with column types that can cause performance problems
- Using Altinity's resources for setup and management guidance
The guide is aimed at experienced data engineers and database administrators responsible for large-scale ClickHouse clusters, with practical advice on avoiding common pitfalls.
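The system tables above are the usual starting point for this kind of monitoring. A minimal sketch (the table names are real ClickHouse system tables; the thresholds are arbitrary illustrations):

```sql
-- Long-running queries currently executing:
SELECT query_id, user, elapsed, query
FROM system.processes
ORDER BY elapsed DESC;

-- Recent queries that failed or used a lot of memory
-- (the 10 GB threshold is an arbitrary example):
SELECT event_time, query_duration_ms, memory_usage, exception
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
  AND (exception != '' OR memory_usage > 10e9)
ORDER BY event_time DESC;
```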
Announcing Ruby Gem analytics powered by ClickHouse and Ruby Central
2025-04-15
This document provides an overview of the Ruby Gem analytics dataset available at sql.clickhouse.com. It covers various queries and analyses related to gem downloads, including trends over time, by system, and by version.
Ursa: ClickHouse Research Fork
2025-04-15
The author is working on optimizing Ursa, an analytical database based on ClickHouse. They aim to make it the fastest general-purpose analytical database in the world by implementing various optimizations and improving statistics collection and executor performance. Key areas of focus include offline and runtime statistics, runtime indexes, and executor improvements to reduce CPU underutilization.
MySQL CDC connector for ClickPipes is now in Private Preview
2025-04-10
The article announces the private preview of the MySQL CDC (Change Data Capture) connector for ClickPipes. This connector allows users to replicate MySQL databases to ClickHouse Cloud with ease, supporting continuous replication and one-time migrations. Key features include blazing-fast initial loads, flexible replication modes, table-level filtering, and support for various MySQL data types. Users can sign up for the private preview through a provided link.
Why Denormalization is Not the Answer to Reducing Joins in ClickHouse
2025-04-09
This article discusses the pros and cons of denormalization in ClickHouse, comparing it with normalization. It emphasizes that while denormalization can improve query performance by reducing joins, it often leads to storage inefficiencies, data integrity issues, and complex schema maintenance. The article recommends leveraging ClickHouse's built-in features like materialized views, dictionaries, projections, and join optimizations for better performance and maintainability. Additionally, it suggests pre-joining or pre-aggregating data before loading as an alternative approach when necessary. The text concludes with a recommendation to use GlassFlow (a tool for cleaning Kafka streams) for maintaining clean data without additional load on ClickHouse.
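Of the built-in features mentioned, dictionaries are perhaps the simplest join replacement. A sketch with hypothetical table and dictionary names:

```sql
-- A dictionary caches a small dimension table in memory for fast lookups.
CREATE DICTIONARY country_dict
(
    country_id UInt32,
    name String
)
PRIMARY KEY country_id
SOURCE(CLICKHOUSE(TABLE 'countries'))
LAYOUT(FLAT())
LIFETIME(MIN 300 MAX 600);  -- refresh from the source table every 5-10 minutes

-- dictGet resolves the name at query time, avoiding a JOIN
-- and the storage cost of denormalizing `name` into `events`:
SELECT user_id, dictGet('country_dict', 'name', country_id) AS country
FROM events;
```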
Six Months with ClickHouse at CloudQuery (The Good, the Bad, and the Unexpected)
2025-04-08
The article discusses the experience of adopting ClickHouse at CloudQuery and how it has impacted their data engineering efforts. Herman Schaaf, Joe Karlsson, and Mariano Gappa provide insights into the benefits and challenges faced during this process.
Limits of ClickHouse ReplacingMergeTree and Materialized Views for Data Streams
2025-04-07
Apache Flink can be a solution to address the limitations of ClickHouse's ReplacingMergeTree and Materialized Views for handling large-scale data. Flink provides stream processing capabilities that can preprocess Kafka streams before they are ingested into ClickHouse, ensuring clean and consistent data without the need for ongoing maintenance.
From Kafka to ClickHouse: Understanding Integration Methods and Their Challenges
2025-04-04
This document provides a detailed comparison and analysis of different methods to integrate Apache Kafka with ClickHouse. It covers the strengths and weaknesses of standard solutions like the Kafka Engine, ClickPipes, and Kafka Connect, highlighting their suitability for various use cases. The text also discusses the challenges faced by organizations with specialized requirements that go beyond off-the-shelf solutions.
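The Kafka Engine path discussed above typically involves three objects: a queue table, a storage table, and a materialized view that pumps rows between them. A minimal sketch (broker address, topic, and schema are placeholders):

```sql
-- 1. Queue table: a consumer attached to a Kafka topic.
CREATE TABLE kafka_events_queue
(
    ts DateTime,
    user_id UInt64,
    action String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

-- 2. Storage table: where the data actually lives.
CREATE TABLE events
(
    ts DateTime,
    user_id UInt64,
    action String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- 3. The materialized view continuously moves rows from the queue into storage.
CREATE MATERIALIZED VIEW events_consumer TO events
AS SELECT ts, user_id, action FROM kafka_events_queue;
```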
ClickHouse Deduplication with ReplacingMergeTree: How It Works and Limitations
2025-04-02
ReplacingMergeTree in ClickHouse is effective for eventual deduplication but falls short when immediate consistency and historical data preservation are needed. For more comprehensive solutions, the article suggests GlassFlow for easy setup of deduplication pipelines.
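The "eventual" nature of this deduplication is visible in a minimal sketch (table and columns are hypothetical):

```sql
CREATE TABLE user_state
(
    user_id UInt64,
    updated_at DateTime,
    status String
)
ENGINE = ReplacingMergeTree(updated_at)  -- keep the row with the highest updated_at
ORDER BY user_id;                        -- rows with the same sorting key are duplicates

-- Duplicates remain until background merges run, at an unpredictable time.
-- FINAL forces deduplication at read time, at a query-performance cost:
SELECT * FROM user_state FINAL WHERE user_id = 42;
```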
Make Before Break: Kubernetes StatefulSet Semantics for ClickHouse Cloud
2025-04-02
This post details the challenges faced during the implementation of MultiSTS (Make Before Break) and Live Migrations in ClickHouse Cloud. The process involved extensive triaging and fixing issues, with significant improvements to system tables handling and replica management.
Lessons learned from operating large ClickHouse clusters in the last 6 years
2025-04-01
Ingestion is a critical aspect of operating ClickHouse, and many companies struggle with it due to issues like data loss, duplication, and performance problems. Key considerations include managing merges, inserts, reads, mutations, and table design. Proper batch sizes for inserts are recommended, as well as using compact parts and partitioning tables effectively. Memory management is crucial, especially when using materialized views, which can cause out-of-memory errors if not designed correctly. Other common issues include tables going into read-only mode, too many parts being generated, and handling sudden peaks in load. Implementing a backpressure mechanism to manage ingestion during peak times is essential for maintaining data integrity.
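One common server-side mitigation for small, frequent inserts (a frequent cause of the "too many parts" problem mentioned above) is asynchronous inserts. The settings below are real ClickHouse settings; the table and values are illustrative:

```sql
SET async_insert = 1;           -- buffer small INSERTs server-side into larger batches
SET wait_for_async_insert = 1;  -- acknowledge only after the buffer is flushed (safer)

INSERT INTO events (ts, user_id, action) VALUES (now(), 1, 'click');
```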
A year of Rust in ClickHouse
2025-04-01
Rust is going great in ClickHouse! The article discusses various challenges and solutions encountered during the integration of Rust with C++ in ClickHouse. These include library linking issues, symbol sizes, composability, build profiling, and dependency management. Despite these challenges, the author welcomes contributions from those interested in using Rust for ClickHouse development.
Introducing the query condition cache
2025-03-28
This article discusses the benefits of ClickHouse's query condition cache and provides examples of how it can improve performance for repeated queries with selective filters. The author explains that this feature boosts performance without requiring changes to your schema or manual index tuning.
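Assuming the session setting name from recent ClickHouse releases (`use_query_condition_cache`), a repeated selective filter might look like this; the table and filter are illustrative, not from the article:

```sql
SET use_query_condition_cache = 1;

-- The first run evaluates the WHERE condition and records, per data range,
-- whether any row matched; repeated runs skip ranges known to contain no matches:
SELECT count() FROM hits WHERE URL LIKE '%clickhouse%';
SELECT count() FROM hits WHERE URL LIKE '%clickhouse%';  -- served from the condition cache
```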