Top recent ClickHouse news, how-tos, and comparisons

ClickHouse: How we made our internal data warehouse AI-first
2025-11-12
The article describes ClickHouse's AI-first approach to its internal data warehouse, centered on DWAINE, an AI assistant that integrates with existing BI tools like Superset. It covers the challenges and benefits of moving from traditional BI to an AI-driven model, including faster answers, reduced load on the data team, and better accessibility for non-technical users. It also outlines the query types DWAINE handles well and the cases where traditional BI tools remain the better fit, arguing for a balanced mix of AI and traditional analytics.
Exasol Outperforms ClickHouse by 10x on TPC-H Analytical Benchmark
2025-11-10
The article presents a TPC-H benchmark comparing Exasol and ClickHouse on analytical workloads, reporting significant performance advantages for Exasol: 10.3x faster median query time, 207x faster on the most complex queries, and 50x better worst-case consistency. It attributes these results to Exasol's query optimization, join processing, and execution planning, positioning Exasol for enterprise data warehousing, business intelligence, and advanced analytics. The benchmark was run with TPC-H queries on identical AWS infrastructure, and the article describes the setup as reproducible. The conclusions stress Exasol's reliability, predictability, and cost-effectiveness.
Notes on ClickHouse Scaling
2025-11-08
The text provides an overview of managing ClickHouse clusters with replication, sharding, and storage strategies. It discusses replication for fault tolerance, sharding for horizontal scalability, and challenges with using S3 storage. The author also shares insights on architectural considerations for large-scale deployments and mentions related projects like Tinybird and ChistaDATA.
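A minimal sketch of the replication-plus-sharding pattern the notes cover, assuming a cluster named my_cluster defined in the server config and the clickhouse-connect Python client; the host, table names, and ZooKeeper path are placeholders, not from the article:

    import clickhouse_connect

    # Connect to any node in the cluster (host/credentials are placeholders).
    client = clickhouse_connect.get_client(host='localhost', username='default', password='')

    # Replicated local table: each shard keeps replica copies for fault tolerance.
    # The {shard} and {replica} macros are substituted from each server's config.
    client.command("""
        CREATE TABLE IF NOT EXISTS events_local ON CLUSTER my_cluster
        (
            event_time DateTime,
            user_id    UInt64,
            payload    String
        )
        ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
        ORDER BY (user_id, event_time)
    """)

    # Distributed table: fans reads and writes out across shards for horizontal scalability.
    client.command("""
        CREATE TABLE IF NOT EXISTS events ON CLUSTER my_cluster AS events_local
        ENGINE = Distributed(my_cluster, currentDatabase(), events_local, rand())
    """)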
Data modeling for OLAP with AI ft. Michael Klein, Director of Technology at District Cannabis
2025-10-30
This article provides a detailed guide on migrating data to OLAP systems, focusing on transforming transactional data into optimized analytical tables. It highlights key practices such as proper ordering, versioning, and handling pivoted data using tools like ClickHouse and Moose. The migration process involves structuring data with typed interfaces, implementing CDC for real-time updates, and creating materialized views for performance. The text emphasizes the importance of aligning data models with analytical queries and leveraging OLAP features for efficient data processing.
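A minimal sketch of the ordering-plus-versioning pattern described above, not the article's actual schema; table and column names are illustrative and assume the clickhouse-connect Python client:

    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')  # placeholder host

    # Versioned target table: ReplacingMergeTree keeps the row with the highest
    # _version per ORDER BY key, so CDC updates converge to the latest state.
    client.command("""
        CREATE TABLE IF NOT EXISTS orders
        (
            order_id   UInt64,
            status     LowCardinality(String),
            amount     Decimal(18, 2),
            updated_at DateTime,
            _version   UInt64
        )
        ENGINE = ReplacingMergeTree(_version)
        ORDER BY order_id
    """)

    # Materialized view precomputing a query-aligned aggregate at insert time.
    # Note: the view sees every insert, including rows later superseded by dedup,
    # which is one reason model design has to match the analytical queries.
    client.command("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_revenue_mv
        ENGINE = SummingMergeTree
        ORDER BY day
        AS SELECT toDate(updated_at) AS day, sum(amount) AS revenue
        FROM orders
        GROUP BY day
    """)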
How Netflix optimized its petabyte-scale logging system with ClickHouse
2025-10-23
The article discusses how Netflix has optimized its logging system using ClickHouse to handle massive data volumes efficiently. Key optimizations include rethinking data fingerprinting, custom serialization methods, and sharding tags for faster query performance. These improvements have enabled Netflix to maintain high performance and scalability, allowing engineers to work interactively with data at an unprecedented scale. The article highlights the importance of disciplined engineering and simplicity in achieving these results.
Performance optimizations for storing web server access logs in ClickHouse
2025-10-23
The article provides a detailed guide on achieving high compression ratios for log files using ClickHouse, a columnar database. It walks through the process of converting raw Nginx access logs into a structured format, optimizing data types, and leveraging columnar storage for efficient data compression. The article highlights the importance of choosing the right ordering keys to maximize compression, as demonstrated by the significant difference in compression ratios between using referer/user_agent as the primary key versus time_local. It concludes by emphasizing the benefits of columnar storage for log compression, including improved I/O efficiency, faster query performance, and reduced storage costs. The example of Nginx access logs shows that it's possible to achieve over 170x compression using these techniques.
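A sketch of the ordering-key and codec pattern the article describes, shown here with the low-cardinality columns first; the exact schema and codec choices are assumptions approximating typical Nginx log fields, not copied from the post:

    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')  # placeholder host

    # Typed columns plus an ordering key that groups similar values together:
    # sorting by repetitive low-cardinality columns lets the per-column codecs
    # see long runs of identical data, which is what drives high ratios.
    client.command("""
        CREATE TABLE IF NOT EXISTS access_logs
        (
            time_local      DateTime CODEC(Delta, ZSTD),
            remote_addr     IPv4,
            request         String,
            status          UInt16,
            body_bytes_sent UInt64,
            referer         LowCardinality(String),
            user_agent      LowCardinality(String)
        )
        ENGINE = MergeTree
        ORDER BY (referer, user_agent, time_local)
    """)

    # Inspect per-column compression to compare ordering-key choices.
    print(client.query(
        "SELECT name, formatReadableSize(data_compressed_bytes) AS compressed, "
        "formatReadableSize(data_uncompressed_bytes) AS uncompressed "
        "FROM system.columns WHERE table = 'access_logs'"
    ).result_rows)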
Data freshness (end-to-end latency) in ClickHouse and Apache Pinot
2025-10-22
The article compares Apache Pinot and ClickHouse as analytical databases, focusing on their different ingestion architectures. It characterizes ClickHouse's real-time ingestion as a micro-batch approach that trades off freshness and accuracy, while presenting Apache Pinot's stream-native design as prioritizing real-time analytics with millisecond-level freshness, exactly-once semantics, and scalable upserts. The comparison also covers operational scalability, multi-tenancy, and how to choose between the two for real-time data processing.
How GitLab uses Postgres and ClickHouse to build their data stack?
2025-10-21
The article discusses GitLab's strategic shift to adopt ClickHouse as the foundation for its analytics platform, emphasizing scalability, flexibility, and real-time insights. It outlines the journey from addressing performance bottlenecks to building an event-driven, analytics-first architecture, highlighting the benefits of ClickHouse, operational considerations, and future directions for enhanced analytics capabilities.
Code first CDC from Postgres to ClickHouse with Debezium, Redpanda, and MooseStack
2025-10-17
The text provides a comprehensive guide on setting up a real-time OLAP system using Debezium, Redpanda, and ClickHouse. It outlines the process from capturing database changes, transforming data, and syncing it to ClickHouse for analytical queries. The solution emphasizes code-driven schema management and automated data pipelines to ensure data consistency and reduce manual effort.
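A hedged sketch of the capture step: registering a Debezium Postgres source connector with Kafka Connect's REST API so changes stream into Redpanda topics. Hostnames, credentials, and table/topic names are placeholders, and the config assumes Debezium 2.x naming (topic.prefix):

    import json
    import requests

    # Debezium Postgres source connector: streams WAL changes into
    # Kafka-compatible topics (Redpanda here), one topic per captured table.
    connector = {
        "name": "pg-orders-cdc",
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "postgres",      # placeholder
            "database.port": "5432",
            "database.user": "replicator",        # placeholder
            "database.password": "secret",        # placeholder
            "database.dbname": "app",             # placeholder
            "table.include.list": "public.orders",
            "topic.prefix": "cdc",                # topics become cdc.public.orders
        },
    }

    # Kafka Connect REST API (address is a placeholder).
    resp = requests.post(
        "http://localhost:8083/connectors",
        data=json.dumps(connector),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

From there, a consumer (the MooseStack sync in the article's setup) reads the topic, applies the transformations, and writes batches into ClickHouse.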
ClickHouse table engines & CDC data (MergeTree, Replacing, Collapsing +)
2025-10-15
This article discusses the use of different ClickHouse table engines for Change Data Capture (CDC) in OLAP systems. It compares ReplacingMergeTree, CollapsingMergeTree, and VersionedCollapsingMergeTree, explaining their pros and cons for handling data updates, deletions, and ensuring correctness. The recommendations suggest using ReplacingMergeTree for most CDC use cases due to its balance of speed and correctness, while VersionedCollapsingMergeTree is recommended for strict correctness in complex scenarios. The article also touches on the importance of data pipelines and operational complexity in implementing these engines.
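Minimal DDL contrasting the two engines the article recommends, assuming the clickhouse-connect Python client; table and column names are illustrative:

    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')  # placeholder host

    # ReplacingMergeTree: last write wins per key at merge time. Fast and simple
    # for most CDC, but reads need FINAL (or dedup-aware queries) until merges
    # catch up.
    client.command("""
        CREATE TABLE IF NOT EXISTS users_replacing
        (
            user_id  UInt64,
            email    String,
            _version UInt64
        )
        ENGINE = ReplacingMergeTree(_version)
        ORDER BY user_id
    """)

    # VersionedCollapsingMergeTree: pairs of rows (sign = -1 for the old state,
    # sign = +1 for the new) cancel out even when rows arrive out of order,
    # giving stricter correctness at the cost of a more involved pipeline.
    client.command("""
        CREATE TABLE IF NOT EXISTS users_collapsing
        (
            user_id  UInt64,
            email    String,
            sign     Int8,
            _version UInt64
        )
        ENGINE = VersionedCollapsingMergeTree(sign, _version)
        ORDER BY user_id
    """)

    # Reading the deduplicated state from the ReplacingMergeTree variant:
    rows = client.query("SELECT * FROM users_replacing FINAL").result_rows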
Optimizing writes to OLAP using buffers (ClickHouse, Redpanda, MooseStack)
2025-10-14
This article provides a comprehensive guide on implementing micro-batching for OLAP databases, focusing on ClickHouse, Druid, Pinot, Snowflake, BigQuery, Redshift, and Delta/Lakehouse. It outlines best practices for setting up streaming buffers, using tools like MooseStack, and optimizing data ingestion pipelines. The guide includes code examples in TypeScript and Python, and emphasizes the importance of batch sizes, partitioning, and observability for efficient data processing and system scalability.
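A minimal client-side sketch of the micro-batching idea for the ClickHouse case, assuming the clickhouse-connect Python client; the thresholds and table/column names are illustrative, not the guide's values:

    import time
    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')  # placeholder host

    class BufferedWriter:
        """Accumulate rows and flush one batch insert when either the size
        or the age threshold is hit (both thresholds are illustrative)."""

        def __init__(self, table, columns, max_rows=10_000, max_age_s=5.0):
            self.table, self.columns = table, columns
            self.max_rows, self.max_age_s = max_rows, max_age_s
            self.rows, self.first_row_at = [], None

        def add(self, row):
            if not self.rows:
                self.first_row_at = time.monotonic()
            self.rows.append(row)
            if (len(self.rows) >= self.max_rows
                    or time.monotonic() - self.first_row_at >= self.max_age_s):
                self.flush()

        def flush(self):
            if self.rows:
                # One large insert creates one part; thousands of tiny inserts
                # would create thousands of parts and overwhelm merges.
                client.insert(self.table, self.rows, column_names=self.columns)
                self.rows, self.first_row_at = [], None

    writer = BufferedWriter('events', ['event_time', 'user_id', 'payload'])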
Making complex JSON 58x faster, use 3,300x less memory, in ClickHouse
2025-10-09
A new blog post from ClickHouse highlights significant improvements in JSON data handling with the release of v25.8. The new advanced shared data serialization allows efficient querying of JSON documents with tens of thousands of unique paths while maintaining performance for selective reads. This enhancement strengthens ClickHouse's position for handling large-scale semi-structured data.
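A small sketch of the JSON column type the post builds on, assuming a recent ClickHouse release and the clickhouse-connect Python client; the table, document shape, and path names are made up for illustration:

    import clickhouse_connect

    client = clickhouse_connect.get_client(host='localhost')  # placeholder host

    # JSON column: paths are stored as columnar subcolumns internally, so
    # selective reads touch only the paths a query references.
    client.command("""
        CREATE TABLE IF NOT EXISTS api_logs
        (
            ts  DateTime,
            doc JSON
        )
        ENGINE = MergeTree
        ORDER BY ts
    """)

    client.command("""
        INSERT INTO api_logs VALUES (now(), '{"request": {"path": "/v1/users", "ms": 42}}')
    """)

    # Dot syntax reads a single path without parsing whole documents.
    print(client.query("SELECT doc.request.path FROM api_logs").result_rows)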
ClickHouse Extends Series C Financing and Expands Leadership Team to Fuel Growth
2025-10-07
ClickHouse has extended its Series C financing and added new investors, including major firms and notable individuals. The company has grown significantly, with over 2,000 customers and more than quadrupled its annual recurring revenue. It has also expanded its leadership team with key hires and launched new features to enhance its platform for real-time analytics, data warehousing, and AI applications. The company received recognition for its growth, including a spot on the Forbes Cloud 100 list.
Scaling request logging with ClickHouse, Kafka, and Vector
2025-10-07
This article provides a comprehensive guide on migrating from MariaDB to ClickHouse for analytics and append-only data tracking. It outlines the challenges faced during the transition, such as the need for batch inserts and the limitations of row-level operations in ClickHouse. The solution involves using Kafka for durable storage, Vector for efficient data processing, and ClickHouse Cloud for scalable and managed infrastructure. The migration strategy emphasizes dual-writing, feature flags, and gradual cutover to ensure zero downtime and data validation. Key takeaways include the importance of batch processing, the value of seeking expert advice, and the benefits of column-oriented databases for high-volume analytics.
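A hedged sketch of the dual-write step using the kafka-python client; the broker address, topic, flag handling, and table schema are placeholders rather than the article's actual code:

    import json
    from kafka import KafkaProducer  # kafka-python

    # Producer pointed at the Kafka cluster that durably buffers events
    # on their way to ClickHouse (broker address and topic are placeholders).
    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )

    DUAL_WRITE_ENABLED = True  # feature flag; in practice read from a flag service

    def log_request(event, mariadb_conn):
        # Existing path: keep writing to MariaDB until the cutover is validated.
        with mariadb_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO request_log (ts, route, status) VALUES (%s, %s, %s)",
                (event['ts'], event['route'], event['status']),
            )
        # New path: ship the same event to Kafka, where Vector batches it
        # into ClickHouse-friendly bulk inserts.
        if DUAL_WRITE_ENABLED:
            producer.send('request-logs', event)

Comparing row counts and aggregates between the two stores during the dual-write window is what makes the gradual, zero-downtime cutover safe.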