Top recent ClickHouse news, how-tos and comparisons
2026-05-19
Tinybird improved its cloud storage efficiency by cleaning up petabytes of orphaned S3 objects, reducing monthly storage costs by about 45%. The cleanup process involved identifying unused metadata and ensuring that only truly unused data was deleted. They also strengthened their recovery procedures after nearly losing real data due to incomplete snapshots during the cleanup process.
2026-05-18
ChatFeatured is a startup that helps brands improve how they appear in AI search engines by influencing the content shared about them. Originally running ClickHouse for analytics, ChatFeatured needed to integrate it with Postgres for transactional workloads without managing two systems separately. By migrating to ClickHouse managed Postgres, the platform's analytics query times dropped from 2.5 minutes to under 1 second, improving user experience and enabling new features.
2026-05-14
At Cloudflare, a migration to a more granular partitioning scheme in ClickHouse led to unexpected performance degradation in billing aggregation jobs. Although initial checks showed normal I/O and memory usage, analysis revealed a hidden bottleneck: excessive lock contention and vector copying during query planning, caused by tens of thousands of parts per namespace. Three optimizations resolved the issue: acquiring a shared lock for read-only operations, deferring vector copying by creating a shared copy of relevant parts, and implementing binary search for faster part filtering based on sorted partition keys. These changes stabilized query performance and reduced durations significantly, though long-term architectural considerations remain.
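The third optimization, binary search over sorted partition keys, can be illustrated with a minimal Python sketch. This is not Cloudflare's or ClickHouse's code: the `PartIndex` class, the string partition keys, and the part names are hypothetical, and the sketch only shows why a sorted key list lets a planner find matching parts without scanning all of them.

```python
from bisect import bisect_left, bisect_right


class PartIndex:
    """Hypothetical index of data parts, kept sorted by partition key."""

    def __init__(self, parts):
        # parts: iterable of (partition_key, part_name) tuples.
        self._parts = sorted(parts)
        # Keys are extracted once so every lookup is O(log n), not O(n).
        self._keys = [key for key, _ in self._parts]

    def select(self, lo, hi):
        """Return names of parts whose partition key lies in [lo, hi]."""
        start = bisect_left(self._keys, lo)
        end = bisect_right(self._keys, hi)
        return [name for _, name in self._parts[start:end]]


index = PartIndex([
    ("2026-05-03", "p3"),
    ("2026-05-01", "p1"),
    ("2026-05-02", "p2a"),
    ("2026-05-02", "p2b"),
])
selected = index.select("2026-05-02", "2026-05-02")
print(selected)
```

With tens of thousands of parts, the two bisect calls touch only a handful of keys, whereas a linear filter would compare every part against the predicate.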
2026-05-12
A customer needed to perform semantic search over 20 million embeddings in Tinybird but initially faced timeouts due to fragmented vector similarity indexes across multiple partitions and insufficient memory caching. By consolidating data into a single global index, increasing the vector_similarity_index_cache_size to fit the entire graph in RAM, and maintaining all data within one partition, response times improved dramatically from 15-48 seconds to under 200 milliseconds per query, regardless of the number of top-K results requested. This demonstrates that Tinybird can effectively handle large-scale semantic similarity searches without requiring a separate dedicated vector database.
2026-05-12
ClickHouse Cloud enhances the Join table engine by implementing it as a SharedJoin table backed by MergeTree or ReplacingMergeTree tables, solving open-source drawbacks like non-distribution and lack of frequent updates. This setup enables efficient ANY LEFT joins for dimensional modeling, automatically handling deduplication and compaction, making it ideal for frequently updated dimension data in analytical workflows.
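As a rough illustration of ANY LEFT join semantics against a dimension table, here is a minimal Python sketch. It is not how SharedJoin is implemented: the `any_left_join` function and the sample rows are hypothetical, and unmatched rows simply keep their own columns, whereas a SQL engine would fill the missing columns with defaults or NULLs.

```python
def any_left_join(left_rows, dimension, key):
    """Attach at most one dimension row to each left row.

    LEFT: every left row is kept, matched or not.
    ANY: at most one match per key, which a dict lookup gives for free.
    """
    joined = []
    for row in left_rows:
        match = dimension.get(row[key], {})
        joined.append({**row, **match})
    return joined


# Hypothetical fact rows and a dimension keyed by user_id.
facts = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 5}]
users = {1: {"country": "DE"}}

joined = any_left_join(facts, users, "user_id")
print(joined)
```

Because the dimension side behaves like a key-value lookup, each fact row costs one hash probe, which is why this join shape suits dimensional modeling.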
2026-05-07
Postgres Query Insights in ClickHouse Cloud is a new feature that helps diagnose slow-performing queries by providing detailed diagnostic information for every query pattern run by the database. It integrates with pg_stat_ch, an open-source extension that streams per-statement telemetry into ClickHouse, allowing users to see why specific queries are slow across various metrics like total runtime, CPU usage, error counts, and P95 latency. Users can access three surfaces: an overview for quick health checks, a patterns table sorted by impact to identify the most costly queries, and detailed information in a flyout that breaks down each query's performance using percentiles, cache hits, buffer spills, and more. This tool aids in quickly pinpointing issues like sorting spilling to disk or insufficient work_mem, enabling users to apply fixes such as adding indexes or adjusting configuration settings.
2026-05-05
Sprig, an AI-powered product research platform, experienced rapid growth that overwhelmed its initial PostgreSQL setup. After experimenting with ClickHouse and Redis configurations, the company migrated to ScyllaDB Cloud for improved performance and scalability. The move cut read latencies by a factor of four and lowered operational overhead, allowing Sprig to handle increasing data volumes efficiently.
2026-04-30
ClickHouse, a popular columnar database, has significantly evolved its UPDATE capabilities since 2018. As of April 2026, it fully supports standard SQL UPDATE statements via the patch part architecture introduced in ClickHouse 25.7 (July 2025). This update mechanism, alongside lightweight DELETE and on-the-fly mutation features, ensures low-latency, production-grade updates with immediate visibility across distributed clusters. The append-only claim from earlier years is outdated; current versions offer multiple efficient UPDATE pathways: classical ALTER TABLE mutations for bulk operations, lightweight mechanisms for single-row changes, and ReplacingMergeTree for high-volume CDC workflows. Post-landing stabilization work includes operational controls like exponential backoff, workload classification, and extensive observability features to diagnose and manage updates reliably. Understanding these enhancements helps teams decide if ClickHouse meets their update requirements today.
2026-04-20
ClickHouse has fully supported a native JSON data type since version 25.3, answering the earlier criticism that it couldn't handle JSON data. This support is the result of a series of significant pull requests and changes over multiple years, culminating in robust columnar storage and fast query performance. The native JSON type allows each path in JSON objects to be stored as a separate Dynamic-typed subcolumn. Features include primary key indexing for these paths, selective path reads without scanning the entire dataset, and advanced shared data serialization that makes read operations up to 58 times faster. This evolution has made ClickHouse a competitive option for analytics workloads, with performance that matches, if not surpasses, major alternatives like MongoDB and Elasticsearch.
2026-04-15
The article discusses how ClickHouse's performance in handling JOIN operations has improved significantly between 2022 and early 2026. It details the shift from one join algorithm to several advanced join algorithms, including full sorting merge join, grace hash join, direct join, and concurrency enhancements like ConcurrentHashJoin. The introduction of cost-based global join reordering, runtime bloom filters, and predictive statistics collection have all contributed to a more efficient and scalable JOIN process in ClickHouse by 2026.
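Of the algorithms listed, the grace hash join is the easiest to sketch: both inputs are partitioned by a hash of the join key, and each pair of partitions is then joined independently with a small in-memory hash table. The Python below is a minimal illustration of that general idea for an inner join, not ClickHouse's implementation; the function name, the bucket count, and the sample rows are assumptions, and a real engine would spill partitions to disk rather than hold them in lists.

```python
from collections import defaultdict


def grace_hash_join(left, right, key, num_buckets=4):
    """Inner join of two lists of dicts using hash partitioning."""
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)

    # Phase 1: route rows from both sides into buckets by hash of the key,
    # so rows with equal keys always land in the same bucket number.
    for row in left:
        left_parts[hash(row[key]) % num_buckets].append(row)
    for row in right:
        right_parts[hash(row[key]) % num_buckets].append(row)

    # Phase 2: join bucket by bucket; only one bucket's hash table
    # needs to be resident at a time.
    out = []
    for bucket in range(num_buckets):
        table = defaultdict(list)
        for r in right_parts.get(bucket, []):
            table[r[key]].append(r)
        for l in left_parts.get(bucket, []):
            for r in table.get(l[key], []):
                out.append({**l, **r})
    return out


orders = [{"cid": 1, "total": 30}, {"cid": 2, "total": 12}, {"cid": 9, "total": 7}]
customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Lin"}]

result = sorted(grace_hash_join(orders, customers, "cid"), key=lambda r: r["cid"])
print(result)
```

The output order depends on bucket assignment, so the example sorts the result before printing.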
2026-04-01
Alexey Milovidov discusses the practical use of coding agents at ClickHouse, emphasizing their effectiveness for specific scenarios despite skepticism. Assumptions about AI's limitations are presented, and the evolution from simplistic tasks to complex backend code development with Claude Opus 4.5 is highlighted. Various use cases, from typing code to resolving merge conflicts, are detailed, noting increased productivity through automated tests and investigations. Recommendations include careful validation of changes, incremental trust in agents, and maintaining flexibility across model providers. The post concludes by encouraging broader AI adoption within ClickHouse while acknowledging concerns such as cost and quality impacts.
2026-03-26
This article provides ten best practices for optimizing the performance of ClickHouse, a columnar database management system ideal for real-time analytics on large datasets. Key strategies include choosing an effective primary key and ORDER BY clause to enhance data sorting and compression; selecting efficient data types to minimize storage and maximize speed; avoiding over-partitioning, which can degrade performance; utilizing skipping indexes like minmax and set for targeted query acceleration without excessive overhead; adopting the JSON data type judiciously for semi-structured data while considering storage implications; employing appropriate ingestion methods such as bulk loading from object storage, CDC via ClickPipes, or direct application writes with async inserts to manage write latency; leveraging materialized views and projections to pre-compute common aggregations and reduce read load; mastering system tables like `system.query_log` for comprehensive query diagnostics and `system.parts` for partition health checks; optimizing ReplacingMergeTree handling of duplicates via FINAL or argMax patterns based on merge state; and tuning joins through strategic use, denormalization, dictionaries, or pre-aggregated views to balance latency and result accuracy. The practices collectively aim to align data schema and queries with ClickHouse's strengths, achieving order-of-magnitude improvements in efficiency and speed.
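The argMax pattern for ReplacingMergeTree duplicates amounts to keeping, for each key, the row with the highest version at read time, regardless of whether background merges have collapsed the duplicates yet. A minimal Python sketch of that logic follows; the `latest_per_key` function and the column names `id`, `v`, and `status` are hypothetical, and sending ties to the later row is a choice made for this sketch.

```python
def latest_per_key(rows, key, version):
    """Keep one row per key: the one with the highest version.

    On equal versions the later row in input order wins.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[version] >= latest[k][version]:
            latest[k] = row
    return list(latest.values())


# Unmerged state: key 1 appears twice with different versions.
rows = [
    {"id": 1, "v": 1, "status": "new"},
    {"id": 2, "v": 1, "status": "new"},
    {"id": 1, "v": 2, "status": "paid"},
]

deduped = latest_per_key(rows, "id", "v")
print(deduped)
```

The same result is what a query-time deduplication is meant to return while duplicates are still present in unmerged parts.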
2026-03-21
QuestDB and ClickHouse are two fast analytical databases with different strengths. QuestDB excels in high-throughput streaming ingestion and low-latency queries, making it suitable for time-series data in areas like capital markets and trading. It is lightweight, open-source (Apache 2.0), written in zero-GC Java and C++, and supports various ingestion protocols. ClickHouse, which started as an OLAP engine for e-commerce and ad tech analytics, handles batch loading well but isn't natively optimized for streaming ingestion or time-series analysis. Performance benchmarks show QuestDB is faster on ingestion (4-5x) and on point queries in the TSBS benchmark suite. However, ClickHouse may perform better on certain analytical queries depending on data setup and use case. The choice depends on whether speed in streaming and time-series workloads (QuestDB) or broader analytics across semi-structured data (ClickHouse) is prioritized.
2026-03-18
Knock, a company focused on observability, adopted ClickHouse to efficiently store high-cardinality BEAM metrics from Erlang/Elixir applications. By leveraging ClickHouse's capabilities for handling large volumes of detailed telemetry data, Knock can monitor and analyze runtime performance at an unprecedented granularity without incurring prohibitive costs associated with other monitoring platforms like Datadog. The article outlines the implementation process, including table design, periodic data dumping using `telemetry_poller`, and example queries that demonstrate the utility of this approach for tracking top-consuming processes and identifying memory hogs across different pods.
2026-03-13
Hookdeck improved its payload search in the ClickHouse database by implementing two techniques: bucketed storage and variable-window iterative scanning. Hashing values into 50 string buckets helped reduce data scanned per row, and iterating through time windows starting from recent data further improved performance. This resulted in a significant reduction in average search latency to around 400ms and p99 latency under 1.4 seconds, making searches faster and more reliable.
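The two techniques can be sketched in Python under some loud assumptions. The summary gives no implementation detail, so everything here is hypothetical: `bucket_of` assigns a value to one of 50 buckets with CRC32, and `iterative_search` scans time windows from newest to oldest, widening each window by a fixed factor until enough matches are found. The window sizes, growth factor, and in-memory event list are illustrative only; in a database each window would be a separate time-bounded query.

```python
import zlib

NUM_BUCKETS = 50  # bucket count taken from the summary above


def bucket_of(value):
    """Map a string to a stable bucket number in [0, NUM_BUCKETS)."""
    return zlib.crc32(value.encode("utf-8")) % NUM_BUCKETS


def iterative_search(events, predicate, now, limit,
                     first_window=60, growth=4, max_lookback=86400):
    """Scan (lower, upper] time windows, newest first, until `limit` hits.

    events: list of (timestamp, payload) tuples.
    """
    results = []
    upper = now
    window = first_window
    oldest = now - max_lookback
    while upper > oldest and len(results) < limit:
        lower = max(upper - window, oldest)
        hits = [e for e in events
                if lower < e[0] <= upper and predicate(e[1])]
        hits.sort(key=lambda e: e[0], reverse=True)
        results.extend(hits)
        upper = lower      # the next window ends where this one began
        window *= growth   # and covers a longer span
    return results[:limit]


events = [(1000, "a"), (990, "b match"), (500, "c match"), (100, "d match")]
found = iterative_search(events, lambda p: "match" in p, now=1000, limit=2)
print(found)
```

Starting with a short recent window means searches that match recent data finish after touching very little of the table, and only rare matches pay for the wider scans.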