Top recent ClickHouse news, how-tos and comparisons

Hunting orphan objects: 45% off our ClickHouse storage bill (and a near data-loss incident)
2026-05-19
Tinybird improved its cloud storage efficiency by cleaning up petabytes of orphaned S3 objects, reducing monthly storage costs by about 45%. The cleanup process involved identifying unused metadata and ensuring that only truly unused data was deleted. They also strengthened their recovery procedures after nearly losing real data due to incomplete snapshots during the cleanup process.
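The core of such a cleanup can be pictured as a set difference with a safety margin. The sketch below is an illustration only, not Tinybird's actual tooling: the function name, the input shapes, and the seven-day grace period are all assumptions. The grace period stands in for the "only truly unused data" check, so that objects written after the metadata snapshot was taken are never treated as orphans.

```python
from datetime import datetime, timedelta, timezone


def find_orphans(stored, referenced, now, min_age=timedelta(days=7)):
    """Return keys that exist in storage but are not referenced by metadata.

    stored:     dict mapping object key -> last-modified datetime (hypothetical shape)
    referenced: set of object keys that live metadata still points at
    min_age:    assumed grace period; newer objects are skipped because the
                metadata snapshot may simply not include them yet
    """
    cutoff = now - min_age
    return sorted(
        key
        for key, modified in stored.items()
        if key not in referenced and modified <= cutoff
    )
```

A conservative cleanup would treat the returned list as candidates for review or soft deletion rather than deleting them immediately.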
How ChatFeatured migrated from PlanetScale Postgres to Postgres Managed by ClickHouse to power AI brand discovery
2026-05-18
ChatFeatured is a startup that helps brands improve how they appear in AI search engines by influencing the content shared about them. Originally running ClickHouse for analytics, ChatFeatured needed to integrate it with Postgres for transactional workloads without managing two systems separately. After the migration to Postgres Managed by ClickHouse, the platform's analytics query times dropped from 2.5 minutes to under 1 second, improving user experience and enabling new features.
Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse
2026-05-14
At Cloudflare, a migration to a more granular partitioning scheme in ClickHouse led to unexpected performance degradation in billing aggregation jobs. Although initial checks showed normal I/O and memory usage, analysis revealed a hidden bottleneck: excessive lock contention and vector copying during query planning, caused by tens of thousands of parts per namespace. Three optimizations resolved the issue: acquiring a shared lock for read-only operations, deferring vector copying by creating a shared copy of relevant parts, and implementing binary search for faster part filtering based on sorted partition keys. These changes stabilized query performance and reduced durations significantly, though long-term architectural considerations remain.
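The third optimization can be illustrated with a small sketch. This is not ClickHouse's implementation (which is C++); it only shows why sorted partition keys allow a range of parts to be located with two binary searches instead of a linear scan. The function name and the parallel-list layout are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def parts_in_range(sorted_keys, part_names, lo, hi):
    """Return the parts whose partition key lies in [lo, hi].

    sorted_keys: partition keys in ascending order (duplicates allowed)
    part_names:  part identifiers, parallel to sorted_keys
    Both lists are hypothetical stand-ins for a table's part list.
    """
    start = bisect_left(sorted_keys, lo)   # first index with key >= lo
    end = bisect_right(sorted_keys, hi)    # first index with key > hi
    return part_names[start:end]
```

With tens of thousands of parts, the two lookups cost O(log n) comparisons, and only the matching slice is copied.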
From 48 Seconds to 130 Milliseconds: Vector Search in Tinybird
2026-05-12
A customer needed to perform semantic search over 20 million embeddings in Tinybird but initially faced timeouts due to fragmented vector similarity indexes across multiple partitions and insufficient memory caching. By consolidating data into a single global index, increasing the vector_similarity_index_cache_size to fit the entire graph in RAM, and maintaining all data within one partition, response times improved dramatically from 15-48 seconds to under 200 milliseconds per query, regardless of the number of top-K results requested. This demonstrates that Tinybird can effectively handle large-scale semantic similarity searches without requiring a separate dedicated vector database.
ClickHouse Cloud: Fast, Updatable Lookups with the Join Table Engine
2026-05-12
ClickHouse Cloud enhances the Join table engine by implementing it as a SharedJoin table backed by MergeTree or ReplacingMergeTree tables, solving open-source drawbacks like non-distribution and lack of frequent updates. This setup enables efficient ANY LEFT joins for dimensional modeling, automatically handling deduplication and compaction, making it ideal for frequently updated dimension data in analytical workflows.
Introducing Postgres Query Insights in ClickHouse Cloud
2026-05-07
Postgres Query Insights in ClickHouse Cloud is a new feature that helps diagnose slow-performing queries by providing detailed diagnostic information for every query pattern run by the database. It integrates with pg_stat_ch, an open-source extension that streams per-statement telemetry into ClickHouse, allowing users to see why specific queries are slow across various metrics like total runtime, CPU usage, error counts, and P95 latency. Users can access three surfaces: an overview for quick health checks, a patterns table sorted by impact to identify the most costly queries, and detailed information in a flyout that breaks down each query's performance using percentiles, cache hits, buffer spills, and more. This tool aids in quickly pinpointing issues like sorting spilling to disk or insufficient work_mem, enabling users to apply fixes such as adding indexes or adjusting configuration settings.
ScyllaDB cut Sprig's read latency 4X after Redis and ClickHouse hit a wall
2026-05-05
Sprig, an AI-powered product research platform, experienced rapid growth that overwhelmed its initial PostgreSQL setup. After experimenting with ClickHouse and Redis configurations, the company migrated to ScyllaDB Cloud for improved performance and scalability. The move cut read latencies fourfold and lowered operational overhead, allowing Sprig to handle increasing data volumes efficiently.
Does ClickHouse Support UPDATEs? A 2026 PR-by-PR Analysis
2026-04-30
ClickHouse, a popular columnar database, has significantly evolved its UPDATE capabilities since 2018. As of April 2026, it fully supports standard SQL UPDATE statements via the patch part architecture introduced in ClickHouse 25.7 (July 2025). This update mechanism, alongside lightweight DELETE and on-the-fly mutation features, ensures low-latency, production-grade updates with immediate visibility across distributed clusters. The append-only claim from earlier years is outdated; current versions offer multiple efficient UPDATE pathways: classical ALTER TABLE mutations for bulk operations, lightweight mechanisms for single-row changes, and ReplacingMergeTree for high-volume CDC workflows. Post-landing stabilization work includes operational controls like exponential backoff, workload classification, and extensive observability features to diagnose and manage updates reliably. Understanding these enhancements helps teams decide if ClickHouse meets their update requirements today.
ClickHouse Native JSON Support in 2026: A PR-by-PR Analysis
2026-04-20
ClickHouse has fully supported a native JSON data type since version 25.3, answering the earlier criticism that it couldn't handle JSON data. This support is the result of a series of significant pull requests and changes over multiple years, culminating in robust columnar storage and fast query performance. The native JSON type allows each path in JSON objects to be stored as a separate Dynamic-typed subcolumn. Features include primary key indexing for these paths, selective path reads without scanning the entire dataset, and advanced shared data serialization that makes read operations up to 58 times faster. This evolution has made ClickHouse a competitive option for analytics workloads, with performance that matches or surpasses major alternatives like MongoDB and Elasticsearch.
Are ClickHouse JOINs Slow? A 2026 PR-by-PR Analysis
2026-04-15
The article discusses how ClickHouse's performance in handling JOIN operations has improved significantly between 2022 and early 2026. It details the shift from one join algorithm to several advanced join algorithms, including full sorting merge join, grace hash join, direct join, and concurrency enhancements like ConcurrentHashJoin. The introduction of cost-based global join reordering, runtime bloom filters, and predictive statistics collection have all contributed to a more efficient and scalable JOIN process in ClickHouse by 2026.
Agentic Coding at ClickHouse
2026-04-01
Alexey Milovidov discusses the practical use of coding agents at ClickHouse, emphasizing their effectiveness for specific scenarios despite skepticism. Assumptions about AI's limitations are presented, and the evolution from simplistic tasks to complex backend code development with Claude Opus 4.5 is highlighted. Various use cases, from typing code to resolving merge conflicts, are detailed, noting increased productivity through automated tests and investigations. Recommendations include careful validation of changes, incremental trust in agents, and maintaining flexibility across model providers. The post concludes by encouraging broader AI adoption within ClickHouse while acknowledging concerns such as cost and quality impacts.
Top 10 best practices tips for ClickHouse
2026-03-26
This article provides ten best practices for optimizing the performance of ClickHouse, a columnar database management system suited to real-time analytics on large datasets. Key strategies include choosing an effective primary key and ORDER BY clause to enhance data sorting and compression; selecting efficient data types to minimize storage and maximize speed; avoiding over-partitioning, which can degrade performance; utilizing skipping indexes like minmax and set for targeted query acceleration without excessive overhead; adopting the JSON data type judiciously for semi-structured data while considering storage implications; employing appropriate ingestion methods such as bulk loading from object storage, CDC via ClickPipes, or direct application writes with async inserts to manage write latency; leveraging materialized views and projections to pre-compute common aggregations and reduce read load; mastering system tables like `system.query_log` for comprehensive query diagnostics and `system.parts` for partition health checks; optimizing ReplacingMergeTree handling of duplicates via FINAL or argMax patterns based on merge state; and tuning joins through strategic use, denormalization, dictionaries, or pre-aggregated views to balance latency and result accuracy. The practices collectively aim to align data schema and queries with ClickHouse's strengths, achieving order-of-magnitude improvements in efficiency and speed.
Benchmark and comparison: QuestDB vs. ClickHouse
2026-03-21
QuestDB and ClickHouse are two fast analytical databases with different strengths. QuestDB excels in high-throughput streaming ingestion and low-latency queries, suitable for time-series data like capital markets and trading. It is lightweight, open-source (Apache 2.0), written in zero-GC Java and C++, and supports various ingestion protocols. ClickHouse, which started as an OLAP engine for e-commerce and ad tech analytics, handles batch loading well but isn't natively optimized for streaming ingestion or time-series analysis. Performance benchmarks show QuestDB is faster on ingestion (4-5x) and on point queries in the TSBS benchmark suite. However, ClickHouse may perform better on certain analytical queries depending on data setup and use case. The choice depends on whether speed in streaming and time-series workloads (QuestDB) or broader analytics across semi-structured data (ClickHouse) is prioritized.
Beam Metrics in ClickHouse
2026-03-18
Knock, a company focused on observability, adopted ClickHouse to efficiently store high-cardinality BEAM metrics from Erlang/Elixir applications. By leveraging ClickHouse's capabilities for handling large volumes of detailed telemetry data, Knock can monitor and analyze runtime performance at an unprecedented granularity without incurring prohibitive costs associated with other monitoring platforms like Datadog. The article outlines the implementation process, including table design, periodic data dumping using `telemetry_poller`, and example queries that demonstrate the utility of this approach for tracking top-consuming processes and identifying memory hogs across different pods.
How We Made Payload Search 60x Faster in ClickHouse
2026-03-13
Hookdeck improved its payload search in the ClickHouse database by implementing two techniques: bucketed storage and variable-window iterative scanning. Hashing values into 50 string buckets helped reduce data scanned per row, and iterating through time windows starting from recent data further improved performance. This resulted in a significant reduction in average search latency to around 400ms and p99 latency under 1.4 seconds, making searches faster and more reliable.
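Both techniques can be sketched in a few lines. The code below is a rough illustration under stated assumptions, not Hookdeck's implementation: the CRC32 hash, the one-hour initial window, the 4x growth factor, and the `search_window` callback (a stand-in for a ClickHouse query restricted to a time range) are all hypothetical. Only the bucket count of 50 comes from the summary above.

```python
import zlib

NUM_BUCKETS = 50  # bucket count taken from the summary; the hash choice is assumed


def bucket_for(value: str) -> int:
    """Map a payload value to one of NUM_BUCKETS buckets, deterministically."""
    return zlib.crc32(value.encode("utf-8")) % NUM_BUCKETS


def iterative_search(search_window, now, earliest, limit, initial=3600, growth=4):
    """Scan backwards from `now` in growing time windows until `limit` hits.

    search_window(start, end) is a hypothetical callback returning matches
    with start <= timestamp < end. Times are plain numbers (e.g. epoch seconds).
    """
    results = []
    end = now
    width = initial
    while end > earliest and len(results) < limit:
        start = max(earliest, end - width)
        results.extend(search_window(start, end))
        end = start          # next window ends where this one began
        width *= growth      # widen the window if recent data was not enough
    return results[:limit]
```

Starting with a small recent window means that common searches, whose matches are recent, touch little data, while rare values still get found as the window widens toward `earliest`.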