To help understand the results of the benchmark, I find it helpful to look at how the benchmark is constructed, and what it tests for. From the README:
"The dataset is represented by one flat table. This is not representative of classical data warehouses, which use a normalized star or snowflake data model. The systems for classical data warehouses may get an unfair disadvantage on this benchmark."
Taking a look at the queries, it looks like they mostly consist of full table scans with filters, aggregations, and sorts. Since it's a single table, there are no joins.
Can you clarify what a "write unit" is? Naively it sounds like it might be blocks x partitions x replicas that actually hit disk. (Which is also probably not very clear to people not already using CH, but I have at least middling knowledge of CH's i/o patterns and I have no clue what a "write unit" is from the page's description.)
One write unit is around 100..200 INSERT queries.
If you are doing INSERT in batches with one million rows, it will give 8.00 billion inserted rows per dollar (SELECT formatReadableQuantity(1000000 * 100 / 0.0125) → 8.00 billion). Pretty good, IMO.
If you are doing millions of INSERT queries with one record, without "async_insert" setting, it will cost much more.
That's why we have "write units" instead of just counting inserts.
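The arithmetic above works out as a quick back-of-envelope calculation (the $0.0125 price and the ~100 INSERT queries per write unit are the figures quoted in this thread, not official billing formulas):

```python
# Back-of-envelope: batched-insert throughput per dollar, using the figures
# quoted in this thread ($0.0125 per write unit, ~100 INSERT queries per unit).
price_per_write_unit = 0.0125      # USD, quoted above
inserts_per_write_unit = 100       # "around 100..200 INSERT queries"
rows_per_insert = 1_000_000        # batch size from the example

rows_per_dollar = rows_per_insert * inserts_per_write_unit / price_per_write_unit
print(f"{rows_per_dollar:,.0f} inserted rows per dollar")  # about 8 billion
```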
More helpful would be answers to my questions at https://news.ycombinator.com/item?id=33081502 - async_insert is a relatively new feature, we're still using buffer tables for example - but also most of our "client" inserts are actually onto multi-MV-attached null engines. Those MVs are also often doing some pre-aggregation before hitting our MTs as well. So we might insert a million rows, but the MV aggregates that down into 50k, but then that gets inserted into five persistent tables, each of which has its own sharding/partitioning so that blows up to 200k or something "rows" again. (And at some point those inserts are also going to get compacted into stuff inserted previously / concurrently by the MT itself.)
As I've said several times in this thread, I understand why you don't count inserts or rows. What I don't understand is what unit a WU does actually correspond to. In particular I don't understand its relation to e.g. parts or blocks, which are the units one would focus on optimizing self-hosted offerings.
It's Tyler from ClickHouse.
Check out the response below that has a reference to some of our billing FAQs.
It doesn't mention anything about what a write unit is, except to say you can reduce write units by batching inserts (that part I guessed already.)
There's no way to think about what an actual write unit means. You could measure the costs on a sample workload, but that's far from ideal. Some transparency here would be nice.
I understand the answer is complicated, based on hairy implementation details, and subject to change. Give me the complexity and let me interpret it according to my needs.
Right, that link covers read units which is also what I expected - essentially the number of files I have to touch - but I still have no clue about write units.
Is one block on one non-partitioned non-distributed table one write unit? What about one insert that's two blocks on such a table? What about one block on a null engine with two MVs listening to insert into two non-partitioned non-distributed tables? What if the table is a replacing mergetree, do I incur WUs for compactions? etc.
My worry is that it is essentially 1 WU = 1 new part file, which I understand makes sense to bill on but is tremendously opaque for users - at least I have no clue how often we roll new part files; instead I'm focused on total network and disk I/O performance on one side and client query latency on the other.
Is lower time the right metric here? It seems like normalizing by price would make a more useful metric for big data, as long as the response time is reasonable.
Yes, ClickBench results are presented as Relative Time, where lower is better. You can read more on the specifics of ClickBench methodology in the GitHub repository here: https://github.com/ClickHouse/ClickBench/
There are other responses from ClickHouse in the comments on the pricing, so I'll defer to their expertise on that topic. Thank you for your feedback and ideas - normalizing a price-based benchmark is an interesting concept (and one where ClickHouse would expect to lead as well, given its architecture and efficiency).
This benchmark focuses on analytical query latency for representative analytical queries, so yes - lower number is better.
Wow, I hadn't heard of StarRocks before... seems like an interesting competitor.
See SelectDB, which is built on Apache Doris by the creators of Apache Doris. The performance is amazing. https://en.selectdb.com/blog/SelectDB%20Topped%20ClickBench%...
Looks really cool! Great work!
Had a pricing question. Say we connected a log forwarder like Vector to send data to Clickhouse Cloud, once per second. If each write unit is $0.0125, and we execute 86,400 writes over the course of the day, would we end up spending $1080? Do you only really support large batch, low frequency writes?
Hi Cliff - A write unit does not correspond to a single INSERT. A single INSERT with less than 16MB of data generates ~0.01 “write units”, so a single “write unit” typically corresponds to ~100 INSERTs. In your example, that would come closer to $11 a day. Depending on how small the batches in that example are, there may be ways to reduce that spend further, by batching even more or turning on "async_insert" inside ClickHouse.
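For the curious, the ~$11/day figure in this reply can be checked with a quick sketch (assuming the ~0.01 write units per small INSERT figure quoted above):

```python
# Daily cost of one small (<16MB) INSERT per second, using the figures from
# this reply: ~0.01 write units per small INSERT, $0.0125 per write unit.
write_units_per_small_insert = 0.01
price_per_write_unit = 0.0125          # USD
inserts_per_day = 24 * 60 * 60         # one per second = 86,400

daily_cost = inserts_per_day * write_units_per_small_insert * price_per_write_unit
print(f"${daily_cost:.2f}/day")        # roughly $11/day, not $1080
```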
> and advocating for public, standardized benchmarks
For full transparency, I think you should do the same in ClickHouse. Or is there a strong reason not to run benchmarks on standard analytical workloads like TPC-H, TPC-DS or SSB?
You can't post results of TPC benchmarks without an official audit, which complicates publishing results. You also can't find the names that are usually compared with ClickHouse there. So the open, standardized ClickBench tries to encourage benchmarking for everyone.
There are numerous benchmarks that use TPC-like queries, but those are not standardized and can be misleading. For example, a lot of work was done by Fivetran to get this report, but they show only the overall geomean for those systems, so you can't understand how they actually differ. And their queries are not the original TPC queries - variables are fixed in the queries, and they run only the first query where the official query is a multi-query.
Contributors from Altinity ran SSB with flattened and original schemas. SSB is not well standardized, and we see a lot of pairwise comparisons with controversial results - generally you can't just reproduce them and get all the results in one place for the same hardware.
There is a good reference to the available benchmarks for analytical databases: https://github.com/ClickHouse/ClickBench#similar-projects
On a couple of occasions I've seen TPC-H benchmarks with the remark that the results are not audited. Is that not possible?
Why are you using 'threads' instead of vCPUs or AWS instance types, as in the other benchmarks? That's really hard to compare and raises suspicions.
It is related to the "max_threads" setting of ClickHouse, which by default is the number of physical CPU cores - half the number of vCPUs.
For example, the c6a.4xlarge instance type in AWS has 16 vCPUs and 8 physical cores, so "max_threads" in ClickHouse will be 8.
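In other words (a trivial sketch, assuming 2-way SMT/hyperthreading, which is what AWS vCPUs correspond to on these instance types):

```python
# ClickHouse's default max_threads equals the physical core count, which on
# 2-way SMT (hyperthreaded) instances is half the vCPU count.
def default_max_threads(vcpus: int, smt_ways: int = 2) -> int:
    return vcpus // smt_ways

print(default_max_threads(16))  # c6a.4xlarge: 16 vCPUs -> 8 threads
```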
Interesting set of results. Ignoring ClickHouse, StarRocks seems to be better in almost all metrics.
I was curious to compare MonetDB, DuckDB, ClickHouse-Local, Elasticsearch, DataFusion, QuestDB, Timescale, and Athena. Amazingly, MonetDB shows up better than DuckDB in all metrics (except storage size), and Athena holds its own and fares admirably well, esp given that it is stateless. While, Timescale and Quest did not come up as good as I hoped they would.
It'd be interesting to see how rockset, starburst (presto/trino), and tiledb fare, if and when they get added to the benchmark.
The particular way in which the data is loaded into DuckDB and the particular machine configuration on which it is run triggers a problem in DuckDB related to memory management. Essentially, the standard Linux memory allocator does not like our allocation pattern when doing this load, which causes the system to run out of memory despite freeing more memory than we allocate. More info is provided here.
As it is right now the benchmark is not particularly representative of DuckDB's performance. Check back in a few months :)
Thanks. Btw, we use DuckDB (via Node/Deno) for analytics (on Parquet/JSON), and so I must point out that despite the dizzying variation among various language bindings (cpp and python seem more complete), the pace of progress, given the team size, is god-like. It has been super rewarding to follow the project. Also, thanks for permissively licensing it (unlike most other source-available databases).
Goes without saying, if there are cost advantages to be had due to DuckDB's unique strengths, then serverless DuckDB Cloud couldn't come here soon enough.
> despite freeing more memory than we allocate
> despite DuckDB freeing more buffers than it is allocating
Can you please clarify how that is even possible?
That can’t be right, that’s insane. I would find it easy to do 5m inserts in 30 minutes.
A write unit is not the same thing as a single insert, if for that cost you've multiplied it up.
I hate the part of my brain that has allowed the name to interfere with my interest in even looking at it.
No, write unit is not a single INSERT. A single INSERT will take around 0.01 write units.
That's quite high price tag per insert, do you have to write large amount of data per insert?
With an analytical database like ClickHouse, you can write many rows with a single INSERT statement (thousands of rows, millions of rows, and more). In fact, this kind of batching is recommended. Larger inserts will consume more write units than smaller inserts. Check out our billing FAQ for some examples, and we will be enhancing it with more detail as questions from our users come in (we'll work on clarifying this specific point): https://clickhouse.com/docs/en/manage/billing/ We also provide a $300 credit free trial to try out your workload and see how it translates to some of the usage dimensions. Finally, this is a Beta product, so keep the feedback coming!
Thanks, I agree batching inserts would indeed be a good idea and it makes sense that's recommended. However that link you mention (as of now) does not specify what a write unit is. So if that could be clarified, that would be great. Since from your reply it sounds like one INSERT would indeed (at a minimum) incur one write unit. And thus 100 writes could indeed cost $1.25. Which could get expensive, fast.
An INSERT can consume less than one write unit, depending on how many rows and bytes it writes to how many partitions, columns, materialized views, data types, etc. So, a "write unit", which corresponds to hundreds of low-level write operations, typically translates to many batched INSERTS. We are working to improve our examples in the FAQ to clarify - thank you so much for asking the question!
How about if you are streaming in from Kafka and inserting each event as it arrives? ClickHouse is ideal for rapid analytics over event data, so having to introduce batching would be disappointing.
Batch upload is of course more cost effective, but I would expect that to be more typical in a traditional data warehouse where we are uploading hourly or daily extracts. Clickhouse would likely show up in more real time and streaming architectures where events arrive sporadically.
I am a huge fan and advocate of Clickhouse, but the concept of a write unit is strange and the likely charges if you have to insert small batches would make this unviable to use. A crypto analytics platform I built on top of Clickhouse would cost in the $millions per month vs $X00 or low $X000 with competitors or self hosting.
"Each write operation (INSERT, DELETE, etc) consumes write units depending on the number of rows, columns, and partitions it writes to."
A single INSERT takes around 0.01 write units.
"column oriented database"
We all have our specialties, and that is fine. It is a common pattern that a developer gets comfortable with a particular tech stack, and then uses it for many years without seeing the need for much else. Some developers use Ruby on Rails plus Postgres for everything, others use C# and .NET and SQL Server. It's fine, if that's all you need.
Still, this is the year 2022. Cassandra, to take one example, was released in 2008. For everyone who has needed these fast-read databases, they've been much discussed for 14 years, including here on Hacker News, and on every other tech forum. At this point I think a company can simply assume that most developers will have some idea what a column database is.
Cassandra is not a columnar database, columnar in this sense is about the storage layout. Values for a column are laid next to each other in storage, thus allowing for optimizations in compression and computation, at the expense of reconstructing entire rows. Postgres is a row store, meaning all the columns for a row are stored next to each other in storage, which makes sense if you need all of the values for a row (or the vast majority).
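A toy sketch of the layout difference described here (illustrative only; real column stores add compression, vectorized execution, and on-disk layouts on top of this):

```python
# Row store: each record keeps all of its fields together.
rows = [
    {"symbol": "AAPL", "open": 170.0, "close": 171.2},
    {"symbol": "MSFT", "open": 330.0, "close": 331.9},
    {"symbol": "GOOG", "open": 138.0, "close": 137.5},
]

# Column store: each column's values are laid out contiguously.
columns = {
    "symbol": ["AAPL", "MSFT", "GOOG"],
    "open":   [170.0, 330.0, 138.0],
    "close":  [171.2, 331.9, 137.5],
}

# "Average close" touches every field of every record in the row layout...
avg_close_rows = sum(r["close"] for r in rows) / len(rows)
# ...but only one contiguous array in the column layout, which is what makes
# single-column scans fast (and the data highly compressible).
avg_close_cols = sum(columns["close"]) / len(columns["close"])
assert avg_close_rows == avg_close_cols
```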
If you don't know what a column oriented DB is, that page is probably not for you.
I know what a column oriented DB is, but ClickHouse was not on my radar before.
The pitch on the landing page is that ClickHouse Cloud is great if you love ClickHouse. If you don't know what ClickHouse is, you have to do some work to find out.
Care to elaborate? I’m a fullstack Rails dev for 12 years and I had no idea either, just like OP. Why alienate potential users from the get go?
Isn't it just a database in Second normal form (2NF)?
No. Column oriented storage DBs make it fast to read tens to billions of “rows”, usually when the data you want to read can stand on its own, independent of other columns. Ex: (stock close) - row storage isn’t going to buy you much here; you’d rather read the whole “stock close” column as fast as possible.
Whereas in traditional DBs, for data like [first name, last name], the columns may have far less meaning on their own and you need both columns for the data to make sense.
A traditional DB with B-tree storage is much slower for that type of usage. Storing (stock close) in a single-column format makes that type of query much faster.
Column oriented databases really aren’t esoteric knowledge. You should check out DDIA (Designing Data-Intensive Applications), especially if you’re doing full stack.
Thanks for the well wishes!
And thanks for the honest feedback.
It's always an interesting balance of promoting a new thing (Cloud) and explaining an existing thing. This might be helpful.
(note: I work at ClickHouse)
That's fine. You probably never played with a data warehouse. No big deal, we all have our own areas mastered and gaps elsewhere.
Columnar DBs have been around since 1969, predate relational DBs. Seriously, if you're not into large scale data, it really isn't for you.
They have been extremely clear in their support of Ukraine https://clickhouse.com/blog/we-stand-with-ukraine
It's just a bunch of text, doesn't show any support whatsoever.
They can show support by donating to UA defence and showing proof (important - not some neutral org). Otherwise it is not support but a bunch of bs
In places like Russia, those words are very dangerous, and could get you jailed or sent to the front. Even calling it a "war" was/is a punishable offense.
So, yes, words, but the potential consequences of these words have more significance than empty air.
Why do you get to define the acceptable forms of support...?
They support Ukraine; however, given that the company spun out of Yandex, the latter is most certainly benefitting financially from their success, and is paying taxes that fund the war. AFAIR, Yandex also has 2 director seats on ClickHouse's board, although that could have changed.
> ClickHouse, Inc. is a Delaware company with headquarters in the San Francisco Bay Area. We have no operations in Russia, no Russian investors, and no Russian members of our Board of Directors. We do, however, have an incredibly talented team of Russian software engineers located in Amsterdam, and we could not be more proud to call them colleagues.
From their "We Stand With Ukraine" page. 
That's interesting, thanks for clarifying. Yandex do show up as an investor in Crunchbase, including in their most recent Series B. The cited blog post says:
> The formation of our company was made possible by an agreement under which Yandex NV, a Dutch company with operations in Russia
While Yandex NV is registered in the Netherlands, it's pretty clear that Yandex NV is directly related to Yandex (specifically, according to Wikipedia, it's a holding company of Yandex). For those who don't know, Yandex is basically the Google of Russia and holds 42% of market share among search engines in Russia.
The blog post does not seem to make any claims that Russia is not benefitting financially from the commercial success of Clickhouse. And given the above, such claim would unlikely be true. As such, I still think it's pretty much safe to assume that a portion of any $$$ paid to Clickhouse ultimately goes to fund the war and kill people.
That said, I sincerely hope they could find a way to stop that flow of money from happening somehow, as otherwise it's a nice technology and a great technical team behind it...
You can see on their jobs page and 'our story' page, they are mostly in the US and The Netherlands.
Yandex is technically incorporated in The Netherlands and has an engineering presence there, but they are as Russian controlled as can be.
> Does anyone know how much of the Clickhouse team (or ownership) is still located in Russia?
No employees in Russia, no ownership from Russia, no influence from Russia.
ClickHouse Cloud provides port 443 as well as 8443. You can insert one row at a time, it's perfectly ok with "async_insert=1".
Honestly that's something they can probably offer separately, but I really prefer this pricing for most use cases where I want a database that is always available but has bursty request/response patterns. This means I can have an analytical database available for all my small services, websites, etc without having to think too much about availability, support, and a constant price overhead. But ClickHouse is so fast you can get pretty far with a $10 VPS, I admit.
Probably the best comparison is CockroachDB Cloud. They have a "serverless" offering based on unit pricing and a dedicated offering based on provisioned servers + support/maintenance overhead. I think that would be the ideal place to go long-term, but I'm super excited for this current one. I love ClickHouse and want to support them.
ClickHouse is also an interesting case because there's lots of options to migrate clusters, use S3 as long-term storage, etc to where I don't particularly feel locked into this offering if I ever wanted to shift into my own.
> I’m a little sad to see them embrace magic “insert unit” pricing, instead of taking the approach Altinity uses where you are renting an instance size that you can compare apples-to-apples with running your own cluster on ec2.
Thanks for this comment. We'll be publishing a blog at Altinity to compare both models. My view is that they both have their place. The BigQuery pricing model is great for testing or systems that don't have constant load. The Altinity model is good for SaaS systems that run constantly and need the ability to bound costs to ensure margins.
Having a selection of vendors that offer different economics seems better for users than everyone competing for margin on the same operational model.
Disclaimer: I'm CEO of Altinity.
Serverless "pay for usage" is different than the fixed-size dedicated cluster pricing model, and can be quite a bit cheaper, especially with spiky query traffic. It should also be more reliable, since you don't have to predict and provision capacity for your peak usage ahead of time.
Disclaimer: I work for ClickHouse.
Ok but it can also be quite a bit more expensive and your bill is less predictable.
Disclaimer: I’m a customer of altinity.
Start a free trial of ClickHouse Cloud (https://clickhouse.cloud/signUp) and compare price/performance with your current set up.
Pay-for-usage should be cheaper in most use cases, and you can further limit your costs by reducing your scale-up limits in the "Advance Scaling" setting (it will impact your performance though -- best to just let the autoscaler do its job...)
Druid and Pinot are more likely to be the peer group (e.g. see https://leventov.medium.com/comparison-of-the-open-source-ol...)
ClickHouse supports ad-hoc analytics, real-time reporting, and time series workloads at the same time. It is perfectly suited for user-facing analytics services. It supports low-latency (<100ms) queries for real-time analytics as well as high query throughput (500 QPS and more) - all of this with real-time data ingestion of logs, events, and time series.
Take some notable examples from the list: https://clickhouse.com/docs/en/about-us/adopters/, something around web analytics, APM, ad networks, telecom data... ClickHouse is perfectly suited for these use cases. But if you try to align these scenarios with, say, BigQuery, they will become almost impossible or prohibitively expensive or just slow.
There are specialized systems for real-time analytics like Druid and Pinot, but ClickHouse does it better: https://benchmark.clickhouse.com/
There are specialized systems for time-series workloads like InfluxDB and TimescaleDB, but ClickHouse does it better: https://gitlab.com/gitlab-org/incubation-engineering/apm/apm... https://arxiv.org/pdf/2204.09795.pdf http://cds.cern.ch/record/2667383/
There are specialized systems for logs and APM, but ClickHouse does it better: https://blog.cloudflare.com/log-analytics-using-clickhouse/
There are specialized systems for ad-hoc analytics, but ClickHouse does it better as well: https://github.com/crottyan/mgbench
Well, even if you want to process a text file, ClickHouse will do it better than any other tool: https://github.com/dcmoura/spyql/blob/master/notebooks/json_...
And ClickHouse looks like a normal relational database - there is no need for multiple components for different tiers (like in Druid), no need for manual partitioning into "daily", "hourly" tables (like you do in Spark and Bigquery), no need for lambda architecture... It's refreshing how something can be both simple and fast.