How does BigQuery caching on time partitioned tables work?

In contrast with the BigQuery documentation, we see that it DOES cache the results when selecting data from a streaming, date-partitioned table (Standard SQL).
Example:
When we perform a deterministic date scan on the streaming, date-partitioned table using:
where (_PARTITIONTIME > '2017-11-12' or _PARTITIONTIME is null)
...BigQuery caches the results for 5 to 20 minutes if we fire the exact same query within that time frame.
My interpretation of the documentation, however, is that it SHOULD NOT cache the data:
'When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table) even if no new rows have arrived'
Important notes:
Our test query selects heartbeat events, which really do arrive continuously
We actually want this caching behavior, because we do not always need the data to be current to the last second. We just want to know whether we can really depend on this behavior.
Our Questions:
What is going on here / Why does the BQ caching happen at all?
The time the data stays in the BQ cache seems random (between 5 and 20 minutes). What determines this?
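For reference, this is roughly how we check whether a given run was served from cache (a sketch; the region qualifier and the availability of the INFORMATION_SCHEMA.JOBS_BY_PROJECT view are assumptions on our side):
-- List recent runs of the test query and whether each was answered from cache
SELECT creation_time, cache_hit, total_bytes_processed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE query LIKE '%_PARTITIONTIME%'
ORDER BY creation_time DESC
LIMIT 20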

Thanks for clarifying the question. I think it's an oversight that we didn't disable caching for partitioned tables with streaming data. It should be disabled, as otherwise the query might return outdated results.
We invalidate the cache when the table is changed. Streaming into the table causes the table to be changed. I guess that's why the cache is invalidated after 5 to 20 minutes.

Related

Is Clickhouse Buffer Table appropriate for realtime ingestion of many small inserts?

I am writing an application that plots financial data and interacts with a realtime feed of such data. Due to the nature of the task, live market data may be received very frequently in one-trade-at-a-time fashion. I am using the database locally and I am the only user. There is only one program (my middleware) that will be inserting data to the db. My primary concern is latency - I want to minimize it as much as possible. For that reason, I would like to avoid having a queue (in a sense, I want the Buffer Table to fulfill that role). A lot of the analytics Clickhouse calculates for me are expected to be realtime (as much as possible) as well. I have three questions:
Clarify some limitations/caveats from the Buffer Table documentation
Clarify how querying works (regular queries + materialized views)
What happens when I query the db when data is being flushed
Question 1) Clarify some limitations/caveats from the Buffer Table documentation
Based on the ClickHouse documentation, I understand that many small INSERTs are sub-optimal, to say the least. While researching the topic I found that the Buffer engine [1] could be used as a solution. It made sense to me; however, when I read the Buffer documentation I found some caveats:
Note that it does not make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section “Performance”).
A few thousand rows per second is absolutely fine for me; however, I am concerned about other performance considerations: if I do commit data to the buffer table one row at a time, should I expect spikes in CPU/memory? If I understand correctly, committing one row at a time to a MergeTree table would cause a lot of additional work for the merging job, but it should not be a problem if a Buffer table is used, correct?
If the server is restarted abnormally, the data in the buffer is lost.
I understand that this refers to things like a power outage or the computer crashing. If I shut down the computer normally or stop the ClickHouse server normally, can I expect the buffer to flush data to the target table?
Question 2) Clarify how querying works (regular queries + materialized views)
When reading from a Buffer table, data is processed both from the buffer and from the destination table (if there is one).
Note that the Buffer tables does not support an index. In other words, data in the buffer is fully scanned, which might be slow for large buffers. (For data in a subordinate table, the index that it supports will be used.)
Does that mean I can run queries against the target table and expect Buffer table data to be included automatically? Or is it the other way around: I query the Buffer table and the target table is included in the background? If either is true (and I don't need to aggregate both tables manually), does that also mean Materialized Views would be populated? Which table should trigger the materialized view: the on-disk table or the Buffer table? Or both, in some way?
I rely on Materialized Views a lot and need them updated in realtime (or as close as possible). What would be the best strategy to accomplish that goal?
Question 3) What happens when I query the db when data is being flushed?
My two main concerns here are with regards to:
Running a query at the exact time flushing occurs - is there a risk of duplicated records or omitted records?
At which point are Materialized Views of the target table populated (I suppose it depends on whether it's the target table or the buffer table that triggers the MV)? Is flushing the buffer important in how I structure the MV?
Thank you for your time.
[1] https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/
A few thousand rows per second is absolutely fine for me, however I am concerned about other performance considerations - if I do commit data to the buffer table one row at a time, should I expect spikes in CPU/memory?
No, the Buffer table engine doesn't produce CPU/memory spikes.
If I understand correctly, committing one row at a time to a MergeTree table would cause a lot of additional work for the merging job, but it should not be a problem if Buffer Table is used, correct?
The Buffer table engine works as an in-memory buffer that periodically flushes batches of rows to the underlying *MergeTree table; the Buffer table's parameters define the size and frequency of the flushes.
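A minimal sketch of what that looks like (database, table and threshold values are placeholders, not a recommendation):
-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
-- A buffer layer is flushed to db.events when all min_* thresholds are reached, or when any max_* threshold is reached
CREATE TABLE db.events_buffer AS db.events
ENGINE = Buffer('db', 'events', 16, 10, 100, 10000, 1000000, 10000000, 100000000);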
If I shutdown the computer normally or stop the clickhouse server normally, can I expect the buffer to flush data to the target table?
Yes, when the server stops normally, Buffer tables will flush their data.
I query the buffer table and the target table is included in the background?
Yes, this is the right behavior: when you SELECT from a Buffer table, the SELECT is also passed to the underlying *MergeTree table, and already-flushed data is read from the *MergeTree table.
does that also mean Materialized Views would be populated?
It is not clear whether you CREATE MATERIALIZED VIEW triggered FROM the *MergeTree table or FROM the Buffer table, and which table engine you use in the TO clause.
I would suggest creating the MATERIALIZED VIEW triggered FROM the underlying MergeTree table.
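A minimal sketch under that assumption (table names, columns and the SummingMergeTree target are placeholders):
-- Target table that holds the aggregated rows
CREATE TABLE db.events_per_minute
(
    minute DateTime,
    cnt UInt64
)
ENGINE = SummingMergeTree
ORDER BY minute;

-- The MV is triggered by inserts into the underlying MergeTree table,
-- i.e. every time the Buffer table flushes a batch into it
CREATE MATERIALIZED VIEW db.events_per_minute_mv
TO db.events_per_minute
AS SELECT toStartOfMinute(created) AS minute, count() AS cnt
FROM db.events    -- the *MergeTree table, not the Buffer table
GROUP BY minute;
With this setup the view only sees rows after they have been flushed from the buffer, so its freshness is bounded by the Buffer flush settings.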

How to maintain cache in ClickHouse?

I use crontab to schedule a SQL that queries a big table every 2 hours.
select a,b,c,d,e,f,g,h,i,j,k,many_cols from big_table format Null
It takes anywhere from 30 seconds to 5 minutes at a time.
What I can see from the query_log is that when the query time is low, the MarkCacheHits value is high; when the time is high, the MarkCacheHits value is low and the MarkCacheMisses value is high.
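For reference, this is roughly how I read those counters from the query_log (a sketch; on older versions ProfileEvents is exposed as the ProfileEvents.Names / ProfileEvents.Values arrays instead of a map):
-- Mark cache hits/misses for recent runs of the warm-up query
SELECT
    query_start_time,
    query_duration_ms,
    ProfileEvents['MarkCacheHits'] AS mark_cache_hits,
    ProfileEvents['MarkCacheMisses'] AS mark_cache_misses
FROM system.query_log
WHERE type = 'QueryFinish' AND query LIKE '%big_table%'
ORDER BY query_start_time DESC
LIMIT 10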
I'm wondering how to make the mark cache hit as often as possible. (This is probably not the only big table that needs to be warmed up.)
Will mark cache be replaced by other queries and what is its limit?
Does the warm-up approach of selecting specific columns really work for an aggregate query over those columns? For example, the warm-up SQL is as above, and the aggregate query could be select a,sum(if(b,c,0)) from big_table group by a
My ClickHouse server has been hanging occasionally recently, and I can't see any errors or exceptions in the log at the corresponding time. Could this be related to my regular warm-up query of the big table?
In reality you are placing the data into the Linux disk cache.
Will mark cache be replaced by other queries and what is its limit?
Yes, it will be replaced by other queries. The limit is 5 GB by default: <mark_cache_size>5368709120</mark_cache_size>
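If you want to see how full the mark cache currently is, something like this should work (a sketch; it assumes your version exposes the MarkCacheBytes / MarkCacheFiles asynchronous metrics):
-- Current mark cache usage, server-wide
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric IN ('MarkCacheBytes', 'MarkCacheFiles')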
Does the warm-up way of selecting specific columns really work for an aggregate query of those columns?
Yes, because you put the files into the Linux cache.
Could this be related to my regular warm-up query of the big table?
No.

What will happen when inserting a row during a long running query

I am writing some data loading code that pulls data from a large, slow table in an oracle database. I have read-only access to the data, and do not have the ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image from that data, your query will run to completion; if for any reason the undo information is overwritten, your query will fail with the dreaded ORA-01555: Snapshot too old. That's right: Oracle would rather hurl an exception than provide us with an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within one transaction we may see two different result sets. If that is a problem (I think not in your case) we need to switch from Read Committed to Serializable isolation.
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So to answer your question: take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Although doing this means you're issuing two queries in the same transaction, which means you do need Serializable isolation after all!
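A hedged sketch of that approach (table and column names are placeholders):
-- Run both statements in one serializable transaction so they share a single consistent snapshot
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- 1. Capture the new high-water mark to store for the next run
SELECT MAX(created_ts) AS last_loaded_ts
FROM source_table;

-- 2. Pull everything newer than the high-water mark saved by the previous run
SELECT *
FROM source_table
WHERE created_ts > :previous_high_water_mark;

COMMIT;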

Multiple small inserts in clickhouse

I have an event table (MergeTree) in ClickHouse and want to run a lot of small inserts at the same time. However, the server becomes overloaded and unresponsive. Moreover, some of the inserts are lost. There are a lot of records like this in the ClickHouse error log:
01:43:01.668 [ 16 ] <Error> events (Merger): Part 20161109_20161109_240760_266738_51 intersects previous part
Is there a way to optimize such queries? I know I can use bulk insert for some types of events - basically running one insert with many records, which ClickHouse handles pretty well. However, some of the events, such as clicks or opens, cannot be handled in this way.
The other question: why does ClickHouse decide that similar records exist when they don't? There are similar records at the time of insert that have the same fields as in the index, but other fields are different.
From time to time I also receive the following error:
Caused by: ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, message: Connect to localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out, host: localhost, port: 8123; Connect to ip6-localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out
... 36 more
Mostly during the project build, when tests against the ClickHouse database are run.
ClickHouse has a special type of table for this: Buffer. It is stored in memory and allows many small inserts without problems. We have nearly 200 different inserts per second and it works fine.
Buffer table:
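-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
-- rows are flushed to logs.log_main when all min_* thresholds or any max_* threshold is reached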
CREATE TABLE logs.log_buffer (rid String, created DateTime, some String, d Date MATERIALIZED toDate(created))
ENGINE = Buffer('logs', 'log_main', 16, 5, 30, 1000, 10000, 1000000, 10000000);
Main table:
CREATE TABLE logs.log_main (rid String, created DateTime, some String, d Date)
ENGINE = MergeTree(d, sipHash128(rid), (created, sipHash128(rid)), 8192);
Details in manual: https://clickhouse.yandex/docs/en/operations/table_engines/buffer/
This is a known issue when processing a large number of small inserts into a (non-replicated) MergeTree.
This is a bug; we need to investigate and fix it.
As a workaround, you should send inserts in larger batches, as recommended: about one batch per second: https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data.
I've had a similar problem, although not as bad - making ~20 inserts per second caused the server to reach a high loadavg, memory consumption and CPU usage. I created a Buffer table, which buffers the inserts in memory and flushes them periodically to the "real" on-disk table. And just like magic, everything went quiet: loadavg, memory and CPU usage came down to normal levels. The nice thing is that you can run queries against the Buffer table and get back matching rows from both memory and disk - so clients are unaffected by the buffering. See https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/
Alternatively, you can use something like https://github.com/nikepan/clickhouse-bulk: it will buffer multiple inserts and flush them all together according to user policy.
The design of the ClickHouse merge engines is not meant to take small writes concurrently. As far as I understand, the MergeTree merges the parts of data written to a table based on partitions and then re-organizes the parts for better aggregated reads. If we do small writes often, you will encounter another exception from the merge:
Error: 500: Code: 252, e.displayText() = DB::Exception: Too many parts (300). Merges are processing significantly slow
When you try to understand why the above exception is thrown, the idea becomes a lot clearer: CH needs to merge data, and there is an upper limit to how many parts can exist! Every write in a batch is added as a new part and then eventually merged with the partitioned table.
SELECT table, count() AS cnt
FROM system.parts
WHERE database = 'dbname'
GROUP BY `table`
ORDER BY cnt DESC
The above query can help you monitor parts: observe, while writing, how the parts increase and eventually merge down.
My best bet for the above would be buffering the data set and periodically flushing it to DB, but then that means no real-time analytics.
Using a Buffer table is good; however, please consider these points:
If the server is restarted abnormally, the data in the buffer is lost.
FINAL and SAMPLE do not work correctly for Buffer tables. These conditions are passed to the destination table, but are not used for processing data in the buffer
When adding data to a Buffer, one of the buffers is locked. (So no reads)
If the destination table is replicated, some expected characteristics of replicated tables are lost when writing to a Buffer table. (no deduplication)
Please read it thoroughly; it's a special-case engine: https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/

(TSQL) INSERT doubling time of the query

I have a quite complex multi-join T-SQL SELECT query that runs for about 8 seconds and returns about 300K records, which is currently acceptable. But I need to reuse the results of that query several times later, so I am inserting the results into a temp table. The table is created in advance with columns that match the output of the SELECT query. But as soon as I do INSERT INTO ... SELECT, execution time more than doubles to over 20 seconds! The execution plan shows that 46% of the query cost goes to "Table Insert" and 38% to Table Spool (Eager Spool).
Any idea why this is happening and how to speed it up?
Thanks!
The "Why" of it hard to say, we'd need a lot more information. (though my SWAG would be that it has to do with logging...)
However, the solution, 9 times out of 10, is to use SELECT INTO to make your temp table.
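For example (a sketch; table and column names are placeholders for your actual multi-join query):
-- Let SELECT ... INTO create and load the temp table in one step; in tempdb this is
-- typically minimally logged, unlike a plain INSERT ... SELECT into a pre-created table
SELECT t.Col1, t.Col2, o.Col3
INTO #Results
FROM dbo.SomeBigTable AS t
JOIN dbo.OtherTable AS o ON o.Id = t.Id
WHERE t.SomeFilter = 1;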
I would start by looking at standard tuning items. Is the disk performing? Are there sufficient resources (IOs, RAM, CPU, etc.)? Is there a bottleneck in the RDBMS? It does sound like the issue, but what is happening with locking? Does other code give similar results? Is other code performant?
A few things I can suggest based on the information you have provided. If you don't care about dirty reads, you could always change the transaction isolation level (if you're using MS T-SQL):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
select ...
This may speed things up on your initial query, as locks will not need to be taken on the data you are querying. If you're not using SQL Server, do a Google search for how to do the same thing with the technology you are using.
For the insert portion, you said you are inserting into a temp table. Does your database support adding primary keys or indexes on your temp table? If it does, have a dummy column in there that is an indexed column. Also, have you tried using a regular database table for this? Depending on your setup, it is possible that using one will speed up your insert times.
