When rows in a table expire according to TTL, does ClickHouse remove them immediately and all at once, so that you won't see intermediate states when querying? And if I use a GROUP BY clause in the TTL expression, does it guarantee that the aggregation will be applied to all expired records?
It's not atomic. (It is atomic only at the part level, NOT the partition level!)
The TTL expression does NOT guarantee anything.
You can see partial results. TTL works eventually: data may be removed at some point in the future, or NEVER.
About NEVER: there are MergeTree settings that control this:
merge_with_ttl_timeout │ 14400
Minimal time in seconds before a merge with delete TTL can be repeated.
max_replicated_merges_with_ttl_in_queue │ 1
How many merge tasks with TTL are allowed simultaneously in the ReplicatedMergeTree queue.
max_number_of_merges_with_ttl_in_pool │ 2
When there are more than the specified number of merges with TTL entries in the pool, do not assign a new merge with TTL. This leaves free threads for regular merges and avoids "Too many parts".
TTL processing wakes up every 4 hours (merge_with_ttl_timeout) and processes a limited number of parts, and only if there are free threads in the MergeTree pool. If the number of partitions that need TTL processing is large, or there are no free resources in the MergeTree pool, then TTL will not be able to keep up.
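For illustration, a minimal sketch (the table, columns, and values are made up): a TTL with GROUP BY is declared per table, merge_with_ttl_timeout can be lowered per table, and OPTIMIZE ... FINAL can force TTL evaluation when eventual cleanup is not enough:

-- hypothetical table: rows older than 30 days are rolled up per (id, hour)
CREATE TABLE events
(
    id    UInt64,
    ts    DateTime,
    value UInt64
)
ENGINE = MergeTree
ORDER BY (id, toStartOfHour(ts))
TTL ts + INTERVAL 30 DAY
    GROUP BY id, toStartOfHour(ts)
    SET value = sum(value)
SETTINGS merge_with_ttl_timeout = 3600;  -- retry TTL merges hourly instead of every 4 hours

-- force TTL evaluation for existing parts (rewrites data, can be expensive)
OPTIMIZE TABLE events FINAL;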
Related
My use case:
I am using Redis for storing a high volume of data.
Every second I write around 24k keys into Redis with a TTL of 30 minutes, and I want the keys to be deleted once the TTL has expired.
The current Redis implementation of evicting expired keys works in tasks: each task picks 20 random keys, checks whether their TTL has expired, and deletes those keys, and Redis recommends using no more than 100 such tasks. So if I set hz (the number of tasks) to 100, Redis will be able to clear at most about 2,000 keys per second, which is far too few for me because my insertion rate is very high; this eventually results in an out-of-memory exception when memory fills up.
The alternatives I have are:
1/ Hit random keys, or keys which we know have expired; this will trigger deletion in Redis.
2/ Set an eviction policy for when maxmemory is reached. This will aggressively delete Redis keys when max memory is reached.
3/ Set hz (frequency) to some higher value. This will run more tasks for purging expired keys per second.
1/ doesn't seem feasible.
For 2/ and 3/:
Based on the current cache timer of 30 minutes and the given insertion rate, we can use:
maxmemory 12gb
maxmemory-samples 10
maxmemory-policy volatile-ttl
hz 100
But using 2/ would mean Redis is constantly deleting keys and then inserting, since I assume that in my case memory will always be at the 12 GB limit.
So is it good to use this strategy, or should we write our own key-eviction service on top of Redis?
Are you using Azure Redis Cache? If yes, you can consider clustering. You can have up to 10 shards in the cluster, which will help you spread the load of your different keys and operations.
I think the only way to get a definitive answer to your question is to write a test. A synthetic test similar to your actual workload should not be hard to create and will let you know if redis can expire keys as quickly as you can insert them and what impact changing the hz value has on performance.
Using maxmemory should also be a workable strategy. It will mean the allocated memory might always be full, but it should work.
Another option is to reduce the number of keys you are writing to. If the keys you are writing contain string values, you can instead write them into fields of a Redis hash.
For example, if your inserts look something like this:
redis.set "aaa", value_a
redis.set "bbb", value_b
You could instead use a hash:
# current_second is just a timestamp in seconds
redis.hset current_second, "aaa", value_a
redis.hset current_second, "bbb", value_b
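To expire the whole hash after the question's 30 minutes, you would also set a TTL on the hash key itself (a sketch in the same client style as above):
redis.expire current_second, 30 * 60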
By writing to a hash whose key contains the current timestamp, and setting the TTL on the entire hash, Redis only has to evict one key per second.
Given some of the advantages of using hashes in redis, I expect the hash approach to perform best if your use case is compatible.
Might be worth testing before deciding.
I have a NoSQL database with rows of two types:
Rows that are essentially counters with a high number of updates per second. It doesn't matter if these updates are done in a batch once every n seconds (where n is, say, 2 seconds).
Rows that contain tree-like structures, and each time the row is updated the tree structure has to be updated. Updating the tree structure each time is expensive, it would be better to do it as a batch job once every n seconds.
This is my plan; afterwards I will explain the part I am struggling to execute and ask whether I need to move to something like RabbitMQ.
Each row has a unique id, which I use as the Redis key. Redis can easily handle loads of counter increments, no problem. As for the tree structure, each update to the row can use the string APPEND command to append JSON instructions on how to modify the existing tree in the database.
This is the tricky part
I want to ensure each row gets updated every n seconds. There will be a large amount of redis keys getting updated.
This was my plan: have three queues (pre-processing, processing, and dead).
By default, every key is placed in the pre-processing queue when the command for a database update comes in. After exactly n seconds, move each key/value that has been there for n seconds to the processing queue (I don't know how to do this efficiently and concurrently). Once n seconds have passed, it doesn't matter in which order the processing queue is handled, and I can have any number of consumers racing through it. I will also have a dead queue in case tasks keep failing for some reason.
Is there a better way to do this? Is what I am thinking of possible?
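For the "move after exactly n seconds" step, one common sketch (my illustration, not from this thread; the key names and row_id are placeholders) is to replace the pre-processing queue with a sorted set scored by the time an entry becomes ready, in the same Ruby client style as the snippets above:

require "redis"
redis = Redis.new
n = 2                   # seconds to wait before an entry may be processed
row_id = "row:123"      # placeholder row id

# enqueue: the score is the time at which the entry becomes ready
redis.zadd "pre_processing", Time.now.to_i + n, row_id

# worker loop: claim entries whose ready-time has passed
ready = redis.zrangebyscore "pre_processing", "-inf", Time.now.to_i
ready.each do |id|
  if redis.zrem "pre_processing", id   # only one worker wins the removal
    redis.lpush "processing", id
  end
end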
We are using a timestamp to ensure that entries in a log table are recorded sequentially, but we have found a potential flaw. Say, for example, we have two nodes in our RAC and the node timestamps are 1000ms apart. Our app server inserts two log entries within 30ms of each other. The first insert is serviced by Node1 and the second by Node2. With a 1000ms difference between the two nodes, the timestamps could show the log entries occurring in the wrong order! (I would just use a sequence, but our sequences are cached for performance reasons...)
NTP sync doesn't help this situation because NTP has a fault tolerance of 128ms, which leaves the door open for records to be recorded out of order when they occur more frequently than that.
I have a feeling I'm looking at this problem the wrong way. My ultimate goal is to be able to retrieve the actual sequence that log entries are recorded. It doesn't have to be by a timestamp column.
An Oracle sequence with ORDER specified is guaranteed to return numbers in order across a RAC cluster. So
create sequence my_seq
start with 1
increment by 1
order;
Now, in order to do this, you're going to be doing a fair amount of inter-node communication to ensure that access to the sequence is serialized appropriately. That's going to make it significantly more expensive than a normal sequence. If you need to guarantee order, though, it's probably the most efficient approach you're going to have.
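For example (a sketch with a hypothetical log table), the insert would then take its ordering from the sequence rather than from a timestamp:

insert into app_log (log_id, log_time, message)
values (my_seq.nextval, systimestamp, 'something happened');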
Bear in mind that a timestamp attached to a row is generated at the time of the insert or update, but the actual change to the database takes place when the commit happens; depending on the complexity of the transactions, row 1 might get inserted before row 2 but committed after it.
The only thing I am aware of in Oracle that guarantees order across the nodes is the SCN that Oracle attaches to the transaction, by which transactions in a RAC environment can be ordered for things like Streams replication.
1000ms? That's one second, isn't it? IMHO that is a lot. If you really need precise time, then simply give up the idea of global time. Generate timestamps on the log server and assume that each log server has its own local time. Read something about Lamport timestamps if you need some theory. But maybe the source of your problem is somewhere else: RAC synchronises time between nodes, and it would log any bigger discrepancy.
If two consecutive events are logged by two different connections, is the same thread using both connections? Or are those events passed to background threads which then write into the database? i.e. is the logging sequential or parallel?
I have seen the DBA team advise setting the sequence cache to a higher value at the time of performance optimization, increasing the value from 20 to 1000 or 5000. The Oracle docs say of the cache value:
Specify how many values of the sequence the database preallocates and keeps in memory for faster access.
Somewhere in the AWR report I can see,
select SEQ_MY_SEQU_EMP_ID.nextval from dual
Can any performance improvement be seen if I increase the cache value of SEQ_MY_SEQU_EMP_ID?
My question is:
Does the sequence cache play any significant role in performance? If so, how do I know what cache value is sufficient for a sequence?
We can get sequence values from the Oracle cache until they are used up. When all of them have been used, Oracle allocates a new batch of values and updates the Oracle data dictionary.
If you need to insert 100,000 records and the cache size is 20, Oracle will update the data dictionary 5,000 times, but only 20 times if you set the cache size to 5,000.
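As an illustration (ALTER SEQUENCE ... CACHE is standard syntax; the sequence name is the one from the question), raising the cache is a one-line change:

alter sequence SEQ_MY_SEQU_EMP_ID cache 5000;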
More information that may help you: http://support.esri.com/en/knowledgebase/techarticles/detail/20498
If you omit both CACHE and NOCACHE, then the database caches 20 sequence numbers by default. Oracle recommends using the CACHE setting to enhance performance if you are using sequences in an Oracle Real Application Clusters environment.
Using the CACHE and NOORDER options together results in the best performance for a sequence. When the CACHE option is used without the ORDER option, each instance caches a separate range of numbers, and sequence numbers may be assigned out of order by the different instances. So the larger the CACHE value, the fewer writes to the dictionary, but the more sequence numbers that might be lost. There is no point worrying about losing numbers, though, since a rollback or shutdown will definitely "lose" some anyway.
The CACHE option causes each instance to cache its own range of numbers, thus reducing I/O to the Oracle data dictionary, and the NOORDER option eliminates message traffic over the interconnect to coordinate the sequential allocation of numbers across all instances of the database. NOCACHE will be SLOW...
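A sketch of such a definition (hypothetical sequence name; the clauses are standard Oracle syntax):

create sequence emp_id_seq
start with 1
increment by 1
cache 5000
noorder;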
By default, the sequence cache in Oracle holds 20 values. We can redefine it with the CACHE clause in the sequence definition. The CACHE clause helps when we need to generate a large quantity of numbers, since it then takes less time than usual; otherwise there is no drastic performance gain from declaring a CACHE clause in the sequence definition.
I have done some research and found some relevant information in this regard:
We need to check the database for sequences which are high-usage but defined with the default cache size of 20; the performance benefits of altering the cache size of such a sequence can be noticeable (a query for finding them is sketched after these notes).
Increasing the cache size of a sequence does not waste space; the cache is still defined by just two numbers, the last used and the high-water mark. It is just that the high-water mark is jumped by a much larger value every time it is reached.
A cached sequence will return values exactly the same as a non-cached one. However, a sequence cache is kept in the shared pool just as other cached information is. This means it can age out of the shared pool in the same way as a procedure if it is not accessed frequently enough. Everything in the cache is also lost when the instance is shut down.
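A sketch of that first check, using the DBA_SEQUENCES dictionary view (USER_SEQUENCES works similarly if you lack DBA privileges, though it has no SEQUENCE_OWNER column):

-- sequences still on the default cache size of 20
select sequence_owner, sequence_name, cache_size
from dba_sequences
where cache_size = 20;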
Besides spending more time updating the Oracle data dictionary, small sequence caches can have other negative effects if you work with a clustered Oracle installation.
In Oracle 10g RAC Grid, Services and Clustering (1st Edition) by Murali Vallath it is stated that if you happen to have
an Oracle Cluster (RAC),
a non-partitioned index on a column populated with an increasing sequence value, and
concurrent multi-instance inserts,
you can incur high contention on the rightmost index block and experience a lot of Cluster Waits (up to 90% of total insert time).
If you increase the size of the relevant sequence cache you can reduce the impact of Cluster Waits on your index.
I'm storing a bunch of realtime data in redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10G, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so redis is evicting data before it expires.
I'm okay with redis evicting the data, but I would like it to evict the oldest data first. So even if I don't have a full 4 hours of data, at least I can have some range of data (3 hours, 2 hours, etc) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it's not working that way. It appears that redis is evicting data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.conf file. Let me know if you want to see any more of the configuration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no
AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When the *-ttl or *-lru options are chosen in maxmemory-policy, Redis does not use an exact algorithm to pick the keys to be removed. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, and cross-referencing it with the normal Redis dictionary data structure, which would be expensive in terms of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration, before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys and selects for eviction the most idle one (for *-lru) or the one closest to its expiration time (for *-ttl). By default only 3 samples are considered. The result is non-deterministic.
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
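For example, in redis.conf (10 is just an illustrative value; the default at the time was 3):

maxmemory-samples 10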
Now if you really require consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non-updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
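A rough sketch of such a daemon (all names and thresholds are made up; it assumes every data key is also added to an eviction_index sorted set scored by insertion time):

require "redis"

redis = Redis.new
target_bytes = 9 * 1024**3   # start evicting a bit below the 10gb maxmemory

loop do
  used = redis.info("memory")["used_memory"].to_i
  if used > target_bytes
    oldest = redis.zrange "eviction_index", 0, 99   # the 100 oldest keys
    unless oldest.empty?
      redis.del(*oldest)                            # drop the data keys
      redis.zrem "eviction_index", oldest           # and their index entries
    end
  end
  sleep 1
end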
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).