TTL for system.query_log is not set (ClickHouse)

When I use the <engine> tag to set a TTL for the query_log table in config.xml, the TTL is applied to the new query_log table once the old one is removed:
<query_log>
    <database>system</database>
    <table>query_log</table>
    <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date)
        ORDER BY (event_date, event_time)
        TTL event_date + INTERVAL 1 MINUTE DELETE
        SETTINGS min_bytes_for_wide_part = '10M'
    </engine>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
But when I try to configure the TTL with the separate <ttl> tag, no TTL is set on the new query_log table:
<query_log>
    <database>system</database>
    <table>query_log</table>
    <partition_by>toYYYYMM(event_date)</partition_by>
    <ttl>event_date + INTERVAL 1 MINUTE DELETE</ttl>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
I am using ClickHouse 20.8.2.3.
Can someone help me solve this, please? I want to use the <ttl> option.

20.8.2.3 is out of support; you need to upgrade.
https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md#clickhouse-release-v211215-stable-2021-01-18
ClickHouse release v21.1.2.15-stable 2021-01-18
Allow specifying TTL to remove old entries from system log tables, using the <ttl> attribute in config.xml. #17438 (Du Chuan).
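Not part of the answer above, but once you are on a version that supports <ttl>, a quick way to check that the TTL was picked up is to inspect the table's DDL. Here is a minimal sketch using the clickhouse-driver Python package (host and credentials are illustrative; as you already observed with the <engine> approach, the settings only take effect when ClickHouse creates a fresh query_log table):

from clickhouse_driver import Client

client = Client("localhost")  # illustrative host; default port and user assumed

# the TTL clause from <ttl> should appear in the generated table definition
ddl = client.execute("SHOW CREATE TABLE system.query_log")[0][0]
print(ddl)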

Related

Is there a way to update with an existing field value using the JDBC Sink Connector?

I want to implement a page view count. On each visit to a page, an event is published to Kafka; the event includes a pageId and a date.
I want to use the JDBC connector to increment the page count for that pageId and date.
Is this possible with the JDBC Sink connector? How should I proceed?
Yes, you can set insert.mode to upsert or update rather than the default (insert).
Keep in mind that the generated database query will overwrite the count field, not increment it (that is not how the UPDATE it issues works), so you must run some other process that sums the total counts before writing to the database.
https://docs.confluent.io/kafka-connect-jdbc/current/sink-connector/sink_config_options.html#writes
https://rmoff.net/2021/03/12/kafka-connect-jdbc-sink-deep-dive-working-with-primary-keys/
You could also remove the count completely from the Kafka data and just have a table of "page view logs", then run SELECT date, page, COUNT(*) FROM page_view_logs GROUP BY date, page; directly in the database.
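Not from the original answer, but here is a minimal sketch of registering such a sink through the Kafka Connect REST API from Python; the connector name, topic, target JDBC URL, and the Connect endpoint are illustrative assumptions:

import requests

# hypothetical connector: upserts page-count rows keyed by pageId and date
config = {
    "name": "pageviews-jdbc-sink",  # illustrative name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://db:5432/stats",  # illustrative JDBC URL
        "topics": "pageviews",
        "insert.mode": "upsert",    # overwrite the existing row instead of inserting a new one
        "pk.mode": "record_value",  # primary key taken from fields in the message value
        "pk.fields": "pageId,date",
        "auto.create": "true",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=config)
resp.raise_for_status()

As noted above, the count in the message value must already be the summed total; the upsert simply overwrites the existing row for that pageId and date.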

Field's Max and Min values as Default values in Control

I have two controls, Start Date and End Date, and I would like the min and max of a particular field to be used as their default values. Is there any way to do this? I tried creating a calculated field such as max({field}, [], PRE_FILTER) or min({field}, [], PRE_FILTER), but later realized that a calculated field can't be used as a parameter's default. I'm using the Standard Edition. Any help/idea is much appreciated.
I ran into a similar question recently and developed a workaround by connecting to my Redshift cluster, which required two things:
A table housing all users for the dashboard in question
A table that houses the metrics I'm setting defaults on
I created a separate dataset for setting default parameters. It contains the complete list of users along with the min/max values obtained by querying the second table. Something like:
SELECT USER_NAME
, MIN_METRIC
, MAX_METRIC
FROM USERS A
CROSS JOIN (SELECT MIN(METRIC_VALUE) MIN_METRIC
, MAX(METRIC_VALUE) MAX_METRIC
FROM METRIC_TABLE) B
Once you've built this new data set, you'd add it to your existing analysis and utilize it for setting default parameters, adding the controls, and setting the filters to key off of them.
The downside to this approach is that it requires an exhaustive user list: any user missing from it would see whatever the non-dynamic defaults are. With an appropriate user table, though, this shouldn't be an issue.

User_Sessions in Vertica

I have a requirement to capture user_sessions details for the last few months. When I query the user_sessions table, I only have information for the last three or four days. Is there any way to get user_sessions details for the last 6 months?
Thank you,
Sadagopan
user_sessions is a view on top of three different Data Collector (DC) tables. The DC tables record many kinds of events and activity in Vertica, and this information is persisted on disk with a default retention period.
You have two main options for keeping a 6-month historical view of your sessions:
1. Change the retention period of the relevant DC tables to 6 months.
2. Develop a script or process that runs every few days and merges the contents of user_sessions into a user-defined local table (a sketch follows the example below).
For option #1 you need to run the API below for each of the DC tables (be careful: this option requires extra disk space on the Vertica side).
SELECT set_data_collector_time_policy('SessionEnds', '6 months'::interval);
SELECT set_data_collector_time_policy('SessionStarts', '6 months'::interval);
SELECT set_data_collector_time_policy('RuntimePriorityChanges', '6 months'::interval);
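For option #2, here is a minimal Python sketch using the vertica_python client; the connection details, the session_history table name, and the use of login_timestamp as the incremental filter are illustrative assumptions:

import vertica_python

# illustrative connection details
conn_info = {"host": "localhost", "port": 5433, "user": "dbadmin",
             "password": "", "database": "mydb"}

conn = vertica_python.connect(**conn_info)
cur = conn.cursor()
# one-time setup: an empty table with the same structure as the view
cur.execute("CREATE TABLE IF NOT EXISTS session_history AS "
            "SELECT * FROM v_monitor.user_sessions LIMIT 0")
# append only the sessions newer than the latest one already archived
cur.execute("INSERT INTO session_history "
            "SELECT * FROM v_monitor.user_sessions "
            "WHERE login_timestamp > (SELECT COALESCE(MAX(login_timestamp), "
            "'1970-01-01'::timestamp) FROM session_history)")
conn.commit()
conn.close()

Scheduling something like this every few days (with cron, for example) keeps session_history growing even after the DC tables roll over.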

MAX() SQL Equivalent on Redis

I'm new to Redis, and I'm having trouble improving my stats application. The current SQL used to generate the statistics is:
SELECT MIN(created_at), MAX(created_at) FROM table ORDER BY id DESC limit 10000
It returns the MIN and MAX values of the created_at field.
I have read about ranges and scores in Redis, and it seems they could be used to solve this problem. But I'm still confused about how to score the last 10,000 records. Can they be used here, or is there another way to do this with Redis?
Regards
Your goal is somewhat unclear - are you looking to store all the records in Redis? If so, what other columns does that table have, and what other queries do you run against it?
I'll take your question at face value, but note that in most NoSQL databases (Redis included) you need to store your data according to how you plan on fetching it. Assuming that you want to get the min/max creation dates of the last 10K records, I suggest that you keep them in a Sorted Set. The Sorted Set's members will be the unique id and their scores will be the creation date (use the epoch value), for example, rows with ids 1, 2 & 3 were created at dates 10, 100 & 1000 respectively:
ZADD table 10 1 100 2 1000 3 ...
Getting the minimal creation date is easy now - just do ZRANGE table 0 0 WITHSCORES - and the max is just a ZRANGE table -1 -1 WITHSCORES away. The only "tricky" part is making sure that the Sorted Set is kept updated: for every new record you'll need to remove the oldest (lowest-scored) entry from the set and add the new one. In Python (using redis-py) this would look something like the following:
import redis
r = redis.Redis()

def updateMinMaxSortedSet(id, date):
    # member = row id, score = creation date as a Unix epoch
    if r.zcard('table') >= 10000:
        r.zremrangebyrank('table', 0, 0)  # drop the oldest entry
    r.zadd('table', {id: date})
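Reading the min and max back with redis-py is then a one-liner each (a sketch assuming the 'table' key from above is non-empty):

min_id, min_created = r.zrange('table', 0, 0, withscores=True)[0]    # oldest creation date
max_id, max_created = r.zrange('table', -1, -1, withscores=True)[0]  # newest creation date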

Date difference in terms of days using MongoTemplate

I have 3 fields in my MongoDB collection: days (long), startDate (java.util.Date), and endDate (java.util.Date). I want to fetch the records between startDate and (endDate - days), OR where (endDate - startDate) <= days.
Can you please let me know how I could achieve this using Spring's MongoTemplate?
I don't want to fetch all the records from the collection and then resolve this on the Java side, since in the future the collection may have millions of records.
Thanks
Jitender
There is no way to do this in the query on the DB side (the end-minus-start part). If this is an important feature for your application, I recommend altering the schema to maintain, in each document, the delta between the two fields in the format you need. You can update that field whenever you update endDate (or, if you populate both dates at the same time, compute the field then).
If you receive this data in bulk from another source, or if you do multi-updates of endDate, you will probably need another job that runs periodically and computes the delta for the documents where it has not been computed yet (you can start by always setting the delta to 99999 and have this job update it to the accurate value once endDate is set).
While you could use a $where clause, it would be a very slow full collection scan, so I would not suggest it - it's better to come up with a more performant alternative even if that requires altering the schema.
http://docs.mongodb.org/manual/reference/operator/where/
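Not part of the original answer, but here is a minimal Python sketch of the periodic job described above, using pymongo; the database, collection, and delta field names are illustrative assumptions:

from pymongo import MongoClient

coll = MongoClient()["mydb"]["records"]  # hypothetical database and collection names

# recompute the delta, in days, for documents still carrying the 99999 sentinel
for doc in coll.find({"delta": 99999, "endDate": {"$ne": None}}):
    delta_days = (doc["endDate"] - doc["startDate"]).days
    coll.update_one({"_id": doc["_id"]}, {"$set": {"delta": delta_days}})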
