Time comparison in ClickHouse

Maybe I'm missing something simple, but I could not make time filtering work.
Here is my sample query:
select toTimeZone(ts, 'Etc/GMT+2') as z
from (select toDateTime('2019-08-31 20:35:00') AS ts)
where z > '2019-08-31 20:34:00'
I would expect 0 results, but I am getting:
2019-08-31T18:35:00+00:00
Is it a bug, or am I misusing the toTimeZone() function?
Thanks!

ClickHouse stores DateTime as a Unix timestamp, in other words, without a timezone.
But the timezone is taken into account when the SQL query is executed (note that POSIX-style 'Etc/GMT+2' means UTC-2, which is why 20:35 UTC renders as 18:35 below):
SELECT
toDateTime('2019-08-31 20:35:00', 'UTC') AS origin_date,
toTimeZone(origin_date, 'Etc/GMT+2') AS d1,
toTypeName(d1) AS type1,
toUnixTimestamp(d1) AS t1,
toTimeZone(origin_date, 'UTC') AS d2,
toTypeName(d2) AS type2,
toUnixTimestamp(d2) AS t2
FORMAT Vertical
Row 1:
──────
origin_date: 2019-08-31 20:35:00
d1: 2019-08-31 18:35:00
type1: DateTime('Etc/GMT+2')
t1: 1567283700 # <-- t1 == t2
d2: 2019-08-31 20:35:00
type2: DateTime('UTC')
t2: 1567283700 # <-- t1 == t2
Your query works correctly.
To 'reset the timezone' of the z-date, it can be done this way (toString renders z in 'Etc/GMT+2', and toDateTime re-parses that text in the server's default timezone, shifting the underlying timestamp):
SELECT toDateTime(toString(toTimeZone(ts, 'Etc/GMT+2'))) AS z
FROM
(
SELECT toDateTime('2019-08-31 20:35:00') AS ts
)
WHERE z > '2019-08-31 20:34:00'

The timezone (TZ) is a property of the type, not of the value:
DESCRIBE TABLE
(
SELECT
toTimeZone(toDateTime('2019-08-31 20:35:00'), 'Etc/GMT+2') AS x,
toDateTime('2019-08-31 20:35:00') AS y
)
┌─name─┬─type──────────────────┐
│ x    │ DateTime('Etc/GMT+2') │
│ y    │ DateTime              │
└──────┴───────────────────────┘
SELECT toTimeZone(ts, 'Etc/GMT+2') AS z
FROM
(
SELECT toDateTime('2019-08-31 20:35:00') AS ts
)
WHERE z > toDateTime('2019-08-31 20:34:00', 'Etc/GMT+2')
Ok.
0 rows in set. Elapsed: 0.002 sec.

Related

Oracle SQL - Merging two tables with n periods each into one table

I am trying to merge two tables with n periods each into one.
I have the below tables:
Tables
Period1 .. Period750 each represent a date, e.g., Period1 = Jan 1st, Period2 = Jan 2nd, ...
How can we get to that result?
Thank you for the advice,
regards,
Oscar
select
product, startdate
, sum(period1) as period1
, sum(period2) as period2
...
, sum(period750) as period750
from(
select * from table1 union all
select * from table2 union all
...
)
group by product, startdate
Use a MERGE statement:
MERGE INTO table1 t1
USING table2 t2
ON (t1.product = t2.product AND t1.startdate = t2.startdate)
WHEN MATCHED THEN
UPDATE
SET period1 = t1.period1 + t2.period1,
period2 = t1.period2 + t2.period2,
-- ...
period749 = t1.period749 + t2.period749,
period750 = t1.period750 + t2.period750
WHEN NOT MATCHED THEN
INSERT (
product,
startdate,
period1,
period2,
-- ...,
period749,
period750
) VALUES (
t2.product,
t2.startdate,
t2.period1,
t2.period2,
-- ...,
t2.period749,
t2.period750
);

ClickHouse SQL Query: Average in intervals

I have a table:
deviceId, valueDateTime, value, valueType
Where the valueType - temperature, pressure, etc.
I have several parameters for the query: begin and end (the period), and a time interval (for example, 20 minutes).
I want to get charts for the period, for each deviceId and valueType, with a series of average values for each interval in the period.
EDIT:
Above is the final task; at the moment I am just experimenting with it. I am using https://play.clickhouse.tech/?file=playground, where I am trying to solve a similar task: I want to calculate the average Age in each time interval, grouped by the Title field. And I have a problem: how do I add grouping by Title?
-- 2013-07-15 00:00:00 - begin
-- 2013-07-16 00:00:00 - end
-- 1200 - average in interval 20m
SELECT t, avg(Age) as Age FROM (
SELECT
arrayJoin(
arrayMap(x -> addSeconds(toDateTime('2013-07-15 00:00:00'), x * 1200),
range(toUInt64(dateDiff('second', toDateTime('2013-07-15 00:00:00'), toDateTime('2013-07-16 00:00:00'))/1200)))
) as t,
null as Age
UNION ALL
SELECT
(addSeconds(
toDateTime('2013-07-15 00:00:00'),
1200 * intDivOrZero(dateDiff('second', toDateTime('2013-07-15 00:00:00'), EventTime), 1200))
) as t,
avg(Age) as Age
FROM `hits_100m_obfuscated`
WHERE EventTime BETWEEN toDateTime('2013-07-15 00:00:00') AND toDateTime('2013-07-16 00:00:00')
GROUP BY t
)
GROUP BY t ORDER BY t;
EDIT 2:
The correct answer from vladimir, adapted and tested on https://play.clickhouse.tech/?file=playground:
SELECT
Title, -- as deviceId
JavaEnable, -- as valueType
groupArray((rounded_time, avg_value)) values
FROM (
WITH 60 * 20 AS interval
SELECT
Title,
JavaEnable,
toDateTime(intDiv(toUInt32(EventTime), interval) * interval)
AS rounded_time, -- EventTime as valueDateTime
avg(Age) avg_value -- Age as value
FROM `hits_100m_obfuscated`
WHERE
EventTime BETWEEN toDateTime('2013-07-15 00:00:00')
AND toDateTime('2013-07-16 00:00:00')
GROUP BY
Title,
JavaEnable,
rounded_time
ORDER BY rounded_time
)
GROUP BY
Title,
JavaEnable
ORDER BY
Title,
JavaEnable
Try this query:
SELECT
deviceId,
valueType,
groupArray((rounded_time, avg_value)) values
FROM (
WITH 60 * 20 AS interval
SELECT
deviceId,
valueType,
toDateTime(intDiv(toUInt32(valueDateTime), interval) * interval) AS rounded_time,
avg(value) avg_value
FROM
(
/* emulate the test dataset */
SELECT
number % 4 AS deviceId,
now() - (number * 60) AS valueDateTime,
number % 10 AS value,
if((number % 2) = 1, 'temp', 'pres') AS valueType
FROM numbers(48)
)
/*WHERE valueDateTime >= begin AND valueDateTime < end */
GROUP BY
deviceId,
valueType,
rounded_time
ORDER BY rounded_time
)
GROUP BY
deviceId,
valueType
ORDER BY
deviceId,
valueType
/*
┌─deviceId─┬─valueType─┬─values────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 0 │ pres │ [('2021-02-12 06:00:00',4),('2021-02-12 06:20:00',4),('2021-02-12 06:40:00',4),('2021-02-12 07:00:00',0)] │
│ 1 │ temp │ [('2021-02-12 06:00:00',5),('2021-02-12 06:20:00',5),('2021-02-12 06:40:00',5),('2021-02-12 07:00:00',1)] │
│ 2 │ pres │ [('2021-02-12 06:00:00',4),('2021-02-12 06:20:00',4),('2021-02-12 06:40:00',4)] │
│ 3 │ temp │ [('2021-02-12 06:00:00',5),('2021-02-12 06:20:00',5),('2021-02-12 06:40:00',5)] │
└──────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
*/
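A side note: on recent ClickHouse versions, toStartOfInterval should give the same bucketing as the manual intDiv arithmetic; a minimal sketch of the inner aggregation, assuming the same playground table:
SELECT
    Title,
    JavaEnable,
    -- equivalent to toDateTime(intDiv(toUInt32(EventTime), 1200) * 1200)
    toStartOfInterval(EventTime, INTERVAL 20 minute) AS rounded_time,
    avg(Age) AS avg_value
FROM `hits_100m_obfuscated`
WHERE EventTime BETWEEN toDateTime('2013-07-15 00:00:00')
    AND toDateTime('2013-07-16 00:00:00')
GROUP BY Title, JavaEnable, rounded_time
ORDER BY rounded_time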
I would recommend using Grafana to visualize CH reports (see the Grafana ClickHouse datasource).

Time series query based on another table

Initial data
CREATE TABLE a_table (
id UInt8,
created_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
CREATE TABLE b_table (
id UInt8,
started_at DateTime,
stopped_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
INSERT INTO a_table (id, created_at) VALUES
(1, '2020-01-01 00:00:00'),
(2, '2020-01-02 00:00:00'),
(3, '2020-01-03 00:00:00')
;
INSERT INTO b_table (id, started_at, stopped_at) VALUES
(1, '2020-01-01 00:00:00', '2020-01-01 23:59:59'),
(2, '2020-01-02 00:00:00', '2020-01-02 23:59:59'),
(3, '2020-01-04 00:00:00', '2020-01-04 23:59:59')
;
Expected result: the 'a_table' rows matching the condition
b_table.started_at >= a_table.created_at AND
b_table.stopped_at <= a_table.created_at
+----+---------------------+
| id | created_at |
+----+---------------------+
| 1 | 2020-01-01 00:00:00 |
+----+---------------------+
| 2 | 2020-01-02 00:00:00 |
+----+---------------------+
What have I tried
-- No errors, empty result
SELECT a_table.*
FROM a_table
INNER JOIN b_table
ON b_table.id = a_table.id
WHERE b_table.started_at >= a_table.created_at
AND b_table.stopped_at <= a_table.created_at
;
SELECT a_table.*
FROM a_table
ASOF INNER JOIN (
SELECT * FROM b_table
) q
ON q.id = a_table.id
AND q.started_at >= a_table.created_at
-- Error:
-- Invalid expression for JOIN ON.
-- ASOF JOIN expects exactly one inequality in ON section,
-- unexpected stopped_at <= created_at.
-- AND q.stopped_at <= a_table.created_at
;
WHERE b_table.started_at >= a_table.created_at
AND b_table.stopped_at <= a_table.created_at
The condition is wrong: >= <= should be <= >=. Tested on ClickHouse 20.8.7.15:
SELECT
a_table.*,
b_table.*
FROM a_table
INNER JOIN b_table ON b_table.id = a_table.id
WHERE (b_table.started_at <= a_table.created_at) AND (b_table.stopped_at >= a_table.created_at)
┌─id─┬──────────created_at─┬─b_table.id─┬──────────started_at─┬──────────stopped_at─┐
│ 1 │ 2020-01-01 00:00:00 │ 1 │ 2020-01-01 00:00:00 │ 2020-01-01 23:59:59 │
│ 2 │ 2020-01-02 00:00:00 │ 2 │ 2020-01-02 00:00:00 │ 2020-01-02 23:59:59 │
└────┴─────────────────────┴────────────┴─────────────────────┴─────────────────────┘
In real production such queries would not work, because JOIN is very slow.
It needs a re-design. It is hard to say how without knowing why you have the second table. Probably I would use a range_hashed external dictionary.
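For illustration, a rough sketch of that approach (the dictionary name and lifetime here are made up, the key must be widened to UInt64, and dictHas with the extra range argument assumes a ClickHouse version that supports range lookups in dictHas):
CREATE DICTIONARY b_ranges_dict -- hypothetical name
(
    id UInt64,
    started_at DateTime,
    stopped_at DateTime
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'b_table'))
LIFETIME(MIN 300 MAX 600)
LAYOUT(RANGE_HASHED())
RANGE(MIN started_at MAX stopped_at);
-- the per-row lookup becomes a hash probe instead of a JOIN
SELECT *
FROM a_table
WHERE dictHas('b_ranges_dict', toUInt64(id), created_at);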

How can I calculate the points of each user per day, summing all the points from the beginning to that day, in ClickHouse

I have this data in ClickHouse:
The final points of each user per day are the sum(point) from the beginning to that day.
E.g.: the points of user 1 on 2020-07-02 are 800, and on 2020-07-03 they are 200.
I need this result, the points of each user per day:
select uid, d, t
from
(
    select uid, groupArray(date) dg, arrayCumSum(groupArray(spt)) gt
    from
    (
        select uid, date, sum(pt) spt
        from
        (
            select 1 tid, '2020-07-01' date, 1 uid, 500 pt
            union all
            select 1 tid, '2020-07-02' date, 1 uid, 300 pt
            union all
            select 1 tid, '2020-07-03' date, 1 uid, -600 pt
        )
        group by uid, date
        order by uid, date
    )
    group by uid
)
array join dg as d, gt as t
┌─uid─┬─d──────────┬───t─┐
│ 1 │ 2020-07-01 │ 500 │
│ 1 │ 2020-07-02 │ 800 │
│ 1 │ 2020-07-03 │ 200 │
└─────┴────────────┴─────┘
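For what it's worth, on ClickHouse versions with window function support (21.x and later; older releases may need the experimental window functions setting enabled), the same running total can be sketched without arrays, reusing the inline sample data from above:
select uid, date, sum(spt) over (partition by uid order by date) as t
from
(
    -- daily totals first, then a cumulative sum per user
    select uid, date, sum(pt) spt
    from
    (
        select 1 tid, '2020-07-01' date, 1 uid, 500 pt
        union all
        select 1 tid, '2020-07-02' date, 1 uid, 300 pt
        union all
        select 1 tid, '2020-07-03' date, 1 uid, -600 pt
    )
    group by uid, date
)
order by uid, date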

Ora-00932 - expected NUMBER got -

I have been running the below query without issue:
with Nums (NN) as
(
select 0 as NN
from dual
union all
select NN+1 -- (1)
from Nums
where NN < 30
)
select null as errormsg, trunc(sysdate)-NN as the_date, count(id) as the_count
from Nums
left join
(
SELECT c1.id, trunc(c1.c_date) as c_date
FROM table1 c1
where c1.c_date > trunc(sysdate) - 30
UNION
SELECT c2.id, trunc(c2.c_date)
FROM table2 c2
where c2.c_date > trunc(sysdate) -30
) x1
on x1.c_date = trunc(sysdate)-Nums.NN
group by trunc(sysdate)-Nums.NN
However, when I try to pop this in a proc for SSRS use:
procedure pr_do_the_thing (RefCur out sys_refcursor)
is
oops varchar2(100);
begin
open RefCur for
-- see above query --
;
end pr_do_the_thing;
I get
Error(): PL/SQL: ORA-00932: inconsistent datatypes: expected NUMBER got -
Any thoughts? Like I said above, as a query there is no issue. As a proc, the error appears at note (1) in the query.
This seems to be bug 18139621 (see MOS Doc ID 2003626.1). There is a patch available, but if this is the only place you encounter this, it might be simpler to switch to a hierarchical query:
with Nums (NN) as
(
select level - 1
from dual
connect by level <= 31
)
...
You could also calculate the dates inside the CTE (this approach also fails when the CTE is recursive):
with Dates (DD) as
(
select trunc(sysdate) - level + 1
from dual
connect by level <= 31
)
select null as errormsg, DD as the_date, count(id) as the_count
from Dates
left join
(
SELECT c1.id, trunc(c1.c_date) as c_date
FROM table1 c1
where c1.c_date > trunc(sysdate) - 30
UNION
SELECT c2.id, trunc(c2.c_date)
FROM table2 c2
where c2.c_date > trunc(sysdate) -30
) x1
on x1.c_date = DD
group by DD;
I'd probably organise it slightly differently, so the subquery doesn't limit the date range directly:
with dates (dd) as
(
select trunc(sysdate) - level + 1
from dual
connect by level <= 31
)
select errormsg, the_date, count(id) as the_count
from (
select null as errormsg, d.dd as the_date, c1.id
from dates d
left join table1 c1 on c1.c_date >= d.dd and c1.c_date < d.dd + 1
union all
select null as errormsg, d.dd as the_date, c2.id
from dates d
left join table2 c2 on c2.c_date >= d.dd and c2.c_date < d.dd + 1
)
group by errormsg, the_date;
but as always with these things, check the performance of each approach...
Also notice that I've switched from union to union all. If an ID could appear more than once on the same day, in the same table or across both tables, then the counts will be different - you need to decide whether you want to count them once or as many times as they appear. That applies to your original query too.
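To make the difference concrete, a tiny self-contained Oracle sketch: UNION deduplicates identical rows across branches, while UNION ALL keeps them, so a repeated (id, date) pair is counted once versus twice:
-- UNION: the duplicate row collapses, 1 row returned
SELECT 1 AS id, DATE '2024-01-01' AS c_date FROM dual
UNION
SELECT 1, DATE '2024-01-01' FROM dual;
-- UNION ALL: the duplicate row is kept, 2 rows returned
SELECT 1 AS id, DATE '2024-01-01' AS c_date FROM dual
UNION ALL
SELECT 1, DATE '2024-01-01' FROM dual;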
