How can I calculate the points of each user per day, as the sum of all points from the beginning up to that day, in ClickHouse?

I have this data in ClickHouse:
The final point total of each user on a day is sum(point) from the beginning up to that day.
E.g. the points of user 1 on 2020-07-02 are 800, and on 2020-07-03 they are 200.
I need this result: the points of each user per day:

select uid, d, t
from
(
    select
        uid,
        groupArray(date) dg,
        arrayCumSum(groupArray(spt)) gt
    from
    (
        select uid, date, sum(pt) spt
        from
        (
            select 1 tid, '2020-07-01' date, 1 uid, 500 pt
            union all
            select 1 tid, '2020-07-02' date, 1 uid, 300 pt
            union all
            select 1 tid, '2020-07-03' date, 1 uid, -600 pt
        )
        group by uid, date
        order by uid, date
    )
    group by uid
)
array join dg as d, gt as t
┌─uid─┬─d──────────┬───t─┐
│ 1 │ 2020-07-01 │ 500 │
│ 1 │ 2020-07-02 │ 800 │
│ 1 │ 2020-07-03 │ 200 │
└─────┴────────────┴─────┘
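On recent ClickHouse versions the same running total can be written with a window function, which avoids the groupArray / ARRAY JOIN round-trip. A sketch over the same emulated dataset (window functions were experimental at first; older 21.x versions may need SET allow_experimental_window_functions = 1):

```sql
select
    uid,
    date,
    -- running total of daily points per user
    sum(spt) over (partition by uid order by date) as t
from
(
    select uid, date, sum(pt) spt
    from
    (
        select 1 tid, '2020-07-01' date, 1 uid, 500 pt
        union all
        select 1 tid, '2020-07-02' date, 1 uid, 300 pt
        union all
        select 1 tid, '2020-07-03' date, 1 uid, -600 pt
    )
    group by uid, date
)
order by uid, date
```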


Clickhouse SQL Query: Average in intervals

I have a table:
deviceId, valueDateTime, value, valueType
where valueType is temperature, pressure, etc.
I have several query parameters: begin, end (the period), and a time interval (for example, 20 minutes).
I want to get charts for the period for each deviceId and valueType, with a series of average values for each interval in the period.
EDIT:
The above is the final task; at the moment I am just experimenting with it on https://play.clickhouse.tech/?file=playground, where I am trying to solve a similar task: calculating the average Age in 20-minute intervals grouped by the Title field. My problem is how to add the grouping by Title.
-- 2013-07-15 00:00:00 - begin
-- 2013-07-16 00:00:00 - end
-- 1200 - 20-minute interval expressed in seconds
SELECT t, avg(Age) AS Age
FROM
(
    SELECT
        arrayJoin(
            arrayMap(x -> addSeconds(toDateTime('2013-07-15 00:00:00'), x * 1200),
                range(toUInt64(dateDiff('second', toDateTime('2013-07-15 00:00:00'), toDateTime('2013-07-16 00:00:00')) / 1200)))
        ) AS t,
        null AS Age
    UNION ALL
    SELECT
        addSeconds(
            toDateTime('2013-07-15 00:00:00'),
            1200 * intDivOrZero(dateDiff('second', toDateTime('2013-07-15 00:00:00'), EventTime), 1200)) AS t,
        avg(Age) AS Age
    FROM `hits_100m_obfuscated`
    WHERE EventTime BETWEEN toDateTime('2013-07-15 00:00:00') AND toDateTime('2013-07-16 00:00:00')
    GROUP BY t
)
GROUP BY t ORDER BY t;
EDIT 2:
The correct answer from vladimir, adapted and tested on https://play.clickhouse.tech/?file=playground:
SELECT
    Title,      -- as deviceId
    JavaEnable, -- as valueType
    groupArray((rounded_time, avg_value)) values
FROM
(
    WITH 60 * 20 AS interval
    SELECT
        Title,
        JavaEnable,
        toDateTime(intDiv(toUInt32(EventTime), interval) * interval)
            AS rounded_time, -- EventTime as valueDateTime
        avg(Age) avg_value   -- Age as value
    FROM `hits_100m_obfuscated`
    WHERE EventTime BETWEEN toDateTime('2013-07-15 00:00:00')
        AND toDateTime('2013-07-16 00:00:00')
    GROUP BY
        Title,
        JavaEnable,
        rounded_time
    ORDER BY rounded_time
)
GROUP BY
    Title,
    JavaEnable
ORDER BY
    Title,
    JavaEnable
Try this query:
SELECT
    deviceId,
    valueType,
    groupArray((rounded_time, avg_value)) values
FROM
(
    WITH 60 * 20 AS interval
    SELECT
        deviceId,
        valueType,
        toDateTime(intDiv(toUInt32(valueDateTime), interval) * interval) AS rounded_time,
        avg(value) avg_value
    FROM
    (
        /* emulate the test dataset */
        SELECT
            number % 4 AS deviceId,
            now() - (number * 60) AS valueDateTime,
            number % 10 AS value,
            if((number % 2) = 1, 'temp', 'pres') AS valueType
        FROM numbers(48)
    )
    /* WHERE valueDateTime >= begin AND valueDateTime < end */
    GROUP BY
        deviceId,
        valueType,
        rounded_time
    ORDER BY rounded_time
)
GROUP BY
    deviceId,
    valueType
ORDER BY
    deviceId,
    valueType
/*
┌─deviceId─┬─valueType─┬─values────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 0 │ pres │ [('2021-02-12 06:00:00',4),('2021-02-12 06:20:00',4),('2021-02-12 06:40:00',4),('2021-02-12 07:00:00',0)] │
│ 1 │ temp │ [('2021-02-12 06:00:00',5),('2021-02-12 06:20:00',5),('2021-02-12 06:40:00',5),('2021-02-12 07:00:00',1)] │
│ 2 │ pres │ [('2021-02-12 06:00:00',4),('2021-02-12 06:20:00',4),('2021-02-12 06:40:00',4)] │
│ 3 │ temp │ [('2021-02-12 06:00:00',5),('2021-02-12 06:20:00',5),('2021-02-12 06:40:00',5)] │
└──────────┴───────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
*/
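The intDiv-based rounding in both queries can also be expressed with the built-in toStartOfInterval function, which says the same thing more directly. A sketch of the playground variant under the same table and period:

```sql
SELECT
    Title,
    JavaEnable,
    -- round each timestamp down to its 20-minute bucket
    toStartOfInterval(EventTime, INTERVAL 20 MINUTE) AS rounded_time,
    avg(Age) AS avg_value
FROM `hits_100m_obfuscated`
WHERE EventTime BETWEEN toDateTime('2013-07-15 00:00:00')
    AND toDateTime('2013-07-16 00:00:00')
GROUP BY
    Title,
    JavaEnable,
    rounded_time
ORDER BY
    Title,
    JavaEnable,
    rounded_time
```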
I would recommend using Grafana to visualize ClickHouse reports (see the Grafana ClickHouse data source).

Group by date with sparkline like data in the one query

I have time-series data from similar hosts stored in a ClickHouse table with the following structure:
event_type | event_date
------------|---------------------
type_1 | 2017-11-09 20:11:28
type_1 | 2017-11-09 20:11:25
type_2 | 2017-11-09 20:11:23
type_2 | 2017-11-09 20:11:21
Each row in the table means the presence of a value of 1 for event_type at that datetime. To quickly assess the situation I need to show the sum (total) plus the last seven daily values (pulse), like this:
event_type | day | total | pulse
------------|------------|-------|-----------------------------
type_1 | 2017-11-09 | 876 | 12,9,23,67,5,34,10
type_2 | 2017-11-09 | 11865 | 267,120,234,425,102,230,150
I tried to get it with one query in the following way, but it failed: the pulse consists of the same repeated value:
with
arrayMap(x -> today() - 7 + x, range(7)) as week_range,
arrayMap(x -> count(event_type), week_range) as pulse
select
event_type,
toDate(event_date) as day,
count() as total,
pulse
from database.table
group by day, event_type
event_type | day | total | pulse
------------|------------|-------|-------------------------------------------
type_1 | 2017-11-09 | 876 | 876,876,876,876,876,876,876
type_2 | 2017-11-09 | 11865 | 11865,11865,11865,11865,11865,11865,11865
Please point out where my mistake is and how to get the desired result.
select
    event_type,
    groupArray(1)(day)[1],   -- latest day per event_type
    arraySum(pulse) total7,  -- total over the last 7 days
    groupArray(7)(cnt) pulse -- last 7 daily counts (most recent first)
from
(
    select
        event_type,
        toDate(event_date) as day,
        count() as cnt
    from database.table
    where day >= today() - 30
    group by event_type, day
    order by event_type, day desc
)
group by event_type
order by event_type
I would consider calculating pulse on the application side; ClickHouse just provides the required data.
The neighbor function can also be used:
SELECT
number,
[neighbor(number, -7), neighbor(number, -6), neighbor(number, -5), neighbor(number, -4), neighbor(number, -3), neighbor(number, -2), neighbor(number, -1)] AS pulse
FROM
(
SELECT number
FROM numbers(10, 15)
ORDER BY number ASC
)
┌─number─┬─pulse──────────────────┐
│ 10 │ [0,0,0,0,0,0,0] │
│ 11 │ [0,0,0,0,0,0,10] │
│ 12 │ [0,0,0,0,0,10,11] │
│ 13 │ [0,0,0,0,10,11,12] │
│ 14 │ [0,0,0,10,11,12,13] │
│ 15 │ [0,0,10,11,12,13,14] │
│ 16 │ [0,10,11,12,13,14,15] │
│ 17 │ [10,11,12,13,14,15,16] │
│ 18 │ [11,12,13,14,15,16,17] │
│ 19 │ [12,13,14,15,16,17,18] │
│ 20 │ [13,14,15,16,17,18,19] │
│ 21 │ [14,15,16,17,18,19,20] │
│ 22 │ [15,16,17,18,19,20,21] │
│ 23 │ [16,17,18,19,20,21,22] │
│ 24 │ [17,18,19,20,21,22,23] │
└────────┴────────────────────────┘
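On versions with window function support, the same trailing window can be taken directly with groupArray used as a window function. A sketch over the same numbers; note that, unlike neighbor, the leading rows produce shorter arrays instead of zero-padded ones:

```sql
SELECT
    number,
    -- collect the previous 7 rows (excluding the current one)
    groupArray(number) OVER (ORDER BY number ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS pulse
FROM numbers(10, 15)
ORDER BY number
```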

Materialized view for calculated results

I have a table like below, where State is a limited set of updates (e.g. Start, End):
CREATE TABLE event_updates (
event_id Int32,
timestamp DateTime,
state String
) ENGINE Log;
And I want to be able quickly run queries like:
SELECT count(*)
FROM (
SELECT event_id,
minOrNullIf(timestamp, state = 'Start') as start,
minOrNullIf(timestamp, state = 'End') as end,
end - start as duration,
duration < 10 as is_fast,
duration > 300 as is_slow
FROM event_updates
GROUP BY event_id)
WHERE start >= '2020-08-20 00:00:00'
AND start < '2020-09-20 00:00:00'
AND is_slow;
But those queries are slow when there is a lot of data, I'm guessing because the calculations are required for every row.
Example data:
┌─event_id─┬───────────timestamp─┬─state─┐
│ 1 │ 2020-08-21 09:58:00 │ Start │
│ 1 │ 2020-08-21 10:18:00 │ End │
│ 2 │ 2020-08-21 10:23:00 │ Start │
│ 2 │ 2020-08-21 10:23:05 │ End │
│ 3 │ 2020-08-21 10:23:00 │ Start │
│ 3 │ 2020-08-21 10:24:00 │ End │
│ 3 │ 2020-08-21 11:24:00 │ End │
│ 4 │ 2020-08-21 10:30:00 │ Start │
└──────────┴─────────────────────┴───────┘
And example query:
SELECT
event_id,
minOrNullIf(timestamp, state = 'Start') AS start,
minOrNullIf(timestamp, state = 'End') AS end,
end - start AS duration,
duration < 10 AS is_fast,
duration > 300 AS is_slow
FROM event_updates
GROUP BY event_id
ORDER BY event_id ASC
┌─event_id─┬───────────────start─┬─────────────────end─┬─duration─┬─is_fast─┬─is_slow─┐
│ 1 │ 2020-08-21 09:58:00 │ 2020-08-21 10:18:00 │ 1200 │ 0 │ 1 │
│ 2 │ 2020-08-21 10:23:00 │ 2020-08-21 10:23:05 │ 5 │ 1 │ 0 │
│ 3 │ 2020-08-21 10:23:00 │ 2020-08-21 10:24:00 │ 60 │ 0 │ 0 │
│ 4 │ 2020-08-21 10:30:00 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
└──────────┴─────────────────────┴─────────────────────┴──────────┴─────────┴─────────┘
What I would like to produce is a pre-calculated table like:
CREATE TABLE event_stats (
event_id Int32,
start Nullable(DateTime),
end Nullable(DateTime),
duration Nullable(Int32),
is_fast Nullable(UInt8),
is_slow Nullable(UInt8)
);
But I can't work out how to create this table with a materialized view or find a better way.
First, I would:
use the MergeTree engine instead of Log to benefit from the sorting key
CREATE TABLE event_updates (
event_id Int32,
timestamp DateTime,
state String
) ENGINE MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, state);
constrain the original dataset by applying a WHERE clause on timestamp and state (your query processed the whole dataset)
SELECT count(*)
FROM (
SELECT event_id,
minOrNullIf(timestamp, state = 'Start') as start,
minOrNullIf(timestamp, state = 'End') as end,
end - start as duration,
duration < 10 as is_fast,
duration > 300 as is_slow
FROM event_updates
WHERE timestamp >= '2020-08-20 00:00:00' AND timestamp < '2020-09-20 00:00:00'
AND state IN ('Start', 'End')
GROUP BY event_id
HAVING start >= '2020-08-20 00:00:00' AND start < '2020-09-20 00:00:00'
AND is_slow);
If these don't help, consider using AggregatingMergeTree to work with precalculated aggregates instead of raw data.
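A sketch of that AggregatingMergeTree route, assuming the start/end timestamps are what needs precalculating (the table and view names are made up; duration and the is_fast / is_slow flags stay cheap to derive at read time):

```sql
-- aggregate states, not raw rows
CREATE TABLE event_stats
(
    event_id   Int32,
    start_time AggregateFunction(minOrNullIf, DateTime, UInt8),
    end_time   AggregateFunction(minOrNullIf, DateTime, UInt8)
)
ENGINE = AggregatingMergeTree
ORDER BY event_id;

-- keeps event_stats up to date as rows arrive in event_updates
CREATE MATERIALIZED VIEW event_stats_mv TO event_stats AS
SELECT
    event_id,
    minOrNullIfState(timestamp, state = 'Start') AS start_time,
    minOrNullIfState(timestamp, state = 'End')   AS end_time
FROM event_updates
GROUP BY event_id;

-- read side: the -Merge combinator finalizes the partial aggregates
SELECT
    event_id,
    minOrNullIfMerge(start_time) AS start,
    minOrNullIfMerge(end_time)   AS end,
    end - start                  AS duration
FROM event_stats
GROUP BY event_id;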

How to sum arrays element by element after group by in clickhouse

I am trying to add an array column element by element after a group by another column.
Having the table A below:
id units
1 [1,1,1]
2 [3,0,0]
1 [5,3,7]
3 [2,5,2]
2 [3,2,6]
I would like to query something like:
select id, sum(units) from A group by id
And get the following result:
id units
1 [6,4,8]
2 [6,2,6]
3 [2,5,2]
Where the units arrays in rows with the same id get added element by element.
Try this query:
SELECT id, sumForEach(units) units
FROM
(
    /* emulate dataset */
    SELECT data.1 id, data.2 units
    FROM
    (
        SELECT arrayJoin([(1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6])]) data
    )
)
GROUP BY id
/* Result
┌─id─┬─units───┐
│ 1 │ [6,4,8] │
│ 2 │ [6,2,6] │
│ 3 │ [2,5,2] │
└────┴─────────┘
*/
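For reference, a sketch of the same query against a real table rather than the emulated dataset (the table name A matches the question; the MergeTree engine and ordering are assumptions):

```sql
CREATE TABLE A
(
    id    UInt32,
    units Array(UInt32)
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO A VALUES (1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6]);

-- sumForEach adds the arrays element by element within each group
SELECT id, sumForEach(units) AS units
FROM A
GROUP BY id
ORDER BY id
```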
