Pivot in ClickHouse

I want to do a pivot in ClickHouse. I have data in the form of:
rule_name | result
'string_1', 'result_1'
'string_2', 'result_2'
'string_3', 'result_3'
'string_4', 'result_4'
I want to pivot it so that string_1, string_2, ... become columns, and the result has four columns and one row (result_1, result_2, result_3, result_4):
┌─string_1─┬─string_2─┬─string_3─┬─string_4─┐
│ result_1 │ result_2 │ result_3 │ result_4 │
└──────────┴──────────┴──────────┴──────────┘
How do I achieve this?

SELECT
    anyIf(result, rule_name = 'string_1') AS string_1,
    anyIf(result, rule_name = 'string_2') AS string_2,
    anyIf(result, rule_name = 'string_3') AS string_3,
    anyIf(result, rule_name = 'string_4') AS string_4
FROM (
    /* emulate the dataset */
    SELECT 'string_1' AS rule_name, 'result_1' AS result
    UNION ALL SELECT 'string_2', 'result_2'
    UNION ALL SELECT 'string_3', 'result_3'
    UNION ALL SELECT 'string_4', 'result_4')
┌─string_1─┬─string_2─┬─string_3─┬─string_4─┐
│ result_1 │ result_2 │ result_3 │ result_4 │
└──────────┴──────────┴──────────┴──────────┘
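The conditional-aggregation idea behind anyIf can be sketched in plain Python for illustration (anyIf returns some value from the rows matching the condition; here we simply take the first match):

```python
rows = [
    ("string_1", "result_1"),
    ("string_2", "result_2"),
    ("string_3", "result_3"),
    ("string_4", "result_4"),
]

def any_if(rows, rule):
    """Mimic anyIf(result, rule_name = rule): any matching result, else None."""
    return next((result for name, result in rows if name == rule), None)

# one dict key per rule_name = one output column, one row of values
pivoted = {rule: any_if(rows, rule)
           for rule in ("string_1", "string_2", "string_3", "string_4")}
print(pivoted)
```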


Sum arrays at the same index in ClickHouse

I am trying to add an array column element by element after grouping by another column. Given table A below:
id units
1 [1,1,1]
2 [3,0,0]
1 [5,3,7]
3 [2,5,2]
2 [3,2,6]
I would like to query something like:
select id, sum(units) from A group by id
And get the following result:
id units
1 [6,4,8]
2 [6,2,6]
3 [2,5,2]
Where the units arrays in rows with the same id get added element by element.
Try this query:
SELECT id, sumForEach(units) units
FROM (
/* emulate dataset */
SELECT data.1 id, data.2 units
FROM (
SELECT arrayJoin([(1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6])]) data))
GROUP BY id
/* Result
┌─id─┬─units───┐
│ 1 │ [6,4,8] │
│ 2 │ [6,2,6] │
│ 3 │ [2,5,2] │
└────┴─────────┘
*/
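The effect of sumForEach after GROUP BY id is element-wise addition of the arrays within each group; the same logic, sketched in Python for illustration:

```python
from collections import defaultdict

table = [(1, [1, 1, 1]), (2, [3, 0, 0]), (1, [5, 3, 7]),
         (3, [2, 5, 2]), (2, [3, 2, 6])]

sums = defaultdict(lambda: [0, 0, 0])
for row_id, units in table:
    # element-wise accumulation, like sumForEach(units) per group
    sums[row_id] = [a + b for a, b in zip(sums[row_id], units)]

print(dict(sorted(sums.items())))
```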

How to configure ClickHouse to return NULL instead of 0?

Let's say I have a table created as such without any record:
create table metric (date Int32) Engine=MergeTree ORDER BY (date);
If I run this query
select max(date) from metric;
ClickHouse returns
+-----------+
| max(date) |
+-----------+
|         0 |
+-----------+
1 row in set (0.02 sec)
instead of
+-----------+
| max(date) |
+-----------+
|      NULL |
+-----------+
1 row in set (0.02 sec)
Is it possible to configure ClickHouse to return NULL without having to write the query like this:
select max(toNullable(date)) from metric;
Use setting aggregate_functions_null_for_empty:
SELECT max(date)
FROM metric
SETTINGS aggregate_functions_null_for_empty = 1
/*
┌─maxOrNull(date)─┐
│            ᴺᵁᴸᴸ │
└─────────────────┘
*/
or consider using OrNull-combinator:
SELECT maxOrNull(date)
FROM metric
/*
┌─maxOrNull(date)─┐
│            ᴺᵁᴸᴸ │
└─────────────────┘
*/
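The OrNull combinator's behavior is analogous to Python's max with a default value: an empty input yields None rather than a type's default zero (a loose analogy, for illustration only):

```python
dates = []  # no rows, like the empty metric table

# plain max(dates) would raise ValueError; default=None mirrors maxOrNull
result = max(dates, default=None)
print(result)
```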

Time series query based on another table

Initial data
CREATE TABLE a_table (
id UInt8,
created_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
CREATE TABLE b_table (
id UInt8,
started_at DateTime,
stopped_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
INSERT INTO a_table (id, created_at) VALUES
(1, '2020-01-01 00:00:00'),
(2, '2020-01-02 00:00:00'),
(3, '2020-01-03 00:00:00')
;
INSERT INTO b_table (id, started_at, stopped_at) VALUES
(1, '2020-01-01 00:00:00', '2020-01-01 23:59:59'),
(2, '2020-01-02 00:00:00', '2020-01-02 23:59:59'),
(3, '2020-01-04 00:00:00', '2020-01-04 23:59:59')
;
Expected result: The 'a_table' rows by condition
b_table.started_at >= a_table.created_at AND
b_table.stopped_at <= a_table.created_at
+----+---------------------+
| id | created_at |
+----+---------------------+
| 1 | 2020-01-01 00:00:00 |
+----+---------------------+
| 2 | 2020-01-02 00:00:00 |
+----+---------------------+
What have I tried:
-- No errors, empty result
SELECT a_table.*
FROM a_table
INNER JOIN b_table
ON b_table.id = a_table.id
WHERE b_table.started_at >= a_table.created_at
AND b_table.stopped_at <= a_table.created_at
;
SELECT a_table.*
FROM a_table
ASOF INNER JOIN (
SELECT * FROM b_table
) q
ON q.id = a_table.id
AND q.started_at >= a_table.created_at
-- Error:
-- Invalid expression for JOIN ON.
-- ASOF JOIN expects exactly one inequality in ON section,
-- unexpected stopped_at <= created_at.
-- AND q.stopped_at <= a_table.created_at
;
The condition is reversed: the comparisons >= and <= should be
WHERE b_table.started_at <= a_table.created_at
AND b_table.stopped_at >= a_table.created_at
Tested on ClickHouse 20.8.7.15:
SELECT
a_table.*,
b_table.*
FROM a_table
INNER JOIN b_table ON b_table.id = a_table.id
WHERE (b_table.started_at <= a_table.created_at) AND (b_table.stopped_at >= a_table.created_at)
┌─id─┬──────────created_at─┬─b_table.id─┬──────────started_at─┬──────────stopped_at─┐
│ 1 │ 2020-01-01 00:00:00 │ 1 │ 2020-01-01 00:00:00 │ 2020-01-01 23:59:59 │
│ 2 │ 2020-01-02 00:00:00 │ 2 │ 2020-01-02 00:00:00 │ 2020-01-02 23:59:59 │
└────┴─────────────────────┴────────────┴─────────────────────┴─────────────────────┘
In real production such queries would not work well, because this JOIN is very slow. It needs a re-design; it is hard to say how without knowing why you have the second table. I would probably use a rangeHashed external dictionary.
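The corrected condition is an interval-containment check, started_at <= created_at <= stopped_at; sketched in Python against the sample data:

```python
from datetime import datetime

a_table = {1: "2020-01-01 00:00:00", 2: "2020-01-02 00:00:00",
           3: "2020-01-03 00:00:00"}
b_table = {1: ("2020-01-01 00:00:00", "2020-01-01 23:59:59"),
           2: ("2020-01-02 00:00:00", "2020-01-02 23:59:59"),
           3: ("2020-01-04 00:00:00", "2020-01-04 23:59:59")}

def ts(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

# keep a_table rows whose created_at falls inside the joined b_table interval
matched = [i for i, created in a_table.items()
           if ts(b_table[i][0]) <= ts(created) <= ts(b_table[i][1])]
print(matched)
```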

Group by date with sparkline like data in the one query

I have time-series data from similar hosts stored in a ClickHouse table with the following structure:
event_type  | event_date
------------|---------------------
type_1 | 2017-11-09 20:11:28
type_1 | 2017-11-09 20:11:25
type_2 | 2017-11-09 20:11:23
type_2 | 2017-11-09 20:11:21
Each row in the table records one occurrence of event_type at that datetime. To quickly assess the situation I need the overall sum (total) plus the last seven daily values (pulse), like this:
event_type | day | total | pulse
------------|------------|-------|-----------------------------
type_1 | 2017-11-09 | 876 | 12,9,23,67,5,34,10
type_2 | 2017-11-09 | 11865 | 267,120,234,425,102,230,150
I tried to get it with one query in the following way, but it failed: the pulse consists of identical values:
with
arrayMap(x -> today() - 7 + x, range(7)) as week_range,
arrayMap(x -> count(event_type), week_range) as pulse
select
event_type,
toDate(event_date) as day,
count() as total,
pulse
from database.table
group by day, event_type
event_type | day | total | pulse
------------|------------|-------|-------------------------------------------
type_1 | 2017-11-09 | 876 | 876,876,876,876,876,876,876
type_2 | 2017-11-09 | 11865 | 11865,11865,11865,11865,11865,11865,11865
Please point out where my mistake is and how to get the desired result.
SELECT event_type, groupArray(1)(day)[1] AS day, arraySum(pulse) AS total7, groupArray(7)(cnt) AS pulse
FROM (
    SELECT
        event_type,
        toDate(event_date) AS day,
        count() AS cnt
    FROM database.table
    WHERE day >= today() - 30
    GROUP BY event_type, day
    ORDER BY event_type, day DESC
)
GROUP BY event_type
ORDER BY event_type
I would consider calculating pulse on the application side, with ClickHouse just providing the required data.
Alternatively, the neighbor window function can be used:
SELECT
number,
[neighbor(number, -7), neighbor(number, -6), neighbor(number, -5), neighbor(number, -4), neighbor(number, -3), neighbor(number, -2), neighbor(number, -1)] AS pulse
FROM
(
SELECT number
FROM numbers(10, 15)
ORDER BY number ASC
)
┌─number─┬─pulse──────────────────┐
│ 10 │ [0,0,0,0,0,0,0] │
│ 11 │ [0,0,0,0,0,0,10] │
│ 12 │ [0,0,0,0,0,10,11] │
│ 13 │ [0,0,0,0,10,11,12] │
│ 14 │ [0,0,0,10,11,12,13] │
│ 15 │ [0,0,10,11,12,13,14] │
│ 16 │ [0,10,11,12,13,14,15] │
│ 17 │ [10,11,12,13,14,15,16] │
│ 18 │ [11,12,13,14,15,16,17] │
│ 19 │ [12,13,14,15,16,17,18] │
│ 20 │ [13,14,15,16,17,18,19] │
│ 21 │ [14,15,16,17,18,19,20] │
│ 22 │ [15,16,17,18,19,20,21] │
│ 23 │ [16,17,18,19,20,21,22] │
│ 24 │ [17,18,19,20,21,22,23] │
└────────┴────────────────────────┘
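neighbor(number, -k) looks k rows back in the result set and yields 0 when the offset falls outside it; the same trailing seven-value window, sketched in Python:

```python
numbers = list(range(10, 25))  # numbers(10, 15): 15 values starting at 10

pulse = []
for i, n in enumerate(numbers):
    # previous 7 values, zero-padded like an out-of-range neighbor()
    window = [numbers[i + d] if i + d >= 0 else 0 for d in range(-7, 0)]
    pulse.append((n, window))

print(pulse[0], pulse[-1])
```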

Exclude rows based on condition from two columns

My question is very similar to this one, except that I want to exclude all rows whose Name has only a single distinct Location.
If we assume that to be the input.
Name | Location
-------------------
Bob | Shasta
Bob | Leaves
Sean | Leaves
Sean | Leaves
Dylan | Shasta
Dylan | Redwood
Dylan | Leaves
I want the output to be
Name | Location
-------------------
Bob | Shasta
Bob | Leaves
Dylan | Shasta
Dylan | Redwood
Dylan | Leaves
In this case, Sean is being excluded because he always has the same location.
In standard SQL there is the WHERE EXISTS subquery. How do I do this in ClickHouse?
Try this query:
SELECT Name, Location
FROM (
/* emulate the origin dataset */
SELECT test_data.1 AS Name, test_data.2 AS Location
FROM (
SELECT arrayJoin([
('Bob', 'Shasta'),
('Bob', 'Leaves'),
('Sean', 'Leaves'),
('Sean', 'Leaves'),
('Dylan', 'Shasta'),
('Dylan', 'Redwood'),
('Dylan', 'Leaves')]) AS test_data))
WHERE Name IN (
SELECT Name
FROM (
/* emulate the origin dataset */
SELECT test_data.1 AS Name, test_data.2 AS Location
FROM (
SELECT arrayJoin([
('Bob', 'Shasta'),
('Bob', 'Leaves'),
('Sean', 'Leaves'),
('Sean', 'Leaves'),
('Dylan', 'Shasta'),
('Dylan', 'Redwood'),
('Dylan', 'Leaves')]) AS test_data))
GROUP BY Name
HAVING uniq(Location) > 1)
/* result
┌─Name──┬─Location─┐
│ Bob │ Shasta │
│ Bob │ Leaves │
│ Dylan │ Shasta │
│ Dylan │ Redwood │
│ Dylan │ Leaves │
└───────┴──────────┘
*/
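The HAVING uniq(Location) > 1 filter keeps only names with more than one distinct location; the same two-pass logic, sketched in Python:

```python
rows = [("Bob", "Shasta"), ("Bob", "Leaves"), ("Sean", "Leaves"),
        ("Sean", "Leaves"), ("Dylan", "Shasta"), ("Dylan", "Redwood"),
        ("Dylan", "Leaves")]

# pass 1: names with more than one distinct location (HAVING uniq(Location) > 1)
locations = {}
for name, loc in rows:
    locations.setdefault(name, set()).add(loc)
keep = {name for name, locs in locations.items() if len(locs) > 1}

# pass 2: keep only rows whose Name passed the filter (WHERE Name IN ...)
filtered = [(name, loc) for name, loc in rows if name in keep]
print(filtered)
```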
