How to pivot subgroups? - clickhouse

We have created a flat table for Clickhouse and are trying to get records from this table to create a Materialized view. The logic is if e_id is null the record is 'TypeB', if e_id is not null then record is 'TypeA'. Both TypeA and TypeB records will have the same p_id and s_id. We want to create one record per p_id+s_id combination.
The query given below works well with filter (p_id =1 and s_id = 1) but without filters - the exception is "DB::Exception: Scalar subquery returned more than one row"
Is it possible to do this in ClickHouse?
Would it be possible to create Materialized View with such a query?
select p_id,s_id,
groupArray(e_id),
groupArray(name),
(select groupArray(name)
from flat_table
where e_id is null and p_id =1 and s_id = 1
group by p_id,s_id) as typeB
from flat_table
where e_id is not null and p_id =1 and s_id = 1
group by p_id,s_id;
/*
This what the table looks like:
Flat_table
p_id s_id e_id name
1 1 1 Jake
1 1 2 Bob
1 1 null Barby
1 1 null Ella
This is expected result:
p_id s_id e_id typeA typeB
1 1 [1,2] [Jake,Bob] [Barby,Ella]
*/

Let's try this query:
SELECT p_id, s_id, e_ids, typeA, typeB
FROM (
SELECT
p_id,
s_id,
groupArray((e_id, name)) eid_names,
arrayMap(x -> x.1, arrayFilter(x -> not isNull(x.1), eid_names)) e_ids,
arrayMap(x -> x.2, arrayFilter(x -> not isNull(x.1), eid_names)) typeA,
arrayMap(x -> x.2, arrayFilter(x -> isNull(x.1), eid_names)) typeB
FROM test.test_006
GROUP BY p_id, s_id)
/* Result
┌─p_id─┬─s_id─┬─e_ids─┬─typeA────────────┬─typeB──────────────┐
│ 2 │ 2 │ [1,2] │ ['Jake2','Bob2'] │ ['Barby2','Ella2'] │
│ 1 │ 1 │ [1,2] │ ['Jake','Bob'] │ ['Barby','Ella'] │
└──────┴──────┴───────┴──────────────────┴────────────────────┘
*/
/* Data preparation queries */
CREATE TABLE test.test_006
(
`p_id` Int32,
`s_id` Int32,
`e_id` Nullable(Int32),
`name` String
)
ENGINE = Memory
INSERT INTO test.test_006
VALUES (1, 1, 1, 'Jake'), (1, 1, 2, 'Bob'), (1, 1, null, 'Barby'), (1, 1, null, 'Ella'),
(2, 2, 1, 'Jake2'), (2, 2, 2, 'Bob2'), (2, 2, null, 'Barby2'), (2, 2, null, 'Ella2')

Related

clickhouse sum arrays at same index [duplicate]

I am trying to add an array column element by element after a group by another column.
Having the table A below:
id units
1 [1,1,1]
2 [3,0,0]
1 [5,3,7]
3 [2,5,2]
2 [3,2,6]
I would like to query something like:
select id, sum(units) from A group by id
And get the following result:
id units
1 [6,4,8]
2 [6,2,6]
3 [2,5,2]
Where the units arrays in rows with the same id get added element by element.
Try this query:
SELECT id, sumForEach(units) units
FROM (
/* emulate dataset */
SELECT data.1 id, data.2 units
FROM (
SELECT arrayJoin([(1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6])]) data))
GROUP BY id
/* Result
┌─id─┬─units───┐
│ 1 │ [6,4,8] │
│ 2 │ [6,2,6] │
│ 3 │ [2,5,2] │
└────┴─────────┘
*/

Time series query based on another table

Initial data
CREATE TABLE a_table (
id UInt8,
created_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
CREATE TABLE b_table (
id UInt8,
started_at DateTime,
stopped_at DateTime
)
ENGINE = MergeTree()
PARTITION BY tuple()
ORDER BY id;
INSERT INTO a_table (id, created_at) VALUES
(1, '2020-01-01 00:00:00'),
(2, '2020-01-02 00:00:00'),
(3, '2020-01-03 00:00:00')
;
INSERT INTO b_table (id, started_at, stopped_at) VALUES
(1, '2020-01-01 00:00:00', '2020-01-01 23:59:59'),
(2, '2020-01-02 00:00:00', '2020-01-02 23:59:59'),
(3, '2020-01-04 00:00:00', '2020-01-04 23:59:59')
;
Expected result: The 'a_table' rows by condition
b_table.started_at >= a_table.created_at AND
b_table.stopped_at <= a_table.created_at
+----+---------------------+
| id | created_at |
+----+---------------------+
| 1 | 2020-01-01 00:00:00 |
+----+---------------------+
| 2 | 2020-01-02 00:00:00 |
+----+---------------------+
What have i tried
-- No errors, empty result
SELECT a_table.*
FROM a_table
INNER JOIN b_table
ON b_table.id = a_table.id
WHERE b_table.started_at >= a_table.created_at
ANd b_table.stopped_at <= a_table.created_at
;
SELECT a_table.*
FROM a_table
ASOF INNER JOIN (
SELECT * FROM b_table
) q
ON q.id = a_table.id
AND q.started_at >= a_table.created_at
-- Error:
-- Invalid expression for JOIN ON.
-- ASOF JOIN expects exactly one inequality in ON section,
-- unexpected stopped_at <= created_at.
-- AND q.stopped_at <= a_table.created_at
;
WHERE b_table.started_at >= a_table.created_at
ANd b_table.stopped_at <= a_table.created_at
Wrong condition >= <= --> <= >=
20.8.7.15
SELECT
a_table.*,
b_table.*
FROM a_table
INNER JOIN b_table ON b_table.id = a_table.id
WHERE (b_table.started_at <= a_table.created_at) AND (b_table.stopped_at >= a_table.created_at)
┌─id─┬──────────created_at─┬─b_table.id─┬──────────started_at─┬──────────stopped_at─┐
│ 1 │ 2020-01-01 00:00:00 │ 1 │ 2020-01-01 00:00:00 │ 2020-01-01 23:59:59 │
│ 2 │ 2020-01-02 00:00:00 │ 2 │ 2020-01-02 00:00:00 │ 2020-01-02 23:59:59 │
└────┴─────────────────────┴────────────┴─────────────────────┴─────────────────────┘
In real production such queries would not work. Because JOIN is very slow.
It needs re-design. It hard to say how without knowing why do you have the second table. Probably I would use rangeHashed external dictionary.

How to sum arrays element by element after group by in clickhouse

I am trying to add an array column element by element after a group by another column.
Having the table A below:
id units
1 [1,1,1]
2 [3,0,0]
1 [5,3,7]
3 [2,5,2]
2 [3,2,6]
I would like to query something like:
select id, sum(units) from A group by id
And get the following result:
id units
1 [6,4,8]
2 [6,2,6]
3 [2,5,2]
Where the units arrays in rows with the same id get added element by element.
Try this query:
SELECT id, sumForEach(units) units
FROM (
/* emulate dataset */
SELECT data.1 id, data.2 units
FROM (
SELECT arrayJoin([(1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6])]) data))
GROUP BY id
/* Result
┌─id─┬─units───┐
│ 1 │ [6,4,8] │
│ 2 │ [6,2,6] │
│ 3 │ [2,5,2] │
└────┴─────────┘
*/

Hierarchical query find all parents with some filters

I've a structure like:
CREATE TABLE BUSINESS_UNIT (
ID NUMBER,
DESCPRIPTION VARCHAR2(100 CHAR),
TAG VARCHAR2(10 CHAR)
);
CREATE TABLE USERS (
ID NUMBER,
LOGIN VARCHAR2(10 CHAR)
);
CREATE TABLE BU_USERS (
ID_BU NUMBER,
ID_USER NUMBER
);
CREATE TABLE BU_TREE (
ID_BU NUMBER,
ID_PARENT NUMBER
);
INSERT INTO BUSINESS_UNIT VALUES(1, 'ONE', 'WHITE');
INSERT INTO BUSINESS_UNIT VALUES(2, 'TWO', 'RED');
INSERT INTO BUSINESS_UNIT VALUES(3, 'THREE', 'YELLOW');
INSERT INTO BUSINESS_UNIT VALUES(4, 'FOUR', 'GREEN');
INSERT INTO BUSINESS_UNIT VALUES(5, 'FIVE', 'GREEN');
INSERT INTO BUSINESS_UNIT VALUES(6, 'SIX', 'RED');
INSERT INTO BUSINESS_UNIT VALUES(7, 'SEVEN', 'GREEN');
INSERT INTO USERS VALUES(1, 'USER1');
INSERT INTO BU_USERS VALUES(5, 1);
/*
___1w___
___2r___ 3y___
__4g 7g 6r
5g
*/
INSERT INTO BU_TREE VALUES(5, 4);
INSERT INTO BU_TREE VALUES(4, 2);
INSERT INTO BU_TREE VALUES(7, 2);
INSERT INTO BU_TREE VALUES(2, 1);
INSERT INTO BU_TREE VALUES(3, 1);
INSERT INTO BU_TREE VALUES(6, 3);
And I have to get parent record with TAG = "RED".
I tried something like:
SELECT
B.ID
FROM BUSINESS_UNIT B
INNER JOIN BU_TREE T ON (B.ID = T.ID_BU)
START WITH B.ID = (SELECT ID_BU FROM BU_USERS WHERE ID_USER = 1)
CONNECT BY PRIOR B.ID = T.ID_PARENT
But it returns only the "directly" father, that is the 5.
So, for the user one I have to get its parent red business unit: the number two.
Why my query doesn't returns all the parents?
Thanks
Why my query doesn't returns all the parents?
Well, it does return all the parents.
But there is no any parents that meet the condition CONNECT BY PRIOR B.ID = T.ID_PARENT, so the query shows only 1 record.
Please look at the below query, it shows the whole resultset of the join:
SELECT *
FROM BUSINESS_UNIT B
inner JOIN BU_TREE T ON (B.ID = T.ID_BU);
ID DESCPRIPTI TAG ID_BU ID_PARENT
---------- ---------- ---------- ---------- ----------
2 TWO RED 2 1
3 THREE YELLOW 3 1
4 FOUR GREEN 4 2
5 FIVE GREEN 5 4
6 SIX RED 6 3
7 SEVEN GREEN 7 2
The subquery (SELECT ID_BU FROM BU_USERS WHERE ID_USER = 1) gives 5, so your query is equivalent to:
SELECT *
FROM BUSINESS_UNIT B
inner JOIN BU_TREE T ON (B.ID = T.ID_BU);
START WITH B.ID = 5
CONNECT BY PRIOR B.ID = T.ID_PARENT;
WHen running this query, in the frst step Oracle evaluates START WITH B.ID = 5 conditions, and picks rows that meet this condition, that is:
ID DESCPRIPTI TAG ID_BU ID_PARENT
---------- ---------- ---------- ---------- ----------
5 FIVE GREEN 5 4
In the next step, oracle applies CONNECT BY PRIOR B.ID = T.ID_PARENT; condition to the above "prior" record (taking PRIOR ID = 5 from this record), and then search for records that meet the right side of the condition (5 = ID_PARENT) in the resultset of join.
There is no any record with parent_id = 5 in the result of join, so the final result of the connect-by query is:
ID ID_BU ID_PARENT
---------- ---------- ----------
5 5 4

oracle sql update with sub select

I want to update 1 field of the last 2 rows of a table. So I need a subquery.
Both sql works - how can I combine these 2 SQL commands?
select command (works, last 2 rows):
SELECT * FROM (select * from mytable WHERE id='62741' ORDER BY lfdnr DESC) mytable2 WHERE rownum <= 2;
Result:
LFDNR ID M2
361782 62741 8,5
361774 62741 8,6
Update (?, exists, in, merge ?)
UPDATE mytable set m2='8,4' WHERE EXISTS (select * from mytable WHERE id='62741' and rownum <=2 ORDER BY lfdnr DESC);
Result:
Fehlerbericht -
SQL-Fehler: ORA-00907: missing right parenthesis
00907. 00000 - "missing right parenthesis"
*Cause:
*Action:
Thank you for helping me!
Michael
You could use rowid pseudocolumn:
update mytable set m2 = '8, 4'
where rowid in (select rowid
from (
select rowid, row_number() over (order by lfdnr desc) rn
from mytable where id = '62741')
where rn <= 2 )
Test:
create table mytable (id varchar2(5), lfdnr number(5), m2 varchar2(10));
insert into mytable values ('62705', 1, 'abc');
insert into mytable values ('62741', 2, 'xyz');
insert into mytable values ('62741', 3, 'qwe');
insert into mytable values ('62741', 4, 'rty');
ID LFDNR M2
----- ------ ----------
62705 1 abc
62741 2 xyz
62741 3 8, 4
62741 4 8, 4

Resources