We collect raw events into a ClickHouse table. Table structure:
CREATE TABLE IF NOT EXISTS raw_events
(
owner_id UInt32,
user_id UInt32,
event_datetime DateTime,
event_type_id UInt8,
unique_id FixedString(18),
data String,
attr_1 UInt32,
attr_2 UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_datetime)
ORDER BY (owner_id, user_id, event_type_id, event_datetime);
At the moment the raw_events table contains 120,000,000 (120M) rows.
We aim to show our users some aggregate statistics for those events, so we created a materialized view:
CREATE MATERIALIZED VIEW test
ENGINE = AggregatingMergeTree() PARTITION BY (date)
ORDER BY (owner_id, user_id, date)
AS SELECT
toYYYYMMDD(event_datetime) as date,
owner_id,
user_id,
multiIf(
event_type_id == 1, 10,
event_type_id == 2, 20,
event_type_id == 3, 30,
event_type_id == 4, 40,
event_type_id == 32, 50,
event_type_id >= 64, 60,
0
) as status,
attr_1,
attr_2,
COUNT() as count,
COUNT(DISTINCT unique_id) as unique_count
FROM raw_events
GROUP BY owner_id, user_id, date, status, attr_1, attr_2
ORDER BY owner_id, user_id, date;
If we run the SELECT query on its own, it takes around 1 second to generate a response for a single owner_id. But creating a materialized view for the same SELECT takes far too long: after executing the CREATE MATERIALIZED VIEW query, it had generated just 200 records in ~10 minutes, so it looks like it will take days to completely build the view for the current 120M-row table.
What am I missing? Are there tricks for the ORDER BY/GROUP BY clauses that make it run faster? At the moment it's much easier for me to just run the SELECT+GROUP BY query instead of using a materialized view.
Additional question: is there a way to check the progress of materialized view building for an existing table?
AggregatingMergeTree uses ORDER BY as its collapsing (aggregation) rule.
AggregatingMergeTree has to be used with AggregateFunction columns and the -State / -Merge combinators.
The ORDER BY clause in your SELECT is excessive.
CREATE MATERIALIZED VIEW test
ENGINE = AggregatingMergeTree() PARTITION BY (date)
ORDER BY (owner_id, user_id, date, status, attr_1, attr_2) -- <<< key must include every GROUP BY column
AS SELECT
toYYYYMMDD(event_datetime) as date,
owner_id,
user_id,
multiIf(
event_type_id == 1, 10,
event_type_id == 2, 20,
event_type_id == 3, 30,
event_type_id == 4, 40,
event_type_id == 32, 50,
event_type_id >= 64, 60,
0
) as status,
attr_1,
attr_2,
countState() as count,               -- <<< -State combinator
uniqState(unique_id) as unique_count -- <<< -State combinator
FROM raw_events
GROUP BY owner_id, user_id, date, status, attr_1, attr_2
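One consequence worth spelling out (a sketch assuming the corrected view above): an AggregatingMergeTree stores partial aggregation states, not final numbers, so reads must re-aggregate with the matching -Merge combinators:

```sql
-- Read the aggregated data back; countMerge/uniqMerge finalize the
-- states written by countState/uniqState in the view definition.
SELECT
    owner_id,
    user_id,
    date,
    countMerge(count)       AS count,
    uniqMerge(unique_count) AS unique_count
FROM test
GROUP BY owner_id, user_id, date;
```

Selecting the state columns directly, without -Merge, returns opaque binary states rather than counts.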
https://gist.github.com/den-crane/a72614fbe6d23eb9c2f1bce40c66893f
https://gist.github.com/den-crane/49ce2ae3a688651b9c2dd85ee592cb15
https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf
I have a source table and a target table in the same schema. Based on the primary key value, I want to compare the records of the source and destination tables and show only the columns that have different values.
Could you please help me with how to get a solution for this?
Note: my DB version is Oracle Database 19c Enterprise Edition.
In Oracle, you can use CONCAT to concatenate multiple texts into one. Since you have multiple columns, you can view this problem-space as the concatenation of many expressions, one for each column. Your expression for a column could look like:
CASE WHEN ((r1.c1 IS NULL) AND (r2.c1 IS NULL)) OR (r1.c1 = r2.c1)
THEN ''
ELSE CONCAT('Table1: ', r1.c1, ', Table2: ', r2.c1, ';')
END
Now that we know what the expression looks like for a field, let's assume there are three fields:
SELECT CONCAT(
CASE WHEN ((r1.c1 IS NULL) AND (r2.c1 IS NULL)) OR (r1.c1 = r2.c1)
THEN ''
ELSE CONCAT('Table1: ', r1.c1, ', Table2: ', r2.c1, ';')
END,
CASE WHEN ((r1.c2 IS NULL) AND (r2.c2 IS NULL)) OR (r1.c2 = r2.c2)
THEN ''
ELSE CONCAT('Table1: ', r1.c2, ', Table2: ', r2.c2, ';')
END,
CASE WHEN ((r1.c3 IS NULL) AND (r2.c3 IS NULL)) OR (r1.c3 = r2.c3)
THEN ''
ELSE CONCAT('Table1: ', r1.c3, ', Table2: ', r2.c3, ';')
END) AS diff
FROM table1 r1
JOIN table2 r2
ON r1.id = r2.id;
This is the basic idea, but you will encounter some problems, like the types of the fields. If they are not textual, then you will need to convert them into some text. Also, if you need to reuse this diff tool, then you cannot assume that you know the number of fields to compare or even their name, so you will need to load the fields and generate the query based on their names and types.
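As a rough illustration of that query-generation idea (a sketch only: it assumes the table is named TABLE1 in the current schema, uses TO_CHAR as a crude catch-all conversion, and may hit the VARCHAR2 length limit on wide tables):

```sql
-- Build one CASE expression per column of TABLE1 from the data
-- dictionary, joined with ' || ' so the result can be pasted into
-- a CONCAT-style diff query like the one above.
SELECT LISTAGG(
         'CASE WHEN ((r1.' || column_name || ' IS NULL) AND (r2.' || column_name || ' IS NULL))'
         || ' OR (r1.' || column_name || ' = r2.' || column_name || ')'
         || ' THEN '''' ELSE ''Table1: '' || TO_CHAR(r1.' || column_name || ')'
         || ' || '', Table2: '' || TO_CHAR(r2.' || column_name || ') || '';'' END',
         ' || ') WITHIN GROUP (ORDER BY column_id) AS diff_sql
FROM user_tab_columns
WHERE table_name = 'TABLE1';
```

The output is itself SQL text, which you would then embed in the SELECT ... FROM table1 r1 JOIN table2 r2 skeleton shown above.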
I have a similar job which compares two large tables (26 million rows each). I don't need to know which column differs, but if I rephrase the query to do that, it would be something along the lines of:
CREATE TABLE t1 (
id NUMBER PRIMARY KEY,
a NUMBER NOT NULL,
b NUMBER NOT NULL,
c NUMBER NULL
);
CREATE TABLE t2 (
id NUMBER PRIMARY KEY,
x NUMBER NOT NULL,
y NUMBER NOT NULL,
z NUMBER NULL
);
INSERT INTO t1 VALUES (1, 10, 20, 30);
INSERT INTO t2 VALUES (1, 10, 20, 30);
INSERT INTO t1 VALUES (2, 10, 20, 30);
INSERT INTO t2 VALUES (2, 11, 20, 30);
INSERT INTO t1 VALUES (3, 10, 21, 30);
INSERT INTO t2 VALUES (3, 10, 20, 30);
INSERT INTO t1 VALUES (4, 10, 20, 31);
INSERT INTO t2 VALUES (4, 10, 20, 30);
INSERT INTO t1 VALUES (5, 10, 20, null);
INSERT INTO t2 VALUES (5, 10, 20, 30);
SELECT id,
CASE WHEN a <> x THEN 1 ELSE 0 END a_x,
CASE WHEN b <> y THEN 1 ELSE 0 END b_y,
CASE WHEN c <> z
OR (c IS NULL AND z IS NOT NULL)
OR (c IS NOT NULL AND z IS NULL) THEN 1 ELSE 0 END c_z
FROM t1 JOIN t2 USING (id)
WHERE a <> x
OR b <> y
OR c <> z OR (c IS NULL AND z IS NOT NULL)
OR (c IS NOT NULL AND z IS NULL);
ID A_X B_Y C_Z
2 1 0 0
3 0 1 0
4 0 0 1
5 0 0 1
The code is lengthy, but I have a script that looks at the data dictionary and writes the query.
However, I'm not totally happy with the comparison of the nullable columns. There is an undocumented system function that makes that easier, but I never got it working properly...
I am trying to get the quantity from the transaction table: the quantity of sells and the quantity of buys, using Portfolio_Number, Stock_Code and Buy_Sell to verify the quantity.
Transaction Table (Portfolio_Number, Transaction_Date,
Stock_Code, Exchange_Code, Broker_Number, Buy_Sell, Quantity, Price_Per_Share)
create or replace trigger TR_Q5
before insert on Transaction
for each row
declare
  V_quantityB number(7,0);
  V_quantityS number(7,0);
begin
  if :new.buy_sell = 'S' then
    select quantity
      into V_quantityS
      from transaction
     where :new.portfolio_number = portfolio_number
       and :new.stock_code = stock_code
       and buy_sell = 'S';
    if V_quantityS >= 1 then
      Raise_Application_Error(-20020, 'not S');
    end if;
  end if;
end;
/
Then I try to insert:
INSERT INTO Transaction
(Portfolio_Number, Transaction_Date, Stock_Code, Exchange_Code, Broker_Number, Buy_Sell, Quantity, Price_Per_Share)
values
(500, To_Date('09-Feb-2020 16:41:00', 'DD-Mon-YYYY HH24:MI:SS'), 'IBM', 'TSX', 4, 'S', 10000, 25.55 );
but it raises the error:
exact fetch returns more than requested number of rows
The error you mentioned is self-explanatory: the SELECT you wrote should return just one row, but it returns more than that. As you can't put several rows into a scalar number variable, you got the error.
What would fix it? For example, aggregation:
select sum(quantity)
into V_quantityS
...
or perhaps
select distinct quantity
or even
select quantity
...
where rownum = 1
However, beware: the trigger is created on the transaction table, and you are selecting from that same table inside the trigger, which leads to the mutating table error. What to do about that? Use a compound trigger.
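A minimal sketch of the compound-trigger approach (the trigger name, the key encoding, and the count-based check are illustrative assumptions, not the original poster's code; it keeps the same "reject a second S row" intent):

```sql
create or replace trigger TR_Q5_compound
for insert on Transaction
compound trigger

  -- Collect the keys of the rows being inserted; the table itself
  -- must not be queried from a row-level timing point (mutating table).
  type t_keys is table of varchar2(100);
  g_keys t_keys := t_keys();

before each row is
begin
  g_keys.extend;
  g_keys(g_keys.count) :=
    :new.portfolio_number || '|' || :new.stock_code || '|' || :new.buy_sell;
end before each row;

after statement is
  v_cnt number;
begin
  -- The statement is finished, so querying the table is now safe.
  -- The inserted row is already present, hence the > 1 comparison.
  for i in 1 .. g_keys.count loop
    select count(*)
      into v_cnt
      from transaction
     where portfolio_number || '|' || stock_code || '|' || buy_sell = g_keys(i)
       and buy_sell = 'S';
    if v_cnt > 1 then
      raise_application_error(-20020, 'not S');
    end if;
  end loop;
end after statement;

end TR_Q5_compound;
/
```

The before-each-row section only records keys; all validation moves to the after-statement section, which is the standard way around ORA-04091.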
I have a SELECT (water reading, previous water reading, other columns) and a WHERE clause based on the water reading date. However, the previous water reading must not be restricted by that WHERE clause: I want the previous meter reading regardless of the date range.
I looked at a UNION, but the problem is that I'd have to use the same clause.
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading
FROM WATERREADINGS
WHERE waterreadings.waterreadingdate BETWEEN '24-JUN-19' AND
'24-AUG-19' and isactive = 'Y'
The prev_water_reading value must not be restricted by the date BETWEEN '24-JUN-19' AND '24-AUG-19' predicate but the rest of the sql should be.
You can do this by first finding the previous meter readings for all rows and then filtering those results on the date, e.g.:
WITH meter_readings AS (SELECT waterreadings.name,
waterreadings.date dt,
lag(waterreadings.meter_reading, 1, NULL) OVER (PARTITION BY waterreadings.meter_id, waterreadings.register_id
ORDER BY waterreadings.readingdate ASC, waterreadings.created ASC)
AS prev_water_reading
FROM waterreadings
WHERE isactive = 'Y')
-- the meter_readings subquery above gets all rows and finds their previous meter reading.
-- the main query below then applies the date restriction to the rows from the meter_readings subquery.
SELECT name,
date,
prev_water_reading
FROM meter_readings
WHERE dt BETWEEN to_date('24/06/2019', 'dd/mm/yyyy') AND to_date('24/08/2019', 'dd/mm/yyyy');
Perform the LAG in an inner query that is not filtered by dates and then filter by the dates in the outer query:
SELECT name,
"date",
prev_water_reading
FROM (
SELECT name,
"date",
LAG( meter_reading,1,NULL) OVER(
PARTITION BY meter_id, register_id
ORDER BY meter_id DESC, register_id DESC, readingdate ASC, created ASC
) AS prev_water_reading,
waterreadingdate --
FROM WATERREADINGS
WHERE isactive = 'Y'
)
WHERE waterreadingdate BETWEEN DATE '2019-06-24' AND DATE '2019-08-24'
You should also not use strings for dates (that require an implicit cast using the NLS_DATE_FORMAT session parameter, which can be changed by any user in their own session) and use date literals DATE '2019-06-24' or an explicit cast TO_DATE( '24-JUN-19', 'DD-MON-RR' ).
You also do not need to reference the table name for every column when there is only a single table as this clutters up your code and makes it difficult to read and DATE is a keyword so you either need to wrap it in double quotes to use it as a column name (which makes the column name case sensitive) or should use a different name for your column.
I've added a subquery with previous result without filter and then joined it with the main table with filters:
SELECT
WATERREADINGS.name,
WATERREADINGS.date,
w_lag.prev_water_reading
FROM
WATERREADINGS,
(SELECT name, date, LAG( WATERREADINGS.meter_reading,1,NULL) OVER(
PARTITION BY WATERREADINGS.meter_id,WATERREADINGS.register_id
ORDER BY WATERREADINGS.meter_id DESC,WATERREADINGS.register_id
DESC,WATERREADINGS.readingdate ASC,WATERREADINGS.created ASC
) AS prev_water_reading
FROM WATERREADINGS) w_lag
WHERE waterreadings.waterreadingdate BETWEEN '24-JUN-19' AND '24-AUG-19' and isactive = 'Y'
and WATERREADINGS.name = w_lag.name
and WATERREADINGS.date = w_lag.date
I am developing a report which should display the data horizontally.
What must be shown is the following:
email#1.com 12/09/2013 11/09/2013 10/09/2013 09/09/2013...
email#2.com 22/03/2013 21/03/2013 12/02/2013 02/01/2013...
Well, I have these data organized in two tables:
Member and Report.
The Member table has the email address and the Report table has dates, and each email can have many different dates.
I can easily retrieve that information vertically:
SELECT M.EMAIL, R.LAST_OPEN_DATE
FROM MEMBER M, REPORT R
WHERE M.MEMBER_ID = R.MEMBER_ID
AND R.STATUS = 1
AND TRUNC(R.LAST_OPEN_DATE) >= TRUNC(SYSDATE) - 120;
However to show the results horizontally is complicated, anyone have a tip or know how I can do this?
I'm using Oracle 11g.
You can get the dates into columns with pivot:
SELECT *
FROM (
SELECT M.EMAIL, R.LAST_OPEN_DATE,
ROW_NUMBER() OVER (PARTITION BY M.MEMBER_ID
ORDER BY R.LAST_OPEN_DATE DESC) AS RN
FROM MEMBER M, REPORT R
WHERE M.MEMBER_ID = R.MEMBER_ID
AND R.STATUS = 1
AND TRUNC(R.LAST_OPEN_DATE) >= TRUNC(SYSDATE) - 120
)
PIVOT (MIN(LAST_OPEN_DATE) FOR (RN) IN (1, 2, 3, 4, 5, 6, 7, 8));
SQL Fiddle.
Essentially this is assigning a number to each report date for each member, and then the pivot is based on that ranking number.
But you'd need to have each of the possible number of days listed; if you can have up to 240 report dates, the PIVOT IN clause would need to be every number up to 240, i.e. IN (1, 2, 3, ..., 239, 240), not just up to eight as in that Fiddle.
If you ever had a member with more than 240 dates you wouldn't see some of them, so whatever high number you pick would have to be high enough to cover every possibility, now and in the foreseeable future. As your query is limited to 120 days, even 240 seems quite high, but perhaps you have more than one per day - in which case there is no real upper limit.
You could potentially have to format each date column individually, but hopefully your reporting layer is taking care of that.
If you just wanted to perform string aggregation using the multiple dates for each email, you could do this in 11g:
SELECT M.EMAIL,
LISTAGG(TO_CHAR(R.LAST_OPEN_DATE, 'DD/MM/YYYY'), ' ')
WITHIN GROUP (ORDER BY R.LAST_OPEN_DATE DESC)
FROM MEMBER M, REPORT R
WHERE M.MEMBER_ID = R.MEMBER_ID
AND R.STATUS = 1
AND TRUNC(R.LAST_OPEN_DATE) >= TRUNC(SYSDATE) - 120
GROUP BY M.EMAIL;
EMAIL DATES
-------------------- -------------------------------------------
email#1.com 12/04/2014 11/04/2014 10/04/2014 09/04/2014
email#2.com 12/05/2014 02/04/2014 22/03/2014 21/03/2014
SQL Fiddle.
Which is OK for a text report, but not if this query is feeding into a reporting tool.
First of all, the number of columns in a query is determined beforehand and can't be adjusted by the data. To overcome that, you might be interested in a dynamic query.
But in the simple static case, you will need to use the PIVOT construction.
As a first step, you will need to assign row numbers to the columns:
select EMAIL, row_number() over (partition by email order by last_date) col
from yourtable
then you add "magic" PIVOT:
<your query>
PIVOT
(
max(last_date)
for col in (1, 2, 3, ..., 240)
)
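Put together, that looks like the following (a sketch assuming yourtable has email and last_date columns as above, and truncated to three date columns for brevity):

```sql
SELECT *
FROM (
  SELECT email,
         last_date,
         -- rank each email's dates; the rank becomes the column index
         row_number() OVER (PARTITION BY email ORDER BY last_date) AS col
  FROM yourtable
)
PIVOT (
  max(last_date)
  FOR col IN (1, 2, 3)
);
```

As noted above, the IN list must enumerate every column position you might need, up to the maximum number of dates per email.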
This is in continuation to an earlier question I had posted
Here is the link
Oracle sql to count instances of different values in single column
In further continuation to the pivot query, I am trying to do something like
for col in (
Count_status20 as col20,
Count_status30 or Count_status40 as col30,
Count_status50 as col50)
The input remains the same as in the earlier question.
Basically, I am trying to count statuses 30 or 40 as one column.
Try it like this:
select *
from
(
select tkey, status,
decode(status, 30, 30, 40, 30,status) as col
from tableB b
left join tableA a
on a.fkey = b.fkey
) src
pivot
(
count(status)
for col in ('20' as Count_Status20,
'30' as Count_Status3040,
'50' as Count_Status50)
) piv;
Here is a fiddle