Suppose I have 2 tables A and B. I create a MV(materialized view) with a join query of two tables, psuedo like:
create materialized view a_b engine = Memory as
select * from(
select * from A
) all inner join (
select * from B
) using some_col;
I known that a_b is only updated when inserting data into table A and nothing else happen when push data to B. I want my MV have to update when both table are updated.
My workaround is to create another MV that change postition of A, B and point to a_b like
create materialized view a_b_2 to a_b as
select * from(
select * from B
) all inner join (
select * from A
) using same_col;
I have some questions about this approach:
1. Are there any more legal way to archive same effect in clickhouse?
2. Suppose I have 2 incoming batches data BD_A and BD_B are going to insert to A and B simultaneously. Some data of 2 batches themself (BD_A_B) is fit join condition . Is there any chance that the MV lost those BD_A_B because MV a_b processes BD_A with before-inserted B and MV a_b_2 processes BD_B with before-inserted A.
As far as I understand, you are trying to have a workaround of a limitation.
Clickhouse does not support multiple source tables for a MV and they have quite good reasons for this. I actually asked this to devs and got this answer:
In ClickHouse materialized view behaves more like BEFORE INSERT TRIGGER, each time processing new block arrived with insert.
So that is quite natural limitation as inserts to 2 different table will come asynchronously and you usually expect to see in JOINs whole table not only newly arrived blocks.
Related
I am making a new procedure making queries to a huge table.
The structure of my procedure is as follows:
{
open cursor for
QUERY 1
UNION
QUERY 2
UNION
QUERY 3
}
The structure of QUERY 1 is INNER JOIN 2 ([INNER JOIN 1 (TABLE) x (TABLE) ] x TABLE )
The structure of QUERY 2 is INNER JOIN 3 ([INNER JOIN 1 (TABLE) x (TABLE) ] x TABLE )
Is there a way to store [INNER JOIN 1 (TABLE) x (TABLE) ] somewhere so that I don't have to do it twice?
EDIT: Forgot to add that I cannot create a table outside of the procedure because multiple instances of this procedure will run in parallel. They will just block each other from running by inserting in the same table. Also, I don't know how many instances will run in parallel so I cannot create as many tables as instances.
Don't create any tables from PL/SQL. It is possible (hint: dynamic SQL), but that's not how Oracle works.
If you need a table, then create it BEFORE running this procedure, either using CREATE TABLE (and name all columns you need), or using CTAS (Create Table As Select) which would - basically - be your current query.
That table can be "normal" or "global (or private, depending on database version) temporary table" (GTT). If you use a GTT, only you can see data stored within. If it is a "normal" table, everyone sees data so you might need to pay attention to who sees & uses what.
Another option is to use the CTE (Common Table Expression, a.k.a. the WITH factoring clause) which can be used directly in the procedure as
with your_view as
(select ...
from table1 join table2 on ...
join table3 on ...
)
select whatever
from some_other_table join your_view
where ...
union
select whatever_else
from yet_another_table join your_View
where ...
[EDIT, after seeing your edit]
If you don't want to use a CTE for some reason, then a GTT might be your choice. Why? See my 3rd paragraph ("everyone sees only their own data").
You could always use a Global Temporary table:
https://oracle-base.com/articles/misc/temporary-tables#temporary-tables
I want to insert data from two different tables (say table A and table B ) into a third table (table C) in oracle.
I have written two different cursors for fetching data from table A and B separately, and populated two collections based on these two tables.
Now, i want to insert the data in those two collections into the third table (table C), how can i get this done.
Now there are two common columns that are present in both the columns, say for example ID and YEARMONTH, these two columns are there in all tables (A, B and C).
I have tried doing a merge based on these two fields.
but i am looking for an efficient and more convenient way to do this.
You didn't provide code you wrote, so I'll guess: cursors mean PL/SQL. If you're doing it in a loop, row-by-row, it'll be slow-by-slow.
As there are common columns in both tables (A and B), I'd suggest doing it in pure SQL: join those two tables and insert the result into C. Something like
insert into c (id, yearmonth, ...)
select a.id, a.yearmonth, ...
from a join b on a.id = b.id;
Make sure that indexes exist on columns you use to join tables. Or, even better, compare explain plans in both cases (with and without indexes) and choose an option which seems to be the best.
insert into tableC
select * from tableA where ...
union
select * from tableB where ...
I have a dimensional model with a large fact table (millions of rows) which is range partitioned by date and smaller dimensional tables that are not partitioned. I came across materialized views which is often used in these scenarios to improve query performance.
Now, I want to know which way is better of the following two to utilize these materialized views to get aggregated reports.
A. Create one with the by joining the whole fact table with each of the dimension tables required.
create materialized view my_mview execute immediate query rewrite
select
fact.col1, dim1.col2, dim2.col3, sum(fact.col4)
from
my_fact fact
inner join
my_dim1 dim1
on fact.dim1_key = dim1.dim1_key
inner join
my_dim2 dim2
on fact.dim2_key = dim2.dim2_key group by fact.col1, dim1.col2, dim2.col3
This seems like the most basic way of using them. But it seems
rather limiting and I would require a new materialzed view for each
variation of the query I want to create.
B. Create it over the aggregation of the fact table and utilize the query rewrite when doing a dimensional join back.
create materialized view my_mview execute immediate query rewrite
select
col1, dim1.dim2_key, dim2.dim_key, sum(fact.col4)
from
my_fact fact
And do the join as above in case A, which will use this aggregated materialzed view for the join and not the whole fact table.
Can anyone tell me when I would use each case or the other?
Your first example works exactly as you described.
For the second example the query should be:
create materialized view my_mview execute immediate query rewrite
select
col1, fact.dim2_key, fact.dim_key, sum(fact.col4)
from
my_fact fact
group by
col1, fact.dim2_key, fact.dim_key
This will automatically speed up aggregates such as
select sum(fact.col4)
from fact
select fact.dim_key,sum(fact.col4)
from fact
group by fact.dim_key
select fact.dim2_key,sum(fact.col4)
from fact
group by fact.dim2_key
I don't think Oracle will rewrite your first type of query to this MV automatically because in the MV the join columns are already grouped by (They also should be grouped in your second example). It never happened for us. This however may also depend on if there are relationships defined between dim and fact table and the value of QUERY_REWRITE_INTEGRITY parameter, so there is still some room for testing here.
You may still get a performance gain by writing a query in a specific way
WITH preaggr as (
select
col1, fact.dim2_key, fact.dim_key, sum(fact.col4)
from
my_fact fact
group by
col1, fact.dim2_key, fact.dim_key
)
select
dim2.col1,
sum(preaggr.col4)
from
preaggr
join
dim2
on
preaggr.dim2_key = fact.dim2_key
group by
dim2.col1
I have two Hive tables of the same structure (schema). What would be an efficient SQL request to concatenate them into a single table with the same structure?
Update, this works quite fast in my case:
CREATE TABLE xy AS SELECT *
FROM (
SELECT *
FROM x
UNION ALL
SELECT *
FROM y
) tmp;
If you are trying to merge table_A and table_b into a single one, the easiest way is to use the UNION ALL operator. You can find the syntax and use cases here - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
"union all" is a right solution but might be expensive, resource/time wise. I'd recommend creating a table with two partitions, one for table A and another for Table B. This way, no need to merge (or union all). The merged table is available as soon as both partitions get populated.
I have a materialized view that looks somewhat like the following and I'm wondering if there is anyway to have this materialized view 'fast' refreshable? Basically, I'm asking the following:
Can a materialized view contain oracle functions such as COALESCE, NVL, NVL2, etc and still be fast refreshable
Can a materialized view contain functions that I have made and still be fast refreshable.
Can a materialized view contain joins to derived tables and still be fast refreshable?
I checked the Oracle documentation about this, and it did not list these restrictions, however after testing the case below on my own system, I don't believe it is possible.
Oracle version: 10g
SELECT COALESCE (col1, col2),
myOracleFunction(col3, col4)
FROM tableA a
LEFT OUTER JOIN
(SELECT id, MAX (sample_key) prim_sam_key
FROM table_sample
GROUP BY id
HAVING COUNT (1) = 1) b ON a.id = b.id;
Requirements from the link you provided that you're missing:
COUNT(*) must be specified.
The SELECT list must contain all GROUP BY columns.
Also, the following requirement indicates that, for your query, a fast refresh will only be possible if table_sample has been updated, but tableA has not:
Materialized aggregate views with outer joins are fast refreshable
after conventional DML and direct loads, provided only the outer table
has been modified. Also, unique constraints must exist on the join
columns of the inner join table. If there are outer joins, all the
joins must be connected by ANDs and must use the equality (=)
operator.
Finally, when asking about materialized views, it is always a good idea to state exactly what materialized view logs you have created.