Multiple joins with count on a hive table - hadoop

I have 4 different tables, and I want to combine them into one table that holds one column from each table plus the number of times each value appears in that column. All the columns are strings.
For example:
Table A
col1
20190204
20190204
20190204
20190205
20190205
20190205
Table B
col1
20200204
20200204
20200204
20200204
20200205
20200205
20200205
Table C
col1
20210204
20210204
20210204
20210204
20210205
20210205
20210205
Table D
col1
20220204
20220204
20220204
20220204
20220205
20220205
20220205
Table E (all 4 tables will go into here)
Table E is empty and needs to be populated with the dates from the other tables and the number of times they occur in those tables. For example:
col1(tablea) col2 col3(tableb) col4 col5(tablec) col6 col7(tabled) col8
20190204 4 20200204 4 20210204 4 20220204 4
20190205 3 20200205 3 20210205 3 20220205 3
etc...
I am new to Hue, so I tried something like this:
insert overwrite into tablee (
tablee.tablea.date, tablee.tablea.datecount,
tablee.tablebdate, tablee.tableb.datecount,
tablee.tablecdate, tablee.tablec.datecount,
tablee.tableddate, tablee.tablea.datedcount,
select tablea.date, count(tablea.date),
tableb.date, count(tableb.date),
tablec.date, count(tablec.date),
tabled.date, count(tabled.date)
)
from tablea, tableb, tablec, tabled
left join tablee on (tablea.date=tablee.date)
left join tablee on (tableb.date=tablee.date)
left join tablee on (tablec.date=tablee.date)
left join tablee on (tabled.date=tablee.date);
But I am not able to get it to work correctly. Does anyone have any tips?

Please check whether the query below gives your desired result set.
select * from (select col1, count(*) as cnt from tablea group by col1) a
full outer join
(select col1, count(*) as cnt from tableb group by col1) b on a.col1 = b.col1
full outer join
(select col1, count(*) as cnt from tablec group by col1) c on coalesce(a.col1, b.col1) = c.col1
full outer join
(select col1, count(*) as cnt from tabled group by col1) d on coalesce(a.col1, b.col1, c.col1) = d.col1;
First, the grouped counts are computed for each table; then full outer joins combine them so that every value of col1 from every table appears in the result set. Once the result set is the one you want, you can convert the SELECT statement into an INSERT INTO / INSERT OVERWRITE statement.
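To try the count-then-combine idea outside Hive, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite, not Hive, and only Tables A and B from the question are loaded). Because older SQLite versions lack FULL OUTER JOIN, it swaps in an equivalent technique: list every col1 value once with UNION, then LEFT JOIN each table's grouped counts onto that key list.

```python
import sqlite3

# SQLite (Python stdlib) stands in for Hive here; only Tables A and B are loaded.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tablea (col1 TEXT)")
cur.execute("CREATE TABLE tableb (col1 TEXT)")
cur.executemany("INSERT INTO tablea VALUES (?)",
                [("20190204",)] * 3 + [("20190205",)] * 3)
cur.executemany("INSERT INTO tableb VALUES (?)",
                [("20200204",)] * 4 + [("20200205",)] * 3)

# Emulate a FULL OUTER JOIN: list every col1 value once (UNION removes
# duplicates), then LEFT JOIN each table's grouped counts onto that list.
query = """
WITH a AS (SELECT col1, COUNT(*) AS cnt FROM tablea GROUP BY col1),
     b AS (SELECT col1, COUNT(*) AS cnt FROM tableb GROUP BY col1),
     k AS (SELECT col1 FROM a UNION SELECT col1 FROM b)
SELECT k.col1, a.cnt AS a_cnt, b.cnt AS b_cnt
FROM k
LEFT JOIN a ON a.col1 = k.col1
LEFT JOIN b ON b.col1 = k.col1
ORDER BY k.col1
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data no date appears in both tables, so each output row carries a count in only one of the count columns; values line up across columns only where the tables share a col1 value.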

Related

Reduce overload on pl/sql

I need to match on a few attributes, one at a time, and I'm looking to avoid multiple SELECT statements. Below is an example.
Table1
Col1|Price|Brand|size
-----------------------
A|10$|BRAND1|SIZE1
B|10$|BRAND1|SIZE1
C|30$|BRAND2|SIZE2
D|40$|BRAND2|SIZE4
Table2
Col1|Col2|Col3
--------------
B|XYZ|PQR
C|ZZZ|YYY
Table3
Col1|COL2|COL3|LIKECOL1|Price|brand|size
-----------------------------------------
B|XYZ|PQR|A|10$|BRAND1|SIZE1
C|ZZZ|YYY|D|NULL|BRAND2|NULL
I need to insert data from table2 into table3 by checking the conditions below:
Find a match for each record in table2 where Brand, Size and Price all match.
If no match is found, try just Brand and Size.
If there is still no match, try Brand only.
In the example above, the first record in table2 (B) matches record A on all 3 attributes, so it is inserted into table3; for the second record (C), record 'D' matches, but only on Brand.
All I can think of is writing 3 different INSERT statements, like those below, in an Oracle PL/SQL block:
insert into table3
select from tab2
where all 3 attributes are matching;
insert into table3
select from tab2
where brand and price are matching
and not exists in table3 (not exists is to avoid
inserting the same record which was already
inserted with all 3 attributes matched);
insert into table3
select from tab2
where Brand is matching and not exists in table3;
Can anyone suggest a better way to achieve this that avoids selecting from table2 multiple times?
This is a case for OUTER APPLY.
OUTER APPLY is a type of lateral join that allows you to join on dynamic views that refer to tables appearing earlier in your FROM clause. With that ability, you can define a dynamic view that finds all the matches, sorts them by the pecking order you've specified, and then uses FETCH FIRST 1 ROW ONLY to include only the first one in the results. (Both OUTER APPLY and FETCH FIRST require Oracle 12c or later.)
Using OUTER APPLY means that if there is no match, you will still get the table2 record, just with all the match columns null. If you don't want that, you can change OUTER APPLY to CROSS APPLY.
Here is a working example (with step-by-step comments), shamelessly stealing the table creation scripts from Michael Piankov's answer:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
-- INSERT INTO table3
SELECT t2.col1, t2.col2, t2.col3,
t1.col1 likecol1,
decode(t1.price,t1_template.price,t1_template.price, null) price,
decode(t1.brand,t1_template.brand,t1_template.brand, null) brand,
decode(t1.size1,t1_template.size1,t1_template.size1, null) size1
FROM
-- Start with table2
table2 t2
-- Get the row from table1 matching on col1... this is our search template
inner join table1 t1_template on
t1_template.col1 = t2.col1
-- Get the best match from table1 for our search
-- template, excluding the search template itself
outer apply (
SELECT * FROM table1 t1
WHERE 1=1
-- Exclude search template itself
and t1.col1 != t2.col1
-- All matches include BRAND
and t1.brand = t1_template.brand
-- order by match strength based on price and size
order by case when t1.price = t1_template.price and t1.size1 = t1_template.size1 THEN 1
when t1.size1 = t1_template.size1 THEN 2
else 3 END
-- Only get the best match for each row in T2
FETCH FIRST 1 ROW ONLY) t1;
Unfortunately, it is not clear what you mean by "match". What do you expect if there is more than one match?
Should it be only the first match, or should it generate all available pairs?
Regarding your question about how to avoid multiple inserts, there is more than one way:
You could use a multitable insert with INSERT FIRST and WHEN conditions.
You could join table1 to itself to get all pairs and filter the results in the WHERE clause.
You could use analytic functions.
I suppose there are other ways too. But why would you want to avoid 3 simple inserts? They are easy to read and maintain. And may be
Here is an example using analytic functions:
create table Table1 (Col1,Price,Brand,size1)
as select 'A','10','BRAND1','SIZE1' from dual union all
select 'B','10','BRAND1','SIZE1' from dual union all
select 'C','30','BRAND2','SIZE2' from dual union all
select 'D','40','BRAND2','SIZE4' from dual;
create table Table2(Col1,Col2,Col3)
as select 'B','XYZ','PQR' from dual union all
select 'C','ZZZ','YYY' from dual;
with s as (
select Col1,Price,Brand,size1,
count(*) over(partition by Price,Brand,size1 ) as match3,
count(*) over(partition by Price,Brand ) as match2,
count(*) over(partition by Brand ) as match1,
lead(Col1) over(partition by Price,Brand,size1 order by Col1) as like3,
lead(Col1) over(partition by Price,Brand order by Col1) as like2,
lead(Col1) over(partition by Brand order by Col1) as like1,
lag(Col1) over(partition by Price,Brand,size1 order by Col1) as like_desc3,
lag(Col1) over(partition by Price,Brand order by Col1) as like_desc2,
lag(Col1) over(partition by Brand order by Col1) as like_desc1
from Table1 t )
select t.Col1,t.Col2,t.Col3, coalesce(s.like3, like_desc3, s.like2, like_desc2, s.like1, like_desc1) as like_col,
case when match3 > 1 then size1 end as size1,
case when match1 > 1 then Brand end as Brand,
case when match2 > 1 then Price end as Price
from table2 t
left join s on s.Col1 = t.Col1;
COL1 COL2 COL3 LIKE_COL SIZE1 BRAND PRICE
B XYZ PQR A SIZE1 BRAND1 10
C ZZZ YYY D - BRAND2 -
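The analytic-function query can also be run in SQLite 3.25 or later (the release that added window functions), again through Python's sqlite3 module as an assumed stand-in for Oracle. This sketch uses the like3, like2, like1 fallback order (each followed by its lag counterpart) in the COALESCE; on the sample data it should reproduce the two output rows shown above.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for Oracle here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (Col1 TEXT, Price TEXT, Brand TEXT, size1 TEXT)")
cur.execute("CREATE TABLE Table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT)")
cur.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("A", "10", "BRAND1", "SIZE1"), ("B", "10", "BRAND1", "SIZE1"),
    ("C", "30", "BRAND2", "SIZE2"), ("D", "40", "BRAND2", "SIZE4")])
cur.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                [("B", "XYZ", "PQR"), ("C", "ZZZ", "YYY")])

query = """
WITH s AS (
  SELECT Col1, Price, Brand, size1,
         -- How many rows share 3, 2 or 1 of the attributes
         COUNT(*) OVER (PARTITION BY Price, Brand, size1) AS match3,
         COUNT(*) OVER (PARTITION BY Price, Brand) AS match2,
         COUNT(*) OVER (PARTITION BY Brand) AS match1,
         -- Neighbouring Col1 values inside each partition
         LEAD(Col1) OVER (PARTITION BY Price, Brand, size1 ORDER BY Col1) AS like3,
         LEAD(Col1) OVER (PARTITION BY Price, Brand ORDER BY Col1) AS like2,
         LEAD(Col1) OVER (PARTITION BY Brand ORDER BY Col1) AS like1,
         LAG(Col1) OVER (PARTITION BY Price, Brand, size1 ORDER BY Col1) AS like_desc3,
         LAG(Col1) OVER (PARTITION BY Price, Brand ORDER BY Col1) AS like_desc2,
         LAG(Col1) OVER (PARTITION BY Brand ORDER BY Col1) AS like_desc1
  FROM Table1)
SELECT t.Col1, t.Col2, t.Col3,
       COALESCE(s.like3, s.like_desc3, s.like2, s.like_desc2,
                s.like1, s.like_desc1) AS like_col,
       CASE WHEN s.match3 > 1 THEN s.size1 END AS size1,
       CASE WHEN s.match1 > 1 THEN s.Brand END AS brand,
       CASE WHEN s.match2 > 1 THEN s.Price END AS price
FROM Table2 t
LEFT JOIN s ON s.Col1 = t.Col1
ORDER BY t.Col1
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
```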

Merge table data

Table1         | Table2        | Table3          | Table4
Sl Name City   | index len bre | col tax income  | price dicount org
1  ABC  XYZ    | 1     10  12  | 1   23  40      | 1     10      XYZ
2  DEF  asd    | 2     12  14  | 2   24  42      | 2     6       asd
3  ghi  jkl    |               | 3   78  89      | 3     0       gah
These entries correspond to respective tables. I want to fetch data from all 4 tables irrespective of whether values are present in Table2 or not. Any null value in Table2 should not hamper the output.
select tab1.Name,
tab2.len,
tab3.tax,
tab4.org
From Table1 tab1,
Table2 tab2,
Table3 tab3,
Table4 tab4
where tab1.sl=tab2.index(+)
AND tab2.index(+)=tab3.col
AND tab3.col=tab4.price;
This query only returns results for those Sl values that have an entry in Table2. How can I resolve this?
To use a proper ANSI left join:
select tab1.Name,
tab2.len,
tab3.tax,
tab4.org
From Table1 tab1
inner join Table3 tab3 on tab1.sl=tab3.col
inner join Table4 tab4 on tab3.col=tab4.price
left join Table2 tab2 on tab1.sl=tab2.index;
This makes your code much more readable.
Try the following (Oracle does not allow the old (+) operator to be mixed with ANSI JOIN syntax, so it is dropped here):
select tab1.Name,
tab2.len,
tab3.tax,
tab4.org
From
Table1 tab1 left join Table2 tab2
on tab1.sl=tab2.index join Table3 tab3
on tab2.index=tab3.col join Table4 tab4
on tab3.col=tab4.price;
Look, you should move from the 1990s into the early 2000s by rewriting your query without the 'orrible omega-join (+) syntax.
Converting omega to join, your query comes out like this.
SELECT tab1.Name,
tab2.len,
tab3.tax,
tab4.org
FROM Table1 tab1
left join Table2 tab2 ON tab1.sl=tab2.index
right join Table3 tab3 ON tab2.index=tab3.col
inner join Table4 tab4 ON tab3.col=tab4.price;
And then the apparently chaotic combination of right, left, and inner join operations hints at the solution to your problem.
Change over to all left joins and your Table1 rows won't be suppressed when they don't match other tables.
SELECT tab1.Name,
tab2.len,
tab3.tax,
tab4.org
FROM Table1 tab1
LEFT JOIN Table2 tab2 ON tab1.sl=tab2.index
LEFT JOIN Table3 tab3 ON tab2.index=tab3.col
LEFT JOIN Table4 tab4 ON tab3.col=tab4.price;
Even if you must use the old omega join syntax, you should use it in a way that won't suppress rows from Table1:
select tab1.Name,
tab2.len,
tab3.tax,
tab4.org
From Table1 tab1,
Table2 tab2,
Table3 tab3,
Table4 tab4
where tab1.sl=tab2.index(+)
AND tab2.index=tab3.col(+)
AND tab3.col=tab4.price(+);
The position of the (+) on the right means it's a left join, and vice versa.
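To see which rows survive, here is a minimal sketch with Python's sqlite3 module (an assumed stand-in for Oracle) loaded with the sample data from the question. It compares the all-LEFT-JOIN chain from the last answer with the first answer's intended approach of joining Table3 directly on tab1.sl = tab3.col. Both keep the Sl = 3 row, but the chain reaches Table3 through Table2, so its tax and org come back null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# "index" is a keyword in SQLite, so that column name is double-quoted.
cur.executescript("""
CREATE TABLE Table1 (Sl INTEGER, Name TEXT, City TEXT);
CREATE TABLE Table2 ("index" INTEGER, len INTEGER, bre INTEGER);
CREATE TABLE Table3 (col INTEGER, tax INTEGER, income INTEGER);
CREATE TABLE Table4 (price INTEGER, dicount INTEGER, org TEXT);
INSERT INTO Table1 VALUES (1, 'ABC', 'XYZ'), (2, 'DEF', 'asd'), (3, 'ghi', 'jkl');
INSERT INTO Table2 VALUES (1, 10, 12), (2, 12, 14);
INSERT INTO Table3 VALUES (1, 23, 40), (2, 24, 42), (3, 78, 89);
INSERT INTO Table4 VALUES (1, 10, 'XYZ'), (2, 6, 'asd'), (3, 0, 'gah');
""")

# All LEFT JOINs, with Table3 reached through Table2.
chained = cur.execute("""
    SELECT tab1.Name, tab2.len, tab3.tax, tab4.org
    FROM Table1 tab1
    LEFT JOIN Table2 tab2 ON tab1.Sl = tab2."index"
    LEFT JOIN Table3 tab3 ON tab2."index" = tab3.col
    LEFT JOIN Table4 tab4 ON tab3.col = tab4.price
    ORDER BY tab1.Sl""").fetchall()

# Table3/Table4 joined directly to Table1; only Table2 is optional.
direct = cur.execute("""
    SELECT tab1.Name, tab2.len, tab3.tax, tab4.org
    FROM Table1 tab1
    INNER JOIN Table3 tab3 ON tab1.Sl = tab3.col
    INNER JOIN Table4 tab4 ON tab3.col = tab4.price
    LEFT JOIN Table2 tab2 ON tab1.Sl = tab2."index"
    ORDER BY tab1.Sl""").fetchall()

print(chained)
print(direct)
```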

oracle insert into column using subquery

I want to insert data into a column in the table.
Table a
ID col1 col2
1 A null
2 B null
Table b
ID col1
1 C
2 D
Expected results:
Table A
ID col1 col2
1 A C
2 B D
I tried this:
insert into tableA (col2)
select b.col1
from tableB b , tableA a
where b.id = a.id
and I received
0 row inserted.
How do I insert the col1 in B into col2 in A for the matching 'id' columns?
Thank you.
You must use a MERGE statement (or an UPDATE) when setting values based on a join.
Also, col2 already exists in the rows of tableA; since you want to fill it in from the join, you must update that column rather than insert new rows.
merge into tablea a
using tableb b
on (b.id = a.id)
when matched
then
update set a.col2 = b.col1;
What you want to do shouldn't require a subquery. I'm not a huge fan of the table a, table b notation; try this (note that UPDATE ... FROM is SQL Server syntax and is not supported by Oracle):
update a
set col2 = b.col1
from tableB b
join tableA a
on a.id = b.id

Group By Clause - Oracle SQL

I have to display about 5 columns from 2 database tables using the Group By clause as follows:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A
Now COLUMN_B has the same value across all the rows having the same COLUMN_A, and COLUMN_B is an amount field.
COLUMN_C is a date field and would be same across the same COLUMN_A values.
For example, here is some dummy data. TABLE T1:
COLUMN_A COLUMN_B COLUMN_C
1 $25 09/15/2011 12:00:00 AM
1 $25 09/15/2011 12:00:00 AM
2 $20 12/12/2011 12:00:00 AM
...
TABLE T2:
COLUMN_A COLUMN_D
1 $100
1 $10
2 $200
2 $200
.....
Running the query fails with the following error: ORA-00979: not a GROUP BY expression
Removing COLUMN_B and COLUMN_C would work. However I need these columns as well.
Can anyone please suggest the required changed?
This should work:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SumColumnD
FROM TABLE1 T1
INNER JOIN
(SELECT COLUMN_A, SUM(COLUMN_D) AS SumColumnD
FROM TABLE2 T2
GROUP BY COLUMN_A) t ON T1.COLUMN_A = t.COLUMN_A
If the values of COLUMN_B and COLUMN_C are the same across the same values of COLUMN_A, then you can simply add them to the GROUP BY clause:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C
You've specified columns COLUMN_B and COLUMN_C in your SELECT list, so Oracle needs to provide a value for them when grouping by COLUMN_A. However, Oracle doesn't know that these columns are constant across the same values of COLUMN_A, and you get the error because in general it has no way of returning a single value for these columns.
Adding COLUMN_B and COLUMN_C to the GROUP BY clause shouldn't affect the results of the query and should allow you to use them in your SELECT list.
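Here is a small sketch comparing the two suggested queries on the question's dummy data, using Python's sqlite3 module as an assumed stand-in for Oracle (amounts are stored as plain integers and dates as ISO strings for simplicity). SQLite is lenient about non-grouped columns, so it will not reproduce ORA-00979; the point is the results. Because T1 has two rows for COLUMN_A = 1, the join repeats each matching T2 row twice, so the GROUP BY version sums 220 rather than 110, while the pre-aggregated version returns the correct total but repeats the T1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (column_a INTEGER, column_b INTEGER, column_c TEXT)")
cur.execute("CREATE TABLE table2 (column_a INTEGER, column_d INTEGER)")
cur.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, 25, "2011-09-15"), (1, 25, "2011-09-15"), (2, 20, "2011-12-12")])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (1, 100), (1, 10), (2, 200), (2, 200)])

# Second answer: add the constant columns to the GROUP BY.
grouped = cur.execute("""
    SELECT t1.column_a, t1.column_b, t1.column_c, SUM(t2.column_d)
    FROM table1 t1 JOIN table2 t2 ON t1.column_a = t2.column_a
    GROUP BY t1.column_a, t1.column_b, t1.column_c
    ORDER BY t1.column_a""").fetchall()

# First answer: aggregate table2 first, then join the totals to table1.
pre_aggregated = cur.execute("""
    SELECT t1.column_a, t1.column_b, t1.column_c, t.sum_d
    FROM table1 t1
    JOIN (SELECT column_a, SUM(column_d) AS sum_d
          FROM table2 GROUP BY column_a) t ON t1.column_a = t.column_a
    ORDER BY t1.column_a""").fetchall()

print(grouped)
print(pre_aggregated)
```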

How to update a table with null values with data from other table at one time?

I have 2 tables - A and B . Table A has two columns, pkey (primary key) and col1. Table B also has two columns, pr_key (primary key but not a foreign key) and column1. Both tables have 4 rows. Table B has no values in column1, while table A has col1 values for all 4 rows. So my data looks like this
Table A
pkey col1
A 10
B 20
C 30
D 40
Table B
pr_key column1
A null
B null
C null
D null
I want to update table B to set the column1 value of each row equal to the col1 value of the equivalent row from table A in a single DML statement.
It should be something like this (the details depend on the SQL implementation you use, but in general the following is fairly standard; in particular, it should work in MS-SQL and in MySQL):
INSERT INTO tblB (pr_key, column1)
SELECT pkey, col1
FROM tblA
-- WHERE some condition (if you don't want 100% of A to be copied)
The question is a bit unclear about the nature of tblB's pr_key. If for some reason it is a default/auto-incremented key for that table, it could simply be omitted from both the column list (in parentheses) and the SELECT that follows; a new value would then be generated as each row is inserted.
Edit: It appears the OP actually wants to update table B with values from A.
The syntax for this (in SQL Server) should then be something like
UPDATE B
SET Column1 = A.Col1
FROM tblA AS A
JOIN tblB AS B ON B.pr_key = A.pkey
This may perform better:
MERGE INTO tableB
USING (select pkey, col1 from tableA) a
ON (tableB.pr_key = a.pkey)
WHEN MATCHED THEN UPDATE
SET tableB.column1 = a.col1;
It sounds like you want to do a correlated update. The syntax for that in Oracle is
UPDATE tableB b
SET column1 = (SELECT a.col1
FROM tableA a
WHERE a.pkey = b.pr_key)
WHERE EXISTS( SELECT 1
FROM tableA a
WHERE a.pkey = b.pr_key )
The WHERE EXISTS clause isn't necessary if tableA and tableB each have 4 rows and have the same set of keys. It is much safer to include that clause, though, to avoid setting column1 values in tableB to NULL when there is no matching row in tableA.
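The effect of WHERE EXISTS can be seen in a minimal sketch using Python's sqlite3 module as an assumed stand-in for Oracle. Rows A to D mirror the question; row E (value 99) is a hypothetical extra row with no partner in tableA, added only to show what the guard protects.

```python
import sqlite3


def correlated_update(use_exists):
    """Return tableB's rows after the correlated UPDATE, with or without WHERE EXISTS."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE tableA (pkey TEXT PRIMARY KEY, col1 INTEGER)")
    cur.execute("CREATE TABLE tableB (pr_key TEXT PRIMARY KEY, column1 INTEGER)")
    cur.executemany("INSERT INTO tableA VALUES (?, ?)",
                    [("A", 10), ("B", 20), ("C", 30), ("D", 40)])
    # A-D start out NULL as in the question; E is a hypothetical unmatched row.
    cur.executemany("INSERT INTO tableB VALUES (?, ?)",
                    [("A", None), ("B", None), ("C", None), ("D", None), ("E", 99)])
    sql = """
        UPDATE tableB
        SET column1 = (SELECT a.col1 FROM tableA a WHERE a.pkey = tableB.pr_key)
    """
    if use_exists:
        # Only touch rows that actually have a partner in tableA.
        sql += " WHERE EXISTS (SELECT 1 FROM tableA a WHERE a.pkey = tableB.pr_key)"
    cur.execute(sql)
    rows = cur.execute("SELECT pr_key, column1 FROM tableB ORDER BY pr_key").fetchall()
    conn.close()
    return rows


with_exists = correlated_update(True)
without_exists = correlated_update(False)
print(with_exists)
print(without_exists)
```

Without the guard, the unmatched row E is overwritten with NULL; with it, E keeps its value.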
