Hive: Joining tables with different scenarios - hadoop

I have a question about joining tables in a particular scenario. Please find the sample tables below.
For rows 3-5 of the expected table, the Capacity value from Table 1 should be repeated in both Capacity1 and Capacity2, because Table 2 has no matching rows for them.
Could anyone please help me get the expected table?
Table 1:
No ProjectID Capacity
1 514 4
2 418 10
3 418 30
4 401 40
5 502 41
Table2:
NO ProjectID Capacity1 Capacity2
1 514 4 10
2 418 10 20
Expected Table:
NO ProjectID Capacity1 Capacity2
1 514 4 10
2 418 10 20
3 418 30 30
4 401 40 40
5 502 41 41

1. Do a left outer join.
2. For the rows that have no match in table 2, take the values from table 1 with an if condition.
select t1.no, t1.projectid,
if(t2.capacity1 is null, t1.capacity, t2.capacity1) as capacity1,
if(t2.capacity2 is null, t1.capacity, t2.capacity2) as capacity2
from table1 t1 left outer join table2 t2 on t1.no = t2.no;
I think the above query meets your requirement; let me know if you need any more help.
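The left-outer-join-with-fallback idea in this answer can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than Hive, so it is only an approximation: COALESCE stands in for Hive's if(... is null, ...), and the column NO is renamed row_no (an assumption made here to avoid any keyword clash). The data is the sample data from the question.

```python
import sqlite3


def fill_capacity():
    """Left-join table1 to table2 and fall back to table1.capacity when unmatched."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (row_no INTEGER, projectid INTEGER, capacity INTEGER);
        CREATE TABLE table2 (row_no INTEGER, projectid INTEGER,
                             capacity1 INTEGER, capacity2 INTEGER);
        INSERT INTO table1 VALUES (1, 514, 4), (2, 418, 10), (3, 418, 30),
                                  (4, 401, 40), (5, 502, 41);
        INSERT INTO table2 VALUES (1, 514, 4, 10), (2, 418, 10, 20);
    """)
    rows = conn.execute("""
        SELECT t1.row_no,
               t1.projectid,
               COALESCE(t2.capacity1, t1.capacity) AS capacity1,
               COALESCE(t2.capacity2, t1.capacity) AS capacity2
        FROM table1 t1
        LEFT OUTER JOIN table2 t2 ON t1.row_no = t2.row_no
        ORDER BY t1.row_no
    """).fetchall()
    conn.close()
    return rows


rows = fill_capacity()
for row in rows:
    print(row)
```

Rows 3-5 have no partner in table2, so both capacity columns fall back to table1.capacity, which matches the expected table in the question.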

Related

Get Total count

I want to merge two columns (Sender and Receiver), get the count per transaction Type for each number, and then join another table using the merged Sender/Receiver number as the key.
Sender Receiver Type Amount Date
773787639 777611388 1 300 2/1/2019
773631898 776806843 4 450 8/20/2019
773761571 777019819 6 369 2/11/2019
774295511 777084440 34 1000 1/22/2019
774263079 776816905 45 678 6/27/2019
774386894 777202863 12 2678 2/10/2019
773671537 777545555 14 38934 9/29/2019
774288117 777035194 18 21 4/22/2019
774242382 777132939 21 1275 9/30/2019
774144715 777049859 30 6309 7/4/2019
773911674 776938987 10 3528 5/1/2019
773397863 777548054 15 35892 7/6/2019
776816905 772345091 6 1234 7/7/2019
777035194 775623065 4 453454 7/20/2019
Second Table
Mobile_number Age
773787639 34
773787632 23
774288117 65
I am trying to get a table like this:
Sender/Receiver Type_1 Type_4 Type_12...... Type_45 Age
773787639 3 2 0 0 23
773631898 1 0 1 2 56
773397863 2 2 0 0 65
772345091 1 1 0 3 32
OK, I have seen your old question; you just need an inner join in the subquery, as follows:
SELECT
SenderReceiver,
COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
COUNT(CASE WHEN Type = 2 THEN 1 END) AS Type_2,
COUNT(CASE WHEN Type = 3 THEN 1 END) AS Type_3,
...
COUNT(CASE WHEN Type = 45 THEN 1 END) AS Type_45,
Age -- changes here
FROM
( SELECT sr.SenderReceiver, sr.Type, st.Age from -- changes here
(SELECT Sender AS SenderReceiver, Type FROM yourTable
UNION ALL
SELECT Receiver, Type FROM yourTable) sr
join <second_table> st on st.Mobile_number = sr.SenderReceiver -- changes here
) t
GROUP BY
SenderReceiver,
Age; -- changes here
The changes made to your previous query are marked with the comment -- changes here.
Please replace <second_table> with the actual name of your table.
Cheers!!
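As a rough check of the query above, here is a sketch using Python's built-in sqlite3 module. The table names yourTable and second_table and the four sample transactions are hypothetical, chosen only to keep the output small, and only Type_1 and Type_4 are pivoted. Note that the inner join drops any number that has no row in the second table.

```python
import sqlite3


def type_counts_with_age():
    """Unpivot Sender/Receiver, count rows per Type, and attach Age via an inner join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE yourTable (Sender INTEGER, Receiver INTEGER, Type INTEGER);
        CREATE TABLE second_table (Mobile_number INTEGER, Age INTEGER);
        -- Hypothetical sample data, not taken from the question.
        INSERT INTO yourTable VALUES (100, 200, 1), (200, 100, 1),
                                     (100, 300, 4), (300, 200, 4);
        INSERT INTO second_table VALUES (100, 34), (200, 65);
    """)
    rows = conn.execute("""
        SELECT SenderReceiver,
               COUNT(CASE WHEN Type = 1 THEN 1 END) AS Type_1,
               COUNT(CASE WHEN Type = 4 THEN 1 END) AS Type_4,
               Age
        FROM (
            SELECT sr.SenderReceiver AS SenderReceiver, sr.Type AS Type, st.Age AS Age
            FROM (
                SELECT Sender AS SenderReceiver, Type FROM yourTable
                UNION ALL
                SELECT Receiver, Type FROM yourTable
            ) sr
            JOIN second_table st ON st.Mobile_number = sr.SenderReceiver
        ) t
        GROUP BY SenderReceiver, Age
        ORDER BY SenderReceiver
    """).fetchall()
    conn.close()
    return rows


rows = type_counts_with_age()
for row in rows:
    print(row)
```

Number 300 appears in the transactions but not in second_table, so it is absent from the result. If such numbers should be kept with an empty Age, a LEFT JOIN would be the usual alternative.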

Oracle SQL Joining tables using IN clause takes long time

I have two queries that I need to combine, for business purposes, into a one-step process.
SELECT
empid,
assest_x_id,
asset_y_id
FROM emp e
WHERE e.empid = 'SOME_UNIQUE_VAL';
result:
EMPID ASSEST_X_ID ASSET_Y_ID
======= ============ ===========
1234 abc pqr
-- Even though the table has millions of rows, this always returns 1 row within milliseconds because it uses the PK column with a unique value.
Now there is another table, asset_values, in a separate DB, which holds the current market price
(also a million rows):
SELECT
asset_id,
assest_type,
asset_current_price
FROM asset_values#x_db a
WHERE (asset_id, assest_type) IN (('abc', 'X'), ('pqr', 'Y'));
result:
asset_id asset_type assest_current_price
======== ========= =============
abc X 10000
pqr Y 5000
This also always returns 2-3 rows within a few milliseconds, as the primary key is defined on the combination of asset_id and asset_type, and there are only 3 types of assets: X/Y/Z.
(Note: It is not possible to normalize this table further because of business rules.)
Now, to make this a one-step query in a script, I tried to join these two queries so that the script takes empid from the user and returns all the desired results.
The problem is that when I merge the two into a single query like the one below, it runs for 15+ minutes before returning results:
SELECT
a.asset_id,
a.asset_type,
asset_current_price
FROM asset_values#x_db a, emp b
WHERE b.empid = 'SAME_UNIQUE_VAL'
AND (asset_id, asset_type) IN ((b.asset_x_id, 'X'), (b.asset_y_id, 'Y'));
Surprisingly, the explain plan also looks good (bytes: 597, cost: 2).
Can someone please give expert advice on this?
SELECT STATEMENT ALL_ROWS Cost: 6 Bytes: 690 Cardinality: 2
  13 CONCATENATION
    6 NESTED LOOPS Cost: 3 Bytes: 345 Cardinality: 1
      3 PARTITION RANGE SINGLE Cost: 1 Bytes: 2,232 Cardinality: 9 Partition #: 3 Partitions accessed #1
        2 TABLE ACCESS BY LOCAL INDEX ROWID TABLE MYPRDAOWN.EMP Object Instance: 2 Cost: 1 Bytes: 2,232 Cardinality: 9 Partition #: 4 Partitions accessed #1
          1 INDEX RANGE SCAN INDEX MYPRDAOWN.EMP_7IX Cost: 1 Cardinality: 9 Partition #: 5 Partitions accessed #1
      5 FILTER Cost: 1 Bytes: 97 Cardinality: 1
        4 REMOTE REMOTE SERIAL_FROM_REMOTE ASSEST_VALUES XDB
    12 NESTED LOOPS Cost: 3 Bytes: 345 Cardinality: 1
      9 PARTITION RANGE SINGLE Cost: 1 Bytes: 2,232 Cardinality: 9 Partition #: 9 Partitions accessed #1
        8 TABLE ACCESS BY LOCAL INDEX ROWID TABLE MYPRDAOWN.EMP Object Instance: 2 Cost: 1 Bytes: 2,232 Cardinality: 9 Partition #: 10 Partitions accessed #1
          7 INDEX RANGE SCAN INDEX MYPRDAOWN.EMP_7IX Cost: 1 Cardinality: 9 Partition #: 11 Partitions accessed #1
      11 FILTER Cost: 1 Bytes: 97 Cardinality: 1
        10 REMOTE REMOTE SERIAL_FROM_REMOTE ASSEST_VALUES XDB
From the Oracle documentation (http://docs.oracle.com/cd/B28359_01/server.111/b28274/optimops.htm#i49732):
Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.
Use hash joins instead, which:
... are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
To request a hash join, use a hint:
SELECT /*+ use_hash(a,b) */ a.asset_id,
a.asset_type,
asset_current_price
FROM asset_values#x_db a,
emp b
WHERE b.empid = 'SAME_UNIQUE_VAL'
AND (asset_id, asset_type) IN ((b.asset_x_id, 'X'), (b.asset_y_id, 'Y'));
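The build-and-probe behaviour described in the quoted documentation can be sketched in plain Python. This is only an illustration of the general hash join idea, not Oracle's implementation; the function hash_join and the sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Join two row lists: hash the smaller input, then probe it with the larger one."""
    # Build phase: one pass over the smaller input, grouping rows by join key.
    buckets = defaultdict(list)
    for row in small:
        buckets[small_key(row)].append(row)

    # Probe phase: one pass over the larger input, looking up each join key.
    joined = []
    for row in large:
        for match in buckets.get(large_key(row), []):
            joined.append((match, row))
    return joined


# Hypothetical data: the (asset_id, asset_type) pairs wanted for one employee,
# and a few rows of (asset_id, asset_type, asset_current_price).
wanted = [("abc", "X"), ("pqr", "Y")]
asset_values = [
    ("abc", "X", 10000),
    ("abc", "Y", 1),
    ("pqr", "Y", 5000),
    ("zzz", "Z", 7),
]

result = hash_join(
    wanted,
    asset_values,
    small_key=lambda r: (r[0], r[1]),
    large_key=lambda r: (r[0], r[1]),
)
for pair in result:
    print(pair)
```

Each input is read once, which is why this strategy tends to suit large inputs better than a nested loop that repeatedly visits the second table for every row of the first.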

Update query taking long time in oracle 10g

I have a table which holds more than 2 million records. I am trying to update it using the following query:
UPDATE toc T
SET RANK =
65535
- (SELECT COUNT (*)
FROM toc T2
WHERE S_KEY LIKE '00010001%'
AND A_ID IS NOT NULL
AND T2.TARGET = T.TARGET
AND T2.RANK > T.RANK)
WHERE S_KEY LIKE '00010001%' AND A_ID IS NOT NULL
Usually this query takes 5 minutes to update 50,000 rows in our staging DB, which is an exact replica of the production DB, but in production it takes 6 hours to execute.
I tried the Oracle advisor to select the correct execution plan, but nothing is working.
Plan
UPDATE STATEMENT ALL_ROWS Cost: 329,471
  6 UPDATE TT.TOC
    2 TABLE ACCESS BY INDEX ROWID TABLE TT.TOC Cost: 5 Bytes: 4,173,236 Cardinality: 54,911
      1 INDEX SKIP SCAN INDEX TT.DATASTAT_SORTKEY_IDX Cost: 4 Cardinality: 1
    5 SORT AGGREGATE Bytes: 76 Cardinality: 1
      4 TABLE ACCESS BY INDEX ROWID TABLE TT.TOC Cost: 5 Bytes: 76 Cardinality: 1
        3 INDEX SKIP SCAN INDEX TT.DATASTAT_SORTKEY_IDX Cost: 4 Cardinality: 1
I can see the following wait events
1,066 db file sequential read 10,267 0 3,993 0 6 39,933,580
1,066 db file scattered read 413 0 188 0 6 1,876,464
Any help will be greatly appreciated.
Here is the current list of indexes (index name, column name, column position):
DSTAT_SKEY_IDX D_STATUS 1
DSTAT_SKEY_IDX S_KEY 2
IDX$$_165A0002 N_LABEL 1
S_KEY_IDX S_KEY 1
XAK1_TOC N_RELATIONSHIP 1
XAK2_TOC TARGET 1
XAK2_TOC N_LABEL 2
XAK2_TOC D_STATUS 3
XAK2_TOC A_ID 4
XIE1_TOC N_RELBASE 1
XIF4_TOC SOURCE_FILE_ID 1
XIF5_TOC A_ID 1
XPK_TOC N_ID 1
Atif
You're doing a skip scan where you presumably want a range scan.
A range scan is only possible when the predicate covers the leading column(s) of the index, so the order of the index columns matters. In your case it seems that it should be S_KEY - TARGET - RANK.
Update: rewriting the query in a different order wouldn't make any difference. What matters is the sequence of the columns in the indexes on that table.
First, show us the current index columns for that table:
select index_name, column_name, column_position from all_ind_columns where table_name = 'TOC'
Then you could create a new index, e.g.:
create index toc_i_s_key_target_rank on toc (s_key, target, rank) compress;
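To make explicit what the correlated subquery has to look up for every updated row (same TARGET, higher RANK, within the S_KEY prefix), here is a small sketch using Python's built-in sqlite3 module. It is not Oracle: the COMPRESS option is dropped, the RANK column is named rnk here, the table is reduced to the columns the query touches, and the six rows are hypothetical. The new value is computed with a SELECT instead of an UPDATE so the table is left unchanged.

```python
import sqlite3


def compute_new_ranks():
    """Return (n_id, new_rank) using the same expression as the UPDATE in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE toc (n_id INTEGER PRIMARY KEY, s_key TEXT, a_id INTEGER,
                          target INTEGER, rnk INTEGER);
        -- Composite index in the column order suggested in the answer.
        CREATE INDEX toc_i_s_key_target_rank ON toc (s_key, target, rnk);
        -- Hypothetical rows: 5 has no a_id and 6 has a different s_key prefix.
        INSERT INTO toc VALUES
            (1, '000100010001', 10,   7, 100),
            (2, '000100010002', 11,   7, 200),
            (3, '000100010003', 12,   7, 300),
            (4, '000100010004', 13,   8,  50),
            (5, '000100010005', NULL, 7, 400),
            (6, '00020001',     14,   7, 500);
    """)
    rows = conn.execute("""
        SELECT t.n_id,
               65535 - (SELECT COUNT(*)
                        FROM toc t2
                        WHERE t2.s_key LIKE '00010001%'
                          AND t2.a_id IS NOT NULL
                          AND t2.target = t.target
                          AND t2.rnk > t.rnk) AS new_rank
        FROM toc t
        WHERE t.s_key LIKE '00010001%' AND t.a_id IS NOT NULL
        ORDER BY t.n_id
    """).fetchall()
    conn.close()
    return rows


rows = compute_new_ranks()
for row in rows:
    print(row)
```

The subquery runs once per qualifying row, so with about 50,000 rows to update, the cost of each individual lookup dominates the total run time. That is the part an index matching the subquery's predicates is meant to speed up.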

update from temp table to original table

I have two tables: an original table and a temp table. The temp table has the correct records. The unique column is cust_id. My table structure is:
Customer table
cust_id amount
12 100
13 120
14 130
15 250
20 70
25 110
28 900
temp table
cust_id amount
12 300
13 190
14 110
15 240
20 30
25 210
28 500
I want to update the records in the original customer table from the temp table, matching on customer id.
This can be done with a MERGE statement:
merge into original_table ot
using temp_table tp
on (ot.cust_id = tp.cust_id)
when matched
then update set ot.amount = tp.amount;
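For a quick check of the expected outcome, here is a sketch using Python's built-in sqlite3 module. SQLite has no MERGE statement, so a correlated UPDATE is swapped in as an equivalent for this matched-rows-only case; the table names original_table and temp_table follow the answer, and the data is the sample data from the question.

```python
import sqlite3


def apply_temp_amounts():
    """Copy amount from temp_table into original_table for every matching cust_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE original_table (cust_id INTEGER PRIMARY KEY, amount INTEGER);
        CREATE TABLE temp_table (cust_id INTEGER PRIMARY KEY, amount INTEGER);
        INSERT INTO original_table VALUES (12, 100), (13, 120), (14, 130), (15, 250),
                                          (20, 70), (25, 110), (28, 900);
        INSERT INTO temp_table VALUES (12, 300), (13, 190), (14, 110), (15, 240),
                                      (20, 30), (25, 210), (28, 500);
    """)
    # Stand-in for MERGE ... WHEN MATCHED THEN UPDATE: only rows that have a
    # partner in temp_table are touched, so unmatched rows keep their amount.
    conn.execute("""
        UPDATE original_table
        SET amount = (SELECT tp.amount
                      FROM temp_table tp
                      WHERE tp.cust_id = original_table.cust_id)
        WHERE cust_id IN (SELECT cust_id FROM temp_table)
    """)
    rows = conn.execute(
        "SELECT cust_id, amount FROM original_table ORDER BY cust_id"
    ).fetchall()
    conn.close()
    return rows


rows = apply_temp_amounts()
for row in rows:
    print(row)
```

The WHERE clause matters: without it, customers missing from temp_table would have their amount set to NULL.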

Hive: Joining two tables with different keys

I have two tables like the ones below. I want to join them and get the expected result shown below.
The first 3 rows of table 2 do not have any activity ID; the field is just empty.
All fields are tab separated. Category "33" has three descriptions in table 2.
We need to use "Activity ID" to get the right result for category "33", as there are 3 values for it.
Could anyone tell me how to achieve this output?
TABLE: 1
Empid Category ActivityID
44126 33 TRAIN
44127 10 UFL
44128 12 TOI
44129 33 UNASSIGNED
44130 15 MICROSOFT
44131 33 BENEFITS
44132 43 BENEFITS
TABLE 2:
Category  ActivityID  Categdesc
10                    billable
12                    billable
15                    Non-billable
33        TRAIN       Training
33        UNASSIGNED  Bench
33        BENEFITS    Benefits
43                    Benefits
Expected Output:
44126 33 Training
44127 10 Billable
44128 12 Billable
44129 33 Bench
44130 15 Non-billable
44131 33 Benefits
44132 43 Benefits
It's a little difficult to do this in Hive, as there are many limitations. This is how I solved it, but there could be a better way.
I named your tables as below.
Table1 = EmpActivity
Table2 = ActivityMas
The challenge comes from the empty fields in Table2. I created a view and used UNION ALL to combine the results of two distinct queries.
Create view actView AS Select * from ActivityMas Where Activityid ='';
SELECT * From (
Select EmpActivity.EmpId, EmpActivity.Category, ActivityMas.categdesc
from EmpActivity JOIN ActivityMas
ON EmpActivity.Category = ActivityMas.Category
AND EmpActivity.ActivityId = ActivityMas.ActivityId
UNION ALL
Select EmpActivity.EmpId, EmpActivity.Category, ActView.categdesc from EmpActivity
JOIN ActView ON EmpActivity.Category = ActView.Category
) u;
You have to use a top-level SELECT clause because UNION ALL is not directly supported in top-level statements. This will run 3 MR jobs in total. Below is the result I got.
44127 10 billable
44128 12 billable
44130 15 Non-billable
44132 43 Benefits
44131 33 Benefits
44126 33 Training
44129 33 Bench
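The view-plus-UNION ALL approach above can be reproduced with a small sketch using Python's built-in sqlite3 module. It is not Hive, so the MR job count and the top-level UNION ALL limitation do not apply here; the table and view names follow the answer, the data is the sample data from the question with empty strings for the blank activity IDs, and an ORDER BY is added only to make the output order stable.

```python
import sqlite3


def describe_activities():
    """Join on (Category, ActivityID) where an ID exists, else on Category alone."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EmpActivity (empid INTEGER, category INTEGER, activityid TEXT);
        CREATE TABLE ActivityMas (category INTEGER, activityid TEXT, categdesc TEXT);
        INSERT INTO EmpActivity VALUES
            (44126, 33, 'TRAIN'), (44127, 10, 'UFL'), (44128, 12, 'TOI'),
            (44129, 33, 'UNASSIGNED'), (44130, 15, 'MICROSOFT'),
            (44131, 33, 'BENEFITS'), (44132, 43, 'BENEFITS');
        INSERT INTO ActivityMas VALUES
            (10, '', 'billable'), (12, '', 'billable'), (15, '', 'Non-billable'),
            (33, 'TRAIN', 'Training'), (33, 'UNASSIGNED', 'Bench'),
            (33, 'BENEFITS', 'Benefits'), (43, '', 'Benefits');
        -- Categories whose description does not depend on the activity ID.
        CREATE VIEW actView AS SELECT * FROM ActivityMas WHERE activityid = '';
    """)
    rows = conn.execute("""
        SELECT * FROM (
            SELECT e.empid AS empid, e.category AS category, m.categdesc AS categdesc
            FROM EmpActivity e
            JOIN ActivityMas m
              ON e.category = m.category AND e.activityid = m.activityid
            UNION ALL
            SELECT e.empid, e.category, v.categdesc
            FROM EmpActivity e
            JOIN actView v ON e.category = v.category
        ) u
        ORDER BY empid
    """).fetchall()
    conn.close()
    return rows


rows = describe_activities()
for row in rows:
    print(row)
```

Each employee matches exactly one branch of the UNION ALL with this data: category 33 rows match on both keys, and all other categories match through the view.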
I'm not sure if I understand your question or your data, but would this work?
select table1.empid, table1.category, table2.categdesc
from table1 join table2
on table1.activityID = table2.activityID;

Resources