Left Outer Join via a link table, using min() to restrict join to one row - oracle

I am trying to write an Oracle SQL query to join two tables that are linked via a link table (by that I mean a table with 2 columns, each a foreign key to the primary tables). A min() function is to be used to limit the results from the left outer join to a single row.
My model consists of "parents" and "nephews". Parents can have 0 or more nephews. Parents can be enabled or disabled. Each nephew has a birthday date. The goal of my query is:
Print a single row for each enabled parent, listing that parent's oldest nephew (ie the one with the min(birthday)).
My problem is illustrated here at sqlfiddle: http://sqlfiddle.com/#!4/9a3be0d/1
I can form a query that lists all of the nephews for the enabled parents, but that is not good enough: I just want one row per parent, containing only the oldest nephew. Forming the where clause against the outer table seems to be my stumbling block.
My tables and sample data:
create table parent (parent_id number primary key, parent_name varchar2(50), enabled int);
create table nephew (nephew_id number primary key, birthday date, nephew_name varchar2(50));
create table parent_nephew_link (parent_id number not null, nephew_id number not null);
parent table:
+-----------+-------------+---------+
| parent_id | parent_name | enabled |
+-----------+-------------+---------+
| 1         | Donald      | 1       |
| 2         | Minnie      | 0       |
| 3         | Mickey      | 1       |
+-----------+-------------+---------+
nephew table:
+-----------+------------+-------------+
| nephew_id | birthday   | nephew_name |
+-----------+------------+-------------+
| 100       | 01/01/2017 | Huey        |
| 101       | 01/01/2016 | Dewey       |
| 102       | 01/01/2015 | Louie       |
| 103       | 01/01/2014 | Morty       |
| 104       | 01/01/2013 | Ferdie      |
+-----------+------------+-------------+
parent_nephew_link table:
+-----------+-----------+
| parent_id | nephew_id |
+-----------+-----------+
| 1         | 100       |
| 1         | 101       |
| 1         | 102       |
| 3         | 103       |
| 3         | 104       |
+-----------+-----------+
My (not correct) query:
-- This query is not right, it returns a row for each nephew
select parent_name, nephew_name
from parent p
left outer join parent_nephew_link pnl
on p.parent_id = pnl.parent_id
left outer join nephew n
on n.nephew_id = pnl.nephew_id
where enabled = 1
-- I wish I could add this clause to restrict the result to the oldest
-- nephew but p.parent_id is not available in sub-selects.
-- You get an ORA-00904 error if you try this:
-- and n.birthday = (select min(birthday) from nephew nested where nested.parent_id = p.parent_id)
My desired output would be:
+-------------+-------------+
| parent_name | nephew_name |
+-------------+-------------+
| Donald      | Louie       |
| Mickey      | Ferdie      |
+-------------+-------------+
Thanks for any advice!
John
markaaronky's suggestion
I tried markaaronky's suggestion, but this SQL is also flawed.
-- This query is not right either, it returns the correct data but only for one parent
select * from (
select parent_name, n.nephew_name, n.birthday
from parent p
left outer join parent_nephew_link pnl
on p.parent_id = pnl.parent_id
left outer join nephew n
on n.nephew_id = pnl.nephew_id
where enabled = 1
order by parent_name, n.birthday asc
) where rownum <= 1

Why not:
(1) include the n.birthday from the nephews table in your SELECT statement
(2) add an ORDER BY n.birthday ASC to your query
(3) also modify your select so that it only takes the top row?
I tried to write this out in sqlfiddle for you but it doesn't seem to like table aliases (e.g. it throws an error when I write n.birthday), but I'm sure that's legal in Oracle, even though I'm a SQL Server guy.
Also, if I recall correctly, Oracle doesn't have a SELECT TOP like SQL Server does... you have to do something like "WHERE ROWNUM = 1" instead? Same concept... you're just ordering your results so the oldest nephew is the first row, and you're only taking the first row.
Perhaps an undesired side effect is you WOULD get the birthday along with the names in your results. If that's unacceptable, my apologies. It looked like your question has been sitting unanswered for a while and this solution should at least give you a start.
Lastly, since you don't have a NOT NULL constraint on your birthday column and are doing left outer joins, you might make the query safer by adding AND n.birthday IS NOT NULL
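For what it's worth, here is a minimal sketch (not part of the original suggestion) of the same "take only the top row" idea applied per parent with an analytic ROW_NUMBER() instead of ROWNUM, assuming the tables defined in the question:
-- Sketch only: rank each parent's nephews by birthday and keep rank 1;
-- parents without nephews still appear with a NULL nephew_name.
select parent_name, nephew_name
from (
  select p.parent_name, n.nephew_name,
         row_number() over (partition by p.parent_id
                            order by n.birthday) rn
  from parent p
  left outer join parent_nephew_link pnl
    on pnl.parent_id = p.parent_id
  left outer join nephew n
    on n.nephew_id = pnl.nephew_id
  where p.enabled = 1
)
where rn = 1;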

Use:
select parent_name, nephew_name
from parent p
left outer join
(
  SELECT pnl.parent_id, n.nephew_name
  FROM parent_nephew_link pnl
  join nephew n
    on n.nephew_id = pnl.nephew_id
    AND n.BIRTHDAY = (
      SELECT min( BIRTHDAY )
      FROM nephew n1
      JOIN parent_nephew_link pnl1
        ON pnl1.NEPHEW_ID = n1.NEPHEW_ID
      WHERE pnl1.PARENT_ID = pnl.PARENT_ID
    )
) ppp
on p.parent_id = ppp.parent_id
where p.enabled = 1
Demo: http://sqlfiddle.com/#!4/98758/23
| PARENT_NAME | NEPHEW_NAME |
|-------------|-------------|
| Mickey      | Louie       |
| Donald      | Ferdie      |
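A related sketch, assuming the same schema as the question (this is not part of the answer above): Oracle's FIRST/LAST aggregate can pick the oldest nephew's name without a correlated subquery.
-- Sketch only: MIN(...) KEEP (DENSE_RANK FIRST ORDER BY ...) returns the
-- nephew_name belonging to the earliest birthday within each parent's group.
select p.parent_name,
       min(n.nephew_name) keep (dense_rank first order by n.birthday) as nephew_name
from parent p
left outer join parent_nephew_link pnl
  on pnl.parent_id = p.parent_id
left outer join nephew n
  on n.nephew_id = pnl.nephew_id
where p.enabled = 1
group by p.parent_id, p.parent_name;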


Clickhouse running diff with grouping

General Task
A table consists of three columns (time, key, value). The task is to calculate a running difference for each key.
So, from input
----------------------
| time | key | value |
----------------------
|    1 | A   |     4 |
|    2 | B   |     1 |
|    3 | A   |     6 |
|    4 | A   |     7 |
|    5 | B   |     3 |
|    6 | B   |     7 |
----------------------
it is desired to get
-----------------------
| key | value | delta |
-----------------------
| A   |     4 |     0 |
| B   |     1 |     0 |
| A   |     6 |     2 |
| A   |     7 |     1 |
| B   |     3 |     2 |
| B   |     7 |     4 |
-----------------------
Approaches
The runningDifference function works if the key is fixed, so we can write:
select *, runningDifference(value) from
(SELECT key, value from table where key = 'A' order by time)
Note that the subquery is necessary here. This solution falls short when you want the result for different keys at once.
groupArray.
select key, groupArray(value) from
(SELECT key, value from table order by time)
group by key
So, now we get a key and an array of elements with this key. Good.
But how to calculate a sliding difference? If we could do that, then ARRAY JOIN would lead us to a result.
Or we could even zip the array with itself and then apply a lambda (we have arrayMap for that), but... we don't have any zip equivalent.
Any ideas?
Thanks in advance.
Solution with arrays:
WITH
groupArray(value) as time_sorted_vals,
arrayEnumerate(time_sorted_vals) as indexes,
arrayMap( i -> time_sorted_vals[i] - time_sorted_vals[i-1], indexes) as running_diffs
SELECT
key,
running_diffs
FROM
(SELECT key, value from table order by time)
GROUP by key
Another option (doing the sort inside each group separately, which is more efficient in a lot of cases):
WITH
    groupArray( tuple(value,time) ) as val_time_tuples,
    arraySort( x -> x.2, val_time_tuples ) as val_time_tuples_sorted,
    arrayMap( t -> t.1, val_time_tuples_sorted) as time_sorted_vals,
    arrayEnumerate(time_sorted_vals) as indexes,
    arrayMap( i -> time_sorted_vals[i] - time_sorted_vals[i-1], indexes) as running_diffs
SELECT
    key,
    running_diffs
FROM
    table
GROUP by key
and you can apply ARRAY JOIN on the result afterward.
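For example, here is a minimal sketch of that ARRAY JOIN step applied to the first query above (same assumed table and column names):
-- Sketch: flatten the per-key array of deltas back into one row per element.
SELECT key, running_diff
FROM (
    WITH
        groupArray(value) AS time_sorted_vals,
        arrayEnumerate(time_sorted_vals) AS indexes,
        arrayMap(i -> time_sorted_vals[i] - time_sorted_vals[i - 1], indexes) AS running_diffs
    SELECT key, running_diffs
    FROM (SELECT key, value FROM table ORDER BY time)
    GROUP BY key
)
ARRAY JOIN running_diffs AS running_diff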
Lately I've also encountered this problem, and ClickHouse offers the function arrayDifference.
WITH
    groupArray(value) as vals,
    arrayDifference(vals) as running_diffs
SELECT
    key,
    running_diffs
FROM
    (SELECT key, value from table order by time)
GROUP by key
This question was posted years ago; as of today, Sep 29th, 2021, we can use arrayDifference instead of arrayMap, and we can ARRAY JOIN so that we get a tabulated result instead of a nested array.
SELECT key, sorted_time, time_sorted_vals, running_diffs
FROM (
WITH
groupArray( tuple(value,time) ) as val_time_tuples,
arraySort( x -> x.2, val_time_tuples ) as val_time_tuples_sorted,
arrayMap( t -> t.1, val_time_tuples_sorted) as time_sorted_vals,
arrayMap( t -> t.2, val_time_tuples_sorted) as sorted_time,
arrayDifference(time_sorted_vals) as running_diffs
SELECT
key,
sorted_time,
time_sorted_vals,
running_diffs
FROM
table_name
GROUP by key)
ARRAY JOIN sorted_time, time_sorted_vals, running_diffs
The only restriction is that the value column should not be of nullable types.
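If the column is Nullable, one possible workaround (a sketch only, assuming 0 is an acceptable stand-in for missing values) is to strip the NULLs before aggregating:
-- Sketch: replace NULLs before building the array so arrayDifference accepts it.
SELECT key, arrayDifference(groupArray(ifNull(value, 0))) AS running_diffs
FROM (SELECT key, value FROM table ORDER BY time)
GROUP BY key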

Insert value based on min value greater than value in another row

It's difficult to explain the question well in the title.
I am inserting 6 values from (or based on values in) one row.
I also need to insert a value from a second row where:
The values in one column (ID) must be equal
The value in one column (CODE) in the main source row must be IN (100,200), whereas the other row must have a value of 300 or 400
The value in another column (OBJID) in the secondary row must be the lowest value above that in the primary row.
Source Table looks like:
OBJID | CODE | ENTRY_TIME  | INFO | ID | USER
---------------------------------------------
1     | 100  | x timestamp | .... | 10 | X
2     | 100  | y timestamp | .... | 11 | Y
3     | 300  | z timestamp | .... | 10 | F
4     | 100  | h timestamp | .... | 10 | X
5     | 300  | g timestamp | .... | 10 | G
So to provide an example..
In my second table I want to insert OBJID, OBJID2, CODE, ENTRY_TIME, substr(INFO(...)), ID, USER
i.e. from my example a line inserted in the second table would look like:
OBJID | OBJID2 | CODE | ENTRY_TIME  | INFO       | ID | USER
------------------------------------------------------------
1     | 3      | 100  | x timestamp | substring  | 10 | X
4     | 5      | 100  | h timestamp | substring2 | 10 | X
My insert for everything that just comes from one row works fine.
INSERT INTO TABLE2 (ID, OBJID, INFO, USER, ENTRY_TIME)
SELECT ID, OBJID,
       DECODE(CODE, 100, SUBSTR(INFO, 12, LENGTH(INFO) - 27),
                    600, 'CREATE') INFO,
       USER, ENTRY_TIME
FROM TABLE1
WHERE CODE IN (100, 200);
I'm aware that I'll need to use an alias on TABLE1, but I don't know how to get the rest to work, particularly in an efficient way. There are 2 million rows right now, but there will be closer to 20 million once I start using production data.
You could try this:
select primary.* ,
(select min(objid)
from table1 secondary
where primary.objid < secondary.objid
and secondary.code in (300,400)
and primary.id = secondary.id
) objid2
from table1 primary
where primary.code in (100,200);
Ok, I've come up with:
select OBJID,
min(case when code in (300,400) then objid end)
over (partition by id order by objid
range between 1 following and unbounded following
) objid2,
CODE, ENTRY_TIME, INFO, ID, USER1
from table1;
So you need an INSERT ... SELECT of the above query, with a WHERE objid2 IS NOT NULL AND code IN (100,200), as sketched below.
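A hedged sketch of that final statement (the target column list is assumed from the question, not taken from real DDL):
-- Sketch only: wrap the analytic query and keep only rows that found a match.
insert into table2 (objid, objid2, code, entry_time, info, id, user1)
select objid, objid2, code, entry_time, info, id, user1
from (
  select objid,
         min(case when code in (300,400) then objid end)
           over (partition by id order by objid
                 range between 1 following and unbounded following) objid2,
         code, entry_time, info, id, user1
  from table1
)
where objid2 is not null
  and code in (100,200);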

ORDER BY subquery and ROWNUM goes against relational philosophy?

Oracle's ROWNUM is applied before ORDER BY. In order to assign ROWNUM according to a sorted column, the following subquery is proposed in all the documentation and texts.
select *
from (
select *
from table
order by price
)
where rownum <= 7
That bugs me. As I understand, table input into FROM is relational, hence no order is stored, meaning the order from the subquery is not respected when seen by FROM.
I cannot remember the exact scenarios, but I have read more than once that "ORDER BY has no effect in the outer query". Examples are in-line subqueries, subqueries for INSERT, the ORDER BY of a PARTITION clause, etc. For example in
OVER (PARTITION BY name ORDER BY salary)
the salary order will not be respected in the outer query, and if we want salary to be sorted in the outer query's output, another ORDER BY needs to be added to the outer query.
Could anyone offer some insight into why the relational property is not respected here and the order is preserved by the subquery?
The ORDER BY in this context is in effect Oracle's proprietary syntax for generating an "ordered" row number on a (logically) unordered set of rows. This is a poorly designed feature in my opinion but the equivalent ISO standard SQL ROW_NUMBER() function (also valid in Oracle) may make it clearer what is happening:
select *
from (
  select ROW_NUMBER() OVER (ORDER BY price) rn, x.*
  from table x
) t
where rn <= 7;
In this example the ORDER BY goes where it more logically belongs: as part of the specification of a derived row number attribute. This is more powerful than Oracle's version because you can specify several different orderings defining different row numbers in the same result. The actual ordering of rows returned by this query is undefined. I believe that's also true in your Oracle-specific version of the query because no guarantee of ordering is made when you use ORDER BY in that way.
It's worth remembering that Oracle is not a Relational DBMS. In common with other SQL DBMSs Oracle departs from the relational model in some fundamental ways. Features like implicit ordering and DISTINCT exist in the product precisely because of the non-relational nature of the SQL model of data and the consequent need to work around keyless tables with duplicate rows.
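As an aside (not part of the answer above), on Oracle 12c and later the ANSI row-limiting clause expresses the same intent directly; a minimal sketch:
-- Sketch, assuming Oracle 12c or later: no subquery or ROWNUM needed.
select *
from my_table
order by price
fetch first 7 rows only;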
Not surprisingly really, Oracle treats this as a bit of a special case. You can see that from the execution plan. With the naive (incorrect/indeterminate) version of the limit that crops up sometimes, you get SORT ORDER BY and COUNT STOPKEY operations:
select *
from my_table
where rownum <= 7
order by price;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=7)
If you just use an ordered subquery, with no limit, you only get the SORT ORDER BY operation:
select *
from (
select *
from my_table
order by price
);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
| 2 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
With the usual subquery/ROWNUM construct you get something different,
select *
from (
select *
from my_table
order by price
)
where rownum <= 7;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 13 | 3 (34)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 13 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=7)
3 - filter(ROWNUM<=7)
The COUNT STOPKEY operation is still there for the outer query, but the inner query (inline view, or derived table) now has a SORT ORDER BY STOPKEY instead of the simple SORT ORDER BY. This is all hidden away in the internals so I'm speculating, but it looks like the stop key - i.e. the row number limit - is being pushed into the subquery processing, so in effect the subquery may only end up with seven rows anyway - though the plan's ROWS value doesn't reflect that (but then you get the same plan with a different limit), and it still feels the need to apply the COUNT STOPKEY operation separately.
Tom Kyte covered similar ground in an Oracle Magazine article, when talking about "Top-N Query Processing with ROWNUM" (emphasis added):
There are two ways to approach this:
- Have the client application run that query and fetch just the first N rows.
- Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.
The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top-N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting—how sorts work and what the server would need to do.
... and then goes on to outline what it's actually doing, rather more authoritatively than I can.
Interestingly I don't think the order of the final result set is actually guaranteed; it always seems to work, but arguably you should still have an ORDER BY on the outer query too to make it complete. It looks like the order isn't really stored in the subquery, it just happens to be produced like that. (I very much doubt that will ever change as it would break too many things; this ends up looking similar to a table collection expression which also always seems to retain its ordering - breaking that would stop dbms_xplan working though. I'm sure there are other examples.)
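That belt-and-braces version is just the construct from the question with the outer ORDER BY kept (a sketch of what the paragraph above suggests):
-- Keep the outer ORDER BY so the final ordering is guaranteed rather than
-- merely observed.
select *
from (
  select *
  from my_table
  order by price
)
where rownum <= 7
order by price;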
Just for comparison, this is what the ROW_NUMBER() equivalent does:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, my_table.*
from my_table
) t
where rn <= 7;
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 2 | 52 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 2 | 26 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | MY_TABLE | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"<=7)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "PRICE")<=7)
Adding to sqlvogel's good answer:
"As I understand, table input into FROM is relational"
No, table input into FROM is not relational. It is not relational because "table inputs" are tables, and tables are not relations. The myriad of quirks and oddities in SQL eventually all boil down to that simple fact: the core building brick in SQL is the table, and a table is not a relation. To sum up the differences:
Tables can contain duplicate rows, relations cannot. (As a consequence, SQL offers bag algebra, not relational algebra. As another consequence, it is as good as impossible for SQL to even define equality comparison for its most basic building brick!!! How would you compare tables for equality given that you might have to deal with duplicate rows?)
Tables can contain unnamed columns, relations cannot. SELECT X+Y FROM ... As a consequence, SQL is forced into "column identity by ordinal position", and as a consequence of that, you get all sorts of quirks, e.g. in SELECT A,B FROM ... UNION SELECT B,A FROM ...
Tables can contain duplicate column names, relations cannot. A.ID and B.ID in a table are not distinct column names. The part before the dot is not part of the name, it is a "scope identifier", and that scope identifier "disappears" once you're "outside the SELECT" it appears/is introduced in. You can verify this with a nested SELECT: SELECT A.ID FROM (SELECT A.ID, B.ID FROM ...). It won't work (unless your particular implementation departs from the standard in order to make it work).
Various SQL constructs leave people with the impression that tables do have an ordering to rows. The ORDER BY clause, obviously, but also the GROUP BY clause (which can be made to work only by introducing rather dodgy concepts of "intermediate tables with rows grouped together"). Relations simply are not like that.
Tables can contain NULLs, relations cannot. This one has been beaten to death.
There are probably some more, but I don't remember them off the top of my head.

How to transpose/pivot data in hive?

I know there's no direct way to transpose data in Hive. I followed this question: Is there a way to transpose data in Hive?, but as there is no final answer there, I could not get all the way.
This is the table I have:
| ID | Code | Proc1 | Proc2 |
| 1  | A    | p     | e     |
| 2  | B    | q     | f     |
| 3  | B    | p     | f     |
| 3  | B    | q     | h     |
| 3  | B    | r     | j     |
| 3  | C    | t     | k     |
Here Proc1 can have any number of values. ID, Code and Proc1 together form a unique key for this table. I want to pivot/transpose this table so that each unique value in Proc1 becomes a new column, and the corresponding value from Proc2 is the value in that column for the corresponding row. In essence, I'm trying to get something like:
| ID | Code | p | q | r | t |
| 1  | A    | e |   |   |   |
| 2  | B    |   | f |   |   |
| 3  | B    | f | h | j |   |
| 3  | C    |   |   |   | k |
In the new transformed table, ID and Code together form the primary key. From the question I mentioned above, I could get this far using the to_map UDAF. (Disclaimer: this may not be a step in the right direction, but I'm just mentioning it here in case it is.)
| ID | Code | Map_Aggregation  |
| 1  | A    | {p:e}            |
| 2  | B    | {q:f}            |
| 3  | B    | {p:f, q:h, r:j } |
| 3  | C    | {t:k}            |
But don't know how to get from this step to the pivot/transposed table I want.
Any help on how to proceed will be great!
Thanks.
Here is the approach I used to solve this problem using Hive's internal UDF, map:
select
    b.id,
    b.code,
    concat_ws('', b.p) as p,
    concat_ws('', b.q) as q,
    concat_ws('', b.r) as r,
    concat_ws('', b.t) as t
from
(
    select id, code,
        collect_list(a.group_map['p']) as p,
        collect_list(a.group_map['q']) as q,
        collect_list(a.group_map['r']) as r,
        collect_list(a.group_map['t']) as t
    from (
        select
            id,
            code,
            map(proc1, proc2) as group_map
        from
            test_sample
    ) a
    group by
        a.id,
        a.code
) b;
"concat_ws" and "map" are hive udf and "collect_list" is a hive udaf.
Here is the solution I ended up using:
add jar brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';
select
    id,
    code,
    group_map['p'] as p,
    group_map['q'] as q,
    group_map['r'] as r,
    group_map['t'] as t
from (
    select
        id, code,
        collect(proc1, proc2) as group_map
    from test_sample
    group by id, code
) gm;
The collect UDAF used here comes from the brickhouse repo: https://github.com/klout/brickhouse
Yet another solution.
Pivot using Hivemall to_map function.
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT uid, to_map(key, value) kv
FROM vtable
GROUP BY uid
) t
uid c1 c2 c3
101 11 12 13
102 21 22 23
Unpivot
SELECT t1.uid, t2.key, t2.value
FROM htable t1
LATERAL VIEW explode (map(
'c1', c1,
'c2', c2,
'c3', c3
)) t2 as key, value
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23
I have not written this code, but I think you can use some of the UDFs provided by klout's brickhouse: https://github.com/klout/brickhouse
Specifically, you could do something like use their collect as mentioned here: http://brickhouseconfessions.wordpress.com/2013/03/05/use-collect-to-avoid-the-self-join/
and then explode the arrays (they will be of differing length) using the methods detailed in this post http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_ra
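A rough sketch of that collect-then-explode shape using only built-in functions (brickhouse's collect plays a similar role to collect_list here; nothing below is from the original answer):
-- Sketch only: collect per (id, code), then explode both arrays in lock-step
-- by pairing on element position. This relies on the two collect_list calls
-- producing elements in the same order within each group, and it merely
-- reconstructs the input rows to show the mechanics of the explode step.
SELECT g.id, g.code, p1.proc1, p2.proc2
FROM (
  SELECT id, code,
         collect_list(proc1) AS proc1s,
         collect_list(proc2) AS proc2s
  FROM test_sample
  GROUP BY id, code
) g
LATERAL VIEW posexplode(g.proc1s) p1 AS pos1, proc1
LATERAL VIEW posexplode(g.proc2s) p2 AS pos2, proc2
WHERE p1.pos1 = p2.pos2;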
I have created a dummy table called hive using the query below:
create table hive (id Int,Code String, Proc1 String, Proc2 String);
Loaded all the data into the table:
insert into hive values('1','A','p','e');
insert into hive values('2','B','q','f');
insert into hive values('3','B','p','f');
insert into hive values('3','B','q','h');
insert into hive values('3','B','r','j');
insert into hive values('3','C','t','k');
Now use the below query to achieve the output.
select id,code,
case when collect_list(p)[0] is null then '' else collect_list(p)[0] end as p,
case when collect_list(q)[0] is null then '' else collect_list(q)[0] end as q,
case when collect_list(r)[0] is null then '' else collect_list(r)[0] end as r,
case when collect_list(t)[0] is null then '' else collect_list(t)[0] end as t
from(
select id, code,
case when proc1 ='p' then proc2 end as p,
case when proc1 ='q' then proc2 end as q,
case when proc1 ='r' then proc2 end as r,
case when proc1 ='t' then proc2 end as t
from hive
) dummy group by id,code;
In case of numeric values you can use the Hive query below:
Sample data
ID cust_freq Var1 Var2 frequency
220444 1 16443 87128 72.10140547
312554 6 984 7339 0.342452643
220444 3 6201 87128 9.258396518
220444 6 47779 87128 2.831972441
312554 1 6055 7339 82.15209213
312554 3 12868 7339 4.478333954
220444 2 6705 87128 15.80822558
312554 2 37432 7339 13.02712127
select id,
       sum(a.group_map[1]) as One,
       sum(a.group_map[2]) as Two,
       sum(a.group_map[3]) as Three,
       sum(a.group_map[6]) as Six
from (
    select id,
           map(cust_freq, frequency) as group_map
    from table
) a
group by a.id
having id in ('220444', '312554');
ID one two three six
220444 72.10140547 15.80822558 9.258396518 2.831972441
312554 82.15209213 13.02712127 4.478333954 0.342452643
In the above example I haven't used any custom UDF; it only uses built-in Hive functions.
Note: for a string value in the key, write the value as sum(a.group_map['1']) as One.
For unpivot, we can simply use the logic below:
SELECT Cost.Code, Cost.Product, Cost.Size
, Cost.State_code, Cost.Promo_date, Cost.Cost, Sales.Price
FROM
(Select Code, Product, Size, State_code, Promo_date, Price as Cost
FROM Product
Where Description = 'Cost') Cost
JOIN
(Select Code, Product, Size, State_code, Promo_date, Price as Price
FROM Product
Where Description = 'Sales') Sales
on (Cost.Code = Sales.Code
and Cost.Promo_date = Sales.Promo_date);
Below is also a way to pivot:
SELECT TM1_Code, Product, Size, State_code, Description
, Promo_date
, Price
FROM (
SELECT TM1_Code, Product, Size, State_code, Description
, MAP('FY2018Jan', FY2018Jan, 'FY2018Feb', FY2018Feb, 'FY2018Mar', FY2018Mar, 'FY2018Apr', FY2018Apr
,'FY2018May', FY2018May, 'FY2018Jun', FY2018Jun, 'FY2018Jul', FY2018Jul, 'FY2018Aug', FY2018Aug
,'FY2018Sep', FY2018Sep, 'FY2018Oct', FY2018Oct, 'FY2018Nov', FY2018Nov, 'FY2018Dec', FY2018Dec) AS tmp_column
FROM CS_ME_Spirits_30012018) TmpTbl
LATERAL VIEW EXPLODE(tmp_column) exptbl AS Promo_date, Price;
You can use CASE statements and some help from collect_set to achieve this. You can check the detailed answer at http://www.analyticshut.com/big-data/hive/pivot-rows-to-columns-in-hive/
Here is the query for reference,
SELECT resource_id,
CASE WHEN COLLECT_SET(quarter_1)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_1)[0] END AS quarter_1_spends,
CASE WHEN COLLECT_SET(quarter_2)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_2)[0] END AS quarter_2_spends,
CASE WHEN COLLECT_SET(quarter_3)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_3)[0] END AS quarter_3_spends,
CASE WHEN COLLECT_SET(quarter_4)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_4)[0] END AS quarter_4_spends
FROM (
SELECT resource_id,
CASE WHEN quarter='Q1' THEN amount END AS quarter_1,
CASE WHEN quarter='Q2' THEN amount END AS quarter_2,
CASE WHEN quarter='Q3' THEN amount END AS quarter_3,
CASE WHEN quarter='Q4' THEN amount END AS quarter_4
FROM billing_info)tbl1
GROUP BY resource_id;

Oracle Insert Into Child & Parent Tables

I have a table - let's call it MASTER - with a lot of rows in it. Now I had to create another table called MASTER_DETAILS, which will be populated with data from another system. Such data will be accessed via a DB link.
MASTER has a FK to MASTER_DETAILS (a 1-to-1 relationship).
I created a SQL to populate the MASTER_DETAILS table:
INSERT INTO MASTER_DETAILS(ID, DETAIL1, DETAILS2, BLAH)
WITH QUERY_FROM_EXTERNAL_SYSTEM AS (
SELECT IDENTIFIER,
FIELD1,
FIELD2,
FIELD3
FROM TABLE#DB_LINK
--- DOZENS OF INNERS AND OUTER JOINS HERE
) SELECT MASTER_DETAILS_SEQ.NEXTVAL,
QES.FIELD1,
QES.FIELD2,
QES.FIELD3
FROM MASTER M
INNER JOIN QUERY_FROM_EXTERNAL_SYSTEM QES ON QES.IDENTIFIER = M.ID
--- DOZENS OF JOINS HERE
The approach above works fine to insert all the values into MASTER_DETAILS.
Problem is:
In the approach above, I cannot insert the value of MASTER_DETAILS_SEQ.CURRVAL into the MASTER table. So I create all the entries into the DETAILS table but I don't link them to the MASTER table.
Does anyone see a way out of this problem using only an INSERT statement? I wish I could avoid creating a complex script with loops and everything to handle this.
Ideally I want to do something like this:
INSERT INTO MASTER_DETAILS(ID, DETAIL1, DETAILS2, BLAH) AND MASTER(MASTER_DETAILS_ID)
WITH QUERY_FROM_EXTERNAL_SYSTEM AS (
SELECT IDENTIFIER,
FIELD1,
FIELD2,
FIELD3
FROM TABLE#DB_LINK
--- DOZENS OF INNERS AND OUTER JOINS HERE
) SELECT MASTER_DETAILS_SEQ.NEXTVAL,
QES.FIELD1,
QES.FIELD2,
QES.FIELD3
FROM MASTER M
INNER JOIN QUERY_FROM_EXTERNAL_SYSTEM QES ON QES.IDENTIFIER = M.ID
--- DOZENS OF JOINS HERE,
SELECT MASTER_DETAILS_SEQ.CURRVAL FROM DUAL;
I know such an approach does not work in Oracle - but I am showing this SQL to demonstrate what I want to do.
Thanks.
If there is really a 1-to-1 relationship between the two tables, then they could arguably be a single table. Presumably you have a reason to want to keep them separate. Perhaps the master is a vendor-supplied table you shouldn't touch and the detail is extra data; but then you're changing the master anyway by adding the foreign key field. Or perhaps the detail will be reloaded periodically and you don't want to update the master table; but then you have to update the foreign key field anyway. I'll assume you're required to have a separate table, for whatever reason.
If you put a foreign key on the master table that refers to the primary key on the detail table, you're are restricted to it only ever being a 1-to-1 relationship. If that really is the case then conceptually it shouldn't matter which way the relationship is built - which table has the primary key and which has the foreign key. And if it isn't then your model will break when your detail table (or the remote query) comes back with two rows related to the same master - even if you're sure that won't happen today, will it always be true? The pluralisation of the name master_details suggests that might be expected. Maybe. Having the relationship the other way would prevent that being an issue.
I'm guessing you decided to put the relationship that way round so you can join the tables using the detail's key:
select m.column, md.column
from master m
join master_details md on md.id = m.detail_id
... because you expect that to be the quickest way, since md.id will be indexed (implicitly, as a primary key). But you could achieve the same effect by adding the master ID to the details table as a foreign key:
select m.column, md.column
from master m
join master_details md on md.master_id = m.id
It is good practice to index foreign keys anyway, and as long as you have an index on master_details.master_id then the performance should be the same (more or less, other factors may come in to play but I'd expect this to generally be the case). This would also allow multiple detail records in the future, without needing to modify the schema.
So as a simple example, let's say you have a master table created and populated with some dummy data:
create table master(id number, data varchar2(10),
constraint pk_master primary key (id));
create sequence seq_master start with 42;
insert into master (id, data)
values (seq_master.nextval, 'Foo ' || seq_master.nextval);
insert into master (id, data)
values (seq_master.nextval, 'Foo ' || seq_master.nextval);
insert into master (id, data)
values (seq_master.nextval, 'Foo ' || seq_master.nextval);
select * from master;
ID DATA
---------- ----------
42 Foo 42
43 Foo 43
44 Foo 44
The changes you've proposed might look like this:
create table detail (id number, other_data varchar2(10),
constraint pk_detail primary key(id));
create sequence seq_detail;
alter table master add (detail_id number,
constraint fk_master_detail foreign key (detail_id)
references detail (id));
insert into detail (id, other_data)
select seq_detail.nextval, 'Bar ' || seq_detail.nextval
from master m
-- joins etc
;
... plus the update of the master's foreign key, which is what you're struggling with, so let's do that manually for now:
update master set detail_id = 1 where id = 42;
update master set detail_id = 2 where id = 43;
update master set detail_id = 3 where id = 44;
And then you'd query as:
select m.data, d.other_data
from master m
join detail d on d.id = m.detail_id
where m.id = 42;
DATA OTHER_DATA
---------- ----------
Foo 42 Bar 1
Plan hash value: 2192253142
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 22 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| MASTER | 1 | 13 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_MASTER | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DETAIL | 3 | 27 | 1 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_DETAIL | 1 | | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
If you swap the relationship around the changes become:
create table detail (id number, master_id, other_data varchar2(10),
constraint pk_detail primary key(id),
constraint fk_detail_master foreign key (master_id)
references master (id));
create index ix_detail_master_id on detail (master_id);
create sequence seq_detail;
insert into detail (id, master_id, other_data)
select seq_detail.nextval, m.id, 'Bar ' || seq_detail.nextval
from master m
-- joins etc.
;
No update of the master table is needed, and the query becomes:
select m.data, d.other_data
from master m
join detail d on d.master_id = m.id
where m.id = 42;
DATA OTHER_DATA
---------- ----------
Foo 42 Bar 1
Plan hash value: 4273661231
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 19 | 2 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| MASTER | 1 | 10 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_MASTER | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DETAIL | 1 | 9 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IX_DETAIL_MASTER_ID | 1 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
The only real difference in the plan is that you now have a range scan instead of a unique scan; if you're really sure it's 1-to-1 you could make the index unique but there's not much benefit.
SQL Fiddle of this approach.
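For completeness, a one-line sketch of that stricter option; it would replace the non-unique ix_detail_master_id index created above rather than sit alongside it:
-- Sketch: enforce the 1-to-1 assumption by making the foreign-key index unique
-- (Oracle will not allow a second index on the same column list).
create unique index ix_detail_master_id on detail (master_id);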
