Join two tables in Hive with a subquery - hadoop

I need to get the cost of an item at a certain date and time. I have these two tables:
create table sales ( product_id int, items_sold int, date_loaded date );
create table product ( product_id int, description string, item_cost double, date_loaded date );
The product table is a history of each item. If an item costs $1.00 today but cost $0.99 yesterday, I have two records, one for each day. When I load my sales data, each sale must use the cost that was in effect on its sale date (yesterday's cost for yesterday's sales), not today's cost.
Here is the query I am trying to execute:
SELECT s.product_id, s.items_sold, p.description, s.items_sold * p.item_cost as total_cost FROM sales s, product p
WHERE
p.product_id = s.product_id and
p.date_loaded = (
SELECT MAX(pp.date_loaded)
FROM product pp
WHERE
pp.product_id = s.product_id and
pp.date_loaded <= s.date_loaded
)
SALES TABLE:
|PRODUCT_ID |ITEMS_SOLD |DATE_LOADED |
|1 |4 |2016-06-30 |
|1 |5 |2016-07-01 |
|1 |6 |2016-07-02 |
|1 |3 |2016-07-03 |
PRODUCT TABLE:
|PRODUCT_ID |DESCRIPTION |ITEM_COST |DATE_LOADED |
|1 |ITEM A |0.99 |2016-06-20 |
|1 |ITEM A |1.00 |2016-07-02 |
I would expect to see this result:
|PRODUCT_ID |ITEMS_SOLD |DESCRIPTION |ITEM_COST |TOTAL_COST |
|1 |4 |ITEM A |0.99 |3.96 |
|1 |5 |ITEM A |0.99 |4.95 |
|1 |6 |ITEM A |1.00 |6.00 |
|1 |3 |ITEM A |1.00 |3.00 |
From everything I have read, this form of subquery (a scalar subquery correlated with the outer query) is not allowed. So how can I accomplish this in Hive?

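It can be accomplished with a CTE and the LEAD window function. Each cost record is valid from its own DATE_LOADED until the DATE_LOADED of the next cost record for the same product, and each sale is joined to the cost record whose range contains the sale date: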
WITH result AS (
  SELECT product_id, description, item_cost, date_loaded,
         -- each cost is valid until the next cost record of the same product
         LEAD(date_loaded, 1, CAST('2999-01-01' AS DATE))
           OVER (PARTITION BY product_id ORDER BY date_loaded) AS valid_until
  FROM product
)
SELECT s.product_id, s.items_sold, p.description, p.item_cost,
       s.items_sold * p.item_cost AS total_cost
FROM sales s JOIN result p ON s.product_id = p.product_id
WHERE s.date_loaded >= p.date_loaded AND s.date_loaded < p.valid_until;
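To check the validity-range join against the sample data, here is a minimal sketch that runs the same query in SQLite through Python's standard sqlite3 module. SQLite is only a stand-in for Hive here (an assumption: it needs SQLite 3.25 or later for window functions), dates are stored as ISO-8601 text so they compare correctly as strings, and the function name cost_at_sale_date is hypothetical.

```python
import sqlite3

def cost_at_sale_date():
    """Join each sale to the product cost in effect on the sale date (SQLite stand-in for Hive)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (product_id INTEGER, items_sold INTEGER, date_loaded TEXT);
        CREATE TABLE product (product_id INTEGER, description TEXT,
                              item_cost REAL, date_loaded TEXT);
        INSERT INTO sales VALUES
            (1, 4, '2016-06-30'), (1, 5, '2016-07-01'),
            (1, 6, '2016-07-02'), (1, 3, '2016-07-03');
        INSERT INTO product VALUES
            (1, 'ITEM A', 0.99, '2016-06-20'),
            (1, 'ITEM A', 1.00, '2016-07-02');
    """)
    rows = con.execute("""
        WITH result AS (
            SELECT product_id, description, item_cost, date_loaded,
                   -- each cost is valid until the next cost record of the same product
                   LEAD(date_loaded, 1, '2999-01-01')
                       OVER (PARTITION BY product_id ORDER BY date_loaded) AS valid_until
            FROM product
        )
        SELECT s.product_id, s.items_sold, p.description, p.item_cost,
               ROUND(s.items_sold * p.item_cost, 2) AS total_cost
        FROM sales s
        JOIN result p ON s.product_id = p.product_id
        WHERE s.date_loaded >= p.date_loaded AND s.date_loaded < p.valid_until
        ORDER BY s.date_loaded
    """).fetchall()
    con.close()
    return rows

for row in cost_at_sale_date():
    print(row)
```

The PARTITION BY product_id matters as soon as the product table holds more than one product: without it, LEAD would take the next date_loaded of a different product and the ranges would be wrong.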

Related

ORACLE Query Get Last ID Using MIN Based On Quantity Consumed By ID

I have incoming stock transaction data in Oracle:
ID | DESCRIPTION | PART_NO | QUANTITY | DATEADDED
TR5 | FG | P0025 | 5 | 06-SEP-2017 08:20:33 <-- just now added
TR4 | Test | TEST1 | 8 | 05-SEP-2017 15:11:15
TR3 | FG | GSDFGSG | 10 | 31-AUG-2017 16:26:04
TR2 | FG | GSDFGSG | 2 | 31-AUG-2017 16:05:39
TR1 | FG | GSDFGSG | 2 | 30-AUG-2017 16:30:16
I group that data like this:
TR_ID | PART_NO | TOTAL
TR1 | GSDFGSG | 14
TR4 | TEST1 | 8
TR5 | P0025 | 5 <-- just now added
Query Code:
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
As you can see from the data and the query, I show TR_ID using MIN to get the ID of the first transaction.
Now I also have outgoing transaction data.
Suppose the outgoing transaction takes a quantity of 8:
ID_FK | QUANTITY
TR1 | 8
Now I want to get the last ID, i.e. the one reached once a quantity of 8 has been consumed:
ID | DESCRIPTION | PART_NO | QUANTITY
TR3| FG | GSDFGSG | 10 <-- CONSUMED 4+2+2, TOTAL 8
TR2| FG | GSDFGSG | 2 <-- CONSUMED 2+2, TOTAL 4
TR1| FG | GSDFGSG | 2 <-- CONSUMED 2
As you can see above, TR1 and TR2 have been fully consumed. Now I want the query
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
to return TR3 as TR_ID instead of TR1, because TR1 and TR2 have been consumed.
How can I do that in a query?
Take the minimum ID where the running sum of quantities reaches 8. Use the analytic SUM():
select min(id) id
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
Test data:
create table t(ID varchar2(3), DESCRIPTION varchar2(5),
PART_NO varchar2(8), QUANTITY number(5), DATEADDED date);
insert into t values ('TR4', 'Test', 'TEST1', 8, timestamp '2017-09-05 15:11:15');
insert into t values ('TR3', 'FG', 'GSDFGSG', 10, timestamp '2017-08-31 16:26:04');
insert into t values ('TR2', 'FG', 'GSDFGSG', 2, timestamp '2017-08-31 16:05:39');
insert into t values ('TR1', 'FG', 'GSDFGSG', 2, timestamp '2017-08-30 16:30:16');
insert into t values ('TR5', 'FG', 'GSDFGSG', 3, timestamp '2017-08-31 17:00:00');
Edit:
Adding the part_no and total columns and a GROUP BY clause:
select min(id) id, part_no, min(sq) total
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
group by part_no
ID PART_NO TOTAL
--- -------- ----------
TR3 GSDFGSG 14
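A minimal runnable sketch of the running-sum query, using SQLite through Python's sqlite3 module as a stand-in for Oracle (assumes SQLite 3.25+ for window functions). The data is the answer's test data with the DATE values stored as ISO text; the helper name first_id_reaching and its consumed parameter are hypothetical.

```python
import sqlite3

def first_id_reaching(part_no, consumed):
    """Return (id, running_total) of the first row whose running sum reaches `consumed`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, description TEXT, part_no TEXT,"
                " quantity INTEGER, dateadded TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
        ("TR4", "Test", "TEST1", 8, "2017-09-05 15:11:15"),
        ("TR3", "FG", "GSDFGSG", 10, "2017-08-31 16:26:04"),
        ("TR2", "FG", "GSDFGSG", 2, "2017-08-31 16:05:39"),
        ("TR1", "FG", "GSDFGSG", 2, "2017-08-30 16:30:16"),
        ("TR5", "FG", "GSDFGSG", 3, "2017-08-31 17:00:00"),
    ])
    row = con.execute("""
        SELECT MIN(id), MIN(sq)
        FROM (SELECT t.*,
                     -- running total of quantity per part, in id order
                     SUM(quantity) OVER (PARTITION BY part_no ORDER BY id) AS sq
              FROM t
              WHERE part_no = ?)
        WHERE sq >= ?
    """, (part_no, consumed)).fetchone()
    con.close()
    return row  # (None, None) if the running sum never reaches `consumed`

print(first_id_reaching("GSDFGSG", 8))
```

Note the edge case of >=: a row that is used up exactly is still returned (first_id_reaching("GSDFGSG", 4) gives TR2), so switch to > if such a row should count as consumed. Ordering the window by id works here because TR1 to TR5 sort correctly as text; ordering by dateadded is closer to first-in, first-out once IDs like TR10 appear.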

Using partition by to get count in Oracle

I have an EMP table. I need to get the number of employees in each department, broken down by country name ('INDIA', 'USA', 'AUSTRALIA').
For example,
|DEPARTMENT |#EMPLOYEE(INDIA) |#EMPLOYEE(USA) |#EMPLOYEE(AUSTRALIA) |
|ACCOUNTING |5 |2 |3 |
|IT |5 |2 |1 |
|BUSINESS |1 |4 |3 |
I need to use PARTITION BY to do it. I am able to use PARTITION BY to get the total count of employees for each department, but I am not able to subgroup by country name.
Please give me suggestions.
Thank you.
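Consider a conditional count. A CASE expression without an ELSE returns NULL for non-matching rows, and COUNT ignores NULLs, so each column counts only the employees from one country: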
SELECT DEPARTMENT,
COUNT(CASE WHEN Country = 'INDIA' THEN 1 END) as emp_india,
COUNT(CASE WHEN Country = 'USA' THEN 1 END) as emp_usa,
COUNT(CASE WHEN Country = 'AUSTRALIA' THEN 1 END) as emp_australia
FROM EMP
GROUP BY DEPARTMENT
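Here is a minimal runnable sketch of the conditional count, using SQLite via Python's sqlite3 module in place of Oracle. The EMP columns (name, department, country) and the sample rows are made up for illustration; they are not the asker's data.

```python
import sqlite3

def employees_by_country():
    """Count employees per department, one column per country."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp (name TEXT, department TEXT, country TEXT)")
    con.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
        ("a", "ACCOUNTING", "INDIA"), ("b", "ACCOUNTING", "INDIA"),
        ("c", "ACCOUNTING", "USA"), ("d", "IT", "AUSTRALIA"),
        ("e", "IT", "INDIA"), ("f", "IT", "USA"), ("g", "IT", "USA"),
    ])
    rows = con.execute("""
        SELECT department,
               -- CASE yields NULL for other countries, and COUNT skips NULLs
               COUNT(CASE WHEN country = 'INDIA' THEN 1 END) AS emp_india,
               COUNT(CASE WHEN country = 'USA' THEN 1 END) AS emp_usa,
               COUNT(CASE WHEN country = 'AUSTRALIA' THEN 1 END) AS emp_australia
        FROM emp
        GROUP BY department
        ORDER BY department
    """).fetchall()
    con.close()
    return rows

for row in employees_by_country():
    print(row)
```

If PARTITION BY is a hard requirement, COUNT(*) OVER (PARTITION BY department, country) gives the same per-country numbers, but repeated on every employee row rather than pivoted into one column per country, so GROUP BY fits this layout better.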

How to select duplicate rows and group by two columns

I have the following table:
ID|user_id|group_id|subject |book_id
1| 2 |3 |history |1
2| 4 |3 |history |1
3| 5 |3 |art |2
4| 2 |3 |art |2
5| 1 |4 |sport |5
I would like to list all rows in group 3 (group_id) that have duplicates with the same subject and book_id. Two or more rows count as duplicates when they share the same subject and book_id.
I would like my distinct results to look like this:
|subject |book_id|
|history |1 |
|art |2 |
I would like to do this using either the Query Builder or Eloquent.
A SQL query to get the desired result may look like this:
SELECT subject, book_id
FROM table1
WHERE group_id = 3
GROUP BY subject, book_id
HAVING COUNT(*) > 1
Now the same using the Laravel Query Builder:
$duplicates = DB::table('table1')
->select('subject', 'book_id')
->where('group_id', 3)
->groupBy('subject', 'book_id')
->havingRaw('COUNT(*) > 1')
->get();
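To confirm what the GROUP BY ... HAVING query returns for the question's table, here is a minimal sketch that runs the plain SQL in SQLite through Python's sqlite3 module (SQLite stands in for the MySQL database behind Laravel; duplicate_subjects is a hypothetical helper name).

```python
import sqlite3

def duplicate_subjects(group_id):
    """Return (subject, book_id) pairs that occur more than once within a group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, user_id INTEGER, group_id INTEGER,"
                " subject TEXT, book_id INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", [
        (1, 2, 3, "history", 1), (2, 4, 3, "history", 1),
        (3, 5, 3, "art", 2), (4, 2, 3, "art", 2), (5, 1, 4, "sport", 5),
    ])
    rows = con.execute("""
        SELECT subject, book_id
        FROM table1
        WHERE group_id = ?
        GROUP BY subject, book_id
        HAVING COUNT(*) > 1   -- keep only combinations seen at least twice
        ORDER BY book_id
    """, (group_id,)).fetchall()
    con.close()
    return rows

print(duplicate_subjects(3))
```

The WHERE clause filters rows before grouping, while HAVING filters the groups afterwards, which is why the duplicate test has to go in HAVING.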

Recursive sum of values in an hierarchical table in Oracle 10g

Assuming I have this table:
CREATE TABLE MY_EXAMPLE ( ID NUMBER , PARENT NUMBER , VALUE NUMBER );
Insert into MY_EXAMPLE (ID,PARENT,VALUE) values (1,null,100);
Insert into MY_EXAMPLE (ID,PARENT,VALUE) values (2,1,50);
Insert into MY_EXAMPLE (ID,PARENT,VALUE) values (3,null,0);
Insert into MY_EXAMPLE (ID,PARENT,VALUE) values (4,2,1000);
Insert into MY_EXAMPLE (ID,PARENT,VALUE) values (5,1,1);
|id |parent |value |
|1 |null |100 |
|2 |1 |50 |
|3 |null |0 |
|4 |2 |1000 |
|5 |1 |1 |
I need to create a view (which should perform well) with the same number of rows, but where each row's value is its own value plus the values of all its descendants. There can be many levels as well as many children.
|id |parent |value |
|1 |null |1151 | (sum of 1 + 2 + 4 + 5)
|2 |1 |1050 | (sum of 2 + 4)
|3 |null |0 | (only 3 because has no children)
|4 |2 |1000 | (only 4 because has no children)
|5 |1 |1 | (only 5 because has no children)
P.S.: I tried a recursive WITH query, but it didn't work in Oracle 10g: first, the keyword RECURSIVE is not supported, and second, 10g does not allow a recursive WITH ("forward or recursive reference of a query name in WITH clause is not allowed").
Also, I couldn't figure out a way to do it with CONNECT BY that includes the id and parent columns and returns the whole table (in my attempts I always had to use START WITH).
One way that works in 10g is a recursive function that returns a node's own value plus the totals of its children:
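CREATE OR REPLACE FUNCTION RECURSIVE_ADD(
  ROOT_ID IN NUMBER)
RETURN NUMBER
IS
  TOTAL NUMBER;
BEGIN
  -- own value plus the subtree total of each direct child;
  -- UNION ALL, not UNION, so equal values are not collapsed into one
  SELECT SUM(VALUE)
  INTO TOTAL
  FROM (
    SELECT VALUE FROM MY_EXAMPLE WHERE ID = ROOT_ID
    UNION ALL
    SELECT RECURSIVE_ADD(ID) FROM MY_EXAMPLE WHERE PARENT = ROOT_ID
  );
  RETURN TOTAL;
END;
/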
select id, parent, value, RECURSIVE_ADD(id) from my_example;
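Make sure you don't have a cycle in your data (for example, if you set the parent of 1 to 2), otherwise the recursion will not terminate. Also note that the function runs one query per node, so it can be slow on large trees. There are other ways to do this in newer versions of Oracle (for example, recursive subquery factoring in 11g Release 2), but this will work in 10g.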
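For comparison, the "newer versions" route is a recursive WITH query. The sketch below shows the idea in SQLite through Python's sqlite3 module, which spells it WITH RECURSIVE (Oracle 11g Release 2 and later omit that keyword); it is a stand-in to illustrate the technique and will not run on 10g. The recursive part pairs every row with each of its descendants, then a GROUP BY sums the values per root. The function name subtree_sums is hypothetical.

```python
import sqlite3

def subtree_sums():
    """Return (id, parent, summed value), where each value includes all descendants."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_example (id INTEGER, parent INTEGER, value INTEGER)")
    con.executemany("INSERT INTO my_example VALUES (?, ?, ?)", [
        (1, None, 100), (2, 1, 50), (3, None, 0), (4, 2, 1000), (5, 1, 1),
    ])
    rows = con.execute("""
        WITH RECURSIVE subtree(root_id, node_id) AS (
            -- every row is the root of its own subtree
            SELECT id, id FROM my_example
            UNION ALL
            -- walk down one level: children of nodes already reached
            SELECT s.root_id, m.id
            FROM subtree s
            JOIN my_example m ON m.parent = s.node_id
        )
        SELECT e.id, e.parent, SUM(n.value) AS value
        FROM subtree s
        JOIN my_example e ON e.id = s.root_id
        JOIN my_example n ON n.id = s.node_id
        GROUP BY e.id, e.parent
        ORDER BY e.id
    """).fetchall()
    con.close()
    return rows

for row in subtree_sums():
    print(row)
```

The same caveat about cycles applies: with a cycle in parent/child links, the UNION ALL recursion would keep producing rows.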

Complex SQL query to join two tables

Problem:
Given two tables, TableA and TableB, where TableA has a one-to-many relationship with TableB, I want to return every record in TableA together with the TableB value whose column matches the search criteria, and NULL for the TableA records that have no such attribute.
Table Structures:
Table A
ID(Primary Key) | Name | City
1 | ABX | San Francisco
2 | ASDF | Oakland
3 | FDFD | New York
4 | GFGF | Austin
5 | GFFFF | San Francisco
Table B
ATTR_ID |Attr_Type | Attr_Name | Attr_Value
1 | TableA | Attr_1 | Attr_Value_1
2 | TableD | Attr_1 | Attr_Value_2
1 | TableA | Attr_2 | Attr_Value_3
3 | TableA | Attr_4 | Attr_Value_4
9 | TableC | Attr_2 | Attr_Value_5
Table B holds attribute names and values and is a common table used across multiple tables. The owning table is identified by Attr_Type, and ATTR_ID maps to the IDs of that table.
For instance, the record in Table A with ID 1 has two attributes in Table B, with Attr_Name values Attr_1 and Attr_2, and so on.
Expected Output
ID | Name | City | TableB.Attr_Value
1 | ABX | San Francisco | Attr_Value_1
2 | ASDF | Oakland | Attr_Value_2
3 | FDFD | New York | NULL
4 | GFGF | Austin | NULL
5 | GFFFF | San Francisco | NULL
Search Criteria:
Get rows from Table B for each record in Table A with ATTR_NAME Attr_1. If a particular TableA record doesn't have Attr_1, return null.
My Query
select id, name, city,
b.attr_value from table_A
join table_B b on
table_A.id =b.attr_id and b.attr_name='Attr_1'
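This is a strange data structure. You need a left outer join with the conditions on table_B in the ON clause; putting them in a WHERE clause would filter out the NULL-extended rows and turn it back into an inner join: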
select a.id, a.name, a.city, b.attr_value
from table_A a left join
table_B b
on a.id = b.attr_id and b.attr_name = 'Attr_1' and b.attr_type = 'TableA';
I added the attr_type condition because that seems logical with this data structure. Note that with it, ID 2 returns NULL, because its Attr_1 row in Table B has Attr_Type TableD.
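To see why the conditions belong in the ON clause, here is a minimal sketch in SQLite via Python's sqlite3 module (standing in for SQL Server) using the question's data. It runs the left join once with the filters in ON and once with them moved to WHERE; the helper name attr_1_per_row and its flag are hypothetical.

```python
import sqlite3

def attr_1_per_row(filter_in_on_clause):
    """Return (id, attr_value) pairs for the Attr_1 lookup, filtering in ON or in WHERE."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_A (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    con.execute("CREATE TABLE table_B (attr_id INTEGER, attr_type TEXT,"
                " attr_name TEXT, attr_value TEXT)")
    con.executemany("INSERT INTO table_A VALUES (?, ?, ?)", [
        (1, "ABX", "San Francisco"), (2, "ASDF", "Oakland"), (3, "FDFD", "New York"),
        (4, "GFGF", "Austin"), (5, "GFFFF", "San Francisco"),
    ])
    con.executemany("INSERT INTO table_B VALUES (?, ?, ?, ?)", [
        (1, "TableA", "Attr_1", "Attr_Value_1"), (2, "TableD", "Attr_1", "Attr_Value_2"),
        (1, "TableA", "Attr_2", "Attr_Value_3"), (3, "TableA", "Attr_4", "Attr_Value_4"),
        (9, "TableC", "Attr_2", "Attr_Value_5"),
    ])
    conditions = "b.attr_name = 'Attr_1' AND b.attr_type = 'TableA'"
    if filter_in_on_clause:
        # Filters in ON: unmatched table_A rows survive with a NULL attr_value.
        sql = f"""SELECT a.id, b.attr_value FROM table_A a
                  LEFT JOIN table_B b ON a.id = b.attr_id AND {conditions}
                  ORDER BY a.id"""
    else:
        # Filters in WHERE: NULL-extended rows fail the test, so it acts like an inner join.
        sql = f"""SELECT a.id, b.attr_value FROM table_A a
                  LEFT JOIN table_B b ON a.id = b.attr_id
                  WHERE {conditions}
                  ORDER BY a.id"""
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

print(attr_1_per_row(True))
print(attr_1_per_row(False))
```

With the filters in ON, all five TableA rows come back; with the same filters in WHERE, only ID 1 survives.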
I don't have SQL Server available to test this, but what you want is an outer join rather than an inner join. The old *= outer join operator only worked in a WHERE clause, and current SQL Server versions no longer support it, so write it as an ANSI LEFT JOIN:
select id, name, city,
b.attr_value from table_A
left join table_B b on
table_A.id = b.attr_id and b.attr_name = 'Attr_1'
Something like this should do the trick for you.
