How to use 2 for loops in Hive - hadoop

How do I use 2 for loops in Hive?
I have input data as below:
1 a 3
15 b 4
1 b 2
25 a 5
15 c 3
1 a 3
15 c 2
25 b 4
Intermediate output: for each value in the first column (1, 15, 25), sum the third column per letter:
1 a 6
1 b 2
15 b 4
15 c 5
25 a 5
25 b 4
Final output: for each value in the first column, keep only the row with the highest total:
1 a 6
15 c 5
25 a 5

You can use window functions to get these results. Check this out:
> select * from shailesh;
INFO : OK
+----------------+----------------+----------------+--+
| shailesh.col1 | shailesh.col2 | shailesh.col3 |
+----------------+----------------+----------------+--+
| 1 | a | 3 |
| 15 | b | 4 |
| 1 | b | 2 |
| 25 | a | 5 |
| 15 | c | 3 |
| 1 | a | 3 |
| 15 | c | 2 |
| 25 | b | 4 |
+----------------+----------------+----------------+--+
8 rows selected (0.359 seconds)
> create table shailesh2 as select col1, col2, max(col3s) col3s2 from (select col1,col2,sum(col3) over(partition by col1,col2) col3s from shailesh ) t group by col1, col2;
INFO : OK
+-----------------+-----------------+-------------------+--+
| shailesh2.col1 | shailesh2.col2 | shailesh2.col3s2 |
+-----------------+-----------------+-------------------+--+
| 1 | a | 6 |
| 1 | b | 2 |
| 15 | b | 4 |
| 15 | c | 5 |
| 25 | a | 5 |
| 25 | b | 4 |
+-----------------+-----------------+-------------------+--+
6 rows selected (0.36 seconds)
> select col1, col2, col3s2 from (select col1,col2,col3s2, rank() over(partition by col1 order by col3s2 desc) as rk from shailesh2) t2 where rk=1;
INFO : OK
+-------+-------+---------+--+
| col1 | col2 | col3s2 |
+-------+-------+---------+--+
| 1 | a | 6 |
| 15 | c | 5 |
| 25 | a | 5 |
+-------+-------+---------+--+
3 rows selected (37.224 seconds)
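The two steps above (sum per pair, then keep the top total per key) can also be traced outside Hive. The following is only a minimal Python sketch of that logic, assuming the eight input rows from the question; the function name `top_per_key` is hypothetical, and ties are all kept, as `rank()` would keep them:

```python
from collections import defaultdict

# Input rows (col1, col2, col3) copied from the question.
rows = [(1, "a", 3), (15, "b", 4), (1, "b", 2), (25, "a", 5),
        (15, "c", 3), (1, "a", 3), (15, "c", 2), (25, "b", 4)]

def top_per_key(rows):
    # Step 1: sum col3 per (col1, col2) pair, like the intermediate table.
    totals = defaultdict(int)
    for col1, col2, col3 in rows:
        totals[(col1, col2)] += col3
    # Step 2: find the highest total per col1.
    best = {}
    for (col1, _), total in totals.items():
        best[col1] = max(best.get(col1, total), total)
    # Keep every pair that reaches the highest total of its col1 (ties included).
    return sorted((c1, c2, t) for (c1, c2), t in totals.items() if t == best[c1])

result = top_per_key(rows)
print(result)
```

In Hive itself the query shown above remains the way to do it; the sketch only makes the "two loops" explicit.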

Related

Laravel leftJoin returns null from 2nd table

I have two tables: duty_sheets
centerId | centerName | p1 | p2 | p3 | p4 | ...p22 | examiId
1 | xyz | 1 | 5 | 8 | 7 | 1 | 1
2 | abc | 9 | 1 | 6 | 6 | 1 | 1
and feedback
id | centerId | inspectorId | A | B | C | examiId
1 | 1 | 1 | 1 | 5 | 8 | 1
2 | 2 | 9 | 9 | 1 | 6 | 1
Here is my code:
$center = DutySheet::select('duty_sheets.centerId', 'duty_sheets.centerName','feedback.id')
->leftJoin('feedback', function ($leftJoin) {
$leftJoin->on('duty_sheets.examId', 'feedback.examId')
->where("duty_sheets.centerId", 'feedback.centerId')
->where("feedback.inspectorId", 1);
})
->where("duty_sheets.examId", 1)
->where("p20", 1)
->get();
dd($center);
The goal is to retrieve all rows from DutySheet where p20 = 1 and duty_sheets.examId = 1, plus the relevant rows from feedback matched on centerId, inspectorId and examId.
The problem is that the query returns feedback.id as null even though matching records exist in the feedback table.
Laravel version = 9
The problem is in the left join:
->where("duty_sheets.centerId", 'feedback.centerId')
This builds a WHERE condition against the string value 'feedback.centerId':
duty_sheets.centerId='feedback.centerId'
You need to use
->on("duty_sheets.centerId",'=', 'feedback.centerId')
Or
->whereColumn("duty_sheets.centerId", 'feedback.centerId')
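The difference between comparing against a bound string and comparing against a column can be reproduced in plain SQL. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in database rather than Laravel, keeps only the columns the query touches, and assumes `p20 = 1` for both duty sheet rows (the question does not show that column's values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duty_sheets (centerId INTEGER, centerName TEXT, p20 INTEGER, examId INTEGER);
    CREATE TABLE feedback (id INTEGER, centerId INTEGER, inspectorId INTEGER, examId INTEGER);
    -- p20 = 1 is an assumption; the other values follow the question's tables
    INSERT INTO duty_sheets VALUES (1, 'xyz', 1, 1), (2, 'abc', 1, 1);
    INSERT INTO feedback VALUES (1, 1, 1, 1), (2, 2, 9, 1);
""")

JOIN_SQL = """
    SELECT d.centerId, d.centerName, f.id
    FROM duty_sheets d
    LEFT JOIN feedback f
      ON d.examId = f.examId
     AND d.centerId = {rhs}
     AND f.inspectorId = 1
    WHERE d.examId = 1 AND d.p20 = 1
    ORDER BY d.centerId
"""

# Like where(): the right-hand side is a bound string value, so no row matches.
as_value = conn.execute(JOIN_SQL.format(rhs="?"), ("feedback.centerId",)).fetchall()

# Like on() / whereColumn(): the right-hand side is a column reference.
as_column = conn.execute(JOIN_SQL.format(rhs="f.centerId")).fetchall()

print(as_value)
print(as_column)
```

With the string comparison every `f.id` comes back as NULL; with the column comparison center 1 picks up feedback row 1, while center 2 stays NULL because its only feedback row belongs to inspector 9.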

How to calculate difference between 2 entries based on dates?

I have an Oracle DB View like:
DATE | PRODUCT_NUMBER | PRODUCT_COUNT | PRODUCT_FACTOR
2018-01-01 | 1 | 10 | 3
2018-03-15 | 1 | 8 | 3
2019-02-11 | 1 | 11 | 3
2019-08-01 | 1 | 5 | 3
2019-08-01 | 2 | 20 | 5
2019-08-02 | 2 | 15 | 5
2019-06-01 | 2 | 5 | 5
2020-07-01 | 2 | 30 | 5
2018-07-07 | 3 | 100 | 2
Where:
DATE is the date of the entry
PRODUCT_NUMBER is a unique product number
PRODUCT_COUNT is the number of items of that product in the storage facility
PRODUCT_FACTOR is the number of products that fit into a storage rack
For every product number, I now need to know how much the count changed since the previous entry.
Since the first entry has no past date to compare to, change is undefined and something like NULL, NONE, 0 or so. Doesn't matter as long as I can filter those out later.
Some products only have 1 entry, those should be ignored (nothing to calculate difference on).
End result should be:
DATE | PRODUCT_NUMBER | PRODUCT_COUNT | PRODUCT_FACTOR | PRODUCT_CHANGE | CHANGE_FACTOR
2018-01-01 | 1 | 10 | 3 | NULL | NULL
2018-03-15 | 1 | 8 | 3 | 2 # 10-8 | 6 # 2*3
2019-02-11 | 1 | 11 | 3 | -3 # 8-11 | -9 # 3*-3
2019-08-01 | 1 | 5 | 3 | 6 # 11-5 | 18 # 6*3
2019-08-01 | 2 | 20 | 5 | -15 # 5-20 | -75 # -15*5
2019-08-02 | 2 | 15 | 5 | 5 # 20-15 | 25 # 5*5
2019-06-01 | 2 | 5 | 5 | NULL | NULL
2020-07-01 | 2 | 30 | 5 | -15 # 15-30 | -75 # -15*5
How can I achieve this within Oracle SQL?
The end result is a bit unclear:
Why are the values 15 and 5 compared for product_number 2? 2019-06-01 is earlier than 2019-08-01 and should be the first row.
Why is change_factor 3 on the first row for product 1 but null for product 2?
Why is change_factor for 2019-02-11 calculated as 11 * 0 instead of 0 * 3?
Assuming all of these are typos (I changed 2019-06-01 to 2019-09-01), you can use something like the query below:
select dt, product_number, product_count, product_factor, product_change,
       product_change * product_factor change_factor
from (
    select "DATE" dt, product_number, product_count, product_factor,
           greatest(lag(product_count) over (partition by product_number order by "DATE") - product_count, 0) product_change
    from test_tab t1
    where (select count(1) from test_tab t2
           where t1.product_number = t2.product_number and rownum < 3) > 1
)
See also LAG documentation
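To make the LAG idea concrete, here is a small Python sketch of "previous count minus current count" per product. It is not the Oracle query: it assumes the rows exactly as posted in the question (including the 2019-06-01 entry), orders each product by date, skips products with a single entry, and does not clamp negative changes to 0 the way `greatest(..., 0)` does in the query above. The function name `with_changes` is made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# (date, product_number, product_count, product_factor) as listed in the question
rows = [
    ("2018-01-01", 1, 10, 3), ("2018-03-15", 1, 8, 3),
    ("2019-02-11", 1, 11, 3), ("2019-08-01", 1, 5, 3),
    ("2019-08-01", 2, 20, 5), ("2019-08-02", 2, 15, 5),
    ("2019-06-01", 2, 5, 5), ("2020-07-01", 2, 30, 5),
    ("2018-07-07", 3, 100, 2),
]

def with_changes(rows):
    out = []
    # Sort by product number, then by date (ISO strings sort chronologically).
    ordered = sorted(rows, key=itemgetter(1, 0))
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        if len(group) < 2:
            continue  # a single entry has nothing to compare against
        prev = None
        for date, number, count, factor in group:
            # Equivalent of LAG(product_count) - product_count; None on the first row.
            change = None if prev is None else prev - count
            change_factor = None if change is None else change * factor
            out.append((date, number, count, factor, change, change_factor))
            prev = count
    return out

changes = with_changes(rows)
for row in changes:
    print(row)
```

Because each product is ordered by date first, the 2019-06-01 entry becomes the first row of product 2 and carries the empty change.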

LINQ Code that counts employee gender in each position and group by department and place in a matrix table

I want to ask how to write a LINQ query that can fill my HTML table.
Please look at my tables below.
Table EMP (note: my "Male" column is boolean):
+----+------+--------+---------+--------+
| id | Male | JS_REF | DEPT_ID | POS_ID |
+----+------+--------+---------+--------+
| 1  | 1    | 1      | 2       | 3      |
| 2  | 0    | 2      | 2       | 3      |
| 3  | 1    | 3      | 1       | 2      |
| 4  | 1    | 2      | 4       | 2      |
| 5  | 1    | 1      | 5       | 5      |
| 6  | 0    | 4      | 6       | 1      |
| 7  | 1    | 1      | 1       | 1      |
| 8  | 0    | 2      | 2       | 3      |
+----+------+--------+---------+--------+
Table:JOB_STATUS
+----+--------+--------------+
| id | JS_REF | JS_TITLE     |
+----+--------+--------------+
| 1  | 1      | Undefined    |
| 2  | 2      | Regular      |
| 3  | 3      | Contructual  |
| 4  | 4      | Probationary |
+----+--------+--------------+
Table:DEPTS
+----+---------+------------+
| id | DEPT_ID | DEPT_NAME  |
+----+---------+------------+
| 1  | 1       | Admin      |
| 2  | 2       | Accounting |
| 3  | 3       | Eginnering |
| 4  | 4       | HR         |
+----+---------+------------+
Table: POSITIONS
+----+--------+------------+
| id | POS_ID | DEPT_NAME  |
+----+--------+------------+
| 1  | 1      | Clerk      |
| 2  | 2      | Accountant |
| 3  | 3      | Bookeeper  |
| 4  | 4      | Assistant  |
| 5  | 5      | Mechanic   |
| 6  | 6      | Staff      |
+----+--------+------------+
I made a static table showing what the outcome of the LINQ code should be.
Here's what I've tried so far:
SELECT TB.DEPT_NAME, TB.JS_TITLE, TB.Male, TB.Female, (TB.Male + TB.Female) AS 'Total Employees'
FROM (
    SELECT JS_TITLE, DEPT_NAME,
           SUM(CASE WHEN MALE = 1 THEN 1 ELSE 0 END) AS Male,
           SUM(CASE WHEN MALE = 0 THEN 1 ELSE 0 END) AS Female
    FROM EMP
    LEFT JOIN JOB_STATUS ON JOB_STATUS.JS_REF = EMP.JS_REF
    LEFT JOIN DEPTS ON DEPTS.DEPT_ID = EMP.DEPT_ID
    GROUP BY JS_TITLE, DEPT_NAME
) AS TB
ORDER BY CASE WHEN TB.MALE IS NULL THEN 1 ELSE 0 END
I'm stuck on this part, so any help or tips on how to implement this would be appreciated.
101 is the total count for male and 23 for female. (The values were just copied and pasted, which is why they are the same.)
(Actual data result)

Repeat values with in the GROUP in HIVE

I am trying to repeat a row value in the subsequent rows within a GROUP. A group can have one or more TAGs. The requirement is to populate NEW_TAG in the row where TAG is populated and in the subsequent rows, until another TAG is populated within the same group or we reach the end of that group.
Current Table          Required Table
GROUPID  SEQ  TAG      GROUPID  SEQ  TAG  NEW_TAG
-------  ---  ---      -------  ---  ---  -------
1        1             1        1
1        2             1        2
1        3             1        3
1        4    4        1        4    4    4
1        5             1        5         4
1        6             1        6         4
1        7             1        7         4
1        8             1        8         4
2        1             2        1
2        2             2        2
2        3             2        3
2        4             2        4
2        5    5        2        5    5    5
2        6             2        6         5
2        7             2        7         5
2        8             2        8         5
2        9    9        2        9    9    9
2        10            2        10        9
2        11            2        11        9
Thanks
Assuming TAG is always increasing within a group, a running maximum does the job:
max(TAG) over
(
    partition by GROUPID
    order by SEQ
    rows between unbounded preceding
             and current row
) as NEW_TAG

select *
      ,max(TAG) over
       (
           partition by GROUPID
           order by SEQ
           rows between unbounded preceding
                    and current row
       ) as NEW_TAG
from mytable
;
+---------+--------+--------+---------+
| groupid | seq | tag | new_tag |
+---------+--------+--------+---------+
| 1 | 1 | | |
| 1 | 2 | | |
| 1 | 3 | | |
| 1 | 4 | 4 | 4 |
| 1 | 5 | | 4 |
| 1 | 6 | | 4 |
| 1 | 7 | | 4 |
| 1 | 8 | | 4 |
| 2 | 1 | | |
| 2 | 2 | | |
| 2 | 3 | | |
| 2 | 4 | | |
| 2 | 5 | 5 | 5 |
| 2 | 6 | | 5 |
| 2 | 7 | | 5 |
| 2 | 8 | | 5 |
| 2 | 9 | 9 | 9 |
| 2 | 10 | | 9 |
| 2 | 11 | | 9 |
+---------+--------+--------+---------+
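The running maximum can be followed step by step in ordinary code. This is a rough Python sketch of the same idea, under the same assumption that TAG only increases within a group; empty tags are represented as `None`, and the function name `fill_new_tag` is invented for the illustration.

```python
def fill_new_tag(rows):
    """rows: iterable of (groupid, seq, tag), with tag=None where TAG is empty."""
    out = []
    running = {}  # groupid -> largest tag seen so far in that group
    # Same ordering as "partition by GROUPID order by SEQ".
    for groupid, seq, tag in sorted(rows, key=lambda r: (r[0], r[1])):
        if tag is not None:
            prev = running.get(groupid)
            running[groupid] = tag if prev is None else max(prev, tag)
        # NEW_TAG stays None until the first tag of the group shows up.
        out.append((groupid, seq, tag, running.get(groupid)))
    return out

# Sample data from the question: group 1 has a tag at SEQ 4, group 2 at SEQ 5 and 9.
rows = [(1, s, 4 if s == 4 else None) for s in range(1, 9)]
rows += [(2, s, s if s in (5, 9) else None) for s in range(1, 12)]

filled = fill_new_tag(rows)
for row in filled:
    print(row)
```

If tags could decrease within a group, a running maximum would no longer return the most recent tag, and a "last non-null value" approach would be needed instead.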

Sort values in an associative array pl/sql

If the ID is even, I must sort the values in that row in descending order; if the ID is odd, I must sort them in ascending order. This is the table, called Grades:
ID|COL1|COL2|COL3|COL4|COL5|COL6|COL7|
1 | 6 | 3 | 8 | 4 | 7 | 8 | 4 |
2 | 5 | 7 | 9 | 2 | 1 | 7 | 8 |
3 | 2 | 7 | 4 | 8 | 1 | 5 | 9 |
4 | 8 | 4 | 7 | 9 | 4 | 1 | 4 |
5 | 7 | 5 | 2 | 5 | 2 | 6 | 4 |
The result must be this:
ID|COL1|COL2|COL3|COL4|COL5|COL6|COL7|
1 | 3 | 4 | 4 | 6 | 7 | 8 | 8 |
2 | 9 | 8 | 7 | 7 | 5 | 2 | 1 |
3 | 1 | 2 | 4 | 5 | 7 | 8 | 9 |
4 | 9 | 8 | 7 | 4 | 4 | 4 | 1 |
5 | 2 | 2 | 4 | 5 | 5 | 6 | 7 |
For example, ID = 1 is an odd number, so its values must be sorted ASC.
This is the code so far:
declare
    type grades_array is table of grades%rowtype index by pls_integer;
    grades_a grades_array;
    cnt number;
begin
    select count(id) into cnt from grades;
    for i in 1..cnt loop
        -- I used an associative array
        select * into grades_a(i) from grades where grades.id = i;
    end loop;
    for i in grades_a.FIRST..grades_a.LAST loop
        if (mod(grades_a(i).id, 2) = 1) then .......
            -- I don't know how to sort the specific rows, in this case ASC
            -- dbms_output.put_line(grades_a(i).col1);
        end if;
    end loop;
    -- Also it is specified in the exercise that the table can change, e.g. add more columns
end;
I would simply use PIVOT/UNPIVOT for this.
First UNPIVOT the table and assign a rank to each column value in ascending/descending order.
Query 1:
SELECT id,
       colval,
       ROW_NUMBER () OVER (
           PARTITION BY id
           ORDER BY CASE MOD (id, 2) WHEN 1 THEN colval END,
                    CASE MOD (id, 2) WHEN 0 THEN colval END DESC) r
FROM x UNPIVOT (colval FOR colname
                IN (col1 AS 'col1', col2 AS 'col2', col3 AS 'col3', col4 AS 'col4',
                    col5 AS 'col5', col6 AS 'col6', col7 AS 'col7')
               )
Results:
| ID | COLVAL | R |
|----|--------|---|
| 1 | 3 | 1 |
| 1 | 4 | 2 |
| 1 | 4 | 3 |
| 1 | 6 | 4 |
| 1 | 7 | 5 |
| 1 | 8 | 6 |
| 1 | 8 | 7 |
| 2 | 9 | 1 |
| 2 | 8 | 2 |
| 2 | 7 | 3 |
| 2 | 7 | 4 |
| 2 | 5 | 5 |
| 2 | 2 | 6 |
| 2 | 1 | 7 |
| 3 | 1 | 1 |
| 3 | 2 | 2 |
| 3 | 4 | 3 |
| 3 | 5 | 4 |
| 3 | 7 | 5 |
| 3 | 8 | 6 |
| 3 | 9 | 7 |
| 4 | 9 | 1 |
| 4 | 8 | 2 |
| 4 | 7 | 3 |
| 4 | 4 | 4 |
| 4 | 4 | 5 |
| 4 | 4 | 6 |
| 4 | 1 | 7 |
| 5 | 2 | 1 |
| 5 | 2 | 2 |
| 5 | 4 | 3 |
| 5 | 5 | 4 |
| 5 | 5 | 5 |
| 5 | 6 | 6 |
| 5 | 7 | 7 |
Then PIVOT the result based on the rank.
Query 2:
WITH pivoted AS (
    SELECT id,
           colval,
           ROW_NUMBER () OVER (
               PARTITION BY id
               ORDER BY CASE MOD (id, 2) WHEN 1 THEN colval END,
                        CASE MOD (id, 2) WHEN 0 THEN colval END DESC) r
    FROM x UNPIVOT (colval FOR colname
                    IN (col1 AS 'col1', col2 AS 'col2', col3 AS 'col3', col4 AS 'col4',
                        col5 AS 'col5', col6 AS 'col6', col7 AS 'col7')
                   )
)
SELECT * FROM pivoted
PIVOT (MAX (colval)
       FOR r
       IN (1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4,
           5 AS col5, 6 AS col6, 7 AS col7))
Results:
| ID | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 |
|----|------|------|------|------|------|------|------|
| 1 | 3 | 4 | 4 | 6 | 7 | 8 | 8 |
| 2 | 9 | 8 | 7 | 7 | 5 | 2 | 1 |
| 3 | 1 | 2 | 4 | 5 | 7 | 8 | 9 |
| 4 | 9 | 8 | 7 | 4 | 4 | 4 | 1 |
| 5 | 2 | 2 | 4 | 5 | 5 | 6 | 7 |
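The rule itself (odd ID ascending, even ID descending) is easy to check outside the database. Below is a short Python sketch of just that rule, assuming the Grades rows from the question held as a dictionary from ID to column values; the name `sort_by_parity` is hypothetical. It handles any number of columns, which matches the remark that the table may gain columns.

```python
# Grades rows from the question: id -> values of COL1..COL7
grades = {
    1: [6, 3, 8, 4, 7, 8, 4],
    2: [5, 7, 9, 2, 1, 7, 8],
    3: [2, 7, 4, 8, 1, 5, 9],
    4: [8, 4, 7, 9, 4, 1, 4],
    5: [7, 5, 2, 5, 2, 6, 4],
}

def sort_by_parity(table):
    # Odd id -> ascending, even id -> descending; the column count is not fixed.
    return {row_id: sorted(values, reverse=(row_id % 2 == 0))
            for row_id, values in table.items()}

sorted_grades = sort_by_parity(grades)
for row_id, values in sorted_grades.items():
    print(row_id, values)
```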
