Computing lag in Hive by a variable - hadoop

My input table looks like:
guest_id days
101 79
101 70
101 68
101 61
102 101
102 90
102 55
103 99
103 90
Note that days are in descending order within each guest_id.
Desired output table:
guest_id days days_diff
101 79 0
101 70 9
101 68 2
101 61 7
102 101 0
102 90 11
102 55 35
103 99 0
103 90 9
days_diff is the first-order difference within each guest_id (not across the whole days column).

You need a unique, sequential id column as well (otherwise Hive has no way of knowing the order of your rows).
Then you can self-join each row to the previous one (b.id = a.id - 1) to get your differences. The outer join keeps the first row, which has no predecessor:
select a.guest_id,
a.days,
case when a.guest_id = b.guest_id then b.days - a.days else 0 end as days_diff
from
input a
left outer join input b on b.id = a.id - 1
Edit: As pointed out by Kunal in the comments, Hive does have a LAG window function, which takes a PARTITION BY ... ORDER BY clause. You still need something to order your table by. LAG returns the previous row's days (NULL on the first row of each guest), so subtract the current days and default to 0. For example, if you have a date column you would use it like this:
SELECT guest_id,
days,
COALESCE(LAG(days, 1) OVER (PARTITION BY guest_id ORDER BY date) - days, 0) AS days_diff
FROM input;
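To make the per-guest lag concrete, here is a minimal Python sketch (not Hive) that walks the sample rows in the order given and subtracts each row's days from the previous row's days within the same guest_id. The function name days_diff_by_guest is hypothetical, and the sketch assumes the rows already arrive in the intended order with each guest's rows adjacent.

```python
def days_diff_by_guest(rows):
    """rows: iterable of (guest_id, days), already ordered within each guest."""
    out = []
    prev_guest, prev_days = None, None
    for guest_id, days in rows:
        # The first row of a guest has no previous row, so its difference is 0.
        diff = prev_days - days if guest_id == prev_guest else 0
        out.append((guest_id, days, diff))
        prev_guest, prev_days = guest_id, days
    return out


rows = [(101, 79), (101, 70), (101, 68), (101, 61),
        (102, 101), (102, 90), (102, 55),
        (103, 99), (103, 90)]
result = days_diff_by_guest(rows)
for row in result:
    print(*row)
```

This reproduces the desired output table from the question, including the 0 on the first row of each guest.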

Related

PL/SQL: how can I fetch records with max values from another column for each room number

table
room-number entry-number electricity
n100 5 100
n100 4 90
n200 2 75
n200 1 69
n300 6 150
n300 5 111
result should be
room-number electricity
n100 100
n200 75
n300 150
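I haven't tried it, but this should work. Assuming the names really are as shown, they must be quoted, because room-number contains a hyphen and table is a reserved word:
SELECT "room-number", MAX(electricity) AS electricity FROM "table" GROUP BY "room-number"
Note that this returns the largest electricity value per room, which matches the sample only because the highest entry-number in each room also has the highest reading.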
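If the intent is instead the electricity value from the row with the highest entry-number in each room (one reading of the question title), a correlated subquery is one way to express it. The sketch below runs against an in-memory SQLite database from Python rather than Oracle, and the table name readings is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hyphenated column names have to be double-quoted.
conn.execute(
    'CREATE TABLE readings ("room-number" TEXT, "entry-number" INTEGER, electricity INTEGER)'
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("n100", 5, 100), ("n100", 4, 90),
     ("n200", 2, 75), ("n200", 1, 69),
     ("n300", 6, 150), ("n300", 5, 111)],
)

# Keep only the row whose entry-number is the largest within its room.
latest = conn.execute('''
    SELECT r."room-number", r.electricity
    FROM readings r
    WHERE r."entry-number" = (
        SELECT MAX(r2."entry-number")
        FROM readings r2
        WHERE r2."room-number" = r."room-number"
    )
    ORDER BY r."room-number"
''').fetchall()
print(latest)
```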

Retrieve the list of data from database using hibernate criteria

I have a table called employee_comp_field, where salary fields are available
comp_field_id | year_id | compensation_field
1 101 salary
2 101 bonus
3 101 pf
4 101 allowance
5 102 salary
6 102 bonus
7 102 pf
8 102 allowance
Then I have another table, emp_compensation, where employee salary data is stored against each field. As you can see, emp_id 10 has three sets of records because he got three salary hikes in the same year (year_id=101), which can be identified by the comp_order field.
id | year_id | emp_id | comp_field_id | amount | comp_order
1 101 10 1 10000 1
2 101 10 2 1000 1
3 101 10 3 1000 1
4 101 10 4 100 1
5 101 10 1 12000 2
6 101 10 2 100 2
7 101 10 3 10000 2
8 101 10 4 10000 2
9 101 10 1 15000 3
10 101 10 2 500 3
11 101 10 3 150 3
12 101 10 4 1500 3
13 101 11 1 13000 1
14 101 11 2 1300 1
15 101 11 3 null 1
16 101 11 4 150 1
I want to list the records of every employee that have that employee's max comp_order.
My desired output is below:
id | year_id | emp_id | comp_field_id | amount | comp_order
9 101 10 1 15000 3
10 101 10 2 500 3
11 101 10 3 150 3
12 101 10 4 1500 3
13 101 11 1 13000 1
14 101 11 2 1300 1
15 101 11 3 null 1
16 101 11 4 150 1
As emp_id 10 got three salary hikes, I retrieve his records with comp_order 3,
and as emp_id 11 got only one, I retrieve only his records with comp_order 1.
Can someone please help me retrieve my desired output using Hibernate Criteria?
My thought is to first retrieve the full list based on emp_id and then filter it with a Java stream to get the desired output.
Please suggest the best possible way.
"The best possible way" is subjective. It can be the fastest, it can be the shortest. It could be anything.
I will give you an example of how you could build a query in MySQL to replicate your output. This might be tricky to solve with Criteria, though, since the table is being self-joined.
select a.*
from emp_compensation a
left outer join emp_compensation b on a.emp_id = b.emp_id
and a.comp_field_id = b.comp_field_id
and a.comp_order < b.comp_order
where b.emp_id is null
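As a quick check of the anti-join above, the following Python sketch loads the question's sample rows into an in-memory SQLite database (standing in for MySQL, which is an assumption) and runs the same query. A row survives only when no other row for the same emp_id and comp_field_id has a higher comp_order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp_compensation (
    id INTEGER, year_id INTEGER, emp_id INTEGER,
    comp_field_id INTEGER, amount INTEGER, comp_order INTEGER)""")
rows = [
    (1, 101, 10, 1, 10000, 1), (2, 101, 10, 2, 1000, 1),
    (3, 101, 10, 3, 1000, 1), (4, 101, 10, 4, 100, 1),
    (5, 101, 10, 1, 12000, 2), (6, 101, 10, 2, 100, 2),
    (7, 101, 10, 3, 10000, 2), (8, 101, 10, 4, 10000, 2),
    (9, 101, 10, 1, 15000, 3), (10, 101, 10, 2, 500, 3),
    (11, 101, 10, 3, 150, 3), (12, 101, 10, 4, 1500, 3),
    (13, 101, 11, 1, 13000, 1), (14, 101, 11, 2, 1300, 1),
    (15, 101, 11, 3, None, 1), (16, 101, 11, 4, 150, 1),
]
conn.executemany("INSERT INTO emp_compensation VALUES (?, ?, ?, ?, ?, ?)", rows)

# Anti-join: b matches any later hike for the same employee and field,
# so rows where b is missing are the latest ones.
latest = conn.execute("""
    SELECT a.*
    FROM emp_compensation a
    LEFT OUTER JOIN emp_compensation b
      ON a.emp_id = b.emp_id
     AND a.comp_field_id = b.comp_field_id
     AND a.comp_order < b.comp_order
    WHERE b.emp_id IS NULL
    ORDER BY a.id
""").fetchall()
latest_ids = [r[0] for r in latest]
print(latest_ids)
```

On the sample data this keeps ids 9 to 16, which is the desired output in the question.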

Query table and group results into ranges in Laravel

I have a database table that stores multiple records of survey scores; the scores are between 1 and 100. I'm trying to present a frequency distribution on the app's front end by grouping the scores into the following ranges:
Less than 20
20-30
30-40
40-50
50-60
60-70
70-80
80-90
90-100
So if the table had the data 87, 92, 95, 98, the user would see
80 - 90 (1)
90 - 100 (3)
etc. I think collections are the way to go about it, but I don't know where to start to get this sort of output, or whether it's even possible in Laravel?
Yes, it's possible. I believe this is the SQL query that you need (assume your table name is "scores", and "score" is the appropriate field):
select (case when score between 0 and 20 then 'Less than 20'
when score between 21 and 30 then 'Between 21 and 30'
when score between 31 and 40 then 'Between 31 and 40'
when score between 41 and 50 then 'Between 41 and 50'
when score between 51 and 60 then 'Between 51 and 60'
when score between 61 and 70 then 'Between 61 and 70'
when score between 71 and 80 then 'Between 71 and 80'
when score between 81 and 90 then 'Between 81 and 90'
when score between 91 and 100 then 'Between 91 and 100'
end) as score_range, count(*) as count
from scores
group by score_range
order by min(score);
So for Laravel it could work like this:
$frequency = DB::select("SELECT (CASE
WHEN score BETWEEN 0 AND 20 THEN 'Less than 20'
WHEN score BETWEEN 21 AND 30 THEN '20-30'
WHEN score BETWEEN 31 AND 40 THEN '30-40'
WHEN score BETWEEN 41 AND 50 THEN '40-50'
WHEN score BETWEEN 51 AND 60 THEN '50-60'
WHEN score BETWEEN 61 AND 70 THEN '60-70'
WHEN score BETWEEN 71 AND 80 THEN '70-80'
WHEN score BETWEEN 81 AND 90 THEN '80-90'
WHEN score BETWEEN 91 AND 100 THEN '90-100'
END) AS score_range, COUNT(*) as count
FROM scores
GROUP BY score_range
ORDER BY MIN(score);");
You can just edit the text labels.
In this query, "40-50" (for example) means that the score is between 41 and 50. You can also replace "ORDER BY MIN(score)" with "ORDER BY count" if you want.
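To sanity-check the bucketing against the sample scores from the question (87, 92, 95, 98), here is a small Python sketch that runs the same CASE query on an in-memory SQLite database. SQLite stands in for the MySQL database behind Laravel, and the table and column names scores and score are the ones assumed in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(87,), (92,), (95,), (98,)])

# Same CASE expression as the answer; each score falls into exactly one label.
frequency = conn.execute("""
    SELECT (CASE
        WHEN score BETWEEN 0 AND 20 THEN 'Less than 20'
        WHEN score BETWEEN 21 AND 30 THEN '20-30'
        WHEN score BETWEEN 31 AND 40 THEN '30-40'
        WHEN score BETWEEN 41 AND 50 THEN '40-50'
        WHEN score BETWEEN 51 AND 60 THEN '50-60'
        WHEN score BETWEEN 61 AND 70 THEN '60-70'
        WHEN score BETWEEN 71 AND 80 THEN '70-80'
        WHEN score BETWEEN 81 AND 90 THEN '80-90'
        WHEN score BETWEEN 91 AND 100 THEN '90-100'
    END) AS score_range, COUNT(*) AS count
    FROM scores
    GROUP BY score_range
    ORDER BY MIN(score)
""").fetchall()
print(frequency)
```

Ranges with no scores do not appear in the result, so the front end would need to fill in zero counts itself if empty ranges should be shown.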

Is it possible to find maximum value of 2 or more column in a table?

For example, I have a table as follows:
id math science english history
1 80 90 90 90
2 70 60 81 78
3 69 50 45 80
4 30 40 10 80
I only want to find the maximum value in the math and science columns.
Is it possible?
Simply use this:
select max(science),max(math) from your_table

Oracle LEAD & LAG analytics functions

I have a temp table that I'm using for testing and need some direction with analytic functions. I'm still trying to figure out my real solution, and any help pointing me in the right direction would be appreciated.
A1 B1
40 5
50 4
60 3
70 2
90 1
I'm trying to take the previous value, subtract, and add it to the column:
SELECT A1, B1,
(A1-B1) AS C1,
(A1-B1) + LEAD((A1-B1),1,0) OVER (ORDER BY ROWNUM) AS G1
FROM TEST;
The output is not what I expect:
A1 B1 C1
40 5 35
50 4 46
60 3 57
70 2 68
90 1 89
Starting from the last (5th) row, first compute A1 - B1 to get C1. Then take (C1 + previous row's A1) - previous row's B1, that is 89 + 70 - 2 = 157 (saved as C1 of the previous row).
From the 4th row: 157 + 60 - 3 = 214
Repeat until the first row...
The expected final output is:
A1 B1 C1
40 5 295
50 4 260
60 3 214
70 2 157
90 1 89
LAG and LEAD only return a single row's value, not an aggregate over multiple rows, and they are not applied recursively.
What you want is a running sum from the current row to the last row:
SELECT A1,
B1,
SUM( A1 - B1 ) OVER ( ORDER BY ROWNUM
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS C1
FROM test;
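The window frame above (current row to unbounded following) amounts to a cumulative sum taken from the bottom of the table upwards. A minimal Python sketch of that arithmetic on the sample rows, using only the standard library:

```python
from itertools import accumulate

rows = [(40, 5), (50, 4), (60, 3), (70, 2), (90, 1)]  # (A1, B1)

# Per-row difference A1 - B1.
diffs = [a1 - b1 for a1, b1 in rows]

# Accumulate from the last row backwards, then flip back to table order,
# so each row holds the sum of its own difference and all following ones.
c1 = list(accumulate(reversed(diffs)))[::-1]

for (a1, b1), total in zip(rows, c1):
    print(a1, b1, total)
```

The values match the expected final output in the question: 295, 260, 214, 157, 89.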
