How to categorize to age ranges and find the top 3? - hadoop

Having a hive table with age column consisting of age of persons.
Have to count and display the top 3 age categories.
Ex: whether below 10, 10-15, 15-20, 20-25, 25-30, ...
Which age category appears more.
Please suggest me a query to do this.

select case
when age <= 10 then '0-10'
else concat_ws
(
'-'
,cast(floor(age/5)*5 as string)
,cast((floor(age/5)+1)*5 as string)
)
end as age_group
,count(*) as cnt
from mytable
group by 1
order by cnt desc
limit 3
;
You might need to set this parameter:
set hive.groupby.orderby.position.alias=true;
Demo
with mytable as
(
select floor(rand()*100) as age
from (select 1) x lateral view explode(split(space(100),' ')) pe
)
select case
when age <= 10 then '0-10'
else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
end as age_group
,count(*) as cnt
,sort_array(collect_list(age)) as age_list
from mytable
group by 1
order by cnt desc
;
+-----------+-----+------------------------------+
| age_group | cnt | age_list |
+-----------+-----+------------------------------+
| 0-10 | 9 | [0,0,1,3,3,6,8,9,10] |
| 25-30 | 9 | [26,26,28,28,28,28,29,29,29] |
| 55-60 | 8 | [55,55,56,57,57,57,58,58] |
| 35-40 | 7 | [35,35,36,36,37,38,39] |
| 80-85 | 7 | [80,80,81,82,82,82,84] |
| 30-35 | 6 | [31,32,32,32,33,34] |
| 70-75 | 6 | [70,70,71,71,72,73] |
| 65-70 | 6 | [65,67,67,68,68,69] |
| 50-55 | 6 | [51,53,53,53,53,54] |
| 45-50 | 5 | [45,45,48,48,49] |
| 85-90 | 5 | [85,86,87,87,89] |
| 75-80 | 5 | [76,77,78,79,79] |
| 20-25 | 5 | [20,20,21,22,22] |
| 15-20 | 5 | [17,17,17,18,19] |
| 10-15 | 4 | [11,12,12,14] |
| 95-100 | 4 | [95,95,96,99] |
| 40-45 | 3 | [41,44,44] |
| 90-95 | 1 | [93] |
+-----------+-----+------------------------------+

Related

How can I write a recursive query in Power Query to traverse up a tree

Newbie question: I have a table with ID, ParentID, and Type. I want to create two new columns (StrategyID, SubstrategyID) that contains the ID for the row if its Type = 'Strategy' or 'Substrategy'. Otherwise, I want to look at its parent row and return that ID if it matches the Types sought. If not, repeat and look at the parent of the parent, etc. I am not getting the syntax for functions in general and recursive functions in particular in PowerQuery.
I've looked at many examples and videos, and found some help, but not specifically for what I am trying to do.
------------------------------------------------------------
| Existing columns New Colums |
------------------------------------------------------------
| ID | ParentID | Type | StrategyID | SubstrategyID |
| 1 | 0 | Strategy | 1 | |
| 2 | 1 | Substrategy | 1 | 2 |
| 3 | 2 | Feature | 1 | 2 |
| 4 | 3 | Story | 1 | 2 |
| 5 | 3 | Story | 1 | 2 |
| 6 | 1 | Substrategy | 1 | 6 |
| 7 | 6 | Feature | 1 | 6 |
| 8 | 7 | Story | 1 | 6 |
| 9 | 7 | Story | 1 | 6 |
| 10 | 0 | Strategy | 10 | |
| 11 | 10 | Substrategy | 10 | 11 |
| 12 | 11 | Feature | 10 | 11 |
| 13 | 12 | Story | 10 | 11 |
| 14 | 12 | Story | 10 | 11 |
| 15 | 12 | Story | 10 | 11 |
| 16 | 10 | Substrategy | 10 | 16 |
| 17 | 16 | Feature | 10 | 16 |
| 18 | 17 | Story | 10 | 16 |
| 19 | 17 | Story | 10 | 16 |
------------------------------------------------------------
'''
Give this a try. Assumes source data in Table1 with 3 columns --"ID", "ParentID" and "Type"
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"ID", type text}, {"ParentID", type text}}),
ID_List = List.Buffer( ChangedType[ID] ),
ParentID_List = List.Buffer( ChangedType[ParentID] ),
Type_List = List.Buffer( ChangedType[Type] ),
Highest = (n as text, searchfor as text) as text =>
let
Spot = List.PositionOf( ID_List, n ),
ThisType = Type_List{Spot},
Parent_ID = ParentID_List{Spot}
in if Parent_ID = null or ThisType=searchfor then ID_List{Spot} else #Highest(Parent_ID,searchfor),
FinalTable = Table.AddColumn( ChangedType, "StrategyID", each Highest( [ID],"Strategy" ), type text),
FinalTable2 = Table.AddColumn( FinalTable, "SubstrategyID", each Highest( [ID],"Substrategy" ), type text),
#"Replaced Errors" = Table.ReplaceErrorValues(FinalTable2, {{"SubstrategyID", null}})
in #"Replaced Errors"
I think you want to use PATH and PATHITEM.
Assuming your table is called 'Table'
create a new column:
Path = PATH(Table[ID],Table[ParentID])
Then:
StrategyID = PATHITEM(Table[Path],1,1)
SubstrategyID = PATHITEM(Table[Path],2,1)

How to get the max value for elements having the same id under Laravel

Using Laravel/Eloquent, I would like to retrieve the max value for each week_id in the following table.
+---------+-----------+
| week_id | value |
+---------+-----------+
| 5 | |
| 6 | 1 |
| 6 | |
| 6 | |
| 7 | 3 |
| 7 | 4 |
| 7 | |
+---------+-----------+
With MySql I would do it like this:
SELECT week_id, max(value) as max_value FROM foo_table GROUP BY week_id
=>
+---------+-----------+
| week_id | max_value |
+---------+-----------+
| 5 | |
| 6 | 1 |
| 7 | 4 |
+---------+-----------+
How could I achieve the same under Laravel?
Try this:
DB::table('foo_table')
->select('week_id', DB:raw('max(value) as max_value'))
->groupBy('week_id')
->get();

laravel eloquent relation with 3 table

Hi I want query from 3 table
user:
+----+-----------+
| id | name |
+----+-----------+
| 1 | Denny |
| 2 | Agus |
| 3 | Dini |
| 4 | Angel |
+----+-----------+
History_Education
+----+-----------+-------------+
| id | userId | educationId |
+----+-----------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 2 | 2 |
+----+-----------+-------------+
Education
+----+-----------+----------+
| id | level | Name |
+----+-----------+----------+
| 1 | 1 | SD |
| 2 | 2 | SMP |
| 3 | 3 | SMA |
| 4 | 4 | S1 |
+----+-----------+----------+
How to query with laravel Eloquent to get the latest user education order by Level DESC
expected:
+----+-----------+----------------------+
| id | Name | Latest_Education |
+----+-----------+----------------------+
| 1 | Denny | SMP |
| 2 | Agus | SMP |
| 3 | Dini | - |
| 4 | Angel | - |
+----+-----------+----------------------+
In normal query:
select id,name ,(select E.name from education E inner join History_eductaion HE on E.id = HE.education_id where HE.userId =U.id limit 1 order by E.level DESC )latest_education from USER U
How to translate to laravel eloquent?
The quick answer is use the query builder - your query needed to be altered slightly to match the table names and columns as listed in your question:
$result = DB::table('user')
->select([
'id',
'name',
DB::raw("(select E.name from Education E inner join History_Education HE on E.id = HE.educationId where HE.userId = user.id order by E.level DESC limit 1) as latest_education")
])
->get();

Indexes hints in a Subquery

I have a SQL statement that has performance issues.
Adding the following index and a SQL hint to use the index improves the performance 10 fold but I do not understand why.
BUS_ID is part of the primary key(T1.REF is the other part fo the key) and clustered index on the T1 table.
The T1 table has about 100,000 rows. BUS_ID has only 6 different values. Similarly the T1.STATUS column can only have a limited number of
possibilities and the majority of these(99%) will be the same value.
If I run the query without the hint(/*+ INDEX ( T1 T1_IDX1) NO_UNNEST */) it takes 5 seconds and with the hint it takes .5 seconds.
I don't understand how the index helps the subquery as T1.STATUS isn't used in any of the 'where' or 'join' clauses in the subquery.
What am I missing?
SELECT
/*+ NO_UNNEST */
t1.bus_id,
t1.ref,
t2.cust,
t3.cust_name,
t2.po_number,
t1.status_old,
t1.status,
t1.an_status
FROM t1
LEFT JOIN t2
ON t1.bus_id = t2.bus_id
AND t1.ref = t2.ref
JOIN t3
ON t3.cust = t2.cust
AND t3.bus_id = t2.bus_id
WHERE (
status IN ('A', 'B', 'C') AND status_old IN ('X', 'Y'))
AND EXISTS
( SELECT /*+ INDEX ( T1 T1_IDX1) NO_UNNEST */
*
FROM t1
WHERE ( EXISTS ( SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq IN ( '0', '2' )
AND t1.bus_id = t6.bus_id)
OR (EXISTS
(SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq = '1'
AND (an_status = 'Y'
OR
an_status = 'X')
AND t1.bus_id = t6.bus_id))
AND t2.ref = t1.ref))
AND USER IN ('FRED')
AND ( t2.status != '45'
AND t2.status != '20')
AND NOT EXISTS ( SELECT
/*+ NO_UNNEST */
*
FROM t4
WHERE EXISTS
(
SELECT
/*+ NO_UNNEST */
*
FROM t5
WHERE pd IN ( '1',
'0' )
AND appl = 'RYP'
AND appl_id IN ( 'RL100')
AND t4.id = t5.id)
AND t2.ref = p.ref
AND t2.bus_id = p.bus_id);
Edited to include Explain Plan and index.
Without Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 20 | 1 | 211 | 15534188 | 19 |
view | | 20 | 1 | 211 | 15534188 | 19 |
count | | | | | | |
view | | 20 | 1 | 198 | 15534188 | 19 |
sort | ORDER BY | 20 | 1 | 114 | 15534188 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 2 | 1 | 15 | 7433 | 2 |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
With Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 21 | 1 | 211 | 15549142 | 19 |
view | | 21 | 1 | 211 | 15549142 | 19 |
count | | | | | | |
view | | 21 | 1 | 198 | 15549142 | 19 |
sort | ORDER BY | 21 | 1 | 114 | 15549142 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 3 | 1 | 15 | 22387 | 2 |
index-t1_idx#1 | FULL SCAN | 2 |97k| | 14643 | |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
Table Index
CREATE INDEX T1_IDX#1 ON T1 (BUS_ID, STATUS)

MySQL equivalent of ORACLES rank()

Oracle has 2 functions - rank() and dense_rank() - which i've found very useful for some applications. I am doing something in mysql now and was wondering if they have something equivalent to those?
Nothing directly equivalent, but you can fake it with some (not terribly efficient) self-joins. Some sample code from a collection of MySQL query howtos:
SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank
FROM votes v1
JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;
+-------+-------+------+
| name | votes | Rank |
+-------+-------+------+
| Green | 50 | 1 |
| Black | 40 | 2 |
| White | 20 | 3 |
| Brown | 20 | 3 |
| Jones | 15 | 5 |
| Smith | 10 | 6 |
+-------+-------+------+
how about this "dense_rank implement" in MySQL
CREATE TABLE `person` (
`id` int(11) DEFAULT NULL,
`first_name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`gender` char(1) DEFAULT NULL);
INSERT INTO `person` VALUES
(1,'Bob',25,'M'),
(2,'Jane',20,'F'),
(3,'Jack',30,'M'),
(4,'Bill',32,'M'),
(5,'Nick',22,'M'),
(6,'Kathy',18,'F'),
(7,'Steve',36,'M'),
(8,'Anne',25,'F'),
(9,'Mike',25,'M');
the data before dense_rank() like this
mysql> select * from person;
+------+------------+------+--------+
| id | first_name | age | gender |
+------+------------+------+--------+
| 1 | Bob | 25 | M |
| 2 | Jane | 20 | F |
| 3 | Jack | 30 | M |
| 4 | Bill | 32 | M |
| 5 | Nick | 22 | M |
| 6 | Kathy | 18 | F |
| 7 | Steve | 36 | M |
| 8 | Anne | 25 | F |
| 9 | Mike | 25 | M |
+------+------------+------+--------+
9 rows in set (0.00 sec)
the data after dense_rank() like this,including "partition by" function
+------------+--------+------+------+
| first_name | gender | age | rank |
+------------+--------+------+------+
| Anne | F | 25 | 1 |
| Jane | F | 20 | 2 |
| Kathy | F | 18 | 3 |
| Steve | M | 36 | 1 |
| Bill | M | 32 | 2 |
| Jack | M | 30 | 3 |
| Mike | M | 25 | 4 |
| Bob | M | 25 | 4 |
| Nick | M | 22 | 6 |
+------------+--------+------+------+
9 rows in set (0.00 sec)
the query statement is
select first_name,t1.gender,age,FIND_IN_SET(age,t1.age_set) as rank from person t2,
(select gender,group_concat(age order by age desc) as age_set from person group by gender) t1
where t1.gender=t2.gender
order by t1.gender,rank

Resources