Unable to get the expected output using a Hive aggregate function - hadoop

I have created a table (movies) in Hive with the columns (id, name, year, rating, views) and the following data:
1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333
I want to write a Hive query to get the name of the movie with the highest views.
select name,max(views) from movies;
but it gives me an error
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key name
Doing a group by on name gives me the complete list (which is expected behaviour, but not what I want).
What changes should I make to my query?

It is very possible that there is a simpler way to do this.
select name
from (
    select max(views) as views
         , name
         , row_number() over (order by max(views) desc) as row_num
    from movies
    group by name
) m
where row_num = 1;

After a little bit of digging, I found out that the answer is not as straightforward as it would be in standard SQL. The query below gives the expected result.
select a.name, a.views
from movies a
left semi join (select max(views) as views from movies) b
    on (a.views = b.views);
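For a single top row, a simpler form should also work in Hive (a sketch; note that if several movies tie for the highest view count this returns only one of them, whereas the semi join above returns all of them):
-- global sort on views, keep only the top row
select name, views
from movies
order by views desc
limit 1;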

Related

How to retrieve data from 3 tables using a subquery in Oracle SQL

I want to retrieve users' user_name and their responsibility_key where their end_date is null, and I want to convert the null to (sysdate+1) using nvl, but I am only able to retrieve the responsibility_key, not the name. Please help.
The error in the image says "column ambiguously defined". Take a close look: your last END_DATE could refer to either the u alias or the table from the subquery. Change it to match the rest of your subquery (FND_USER_RESP_GROUPS_DIRECT.END_DATE).
EDIT
Your query is
select u.USER_NAME, d.responsibility_key from FND_USER u,FND_RESPONSIBILITY_VL d
where responsibility_id in(
select responsibility_id from
FND_USER_RESP_GROUPS_DIRECT WHERE END_USER_RESP_GROUPS_DIRECT.END_DATE=nvl(END_DATE,sysdate+1)) and
u.END_DATE=nvl(END_DATE,SYSDATE + 1)
;
1. The query isn't formatted, which makes it hard to read.
2. Not all columns are qualified with a table name (or alias), as mentioned in the comments.
3. The query uses an implicit join.
4. The query is impossible to understand without seeing the table definitions (desc [table_name]).
For points 1 and 2, a properly formatted query will look something like
select u.user_name, d.responsibility_key
from
    fnd_user u,
    fnd_responsibility_vl d
where
    d.responsibility_id in (
        select urgd.responsibility_id
        from
            fnd_user_resp_groups_direct urgd
        where
            urgd.end_date = nvl(u.end_date, sysdate+1)
    ) and
    u.end_date = nvl(urgd.end_date, sysdate + 1)
;
This makes it easier to read. In addition, you can see that without the table definitions (see point 4) I had to guess which table each end_date in your query belongs to. If I had to guess, so does Oracle, which means you have an ambiguity problem. To fix it, look at every occurrence of end_date in your original query: wherever it is not prefixed with anything, prefix it with the appropriate alias (after you have aliased all your tables).
For point 3, you can write your query more clearly with an explicit join and by using aliases for all columns. As for the explicit join, I have no idea what your tables look like, but one possibility is something like:
select u.user_name, d.responsibility_key
from fnd_user u
join fnd_responsibility_vl d
    on u.id = d.user_id
where
    d.responsibility_id in (
        select responsibility_id
        from fnd_user_resp_groups_direct urgd
        where
            urgd.end_date = nvl(u.end_date, sysdate+1)
    ) and
    u.end_date = nvl(urgd.end_date, sysdate+1)
;
If you follow these points you will get to the root of the error.

Query create table as with inner join select and group by (ORACLE)

This is my query. When I run it, the error is: ORA-00907: missing right parenthesis. Can anybody solve my problem?
Create table r_tcash_loci_act_tmp AS (
SELECT DISTINCT
R_TCASH_LOCI_ACT.MSISDN AS MSISDN_LOCI,
R_TCASH_ACT_MSISDN.MSISDN AS MSISDN_ACT,
R_TCASH_LOCI_ACT.AREA,
R_TCASH_LOCI_ACT.REGIONAL,
R_TCASH_LOCI_ACT.BRANCH,
R_TCASH_LOCI_ACT.SUB_BRANCH,
R_TCASH_LOCI_ACT.CLUSTERX,
R_TCASH_LOCI_ACT.UPDATED,
R_TCASH_ACT_MSISDN.DAILY,
R_TCASH_ACT_MSISDN.TOTAL_TRX,
R_TCASH_ACT_MSISDN.TOTAL_VOL
FROM R_TCASH_ACT_MSISDN
INNER JOIN R_TCASH_LOCI_ACT
ON R_TCASH_LOCI_ACT.MSISDN = R_TCASH_ACT_MSISDN.MSISDN
GROUP BY R_TCASH_LOCI_ACT.MSISDN AS MSISDN_LOCI
HAVING COUNT(R_TCASH_LOCI_ACT.MSISDN ) > 10);
It's a simple syntax error. You included the column alias in the GROUP BY clause (probably a cut'n'paste error).
GROUP BY R_TCASH_LOCI_ACT.MSISDN AS MSISDN_LOCI
So just remove the AS MSISDN_LOCI.
Although, given that you have a DISTINCT clause and no aggregated columns in the select list, it's a mystery why you have the GROUP BY at all. You should remove the whole line (note that the HAVING clause depends on it, so it would need to be rewritten or removed as well).
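For reference, a version that parses might look something like the sketch below. It drops the alias from the GROUP BY and, since Oracle also requires every non-aggregated select column to appear in the GROUP BY, lists them all (which makes the DISTINCT redundant). Note that the HAVING then counts rows per full group rather than per MSISDN alone, so whether this matches the intended result is a guess based on the column names in the question.
CREATE TABLE r_tcash_loci_act_tmp AS (
SELECT
    L.MSISDN AS MSISDN_LOCI,
    A.MSISDN AS MSISDN_ACT,
    L.AREA, L.REGIONAL, L.BRANCH, L.SUB_BRANCH, L.CLUSTERX, L.UPDATED,
    A.DAILY, A.TOTAL_TRX, A.TOTAL_VOL
FROM R_TCASH_ACT_MSISDN A
INNER JOIN R_TCASH_LOCI_ACT L
    ON L.MSISDN = A.MSISDN
GROUP BY
    L.MSISDN, A.MSISDN,
    L.AREA, L.REGIONAL, L.BRANCH, L.SUB_BRANCH, L.CLUSTERX, L.UPDATED,
    A.DAILY, A.TOTAL_TRX, A.TOTAL_VOL
HAVING COUNT(L.MSISDN) > 10);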

Hive LATERAL VIEW and WHERE Clause using Sub query

I'm looking for a way to optimize my query.
We have a table of events, called lea, with a column app_properties that holds tags stored as a comma-separated string.
I would like to select all the events that match the result of a query that selects the desired tags.
My first try:
SELECT uuid, app_properties, tag
FROM events
LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
WHERE tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')
But Hive will not allow this...
FAILED: SemanticException [Error 10249]: Line 4:6 Unsupported SubQuery Expression 'tag': Correlating expression cannot contain unqualified column references.
I gave it another try, replacing WHERE tag IN with WHERE tag_table.tag IN, but no luck...
FAILED: SemanticException Line 4:6 Invalid table alias tag_table' in definition of SubQuery sq_1 [tag_table.tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')] used as sq_1 at Line 4:20.
In the end, the query below gives the desired result, but I have a feeling that this is not the most optimized way of solving this use case. Has anyone run into the same use case, where you need to select from a LATERAL VIEW using a subquery?
SELECT to_date(substring(events.time, 0, 10)) as date, t2.code, t2.indicator, count(1) as total
FROM events
LEFT JOIN (
    SELECT distinct t.uuid, im.code, im.indicator
    FROM mapping im
    RIGHT JOIN (
        SELECT tag, uuid
        FROM events
        LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
    ) t
        ON im.source_value = t.tag AND im.indicator = 'Bandwidth Usage'
    WHERE im.source_value IS NOT NULL
) t2 ON (events.uuid = t2.uuid)
WHERE t2.code IS NOT NULL
GROUP BY to_date(substring(events.time, 0, 10)), t2.code, t2.indicator;
The Hive subquery in the WHERE clause can be used with IN, NOT IN, EXISTS, or NOT EXISTS as follows. If the alias (see the following example for the employee table) is not specified before columns (name) in the WHERE condition, Hive will report the error Correlating expression cannot contain unqualified column references. This is a limitation of the Hive subquery.
From Apache Hive Essentials.
I guess this problem is also caused by that subquery limitation: events should have an alias.
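A sketch of what that aliasing would look like for the first attempt above (whether the IN subquery is then accepted still depends on the Hive version, so treat this as a starting point rather than a guaranteed fix):
-- alias the base table and qualify the lateral-view column in the WHERE clause
SELECT e.uuid, e.app_properties, tag_table.tag
FROM events e
LATERAL VIEW explode(split(e.app_properties, '(, |,)')) tag_table AS tag
WHERE tag_table.tag IN (
    SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage'
);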

Hive join without common field

I have the following tables:
Table1:
user_name Url
Rahul www.cric.info.com
ranbir www.rogby.com
sahil www.google.com
banit www.yahoo.com
Table2:
Keyword category
cric sports
footbal sports
google search
I want to search Table1 by matching the keywords in Table2. I can do this with a case statement, and the query works, but it is not the right approach, because I would have to add another case branch every time a new search keyword is added.
select user_name,
       case when url like '%cric%' then 'sports'
            else 'undefined'
       end as category
from table1;
Thanks, I found a solution for this approach. First we need to do the join, and after that we need to filter the records.
select user_name, url, keyword, category
from (select table1.user_name, table1.url, table2.keyword, table2.category
      from table1 left outer join table2) a
where a.url like concat('%', a.keyword, '%');
Not sure about more current versions, but I've run into a similar problem: the primary issue is that Hive only supports equi-join statements, and when you apply logic to either side of the join it has difficulty translating that into a MapReduce job.
The alternative method, if you have a reliably structured field, is to create a matching key from the larger field. For example, if you know that your keyword should appear in the second position of a dot-delimited URI, you could do something like:
select
    t1.Uri
    , t1.matchKey
from (
    -- derive the match key in a subquery so it can be used in the equi-join
    select Uri, split(Uri, "\\.")[1] as matchKey
    from Table1
) t1
join Table2
    on Table2.keyword = t1.matchKey
;
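With the sample data above, this derives cric from www.cric.info.com and google from www.google.com, which then join to the sports and search keyword rows in Table2; URLs whose second segment matches no keyword (www.rogby.com, www.yahoo.com) simply drop out of the inner join.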

Vertica subquery with limit

I ran into a problem writing a subquery with limit 1 to get the top record.
Here is my problem example.
Table Master(id,set)
Table Detail(id,set,code)
I am trying to get the latest code for each set in the Master table.
Following is the query I tried, but I got an error saying that limit 1 is not supported in correlated subqueries and that the subquery should contain a GROUP BY clause.
select id,set,(select code from detail where set=master.set order by id desc limit 1) from master;
The result would be each id and set from master together with the latest code for that set.
Please help me if this is the wrong way; I am new to the Vertica database.
Thank you.
I'm not sure about that particular error, but you can use the rank() analytic function to produce results like this:
select id, set, code from (
select M.id, D.set, D.code, rank() over (partition by D.set order by D.id desc) as rank
from detail as D
right outer join master as M
on D.set = M.set) as ranks
where rank = 1
order by id;
The inner subquery uses the rank() function to assign a rank to each row within a set. The outer query just picks out rows with rank 1.
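If ties are possible within a set (two detail rows sharing the same id), using row_number() instead of rank() guarantees exactly one row per set. Recent Vertica versions also document a Top-K form of LIMIT that expresses this more directly; the following is only a sketch to check against your version's documentation:
-- Top-K per partition (verify LIMIT ... OVER support in your Vertica version)
select M.id, M.set, D.code
from master M
join detail D on D.set = M.set
limit 1 over (partition by M.set order by D.id desc);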
