Hive gives error when trying to find record with min subquery

In Hive, I am trying to select the entry with the minimum timestamp, but the query below throws the following error and I am not sure why.
select * from sales where partition_batch_ts = (select max(partition_batch_ts) from sales);
Error
Error while compiling statement: FAILED: ParseException line 1:91 cannot recognize input near 'select' 'max' '(' in expression specification

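I think you need to use proper table aliases. Also, IN must be used instead of =, since the Hive versions this applies to accept a WHERE-clause subquery only with operators such as IN and not as a scalar value on the right-hand side of =.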
SELECT s1.*
FROM sales s1
WHERE s1.partition_batch_ts IN
(SELECT MAX(partition_batch_ts)
FROM sales s2);
From the Hive manual, section on subqueries:
As of Hive 0.13 some types of subqueries are supported in the WHERE clause.
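As a rough check of the IN-subquery shape shown in the answer, here is a minimal sketch that runs the same statement through Python's built-in sqlite3 module. SQLite is only a stand-in here (it is not Hive, and unlike older Hive versions it would also accept the = form), and the sample rows and the amount column are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (partition_batch_ts TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-02", 20), ("2020-01-02", 30)],
)

# Same shape as the answer: aliased tables and IN instead of =.
latest_rows = conn.execute(
    """
    SELECT s1.*
    FROM sales s1
    WHERE s1.partition_batch_ts IN
          (SELECT MAX(s2.partition_batch_ts) FROM sales s2)
    ORDER BY s1.amount
    """
).fetchall()

print(latest_rows)
```

Every row that shares the latest partition_batch_ts is returned, so the result can contain more than one row.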

Related

Correct syntax for table name under Inner Join?

I am a complete beginner to BigQuery, and I am trying to create an inner join between two table names, where the column 'title' is the joining column. I believe my syntax is correct, but I do not know what I am doing wrong when I input the ON clause. Here is my syntax:
SELECT
*
FROM
book-to-film-adaptations.movies.movies_metadata_relevant
JOIN
book-to-film-adaptations.goodreads_books.goodreads_books_relevant_data
ON
movies_metadata_relevant.title = goodreads_books_relevant_data.title
I get this error message: Unrecognized name: movies_metadata_relevant at [8:3]
I have tried it with the full names (book-to-film-adaptations.movies.movies_metadata_relevant), but then I get an error message: "Syntax error: Unexpected keyword TO"
Any suggestions?
Thanks
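You need to alias the tables and use those aliases, as in the example below. In this case you will also need to enclose the full table names in backticks, because the project name contains hyphens: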
...
...
FROM
`book-to-film-adaptations.movies.movies_metadata_relevant` t1
JOIN
`book-to-film-adaptations.goodreads_books.goodreads_books_relevant_data` t2
ON
t1.title = t2.title
Or, if the join columns have the same name (as in your case), you can use the version below:
...
...
FROM
`book-to-film-adaptations.movies.movies_metadata_relevant` t1
JOIN
`book-to-film-adaptations.goodreads_books.goodreads_books_relevant_data` t2
USING (title)
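To see the two join forms side by side, here is a small sketch using Python's sqlite3 module as a stand-in for BigQuery. The table names, columns, and rows are hypothetical; the hyphenated names are there only to show that such identifiers have to be quoted, and SQLite happens to accept backticks for that as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables; hyphenated names must be quoted to be parsed as identifiers.
conn.execute("CREATE TABLE `movies-metadata` (title TEXT, release_year INTEGER)")
conn.execute("CREATE TABLE `goodreads-books` (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO `movies-metadata` VALUES (?, ?)",
    [("Dune", 1984), ("Heat", 1995)],
)
conn.executemany(
    "INSERT INTO `goodreads-books` VALUES (?, ?)",
    [("Dune", "Frank Herbert"), ("Emma", "Jane Austen")],
)

# Form 1: aliases plus an explicit ON condition.
on_rows = conn.execute(
    """
    SELECT t1.title, t1.release_year, t2.author
    FROM `movies-metadata` t1
    JOIN `goodreads-books` t2
    ON t1.title = t2.title
    """
).fetchall()

# Form 2: USING, possible because both tables name the join column "title".
using_rows = conn.execute(
    """
    SELECT title, t1.release_year, t2.author
    FROM `movies-metadata` t1
    JOIN `goodreads-books` t2
    USING (title)
    """
).fetchall()

print(on_rows, using_rows)
```

Both queries return the same single matching row in this sketch; the USING form simply avoids repeating the column name on each side.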

Error: ORA-00905: missing keyword when joining table to a select query

I am trying to join a table to a select query and I get the error ORA-00905: missing keyword. This is the Oracle SQL I have written.
When I run the two parts separately, the data pulls fine. It's when I try to join them that I get the error. I tried adding the GROUP BY and ORDER BY based on some help I found on the Internet, but I still get the same error.
SELECT
AS_MASTER_NF.CONTRACT_NO
, AS_HISTORY_NF.AH_CONTRACT_NBR
, AS_MASTER_NF.ID
, AS_MASTER_NF.A_INVENT_DATE
, AS_MASTER_NF.A_DISP_DATE
FROM INFL_IDS.AS_MASTER_NF
LEFT JOIN (SELECT
NVL(SUBSTR(ALTERNATE_ID, 0, INSTR(ALTERNATE_ID, '*')-1), ALTERNATE_ID) AS ASSET
, AS_HISTORY_NF.AH_CONTRACT_NBR
FROM INFL_IDS.AS_HISTORY_NF
WHERE LENGTH (AH_CONTRACT_NBR)> 3) AS ASHISTORY
ON INFL_IDS.AS_MASTER_NF.ID = INFL_IDS.ASHISTORY.ASSET
WHERE AS_MASTER_NF.A_INVENT_DATE IS NOT NULL
GROUP BY
AS_MASTER_NF.CONTRACT_NO
, AS_HISTORY_NF.AH_CONTRACT_NBR
, AS_MASTER_NF.ID
, AS_MASTER_NF.A_INVENT_DATE
, AS_MASTER_NF.A_DISP_DATE
ORDER BY
AS_MASTER_NF.CONTRACT_NO
, AS_HISTORY_NF.AH_CONTRACT_NBR
, AS_MASTER_NF.ID
, AS_MASTER_NF.A_INVENT_DATE
, AS_MASTER_NF.A_DISP_DATE
FETCH FIRST 20 ROWS ONLY
Change:
ON INFL_IDS.AS_MASTER_NF.ID = INFL_IDS.ASHISTORY.ASSET
to:
ON INFL_IDS.AS_MASTER_NF.ID = ASHISTORY.ASSET
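This is because ASHISTORY is not a table or view in the INFL_IDS schema; it is just the alias of a subquery defined in this statement. Note also that Oracle does not accept the AS keyword before a table or subquery alias, so ) AS ASHISTORY should be written as ) ASHISTORY, and the outer query can reach the subquery's columns only through that alias (ASHISTORY.AH_CONTRACT_NBR rather than AS_HISTORY_NF.AH_CONTRACT_NBR).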

Hive Query FAILED: ParseException line cannot recognize input near '(' 'WITH' 'DATA_SET' in select clause

I get a failure when compiling a Hive view that uses a WITH clause in its select statement. Below is the view I am trying to create, followed by the error I encounter.
create view test_view as(
with data_set as
(select * from test_data )
select * from data_set
) ;
Error - Error while compiling statement: FAILED: ParseException line 1:24 cannot recognize input near '(' 'with' 'data_set' in select clause
Please help.
The issue is due to the bracket :)
Once I removed the opening bracket after create view view_name as (and its matching closing bracket), it started working.
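For reference, the bracket-free shape of the view can be sketched with Python's sqlite3 module. SQLite is only a stand-in for Hive here, and the columns and rows of test_data are made up; the point is just the statement layout, with WITH following AS directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical contents for test_data.
conn.execute("CREATE TABLE test_data (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO test_data VALUES (?, ?)", [(1, "a"), (2, "b")])

# The WITH clause follows AS directly, with no wrapping parentheses.
conn.execute(
    """
    CREATE VIEW test_view AS
    WITH data_set AS (SELECT * FROM test_data)
    SELECT * FROM data_set
    """
)

view_rows = conn.execute("SELECT * FROM test_view ORDER BY id").fetchall()
print(view_rows)
```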

Unable to get a expected output using hive aggregate function

I have created a table (movies) in Hive as below (id, name, year, rating, views):
1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333
I want to write a Hive query to get the name of the movie with the highest number of views.
select name,max(views) from movies;
but it gives me an error
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key name
but doing a group by with name gives me the complete list (which is expected).
What changes should I make to my query?
It is very possible that there is a simpler way to do this.
select name
from(
select max(views) as views
, name
, row_number() over (order by max(views) desc) as row_num
from movies
group by name
) m
where row_num = 1
After a little bit of digging, I found out that the answer is not as straightforward in Hive as it is in standard SQL. The query below gives the expected result.
select a.name,a.views from movies a left semi join(select max(views) views from movies)b on (a.views=b.views);
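The idea behind the semi join (keep only the rows whose views equal the overall maximum) can be tried out on the sample data from the question. The sketch below uses Python's sqlite3 module as a stand-in for Hive; since LEFT SEMI JOIN is Hive syntax that SQLite does not have, a plain inner join against the one-row subquery is swapped in, which gives the same rows here.

```python
import sqlite3

# Sample rows copied from the question: (id, name, year, rating, views).
movies = [
    (1, "The Nightmare Before Christmas", 1993, 3.9, 4568),
    (2, "The Mummy", 1932, 3.5, 4388),
    (3, "Orphans of the Storm", 1921, 3.2, 9062),
    (4, "The Object of Beauty", 1991, 2.8, 6150),
    (5, "Night Tide", 1963, 2.8, 5126),
    (6, "One Magic Christmas", 1985, 3.8, 5333),
    (7, "Muriel's Wedding", 1994, 3.5, 6323),
    (8, "Mother's Boys", 1994, 3.4, 5733),
    (9, "Nosferatu: Original Version", 1929, 3.5, 5651),
    (10, "Nick of Time", 1995, 3.4, 5333),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER, name TEXT, year INTEGER, rating REAL, views INTEGER)"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?)", movies)

# Inner join against a one-row subquery holding the maximum view count.
top_movies = conn.execute(
    """
    SELECT a.name, a.views
    FROM movies a
    JOIN (SELECT MAX(views) AS views FROM movies) b
    ON a.views = b.views
    """
).fetchall()

print(top_movies)  # [('Orphans of the Storm', 9062)]
```

If several movies tied for the maximum, this form would return all of them, whereas the row_number() variant above would return only one.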

Hive LATERAL VIEW and WHERE Clause using Sub query

I'm looking for a way to optimize my query.
We have a table with events called lea, with a column app_properties that holds tags stored as a comma-separated string.
I would like to select all the events that match the result of a query that selects the desired tags.
My first try:
SELECT uuid, app_properties, tag
FROM events
LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
WHERE tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')
But Hive will not allow this...
FAILED: SemanticException [Error 10249]: Line 4:6 Unsupported SubQuery Expression 'tag': Correlating expression cannot contain unqualified column references.
I gave it another try by replacing WHERE tag IN with WHERE tag_table.tag IN, but no luck...
FAILED: SemanticException Line 4:6 Invalid table alias tag_table' in definition of SubQuery sq_1 [tag_table.tag IN (SELECT source_value FROM mapping WHERE indicator = 'Bandwidth Usage')] used as sq_1 at Line 4:20.
In the end, the query below gives the desired result, but I have a feeling that this is not the most efficient way of solving this use case. Has anyone run into the same use case, where you need to select from a LATERAL VIEW using a subquery?
SELECT to_date(substring(events.time, 0, 10)) as date, t2.code, t2.indicator, count(1) as total
FROM events
LEFT JOIN (
SELECT distinct t.uuid, im.code, im.indicator
FROM mapping im
RIGHT JOIN (
SELECT tag, uuid
FROM events
LATERAL VIEW explode(split(app_properties, '(, |,)')) tag_table AS tag
) t
ON im.source_value = t.tag AND im.indicator = 'Bandwidth Usage'
WHERE im.source_value IS NOT NULL
) t2 ON (events.uuid = t2.uuid)
WHERE t2.code IS NOT NULL
GROUP BY to_date(substring(events.time, 0, 10)), t2.code, t2.indicator;
The Hive subquery in the WHERE clause can be used with IN, NOT IN, EXISTS, or NOT EXISTS as follows. If the alias (see the following example for the employee table) is not specified before columns (name) in the WHERE condition, Hive will report the error Correlating expression cannot contain unqualified column references. This is a limitation of the Hive subquery.
From Apache Hive Essentials.
I guess this problem is also caused by that subquery limitation: events should have an alias, and the columns should be qualified with it.
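To make the intended result of the first attempt concrete (one row per event and tag, kept only when the tag appears in the mapping for the chosen indicator), here is a plain Python sketch of the same logic. It is not Hive code, and the rows in events and mapping are hypothetical.

```python
import re

# Hypothetical rows standing in for the events and mapping tables.
events = [
    {"uuid": "e1", "app_properties": "wifi, video,voip"},
    {"uuid": "e2", "app_properties": "email"},
]
mapping = [
    {"source_value": "video", "indicator": "Bandwidth Usage"},
    {"source_value": "voip", "indicator": "Bandwidth Usage"},
    {"source_value": "email", "indicator": "Messaging"},
]

# Counterpart of the subquery: the set of wanted tags.
wanted = {m["source_value"] for m in mapping if m["indicator"] == "Bandwidth Usage"}

# Counterpart of LATERAL VIEW explode(split(...)): one (uuid, tag) pair per tag.
# A non-capturing group is used because re.split would otherwise
# also return the separators matched by a capturing group.
exploded = [
    (event["uuid"], tag)
    for event in events
    for tag in re.split(r"(?:, |,)", event["app_properties"])
]

# Counterpart of WHERE tag IN (...).
matches = [(uuid, tag) for uuid, tag in exploded if tag in wanted]

print(matches)  # [('e1', 'video'), ('e1', 'voip')]
```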

Resources