Hive SQL error: Failed rule ‘identifier’ in the Select target - hadoop

I wrote a hive sql query here:
SELECT
dt,
COUNT(CASE WHEN search_word like ‘%A%’ THEN id END) AS a,
COUNT(CASE WHEN search_word like ‘%B%’ THEN id END) AS b,
FROM database
GROUP BY dt
However, Hive returns an error :
Error while compiling statement : Failed
ParseExceptionline 3:7 Failed to recognise predicate ‘AS’. Failed rule: ‘identifier’ in the select target.
I searched this error and my assumption is it might be come from AS reserve word. But I still do not understand how to fix it.

Related

Elasticsearch SQL query in Canvas : it doesn't work like SQL?

I begin to work with the Canvas section in Kibana - and to retrieve data, it uses Elasticsearch SQL.
What I try to do is to retrieve the count of several values ; and I need to group certain values together - the ones that start with the same letters.
My SQL query looks like this :
SELECT
(SELECT COUNT(*) FROM logs WHERE status LIKE 'missingValue%'),
(SELECT COUNT(*) FROM logs WHERE status LIKE 'errorValue%'),
(SELECT COUNT(*) FROM logs WHERE status='exactErrorValue'),
(SELECT COUNT(*) FROM logs WHERE status='anotherExactErrorValue')
When I test this query, using SQL and a little database, it works
Now, I want to make this work inside an element of my canvas. I choose a horizontal bar chart to represent it.
This is my elasticsearch SQL query :
SELECT
(SELECT COUNT(*) FROM "monitoring-func-*"
WHERE status LIKE 'missingValue%'),
(SELECT COUNT(*) FROM "monitoring-func-*"
WHERE status LIKE 'errorValue%'),
(SELECT COUNT(*) FROM "monitoring-func-*"
WHERE status='exactErrorValue'),
(SELECT COUNT(*) FROM "monitoring-func-*"
WHERE status='anotherExactErrorValue')
And I get this error :
{
"error": {
"message": "[essql] > Unexpected error from Elasticsearch: [unresolved_exception] Invalid call to nullable on an unresolved object ScalarSubquery[With[{}]
\\_Project[[?COUNT(?*)]]
\\_Filter[(status) REGEX (LikePattern)#5139]
\\_UnresolvedRelation[[][index=monitoring-func-*],null,Unknown index [monitoring-func-*]],5142] AS ?"
}
}
Seeing "unknown Index", I first thought that the wildcard was the problem.
But it's not, it's perfectly fine in my others Elasticsearch queries.
Is there something about the Subqueries, the multiple SELECT, that Elasticsearch SQL doesn't handle well ?
I didn't find any ressource or topics on this, but maybe I've searched the wrong way.
Depending on your Elasticsearch version, essql either doesn't support subqueries or it is very limited, here is the documentation.

Hive gives error when trying to find record with min subquery

In hive,
I am trying to select the entry with the minimum timestamp, however it's throwing the following error, not sure what is the reason.
select * from sales where partition_batch_ts = (select max(partition_batch_ts) from sales);
Error
Error while compiling statement: FAILED: ParseException line 1:91 cannot recognize input near 'select' 'max' '(' in expression specification
I think you need to use proper table alias. Also, IN must be used instead of =
SELECT s1.*
FROM sales s1
WHERE s1.partition_batch_ts IN
(SELECT MAX(partition_batch_ts)
FROM sales s2);
From Hive manual, SUBQUERIES :
As of Hive 0.13 some types of subqueries are supported in the WHERE
clause.

Error in semantic exception return 0 rows in hive

I got the below error in hive while I am executing the hql like this.
Select a.col,b.col from tab a left join tab b on a.id =b.id and a.code in
(Select c.code from tab c where c.id=123 and c.dec='123a');
Note: c.id is pk and c.dec is unique id
Error: semantic in exception 0 children returned.
Can any one help to resolve the above issue.
I think this is happening because you are using a MySQL reserved keyword "desc" as your column name. So you can change it and try again.
You can refer to this link for the list of all the reserved keywords for MySQL :https://dev.mysql.com/doc/refman/5.5/en/keywords.html

Unable to get a expected output using hive aggregate function

I have a created a table (movies) in Hive as below(id,name,year,rating,views)
1,The Nightmare Before Christmas,1993,3.9,4568
2,The Mummy,1932,3.5,4388
3,Orphans of the Storm,1921,3.2,9062
4,The Object of Beauty,1991,2.8,6150
5,Night Tide,1963,2.8,5126
6,One Magic Christmas,1985,3.8,5333
7,Muriel's Wedding,1994,3.5,6323
8,Mother's Boys,1994,3.4,5733
9,Nosferatu: Original Version,1929,3.5,5651
10,Nick of Time,1995,3.4,5333
I want to write a hive query to get the name of the movie with highest views.
select name,max(views) from movies;
but it gives me an error
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP BY key name
but doing a group by with name gives me the complete list (which is expected).
What changes should I make to my query?
It is very possible that there is a simpler way to do this.
select name
from(
select max(views) as views
, name
, row_number() over (order by max(views) desc) as row_num
from movies
group by name
) m
where row_num = 1
After little bit of digging, I found out that the answer is not so straightforward as we do in SQL. Below query gives the expected result.
select a.name,a.views from movies a left semi join(select max(views) views from movies)b on (a.views=b.views);

Hadoop - error message when declaring variable within query

I have tried the following query within HUE's Beeswax Query Editor:
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > ${HIVECONF:MAXDATE};
This query will not run and produces the following error message:
FAILED: ParseException line 1:4 missing KW_ROLE at 'MAXDATE' near 'MAXDATE' line 1:11 missing EOF at '=' near 'MAXDATE'
Any advice on what the problem is? I don't understand what the KW_ROLE message means.
I come from a SQL Server background and would just run the following within SQL Server, but am trying to find a functional Hadoop/Hive equivalent.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > (SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE)
Query which you have tried contains syntax issue. HiveConf should surrounded by single quotes.
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > '${HIVECONF:MAXDATE}';
As far as I know, hive support the following syntax too.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE a
JOIN
(SELECT MAX(DATA_DAY) AS max_date FROM DB1.DESTINATION_TABLE) b
WHERE YEAR(a.DATA_DAY) >= '2015'
AND a.DATA_DAY > b.max_date;
But it's not a good implementation if there are bunches of data on DB1.DESTINATION_TABLE.
In such case each query would task lots of sub-querys in SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE.
If possible, you could store your SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE result in another table, maybe Max_table.
Then the sql would be like this:
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
JOIN Max_table
WHERE YEAR(DB2.SOURCE_TABLE.DATA_DAY) >= '2015' and
DB2.SOURCE_TABLE.DATA_DAY > (Max_table.DATA_DAY)

Resources