hql query gives an error


I am trying to execute this query in HQL:
SELECT
t.retweeted_screen_name,
sum(retweets) AS total_retweets,
count(*) AS tweet_count
FROM (SELECT
retweeted_status.user.screen_name as retweeted_screen_name,
retweeted_status.text,
max(retweet_count) as retweets
FROM tweets
GROUP BY retweeted_status.user.screen_name,
retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;
But I'm getting this error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Can anyone help me fix this?

I think you need to write sum(t.retweets) AS total_retweets instead of sum(retweets) AS total_retweets.
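Applied to the query from the question, that suggestion would look like the sketch below. Only the outer reference to retweets is qualified with the subquery alias t; everything else is unchanged. Keep in mind that return code 2 from MapRedTask is a generic job failure, so if the error persists, the actual cause should be in the MapReduce task logs.

```sql
-- Outer aggregate now refers to the subquery column as t.retweets
SELECT
  t.retweeted_screen_name,
  sum(t.retweets) AS total_retweets,
  count(*) AS tweet_count
FROM (SELECT
        retweeted_status.user.screen_name AS retweeted_screen_name,
        retweeted_status.text,
        max(retweet_count) AS retweets
      FROM tweets
      GROUP BY retweeted_status.user.screen_name,
               retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;
```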

Related

Hive SQL error: Failed rule 'identifier' in the Select target

I wrote this Hive SQL query:
SELECT
dt,
COUNT(CASE WHEN search_word like '%A%' THEN id END) AS a,
COUNT(CASE WHEN search_word like '%B%' THEN id END) AS b,
FROM database
GROUP BY dt
However, Hive returns an error:
Error while compiling statement: Failed ParseException line 3:7 Failed to recognise predicate 'AS'. Failed rule: 'identifier' in the select target.
I searched for this error, and my assumption is that it might come from AS being a reserved word, but I still do not understand how to fix it.
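Judging only from the query as posted, a likely cause is the comma after the last select expression (AS b,), which sits directly before FROM and leaves the parser expecting another column. A minimal sketch without it follows. The table name search_log is a hypothetical stand-in: DATABASE is a Hive keyword and is presumably a placeholder for the real table name, which is not shown.

```sql
-- No comma after the last select expression; search_log is a hypothetical table name
SELECT
  dt,
  COUNT(CASE WHEN search_word LIKE '%A%' THEN id END) AS a,
  COUNT(CASE WHEN search_word LIKE '%B%' THEN id END) AS b
FROM search_log
GROUP BY dt;
```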

Select Statement Using Dual Table As A Sub Query In Oracle Generates "From keyword not found" Error

I get the error message below when executing the following queries, which use a SELECT from the DUAL table as a subquery:
Error: ORA-00923: FROM keyword not found where expected
Query 1:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as "match_type" from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as "dt_1", to_date('14/05/2020','dd/mm/yyyy') as "dt_2" from dual) a
Query 2:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as match_type from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as dt_1, to_date('14/05/2020','dd/mm/yyyy') as dt_2 from dual) a
When I run the subquery on its own it executes as expected, but when I run the whole statement it generates the error.
Any help is appreciated.

Your match_type column is generating the error. Oracle doesn't support using a relational comparison such as a.dt_1=a.dt_2 as a column expression in the SELECT list. You may try the query below:
-- Inner aliases are left unquoted so that a.dt_1 and a.dt_2 resolve
SELECT a.dt_1,
a.dt_2,
CASE WHEN a.dt_1=a.dt_2 THEN 'TRUE' ELSE 'FALSE' END AS "match_type"
FROM (SELECT TO_DATE(REPLACE('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as dt_1,
TO_DATE('14/05/2020','dd/mm/yyyy') as dt_2
FROM DUAL) a;

Map Side join fails with return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

I am trying to perform map-side joins in Hive, but it keeps failing with the following message:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
The queries that I run are as follows:
set hive.auto.convert.join=true;
select a.customer_id , b.order_id from customers_sample a JOIN orders_map b on a.customer_id=b.order_customer_id;
I am trying these queries in cloudera-quickstart-vm-5.12.0.
Could someone please help me with what's missing here?

Hadoop - error message when declaring variable within query

I have tried the following query within HUE's Beeswax Query Editor:
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > ${HIVECONF:MAXDATE};
This query will not run and produces the following error message:
FAILED: ParseException line 1:4 missing KW_ROLE at 'MAXDATE' near 'MAXDATE' line 1:11 missing EOF at '=' near 'MAXDATE'
Any advice on what the problem is? I don't understand what the KW_ROLE message means.
I come from a SQL Server background and would just run the following within SQL Server, but am trying to find a functional Hadoop/Hive equivalent.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > (SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE)

The query you tried contains a syntax issue. The HiveConf variable should be surrounded by single quotes:
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > '${HIVECONF:MAXDATE}';
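One caveat, as far as I know: SET stores whatever follows the = as plain text and does not run it as a query, so the variable would hold the subquery text and not a date. A sketch with a literal value follows; the date is an assumed placeholder and does not come from the question.

```sql
-- SET keeps the value as literal text; 2015-06-30 is a made-up example date
SET MAXDATE=2015-06-30;

-- The variable is substituted as text before the query is parsed
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > '${hiveconf:MAXDATE}';
```

Note that the parse error in the question points at the SET line itself (line 1:4), so whether the editor accepts SET statements at all may be a separate issue from the quoting.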

As far as I know, Hive supports the following syntax too:
SELECT COUNT(*) FROM DB2.SOURCE_TABLE a
JOIN
(SELECT MAX(DATA_DAY) AS max_date FROM DB1.DESTINATION_TABLE) b
WHERE YEAR(a.DATA_DAY) >= '2015'
AND a.DATA_DAY > b.max_date;
But it is not a good implementation if DB1.DESTINATION_TABLE holds a lot of data.
In that case every query has to recompute SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE over the whole table.
If possible, you could store the result of SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE in another table, say Max_table.
Then the SQL would look like this:
SELECT COUNT(*) FROM DB2.SOURCE_TABLE a
JOIN Max_table m
WHERE YEAR(a.DATA_DAY) >= '2015'
AND a.DATA_DAY > m.DATA_DAY;
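Max_table itself could be materialized with a CREATE TABLE ... AS SELECT. This is only a sketch: the table name comes from the suggestion above, and keeping the column name DATA_DAY is an assumption made so that the query above works unchanged. The table would have to be rebuilt whenever DB1.DESTINATION_TABLE changes.

```sql
-- Rebuild the single-row helper table holding the current maximum date
DROP TABLE IF EXISTS Max_table;

CREATE TABLE Max_table AS
SELECT MAX(DATA_DAY) AS DATA_DAY
FROM DB1.DESTINATION_TABLE;
```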

Strange error when use hive udf through jdbc client

Hi all. I ran into a strange error when using a Hive UDF through a JDBC client.
I have a UDF called reformat_date that converts a string into timestamp format. I first execute ADD JAR and CREATE TEMPORARY FUNCTION, and both work fine.
The SQL can also be explained and executed in Hive CLI mode. But when I use the JDBC client, I get these errors:
Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:283 Wrong arguments ''20121201000000'':
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public org.apache.hadoop.io.Text com.aa.datawarehouse.hive.udf.ReformatDate.evaluate(org.apache.hadoop.io.Text) on object com.aa.datawarehouse.hive.udf.ReformatDate#4557e3e8 of class com.aa.datawarehouse.hive.udf.ReformatDate with arguments {20121201000000:org.apache.hadoop.io.Text} of size 1:
at com.aa.statistic.dal.impl.TjLoginDalImpl.selectAwakenedUserCount(TjLoginDalImpl.java:258)
at com.aa.statistic.backtask.service.impl.UserBehaviorAnalysisServiceImpl.recordAwakenedUser(UserBehaviorAnalysisServiceImpl.java:326)
at com.aa.statistic.backtask.controller.BackstatisticController$21.execute(BackstatisticController.java:773)
at com.aa.statistic.backtask.controller.BackstatisticController$DailyExecutor.execute(BackstatisticController.java:823)
My SQL is
select count(distinct a.user_id) as cnt from ( select user_id, user_kind, login_date, login_time from tj_login_hive where p_month = '2012_12' and login_date = '20121201' and user_kind = '0' ) a join ( select user_id from tj_login_hive where p_month <= '2012_12' and datediff(to_date(reformat_date(concat('20121201', '000000'))), to_date(reformat_date(concat(login_date, '000000')))) >= 90 ) b on a.user_id = b.user_id
Thanks.

I think your UDF threw an exception.
If reformat_date is a function you wrote yourself, you should check its logic.
If not, you should check the UDF's specification.
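One way to narrow it down is to call the function on a single constant through the same JDBC session, so that any exception comes from the UDF alone and not from the join. A sketch, using the table from the question only to supply one row:

```sql
-- If this also fails, the problem is inside reformat_date itself
SELECT reformat_date(concat('20121201', '000000'))
FROM tj_login_hive
WHERE p_month = '2012_12'
LIMIT 1;
```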
