hql query gives an error - hadoop

I try to execute this query in hql
SELECT
t.retweeted_screen_name,
sum(retweets) AS total_retweets,
count(*) AS tweet_count
FROM (SELECT
retweeted_status.user.screen_name as retweeted_screen_name,
retweeted_status.text,
max(retweet_count) as retweets
FROM tweets
GROUP BY retweeted_status.user.screen_name,
retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;
But I'm getting this error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Anyone can help me to fix this?

I think you need write sum(t.retweets) AS total_retweets, instead of sum(retweets) AS total_retweets,

Related

Hive SQL error: Failed rule ‘identifier’ in the Select target

I wrote a hive sql query here:
SELECT
dt,
COUNT(CASE WHEN search_word like ‘%A%’ THEN id END) AS a,
COUNT(CASE WHEN search_word like ‘%B%’ THEN id END) AS b,
FROM database
GROUP BY dt
However, Hive returns an error :
Error while compiling statement : Failed
ParseExceptionline 3:7 Failed to recognise predicate ‘AS’. Failed rule: ‘identifier’ in the select target.
I searched this error and my assumption is it might be come from AS reserve word. But I still do not understand how to fix it.

Select Statement Using Dual Table As A Sub Query In Oracle Generates "From keyword not found" Error

I get below error message while executing following queries using a select statement from dual table as a sub query:
Error: ORA-00923: FROM keyword not found where expected
Query 1:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as "match_type" from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as "dt_1", to_date('14/05/2020','dd/mm/yyyy') as "dt_2" from dual) a
Query 2:
select a.dt_1, a.dt_2, a.dt_1=a.dt_2 as match_type from
(select to_date(replace('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as dt_1, to_date('14/05/2020','dd/mm/yyyy') as dt_2 from dual) a
When I individually run sub query it executes as expected, however when I run the whole statement it generates error.
Any help is appreciated.
Your match_type column is generating the error. Oracle doesn't support relational operator matching. You may try below query -
SELECT a.dt_1,
a.dt_2,
CASE WHEN a.dt_1=a.dt_2 THEN 'TRUE' ELSE 'FLASE' END AS "match_type"
FROM (SELECT TO_DATE(REPLACE('2020-05-14 00:00:00',' 00:00:00',''), 'yyyy/mm/dd') as "dt_1",
TO_DATE('14/05/2020','dd/mm/yyyy') as "dt_2"
FROM DUAL) a;

Map Side join fails with return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

I am trying to perform map-side joins in hive, but it keeps failing with the following message
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
The queries that I run are as follows
set hive.auto.convert.join=true;
select a.customer_id , b.order_id from customers_sample a JOIN orders_map b on a.customer_id=b.order_customer_id;
I am trying these queries in cloudera-quickstart-vm-5.12.0
Could someone please help me with what's missing here.

Hadoop - error message when declaring variable within query

I have tried the following query within HUE's Beeswax Query Editor:
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > ${HIVECONF:MAXDATE};
This query will not run and produces the following error message:
FAILED: ParseException line 1:4 missing KW_ROLE at 'MAXDATE' near 'MAXDATE' line 1:11 missing EOF at '=' near 'MAXDATE'
Any advice on what the problem is? I don't understand what the KW_ROLE message means.
I come from a SQL Server background and would just run the following within SQL Server, but am trying to find a functional Hadoop/Hive equivalent.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > (SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE)
Query which you have tried contains syntax issue. HiveConf should surrounded by single quotes.
SET MAXDATE=(SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE);
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
WHERE YEAR(DATA_DAY) >= '2015'
AND DATA_DAY > '${HIVECONF:MAXDATE}';
As far as I know, hive support the following syntax too.
SELECT COUNT(*) FROM DB2.SOURCE_TABLE a
JOIN
(SELECT MAX(DATA_DAY) AS max_date FROM DB1.DESTINATION_TABLE) b
WHERE YEAR(a.DATA_DAY) >= '2015'
AND a.DATA_DAY > b.max_date;
But it's not a good implementation if there are bunches of data on DB1.DESTINATION_TABLE.
In such case each query would task lots of sub-querys in SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE.
If possible, you could store your SELECT MAX(DATA_DAY) FROM DB1.DESTINATION_TABLE result in another table, maybe Max_table.
Then the sql would be like this:
SELECT COUNT(*) FROM DB2.SOURCE_TABLE
JOIN Max_table
WHERE YEAR(DB2.SOURCE_TABLE.DATA_DAY) >= '2015' and
DB2.SOURCE_TABLE.DATA_DAY > (Max_table.DATA_DAY)

Strange error when use hive udf through jdbc client

all. I met a strange error when I use hive udf through jdbc client.
I have a udf to help me convert a string into time stamp format called reformat_date. I firstly execute ADD JAR and CREATE TEMPORARY FUNCTION, both work fine.
The SQL also can be explained in hive cli mode, and can be executed. But when use jdbc client, I got errors:
Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:283 Wrong arguments ''20121201000000'':
org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public org.apache.hadoop.io.Text com.aa.datawarehouse.hive.udf.ReformatDate.evaluate(org.apache.hadoop.io.Text) on object com.aa.datawarehouse.hive.udf.ReformatDate#4557e3e8 of class com.aa.datawarehouse.hive.udf.ReformatDate with arguments {20121201000000:org.apache.hadoop.io.Text} of size 1:
at com.aa.statistic.dal.impl.TjLoginDalImpl.selectAwakenedUserCount(TjLoginDalImpl.java:258)
at com.aa.statistic.backtask.service.impl.UserBehaviorAnalysisServiceImpl.recordAwakenedUser(UserBehaviorAnalysisServiceImpl.java:326)
at com.aa.statistic.backtask.controller.BackstatisticController$21.execute(BackstatisticController.java:773)
at com.aa.statistic.backtask.controller.BackstatisticController$DailyExecutor.execute(BackstatisticController.java:823)
My SQL is
select count(distinct a.user_id) as cnt from ( select user_id, user_kind, login_date, login_time from tj_login_hive where p_month = '2012_12' and login_date = '20121201' and user_kind = '0' ) a join ( select user_id from tj_login_hive where p_month <= '2012_12' and datediff(to_date(reformat_date(concat('20121201', '000000'))), to_date(reformat_date(concat(login_date, '000000')))) >= 90 ) b on a.user_id = b.user_id
Thanks.
i think your udf threw exception.
if reformat_date function is that you make, you should check your logic.
if not, you should check the udf's specification.

Resources