Hive query works via the CLI, the same query via Hue fails - hadoop

I have a weird issue with Hue (version 3.10).
I have a very simple Hive query:
drop table if exists csv_dump;
create table csv_dump row format delimited fields terminated by ',' lines terminated by '\n' location '/user/oozie/export' as select * from sample;
Running this query in the Hive editor works.
Running this query as an Oozie workflow submitted from the command line works.
Running this query from the command line with Beeline works.
Running this query via an Oozie workflow from Hue fails.
Failing in this case means:
the drop and create are not run, or at least have no effect
a prepare action in the workflow is executed
the hive2 step in the workflow still says succeeded
the following step is executed.
I did try with different users (oozie and ambari, adapting the location as relevant), with exactly the same success/failure cases.
I cannot find any relevant logs, except maybe this from Hue:
------------------------
Beeline command arguments :
-u
jdbc:hive2://ip-10-0-0-139.eu-west-1.compute.internal:10000/default
-n
oozie
-p
DUMMY
-d
org.apache.hive.jdbc.HiveDriver
-f
s.q
-a
delegationToken
--hiveconf
mapreduce.job.tags=oozie-e686d7aaef4a29c020059e150d36db98
Fetching child yarn jobs
tag id : oozie-e686d7aaef4a29c020059e150d36db98
Child yarn jobs are found -
=================================================================
>>> Invoking Beeline command line now >>>
0: jdbc:hive2://ip-10-0-0-139.eu-west-1.compu> drop table if exists csv_dump; create table csv_dump0 row format delimited fields terminated by ',' lines terminated by '\n' location '/user/ambari/export' as select * from sample;
<<< Invocation of Beeline command completed <<<
Hadoop Job IDs executed by Beeline:
<<< Invocation of Main class completed <<<
Oozie Launcher, capturing output data:
=======================
#
#Thu Jul 07 13:12:39 UTC 2016
hadoopJobs=
=======================
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://ip-10-0-0-139.eu-west-1.compute.internal:8020/user/oozie/oozie-oozi/0000011-160707062514560-oozie-oozi-W/hive2-f2c9--hive2/action-data.seq
Oozie Launcher ends
Here I see that Beeline is started, but I do not see any mappers being allocated, as I do when running from the command line.
Would anybody have any idea of what could go wrong?
Thanks,
Guillaume

As explained by #romain in the comments, newlines need to be added in the SQL script. Then all is good.
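For illustration, a minimal sketch of what the generated script (the s.q file referenced in the Beeline arguments above) should look like once each statement sits on its own line, with a trailing newline at the end; the statements themselves are just the query from the question:
drop table if exists csv_dump;
create table csv_dump
  row format delimited
  fields terminated by ','
  lines terminated by '\n'
  location '/user/oozie/export'
as select * from sample;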

Related

running hive script in a shell script via oozie shell action

I have a shell script "test.sh" as below:
#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : "$UDR_START_DT
hive -f tst_tab.hql
The above shell script is saved in a folder in Hadoop:
/scripts/Linux/test.sh
The tst_tab.hql contains a simple create table statement, as I am just testing to get Hive working. This HQL file is saved in the My Documents folder in Hue (the same folder where my workflow is saved).
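For context, a minimal sketch of what such a tst_tab.hql could contain; the table name and columns below are made up purely for illustration and are not taken from the question:
-- a simple create table statement, just to verify Hive runs from the shell action
CREATE TABLE IF NOT EXISTS default.test_tab (id INT, name STRING);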
I have created an Oozie workflow that calls test.sh in a shell action.
Issue I am facing:
The above shell script runs successfully up to line 3,
but when I add line 4 (hive -f tst_tab.hql), it generates the error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I checked the YARN logs too; nothing helpful there.

Impala shell - Hive Beeline - "Argument list too long"

I have a Cloudera cluster on which multiple Impala jobs are running all the time (i.e. cron jobs containing impala-shell commands). However, I have a few INSERT INTO queries that are unusually long: they contain a lot of 'CASE...WHEN...THEN' lines. When these queries are run in impala-shell, the command fails with the error "Argument list too long". They run just fine in Hue, but I can't get them to run on the command line.
Are there any workarounds for this?
I've tried running the command via Hive Beeline (instead of Impala) and setting 'hive.script.operator.truncate.env = true'; Beeline failed with the same error. I've also checked whether calling the query from a separate file makes any difference (it doesn't). Could I save the 'CASE...WHEN...THEN' lines in separate variables (using 'set') and reference those in the query, or would that be another dead end? A colleague mentioned a user-defined function (UDF) might help, but I'm not sure. Opinions?

Pig 0.12.0 won't execute shell commands with timezone change using backticks

I'm using Hue for Pig scripts on Amazon EMR. I want to make a shell call to get the date in a particular timezone into a variable, which I will use to define an output folder path for writing the output. Eventually I want to use an if/else/fi block to pick a particular date from the week, so the timezone will be mentioned in various places in the command.
Sample Script
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$OUTPUT_FOLDER' USING PigStorage('\t');
Pig parameter definition in Hue:
This works: OUTPUT_FOLDER = `/bin/date +%Y%m%d`
This doesn't work: OUTPUT_FOLDER = `TZ=America/New_York /bin/date +%Y%m%d`
Both of the commands execute perfectly in the bash shell. But the second command gives the following error:
2015-06-23 21:43:42,901 [main] INFO org.apache.pig.tools.parameters.PreprocessorContext - Executing command : TZ=America/Phoenix /bin/date +%Y%m%d
2015-06-23 21:43:42,913 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Error executing shell command: TZ=America/Phoenix /bin/date +%Y%m%d. Command exit with exit code of 126
From the GNU manual: If a command is found but is not executable, the return status is 126.
How do I resolve this?
Configuration details:
AMI version: 3.7.0
Hadoop distribution: Amazon 2.4.0
Applications: Hive 0.13.1, Pig 0.12.0, Impala 1.2.4, Hue
Underlying shell: bash
User: hadoop (while using Pig and while using Bash)
If you need any clarifications then please do comment on this question. I will update it as needed.
EDIT: Under the hood, Pig computes the value by executing "bash -c exec (command)" and assigning the result to the variable, where (command) is whatever we put as the value of the parameter in Hue.
If I do:
date --date='TZ="America/Los_Angeles"' '+%Y%m%d'
20150624
So, for example, in the Pig script:
%default date_dir `date --date='TZ="America/Los_Angeles"' '+%Y%m%d'`;
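Plugging that into the sample script from the question, a sketch of how the computed value would then be used (the LOAD/STORE lines are the ones from above, with the new date_dir parameter standing in for OUTPUT_FOLDER):
%default date_dir `date --date='TZ="America/Los_Angeles"' '+%Y%m%d'`;
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$date_dir' USING PigStorage('\t');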

Need to pass Variable from Shell Action to Oozie Shell using Hive

All,
Looking to pass a variable from a shell action back to the Oozie workflow. I am running commands such as this in my script:
#!/bin/sh
evalDate="hive -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalBaais)
echo "evaldate=$evalPartition"
The trick is that it is a Hive command inside the shell script.
Then I am running this to read it in Oozie:
${wf:actionData('getPartitions')['evaldate']}
But it pulls a blank every time! I can run those commands in my shell fine and they seem to work, but Oozie does not. Likewise, if I run the commands on the other boxes of the cluster, they run fine as well. Any ideas?
The issue was a configuration problem on my cluster. When I ran as the oozie user, I had write permission issues on /tmp/yarn. Given that, I changed the command to run as:
baais="export HADOOP_USER_NAME=functionalid; hive yarn -hiveconf hive.execution.engine=mr -e 'select max(cast(create_date as int)) from db.table;'"
Where hive allows me to run as yarn.
The solution to your problem is to use the "-S" switch in the hive command for silent output (see below).
Also, what is "evalBaais"? You probably need to replace it with "evalDate". So your code should look like this:
#!/bin/sh
evalDate="hive -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalDate)
echo "evaldate=$evalPartition"
Now you should be able to capture the output.
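One detail that is easy to miss on the Oozie side: ${wf:actionData('getPartitions')['evaldate']} only returns values if the shell action declares <capture-output/> and the script prints key=value pairs (which the echo "evaldate=..." line already does). Below is a minimal sketch of such an action definition, assuming the action is named getPartitions as in the expression above; the script file name and transition targets are placeholders:
<action name="getPartitions">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>getPartitions.sh</exec>
        <file>getPartitions.sh#getPartitions.sh</file>
        <capture-output/>
    </shell>
    <ok to="next-action"/>
    <error to="fail"/>
</action>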

Table not found exception when running hive query via an Oozie shell script

I'm trying to run a Hive count query on a table from a bash action in an Oozie workflow, but I always get a "table not found" exception.
#!/bin/bash
COUNT=$(hive -S -e "SELECT COUNT(*) FROM <table_name> where <condition>;")
echo $COUNT
The idea is to get the count stored in a variable for further analysis. This works absolutely fine if I run it directly from a local file in the shell.
I know I can do this by splitting it into 2 separate actions, where I first write the Hive query result to a temp directory and then read the file in the bash script.
Any help appreciated. Thanks!
Fixed it. I had a user permissions issue in accessing the table, and also had to add the following property to do the trick:
SET mapreduce.job.credentials.binary = ${HADOOP_TOKEN_FILE_LOCATION}
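For context, one common way to apply this property (a sketch only, not necessarily the poster's exact script) is to pass the credentials file that the Oozie launcher exposes through HADOOP_TOKEN_FILE_LOCATION to the hive command inside the shell action:
#!/bin/bash
# Hand the Oozie launcher's delegation token file to Hive so the query can authenticate;
# <table_name> and <condition> are placeholders, as in the question above.
COUNT=$(hive -S \
  --hiveconf mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION \
  -e "SELECT COUNT(*) FROM <table_name> WHERE <condition>;")
echo $COUNT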
