Impala shell - Hive Beeline - "Argument list too long" - shell

I have a Cloudera cluster on which multiple Impala jobs are running all the time (i.e. cron jobs containing impala-shell commands). However, I have a few INSERT INTO queries that are unusually long: they contain many 'CASE...WHEN...THEN' lines. When these queries are run in impala-shell, the command fails with the error "Argument list too long". They run just fine in Hue, but I can't get them to run on the command line.
Are there any workarounds for this?
I've tried running the command via Hive Beeline (instead of Impala) and setting 'hive.script.operator.truncate.env = true'. Beeline failed with the same error. I've also tried calling the query from a separate file to see if that makes any difference (it doesn't). Could I save the 'CASE...WHEN...THEN' lines in separate variables (using 'set') and reference those in the query, or would that be another dead end? A colleague mentioned a user-defined function (UDF) might help, but I'm not sure. Opinions?
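For reference, "Argument list too long" is the kernel's ARG_MAX limit on exec arguments, so the statement has to reach impala-shell through a file rather than through the command line. The standard file-based invocation looks like this (host name and file path are hypothetical):

impala-shell -i impalad-host:21000 -f /path/to/long_insert.sql

If running from a file still fails, check whether the script expands the file's contents back onto the command line somewhere (e.g. impala-shell -q "$(cat long_insert.sql)"), which reintroduces the limit.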

Related

I do not want my Bash script to stop if a Hive command fails

I have a bash script sending a lot of HiveQL commands to Hive. The problem is that I do not want it to stop if one of these commands fails. I tried the usual Bash command:
set +e
but it does not work (the script stops running if one of the Hive commands fails). Do you know where the problem is? An option in my Hive config, or something like that?
Thank you!
EDIT: I use the Hive shell, doing something like this:
#Send my command to hive ...
hive -S -e "$MyCommand"
#... but I want my script to continue running if the command fails :-).
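A minimal sketch of a guard that keeps the script going regardless of set +e (it assumes bash and reuses the $MyCommand variable from the snippet above):

#!/bin/bash
# set +e only disables bash's exit-on-error mode; guarding the command
# explicitly makes the intent clear and lets the script continue on failure
if ! hive -S -e "$MyCommand"; then
    echo "Hive command failed, continuing anyway: $MyCommand" >&2
fi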

Need to pass Variable from Shell Action to Oozie Shell using Hive

All,
I'm looking to pass a variable from a shell action back to the Oozie workflow. I am running commands such as this in my script:
#!/bin/sh
evalDate="hive -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalBaais)
echo "evaldate=$evalPartition"
The trick is that it is a Hive command run from the shell.
Then I am using this to read the value in Oozie:
${wf:actionData('getPartitions')['evaldate']}
But it pulls a blank every time! I can run those commands fine in my shell and they seem to work, but Oozie does not pick the value up. Likewise, the commands run fine on the other boxes of the cluster. Any ideas?
The issue turned out to be configuration on my cluster. When I ran as the oozie user, I had write-permission issues on /tmp/yarn. Because of that, I changed the command to run as:
baais="export HADOOP_USER_NAME=functionalid; hive yarn -hiveconf hive.execution.engine=mr -e 'select max(cast(create_date as int)) from db.table;'"
Where hive allows me to run as yarn.
The solution to your problem is to use the "-S" switch in the hive command for silent output (see below).
Also, what is "evalBaais"? You probably need to replace this with "evalDate". So your code should look like this:
#!/bin/sh
evalDate="hive -S -e 'set hive.execution.engine=mr; select max(cast(create_date as int)) from db.table;'"
evalPartition=$(eval $evalDate)
echo "evaldate=$evalPartition"
Now you should be able to capture the output.
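One more thing worth checking, based on how Oozie shell actions capture output: the action must declare <capture-output/> in the workflow, and Oozie parses the script's stdout as Java properties, so any stray Hive log lines will break the lookup. A quick local sanity check (the script name mirrors the action name above; the date value is made up):

# run the action script by hand and confirm stdout is exactly one key=value
# line; anything else on stdout will confuse <capture-output/>
./getPartitions.sh
# expected output, and nothing more:
# evaldate=20170131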

how to write a sqoop job using shell script and run them sequentially?

I need to run a set of Sqoop jobs one after another inside a shell script. How can I achieve this? By default, it runs all the jobs in parallel, which causes performance to take a hit. Should I remove the "-m" parameter and run it?
The -m parameter controls how many parallel map tasks a single Sqoop command uses; it does not control parallelism across the separate commands you issue.
So removing the -m parameter will not solve the problem.
First, you need to write a shell script file containing your Sqoop commands:
#!/bin/bash
sqoop_command_1
sqoop_command_2
sqoop_command_3
Save the above script under a name like sqoop_jobs.sh, then make it executable:
chmod +x sqoop_jobs.sh
Now you can run the shell file from your terminal:
./sqoop_jobs.sh
I hope this helps.
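A more concrete sketch of the same script (the JDBC URL, tables, and target directories are placeholders): because each sqoop import runs in the foreground and blocks until its MapReduce job completes, the commands execute strictly one after another.

#!/bin/bash
# abort the whole sequence if any single import fails
set -e
# each import blocks until done, so these run sequentially, not in parallel
sqoop import --connect jdbc:mysql://dbhost/sales --table orders \
    --target-dir /data/orders -m 1
sqoop import --connect jdbc:mysql://dbhost/sales --table customers \
    --target-dir /data/customers -m 1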

Table not found exception when running hive query via an Oozie shell script

I'm trying to run a Hive count query on a table from a shell (bash) action in an Oozie workflow, but I always get a "table not found" exception.
#!/bin/bash
COUNT=$(hive -S -e "SELECT COUNT(*) FROM <table_name> where <condition>;")
echo $COUNT
The idea is to get the count stored in a variable for further analysis. This works absolutely fine if I run it directly from a local file in the shell.
I can work around it by splitting this into two separate actions, where I first write the Hive query result to a temp directory and then read the file in the bash script.
Any help appreciated. Thanks!
Fixed it. I had a user-permissions issue in accessing the table, and I also had to add the following property to do the trick:
SET mapreduce.job.credentials.binary = ${HADOOP_TOKEN_FILE_LOCATION}
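Put together, the script would look roughly like this (a sketch assuming a Kerberized cluster; the table and condition placeholders are kept from the question):

#!/bin/bash
# pass the job's delegation token to the Hive CLI launched inside the Oozie
# shell action so it can authenticate to HDFS on a secure cluster
COUNT=$(hive -S -e "SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; SELECT COUNT(*) FROM <table_name> WHERE <condition>;")
echo "count=$COUNT"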

Oozie shell Action - Running hive from shell issue

Based on a condition being true, I am executing hive -e in a shell script. It works fine. But when I put this script in a Shell action in Oozie and run it, I get: scriptName.sh: line 42: hive: command not found.
I tried passing <env-var>PATH=/usr/lib/hive</env-var> in the shell action, but I guess I am making some mistake there, because I get the same error: scriptName.sh: line 42: hive: command not found.
Edited:
I used which hive in the shell script. Its output is not consistent; I get two variations:
1. /usr/bin/hive, along with a "Delegation Token can be issued only with kerberos or web authentication" Java IOException.
2. which: hive not in {.:/sbin:/usr/bin:/usr/sbin:...}
OK, I finally figured it out. This might be trivial for shell experts, but it can help someone starting out.
1. hive: command not found. It was not a classpath issue; it was a shell issue. The environment I am running in is a Korn shell (echo $SHELL to find out), but the Hive script (/usr/lib/hive/bin/hive.sh) is a bash script. So I changed the shebang in my script to #!/bin/bash and it worked.
2. Delegation Token can only be issued with kerberos or web authentication. In my Hive script I added SET mapreduce.job.credentials.binary = ${HADOOP_TOKEN_FILE_LOCATION}. HADOOP_TOKEN_FILE_LOCATION is a variable holding the location of the job token, which must be passed to authenticate access to HDFS data (in my case, an HDFS read through a Hive SELECT query) on a secure cluster.
You are probably missing shell environment variables.
To confirm it, run export inside the shell that Oozie calls and inspect the environment.
If Oozie calls your shell script, a simple fix is to invoke it as /bin/bash -l your_script so the login environment is loaded.
PS: PATH is a list of directories, so you need to append ${HIVE_HOME}/bin to your PATH, not ${HIVE_HOME}/bin/hive.
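A minimal sketch of that PATH fix at the top of the action script (the /usr/lib/hive default is an assumption; adjust it for your cluster):

#!/bin/bash
# the Oozie launcher's environment is much sparser than a login shell's, so
# put the directory containing the hive launcher on PATH explicitly
export HIVE_HOME="${HIVE_HOME:-/usr/lib/hive}"
export PATH="$PATH:$HIVE_HOME/bin"

which hive    # should now resolve, e.g. /usr/lib/hive/bin/hive
hive -S -e "SELECT 1;"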
