I have tested writing Hive query output to a file by executing Hive queries inside a shell script using the hive -e and hive -f options. When I run the shell script from PuTTY it works fine, but when the same shell script is run from an Oozie workflow (launched from Hue) it does not write any results to the local file.
Using INSERT OVERWRITE DIRECTORY I can write Hive query output directly to a directory in HDFS, but it creates a new directory for each query, so I cannot use this option.
Please suggest an alternative way to write the output of multiple Hive queries to a single file when executing the shell script from an Oozie workflow.
Thanks in advance.
When a shell action is run via an Oozie workflow, it can run on any of the datanodes, so check whether the output path exists on the datanode that actually executed the script.
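One way around this is to have the script write to HDFS instead of the local filesystem. A minimal sketch, with hypothetical queries and paths, collecting the output of several queries into a single HDFS file:

OUT=/tmp/hive_results.txt                       # local to whichever datanode runs the action
hive -e "SELECT * FROM mydb.table1" >  "$OUT"   # hypothetical queries
hive -e "SELECT * FROM mydb.table2" >> "$OUT"
hdfs dfs -appendToFile "$OUT" /user/me/output/all_results.txt   # one HDFS file for all queries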
I have created two Hive scripts, script1.hql and script2.hql.
Is it possible to run script2.hql from script1.hql?
I read about using the source command, but could not figure out how to use it.
Any pointers/reference docs will be appreciated.
Use the source <filepath> command:
source /tmp/script2.hql; --inside script1
The docs are here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
Hive will include the text of /tmp/script2.hql and execute it in the same context, so all variables defined in the main script will be accessible to the script2 commands.
The source command looks for a local path (not HDFS). Copy the file to a local directory before executing.
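A minimal sketch, with hypothetical paths: pull script2.hql down from HDFS to a local path, then run script1.hql, which sources it:

hdfs dfs -get /apps/hive/scripts/script2.hql /tmp/script2.hql   # source needs a local file
hive -f /home/user/script1.hql                                  # script1.hql contains: source /tmp/script2.hql;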
Try using the following command and see if you can execute it:
hive -f /home/user/sample.sql
I have a Hive script which is a select query, triggered using Oozie. How do I configure Oozie to write the output of the Hive script to a file on HDFS?
I do not want to use INSERT OVERWRITE option.
Is there an easy way to tell Oozie to save the output to a specific location?
I'm using Azure Blob storage and Data Factory with an HDInsight cluster.
I have a shell script which contains Hadoop- and Hive-related code. I'm trying to add/create a Hive/Pig activity in ADF, and from the Pig/Hive code I'm calling the shell script, as follows:
myFile.pig:
sh /myFolder/myscript.sh

myFile.hql:
!/myFolder/myscript.sh
While executing, I'm getting a java.io.IOException | No such file or directory.
Per the exception, the Pig/Hive file is not able to resolve the shell script path.
Has anyone faced a similar issue, or deployed a Pig/Hive activity along with a shell script from ADF?
I've tried multiple approaches and every path combination I could think of to pass the location of the shell script, but it was not picked up. Any help/suggestion/pointer will be highly appreciated.
Thanks in advance.
Upload the shell script to blob storage and then invoke that script from Pig or Hive. Below are the steps.
Hive
!sh hadoop fs -ls wasbs://containerName@StorageAccountName.blob.core.windows.net/pathToScript/testshell.ksh
Pig
sh hadoop fs -ls wasbs://containerName@StorageAccountName.blob.core.windows.net/pathToScript/testshell.ksh
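The -ls calls above only confirm that the script is reachable; to actually execute it, one hedged sketch (using the same placeholder container, account and path) is to copy it to the node running the job and then run it, e.g. from Hive:

!sh hadoop fs -copyToLocal wasbs://containerName@StorageAccountName.blob.core.windows.net/pathToScript/testshell.ksh ./testshell.ksh
!sh sh ./testshell.ksh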
I am running a shell script with Oozie. First I uploaded the script to HDFS; the script should then forward its logs to a log file in the same HDFS directory where the script is stored, meaning the generated log file should end up in HDFS.
Does anyone know how to achieve this?
...a log file in the same directory where this script is stored in HDFS...
The Oozie Shell action contains a <file> element with the HDFS path of the script. But the way it works is not what you seem to think:
Oozie asks YARN to allocate a container somewhere
Oozie asks YARN to download some files to the container's private local filesystem (in CWD), and especially the <file> stuff
finally, Oozie asks YARN to run the local version of the script
Bottom line: the script that is executed has no way to know its original HDFS directory. The Action must pass that directory explicitly as a script argument, or as an env variable.
Assuming that you use an env variable, the obvious solution for the archive part is something like
hdfs dfs -appendToFile ./MySession.log $LOG_ARCHIVE_DIR/archive.log
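A minimal sketch of the uploaded script itself, assuming the Oozie action passes the archive directory in an env variable named LOG_ARCHIVE_DIR (the name is illustrative):

LOG=./MySession.log
echo "run started at $(date)" > "$LOG"           # start a fresh local log in the container's CWD
# ... the real work of the script goes here, appending its messages to $LOG ...
hdfs dfs -appendToFile "$LOG" "$LOG_ARCHIVE_DIR/archive.log"   # archive the log back to HDFS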
I would like to see the contents of an HDFS directory from within Hive Beeline, using an "ls" command. Similarly, I'd like to see what the default HDFS directory is set to, perhaps with a "pwd" command.
Is there any way to do this in Beeline, or am I stuck going to a Linux prompt and using hadoop instead?
You can enter !sh at the beeline prompt followed by shell commands, for example:
!sh pwd
This shows you the working directory in the host filesystem, of course. You can use the same mechanism to issue HDFS commands:
!sh hdfs dfs -ls /
I'm not aware of any mechanism that gives a default HDFS directory. Are you thinking of Hive databases?
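If Hive databases are what you are after, the same !sh escape can list the warehouse directory; a hedged example, assuming the default location (it is governed by hive.metastore.warehouse.dir and is often /user/hive/warehouse):

!sh hdfs dfs -ls /user/hive/warehouse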
For more info, type help at the beeline prompt.