Handling fs (hadoop shell) command errors in PIG script - hadoop

I have a PIG script with a couple of statements.
Sample script:
register x.jar;
fs -rmr <file-path>;
A = LOAD 'X' AS (uuid:chararray, value:chararray);
I'm invoking the fs shell command to delete a file on HDFS:
fs -rmr <file-path>
This should delete the path if it is present and otherwise continue.
If the file/directory is not present, the script exits and throws an error: No such file or directory
I run it using the following command:
pig -f filename.pig -param parameter1=value
"-f" is stopping it forcefully.
If I avoid "-f", I get the below error:
2015-02-02 02:50:15,388 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : parameter1
How can I continue irrespective of this error?

If you want to use parameter substitution in a Pig script, you need to reference the parameters in the script with the $ sign:
fs -rmr $parameter1;
If you call the script with -param parameter1=value, this will replace $parameter1 with value.

If you need to provide parameter values while executing the Pig script, the following is the syntax:
pig -param parameter1=value filename.pig
Try it and let me know if you run into any issues.
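On the original question of continuing when the path is missing: newer Hadoop fs shells accept fs -rm -r -f (which stays quiet for absent paths), and Pig's Grunt shell has an rmf command for the same purpose. A minimal local sketch of the ignore-the-failure pattern, with a plain rm standing in for the HDFS delete:

```shell
# Sketch: make a delete non-fatal; a local rm stands in for `fs -rmr`
target=/tmp/definitely_missing_$$
rm -r "$target" 2>/dev/null || true   # swallow "No such file or directory"
echo "continuing, exit=$?"
```

The `|| true` swallows the failure so the rest of the script keeps running; the same idea applies whichever delete command your Hadoop version supports.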

Related

running hive script in a shell script via oozie shell action

I have a shell script "test.sh" as below:
#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : "$UDR_START_DT
hive -f tst_tab.hql
The above shell script is saved in HDFS at:
/scripts/Linux/test.sh
The tst_tab.hql file contains a simple CREATE TABLE statement, as I am just testing to get Hive working. This Hive file is saved in the My Documents folder in Hue (the same folder where my workflow is saved).
I have created an Oozie workflow that calls test.sh in a shell action.
The issue I am facing:
The above shell script runs successfully until line 3, but when I add line 4 (hive -f tst_tab.hql), it generates the error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I verified YARN logs too. Nothing helpful there.
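For what it's worth, the expansion on line 2 can be checked on its own, outside Oozie (a minimal stand-in; GNU date assumed):

```shell
# Stand-in check of the line-2 expansion from test.sh (GNU date assumed)
UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : $UDR_START_DT"
```

If this prints the expected table name, the failure is isolated to the hive -f step rather than the date logic.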

Run pig in oozie shell action

I have created a simple Pig script which loads 10 records and stores them in a table.
When I invoke this Pig file (stored in HDFS) using an Oozie shell action, I get an error as follows:
>>> Invoking Shell command line now >>
Exit code of the Shell command 5
<<< Invocation of Shell command completed <<<
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
I have put the shell file in the lib folder in the workspace and added all the required jar files to the same lib folder. Please help me solve this issue.
I solved this issue with the following steps:
1) Created a workflow in Hue, placing a Pig action to invoke the Pig script.
2) Generated the workflow.xml file by clicking the run button.
3) Ran the workflow.xml from the command line through a shell wrapper that iterates and passes dates as input parameters.
JOB.PROPERTIES file:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=<jobtracker>
nameNode=<nameNode>
oozie.wf.application.path = /user/hue/oozie/workspaces/hue-oozie-1470122057.79/workflow.xml
shell file:
for date in 20160101 20160102 20160103
do
  oozie job -oozie http://<serverip>:11000/oozie -config job.properties -Ddate=$date run
done
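A stand-in sketch of that wrapper loop, with echo in place of the real oozie call so the iteration itself can be verified:

```shell
# Echo stands in for: oozie job -oozie http://<serverip>:11000/oozie -config job.properties run
for date in 20160101 20160102 20160103
do
  echo "submitting workflow for $date"
done
```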

error running pig script in tez mode with hcatalog

I was running a Pig script with Tez as the execution engine and using HCatalog. Below is my Pig script:
set exectype=tez;
a = load 'hive table' using org.apache.hive.hcatalog.pig.HCatLoader();
When I entered the following on the command line:
pig -useHCatalog -x tez /home/script.pig
I got an error:
"error encountered during parsing " ";" "; " at line1, column 17.
Can anyone tell me what the issue is? Is there a different way to set the execution engine inside a script?
I think you should use:
set exectype tez;
instead of:
set exectype=tez;
And anyway, isn't specifying "-x tez" enough to set the execution type? Why do you need to add it in the script as well?

Pig 0.12.0 won't execute shell commands with timezone change using backticks

I'm using Hue for Pig scripts on Amazon EMR. I want to make a shell call to get the date in a particular timezone into a variable, which I will use to build the output folder path for the results. Eventually I want to use an if/else to pick a particular date from a week, so the timezone will appear at various places in the command.
Sample Script
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$OUTPUT_FOLDER' USING PigStorage('\t');
Pig parameter definition in Hue:
This works: OUTPUT_FOLDER = `/bin/date +%Y%m%d`
This doesn't work: OUTPUT_FOLDER = `TZ=America/New_York /bin/date +%Y%m%d`
Both commands execute perfectly in the bash shell, but the second one gives the following error:
2015-06-23 21:43:42,901 [main] INFO org.apache.pig.tools.parameters.PreprocessorContext - Executing command : TZ=America/Phoenix /bin/date +%Y%m%d
2015-06-23 21:43:42,913 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Error executing shell command: TZ=America/Phoenix /bin/date +%Y%m%d. Command exit with exit code of 126
From the GNU manual: If a command is found but is not executable, the return status is 126.
How do I resolve this?
Configuration details:
AMI version:3.7.0
Hadoop distribution:Amazon 2.4.0
Applications:Hive 0.13.1, Pig 0.12.0, Impala 1.2.4, Hue
Underlying shell: bash
User: hadoop (while using Pig and while using Bash)
If you need any clarifications then please do comment on this question. I will update it as needed.
EDIT: Under the hood, Pig computes the value by executing "bash -c exec (command)" and assigning the result to the variable, where (command) is whatever we put as the value for the variable in Hue.
If I do:
date --date='TZ="America/Los_Angeles"' '+%Y%m%d'
20150624
So the fix is to move the timezone into the date string, e.g.:
%default date_dir `date --date='TZ="America/Los_Angeles"' '+%Y%m%d'`;
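That workaround sidesteps the TZ=... prefix entirely by moving the zone into GNU date's --date string, so Pig's "bash -c exec (command)" invocation only ever sees date itself. A quick check (with an explicit "now" added to make the time reference unambiguous; zone name is just an example):

```shell
# The zone lives inside the --date string, so no environment-assignment
# prefix is needed in front of the command (GNU date)
d=$(date --date='TZ="America/New_York" now' '+%Y%m%d')
echo "$d"
```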

how to call Pig scripts from shell script sequentially

I have a sequence of Pig scripts in a file and I want to execute them from a shell script that runs the Pig scripts sequentially.
For example:
sh script.sh /it/provider/file_name PIGddl.txt
Suppose PIGddl.txt has Pig scripts like
Record count
Null validation, etc.
If all the Pig queries are in one file, how do I execute them from the shell script?
The idea below works, but if you want a conditional flow (if script 1 succeeds, then run script 2, else run script 3), you may want to go with Oozie for running and scheduling the jobs.
#!/bin/sh
x=1
while [ $x -le 3 ]
do
  echo "pig_dcnt$x.pig will be run"
  pig -f /home/Scripts/PigScripts/pig_dcnt$x.pig -param timestamp=$timestamp1
  x=$(( $x + 1 ))
done
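If you want the run to stop as soon as one script fails rather than continuing, a set -e variant of the same loop works. A sketch, with true standing in for the real pig -f call:

```shell
#!/bin/sh
set -e   # abort the loop as soon as any command fails
for i in 1 2 3
do
  true   # stand-in for: pig -f /home/Scripts/PigScripts/pig_dcnt$i.pig
  echo "step $i done"
done
```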
I haven't tested this, but I'm pretty sure it will work fine.
Let's assume you have two Pig files which you want to run using a shell script; then you would write a shell script file with the following:
#!/bin/bash
pig
exec pig_script_file1.pig
exec pig_script_file2.pig
When you run this shell script, it will first execute the pig command, enter the Grunt shell, and there execute your Pig files in the order you mentioned.
Update:
The above solution doesn't work. Please refer to the one below, which is tested.
Update your script file with the following so that it runs your Pig files in the order you have defined:
#!/bin/bash
pig pig_script_file1.pig
pig pig_script_file2.pig
Here is what you have to do:
1. Keep the xxx.pig file at some location.
2. To execute this Pig script from the shell, use the command below:
pig -p xx=date -p xyz=value -f /path/xxx.pig
The -p options pass arguments into the script (use one per parameter, and only if the script expects them); -f executes the Pig code from the .pig file.
