How to execute multiple Pig scripts in parallel? - hadoop

I have multiple Pig scripts, and currently I execute them sequentially with the command pig -x mapreduce /path/to/Script/Script1.pig && /path/to/Script/Script2.pig && /path/to/Script/Script3.pig
The scripts are independent of each other, so I would like to run them in parallel to improve performance. I searched for a way to do this but did not find a clear answer.
Is there any way to execute all of the Pig scripts in parallel?

#!/bin/bash
pig -x mapreduce /path/to/Script/Script1.pig &
pig -x mapreduce /path/to/Script/Script2.pig &
pig -x mapreduce /path/to/Script/Script3.pig &
wait
echo "Done!"
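
A bare wait with no arguments returns 0 even if some background jobs failed, so the script above cannot tell whether every Pig script succeeded. The following is a minimal sketch of one way to track failures, not a tested production wrapper; run_parallel is a hypothetical helper name, and the command prefix is passed in as an argument.

```shell
#!/bin/bash
# run_parallel CMD SCRIPT...
# Hypothetical helper: starts "CMD SCRIPT" in the background for every script,
# waits for each job individually, and returns the number of jobs that failed.
run_parallel() {
    local cmd=$1
    shift
    local pids="" failed=0 script pid
    for script in "$@"; do
        # $cmd is deliberately unquoted so "pig -x mapreduce" splits into words.
        $cmd "$script" &
        pids="$pids $!"
    done
    for pid in $pids; do
        # wait PID returns the exit status of that particular job.
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}
```

It could be called as run_parallel "pig -x mapreduce" /path/to/Script/Script1.pig /path/to/Script/Script2.pig /path/to/Script/Script3.pig, followed by a check of $? to see how many scripts failed.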

You should be able to use Apache Oozie http://oozie.apache.org/

Related

I do not want my Bash script to stop if a Hive command fails

I have a bash script sending a lot of HiveQL commands to hive. The problem is that I do not want it to stop if one of these commands fails. I tried the usual Bash command:
set +e
but it does not work (the script stops running if one of the Hive commands fails). Do you know where the problem is? Is it an option in my Hive config, or something else?
Thank you!
EDIT: I use the Hive shell, doing something like this:
#Send my command to hive ...
hive -S -e "\"$MyCommand\""
#... but I want my script to continue running if the command fails :-).
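
One pattern that does not depend on the set +e / set -e state is to test each hive call in an if statement, because a command tested by if never triggers errexit. The sketch below assumes the queries are passed as separate arguments; run_queries and the HIVE_CMD override are hypothetical names introduced only for illustration, and this does not explain why the original script stops.

```shell
#!/bin/bash
# HIVE_CMD is a hypothetical override so the loop can be tried without Hive.
HIVE_CMD=${HIVE_CMD:-hive}

# run_queries QUERY...
# Runs every query in turn, reports failures on stderr, keeps going,
# and prints the number of failed queries on stdout.
run_queries() {
    local failures=0 query
    for query in "$@"; do
        # A command tested by "if" does not abort the script, even under set -e.
        if ! "$HIVE_CMD" -S -e "$query"; then
            echo "query failed, continuing: $query" >&2
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}
```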

Is it possible to execute more than one Hive query in parallel?

I have a script that reads and executes one HQL file at a time, but I want to execute more than one at a time. Please let me know if there is any way to do so.
If you run queries with hive -e 'some command', you can put each one in the background with Bash's &:
hive -e 'some command' &
hive -f someFile.hql &
etc.
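
Backgrounding with & starts every query at once. If that is too much load, concurrency can be capped with xargs -P, which is available in GNU and BSD xargs but is not part of POSIX. This is a rough sketch under those assumptions; run_hql_files and the HIVE_CMD override are hypothetical names, and the file names are placeholders.

```shell
#!/bin/bash
# HIVE_CMD is a hypothetical override so the helper can be tried without Hive.
HIVE_CMD=${HIVE_CMD:-hive}

# run_hql_files MAX_JOBS FILE...
# Runs "hive -f FILE" for every file, with at most MAX_JOBS running at once.
# xargs exits non-zero if any invocation fails.
run_hql_files() {
    local max_jobs=$1
    shift
    # Nothing to do when no files were given.
    [ "$#" -gt 0 ] || return 0
    # NUL-separated names keep paths with spaces intact.
    printf '%s\0' "$@" | xargs -0 -n 1 -P "$max_jobs" "$HIVE_CMD" -f
}
```

For example, run_hql_files 2 first.hql second.hql third.hql would run at most two queries at a time.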
Approach 1 (Oozie):
One of the easiest and most straightforward ways to run all your HQL files is to use Oozie. Create an Oozie job, define the Hive actions to run in parallel, and submit the job.
Approach 2 (shell):
Create multiple shell scripts, each containing a hive -e '<<query>>' command, and run them all in parallel with a cron job (or, again, use Oozie to run the shell scripts).
Although approach 2 works, I'd recommend approach 1, since Oozie is the better fit for running Hive scripts in parallel.

Error running a Pig script in Tez mode with HCatalog

I was running a Pig script with Tez as the execution engine and using HCatalog. Below is my Pig script.
set exectype=tez;
a = load 'hive table' using org.apache.pig.hcatalog.hive.HCatloader();
When I entered the following on the command line,
pig -useHCatalog -x tez /home/script.pig
I got an error:
"error encountered during parsing " ";" "; " at line1, column 17.
Can anyone tell me what the issue is? Is there a different way to set the execution engine inside a script?
I think you should use:
set exectype tez
instead of:
set exectype=tez;
And anyway, isn't specifying "-x tez" enough to set the execution type? Why do you need to add it in the script as well?

Tez as execution engine at job level

How can I selectively set Tez as the execution engine for individual Pig jobs?
We can set the execution engine in pig.properties, but that is a cluster-level setting and affects every job on the cluster.
It's possible if the jobs are submitted through Templeton.
Example of PowerShell usage:
New-AzureHDInsightPigJobDefinition -Query $QueryString -StatusFolder $statusFolder -Arguments @("-x", "tez")
Example of CURL usage:
curl -s -d file=<file name> -d arg=-v -d arg=-x -d arg=tez 'https://<dnsname.azurehdinsight.net>/templeton/v1/pig?user.name=admin'
Source: http://blogs.msdn.com/b/tiny_bits/archive/2015/09/19/pig-tez-as-execution-at-job-level.aspx
You can pass the execution engine as a parameter, as shown below: mr for MapReduce and tez for Tez.
pig -useHCatalog -Dexectype=mr -Dmapreduce.job.queuename=<queue name> -param_file dummy.param dummy.pig
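
When jobs are launched from a shell rather than through Templeton, the same per-job choice can be made with the -x option on each pig invocation. The wrapper below is only a sketch: run_pig_with_engine and the PIG_CMD override are hypothetical names, and the accepted engine list is limited to the -x values local, mapreduce, tez and tez_local.

```shell
#!/bin/bash
# PIG_CMD is a hypothetical override so the wrapper can be tried without Pig.
PIG_CMD=${PIG_CMD:-pig}

# run_pig_with_engine ENGINE [PIG_ARGS...]
# Runs one Pig job with the given execution engine, leaving the
# cluster-wide default in pig.properties untouched.
run_pig_with_engine() {
    local engine=$1
    shift
    case "$engine" in
        local|mapreduce|tez|tez_local) ;;
        *) echo "unknown engine: $engine" >&2; return 2 ;;
    esac
    "$PIG_CMD" -x "$engine" "$@"
}
```

For example, run_pig_with_engine tez dummy.pig would run only that job on Tez.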

How to call Pig scripts from a shell script sequentially

I have a sequence of Pig scripts in a file, and I want to execute them sequentially from a shell script.
For example:
sh script.sh /it/provider/file_name PIGddl.txt
Suppose PIGddl.txt has Pig scripts like
Record Count
Null validation, etc.
If all the Pig queries are in one file, how do I execute them from a shell script?
The idea below works, but if you need conditional flow (for example, if script 1 succeeds run script 2, otherwise run script 3), you may want to use Oozie for running and scheduling the jobs.
#!/bin/sh
# timestamp1 must be set before this loop; it is passed to each script as a parameter.
x=1
while [ "$x" -le 3 ]
do
echo "pig_dcnt$x.pig will be run"
pig -f "/home/Scripts/PigScripts/pig_dcnt$x.pig" --param timestamp="$timestamp1"
x=$(( x + 1 ))
done
I haven't tested this but I'm pretty sure this will work fine.
Let's assume you have two Pig files that you want to run from a shell script. You would write a shell script file with the following:
#!/bin/bash
pig
exec pig_script_file1.pig
exec pig_script_file2.pig
When you run this shell script, it first executes the pig command and enters the Grunt shell, where it is meant to execute your Pig files in the order listed.
Update:
The above solution doesn't work. Please refer to the one below, which is tested.
Update your script file with the following so that it runs your Pig files in the order you have defined:
#!/bin/bash
pig pig_script_file1.pig
pig pig_script_file2.pig
Here is what you have to do:
1. Keep the xxx.pig file at some location.
2. To execute this Pig script from the shell, use the command below:
pig -p xx=date -p xyz=value -f /path/xxx.pig
Each -p passes one argument to the script (xx=date and xyz=value are examples; add one -p per argument you need), and -f executes the Pig code in the given .pig file.
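
To drive the sequence from a list file, as the question describes, a loop can read one script path per line and stop at the first failure. This is a sketch under a stated assumption: the list file holds paths to .pig files, one per line, which may not match the actual layout of PIGddl.txt. run_from_list and the PIG_CMD override are hypothetical names.

```shell
#!/bin/bash
# PIG_CMD is a hypothetical override so the loop can be tried without Pig.
PIG_CMD=${PIG_CMD:-pig}

# run_from_list LIST_FILE
# Assumes LIST_FILE holds one .pig script path per line. Runs the scripts
# in order and returns 1 as soon as one of them fails.
run_from_list() {
    local list_file=$1 script
    while IFS= read -r script || [ -n "$script" ]; do
        # Skip blank lines in the list.
        [ -z "$script" ] && continue
        echo "running $script"
        # Redirect stdin so the command cannot consume the rest of the list.
        "$PIG_CMD" -f "$script" < /dev/null || return 1
    done < "$list_file"
}
```

Called as run_from_list PIGddl.txt, it runs the listed scripts one after another.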
