how to call Pig scripts from a shell script sequentially

I have a sequence of Pig scripts in a file and I want to execute them from a shell script that runs the Pig scripts sequentially.
For example:
sh script.sh /it/provider/file_name PIGddl.txt
Suppose PIGddl.txt has Pig scripts like:
Record count
Null validation, etc.
If all the Pig queries are in one file, how do I execute the Pig scripts from a shell script?

The idea below works, but if you want a sequential process with conditional flow (e.g. if script 1 succeeds then execute script 2, else execute script 3), you may want to go with Oozie for running and scheduling the jobs (a plain-shell sketch of a single branch follows the loop below).
#!/bin/sh
# $timestamp1 must be defined before this loop runs
x=1
while [ $x -le 3 ]
do
    echo "pig_dcnt$x.pig will be run"
    pig -param timestamp="$timestamp1" -f /home/Scripts/PigScripts/pig_dcnt$x.pig
    x=$(( $x + 1 ))
done
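That said, a single success/failure branch can be done in plain shell. Here is a minimal sketch, relying on pig returning a non-zero exit code when a script fails (the script names reuse the placeholders above):

#!/bin/sh
# run script 2 only if script 1 succeeds, otherwise fall back to script 3
if pig -f /home/Scripts/PigScripts/pig_dcnt1.pig
then
    pig -f /home/Scripts/PigScripts/pig_dcnt2.pig
else
    pig -f /home/Scripts/PigScripts/pig_dcnt3.pig
fi

For anything more involved than a single branch, Oozie's decision nodes are the cleaner tool.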

I haven't tested this, but I'm pretty sure it will work fine.
Let's assume you have two Pig files which you want to run using a shell script; then you would write a shell script file with the following:
#!/bin/bash
pig
exec pig_script_file1.pig
exec pig_script_file2.pig
When you run this shell script, it will first execute the pig command and drop into the Grunt shell, where it will execute your Pig files in the order you mentioned.
Update:
The above solution doesn't work. Please refer to the one below, which is tested.
Update your script file with the following so that it runs your Pig files in the order you have defined:
#!/bin/bash
pig pig_script_file1.pig
pig pig_script_file2.pig
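If, on the other hand, a failure in one script should stop the whole run, a minimal variation (assuming pig exits with a non-zero status when a script fails):

#!/bin/bash
set -e   # abort as soon as any pig script exits with a non-zero status
pig pig_script_file1.pig
pig pig_script_file2.pig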

Here is what you have to do:
1. Keep the xxx.pig file at some location.
2. To execute this Pig script from the shell, use the command below:
pig -p xx=date -p xyz=value -f /path/xxx.pig
Each -p passes one argument into the script (repeat it for every argument you need), and -f executes the Pig code from the .pig file.
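For instance, a small sketch of a parameterized daily run; the script path and parameter names are hypothetical:

#!/bin/bash
# pass today's date and an environment name into the script
run_date=$(date +%Y-%m-%d)
pig -p run_date="$run_date" -p env=prod -f /path/daily_load.pig

Inside the .pig file the values are then referenced as $run_date and $env.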

Related

setFilterIfMissing not working in hbase shell script

I have an HBase shell query which works correctly in PowerShell using the HBase shell:
scan 'table', {FILTER=>"SingleColumnValueFilter('ids','author_ids','=','regexstring:.*USER123.*',true,true)"}
I'm trying to rewrite it as a shell script that can be run from the command line in the background using &, but setFilterIfMissing is not working: records where the specific column is missing are also present in the output, and I'm not sure why.
echo scan "'table', {FILTER=>org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(Bytes.toBytes('ids'),Bytes.toBytes('author_ids'),org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'),org.apache.hadoop.hbase.filter.RegexStringComparator.new('.*USER123.*')).setFilterIfMissing(true)}" | hbase shell -n > output.out
Your help will be much appreciated.
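An untested sketch of a likely fix: in the HBase shell's JRuby, setFilterIfMissing() returns nothing, so chaining it inline passes nil as the FILTER. Building the filter in a variable first and applying the setter avoids that; a here-doc keeps it runnable in the background:

hbase shell -n > output.out <<'EOF' &
f = org.apache.hadoop.hbase.filter.SingleColumnValueFilter.new(Bytes.toBytes('ids'), Bytes.toBytes('author_ids'), org.apache.hadoop.hbase.filter.CompareFilter::CompareOp.valueOf('EQUAL'), org.apache.hadoop.hbase.filter.RegexStringComparator.new('.*USER123.*'))
f.setFilterIfMissing(true)
scan 'table', {FILTER => f}
EOF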

I do not want my Bash script to stop if a Hive command fails

I have a bash script sending a lot of HiveQL commands to Hive. The problem is that I do not want it to stop if one of these commands fails. I tried the usual Bash command:
set +e
but it does not work (the script stops running if one of the Hive commands fails). Do you know where the problem is? An option in my Hive config, or something :-) ?
Thank you !
EDIT: I use the Hive shell, doing something like this:
#Send my command to hive ...
hive -S -e "\"$MyCommand\""
#... but I want my script continue running if the command fails :-).
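A hedged sketch of the usual workaround: force a zero exit status per command so the script keeps going. Note also that the escaped quotes above make Hive receive literal " characters around the command, which can itself be why it fails. The log message here is illustrative:

#Send my command to hive, but never let a failure kill the script
hive -S -e "$MyCommand" || echo "Hive command failed, continuing: $MyCommand" >&2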

running hive script in a shell script via oozie shell action

I have a shell script "test.sh" as below:
#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : "$UDR_START_DT
hive -f tst_tab.hql
The above shell script is saved in a folder in Hadoop:
/scripts/Linux/test.sh
The tst_tab.hql contains a simple CREATE TABLE statement, as I am just testing to get Hive working. This Hive file is saved in the My Documents folder in Hue (the same folder where my workflow is saved).
I have created an Oozie workflow that calls test.sh in a shell action.
Issue I am facing:
The above shell script runs successfully until line 3, but when I add line 4 (hive -f tst_tab.hql), it generates the error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
I verified the YARN logs too; nothing helpful there.
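Two hedged things to check: the Oozie shell action runs on an arbitrary cluster node, so tst_tab.hql must be shipped with the action (add it to the action's <file> list), and the hive client must be on that node's PATH. A small sketch of test.sh that surfaces the underlying error in the launcher logs:

#!/bin/bash
export UDR_START_DT=default.test_tab_$(date +"%Y%m%d" -d "yesterday")
echo "Start date : $UDR_START_DT"
# merge hive's stderr into stdout and report the exit code so the Oozie launcher log shows the cause
hive -f tst_tab.hql 2>&1
rc=$?
echo "hive exit code: $rc"
exit $rc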

is it possible to execute more than one hive query in parallel

I have a script which reads and executes one HQL file at a time, but I want to execute more than one HQL at a time. Please let me know if there is any way to do so.
If you use hive -e 'some command' you can use Bash &:
hive -e 'some command' &
hive -f someFile.hql &
etc.
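A minimal sketch that also waits for both to finish before the script moves on (the query and file name are placeholders):

#!/bin/bash
hive -e 'some command' &   # first query in the background
hive -f someFile.hql &     # second query in the background
wait                       # block until every backgrounded hive process has exited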
Approach 1 (Oozie):
One of the easiest and most straightforward approaches to run all your HQL scripts is to use Oozie: create an Oozie job, define the Hive actions in parallel, and submit your job.
Approach 2 (shell):
Create multiple shell scripts, each containing a hive -e '<<query>>', and run all the shell scripts in parallel with a cron job (or, again, you can use Oozie to run the shell scripts).
Although approach 2 works, I'd recommend approach 1, since Oozie is the way to go for running Hive scripts in parallel.

how to write sqoop jobs using a shell script and run them sequentially?

I need to run a set of Sqoop jobs one after another inside a shell script. How can I achieve it? By default, it runs all the jobs in parallel, which results in performance taking a hit. Should I remove the -m parameter and run?
The -m parameter controls how many parallel map tasks a single sqoop command uses; it does not control parallelism across the separate commands you issue. So removing the -m parameter will not solve the problem: a shell script already runs commands one after another unless you background them with &.
First you need to write a shell script file with your Sqoop commands:
#!/bin/bash
sqoop_command_1
sqoop_command_2
sqoop_command_3
Save the above with some name like sqoop_jobs.sh, then make the shell file executable:
chmod 777 sqoop_jobs.sh
Now you can run your shell file by issuing the following command in your terminal:
./sqoop_jobs.sh
I hope this will help
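For concreteness, a hedged sketch of what sqoop_jobs.sh might contain; the connection string, credentials file, and table names are all placeholders:

#!/bin/bash
set -e   # stop at the first failed import so the jobs stay strictly sequential
sqoop import --connect jdbc:mysql://dbhost/salesdb --username etl --password-file /user/etl/.pw \
    --table orders --target-dir /data/orders -m 4
sqoop import --connect jdbc:mysql://dbhost/salesdb --username etl --password-file /user/etl/.pw \
    --table customers --target-dir /data/customers -m 4

Each import here still uses 4 mappers internally (-m 4), but the second import does not start until the first has finished.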
