Execute multiple sqoop commands from a file - sqoop

I have multiple Sqoop commands and I want to execute them sequentially. How can I do this?
Currently, --options-file allows us to execute one command at a time.

Use a shell script. Write the commands one after another and execute the script. It will definitely work.
#!/bin/bash
echo "*************SQOOP IMPORT JOB UTILITY*******************"
# First Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_1 --username user -password pass -m 1 2> log1.txt
# Second Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_2 --username user -password pass -m 1 2> log2.txt
echo "Check log file for sqoop jobs status"
Then run the shell script:
./myscript.sh
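If the commands already live in Sqoop --options-file files, a small loop can run them one after another. This is only a sketch: the /path/to/optfiles directory, the .opt extension and the per-file log names are placeholders, not part of the original question.
#!/bin/bash
# Sketch: run every Sqoop options file in a directory, one after another.
# Each *.opt file is assumed to contain a complete Sqoop invocation
# (tool name plus its arguments), as --options-file expects.
for opt in /path/to/optfiles/*.opt; do
    echo "Running sqoop --options-file $opt"
    sqoop --options-file "$opt" 2> "$(basename "$opt" .opt).log"
done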

I am not sure whether that is possible with Sqoop alone, but in my case I used Oozie to execute multiple Sqoop commands.

Related

Dump data from beeline in HDFS directory

I am writing a bash script to export a dynamic SQL query into an HQL file in an HDFS directory. I am going to run this bash script through Oozie.
sql_v="select 'create table table_name from user_tab_columns where ...;'"
beeline -u "$sql_v" > local_path
The sql_v variable stores the dynamically generated CREATE TABLE command, which I want to store in an HQL file in an HDFS directory. If I run the above two steps they work fine, because I am writing to a local path, but instead of passing local_path I want to store the SQL in an HDFS directory. Is there a way to pass an HDFS path instead of local_path, like below? This doesn't work. Can I use any other command instead of beeline to achieve this?
beeline -u "$sql_v" | hdfs dfs -appendToFile -
If the goal is to write the output of beeline to an HDFS file, then either of the options below should work, since both pipe beeline's standard output into a hadoop command that reads from standard input (the trailing -):
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -put - /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -appendToFile - /user/userid/file.hql
Note:
1. It's a little unclear from your question and comments why you can't use the suggestion given by @cricket_007 (write the file locally, then copy it to HDFS) and why beeline in particular is needed, e.g.:
echo "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
or
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -appendToFile file.hql /user/userid/file.hql
or
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
2. If an Oozie shell action is used to run the bash script containing sql_v and the beeline command, the beeline client must be present on the node where the shell action runs; otherwise you will get a "beeline: command not found" error.
Refer: beeline-command-not-found-error
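Putting the pieces together, a minimal end-to-end sketch could look like this; the JDBC URL, credentials, output-format flags and HDFS path are placeholders, and it assumes the beeline client is available on the node that runs the script.
#!/bin/bash
# Sketch: build the dynamic DDL with beeline and stream it straight to HDFS.
sql_v="select 'create table ...' from user_tab_columns where ...;"   # dynamic query (placeholder)
beeline -u "jdbc:hive2://host:10000/default" -n user -p pass \
        --silent=true --outputformat=tsv2 -e "$sql_v" \
    | hadoop fs -put -f - /user/userid/file.hql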

Oozie iterative workflow

I am building an application to ingest data from a MySQL DB into Hive tables. The app will be scheduled to execute every day.
The very first action reads a Hive table to load the import-table info (e.g. name, type, etc.) and writes the list of tables to import into a file. Next, a Sqoop action transfers the data for each table in sequence.
Is it possible to create a shell-script Oozie action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide some reference? Any suggestion of a better approach is also welcome!
I have come up with the following shell script containing the Sqoop action. It works fine with some environment-variable tweaking.
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
if hadoop fs -test -e "$hdfs_path"
then
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*")
  do
    echo "${file}"
    TABLENAME=$(hadoop fs -cat "${file}")
    echo "$TABLENAME"
    HDFSPATH=$table_temp_path
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table "$TABLENAME" --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir "$HDFSPATH/$TABLENAME"
  done
fi
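If the metadata is instead a single file with one table name per line, a while-read loop over hadoop fs -cat is a slightly simpler variant. This is only a sketch; the tables.txt file name is an assumption, and the connection details are the same placeholders used above.
#!/bin/bash
# Sketch: import every table listed (one per line) in a metadata file on HDFS.
hdfs_file='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata/tables.txt'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
hadoop fs -cat "$hdfs_file" | while read -r TABLENAME; do
  [ -z "$TABLENAME" ] && continue   # skip blank lines
  echo "Importing $TABLENAME"
  sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username retail_dba --password cloudera --table "$TABLENAME" -m 1 \
    --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
done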

Overwrite HDFS directory in Sqoop import

Is it possible to overwrite the HDFS target directory automatically during a Sqoop import, instead of deleting it manually every time?
(Do we have any option like --overwrite, the way we have --hive-overwrite for Hive imports?)
Use --delete-target-dir
It will delete the <HDFS-target-dir> provided in the command before writing data to that directory.
Use this: --delete-target-dir
This will overwrite the HDFS directory using the following Sqoop syntax:
$ sqoop import --connect jdbc:mysql://localhost/dbname --username username -P --table tablename --delete-target-dir --target-dir '/targetdirectorypath' -m 1
E.g.:
$ sqoop import --connect jdbc:mysql://localhost/abc --username root -P --table empsqooptargetdel --delete-target-dir --target-dir '/tmp/sqooptargetdirdelete' -m 1
This command will refresh the corresponding HDFS directory (or Hive table) with fresh data every time it is run.
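If you prefer (or need) to clear the directory yourself, the same effect can be achieved with an explicit delete before the import. This is just a sketch reusing the placeholder values from the example above.
#!/bin/bash
# Sketch: manual equivalent of --delete-target-dir.
TARGET_DIR='/tmp/sqooptargetdirdelete'
hadoop fs -rm -r -f "$TARGET_DIR"   # -f: do not fail if the directory does not exist yet
sqoop import --connect jdbc:mysql://localhost/abc --username root -P \
  --table empsqooptargetdel --target-dir "$TARGET_DIR" -m 1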

SQOOP export in shell script fails

I am exporting a table from Hive to MySQL with the help of a shell script. Below is the Sqoop export command:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by '|' --lines-terminated-by '\n' --m 4 --batch
The above command works fine from the CLI, but it doesn't work from the shell script; it produces the warning and error below.
Warning:
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '|' has multiple characters; only the first will be used.
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '\n' has multiple characters; only the first will be used.
Error:
15/05/05 13:30:50 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 13:31:56 INFO mapreduce.Job: Task Id : attempt_1430805361424_0046_m_000001_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 'PHONE_NUM|PLAN|DATE|STAUS|BALANCE|IMEI|REGION'
at customer_details.__loadFromFields(customer_details.java:464)
at customer_details.parse(customer_details.java:382)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at customer_details.__loadFromFields(customer_details.java:434)
... 12 more
My Sqoop command in the shell script uses variables, which are expanded:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by "'"'\'"$LINE_SEP""'" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
Any help is highly appreciated.
I was struggling with this for a long time...
At last I found the reason: it is the different treatment of " and ' in the Sqoop command when it is run from the CLI versus from the shell script.
Solution:
I had to change my shell script as follows:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by '\'"$LINE_SEP" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
which issues the Sqoop command as follows, and it works fine:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by | --lines-terminated-by \n --m 4 --batch
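One way to see exactly what the shell hands to Sqoop after expansion is to turn on shell tracing before the command. This is only a debugging sketch; the variable values shown are assumptions matching the example above.
#!/bin/bash
# Sketch: print each command after variable expansion, to spot quoting problems.
TBL_NAME='call_detail_records'
FIELD_SEP='|'
LINE_SEP='\n'                            # two characters: backslash and n
set -x                                   # trace: echo the expanded command before running it
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root \
  --table "$TBL_NAME" --export-dir "/apps/hive/warehouse/xademo.db/$TBL_NAME" \
  --fields-terminated-by "$FIELD_SEP" --lines-terminated-by "$LINE_SEP" \
  --m 4 --batch
set +x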
This is for import: when you run the Sqoop command from the CLI, the option arguments should be wrapped in single quotes ('); when you run it from Oozie, they should not be enclosed in single quotes.
I was using Sqoop from Oozie with the following arguments:
<arg>--fields-terminated-by</arg>
<arg>'\001'</arg>
<arg>--null-string</arg>
<arg>'\\N'</arg>
<arg>--null-non-string</arg>
<arg>'\\N'</arg>
The above arguments didn't work as expected, but the ones below did:
<arg>--fields-terminated-by</arg>
<arg>\001</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>

Sqoop Import Create File Name with Date

I am working on a Sqoop script in which I want to create the target directory with the current date. Does Sqoop have an option like --target-dir /dir1/$DATE? If so, what is the exact syntax?
You can't use $DATE directly inside Sqoop, but you can wrap the command in a shell script and pass the value in as a parameter. For example:
# -----------myscript.sh------------------
DATE=$(date +%Y-%m-%d)   # formatted as YYYY-MM-DD; the raw `date` output contains spaces, which would break --target-dir
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME --username user -password pass -m 1 --target-dir "/user/$DATE"
#------------end script----------------------
Now
Make the script executable:
chmod +x myscript.sh
Run the script file:
./myscript.sh
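A small refinement, still just a sketch: accept the date as an optional argument (so old days can be re-run) and default to today.
#!/bin/bash
# Usage: ./myscript.sh [YYYY-MM-DD]   (defaults to today's date)
DATE=${1:-$(date +%Y-%m-%d)}
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> \
  --table TABLE_NAME --username user -password pass -m 1 \
  --target-dir "/user/$DATE"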
