SQOOP export in shell script fails

I am exporting a table from Hive to MySQL using a shell script. Below is the sqoop export command:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by '|' --lines-terminated-by '\n' --m 4 --batch
The above command works fine from the CLI, but it doesn't work from the shell script, where it generates the warning and error below.
Warning:
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '|' has multiple characters; only the first will be used.
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '\n' has multiple characters; only the first will be used.
Error:
15/05/05 13:30:50 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 13:31:56 INFO mapreduce.Job: Task Id : attempt_1430805361424_0046_m_000001_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 'PHONE_NUM|PLAN|DATE|STAUS|BALANCE|IMEI|REGION'
at customer_details.__loadFromFields(customer_details.java:464)
at customer_details.parse(customer_details.java:382)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at customer_details.__loadFromFields(customer_details.java:434)
... 12 more
In the shell script, my Sqoop command uses variables that are expanded at run time:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by "'"'\'"$LINE_SEP""'" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
Any help is highly appreciated.
I have been struggling with this for a long time...

At last I found the reason: the " and ' characters in the Sqoop command are treated differently when it is run from the CLI and from the shell script.
Solution:
I had to change the command in my shell script as follows:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by '\'"$LINE_SEP" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
This issues the Sqoop command as follows, without quotes around the delimiters, and it worked fine:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by | --lines-terminated-by \n --m 4 --batch
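To see why the two forms differ, it helps to print exactly what each quoting expression expands to before Sqoop ever sees it. The following is a minimal sketch, assuming LINE_SEP holds the single letter n (the variable's actual value is not shown in the question); the names broken and fixed are only for illustration.

```shell
#!/bin/sh
# Assumption: LINE_SEP holds the letter "n", as the fixed command suggests.
LINE_SEP='n'

# Original form: "'" adds a literal quote, '\' adds a backslash,
# then comes the variable, then another literal quote.
broken="'"'\'"$LINE_SEP""'"

# Fixed form: only a backslash followed by the variable.
fixed='\'"$LINE_SEP"

# %s prints the argument as-is, without interpreting the backslash.
printf 'broken: %s (%d characters)\n' "$broken" "${#broken}"
printf 'fixed:  %s (%d characters)\n' "$fixed" "${#fixed}"
```

The first value is four characters long because the single quotes are part of it, which matches the warning that the argument has multiple characters. The second value is just the two characters \n.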

This applies to import.
When you run the sqoop command from the CLI, the option arguments should be enclosed in single quotes ('). When you run it from Oozie, on the other hand, they should not be enclosed in single quotes.
I was using Sqoop from Oozie with the following arguments:
<arg>--fields-terminated-by</arg>
<arg>'\001'</arg>
<arg>--null-string</arg>
<arg>'\\N'</arg>
<arg>--null-non-string</arg>
<arg>'\\N'</arg>
The above didn't work as expected, but the following did:
<arg>--fields-terminated-by</arg>
<arg>\001</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>
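The difference comes down to who removes the quotes. A shell strips one level of quoting before it starts the program, whereas, as far as I understand, Oozie hands the text of each <arg> element to Sqoop unchanged. The sketch below illustrates this in a shell; show_args is a hypothetical helper, and the second call only simulates the Oozie case by keeping the single quotes inside the value.

```shell
#!/bin/sh
# Hypothetical helper: prints each argument it receives in brackets.
show_args() {
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

# CLI case: the shell removes the single quotes before the program runs.
show_args --fields-terminated-by '\001'

# Simulated pass-through case: the single quotes stay part of the value.
show_args --fields-terminated-by "'\001'"
```

In the first call the program receives \001; in the second it receives '\001' including the quotes, which is not the intended delimiter.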

Related

Pass queue name in mapreduce job via command line

How can we pass the queue name to a MapReduce job when running it from the command line? I have tried passing it as:
set -e; export HADOOP_USER_CLASSPATH_FIRST='true';export HADOOP_OPTS='-Djava.io.tmpdir=/tmp'; export HADOOP_CLASSPATH='/path/to/jars-1.0.jar';sudo -E -u myUser hadoop jar /path/to/jar com.pacakage.ClassName -D mapred.job.queue.name=prod_queue --input {inputPath} --output {outputPath}
I also tried to set mapred.job.queue.name as:
set -e; export HADOOP_USER_CLASSPATH_FIRST='true';export HADOOP_OPTS='-Djava.io.tmpdir=/tmp'; export HADOOP_CLASSPATH='/path/to/jars-1.0.jar';set mapred.job.queue.name=prod_queue;sudo -E -u myUser hadoop jar /path/to/jar com.pacakage.ClassName --input {inputPath} --output {outputPath}
Neither of the above commands works, and the error I get is:
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxxxx to YARN : Application application_xxxxx submitted by user myUser to unknown queue: default
After Hadoop 2.4.1, the property name is mapreduce.job.queuename.
If it still does not work via the command line, you can try setting the property directly in the job:
job.getConfiguration().set("mapreduce.job.queuename", queue);
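On the command line, the property is normally passed as a generic option, -D name=value, placed after the main class and before the job's own arguments. This only takes effect if the driver class parses generic options (for example by running through ToolRunner), which is an assumption about the job in the question. Note also that the shell builtin set used in the second attempt only changes the shell's positional parameters; it does not set a Hadoop property. The sketch below prints the command rather than running it; run_or_print, DRY_RUN and the input and output paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

QUEUE='prod_queue'          # queue name from the question
INPUT='/path/to/input'      # placeholder
OUTPUT='/path/to/output'    # placeholder

# Generic options such as -D go after the class name and before
# the application's own arguments.
run_or_print hadoop jar /path/to/jar com.pacakage.ClassName \
  -D "mapreduce.job.queuename=${QUEUE}" \
  --input "$INPUT" --output "$OUTPUT"
```

Setting DRY_RUN=0 would execute the command instead of printing it.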

Oozie iterative workflow

I am building an application to ingest data from a MySQL database into Hive tables. The app will be scheduled to run every day.
The very first action reads a Hive table to load the import table info (e.g. name, type) and creates a file listing the tables to import. Next, a Sqoop action transfers the data for each table in sequence.
Is it possible to create a shell script Oozie action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide some references? Suggestions for a better approach are also welcome!
I have come up with the following shell script containing the Sqoop action. It works fine with some environment variable tweaking.
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
# Proceed only if the metadata directory exists.
if hadoop fs -test -e "$hdfs_path"
then
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*");
  do
    echo "${file}"
    # Each metadata file is assumed to contain a single table name.
    TABLENAME=$(hadoop fs -cat "${file}");
    echo "$TABLENAME"
    HDFSPATH=$table_temp_path
    # Import the table named in the file (the original hard-coded "departments").
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table "$TABLENAME" --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
  done
fi
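If the table names can be collected into a single text file with one name per line, a plain while read loop avoids parsing hadoop fs -ls output. This is a minimal sketch under that assumption; the function import_table, the temporary list file and the sample table names are hypothetical, and the function only prints the Sqoop command it would run.

```shell
#!/bin/sh
# Hypothetical stand-in for the real import: prints instead of running Sqoop.
import_table() {
  echo "sqoop import --table $1 --delete-target-dir --target-dir $2/$1"
}

table_temp_path='/user/cloudera/workflow/hive_temp'

# Sample list for illustration; in practice an earlier step would create it.
list_file=$(mktemp)
printf '%s\n' departments categories > "$list_file"

# Read one table name per line, skipping blank lines.
while IFS= read -r table; do
  [ -n "$table" ] || continue
  import_table "$table" "$table_temp_path"
done < "$list_file"

rm -f "$list_file"
```

If the real command inside the loop reads standard input, redirecting its input from /dev/null keeps it from consuming the remaining lines of the list.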

overwrite hdfs directory Sqoop import

Is it possible to overwrite the HDFS directory automatically during a Sqoop import, instead of overwriting it manually every time?
(Is there an option like "--overwrite", similar to "--hive-overwrite" for Hive imports?)
Use --delete-target-dir
It deletes the <HDFS-target-dir> given in the command before writing data to this directory.
Use this: --delete-target-dir
This overwrites the HDFS directory. The Sqoop syntax is:
$ sqoop import --connect jdbc:mysql://localhost/dbname --username username -P --table tablename --delete-target-dir --target-dir '/targetdirectorypath' -m 1
E.g:
$ sqoop import --connect jdbc:mysql://localhost/abc --username root -P --table empsqooptargetdel --delete-target-dir --target-dir '/tmp/sqooptargetdirdelete' -m 1
This command refreshes the corresponding HDFS directory or Hive table data with fresh data every time it is run.

Sqoop Import Create File Name with Date

I am working on a Sqoop script in which I want to create the target directory with the current date. Does Sqoop have an option like --target-dir /dir1/$DATE? If so, what is the exact syntax?
You can't add $DATE to Sqoop directly, but you can use a shell script and set the value in the script. For example:
# -----------myscript.sh------------------
# Use a fixed date format; the default `date` output contains spaces and colons.
DATE=$(date +%Y-%m-%d)
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME --username user -password pass -m 1 --target-dir "/user/$DATE"
#------------end script----------------------
Now add execute permission to the script file:
chmod +x myscript.sh
Then run the script file:
./myscript.sh
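The date format matters here, because the target path should not contain spaces or colons. A small sketch, assuming a base directory of /dir1 as in the question; the variable names are illustrative, and the Sqoop option is only echoed rather than executed.

```shell
#!/bin/sh
# ISO-style date (year-month-day) keeps the path free of spaces and colons.
DATE=$(date +%Y-%m-%d)
TARGET_DIR="/dir1/$DATE"

# Print the option as it would be passed to sqoop import.
echo "--target-dir $TARGET_DIR"
```

Because the shell expands the variable before Sqoop starts, Sqoop only ever sees the finished path.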

Execute multiple sqoop commands from a file

I have multiple sqoop commands, and I want to execute them sequentially. How can I do this?
Currently, --options-file allows us to execute only one command at a time.
Use a shell script. Write the commands one after another and execute the script. It will definitely work.
#!/bin/bash
echo "*************SQOOP IMPORT JOB UTILITY*******************"
# First Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_1 --username user -password pass -m 1 2> log1.txt
# Second Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_2 --username user -password pass -m 1 2> log2.txt
echo "Check log file for sqoop jobs status"
Run the shell script:
./myscript.sh
I am not sure whether that is possible with Sqoop alone, but in my case I used Oozie to execute multiple Sqoop commands.
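A plain script keeps going after a failed import. If later imports should not run once an earlier one has failed, the exit status can be checked after each step. The following sketch assumes that behaviour is wanted; run_step and run_all are hypothetical helpers, and true and false stand in for the real sqoop commands.

```shell
#!/bin/sh
# Hypothetical helper: runs a labelled command and reports its exit status.
run_step() {
  label=$1
  shift
  if "$@"; then
    echo "$label: OK"
  else
    status=$?
    echo "$label: FAILED (exit $status)"
    return "$status"
  fi
}

# Stop at the first failure; `true` and `false` are placeholders
# for the real sqoop import commands.
run_all() {
  run_step "import 1" true  || return 1
  run_step "import 2" false || return 1
  run_step "import 3" true  || return 1
}

if run_all; then
  echo "all imports succeeded"
else
  echo "stopped after a failure"
fi
```

With these placeholders the second step fails, so the third step is never started.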
