Sqoop Import Create File Name with Date - sqoop

I am working on a Sqoop script in which I want to create the target directory with the current date in its name. Does Sqoop have an option like --target-dir /dir1/$DATE? If so, what is the exact syntax?

You can't use $DATE directly as a Sqoop option, but you can set it in a shell script and pass it to the command. For example:
# -----------myscript.sh------------------
# Format the date without spaces so it is safe to use in an HDFS path, e.g. 2015-05-05
DATE=$(date +%Y-%m-%d)
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME --username user -password pass -m 1 --target-dir /user/$DATE
#------------end script----------------------
Now make the script executable:
chmod +x myscript.sh
Then run it:
./myscript.sh
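As a rough sketch of the same idea, the date-stamped path can be built in one small function and checked with a dry run before Sqoop is actually called. Plain `date` output contains spaces and colons, so the sketch always passes an explicit format. The helper name `dated_target_dir` and the base path `/user/etl` are assumptions made for illustration; they are not part of Sqoop.

```shell
#!/bin/bash
# dated_target_dir is a hypothetical helper, not a Sqoop feature.
# It prints <base_dir>/<formatted date>, e.g. /user/etl/2015-05-05.
dated_target_dir() {
    local base_dir="$1"
    local date_format="${2:-%Y-%m-%d}"   # default format has no spaces or colons
    printf '%s/%s\n' "${base_dir%/}" "$(date +"$date_format")"
}

TARGET_DIR=$(dated_target_dir /user/etl)

# Dry run: print the command instead of executing it.
echo sqoop import --table TABLE_NAME --target-dir "$TARGET_DIR" -m 1
```

Once the printed path looks right, dropping the leading `echo` and adding the connection options turns the dry run into a real import.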

Related

Oozie iterative workflow

I am building an application to ingest data from a MySQL database into Hive tables. The app will be scheduled to run every day.
The very first action reads a Hive table to load information about the tables to import (name, type, etc.) and writes the list of tables to a file. Next, a Sqoop action transfers the data for each table in sequence.
Is it possible to create a shell script Oozie action that iterates through the table list and launches an Oozie sub-workflow Sqoop action for each table in sequence? Could you provide some references? Suggestions for a better approach are also welcome.
I have come up with the following shell script, which contains the Sqoop command. It works fine after some tweaking of environment variables.
hdfs_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/table_metadata'
table_temp_path='hdfs://quickstart.cloudera:8020/user/cloudera/workflow/hive_temp'
# Proceed only if the metadata directory exists
if hadoop fs -test -e "$hdfs_path"
then
  # Each file under $hdfs_path is expected to contain one table name
  for file in $(hadoop fs -ls "$hdfs_path" | grep -o -e "$hdfs_path/*.*");
  do
    echo "${file}"
    TABLENAME=$(hadoop fs -cat "${file}");
    echo "$TABLENAME"
    # Import the table that was just read, into its own target directory
    sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --table "$TABLENAME" --username=retail_dba --password=cloudera --direct -m 1 --delete-target-dir --target-dir "$table_temp_path/$TABLENAME"
  done
fi
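The looping part of the script above can be tried out locally before wiring it into Oozie. The following is only a sketch of the shell side, not an Oozie sub-workflow: it reads table names from a plain text file, one per line, and runs one import per table, stopping at the first failure. The names `import_tables` and `SQOOP_BIN` are hypothetical, and the connection options are left out for brevity.

```shell
#!/bin/bash
# import_tables and SQOOP_BIN are hypothetical names used for this sketch.
# By default SQOOP_BIN is "echo sqoop", so the commands are printed, not run.
SQOOP_BIN="${SQOOP_BIN:-echo sqoop}"

import_tables() {
    local list_file="$1" target_root="$2" table
    while IFS= read -r table || [ -n "$table" ]; do
        [ -z "$table" ] && continue      # skip blank lines
        # stdin is redirected so the command cannot consume the table list;
        # the loop returns at the first import that fails.
        $SQOOP_BIN import --table "$table" --delete-target-dir \
            --target-dir "$target_root/$table" -m 1 < /dev/null || return 1
    done < "$list_file"
}
```

For example, `import_tables tables.txt /user/cloudera/workflow/hive_temp` prints one command per table; setting `SQOOP_BIN=sqoop` would run the imports for real, provided the connection options are added to the command.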

overwrite hdfs directory Sqoop import

Is it possible to overwrite the HDFS target directory automatically during a Sqoop import, instead of deleting it manually every time?
(Is there an option like "--overwrite", similar to "--hive-overwrite" for Hive imports?)
Use --delete-target-dir
It deletes the <HDFS-target-dir> given in the command before writing data to that directory.
Use this: --delete-target-dir
It overwrites the HDFS directory. The Sqoop syntax is:
$ sqoop import --connect jdbc:mysql://localhost/dbname --username username -P --table tablename --delete-target-dir --target-dir '/targetdirectorypath' -m 1
For example:
$ sqoop import --connect jdbc:mysql://localhost/abc --username root -P --table empsqooptargetdel --delete-target-dir --target-dir '/tmp/sqooptargetdirdelete' -m 1
Every time this command is run, it replaces the contents of the corresponding HDFS directory (or Hive table data) with fresh data.

beeline query in bash script

Below is a simple, working beeline query. It runs when I put it in a script, but I want to pass the path as a hivevar. How do I accomplish this? When I put ='path' in my script's .properties file, it does not seem to work. I think I am missing something with the single quotes, and I just can't get it to work.
maxValQuery.hql
WORKING: INSERT OVERWRITE DIRECTORY '/user/tmp/maxVal' select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};
WANTED: INSERT OVERWRITE DIRECTORY ${hivevar:PATH_ON_HDFS} select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};
script.sh
#! /bin/bash
# I want to add --hivevar PATH_ON_HDFS=${maxValPathOnHDFS}
beeline \
-u $hiveServer2 \
--hivevar DATABASE_NAME_ON_HIVE=${dbNameOnHive} \
--hivevar FACT_TABLE=${mainFactTableOnHive} \
--hivevar MAX_VAL_COL=${factTableIncrementalColumn} \
-f ${maxValQueryFile}
script.properties
dbNameOnHive=poc
mainFactTableOnHive=factTable
factTableIncrementalColumn=aTimeColumn
maxValQueryFile=maxValQuery.hql
#maxValPathOnHDFS='/user/tmp/maxVal'
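# I believe the problem is the single quotes on the line above (and yes, I uncomment that line when I execute).
I removed the single quotes from the properties file and put them around the hivevar in the query instead:
script.properties: maxValPathOnHDFS=/user/tmp/maxVal
maxValQuery.hql: INSERT OVERWRITE DIRECTORY '${hivevar:PATH_ON_HDFS}' select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};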
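To see what beeline would actually receive for the path variable, the arguments can be printed one per line before the real call is made. This is a sketch under the assumption that the properties end up as ordinary shell variables; `show_args` is a hypothetical stand-in for beeline and only prints its arguments.

```shell
#!/bin/bash
# Values as they would appear in script.properties, with no quotes around the path.
maxValPathOnHDFS=/user/tmp/maxVal
maxValQueryFile=maxValQuery.hql

# show_args is a hypothetical stand-in for beeline: it prints each argument in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoting the whole NAME=value pair keeps it a single argument.
show_args \
    --hivevar "PATH_ON_HDFS=${maxValPathOnHDFS}" \
    -f "${maxValQueryFile}"
```

The second printed line shows the bare path with no quote characters, which is why the quotes have to come from the query text itself, as in '${hivevar:PATH_ON_HDFS}'.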

SQOOP export in shell script fails

I am exporting a table from Hive to MySQL with the help of a shell script. Below is the Sqoop export command:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by '|' --lines-terminated-by '\n' --m 4 --batch
The above command works fine from the CLI, but it doesn't work from the shell script, where it generates the warning and error below.
Warning :
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '|' has multiple characters; only the first will be used.
15/05/05 13:30:06 WARN sqoop.SqoopOptions: Character argument '\n' has multiple characters; only the first will be used.
Error:
15/05/05 13:30:50 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 13:31:56 INFO mapreduce.Job: Task Id : attempt_1430805361424_0046_m_000001_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: 'PHONE_NUM|PLAN|DATE|STAUS|BALANCE|IMEI|REGION'
at customer_details.__loadFromFields(customer_details.java:464)
at customer_details.parse(customer_details.java:382)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:834)
at customer_details.__loadFromFields(customer_details.java:434)
... 12 more
The Sqoop command in my shell script uses variables, which are expanded when the script runs:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by "'"'\'"$LINE_SEP""'" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
Any help is highly appreciated.
I have been struggling with this for a long time...
At last I found the reason: " and ' are treated differently in the Sqoop command when I run it from the CLI and when I run it from the shell script.
Solution:
I had to change the command in my shell script as follows:
nohup sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table $TBL_NAME --export-dir $HIVE_DIR --fields-terminated-by "$FIELD_SEP" --lines-terminated-by '\'"$LINE_SEP" --m $NUM_MAPPERS --batch > $sqoop_outs/$TBL_NAME.out 2>&1 &
This issues the Sqoop command as follows, and it worked fine:
sqoop export --connect jdbc:mysql://192.168.154.129:3306/ey -username root --table call_detail_records --export-dir /apps/hive/warehouse/xademo.db/call_detail_records --fields-terminated-by | --lines-terminated-by \n --m 4 --batch
This is for import.
When you run the Sqoop command from the CLI, the arguments to the options should be enclosed in single quotes ('). When you run it from Oozie, on the other hand, they should not be enclosed in single quotes.
I was using Sqoop from Oozie with the following arguments:
<arg>--fields-terminated-by</arg>
<arg>'\001'</arg>
<arg>--null-string</arg>
<arg>'\\N'</arg>
<arg>--null-non-string</arg>
<arg>'\\N'</arg>
The above code didn't work as expected, but the code below did:
<arg>--fields-terminated-by</arg>
<arg>\001</arg>
<arg>--null-string</arg>
<arg>\\N</arg>
<arg>--null-non-string</arg>
<arg>\\N</arg>
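One likely reading of the "has multiple characters" warning is that the quote characters themselves reached Sqoop as part of the argument. The sketch below shows the general shell behaviour behind that: quotes typed on the command line are removed by the shell, while quotes stored inside a variable's value are data and survive expansion. `count_chars` is a hypothetical helper used only to make the difference visible.

```shell
#!/bin/bash
# count_chars is a hypothetical helper: it prints the length of its first argument.
count_chars() { echo "${#1}"; }

# Typed directly: the shell strips the quotes, the program receives |
count_chars '|'             # prints 1

# Quotes stored as part of the value: the program receives '|'
FIELD_SEP="'|'"
count_chars "$FIELD_SEP"    # prints 3

# Value stored without inner quotes: the program receives | again
FIELD_SEP='|'
count_chars "$FIELD_SEP"    # prints 1
```

Oozie <arg> elements behave like the second case: there is no shell to strip the quotes, so whatever is inside the element is passed through unchanged.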

Execute multiple sqoop commands from a file

I have multiple Sqoop commands, and I want to execute them sequentially. How can I do this?
Currently, --options-file only lets me execute one command at a time.
Use a shell script. Write the commands one after another and execute the script. It will definitely work.
#!/bin/bash
echo "*************SQOOP IMPORT JOB UTILITY*******************"
# First Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_1 --username user -password pass -m 1 2> log1.txt
# Second Sqoop command
echo
sqoop import --connect jdbc:db2://localhost:<PORT_NUMBER>/<DB> --table TABLE_NAME_2 --username user -password pass -m 1 2> log2.txt
echo "Check log file for sqoop jobs status"
Run the shell script:
./myscript.sh
I am not sure whether that is possible with Sqoop alone, but in my case I have used Oozie to execute multiple Sqoop commands.
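If a later import should not start after an earlier one has failed, the commands can be chained on their exit status. The following is a minimal sketch of that pattern; `run_step` and the log paths under /tmp are hypothetical, and `true` stands in for the real sqoop import commands.

```shell
#!/bin/bash
# run_step is a hypothetical helper: it runs one command, sends all of its
# output to a log file, and returns non-zero if the command fails.
run_step() {
    local log_file="$1"; shift
    echo "Running: $*"
    if ! "$@" > "$log_file" 2>&1; then
        echo "FAILED (see $log_file): $*" >&2
        return 1
    fi
}

# Each step runs only if the previous one succeeded.
# 'true' is a placeholder for a real "sqoop import ..." command.
run_step /tmp/sqoop_step1.log true &&
run_step /tmp/sqoop_step2.log true &&
echo "All imports finished"
```

Replacing each `true` with the full sqoop command keeps the imports sequential and gives one log file per import.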
