Passing libjars in sqoop job

I need to pass -libjars to a Sqoop import, but it fails with "ERROR tool.BaseSqoopTool: Unrecognized argument: libjars".
My Sqoop command is:
sqoop job --create myjob -- import \
  -libjars /var/lib/sqoop/db2jcc4.jar,/var/lib/sqoop/db2jcc.jar \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xyz/db2/db2_password.jceks \
  --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://server:3714/XYX \
  --username user --password-alias db2.password.alias \
  --table db.table_name --fields-terminated-by '\001' \
  --null-string '\N' --delete-target-dir --target-dir /user/jainm2/test_data1 \
  -split-by "col_name" -m 3 --delete-target-dir \
  --incremental append --last-value "2005-02-14 16:23:25"

As per the Sqoop documentation, generic arguments should be provided immediately after sqoop job:
sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
Try it this way and let me know if it works.
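For example, keeping the job arguments from the question but moving the generic arguments so they come right after sqoop job would look roughly like this (a sketch of the placement only, not a verified command; -split-by is normalized to --split-by and --delete-target-dir is dropped, since delete mode is generally not combined with incremental imports):

sqoop job \
  -libjars /var/lib/sqoop/db2jcc4.jar,/var/lib/sqoop/db2jcc.jar \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xyz/db2/db2_password.jceks \
  --create myjob -- import \
  --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://server:3714/XYX \
  --username user --password-alias db2.password.alias \
  --table db.table_name --fields-terminated-by '\001' \
  --null-string '\N' --target-dir /user/jainm2/test_data1 \
  --split-by "col_name" -m 3 \
  --incremental append --last-value "2005-02-14 16:23:25"

If the generic arguments are not recognized in that position on your Sqoop version, the same -libjars and -D options can also be tried immediately after the import subtool name.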

Related

hive-import and hive-overwrite with sqoop import all

sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-database test --hive-import;
The above command works fine, but it duplicates the values in the destination tables. I used the command below to overwrite the data.
sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-import --hive-database Test --hive-overwrite
This replaced all the values in the table with nulls. It also does not work if I remove --hive-import. What am I doing wrong here?
This will solve the problem.
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/SomeDB \
  --username root \
  --hive-import \
  --warehouse-dir /user/hive/warehouse/Test \
  --hive-database Test \
  --hive-overwrite
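As a quick sanity check after the overwrite run, you can count the rows of one of the imported tables in Hive and compare against the source; the table name below is just a placeholder:

hive -e "SELECT COUNT(*) FROM Test.some_table"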

Passing date parameter to sqoop import into Hive table

I am importing a set of tables from an Oracle database into Hive using a sqoop import statement as follows:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect CONNECTIONSTRING --table TABLENAME --username USERNAME --password PASSWORD --hive-import --hive-drop-import-delims --hive-overwrite --hive-table HIVE_TABLE_NAME1 --null-string '\N' --null-non-string '\N' -m 1
and I am using the following check-column arguments in this sqoop statement for incremental loads:
--check-column COLUMN_NAME --incremental lastmodified --last-value HARDCODED_DATE
I tested this and it works great, but I want to make it dynamic so that I don't have to hard-code the date into the statement and can instead pass it as a parameter, so that it checks the specified column and gets all the data after that date. I understand that the date has to be passed from a different file, but I am not really sure what the structure of that file should be or how it would reference this sqoop statement. Any help or guidance would be greatly appreciated. Thank you in advance!
You can use a sqoop job for this.
With a sqoop job, set --last-value to 0 for the first run. The job stores the updated last value after every import, so you only have to run sqoop job --exec <<job_name>> each time and it will pick up the new data without any hardcoded value.
sqoop job --create <<job_name>> -- import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect <<db_url>> --table <<db_name>> --username <<username>> --password <<password>> --hive-import --hive-drop-import-delims --hive-overwrite --hive-table <<hive_table>> --null-string '\N' --null-non-string '\N' -m 1 --incremental lastmodified --check-column timedate --last-value 0
sqoop job --exec <<job_name>>
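If you want to confirm that the job is actually advancing the stored last value after each run, you can inspect the saved job definition; the stored last value is printed along with the other saved parameters:

sqoop job --show <<job_name>>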
For more details visit https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_job_literal

Sqoop-Imported data is not shown in the target directory

I have imported data from MySQL to HDFS with Sqoop, but I am not able to see the imported data in the given target path.
My Sqoop command is:
sqoop job --create EveryDayImport --import --connect jdbc:mysql://localhost:3306/books --username=root --table=authors -m 1 --target-dir /home/training/viresh/Sqoop/authors1234 --incremental append --check-column id --last-value 0;
There is a mistake in your Sqoop statement: you missed the space between "--" and import, as mentioned in the comment by dev. The corrected statement is:
sqoop job --create EveryDayImport -- import --connect jdbc:mysql://localhost:3306/books --username=root --table=authors -m 1 --target-dir /home/training/viresh/Sqoop/authors1234 --incremental append --check-column id --last-value 0
Also, your sqoop statement only creates a sqoop job. To execute the job (the actual sqoop import), you have to submit it with the statement below.
$ sqoop job --exec EveryDayImport
I believe this is the reason no data is present in your target directory.
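To confirm that the job was saved under the expected name before executing it, you can list the saved jobs:

$ sqoop job --list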

saved sqoop job not using time zone of the server

The following saved sqoop job uses a timezone other than that of the server on which the job is saved.
sqoop job --create myjob9 -- import --connect jdbc:oracle:thin:#xyz:1234/abc --check-column LAST_UPDATE_DATETIME --incremental lastmodified --last-value "2015-02-15 19.19.37.000000000" --hive-import --table SIM_UNAUDITED_SALES_TMP --append
The last value recorded when the job is executed is 1 hour ahead of the system time. How do I sync the timezone?
You can use the following generic argument to set the timezone:
-D mapreduce.map.java.opts=" -Duser.timezone=$your_timezone"
Be careful to place this generic argument before the job arguments, so the command looks like this:
sqoop job -D mapreduce.map.java.opts=" -Duser.timezone=$your_timezone" --create myjob9 -- import --connect jdbc:oracle:thin:#xyz:1234/abc --check-column LAST_UPDATE_DATETIME --incremental lastmodified --last-value "2015-02-15 19.19.37.000000000" --hive-import --table SIM_UNAUDITED_SALES_TMP --append
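For a concrete value, $your_timezone should be a standard Java timezone ID; assuming, for example, that the server runs in US Eastern time (the zone here is only an illustration), the argument becomes:

-D mapreduce.map.java.opts=" -Duser.timezone=America/New_York"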

How to run a sqoop import and associate the task with specific scheduler queue

I am stuck in a situation where I need to run a sqoop import and submit the resulting MR job to a specific scheduler queue.
I tried the following command but it doesn't work.
/usr/bin/sqoop import -Dmapred.job.queue.name=scheduledjobs --username=hduser --password=XXXXXXX --connect jdbc:mysql://127.0.0.1/analytics --fields-terminated-by ',' --query "SELECT email FROM analytics.store WHERE \$CONDITIONS" -m1 --hive-import --hive-table "abce.ucsd" --hive-overwrite --target-dir /result/
Also this did not work
/usr/bin/sqoop import --Dmapred.job.queue.name=scheduledjobs --username=hduser --password=XXXXXXX --connect jdbc:mysql://127.0.0.1/analytics --fields-terminated-by ',' --query "SELECT email FROM analytics.store WHERE \$CONDITIONS" -m1 --hive-import --hive-table "abce.ucsd" --hive-overwrite --target-dir /result/
Please let me know what I am doing wrong.
This is an old question, but maybe the answer will help somebody else. The sqoop import above has one extra - before Dmapred.job.queue.name.
You have
/usr/bin/sqoop import --Dmapred.job.queue.name=scheduledjobs --username=hduser
and it should be
/usr/bin/sqoop import -Dmapred.job.queue.name=scheduledjobs --username=hduser
If you use the one with --, it will fail with an Error parsing arguments message.
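As a side note, mapred.job.queue.name is the older property name; on Hadoop 2 and later it is deprecated in favor of mapreduce.job.queuename, although the old name is still recognized. The equivalent generic option would be:

/usr/bin/sqoop import -Dmapreduce.job.queuename=scheduledjobs --username=hduser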
