hive-import and hive-overwrite with sqoop import all - hadoop

sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-database test --hive-import;
The above command works fine, but it duplicates the values in the destination tables. I used the command below to overwrite the data.
sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-import --hive-database Test --hive-overwrite
This replaced all the values in the table and inserted only null values. It still does not work if I remove --hive-import. What am I doing wrong here?

This will solve the problem.
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/SomeDB \
  --username root \
  --hive-import \
  --warehouse-dir /user/hive/warehouse/Test \
  --hive-database Test \
  --hive-overwrite
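If you want to sanity-check the result after re-running the import, a quick count from the Hive shell should show a single copy of the rows (t1 here is just a placeholder for one of your imported tables):
hive -e "USE Test; SELECT COUNT(*) FROM t1;"
Running it before and after a second import should return the same count if --hive-overwrite is taking effect.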

Related

Passing date parameter to sqoop import into Hive table

I am importing a set of tables from an Oracle database into Hive using a sqoop import statement as follows:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect CONNECTIONSTRING --table TABLENAME --username USERNAME --password PASSWORD --hive-import --hive-drop-import-delims --hive-overwrite --hive-table HIVE_TABLE_NAME1 --null-string '\N' --null-non-string '\N' -m 1
and I am using the following check-column argument in this sqoop statement for incremental loads:
--check-column COLUMN_NAME --incremental lastmodified --last-value HARDCODED_DATE
I tested this and it works great, but I want to make it dynamic so that I don't have to hard-code the date into the statement and can just pass it as a parameter; the import would then check the specified column and get all the data after that date. I understand that the date has to be passed from a different file, but I am not really sure what the structure of that file should be or how it would reference this sqoop statement. Any help or guidance would be greatly appreciated. Thank you in advance!
You can use a sqoop job for this.
With a sqoop job, set --last-value to 0 initially; the job stores and updates the last imported value after each run, so you only have to run sqoop job --exec <<job_name>> every time and it imports the new data without any hardcoded value.
sqoop job --create <<job_name>> -- import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect <<db_url>> --table <<table_name>> --username <<username>> --password <<password>> --hive-import --hive-drop-import-delims --hive-overwrite --hive-table <<hive_table>> --null-string '\N' --null-non-string '\N' -m 1 --incremental lastmodified --check-column timedate --last-value 0
sqoop job --exec <<job_name>>
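To confirm that the stored last-value is advancing between runs, you can inspect the saved job definition (the commands below assume the same <<job_name>> used when creating the job):
sqoop job --list
sqoop job --show <<job_name>>
The --show output should list the stored last-value that the next --exec run will start from.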
For more details visit https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_job_literal

Passing libjars in sqoop job

I need to pass libjars in a Sqoop import, but it fails with "ERROR tool.BaseSqoopTool: Unrecognized argument: libjars".
My Sqoop command is:
sqoop job --create myjob -- import \
  -libjars /var/lib/sqoop/db2jcc4.jar,/var/lib/sqoop/db2jcc.jar \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xyz/db2/db2_password.jceks \
  --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://server:3714/XYX \
  --username user --password-alias db2.password.alias \
  --table db.table_name --fields-terminated-by '\001' \
  --null-string '\N' --delete-target-dir --target-dir /user/jainm2/test_data1 \
  -split-by "col_name" -m 3 --delete-target-dir \
  --incremental append --last-value "2005-02-14 16:23:25"
As per the Sqoop documentation, generic arguments should be provided right after sqoop job:
sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
Try it this way and let me know if it works.
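For example, the command from the question could be rewritten with -libjars and -D placed immediately after sqoop job; this is only a sketch of that reordering, with the remaining options carried over from the question (the duplicate --delete-target-dir dropped and -split-by written with the usual double dash):
sqoop job -libjars /var/lib/sqoop/db2jcc4.jar,/var/lib/sqoop/db2jcc.jar \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xyz/db2/db2_password.jceks \
  --create myjob -- import \
  --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://server:3714/XYX \
  --username user --password-alias db2.password.alias \
  --table db.table_name --fields-terminated-by '\001' \
  --null-string '\N' --delete-target-dir --target-dir /user/jainm2/test_data1 \
  --split-by "col_name" -m 3 \
  --incremental append --last-value "2005-02-14 16:23:25"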

Importing vertica data to sqoop

I am importing Vertica data with Sqoop 1 on a MapR cluster. I use the following command:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*******:5433/db_name" --password "password" --username "username" --table "schemaName.tableName" --columns "id" --target-dir "/t" --verbose
This command gives me the following error:
Caused by: com.vertica.util.ServerException: [Vertica][VJDBC](4856) ERROR: Syntax error at or near "."
I read https://groups.google.com/a/cloudera.org/forum/#!msg/cdh-user/xIBwvc_eOp0/TvhANQfvcv4J for more information on this, but it wasn't very helpful since the answers there were about Sqoop2.
When I run this command:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*******:5433/db_name" --password "password" --username "username" --table "tableName" --columns "id" --target-dir "/t" --verbose
It gives the error: Relation "tableName" doesn't exist.
I have added the required Vertica JDBC jars to the sqoop library too.
Any help on how to specify the schema name in sqoop for Vertica?
You can specify the schema name to use in the connection string like this:
--connect "jdbc:vertica://*******:5433/db_name?searchpath=myschema"
I changed the statement to use --query, and schema.table works fine there. So the statement is:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*****:5433/dbName" --password "*****" --username "******" --target-dir "/tmp/cdsdj" --verbose --query 'SELECT t.col1 FROM schema.tableName t where $CONDITIONS'

Appending Data to hive Table using Sqoop

I am trying to append data to an already existing table in Hive. First, I import the table from MS SQL Server to Hive using the following command.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now I want to append the data where "Batch_Id < 100" to the same existing table in Hive.
I am using the following command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command runs successfully and updates the HDFS data, but when I connect to the Hive shell and query the table, the appended records are not visible.
Sqoop updated the data on HDFS at "/user/hduser/my_table", but the data under "/user/hive/warehouse/batch_dim" is not updated.
How can I resolve this issue?
Regards,
Bhagwant Bhobe
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase"
--table "my_table" --where "Batch_Id < 100"
--username myuser --password mypassword
--hive-import --hive-table my_table
When you are using --hive-import, DO NOT use the --append parameter.
The Sqoop command you're using (--import) is only for ingesting records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).
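Once the import is re-run with --hive-import, a quick check from the Hive shell should show the appended rows (assuming the Batch_Id column was imported as-is):
hive -e "SELECT COUNT(*) FROM my_table WHERE Batch_Id < 100;"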

How to use a specified Hive database when using Sqoop import

sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import
The above command imports table tb into the 'default' Hive database.
Can I use other database instead?
Off the top of my head, I recall you can specify --hive-table foo.tb,
where foo is your hive database and tb is your hive table.
so in your case it would be:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-table foo.tb
As a footnote, here is the original jira issue https://issues.apache.org/jira/browse/SQOOP-322
Hive database using Sqoop import:
sqoop import --connect jdbc:mysql://localhost/arun --table account --username root --password root -m 1 --hive-import --hive-database company --create-hive-table --hive-table account --target-dir /tmp/customer/ac
You can specify the database name as a part of the --hive-table parameter, e.g. "--hive-table foo.tb".
There is a new request to add a special parameter for the database that is being tracked: SQOOP-912.
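If your Sqoop version supports it, the --hive-database flag used in the example above does the same job as the foo. prefix; adapting the command from the question (foo again stands in for your Hive database):
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-database foo --hive-table tb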
