Importing Vertica data to Sqoop - Hadoop

I am importing Vertica data with Sqoop 1 on a MapR cluster. I use the following command:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*******:5433/db_name" --password "password" --username "username" --table "schemaName.tableName" --columns "id" --target-dir "/t" --verbose
This command fails with the following error:
Caused by: com.vertica.util.ServerException: [Vertica][VJDBC](4856) ERROR: Syntax error at or near "."
I read https://groups.google.com/a/cloudera.org/forum/#!msg/cdh-user/xIBwvc_eOp0/TvhANQfvcv4J for more information on this, but it wasn't very helpful, since the answers there apply to Sqoop 2.
When I run this command:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*******:5433/db_name" --password "password" --username "username" --table "tableName" --columns "id" --target-dir "/t" --verbose
It gives the error: Relation "tableName" doesn't exist.
I have added the required vertica-jdk jars in sqoop library too.
Any help on how to specify the schema name in Sqoop for Vertica?

You can specify the schema name to use in the connection string like this:
--connect "jdbc:vertica://*******:5433/db_name?searchpath=myschema"

I changed the command to use --query, and schema.table works fine there. The command is:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*****:5433/dbName" --password "*****" --username "******" --target-dir "/tmp/cdsdj" --verbose --query 'SELECT t.col1 FROM schema.tableName t where $CONDITIONS'
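One note: -m 1 avoids the need for a split column, but if you raise the mapper count, Sqoop requires --split-by with a free-form --query import. A sketch of that variant, assuming col1 can serve as the split column:
sqoop import -m 4 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://*****:5433/dbName" --password "*****" --username "******" --target-dir "/tmp/cdsdj" --split-by col1 --query 'SELECT t.col1 FROM schema.tableName t where $CONDITIONS'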

Related

hive-import and hive-overwrite with sqoop import-all-tables

sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-database test --hive-import;
The above command works fine, but it duplicates the values in the destination tables. I used the command below to overwrite the data.
sqoop import-all-tables --connect jdbc:mysql://localhost/SomeDB --username root --hive-import --hive-database Test --hive-overwrite
This replaced all the values in the table and inserted only null values. Removing --hive-import doesn't help either. What am I doing wrong?
This will solve the problem.
sqoop import-all-tables
--connect jdbc:mysql://localhost/SomeDB
--username root
--hive-import
--warehouse-dir /user/hive/warehouse/Test
--hive-database Test
--hive-overwrite
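Once that import finishes, a quick way to confirm the rows were replaced rather than duplicated is a count from the Hive shell (your_table here is just a placeholder for one of the imported tables):
hive -e 'USE Test; SELECT COUNT(*) FROM your_table;'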

error while performing sqoop - merge

I was trying to use sqoop merge on two data sets imported from a Netezza server.
Below are the data sets, with the number as the id and the letter as the name.
Both of the tables below were imported from Netezza using commands like:
sqoop import --connect netezza_url --username uname --password pwd --table sqoop_merge_1 --hive-import --warehouse-dir hdfs_pth --create-hive-table sqoop_merge_1 -m 1
sqoop_merge_1:
1,a
2,b
3,c
4,d
5,e
sqoop_merge_2:
4,z
5,y
and the merge command is:
sqoop merge --new-data hdfs_path/sqoop_merge_2 --onto hdfs_path/sqoop_merge_1 --target-dir hdfs_path/sqoop_merge_output --jar-file jar_file_path/sqoop_merge_class_name.jar --class-name sqoop_merge_class_name --merge-key id
I created the jar file by using the codegen command:
sqoop codegen --connect netezza_url --username uname --password pwd --table sqoop_merge_1
But I am getting the following error:
java.io.IOException: Cannot join values on null key. Did you specify a key column that exists?
I tried all the ways I knew but am still getting the error.
Please help.
Since you are sure the id column exists, it could be a case-sensitivity issue.
Check whether you defined the column as ID in Netezza.
If so, try --merge-key ID.
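For example, reusing the merge command from the question, the only change is the case of the merge key (assuming the column really is stored as ID on the Netezza side):
sqoop merge --new-data hdfs_path/sqoop_merge_2 --onto hdfs_path/sqoop_merge_1 --target-dir hdfs_path/sqoop_merge_output --jar-file jar_file_path/sqoop_merge_class_name.jar --class-name sqoop_merge_class_name --merge-key ID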

sqoop to import data to hive

I am trying to import data into a Hive table using Sqoop 2. I am using --hive-import, but it is not working.
Code:
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx --username user --password user --table xxxx.NOTIFICATION --hive-import
Error:
ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'XXXX.NOTIFICATION'.
What am I doing wrong?
The observations below are based on Sqoop 1.4.6.
You are using a . (dot) in your table name.
Internally, Sqoop will fire the command
SELECT t.* FROM xxxx.NOTIFICATION AS t WHERE 1=0
to fetch the metadata of your SQL Server table.
This command is interpreted as:
xxxx - schema name
NOTIFICATION - table name
To avoid this you can use escape characters ([ ] in the case of SQL Server):
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx --username user --password user --table [xxxx.NOTIFICATION] --hive-import
This will generate
SELECT t.* FROM [xxxx.NOTIFICATION] AS t WHERE 1=0
Now xxxx.NOTIFICATION will be treated as the table name.
After doing a bit of research and discussing the question with #dev, I found the solution.
I am using Sqoop 2, so I changed my command to the one below, and it worked for me.
$ sqoop import --connect "jdbc:sqlserver://192.168.x.xxx:11xx;database=SSSS;username=user;password=user" --query "SELECT * FROM xxxx.NOTIFICATION where \$CONDITIONS" --split-by xxxx.NOTIFICATION.ID --hive-import --hive-table NOTIFICATION --target-dir NOTIFICATION
Before executing this command, we should create the table in Hive using a CREATE TABLE statement. Here I have created a Hive table named NOTIFICATION.
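For reference, the pre-created table might look like the sketch below; the ID column is implied by the --split-by above, but the remaining column names and types are assumptions, since the question does not show the schema of xxxx.NOTIFICATION:
CREATE TABLE NOTIFICATION (
  ID INT,
  MESSAGE STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';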
I assume the table name is NOTIFICATION and you are trying to specify the database name xxxx when you write --table xxxx.NOTIFICATION.
If this is the case, can you please try the syntax below instead?
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx;databaseName=xxxx --username user --password user --table NOTIFICATION --hive-import

Appending Data to hive Table using Sqoop

I am trying to append data to an already existing table in Hive. Using the following command, I first import the table from MS SQL Server to Hive.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now I want to append the data where "Batch_Id < 100" to the same existing table in Hive.
I am using the following command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command runs successfully and updates the HDFS data, but when you connect to the Hive shell and query the table, the appended records are not visible.
Sqoop updated the data in HDFS at "/user/hduser/my_table", but the data at "/user/hive/warehouse/batch_dim" is not updated.
How can I resolve this issue?
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase"
--table "my_table" --where "Batch_Id < 100"
--username myuser --password mypassword
--hive-import --hive-table my_table
When you are using --hive-import, DO NOT use the --append parameter.
The Sqoop command you're using (--import) is only for ingesting records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).

How to use a specified Hive database when using Sqoop import

sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import
The above command imports table tb into the 'default' Hive database.
Can I use other database instead?
Off the top of my head, I recall you can specify --hive-table foo.tb, where foo is your Hive database and tb is your Hive table.
So in your case it would be:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-table foo.tb
As a footnote, here is the original jira issue https://issues.apache.org/jira/browse/SQOOP-322
Hive database using Sqoop import:
sqoop import --connect jdbc:mysql://localhost/arun --table account --username root --password root -m 1 --hive-import --hive-database company --create-hive-table --hive-table account --target-dir /tmp/customer/ac
You can specify the database name as a part of the --hive-table parameter, e.g. "--hive-table foo.tb".
There is a new request to add a special parameter for the database that is being tracked: SQOOP-912.
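If your Sqoop build already includes that parameter (SQOOP-912 added --hive-database), the import can be written without embedding the database in --hive-table, along these lines:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-database foo --hive-table tb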
