sqoop to import data to hive - hadoop

I am trying to import data into a Hive table using Sqoop 2. I am using --hive-import but it is not working.
Code:
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx --username user --password user --table xxxx.NOTIFICATION --hive-import
Error:
ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'XXXX.NOTIFICATION'.
What am I doing wrong?

The observations below are based on Sqoop 1.4.6.
You are using a . (dot) in your table name.
Internally, Sqoop will fire the command
SELECT t.* FROM xxxx.NOTIFICATION AS t WHERE 1=0
to fetch metadata of your SQL Server table.
This command is interpreted as:
xxxx - schema name
NOTIFICATION - table name
To avoid this, you can use an escape character ([ ] in the case of SQL Server):
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx --username user --password user --table [xxxx.NOTIFICATION] --hive-import
This will generate
SELECT t.* FROM [xxxx.NOTIFICATION] AS t WHERE 1=0
Now xxxx.NOTIFICATION will be treated as the table name.
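Alternatively, if xxxx is really a schema name (not part of the table name) and your SQL Server connector supports the --schema extra argument, you could pass it after a standalone -- instead of escaping. A sketch, not tested against this setup:
sqoop import --connect jdbc:sqlserver://192.168.x.xxx:11xx --username user --password user --table NOTIFICATION --hive-import -- --schema xxxx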

Hi, after doing a bit of research and discussing the question with #dev, I found the solution.
I am using Sqoop 2, so I changed my command to the one below, and it worked for me.
$ sqoop import --connect "jdbc:sqlserver://192.168.x.xxx:11xx;database=SSSS;username=user;password=user" --query "SELECT * FROM xxxx.NOTIFICATION where \$CONDITIONS" --split-by xxxx.NOTIFICATION.ID --hive-import --hive-table NOTIFICATION --target-dir NOTIFICATION
Before executing this command we should create the table in Hive with a CREATE TABLE statement. Here I have created a Hive table named NOTIFICATION.
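A pre-created table could look like the following sketch; the column list is made up for illustration (the real NOTIFICATION columns are not shown in the question), and '\001' is Hive's default field delimiter, which Sqoop also uses by default with --hive-import:
CREATE TABLE NOTIFICATION (
  ID INT,
  MESSAGE STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;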

I assume the table name is NOTIFICATION and you are trying to specify the database name xxxx when you write --table xxxx.NOTIFICATION.
If this is the case, can you please try the syntax below instead?
sqoop import --connect "jdbc:sqlserver://192.168.x.xxx:11xx;databaseName=xxxx" --username user --password user --table NOTIFICATION --hive-import

Related

Is it possible to import a table with sqoop and add an extra timestamp column?

Is it possible to use the Sqoop import command to import a table from an Oracle database to a Hadoop cluster and add an extra column with the current timestamp (for troubleshooting purposes)? So far, I have the following command:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:oracle:thin:@//MY_ORACLE_SERVER --username USERNAME --password PASSWORD --target-dir /MyDIR --fields-terminated-by '\b' --table SOURCE_TABLE --hive-table DESTINATION_TABLE --hive-import --hive-overwrite --hive-delims-replacement '<newline>'
I would like to add a timestamp column to the table so that I know when that data was loaded. Is it possible?
Thanks in advance
You can use a free-form query import instead of a table import and call the timestamp function:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:oracle:thin:@//MY_ORACLE_SERVER --username USERNAME --password PASSWORD --target-dir /MyDIR --fields-terminated-by '\b' --query 'SELECT a.*, systimestamp FROM SOURCE_TABLE a WHERE $CONDITIONS' -m 1 --hive-table DESTINATION_TABLE --hive-import --hive-overwrite --hive-delims-replacement '<newline>'
Maybe you could use sysdate instead of systimestamp (smaller datatype but less precision).
You can create a temporary Hive table using Sqoop, and after that create a new Hive table from the old one with the extra required columns.
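A rough sketch of that approach in HiveQL, where SOURCE_TABLE_STG stands for a hypothetical staging table loaded by a plain Sqoop import (both names are made up for illustration):
-- Build the final table from the staging table, adding a load-timestamp column.
CREATE TABLE DESTINATION_TABLE AS
SELECT s.*, from_unixtime(unix_timestamp()) AS load_ts
FROM SOURCE_TABLE_STG s;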

Sqoop : import all tables converting blob types

I'm trying to import all the tables from my Oracle 11g R2 (import-all-tables) and I'm facing a problem with a CLOB type. (Using CDH 5.9, Sqoop 1.4.6-cdh5.9.1)
First try:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --direct
Tables are imported correctly until a table with a CLOB column is found, which throws the following error: Cannot convert to SQL type 2005.
Second try:
sqoop import-all-tables -D oraoop.disabled=true --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX
I get the same error.
Third try:
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String
This works, so I try to get the same with all the tables:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --map-column-java CLOBCOL=String
This fails because only one of my tables has a CLOBCOL column.
Is there a way to use import-all-tables, fixing the 2005 SQL type error, or telling Sqoop how to resolve it "on the fly"?
Thanks!
You should not use the --direct option, as the Sqoop documentation says: "Sqoop's direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns. Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool."
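If you still want a single import-all-tables run, one workaround (a sketch, assuming your Sqoop build supports the --exclude-tables argument of import-all-tables) is to skip the CLOB table during the bulk import and then import it on its own with the column mapping:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --exclude-tables MyClobTable
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String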

import data from vertica to hive

I am trying to load data from Vertica into Hive using Sqoop.
I can see that it creates a file and a table in Hive, but when I try to select the data from Hive or from the file, I cannot see the data; the select shows me an error (there is no delimiter between the columns in the file).
This is my code:
sqoop import -m -1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts limit 10' --target-dir "folder/contacts" --hive-import --create-hive-table --hive-table db.contacts
Use these arguments and choose delimiters for your data:
--fields-terminated-by
--lines-terminated-by
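For example, the original command with explicit delimiters might look like this sketch (tab and newline are just one choice; pick characters that do not occur in your data, and note that --query normally needs the $CONDITIONS token):
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts where $CONDITIONS limit 10' --fields-terminated-by '\t' --lines-terminated-by '\n' --target-dir "folder/contacts" --hive-import --create-hive-table --hive-table db.contacts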

Appending Data to hive Table using Sqoop

I am trying to append data to an already existing table in Hive. Using the following command, I first import the table from MS SQL Server to Hive.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now I want to append data to the same existing table in Hive where "Batch_Id < 100".
I am using the following command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command runs successfully and updates the HDFS data, but when I connect to the Hive shell and query the table, the appended records are not visible.
Sqoop updated the data on HDFS under "/user/hduser/my_table", but the data in "/user/hive/warehouse/batch_dim" is not updated.
How can I resolve this issue?
Regards,
Bhagwant Bhobe
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" \
  --table "my_table" --where "Batch_Id < 100" \
  --username myuser --password mypassword \
  --hive-import --hive-table my_table
When you are using --hive-import, DO NOT use the --append parameter.
The Sqoop command you're using (a plain import) is only for ingesting records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).

sqoop import complete but hive show tables can't see table

After installing Hadoop and Hive (CDH version), I execute
./sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --hive-home /opt/hive/
All goes fine, but when I enter the Hive command line and execute show tables, there is nothing.
When I run ./hadoop fs -ls, I can see /user/(username)/user exists.
Any help is appreciated.
---EDIT-----------
./sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --target-dir /user/hive/warehouse
The import fails due to:
11/07/02 00:40:00 INFO hive.HiveImport: FAILED: Error in semantic analysis: line 2:17 Invalid Path 'hdfs://hadoop1:9000/user/ubuntu/user': No files matching path hdfs://hadoop1:9000/user/ubuntu/user
11/07/02 00:40:00 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 10
at com.cloudera.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:326)
at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:276)
at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:218)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:228)
Check your hive-site.xml for the value of the property javax.jdo.option.ConnectionURL. If you do not define this explicitly, the default value will use a relative path for creation of the Hive metastore (jdbc:derby:;databaseName=metastore_db;create=true), which will be different depending upon where you launch the process from. This would explain why you cannot see the table via show tables.
Define this property value in your hive-site.xml using an absolute path.
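For example, a hive-site.xml entry with an absolute path might look like this (the path itself is just an illustration; point it wherever the metastore should live):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/youruser/metastore_db;create=true</value>
</property>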
There is no need to create the table in Hive first; refer to the query below:
sqoop import --connect jdbc:mysql://xxxx.com/<database_name> --username root --password admin --table <mysql_table> --direct -m 1 --hive-import --create-hive-table --hive-table <hive_table_name> --target-dir '/user/hive/warehouse/<hive_table_name>' --fields-terminated-by '\t'
In my case Hive stores data in /user/hive/warehouse directory in HDFS. This is where Sqoop should put it.
So I guess you have to add:
--target-dir /user/hive/warehouse
This is the default location for Hive tables (it might be different in your case).
You might also want to create this table in Hive:
sqoop create-hive-table --connect jdbc:mysql://host/database --table tableName --username user --password password
In my case it creates the table in the Hive default database; you can give it a try.
sqoop import --connect jdbc:mysql://xxxx.com/<database_name> --username root --password admin --table NAME --hive-import --warehouse-dir DIR --create-hive-table --hive-table NAME -m 1
Hive tables will be created by the Sqoop import process. Please make sure /user/hive/warehouse is created in your HDFS. You can browse the HDFS at http://localhost:50070/dfshealth.jsp (Browse the File System option).
Also include the HDFS location in --target-dir, i.e. hdfs://<namenode-host>:9000/user/hive/warehouse, in the sqoop import command.
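A quick way to check for (and, if missing, create) the warehouse directory, assuming the Hadoop binaries are on your PATH:
hadoop fs -ls /user/hive/warehouse      # verify the directory exists
hadoop fs -mkdir /user/hive/warehouse   # create it if it is missing (add -p on Hadoop 2+ if parent directories are missing too)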
First of all, create the table definition in Hive with the exact field names and types as in MySQL.
Then, perform the import operation.
For a Hive import:
sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://localhost/test --table tablename --hive-import --warehouse-dir /user/hive/warehouse --split-by id --hive-table tablename
'id' can be the primary key of your existing table
'localhost' can be your local IP
'test' is the database
the 'warehouse' directory is in HDFS
I think all you need is to specify the Hive table where the data should go.
Add "--hive-table database.tablename" to the Sqoop command and remove --hive-home /opt/hive/. I think that should resolve the problem.
