Sqoop options to load a LOB column from an RDBMS to Hive - sqoop

I am trying to load a LOB column (VARCHAR(28000)) from a DB2 database into Hive. I am using the Sqoop options below, but am not having any success.
1. --table schema.tbl_name -m 1 --fetch-size 1000 --map-column-java HVBA02_PRT_LIST_X=BINARY --map-column-hive HVBA02_PRT_LIST_X=BINARY
2. --table schema.tbl_name -m 1 --fetch-size 1000 --map-column-java col_name=String --map-column-hive col_name=String
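For reference, a full command along the lines of the second attempt would look roughly like the sketch below; the connection URL, credentials and Hive table name are placeholders, and mapping the column to String/STRING is the same approach that works for the CLOB case in the related question further down:
sqoop import \
--connect jdbc:db2://db2host:50000/mydb \
--username user --password pass \
--table schema.tbl_name \
-m 1 \
--fetch-size 1000 \
--map-column-java HVBA02_PRT_LIST_X=String \
--map-column-hive HVBA02_PRT_LIST_X=STRING \
--hive-import --hive-table target_db.tbl_name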

Related

Sqoop : import all tables converting blob types

I'm trying to import all the tables from my Oracle 11g R2 (import-all-tables) and I'm facing a problem with a CLOB type. (Using CDH 5.9, Sqoop 1.4.6-cdh5.9.1)
First try:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --direct
Tables are imported correctly until a table with a CLOB column is found, which throws the following error: Cannot convert to SQL type 2005.
Second try:
sqoop import-all-tables -D oraoop.disabled=true --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX
I get the same error.
Third try:
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String
This works, so I try to get the same with all the tables:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --map-column-java CLOBCOL=String
This fails because only one of my tables has a CLOBCOL column.
Is there a way to use import-all-tables, fixing the 2005 SQL type error, or telling Sqoop how to resolve it "on the fly"?
Thanks!
You should not use the --direct option, as the Sqoop documentation says: "Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns. Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool."
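If you want to keep a single import-all-tables run, one option (a sketch, not something I have tested against your exact setup) is to exclude the CLOB table from the bulk import and then import it separately with the column mapped to String, exactly as in your third try:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --exclude-tables MyClobTable
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String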

sqoop import as parquet file to target dir, but can't find the file

I have been using Sqoop to import data from MySQL to Hive; the command I used is below:
sqoop import --connect jdbc:mysql://localhost:3306/datasync \
--username root --password 654321 \
--query 'SELECT id,name FROM test WHERE $CONDITIONS' --split-by id \
--hive-import --hive-database default --hive-table a \
--target-dir /tmp/yfr --as-parquetfile
The Hive table is created and the data is inserted; however, I cannot find the Parquet file.
Does anyone know why?
Best regards,
Feiran
Sqoop import to Hive works in 2 steps:
Fetching the data from the RDBMS to HDFS
Creating the Hive table if it does not exist and loading the data into it
In your case, the data is first stored at the --target-dir, i.e. /tmp/yfr.
Then it is loaded into the Hive table a using a
LOAD DATA INPATH ... INTO TABLE ...
command.
As mentioned in the comments, the data is moved into the Hive warehouse directory, which is why there is nothing left in --target-dir.
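If you want to see where the files actually ended up, you can ask Hive for the table's location and then list it in HDFS; a quick check would look something like this (the warehouse path shown is the usual default and may differ in your configuration):
hive -e "DESCRIBE FORMATTED default.a;"   # check the Location: field reported for the table
hdfs dfs -ls /user/hive/warehouse/a       # assumed default warehouse location for table a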

Incremental update in HIVE table using sqoop

I have a table in Oracle with only 4 columns...
Memberid --- bigint
uuid --- String
insertdate --- date
updatedate --- date
I want to import that data into a HIVE table using sqoop. I created the corresponding HIVE table with
create EXTERNAL TABLE memberimport(memberid BIGINT, uuid varchar(36), insertdate timestamp, updatedate timestamp) LOCATION '/user/import/memberimport';
and sqoop command
sqoop import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username ** --password *** --hive-import --table MEMBER --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
It works properly and is able to import data into the HIVE table.
Now I want to update this table incrementally based on updatedate (last value being today's date), so that I get the day-to-day updates of that OLTP table into my HIVE table using sqoop.
For Incremental import I am using following sqoop command
sqoop import --hive-import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username *** --password *** --table MEMBER --check-column UPDATEDATE --incremental append --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
But I am getting the exception:
"Append mode for hive imports is not yet supported. Please remove the parameter --append-mode"
When I remove --hive-import it runs properly, but I do not find the new updates in the HIVE table that I have in the OLTP table.
Am I doing anything wrong?
Please suggest how I can run an incremental update from Oracle to Hive using sqoop.
Any help will be appreciated.
Thanks in advance...
Although I don't have the resources to replicate your scenario exactly, you might want to try building a Sqoop job and testing your use case:
sqoop job --create sqoop_job \
-- import \
--connect "jdbc:oracle:thin:@server:port/dbname" \
--username (XXXX) \
--password (YYYY) \
--table (TableName) \
--target-dir (Hive directory corresponding to the table) \
--append \
--fields-terminated-by '(character)' \
--lines-terminated-by '\n' \
--check-column "(Column To Monitor Change)" \
--incremental append \
--last-value (last value of column being monitored) \
--outdir (log directory)
When you create a Sqoop job, it takes care of --last-value for subsequent runs. Also, here I have used the Hive table's data directory as the target for the incremental update.
Hope this provides a helpful direction to proceed.
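Once the job exists, running it and checking its stored state would look roughly like this (job name taken from the sketch above):
sqoop job --exec sqoop_job   # runs the incremental import; Sqoop updates the saved last value afterwards
sqoop job --show sqoop_job   # prints the saved job definition, including the stored incremental last value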
There is no direct way to achieve this in Sqoop. However, you can use a 4-step strategy (ingest the incremental data, reconcile it with the base table, compact, and purge the staging data).

Table isn't visible in Hive

I have created a table "A" in MySQL, and I have created a database "hiveankit" in Hive.
When I try to import table A into the target database with the following command:
[training@localhost ~]$ sqoop import --connect jdbc:mysql://localhost/march2015 --username root --table A -m 1 --target-dir hiveankit;
This is the result:
16/07/02 08:53:19 INFO mapreduce.ImportJobBase: Retrieved 15 records.
[training@localhost ~]$ hive;
Hive history file=/tmp/training/hive_job_log_training_201607020853_1580004608.txt
hive> show databases;
OK
default
hiveankit
Time taken: 3.029 seconds
hive> use hiveankit;
OK
Time taken: 0.044 seconds
hive> select * from A;
FAILED: Error in semantic analysis: Line 1:14 Table not found A
Why am I getting this error?
Am I missing any steps?
The import command should include the "--hive-table", "--create-hive-table" and "--hive-import" options to automatically create the Hive table during the Sqoop import. I have modified your command by adding these options (see below). Without them, the Sqoop import only copies the data to HDFS and a Hive table is NOT created.
sqoop-import --connect jdbc:mysql://localhost/march2015 --username root --table A --hive-table ${hive_db_name}.A --create-hive-table --hive-import -m 1 --target-dir hiveankit;
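If you have already run the original import, the 15 records are sitting only in HDFS; before re-importing you can confirm that by listing the target directory (a relative --target-dir like hiveankit resolves under your HDFS home directory, assumed here to be /user/training):
hdfs dfs -ls /user/training/hiveankit                        # files written by the first import
hdfs dfs -cat /user/training/hiveankit/part-m-00000 | head   # peek at the records (default Sqoop text output)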

import data from vertica to hive

I am trying to load data from Vertica into Hive using Sqoop.
I can see that it creates a file and a table in HIVE, but when I try to select the data from HIVE or from the file I cannot see the data; the select shows me an ERROR (there is no delimiter between the columns in the file).
This is my code:
sqoop import -m -1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts limit 10' --target-dir "folder/contacts" --hive-import --create-hive-table --hive-table db.contacts
Use these arguments and choose delimiters for your data:
--fields-terminated-by
--lines-terminated-by
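For example, the original command with explicit delimiters added might look like the sketch below; the comma is just one common choice, and note that Sqoop's --query form also expects a $CONDITIONS token in the WHERE clause and that -m needs a positive mapper count:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts where $CONDITIONS limit 10' --target-dir "folder/contacts" --fields-terminated-by ',' --lines-terminated-by '\n' --hive-import --create-hive-table --hive-table db.contacts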
