Sqoop options to load a LOB column from an RDBMS to Hive - sqoop

I am trying to load a LOB column (VARCHAR(28000)) from a DB2 database into Hive. I am using the Sqoop options below, but am not having any success.
1. --table schema.tbl_name -m 1 --fetch-size 1000 --map-column-java HVBA02_PRT_LIST_X=BINARY --map-column-hive HVBA02_PRT_LIST_X=BINARY
2. --table schema.tbl_name -m 1 --fetch-size 1000 --map-column-java col_name=String --map-column-hive col_name=String
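For reference, a full command along the lines of the second attempt would look roughly like the sketch below; the connection URL, credentials and Hive table name are placeholders, and mapping the column to String/STRING is the same approach that works for the CLOB case in the related question further down:
sqoop import \
--connect jdbc:db2://db2host:50000/mydb \
--username user --password pass \
--table schema.tbl_name \
-m 1 \
--fetch-size 1000 \
--map-column-java HVBA02_PRT_LIST_X=String \
--map-column-hive HVBA02_PRT_LIST_X=STRING \
--hive-import --hive-table target_db.tbl_name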

Related

Sqoop : import all tables converting blob types

I'm trying to import all the tables from my Oracle 11g R2 (import-all-tables) and I'm facing a problem with a CLOB type. (Using CDH 5.9, Sqoop 1.4.6-cdh5.9.1)
First try:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --direct
Tables are imported correctly until a table with a CLOB column is found, which throws the following error: Cannot convert to SQL type 2005.
Second try:
sqoop import-all-tables -D oraoop.disabled=true --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX
I get the same error.
Third try:
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String
This works, so I try to get the same with all the tables:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --map-column-java CLOBCOL=String
This fails because only one of my tables has a CLOBCOL column.
Is there a way to use import-all-tables, fixing the 2005 SQL type error, or telling Sqoop how to resolve it "on the fly"?
Thanks!
You should not use the --direct option, as the Sqoop documentation says: "Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns. Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool."
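If you want to keep a single import-all-tables run, one option (a sketch, not something I have tested against your exact setup) is to exclude the CLOB table from the bulk import and then import it separately with the column mapped to String, exactly as in your third try:
sqoop import-all-tables --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --exclude-tables MyClobTable
sqoop import --connect ... --hive-import --hive-overwrite --as-parquet-file --autoreset-to-one-mapper -m XX --table MyClobTable --map-column-java CLOBCOL=String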

sqoop import as parquet file to target dir, but can't find the file

I have been using Sqoop to import data from MySQL to Hive; the command I used is below:
sqoop import --connect jdbc:mysql://localhost:3306/datasync \
--username root --password 654321 \
--query 'SELECT id,name FROM test WHERE $CONDITIONS' --split-by id \
--hive-import --hive-database default --hive-table a \
--target-dir /tmp/yfr --as-parquetfile
The Hive table is created and the data is inserted; however, I cannot find the Parquet file.
Does anyone know why?
Best regards,
Feiran
Sqoop import to Hive works in 2 steps:
Fetching the data from the RDBMS to HDFS
Creating the Hive table if it does not exist and loading the data into it
In your case, the data is first stored at the --target-dir, i.e. /tmp/yfr.
Then it is loaded into the Hive table a using a
LOAD DATA INPATH ... INTO TABLE ...
command.
As mentioned in the comments, the data is moved into the Hive warehouse directory, which is why there is nothing left in --target-dir.
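If you want to see where the files actually ended up, you can ask Hive for the table's location and then list it in HDFS; a quick check would look something like this (the warehouse path shown is the usual default and may differ in your configuration):
hive -e "DESCRIBE FORMATTED default.a;"   # check the Location: field reported for the table
hdfs dfs -ls /user/hive/warehouse/a       # assumed default warehouse location for table a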

Incremental update in HIVE table using sqoop

I have a table in Oracle with only 4 columns...
Memberid --- bigint
uuid --- String
insertdate --- date
updatedate --- date
I want to import that data into a HIVE table using sqoop. I created the corresponding HIVE table with
create EXTERNAL TABLE memberimport(memberid BIGINT, uuid varchar(36), insertdate timestamp, updatedate timestamp) LOCATION '/user/import/memberimport';
and sqoop command
sqoop import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username ** --password *** --hive-import --table MEMBER --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
It works properly and is able to import data into the HIVE table.
Now I want to update this table incrementally based on updatedate (last value being today's date), so that I get the day-to-day updates of that OLTP table into my HIVE table using sqoop.
For Incremental import I am using following sqoop command
sqoop import --hive-import --connect jdbc:oracle:thin:@dbURL:1521/dbName --username *** --password *** --table MEMBER --check-column UPDATEDATE --incremental append --columns 'MEMBERID,UUID,INSERTDATE,UPDATEDATE' --map-column-hive MEMBERID=BIGINT,UUID=STRING,INSERTDATE=TIMESTAMP,UPDATEDATE=TIMESTAMP --hive-table memberimport -m 1
But I am getting the exception:
"Append mode for hive imports is not yet supported. Please remove the parameter --append-mode"
When I remove --hive-import it runs properly, but I do not find the new updates in the HIVE table that I have in the OLTP table.
Am I doing anything wrong?
Please suggest how I can run an incremental update from Oracle to Hive using sqoop.
Any help will be appreciated.
Thanks in advance...
Although I don't have the resources to replicate your scenario exactly, you might want to try building a Sqoop job and testing your use case:
sqoop job --create sqoop_job \
-- import \
--connect "jdbc:oracle:thin:@server:port/dbname" \
--username (XXXX) \
--password (YYYY) \
--table (TableName) \
--target-dir (Hive directory corresponding to the table) \
--append \
--fields-terminated-by '(character)' \
--lines-terminated-by '\n' \
--check-column "(Column To Monitor Change)" \
--incremental append \
--last-value (last value of column being monitored) \
--outdir (log directory)
When you create a Sqoop job, it takes care of --last-value for subsequent runs. Also, here I have used the Hive table's data directory as the target for the incremental update.
Hope this provides a helpful direction to proceed.
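Once the job exists, running it and checking its stored state would look roughly like this (job name taken from the sketch above):
sqoop job --exec sqoop_job   # runs the incremental import; Sqoop updates the saved last value afterwards
sqoop job --show sqoop_job   # prints the saved job definition, including the stored incremental last value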
There is no direct way to achieve this in Sqoop. However, you can use a 4-step strategy (ingest the incremental data, reconcile it with the base table, compact, and purge the staging data).

Table isn't visible in Hive

I have created a table "A" in MySQL, and I have created a database "hiveankit" in Hive.
When I try to import table A into the target database with the following command:
[training@localhost ~]$ sqoop import --connect jdbc:mysql://localhost/march2015 --username root --table A -m 1 --target-dir hiveankit;
This is the result:
16/07/02 08:53:19 INFO mapreduce.ImportJobBase: Retrieved 15 records.
[training@localhost ~]$ hive;
Hive history file=/tmp/training/hive_job_log_training_201607020853_1580004608.txt
hive> show databases;
OK
default
hiveankit
Time taken: 3.029 seconds
hive> use hiveankit;
OK
Time taken: 0.044 seconds
hive> select * from A;
FAILED: Error in semantic analysis: Line 1:14 Table not found A
Why am I getting this error?
Am I missing any steps?
The import command should include the "--hive-table", "--create-hive-table" and "--hive-import" options to automatically create the Hive table during the Sqoop import. I have modified your command by adding these options (see below). Without them, the Sqoop import only copies the data to HDFS and a Hive table is NOT created.
sqoop-import --connect jdbc:mysql://localhost/march2015 --username root --table A --hive-table ${hive_db_name}.A --create-hive-table --hive-import -m 1 --target-dir hiveankit;
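If you have already run the original import, the 15 records are sitting only in HDFS; before re-importing you can confirm that by listing the target directory (a relative --target-dir like hiveankit resolves under your HDFS home directory, assumed here to be /user/training):
hdfs dfs -ls /user/training/hiveankit                        # files written by the first import
hdfs dfs -cat /user/training/hiveankit/part-m-00000 | head   # peek at the records (default Sqoop text output)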

import data from vertica to hive

I am trying to load data from Vertica into Hive using Sqoop.
I can see that it creates a file and a table in HIVE, but when I try to select the data from HIVE or from the file I cannot see the data; the select shows me an ERROR (there is no delimiter between the columns in the file).
This is my code:
sqoop import -m -1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts limit 10' --target-dir "folder/contacts" --hive-import --create-hive-table --hive-table db.contacts
Use these arguments and choose delimiters for your data:
--fields-terminated-by
--lines-terminated-by
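For example, the original command with explicit delimiters added might look like the sketch below; the comma is just one common choice, and note that Sqoop's --query form also expects a $CONDITIONS token in the WHERE clause and that -m needs a positive mapper count:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts where $CONDITIONS limit 10' --target-dir "folder/contacts" --fields-terminated-by ',' --lines-terminated-by '\n' --hive-import --create-hive-table --hive-table db.contacts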
