How to retain datatypes when importing data from Oracle to HDFS using Sqoop?

We are using Sqoop to import data from Oracle into HDFS.
In HDFS we are creating an Avro file.
The issue we are facing is that dates are being converted to long and all other datatypes are being converted to string.
Is there any way to preserve the datatypes when importing data using Sqoop?
Thanks

Related

What is the best way to store Blob data type in a Hive table, as a string or Binary?
We have archived an RDBMS table into Hive using Sqoop. It has a column of type BLOB, so in Hive we stored it as BINARY. But we are not able to read the binary content back as a PDF or any other document. Is there any way to read that Hive binary data as a document?
Is storing BLOB data in a Hive BINARY column the recommended approach, or are there other ways?
Is there any big data component, such as HBase or Cassandra, that supports BLOB types?
It is better to use Hive BINARY to store BLOB data in Hive. You can follow the link below: Import blob from oracle to HIVE.
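A minimal sketch of such an import is shown below; the table, column, and connection details are placeholders, and the exact mapping option may differ depending on your Sqoop version (see the linked article):

# Hypothetical example: import an Oracle table containing a BLOB column into Hive,
# storing the BLOB column as Hive BINARY. All names and the JDBC URL are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table DOCS \
  --hive-import \
  --hive-table docs_archive \
  --map-column-hive CONTENT=binary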
You can also use Cassandra or another parallel NoSQL store for the BLOB data. Again, whether to choose Hive or a NoSQL database depends on your use case.

Sqoop mapping all datatypes as string

I'm importing a table from Oracle to an S3 directory using Amazon EMR. The files are being imported as Avro, and Sqoop exports the .avsc file with all columns as string.
Does anyone know how to make Sqoop map the correct datatypes?
Use --map-column-java to map to the appropriate Java data type. For Hive you can use --map-column-hive.
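For example, a sketch of an Avro import that overrides the default mapping; the table, column names, and connection string are placeholders, and mapping the date column to String is just one common workaround for the epoch-long representation:

# Hypothetical example: override Sqoop's default column type mapping for an Avro import.
# ORDER_DATE is mapped to String instead of being stored as an epoch long; AMOUNT to Double.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table ORDERS \
  --as-avrodatafile \
  --target-dir s3://my-bucket/orders \
  --map-column-java ORDER_DATE=String,AMOUNT=Double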

Hive Table Creation based on file structure

I have one doubt: is there any way in Hive to create the table during load, either into the Hive warehouse or as an external table?
As I understand it, Hive is schema-on-read, so the table structure must be in sync with the file structure. But what if the file is huge and we don't know its structure, for example the columns and their datatypes?
How do we load such files into a Hive table?
In short, how do we load a file from HDFS into a Hive table without knowing its schema structure?
I'm new to Hive, pardon me if my understanding is wrong.
Thanks
By using Sqoop you can create the Hive table while importing the data.
Please refer to this link to create a Hive table while importing data.
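A minimal sketch of such an import, assuming a plain Oracle table; connection details and table names are placeholders:

# Hypothetical example: let Sqoop create the Hive table as part of the import.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --hive-import \
  --create-hive-table \
  --hive-table default.employees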
(or)
if you have imported the data in Avro format, then you can generate the Avro schema using
/usr/bin/Avro/avro-tools-*.jar, then use the generated Avro schema while creating the table in Hive; Hive then uses the schema and reads the data from HDFS.
Please refer to this link to extract the schema from an Avro data file
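A sketch of the schema extraction, assuming the avro-tools jar path above and placeholder HDFS paths:

# Hypothetical example: pull one Avro data file locally, extract its schema with
# avro-tools 'getschema', and publish the .avsc to HDFS for Hive to reference.
hdfs dfs -get /user/hive/avro/employees/part-m-00000.avro .
java -jar /usr/bin/Avro/avro-tools-*.jar getschema part-m-00000.avro > employees.avsc
hdfs dfs -put employees.avsc /user/hive/schemas/employees.avsc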
(or)
While importing data using Sqoop with --as-avrodatafile, Sqoop creates an .avsc file with the schema in it, so we can use this .avsc file when creating the table.
CREATE EXTERNAL TABLE avro_tbl
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '<hdfs-location>'
TBLPROPERTIES ('avro.schema.url'='<schema-file>');
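For reference, a minimal sketch of the Avro import that produces the data files and the .avsc (connection details and paths are placeholders):

# Hypothetical example: Avro import; Sqoop writes the data files to --target-dir and
# generates an EMPLOYEES.avsc schema file in the local working directory.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --as-avrodatafile \
  --target-dir /user/hive/avro/employees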
(or)
Alternatively, you can use NiFi to import the data. NiFi pulls the data in Avro format; with the ExtractAvroMetadata processor we can extract the Avro schema, store it in HDFS, and create the table using that schema.
If you want to create the table in ORC format, the ConvertAvroToOrc processor adds a hive.ddl attribute to the flowfile, and we can execute that DDL statement to create the ORC table in Hive.

Can Hive deal with binary data?

Can Hive deal with unstructured data?
We have image files in an Oracle database. We have to run a Sqoop job to load those images from Oracle into another database, and also load them into a Hive table.
Could you please help me with how to handle those image files in Hive?
Your Oracle data is probably stored as BLOB.
In Hive it should be stored as BINARY.
Here is a Hortonworks article demonstrating a Sqoop import of an Oracle BLOB into Hive:
https://community.hortonworks.com/content/supportkb/49145/how-to-sqoop-import-oracle-blobclob-data-into-hive.html
Here is an example of processing the binary type using a Hive UDF:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBase64.java
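As a quick illustration, a hypothetical query using Hive's built-in base64 UDF (table and column names are placeholders) to turn the BINARY column into text that can be inspected or exported, and decoded again later with unbase64():

# Hypothetical usage: encode the BINARY column as Base64 text for inspection/export.
hive -e "SELECT id, base64(content) FROM docs_archive LIMIT 10;"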

Hive timestamp import from netezza

I am ETLing a Netezza DB into a Hive target DB, but I keep running into issues with timestamps. The source DB for the ETL into Netezza is Oracle, and the "dates" there are stored as varchar. When ETLed to Netezza they are transformed into the Netezza format and accepted correctly.
When extracting this data from Netezza into Hive, I get an exception from java.sql.Timestamp saying the timestamp is not in the appropriate format.
Note: due to the nature and specificity of the error on this system I cannot show output or logs
