Sqoop mapping all datatypes as string - hadoop

I'm importing a table from Oracle to an S3 directory using Amazon EMR. The files are imported as Avro, and Sqoop exports the .avsc file with all columns as string.
Does anyone know how to make Sqoop map the correct datatypes?

Use --map-column-java to map columns to the appropriate Java data types. For Hive you can use --map-column-hive.
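A hedged sketch of such an import (the connection string, credentials, bucket, table, and column names below are placeholders, not from the original question):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MY_TABLE \
  --as-avrodatafile \
  --target-dir s3://my-bucket/my_table \
  --map-column-java PRICE=Double,CREATED_AT=String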

Related

How to retain datatypes when importing data from Oracle to HDFS using sqoop?

We are using sqoop to import data from Oracle to HDFS.
In HDFS we are creating an avro file.
The issue we are facing is that dates are being converted to long and all other datatypes are being converted to string.
Is there any way to preserve the datatypes when importing data using Sqoop?
Thanks

Schema on read in Hive for a TSV format file

I am new to Hadoop. I have data in TSV format with 50 columns, and I need to store the data in Hive. How can I create and load the data into a table on the fly, using schema on read, without manually creating the table with a CREATE TABLE statement?
Hive requires you to run a CREATE TABLE statement because the Hive metastore must be updated with the description of what data location you're going to be querying later on.
Schema-on-read doesn't mean that you can query every possible file without knowing metadata beforehand such as storage location and storage format.
SparkSQL or Apache Drill, on the other hand, will let you infer the schema from a file, but for a TSV you must still define the column types if you don't want everything to be a string column (or coerced to unexpected types). Both of these tools can interact with a Hive metastore for "decoupled" storage of schema information.
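If the columns and types are known, the Hive DDL for a TSV file is straightforward; a minimal sketch, assuming a hypothetical table name, a few of the 50 columns, and a hypothetical HDFS location:
CREATE EXTERNAL TABLE my_tsv_table (
  id INT,
  name STRING,
  created_at TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/my_tsv_data';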
You can use Hue:
http://gethue.com/hadoop-tutorial-create-hive-tables-with-headers-and/
Or, with Spark, you can infer the schema of the CSV/TSV file and save it as a Hive table.
val df = spark.read
  .option("delimiter", "\t")
  .option("header", "true")
  .option("inferSchema", "true") // <-- HERE
  .csv("/home/cloudera/Book1.csv")
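To finish the "save it as a Hive table" step mentioned above, the inferred-schema DataFrame can be written out with saveAsTable; a sketch assuming a hypothetical target table name and a SparkSession built with Hive support enabled:
// requires the SparkSession to have been created with .enableHiveSupport()
df.write
  .mode("overwrite")          // hypothetical choice; use "append" to add to an existing table
  .saveAsTable("mydb.book1")  // hypothetical database.table name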

Hive Table Creation based on file structure

I have one doubt: is there any way in Hive to create a table (in the Hive warehouse, or an external table) during load?
As I understand it, Hive is based on schema on read, so the table structure must be in sync with the file structure. But what if the file is huge and we don't know its structure, for example the columns and their datatypes?
Then how do we load those files into a Hive table?
In short, how do we load a file from HDFS into a Hive table without knowing its schema?
New to Hive, pardon me if my understanding is wrong.
Thanks
By using Sqoop you can create the Hive table while importing data.
Please refer to this link to create a Hive table while importing data.
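A hedged sketch of such a Sqoop Hive import (connection string, credentials, and table names are placeholders):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MY_TABLE \
  --hive-import \
  --create-hive-table \
  --hive-table mydb.my_table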
(or)
If you have imported the data in Avro format, then you can generate the Avro schema using
/usr/bin/Avro/avro-tools-*.jar, and use the generated Avro schema while creating the table in Hive; Hive then uses that schema to read the data from HDFS.
Please refer to this link to extract the schema from an Avro data file.
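A minimal sketch of extracting the schema with avro-tools (the data-file names, HDFS paths, and schema name are placeholders; the jar path is the one mentioned above):
# copy one imported Avro data file locally and dump its embedded schema
hdfs dfs -get /user/hive/warehouse/my_table/part-m-00000.avro .
java -jar /usr/bin/Avro/avro-tools-*.jar getschema part-m-00000.avro > my_table.avsc
# put the schema somewhere Hive can reference it via avro.schema.url
hdfs dfs -put my_table.avsc /user/hive/schemas/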
(or)
While importing data using sqoop --as-avrodatafile, Sqoop creates an .avsc file with the schema in it, so we can use this .avsc file when creating the table.
CREATE EXTERNAL TABLE avro_tbl
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '<hdfs-location>'
TBLPROPERTIES ('avro.schema.url'='<schema-file>');
(or)
If you use NiFi to import the data, NiFi pulls it in Avro format; the ExtractAvroMetadata processor can extract the Avro schema, which you can store in HDFS and use to create the table.
If you want to create the table in ORC format, the ConvertAvroToOrc processor adds a hive.ddl attribute to the flowfile, and we can execute that DDL statement to create the ORC table in Hive.

How to use sqoop to import Oracle CLOB data to Avro files on HDFS

I am getting a strange error when sqooping data from an Oracle DB to HDFS.
Sqoop is not able to import CLOB data into Avro files on Hadoop.
This is the sqoop import error:
ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 2005
Do we need to add any extra arguments to the sqoop import statement for it to correctly import CLOB data into Avro files?
Update: found the solution. We need to add --map-column-java for the CLOB columns.
For example, if the column name is clob, then we have to pass --map-column-java clob=String for Sqoop to import the CLOB columns.
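A hedged sketch of a full command (connection details, target directory, table, and the CLOB column name NOTES are placeholders):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MY_TABLE \
  --as-avrodatafile \
  --target-dir /data/my_table \
  --map-column-java NOTES=String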

How to get metadata of source DB using sqoop

Sqoop reads the metadata of the source DB before storing the data into HDFS/Hive.
Is there any method by which we can get this metadata information from Sqoop?
Answering my own question:
To get the metadata from Sqoop, we can use the Sqoop Java APIs, connect to the source database, and retrieve the following metadata:
Table name
DB name
Column details, etc.
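The Sqoop command line also exposes some of this metadata directly, which may be simpler than the Java API route; a sketch with a placeholder connection string (whether list-databases works depends on the connector):
# list schemas/databases and tables visible to the connection
sqoop list-databases --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username myuser -P
sqoop list-tables --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username myuser -P
# column names and types for one table (Oracle dictionary view; MY_TABLE is a placeholder)
sqoop eval --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username myuser -P \
  --query "SELECT column_name, data_type FROM all_tab_columns WHERE table_name = 'MY_TABLE'"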
