Hive timestamp import from Netezza - hadoop

I am ETLing a Netezza DB into a Hive target DB, but I keep running into issues when it comes to timestamps. The source DB for the ETL to Netezza is Oracle, and the "dates" there are stored as varchar. When ETLed to Netezza they are transformed into the Netezza timestamp format and are accepted correctly.
When extracting this data from Netezza into Hive I get an exception from java.sql.Timestamp saying the timestamp is not in the appropriate format.
Note: due to the nature and specificity of the error on this system I cannot show output or logs
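Not a confirmed fix, but a common workaround sketch (table and column names here are hypothetical, and the date pattern must match whatever the Netezza values actually look like): land the column in Hive as STRING first and convert it inside Hive, so the import itself never asks java.sql.Timestamp to parse the raw value.
-- hypothetical staging table: keep the problematic column as STRING
CREATE TABLE stg_events (event_id BIGINT, event_ts STRING);
-- convert once the data is in Hive; adjust the pattern to the real source format
-- (note: from_unixtime drops fractional seconds)
CREATE TABLE events AS
SELECT event_id,
       CAST(from_unixtime(unix_timestamp(event_ts, 'yyyy-MM-dd HH:mm:ss')) AS TIMESTAMP) AS event_ts
FROM stg_events;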

Related

How to retain datatypes when importing data from Oracle to HDFS using sqoop?

We are using sqoop to import data from Oracle to HDFS.
In HDFS we are creating an avro file.
The issue we are facing is that dates are being converted to long and all other datatypes are being converted to string.
Is there any way to preserve the datatypes, when importing data using sqoop?
Thanks
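For context, a hedged sketch (column names hypothetical): when Sqoop writes Avro, DATE/TIMESTAMP values typically land as epoch milliseconds in that long, so once a Hive table is defined over the Avro files the original timestamp can be recovered with a query like:
-- hire_date_ms is the Avro long, assumed to hold epoch milliseconds
SELECT emp_id,
       CAST(from_unixtime(hire_date_ms DIV 1000) AS TIMESTAMP) AS hire_date
FROM emp_avro;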

Schema on read in Hive for TSV format file

I am new to Hadoop. I have data in TSV format with 50 columns and I need to store the data into Hive. How can I create the table and load the data on the fly, using schema on read, without manually writing a CREATE TABLE statement?
Hive requires you to run a CREATE TABLE statement because the Hive metastore must be updated with a description of the data location and format you're going to be querying later on.
Schema-on-read doesn't mean that you can query every possible file without knowing metadata beforehand, such as the storage location and storage format.
SparkSQL or Apache Drill, on the other hand, will let you infer the schema from a file, but you must again define the column types for a TSV if you don't want everything to be a string column (or coerced to unexpected types). Both of these tools can interact with a Hive metastore for "decoupled" storage of schema information.
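As an illustration, a minimal external-table definition for a TSV might look like this (names are hypothetical, and only a few of the 50 columns are shown):
CREATE EXTERNAL TABLE my_tsv_data (
  col1 STRING,
  col2 INT,
  col3 DOUBLE
  -- ... remaining columns ...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/me/tsv_data'
TBLPROPERTIES ("skip.header.line.count"="1");  -- only if the file has a header row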
You can use Hue:
http://gethue.com/hadoop-tutorial-create-hive-tables-with-headers-and/
Or with Spark you can infer the schema of the delimited file and save it as a Hive table:
// read the tab-delimited file, using the first line as the header and inferring column types
val df = spark.read
  .option("delimiter", "\t")
  .option("header", "true")
  .option("inferSchema", "true") // <-- HERE
  .csv("/home/cloudera/Book1.csv")
// then persist it as a Hive table (the table name is just an example)
df.write.saveAsTable("book1")

Can Hive deal with binary data?

Can Hive deal with unstructured data?
We have image files stored in an Oracle database, and we have to Sqoop that image data out of Oracle into another database and also export it into a Hive table.
Could you please help me with how to handle that image file in Hive?
Your Oracle data is probably stored as BLOB.
In Hive it should be stored as BINARY.
Here is a Hortonworks article demonstrating Sqoop import of an Oracle BLOB into Hive:
https://community.hortonworks.com/content/supportkb/49145/how-to-sqoop-import-oracle-blobclob-data-into-hive.html
Here is an example of processing the BINARY type using a Hive UDF:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBase64.java
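For instance (a sketch with hypothetical table and column names), the image bytes can be held in a BINARY column and inspected with Hive's built-in base64()/unbase64() functions:
CREATE TABLE images (
  img_id   BIGINT,
  img_data BINARY   -- the BLOB payload imported from Oracle
);
-- encode to a printable form, e.g. to eyeball a few rows
SELECT img_id, base64(img_data) FROM images LIMIT 5;
-- unbase64() goes the other way, turning a base64 string back into BINARY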

How to transfer data & metadata from Hive to RDBMS

There are more than 300 tables in my hive environment.
I want to export all the tables from Hive to Oracle/MySql including metadata.
My Oracle database doesn't have any tables corresponding to these Hive tables.
Sqoop import from Oracle to Hive creates tables in Hive if the table doesn't exist. But Sqoop export from Hive to Oracle doesn't create the table if it doesn't exist, and fails with an exception.
Is there any option in Sqoop to export metadata also? or
Is there any other Hadoop tool through which I can achieve this?
Thanks in advance
The feature you're asking for isn't in Sqoop, and unfortunately I don't know of a current Hadoop tool that can do what you're asking either. A potential workaround is using the "show create table mytable" statement in Hive, which returns the CREATE TABLE statement. You can parse this manually or programmatically via awk, collect the statements in a file, translate the Hive datatypes into their Oracle equivalents, and then run that file against your Oracle DB. From there, you can use Sqoop to populate the tables.
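A rough sketch of that approach:
-- run for each of the ~300 tables, e.g. from a script iterating over SHOW TABLES
SHOW CREATE TABLE mytable;
-- the output is HiveQL DDL, so the Hive types (STRING, BIGINT, ...) still need to be
-- rewritten as Oracle/MySQL types before the statement can run on the target database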
It won't be fun.
Sqoop can't copy metadata or create a table in the RDBMS on the basis of a Hive table.
The table must already exist in the RDBMS to perform a Sqoop export.
Why is it so?
Mapping from an RDBMS to Hive is easy because Hive has only a few datatypes (10-15), so mapping the many RDBMS datatypes onto Hive datatypes is easily achievable. But the reverse is not that easy: a typical RDBMS has hundreds of datatypes (and they differ between RDBMSs).
Also, Sqoop export is a relatively newly added feature; automatic table creation may come in the future.
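To make the asymmetry concrete (a hypothetical example), the Hive side of the export is simple, but someone still has to pick concrete Oracle types and lengths by hand before the Sqoop export can run:
-- Hive source table: a handful of loose datatypes
CREATE TABLE emp (id BIGINT, name STRING, hired TIMESTAMP);
-- Oracle target table, created manually before the export; STRING alone could
-- reasonably become VARCHAR2(n), CLOB, ... depending on the data
CREATE TABLE emp (id NUMBER(19), name VARCHAR2(4000), hired TIMESTAMP);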

Creating External Table in Hive using Hive JDBC: Not Possible?

External table creation via Hive JDBC isn't reflected in the Hive data warehouse, whereas normal table creation inside the Hive data warehouse happens without any issue.
After creating the table via Hive JDBC,
stmt.executeQuery("create external table trial (TOPIC STRING) row format delimited fields terminated by '' STORED as TEXTFILE LOCATION '/user/ranjitha/trial'");
no error is returned.
But when I try retrieving from this table trial, nothing is returned.
Here in this link, https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/YTekdFtbelE, it says external table creation is not possible using Hive JDBC.
It would be really helpful if someone could guide me on the above. Is this not possible with JDBC, or is there an alternative for the same?
Thanks
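One thing worth checking (a suggestion, not a confirmed fix): whether the DDL actually reached the metastore and where the table points, e.g. from the Hive shell:
SHOW TABLES;               -- does 'trial' appear at all?
DESCRIBE FORMATTED trial;  -- check Table Type (should be EXTERNAL_TABLE) and Location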
