I sqooped data from Teradata into Hive using the sqoop import command.
One of the Teradata tables has a date field. After the import, that date field appears as a timestamp with a bigint datatype.
But I need the date field as a date datatype in the Hive table. Can anyone suggest how to achieve this?
select to_date(from_unixtime(your_timestamp));
Example:
select to_date(from_unixtime(1490985000));
Output: 2017-04-01
I hope this works; please let me know if I am wrong.
I've had this problem. My approach was to create the Hive table first, mapping the Teradata datatypes to their equivalents in your Hive version. After that you can use the Sqoop argument --hive-table <table-name> to insert into that table, as sketched below.
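A minimal sketch of that flow, assuming hypothetical database, table and column names (the Teradata JDBC URL format and driver class are the standard ones, everything else is a placeholder):

-- In Hive: pre-create the table with the types you actually want
CREATE TABLE mydb.my_table (
  id       BIGINT,
  event_dt DATE
);

# Then let Sqoop load into that existing table
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=source_db \
  --driver com.teradata.jdbc.TeraDriver \
  --username user -P \
  --table MY_TABLE \
  --hive-import \
  --hive-table mydb.my_table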
I am importing data from Oracle to Hive. My table doesn't have any integer column that could serve as a primary key, so I am not able to use one as my --split-by column.
As a workaround, I created a row_num column for all rows in the table so that row_num can be used as the --split-by column. Afterwards I want to drop this column from my Hive table.
The column list is huge; I don't want to select all the columns with --columns, nor do I want to create a temporary table for this purpose.
Please let me know whether this can be handled with sqoop arguments.
Could a little tweak to the --query parameter help you? Something like below:
sqoop import --query 'query string'
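Fleshed out, that could look roughly like this (a sketch; the connection string, schema, table and ordering column are placeholders), generating the row number inside the query instead of altering the source table:

sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username user -P \
  --query 'SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.SOME_COLUMN) AS row_num FROM SCHEMA_NAME.TABLE_NAME t WHERE $CONDITIONS' \
  --split-by row_num \
  --num-mappers 4 \
  --target-dir /staging/table_name

One caveat: the split column has to be part of the query's select list, so row_num still ends up in the imported files and has to be dropped or ignored on the Hive side; depending on your Sqoop and driver versions you may also need a --boundary-query for the MIN/MAX of row_num.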
Is there a way in Sqoop to query data from a Hive table and write the result to an RDBMS table?
For example, I want to execute this query
SELECT MAX(DATE) FROM hivedbname.hivetablename
and write (insert or update) the result (in this case, the maximum date) to a table in a MySQL DB.
I know that we can use Python or another programming language to achieve this, but I just want to know whether this is possible with Sqoop.
Thanks
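Sqoop export doesn't run an arbitrary Hive query; it exports delimited files from an HDFS directory (or a whole table via HCatalog). One workaround, sketched here with hypothetical paths, table names and connection string, is to materialize the query result into a directory first and then export that directory:

-- In Hive: write the single-row result to an HDFS directory
INSERT OVERWRITE DIRECTORY '/tmp/max_date_export'
SELECT MAX(`DATE`) FROM hivedbname.hivetablename;

# Then export that directory to a pre-created MySQL table;
# Hive's default text output delimiter is \001
sqoop export \
  --connect jdbc:mysql://mysql-host/targetdb \
  --username user -P \
  --table max_date_table \
  --export-dir /tmp/max_date_export \
  --input-fields-terminated-by '\001'

For the update case, sqoop export also supports --update-key and --update-mode allowinsert.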
I've been playing around with Spark, Hive and Parquet. I have some data in my Hive table and here is what it looks like (warning: French language ahead):
Bloqu� � l'arriv�e NULL
Probl�me de connexion Bloqu� en hub
Obviously there's something wrong here.
What I do is: I read a Teradata table as a DataFrame with Spark, store it as a Parquet file, and then use that file for the Hive table. Here is my create table script:
CREATE TABLE `table`(
`lib` VARCHAR(255),
`libelle_sous_cause` VARCHAR(255)
)
STORED AS PARQUET
LOCATION
'hdfs://location';
I don't really know what causes this; it might be some special encoding issue in the Teradata > Parquet or Parquet > Hive step, I'm not sure.
Any help will be appreciated, thanks.
I figured it out; the solution was to simply use STRING instead of VARCHAR:
CREATE TABLE `table`(
`lib` STRING,
`libelle_sous_cause` STRING
)
STORED AS PARQUET
LOCATION
'hdfs://location';
I've run into the same problem when sqooping from Teradata to Hadoop. When extracting from Teradata, in the SELECT, try wrapping the varchar columns that may have this issue like this:
SELECT
NAME,
AGE,
TRIM(CAST(TRANSLATE(COLUMNOFINTEREST USING latin_to_unicode WITH ERROR) AS VARCHAR(50)))
FROM
TABLENAME;
COLUMNOFINTEREST is your column that would have special characters.
Let me know if that works.
Following is the gist of my problem.
Env:
Hadoop 2 (CDH 5.1)
Database: Oracle 11g
Scenario:
I'm sqooping fact and dimension tables from the database into HDFS. Initially I had challenges handling nulls, which I addressed by setting --null-string and --null-non-string to \N as per the recommendation. Everything was fine as long as the Hive table was built with string fields, even for dates and numerics.
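For reference, that null handling corresponds to import flags like these (the connection string, table and paths here are placeholders):

sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username user -P \
  --table FACT_TABLE \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --target-dir /data/staging/fact_table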
Solution so far
Based on a recommendation, I moved to importing in Avro format. I've built the Hive table on top of the Avro data and I'm able to query it. Now I need to create Hive joins and convert all the fields to their required types, e.g. dates to date/timestamp and numerics to int/bigint. After the sqoop import, the generated Avro schema had converted all date fields to long, and the Hive table shows bigint for those columns.
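As an illustration of that setup (paths, names and the schema location are hypothetical; the SerDe and container format classes are the standard Hive Avro ones):

sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username user -P \
  --table FACT_TABLE \
  --as-avrodatafile \
  --target-dir /data/staging/fact_table

-- Hive table over the Avro files; Oracle DATE columns surface as long/bigint here
CREATE EXTERNAL TABLE fact_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/staging/fact_table'
TBLPROPERTIES ('avro.schema.url'='hdfs:///data/schemas/fact_table.avsc');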
I'm confused about how Sqoop handles nulls and how they should be handled in Hive/HDFS MR etc.
Could anybody suggest a practice that has been adopted and could be leveraged here?
Thanks
Venkatesh
It was a problem for me too when I imported schemas from Parquet tables, since Parquet stores timestamps as bigint. So I guess the underlying problem is that Parquet does not have a separate datatype to store a timestamp. I don't use Avro very often, but I think the same is true for Avro. So if you sqoop an Oracle date/timestamp into a set of Parquet/Avro files, then the storage type (bigint) is how it is stored, not how you want to access it (timestamp/date).
That time is stored as the number of milliseconds since the UNIX epoch (Jan 1st 1970). The Hive/Spark/Impala function from_unixtime() takes the number of seconds, so the solution is to convert those millisecond values to second resolution:
SELECT ..
, from_unixtime(cast(bigint_column/1000 as bigint))
So you will see timestamps like:
1999-04-14 06:00:00
1999-04-15 06:00:00
Notice the 6-hour shift. In my case the original Oracle data type was DATE without any time part (00:00:00), but I got the time shifted by 6 hours because of my timezone (MST). So to get the exact dates:
SELECT ..
, from_unixtime(cast(bigint_column/1000 - 6*3600 as bigint))
which resulted in:
1999-04-14 00:00:00
1999-04-15 00:00:00
ps. "Data Type Considerations for Parquet Tables"
http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_parquet.html#parquet_data_types_unique_1 :
INT96 -> TIMESTAMP
Thanks Gergely. The approach we followed to overcome this issue was to import the date fields as String type when sqooping into HDFS. This was achieved using
sqoop --options-file $OPTION_FILE_NAME \
--table $TABLE_NAME \
--map-column-java DAY_END_DTE=String \
--target-dir $TARGET_DIR \
--as-avrodatafile
This causes the timestamp information to be sqooped as a string in 'yyyy-mm-dd hh:mm:ss.f' format, which can then be cast to a date field.
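On the Hive side, that cast can look like this (a sketch; the table name is illustrative and the column is the DAY_END_DTE from the command above):

SELECT to_date(CAST(day_end_dte AS TIMESTAMP)) FROM imported_table;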
This is not a solution, it is a workaround: you can convert the imported data to a timestamp with this query:
select cast(long_column as TIMESTAMP) from imported_table;
BR,
Gergely
This is my query in a DB2 database:
CREATE TABLE MY_TABLE
(COD_SOC CHAR(5) NOT NULL);
Is it possible to reproduce the NOT NULL constraint in Hive?
What about Pig?
No, it is not possible at this time. It would be very difficult for Hive to enforce column constraints.
With Hive 3.0, you can declare NOT NULL and DEFAULT constraints on a Hive table.
Refer to:
1. https://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/
2. https://www.slideshare.net/Hadoop_Summit/what-is-new-in-apache-hive-30
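For illustration, a sketch of how that looks in Hive 3.0+ DDL (the column names echo the DB2 example above; the second column and its default are made up):

CREATE TABLE my_table (
  cod_soc CHAR(5) NOT NULL,
  descr   STRING  DEFAULT 'n/a'
);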