hive date & time stamp from unix_timestamp() - hadoop

I need to populate two columns with the current date (sysdate) and the current timestamp.
I have created the table and am inserting data using unix_timestamp(), but I cannot get the values into Hive's date and timestamp formats.
############ Hive create table #############
create table informatica_p2020.M23_MD_LOC_BKEY(
group_nm string,
loc string,
natural_key string,
loc_sk_id int,
load_date date,
load_time timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/spanda20/informatica_p2020/infor_external/m23_md_loc/m23_md_loc_bkey/';
############### Insert into Table ##########
insert overwrite table M23_MD_LOC_BKEY select 'M23' as group_nm,loc,concat('M23','|','LOC') as NATURAL_KEY,
ROW_NUMBER() OVER () as loc_sk_id,
from_unixtime(unix_timestamp(), 'YYYY-MM-DD'),
from_unixtime(unix_timestamp(), 'YYYY-MM-DD HH:MM:SS.SSS') from M23_MD_LOC LIMIT 2 ;
################output of the insert query ############
M23 SY_BP M23|LOC 1 2015-07-183 2015-07-183 16:07:00.000
M23 SY_MX M23|LOC 2 2015-07-183 2015-07-183 16:07:00.000
Regards
Sanjeeb

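Instead of from_unixtime(unix_timestamp(), 'YYYY-MM-DD')
try -
from_unixtime(unix_timestamp(), 'yyyy-MM-dd')
The pattern letters are case-sensitive: DD is the day of the year (hence 183) while dd is the day of the month, and YYYY is the week-based year while yyyy is the calendar year. In the time part, MM is the month and SS is milliseconds, so the timestamp pattern should be 'yyyy-MM-dd HH:mm:ss.SSS'.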
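The 183 in the question's output is consistent with a day-of-year value: 2 July 2015 is day 183 of the year. A minimal sketch of that arithmetic follows. Python is used only as a calculator here, %j is Python's day-of-year directive (the analogue of the Java pattern letter D), and the load date of 2 July 2015 is an assumption inferred from the output, not something the question states.

```python
from datetime import datetime

# Assumed load time, inferred from the question's output (not stated there).
ts = datetime(2015, 7, 2, 16, 7, 0)

# Java 'DD' is day-of-year; Python's analogue is %j.
day_of_year = ts.strftime("%j")
# Java 'dd' is day-of-month; Python's analogue is %d.
day_of_month = ts.strftime("%d")

# What 'YYYY-MM-DD' effectively produced versus what 'yyyy-MM-dd' produces.
wrong = f"{ts:%Y-%m}-{day_of_year}"
right = f"{ts:%Y-%m}-{day_of_month}"

print(wrong)
print(right)
```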

Related

Oracle SQL loader control file creation for date and time

I am using the control (CTL) file below to load data into a table:
Load data
Append
Into table abc
Fields terminated by ',' optionally enclosed by '"'
Trailing nullcols
(
R_date date 'mm/dd/yyyy hh:mm:ss'
)
The CSV file contains:
R_date
09/12/2023 12:30:34
08/11/2023 22;30:45
In table abc, the r_date column has datatype DATE.
The load fails with ORA-01840: input value not long enough for date format.
Nothing else is written in the above file.
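I think you want:
R_date date "mm/dd/yyyy hh24:mi:ss"
In Oracle date format models, mm is the month and mi is the minutes, and hh is the 12-hour clock, so a value such as 22:30:45 needs hh24.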

issue with hive partitioning and bucketing in CDH 5.10 quick VM

I am new to this area and am stuck on a simple issue.
I am loading data into a Hive table (using an insert from another table, tset1) that is partitioned by udate and bucketed by day.
insert overwrite test1 partition(udate) select id,value,udate,day from tset1;
The issue is that the load puts the wrong value in the partition column. day is taken as the partition value because it is the last column in my source table, so during the load day is used as udate.
How can I force my query to use the right value during the load?
hive (testdb)> create table test1_buk(id int, value string,day int) partitioned by(udate string) clustered by(day) into 5 buckets row format delimited fields terminated by ',' stored as textfile;
hive (testdb)> desc tset1;
OK
col_name data_type comment
id int
value string
udate string
day int
hive (testdb)> desc test1_buk;
OK
col_name data_type comment
id int
value string
day int
udate string
# Partition Information
# col_name data_type comment
udate string
hive (testdb)> select * from test1_buk limit 1;
OK
test1_buk.id test1_buk.value test1_buk.day test1_buk.udate
5 w 2000 10
please help.
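One possible reading, offered as a sketch and not a confirmed diagnosis: Hive matches the SELECT list to the target table by position, not by name, and the dynamic partition column has to be selected last. The Python below only simulates that positional mapping. The source row values are assumptions chosen to reproduce the output above, and load is a hypothetical helper, not a Hive API.

```python
# Target layout from `desc test1_buk`: the partition column udate is last.
target_columns = ["id", "value", "day", "udate"]

# Hypothetical source row from tset1 (values assumed for illustration).
source_row = {"id": 5, "value": "w", "udate": "2000", "day": 10}

def load(select_order):
    """Assign selected values to target columns purely by position."""
    values = [source_row[col] for col in select_order]
    return dict(zip(target_columns, values))

# Order used in the question: udate and day end up swapped.
swapped = load(["id", "value", "udate", "day"])
# Order that matches the target layout: partition column selected last.
fixed = load(["id", "value", "day", "udate"])

print(swapped)
print(fixed)
```

Under that reading, listing the columns as id, value, day, udate in the SELECT would put udate in the partition slot.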

How to store date value in hive timestamp?

I am trying to store both date and timestamp values in a timestamp column using Hive. The source file contains dates and sometimes timestamps.
Is there a way to read both dates and timestamps using the timestamp data type in Hive?
Input:
2015-01-01
2015-10-10 12:00:00.232
2016-02-01
Output which I am getting:
null
2015-10-10 12:00:00.232
null
Is it possible to read both kinds of value using the timestamp data type?
DDL:
create external table mytime(id string ,t timestamp) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://xxx/data/dev/ind/'
I was able to think of a workaround and tried it with a small set of data:
Load the data with the inconsistent date values into a Hive table, say table1, declaring the column as a string.
Then create another table, table2, with the required column declared as timestamp, and load the data from table1 into table2 using the transformation INSERT OVERWRITE TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
This should load the data in the required format.
Code as below:
create table table1
(
id int,
tsstr string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table1.tb';
Data:
1,2015-04-15 00:00:00
2,2015-04-16 00:00:00
3,2015-04-17
LOAD DATA LOCAL INPATH '/home/cloudera/data/tsstr' INTO TABLE table1;
create table table2
(
id int,
mytimestamp timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/user/cloudera/hive/table2.tb';
INSERT INTO TABLE table2 select id,if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')) from table1;
Result shows up as expected:
Like any other database, Hive maps each column to a single datatype, so all values in a column must be in a uniform format. The second column in your file is not uniform: some values are in date format while others are in timestamp format.
To avoid losing the dates, as suggested by @Kishore, make the file uniform: wherever there is only a date, supply a timestamp value such as 2016-01-01 00:00:00.
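The padding rule from the workaround, if(length(tsstr) > 10, tsstr, concat(tsstr,' 00:00:00')), can be checked outside Hive. The following is a minimal Python sketch of the same rule applied to the input values from the question. The function name pad_to_timestamp is hypothetical, and the sketch assumes date-only values are exactly 10 characters long (yyyy-MM-dd).

```python
def pad_to_timestamp(tsstr: str) -> str:
    """Mirror of if(length(tsstr) > 10, tsstr, concat(tsstr, ' 00:00:00'))."""
    return tsstr if len(tsstr) > 10 else tsstr + " 00:00:00"

# Input values from the question.
rows = ["2015-01-01", "2015-10-10 12:00:00.232", "2016-02-01"]
padded = [pad_to_timestamp(r) for r in rows]

for value in padded:
    print(value)
```

Values that already carry a time part pass through unchanged, so only the date-only rows are rewritten.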

Hive and Sqoop partition

I have Sqooped data from a Netezza table and the output file is in HDFS. One column is a timestamp, and I want to load it as a date column in my Hive table and partition the table on that date. How can I do that?
Example: in HDFS the data looks like 2013-07-30 11:08:36
In Hive I want to load only the date (2013-07-30), not the timestamp, and partition on that column daily.
How can I pass the partition column value dynamically?
I have tried loading the data into one table as the source. In the final table I will do insert overwrite table partition by (date_column=dynamic date) select * from table1
Set these 2 properties -
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
And the Query can be like -
INSERT OVERWRITE TABLE TABLE PARTITION (DATE_STR)
SELECT
:
:
-- Partition Col is the last column
to_date(date_column) DATE_STR
FROM table1;
You can also explore these two Sqoop hive-import options; with an incremental import they let you load into the current day's partition.
--hive-partition-key
--hive-partition-value
You can load the EMP_HISTORY table from EMP by enabling dynamic partitioning and converting the timestamp to a date with the to_date function.
The code might look something like this:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE EMP_HISTORY PARTITION (join_date)
SELECT e.name as name, e.age as age, e.salary as salary, e.loc as loc, to_date(e.join_date) as join_date from EMP e ;
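To make the effect of to_date plus dynamic partitioning concrete, here is a small Python simulation of how rows would be grouped into daily partitions. It is only an illustration, not Hive code: to_date below is a Python stand-in for the Hive function, and only the first timestamp comes from the question, the other rows are made up.

```python
from collections import defaultdict
from datetime import datetime

# (name, join timestamp) rows; only the first timestamp comes from the
# question, the rest are made up for illustration.
rows = [
    ("a", "2013-07-30 11:08:36"),
    ("b", "2013-07-30 23:59:59"),
    ("c", "2013-07-31 00:00:01"),
]

def to_date(ts: str) -> str:
    """Keep only the date part, like Hive's to_date()."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()

# One group per distinct date, as dynamic partitioning would create.
partitions = defaultdict(list)
for name, ts in rows:
    partitions[to_date(ts)].append(name)

for date_str, names in sorted(partitions.items()):
    print(date_str, names)
```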

Add DB timestamp to SQL Loader CSV import

I have a client whose CSV file does not contain any dates. They would like a timestamp indicating when each row is loaded into their Oracle 11g database. The CSV file is supplied by a vendor, so I cannot modify the file. I have tried adding a default column value and an "after insert" trigger, but with no luck. (Performance is not an issue, as this is an off-hours process.)
The control file looks like this:
options (skip=1, direct=true, rows=10000)
load data
infile data.dat
badfile sqldatatxtdata.bad
replace
into table LAM.CSV_DATA_TXT
fields terminated by ','
trailing nullcols
(ASSET, ASSET_CLASS, MATURITY, PRICE)
The table looks like this:
create table LAM.CSV_DATA_TXT (
ASSET VARCHAR2(20),
ASSET_CLASS VARCHAR2(50),
MATURITY varchar2(50),
PRICE NUMBER(12,8),
DATE_TIMESTAMP DATE default(SYSTIMESTAMP)
);
Any other ideas? Thanks.
Adding a TIMESTAMP column with a default value of SYSTIMESTAMP ought to work:
SQL> create table t23
2 ( id number not null
3 , ts timestamp default systimestamp not null)
4 /
Table created.
SQL> insert into t23 (id) values (9999)
2 /
1 row created.
SQL> select * from t23
2 /
ID TS
---------- -------------------------
9999 25-APR-11 15.21.01.495000
SQL>
So you'll need to explain in greater detail why it doesn't work in your case.
I note that in your example you have created the column with the DATE datatype. This means the defaulted SYSTIMESTAMP loses its fractional seconds and is stored to the second only. If you want full timestamp values, you need to use the TIMESTAMP datatype.
