HIVE Time conversion issue - hadoop

We receive a timestamp from the source: "2019-11-03 01:01:00". 2019-11-03 is a daylight saving changeover day.
Let's say this is the end_time. Our Hive table has another column, start_time.
The logic to derive the start time is:
start_time = (end_time - 3600)
Issue:
When we apply the same logic during job execution with unix_timestamp(), we get the following results.
Start_time:
select from_unixtime(unix_timestamp('2019-11-03 01:01:00') - 3600, 'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2019-11-03 01:01:00 |
+----------------------+--+
End_time:
select from_unixtime(unix_timestamp('2019-11-03 01:01:00'), 'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2019-11-03 01:01:00 |
+----------------------+--+
Both queries return the same result, so start_time = end_time, which is not expected.
We want Start_time = "2019-11-03 00:01:00".
Can someone help?

You are hitting the issue tracked as HIVE-14305.
One solution is to calculate the date in bash and pass it to your script as a variable:
initial_date="2019-11-03 01:01:00"
datesec="$(date '+%s' --date="$initial_date")"
result_date=$(date --date="@$((datesec - 3600))" "+%Y-%m-%d %H:%M:%S")
echo "$result_date"
#result 2019-11-03 00:01:00
#call your script like this
hive -hiveconf result_date="$result_date" -f script_name
#In the script use '${hiveconf:result_date}'
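To see why both Hive queries print the same string, it can help to replay the arithmetic outside Hive. The following is a minimal Python sketch, not Hive code. It assumes the session runs on US Eastern time (the question does not name the time zone) and that the ambiguous 01:01:00 is read as the second, standard-time occurrence, which is consistent with the output shown in the question. The names EDT, EST and fmt are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern time. Fixed offsets are used so the example does not
# depend on an installed time zone database.
EDT = timezone(timedelta(hours=-4))  # daylight time, in force before the fall-back
EST = timezone(timedelta(hours=-5))  # standard time, in force after the fall-back

fmt = "%Y-%m-%d %H:%M:%S"

# 01:01:00 occurs twice on 2019-11-03. Take the second occurrence (EST).
end_epoch = int(datetime(2019, 11, 3, 1, 1, 0, tzinfo=EST).timestamp())

# One hour earlier in absolute time is still before the fall-back, so that
# instant is displayed with the EDT offset and shows the same wall-clock time.
start_instant = datetime.fromtimestamp(end_epoch - 3600, tz=EDT)
print(start_instant.strftime(fmt))  # 2019-11-03 01:01:00

# Subtracting on a naive (zone-less) value ignores DST and gives the wanted result.
naive_start = datetime.strptime("2019-11-03 01:01:00", fmt) - timedelta(hours=1)
print(naive_start.strftime(fmt))  # 2019-11-03 00:01:00
```

The second half mirrors what the bash workaround achieves: the subtraction is done where the ambiguity does not produce a repeated wall-clock value.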

Related

Round the timestamp to hour in hive

Suppose a column holds a timestamp like '2018-01-01 01:35:00.000'. I want to round the timestamp down to the hour and get the value '2018-01-01 01:00:00.000'.
So your question is not about rounding but about truncating the timestamp to the hour. The truncate function works only for date parts (year, month and day), not for time. As a workaround, you can use the snippet below (note lowercase yyyy for the calendar year and uppercase HH for the 24-hour clock):
date_format('2018-01-01 01:35:00.000', 'yyyy-MM-dd HH:00:00.000')
Result:
2018-01-01 01:00:00.000
You can use the from_unixtime and unix_timestamp functions to parse your input data and produce the required output format. In your case the output format would be
yyyy-MM-dd HH:00:00.000
Sample query:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd HH:00:00.000');
+--------------------------+--+
| _c0 |
+--------------------------+--+
| 2018-01-01 01:00:00.000 |
+--------------------------+--+
(or)
2. If you just want the date, change the output format to yyyy-MM-dd:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd');
+-------------+--+
| _c0 |
+-------------+--+
| 2018-01-01 |
+-------------+--+
3. To extract the year and the hour, use the output format yyyy HH:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy HH');
+----------+--+
| _c0 |
+----------+--+
| 2018 01 |
+----------+--+
For the question asked:
select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd HH:00:00.000');
To round a time column down to any granularity
General approach:
Hive time column name: end_time
Hive time column date format: "yyyy-MM-dd HH:mm:ss"
Required output format: "yyyy-MM-dd HH:mm:ss"
Round-off granularity: 15 min (900 seconds, because 15*60)
Hive query expression:
from_unixtime(unix_timestamp(<time-column>, "<time-column-date-format>") - unix_timestamp(<time-column>, "<time-column-date-format>") % 900, '<output-date-format>')
Let's say the end_time column value is 2019-11-29 08:23:27.
The following command then converts end_time from 2019-11-29 08:23:27 to 2019-11-29 08:15:00, given a granularity of 15 min:
select from_unixtime(unix_timestamp(end_time,"yyyy-MM-dd HH:mm:ss") - unix_timestamp(end_time,"yyyy-MM-dd HH:mm:ss") % 900, 'yyyy-MM-dd HH:mm:ss') from <table-name>;
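The subtract-the-remainder step can be checked outside Hive. This is a small Python sketch of the same arithmetic, not Hive code; the helper name floor_to_granularity is made up for illustration, and the value is treated as UTC so the result does not depend on the machine's time zone.

```python
from datetime import datetime, timezone

def floor_to_granularity(value, seconds, fmt="%Y-%m-%d %H:%M:%S"):
    """Round a timestamp string down to a multiple of `seconds`."""
    # Assumption: interpret the wall-clock value as UTC.
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    epoch = int(parsed.timestamp())
    floored = epoch - epoch % seconds  # same idea as ts - ts % 900 in the query
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(fmt)

print(floor_to_granularity("2019-11-29 08:23:27", 900))   # 2019-11-29 08:15:00
print(floor_to_granularity("2018-01-01 01:35:00", 3600))  # 2018-01-01 01:00:00
```

Because the remainder is taken on the epoch value, the boundaries line up with UTC. In a time zone whose offset is not a multiple of the granularity (for example a +05:30 zone with a one-hour granularity), the rounded local time would not fall on the local hour.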

get date records between today and beginning of the current week

Suppose we have the table error_log, something simple like:
+-------------+---------------+
| error_token | date_recorded |
+-------------+---------------+
| error_1 | 05.03.2017 |
+-------------+---------------+
| error_2 | 05.03.2017 |
+-------------+---------------+
| error_3 | 10.03.2017 |
+-------------+---------------+
| error_4 | 30.03.2017 |
+-------------+---------------+
What is the best way to get all errors that happened from the beginning of the current week until today?
And likewise, how do we get all errors from the beginning of the current month until today?
When you say "until today", I have assumed that means up to, but not including, today, i.e. everything earlier than trunc(sysdate).
select *
from error_log
where date_recorded >= trunc(sysdate,'IW') -- beginning of the ISO week (Monday); 'W' would align to the weekday of the 1st of the month
and date_recorded < trunc(sysdate); -- optional: excludes today
select *
from error_log
where date_recorded >= trunc(sysdate,'MONTH') -- beginning of month
and date_recorded < trunc(sysdate); -- optional: excludes today
See the Oracle documentation for TRUNC (date).
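The two range boundaries can be illustrated with the sample rows from the question. This is a Python sketch of the date logic only, not Oracle code. It assumes weeks start on Monday and uses a hypothetical fixed "today" of 2017-03-31 so the output is reproducible; the helper names are made up for illustration.

```python
from datetime import date, datetime, timedelta

# Sample rows from the question (date_recorded is in dd.mm.yyyy form).
rows = [
    ("error_1", "05.03.2017"),
    ("error_2", "05.03.2017"),
    ("error_3", "10.03.2017"),
    ("error_4", "30.03.2017"),
]

def week_start(day):
    """Monday of the week containing `day` (an assumed convention)."""
    return day - timedelta(days=day.weekday())

def month_start(day):
    """First day of the month containing `day`."""
    return day.replace(day=1)

def errors_between(rows, lower, upper):
    """Tokens whose date satisfies lower <= date_recorded < upper."""
    selected = []
    for token, text in rows:
        recorded = datetime.strptime(text, "%d.%m.%Y").date()
        if lower <= recorded < upper:
            selected.append(token)
    return selected

today = date(2017, 3, 31)  # hypothetical "today", a Friday
this_week = errors_between(rows, week_start(today), today)
this_month = errors_between(rows, month_start(today), today)
print(this_week)   # ['error_4']
print(this_month)  # ['error_1', 'error_2', 'error_3', 'error_4']
```

The half-open range (lower bound included, today excluded) matches the assumption stated in the answer.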

Hive query to Extract Date and Hour separately from String

I need to extract Date and hour from the string column in hive.
Table:
select TO_DATE(from_unixtime(UNIX_TIMESTAMP(dates,'dd/MM/yyyy'))) from dates;
output:
0016-01-01
0016-01-01
select TO_DATE(from_unixtime(UNIX_TIMESTAMP(dates,'hh'))) from dates;
output:
1970-01-01
1970-01-01
Please advise how to extract the date and the hour separately from the table column.
I've changed the data sample to something more reasonable:
with dates as (select explode(array('1/11/16 3:29','12/7/16 17:19')) as dates)
select from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'yyyy-MM-dd') as the_date
,from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'H') as H
,from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'HH') as HH
from dates
+------------+----+----+
| the_date | h | hh |
+------------+----+----+
| 2016-11-01 | 3 | 03 |
| 2016-07-12 | 17 | 17 |
+------------+----+----+
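The key point of the answer is that the parse pattern has to match the data, including the two-digit year; a four-digit year pattern applied to "16" is what yields the year 0016 seen in the question. The same parse-then-format idea is sketched below in Python rather than Hive, using the answer's sample values. The function name split_date_and_hour is hypothetical.

```python
from datetime import datetime

def split_date_and_hour(text):
    """Parse a 'd/M/yy H:mm' string and return (date, hour, zero-padded hour)."""
    parsed = datetime.strptime(text, "%d/%m/%y %H:%M")  # %y = two-digit year
    return parsed.strftime("%Y-%m-%d"), parsed.hour, parsed.strftime("%H")

for sample in ("1/11/16 3:29", "12/7/16 17:19"):
    print(split_date_and_hour(sample))
# ('2016-11-01', 3, '03')
# ('2016-07-12', 17, '17')
```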

How to create a table with all days of a year using a start_date variable?

I'd like to create a table with all the days of a year using HQL.
I already tried this approach :
generate days from date range
but the hql is a bit different from sql
What's the best approach?
Using the PL/HQL or using bash script and importing ?
Expected result:
start_date = 2017-02-14;
| date |
|2017-02-14|
|2017-02-13|
|2017-02-12|
|2017-02-11|
|2017-02-10|
|2017-02-09|
|2017-02-08|
|2017-02-07|
....
Thanks
set start_date=2017-02-14;
select date_sub('${hiveconf:start_date}',i)
from (select 1 as n) dummy lateral view posexplode(split(space(364),' ')) p as i,x
;
2017-02-14
2017-02-13
2017-02-12
2017-02-11
2017-02-10
.
.
.
2016-02-20
2016-02-19
2016-02-18
2016-02-17
2016-02-16
Using bash and an older start date for testing purposes:
start_date="2014-02-14"
days=$((($(date -u +%s) - $(date -ud $start_date +%s))/60/60/24))
(( day_end = 366 + days ))
while (( days < day_end ));do
date "+%Y-%m-%d" -d "$days days ago"
(( days++ ))
done
Result
2014-02-14
2014-02-13
2014-02-12
2014-02-11
2014-02-10
2014-02-09
2014-02-08
2014-02-07
2014-02-06
2014-02-05
2014-02-04
...
...
...
2013-02-21
2013-02-20
2013-02-19
2013-02-18
2013-02-17
2013-02-16
2013-02-15
2013-02-14
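If generating the list outside Hive and importing it is acceptable, the same sequence can also be produced with a few lines of Python. This is only a sketch: the function name days_back is made up, and the count of 365 mirrors the 365 rows produced by the Hive query above.

```python
from datetime import date, timedelta

def days_back(start, count):
    """Return `count` consecutive dates, starting at `start` and going backwards."""
    return [start - timedelta(days=offset) for offset in range(count)]

days = days_back(date(2017, 2, 14), 365)
for day in days[:3]:
    print(day.isoformat())  # 2017-02-14, 2017-02-13, 2017-02-12
print(days[-1].isoformat())  # 2016-02-16
```

Writing one ISO date per line gives a file that could then be loaded into a single-column table.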

Adding <integer> seconds to a <timestamp> in PostgreSQL

I have a table of items with the following columns:
start_time column (timestamp without time zone)
expiration_time_seconds column (integer)
For example, some values are:
SELECT start_time, expiration_time_seconds
FROM whatever
ORDER BY start_time;
start_time | expiration_time_seconds
----------------------------+-------------------------
2014-08-05 08:23:32.428452 | 172800
2014-08-10 09:49:51.082456 | 3600
2014-08-13 13:03:56.980073 | 3600
2014-08-21 06:31:38.596451 | 3600
...
How do I add the expiration time, given in seconds, to the start_time?
I have tried to format a time interval string for the interval command, but failed:
blah=> SELECT interval concat(to_char(3600, '9999'), ' seconds');
ERROR: syntax error at or near "("
LINE 1: SELECT interval concat(to_char(3600, '9999'), ' seconds');
The trick is to create a fixed interval and multiply it with the number of seconds in the column:
SELECT start_time,
expiration_time_seconds,
start_time + expiration_time_seconds * interval '1 second' AS end_time
FROM whatever
ORDER BY start_time;
start_time | expiration_time_seconds | end_time
----------------------------|-------------------------|----------------------------
2014-08-05 08:23:32.428452 | 172800 | 2014-08-07 08:23:32.428452
2014-08-10 09:49:51.082456 | 3600 | 2014-08-10 10:49:51.082456
2014-08-13 13:03:56.980073 | 3600 | 2014-08-13 14:03:56.980073
2014-08-21 06:31:38.596451 | 3600 | 2014-08-21 07:31:38.596451
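The result rows can be cross-checked with the same "duration times a unit" idea outside the database. The sketch below is Python, not PostgreSQL, and uses the first two sample rows from the question; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# (start_time, expiration_time_seconds) pairs taken from the question.
items = [
    (datetime(2014, 8, 5, 8, 23, 32, 428452), 172800),
    (datetime(2014, 8, 10, 9, 49, 51, 82456), 3600),
]

# Multiply a one-second unit by the integer column, like
# expiration_time_seconds * interval '1 second'.
one_second = timedelta(seconds=1)
end_times = [start + seconds * one_second for start, seconds in items]

for end_time in end_times:
    print(end_time)
# 2014-08-07 08:23:32.428452
# 2014-08-10 10:49:51.082456
```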

Resources