Hive query to Extract Date and Hour separately from String

I need to extract the date and the hour from a string column in Hive.
Table:
select TO_DATE(from_unixtime(UNIX_TIMESTAMP(dates,'dd/MM/yyyy'))) from dates;
output:
0016-01-01
0016-01-01
select TO_DATE(from_unixtime(UNIX_TIMESTAMP(dates,'hh'))) from dates;
output:
1970-01-01
1970-01-01
Please advise how to extract the date and the hour separately from the table column.

I've changed the data sample to something more reasonable:
with dates as (select explode(array('1/11/16 3:29','12/7/16 17:19')) as dates)
select from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'yyyy-MM-dd') as the_date
,from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'H') as H
,from_unixtime(unix_timestamp(dates,'dd/MM/yy HH:mm'),'HH') as HH
from dates
+------------+----+----+
| the_date | h | hh |
+------------+----+----+
| 2016-11-01 | 3 | 03 |
| 2016-07-12 | 17 | 17 |
+------------+----+----+
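
The parse-then-format idea can be cross-checked outside Hive. The following is a minimal Python sketch, not Hive code; the function name split_date_and_hour is hypothetical, and the strptime pattern is an assumed equivalent of the 'dd/MM/yy HH:mm' pattern used above.

```python
from datetime import datetime


def split_date_and_hour(raw, pattern="%d/%m/%y %H:%M"):
    """Parse a day/month/two-digit-year string and return (date, hour)."""
    parsed = datetime.strptime(raw, pattern)
    # Format the date part as yyyy-MM-dd and return the hour as an integer.
    return parsed.strftime("%Y-%m-%d"), parsed.hour


for raw in ("1/11/16 3:29", "12/7/16 17:19"):
    print(raw, "->", split_date_and_hour(raw))
```

Parsing once and formatting twice mirrors the Hive query: the input pattern describes the data, and each output pattern picks out the part you need.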

Related

Round the timestamp to hour in hive

We have a timestamp column with values like '2018-01-01 01:35:00.000'. I want to round off the timestamp to the hour and get the value '2018-01-01 01:00:00.000'.

So your question is not about rounding but about truncating the timestamp to the hour. Hive's trunc function works only on the date part (year, month, day), not on the time. As a workaround, you can use the snippet below (note yyyy for the calendar year and HH for the 24-hour clock):
date_format('2018-01-01 01:35:00.000', 'yyyy-MM-dd HH:00:00.000')
Result:
2018-01-01 01:00:00.000

You can use the from_unixtime and unix_timestamp functions: parse the input with a pattern that matches your data, then format it with the output pattern you need. In your case the output format would be
yyyy-MM-dd HH:00:00.000
Sample query:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd HH:00:00.000');
+--------------------------+--+
| _c0 |
+--------------------------+--+
| 2018-01-01 01:00:00.000 |
+--------------------------+--+
(or)
2. If you just want the date, change the output format to yyyy-MM-dd:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd');
+-------------+--+
| _c0 |
+-------------+--+
| 2018-01-01 |
+-------------+--+
3. To extract the year and the hour, use the output format yyyy HH:
hive> select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy HH');
+----------+--+
| _c0 |
+----------+--+
| 2018 01 |
+----------+--+

For the question asked:
select from_unixtime(unix_timestamp('2018-01-01 01:35:00.000',"yyyy-MM-dd HH:mm:ss.SSS"),'yyyy-MM-dd HH:00:00.000');
To round off (truncate) the time column to any granularity
General approach:
Hive time column name: end_time
Format of the time column: "yyyy-MM-dd HH:mm:ss"
Required output format: "yyyy-MM-dd HH:mm:ss"
Granularity: 15 min
Hive expression (900 = 15*60 seconds):
from_unixtime(unix_timestamp(<time-column>,"<time-column-format>") - unix_timestamp(<time-column>,"<time-column-format>") % 900, '<output-format>')
Let's say the end_time column value is 2019-11-29 08:23:27.
The following command then converts end_time from 2019-11-29 08:23:27 to 2019-11-29 08:15:00, given a granularity of 15 min:
select from_unixtime(unix_timestamp(end_time,"yyyy-MM-dd HH:mm:ss")-unix_timestamp(end_time,"yyyy-MM-dd HH:mm:ss")%900, 'yyyy-MM-dd HH:mm:ss') from <table-name>;
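
The epoch-minus-remainder arithmetic can be checked outside Hive. Below is a minimal Python sketch, not Hive code: the function name truncate_to is hypothetical, and it assumes timestamps are interpreted as UTC (Hive uses the session time zone, so results may differ for zones whose offset is not a multiple of the granularity).

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def truncate_to(ts_text, granularity_seconds):
    """Floor a timestamp string to a multiple of granularity_seconds."""
    # Assumption: the text is a UTC wall-clock time.
    parsed = datetime.strptime(ts_text, FMT).replace(tzinfo=timezone.utc)
    epoch = int(parsed.timestamp())
    # Same idea as the Hive expression: epoch - epoch % granularity.
    floored = epoch - epoch % granularity_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(FMT)


print(truncate_to("2019-11-29 08:23:27", 900))   # 15-minute buckets
print(truncate_to("2018-01-01 01:35:00", 3600))  # hourly buckets
```

Changing the second argument (900, 3600, 86400 and so on) changes the bucket size without touching the rest of the expression.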

Hive : get rows where difference between a date and date field is some value

This is my table 'ekko' and I need to get all rows where the difference between today's date and column aedat is greater than 65 days. How can I construct a hive query for the same? I use unix OS.
id | rfid  | aedat
---|-------|------------
1  | 3122  | 2017-12-08
2  | 3423  | 2017-12-27
3  | 4564  | 2017-11-09
4  | 23442 | 2017-10-03

In Hive you can use the current_date function, which returns today's date (e.g. 2018-02-26), and then use the datediff function in the where clause to keep only the rows where the difference between current_date and aedat is greater than 65 days.
With aedat cast to the date type:
hive> select * from ekko where datediff(current_date,cast(aedat as date))>65;
(or)
Without casting aedat to the date type:
hive> select * from ekko where datediff(current_date,aedat)>65;

You can use from_unixtime(unix_timestamp()) to get the current date.
select * from ekko where datediff(from_unixtime(unix_timestamp()),aedat) > 65
Or, if your aedat column is of string type, use the one below:
select * from ekko where datediff(from_unixtime(unix_timestamp()),cast(aedat as date))>65;
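
To sanity-check which rows such a filter should return, the datediff comparison can be reproduced outside Hive. This is a minimal Python sketch, not Hive code; it assumes today's date is 2018-02-26 (the example date mentioned earlier in this thread), and the function name rows_older_than is hypothetical.

```python
from datetime import date

# Sample rows from the question: (id, rfid, aedat).
EKKO = [
    (1, 3122, "2017-12-08"),
    (2, 3423, "2017-12-27"),
    (3, 4564, "2017-11-09"),
    (4, 23442, "2017-10-03"),
]


def rows_older_than(rows, today, days):
    """Keep rows whose aedat lies more than `days` days before `today`."""
    return [
        row for row in rows
        if (today - date.fromisoformat(row[2])).days > days
    ]


for row in rows_older_than(EKKO, date(2018, 2, 26), 65):
    print(row)
```

With that assumed date, the rows with id 1, 3 and 4 qualify (80, 109 and 146 days), while id 2 is only 61 days old.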

Hive : group column based on max value

I have a table with fields as
date value
10-02-1900 23
09-05-1901 22
10-03-1900 10
10-02-1901 24
....
I have to return maximum value for each year
i.e.,
1900 23
1901 24
I tried the query below but am getting the wrong answer.
SELECT YEAR(FROM_UNIXTIME(UNIX_TIMESTAMP(date,'dd-mm-yyyy'))) as date,MAX(value) FROM teb GROUP BY date;
Can anyone suggest me a query to do this?

Option 1
select year(from_unixtime(unix_timestamp(date,'dd-MM-yyyy'))) as year
,max(value) as max_value
from t
group by year(from_unixtime(unix_timestamp(date,'dd-MM-yyyy')))
;
Option 2
pre Hive 2.2.0
set hive.groupby.orderby.position.alias=true;
as of Hive 2.2.0
set hive.groupby.position.alias=true;
select year(from_unixtime(unix_timestamp(date,'dd-MM-yyyy'))) as date
,max(value)
from t
group by 1
;
+------+-----------+
| year | max_value |
+------+-----------+
| 1900 | 23 |
| 1901 | 24 |
+------+-----------+
P.s.
Another way to extract the year:
from_unixtime(unix_timestamp(date,'dd-MM-yyyy'),'yyyy')
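
The grouping logic (extract the year, then take the maximum per year) can be cross-checked outside Hive. A minimal Python sketch, not Hive code, using the four sample rows from the question; the function name max_per_year is hypothetical.

```python
from datetime import datetime

ROWS = [
    ("10-02-1900", 23),
    ("09-05-1901", 22),
    ("10-03-1900", 10),
    ("10-02-1901", 24),
]


def max_per_year(rows):
    """Return {year: maximum value} for (dd-MM-yyyy string, value) rows."""
    best = {}
    for raw_date, value in rows:
        # %m is the month here; using minutes instead would lose the date.
        year = datetime.strptime(raw_date, "%d-%m-%Y").year
        best[year] = max(value, best.get(year, value))
    return best


print(max_per_year(ROWS))
```

The key point is the same as in the queries above: the grouping key must be the extracted year, not the raw date string.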

Hive: How do I join with a between dates condition?

I have table of items:
| id | dateTimeUTC | color |
+----+------------------+-------+
| 1 | 1/1/2001 1:11:11 | Red |
+----+------------------+-------+
| 2 | 2/2/2002 2:22:22 | Blue |
+----+------------------+-------+
It contains some events with a dateTime in it. I also have an events table:
| eventID | startDate | endDate |
+---------+-------------------+------------------+
| 1 | 1/1/2001 1:11:11 | 2/2/2002 2:22:22 |
+---------+-------------------+------------------+
| 2 | 3/3/2003 00:00:00 | 3/3/2003 1:11:11 |
+---------+-------------------+------------------+
I want to join the two, returning the rows where the item's dateTimeUTC falls between the startDate and endDate of an event. This is pretty standard in SQL, but not so much in HQL: Hive doesn't let you have anything but an "=" in the join clause (see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins). There was a question about a similar situation before, but that was 4 years ago, and I had hoped there would be a solution by now.
Any tips on how to make this happen?

I think the dates in your tables are stored as strings. If so, use the following, which converts them to Unix timestamps before comparing (the pattern assumes the day comes first; note lowercase mm for minutes and ss for seconds):
select * from items_x, items_date where UNIX_TIMESTAMP(dateTimeUTC,'dd/MM/yyyy HH:mm:ss') between UNIX_TIMESTAMP(startDate,'dd/MM/yyyy HH:mm:ss') and UNIX_TIMESTAMP(endDate,'dd/MM/yyyy HH:mm:ss');
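
The cross-join-plus-filter idea can be checked outside Hive with the sample data from the question. A minimal Python sketch, not Hive code; the function name range_join is hypothetical, and the day-first date pattern is an assumption, since the sample values do not show which field comes first.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"  # assumed day-first layout

ITEMS = [(1, "1/1/2001 1:11:11", "Red"), (2, "2/2/2002 2:22:22", "Blue")]
EVENTS = [
    (1, "1/1/2001 1:11:11", "2/2/2002 2:22:22"),
    (2, "3/3/2003 00:00:00", "3/3/2003 1:11:11"),
]


def parse(text):
    return datetime.strptime(text, FMT)


def range_join(items, events):
    """Pair every item with each event whose [start, end] contains it."""
    return [
        (item_id, event_id)
        for item_id, when, _color in items
        for event_id, start, end in events
        # Inclusive on both ends, like SQL BETWEEN.
        if parse(start) <= parse(when) <= parse(end)
    ]


print(range_join(ITEMS, EVENTS))
```

Like the query above, this compares every item with every event, so the cost grows with the product of the two table sizes.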

Oracle SQL: totalize days for contiguous ranges

I have a table with date ranges and I need to count the days only for the contiguous date ranges...
-----------------------------------
| table RANGES |
----------------------------------
| d_start | d_end | days |
| (date) | (date) | (num)|
-----------------------------------
| 2014-02-01 | 2014-02-05 | 4 |
| 2014-02-06 | 2014-02-11 | 5 |
| 2014-03-22 | 2014-03-25 | 3 |
| 2014-04-02 | 2014-04-10 | 8 |
| 2014-04-11 | 2014-04-20 | 9 |
-----------------------------------
I need to total the days, with a break wherever the date ranges are not contiguous, giving a result like this:
| 2014-02-01 | 2014-02-11 | 9 |
| 2014-03-22 | 2014-03-25 | 3 |
| 2014-04-02 | 2014-04-20 | 17 |
I tried using LEAD to check whether the next record's d_start directly follows d_end, but I can't achieve the goal.
Many thanks for any ideas!
Marco

The answer is quite tricky:
SQL> create table tmp$dates (d_start date, d_end date);
Table created
SQL> insert into tmp$dates values (DATE '2014-02-01', DATE '2014-02-05');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-02-06', DATE '2014-02-11');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-03-22', DATE '2014-03-25');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-04-02', DATE '2014-04-10');
1 row inserted
SQL> insert into tmp$dates values (DATE '2014-04-11', DATE '2014-04-20');
1 row inserted
SQL> select min(d_start), max(d_end), max(d_end) - min(d_start) + 1 n#
2 from tmp$dates d
3 start with d_start not in (select d_end + 1 from tmp$dates)
4 connect by prior d_end = d_start - 1
5 group by level - rownum
6 order by 1;
MIN(D_START) MAX(D_END) N#
------------ ----------- ----------
01.02.2014 11.02.2014 11
22.03.2014 25.03.2014 4
02.04.2014 20.04.2014 19
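
The grouping rule (a range continues the previous one when it starts the day after the previous range ends) can be cross-checked outside Oracle. A minimal Python sketch, not Oracle SQL; the function name merge_contiguous is hypothetical, and the day count is inclusive of both ends, matching the N# column above rather than the days column in the question.

```python
from datetime import date, timedelta

RANGES = [
    (date(2014, 2, 1), date(2014, 2, 5)),
    (date(2014, 2, 6), date(2014, 2, 11)),
    (date(2014, 3, 22), date(2014, 3, 25)),
    (date(2014, 4, 2), date(2014, 4, 10)),
    (date(2014, 4, 11), date(2014, 4, 20)),
]


def merge_contiguous(ranges):
    """Merge ranges where one starts the day after the previous one ends."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start == merged[-1][1] + timedelta(days=1):
            # Extend the current group instead of starting a new one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # Inclusive day count: both the first and the last day are counted.
    return [(s, e, (e - s).days + 1) for s, e in merged]


for group in merge_contiguous(RANGES):
    print(group)
```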
