How to calculate Date difference in Hive - hadoop

I'm a novice. I have an employee table with a column specifying the joining date, and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do I calculate the date difference? Is there a built-in DATEDIFF() function like in MS SQL? Please advise!

datediff(to_date(String timestamp), to_date(String timestamp))
For example:
SELECT datediff(to_date('2019-08-03'), to_date('2019-08-01')) <= 2;

If you need the difference in seconds (i.e. you're comparing timestamps rather than whole days), you can convert two date or timestamp strings in the format 'yyyy-MM-dd HH:mm:ss' (or specify your string date format explicitly) using unix_timestamp(), and then subtract one from the other to get the difference in seconds. You can then divide by 60.0 to get minutes, or by 3600.0 to get hours, etc.
Example:
UNIX_TIMESTAMP('2017-12-05 10:01:30') - UNIX_TIMESTAMP('2017-12-05 10:00:00') AS time_diff -- This will return 90 (seconds). Unix_timestamp converts string dates into BIGINTs.
More on what you can do with unix_timestamp() here, including how to convert strings with different date formatting: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
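The same subtract-then-divide arithmetic can be checked outside Hive. The following is only a Python sketch of the idea, not Hive code; the helper name seconds_between is made up for illustration.

```python
from datetime import datetime

def seconds_between(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Parse two timestamp strings and return end - start in whole seconds."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Same values as the Hive example above.
diff = seconds_between("2017-12-05 10:00:00", "2017-12-05 10:01:30")
print(diff)           # 90 seconds
print(diff / 60.0)    # 1.5 minutes
print(diff / 3600.0)  # 0.025 hours
```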

Yes, datediff is implemented; see:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
By the way I found this by Google-searching "hive datediff", it was the first result ;)

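I would try this first. Comparing month numbers alone (month(current_date)-3 = month(joining_date)) matches only a single month, ignores the year, and breaks in January through March, so compare against the date three months back instead:
select * from employee where joining_date >= add_months(current_date, -3)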
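To see how a bare month-number comparison behaves around a year boundary, here is a small Python sketch (not Hive). The helpers naive_month_match, months_back and joined_in_last_3_months are hypothetical names used only for this illustration.

```python
import calendar
from datetime import date

def naive_month_match(today: date, joined: date) -> bool:
    """Compare month numbers only: month(today) - 3 = month(joined)."""
    return today.month - 3 == joined.month

def months_back(d: date, n: int) -> date:
    """Shift d back by n months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def joined_in_last_3_months(today: date, joined: date) -> bool:
    """True when joined falls between three months ago and today, inclusive."""
    return months_back(today, 3) <= joined <= today

today = date(2024, 1, 15)
joined = date(2023, 11, 20)
print(naive_month_match(today, joined))        # False: 1 - 3 is -2, not 11
print(joined_in_last_3_months(today, joined))  # True
```

The range comparison also covers every day in the window, whereas the month-number test can only ever match one calendar month.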

Related

Is there a function similar to DateDiff in MonetDB which can calculate number of weeks between two dates

Consider two dates, '01-Jan-2011' and '01-Oct-2011'.
I wish to calculate number of weeks in between these dates.
I have tried the following:
select extract ( week from ( (current_date+ interval '5' day) - current_date ));
It returns error " no such unary operator 'week(day_interval)'"
I am able to find number of days by using following :
select extract ( day from ( (current_date+ interval '5' day) - current_date ));
the line above returns the output
Is there any way I can achieve the same?
Further, MonetDB considers week from Monday to Sunday(1-7). Is there any way this can be updated/ customised to Sunday to Saturday.
Thanks.
There are a couple of possibilities that I can think of:
select date '2011-10-01' - date '2011-01-01';
results in an INTERVAL DAY value, actually expressed as the difference in seconds, i.e. 23587200.000. You could divide this by (7*24*60*60), i.e. the number of seconds in a week. But it's still an INTERVAL type, not an INTEGER.
Another way is to first convert the date to integers: the number of seconds since "the epoch" (Jan 1, 1970):
select epoch_ms(date '2011-10-01');
This actually gives milliseconds since the epoch, so there is an extra factor of 1000.
This result you can then manipulate to get what you want:
select (epoch_ms(date '2021-02-02') - epoch_ms(date '2020-12-31')) / (7*24*60*60*1000);
This results in a HUGEINT value (if you have 128 bit integers in your system, i.e. anything compiled with GCC or CLANG), so you can convert this to INTEGER:
select cast((epoch_ms(date '2011-10-01') - epoch_ms(date '2011-01-01')) / (7*24*60*60*1000) as integer);
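The numbers above can be cross-checked with a short Python sketch (this is not MonetDB code, and whole_weeks_between is a name invented for the illustration). It uses the two dates from the question.

```python
from datetime import date

def whole_weeks_between(start: date, end: date) -> int:
    """Number of complete 7-day weeks from start to end (floor division)."""
    return (end - start).days // 7

start, end = date(2011, 1, 1), date(2011, 10, 1)
delta = end - start
print(delta.days)                       # 273 days
print(delta.total_seconds())            # 23587200.0, matching the INTERVAL value
print(whole_weeks_between(start, end))  # 39 weeks
```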

Difference in two dates not coming as expected in Oracle 11g

I get some data through a OSB Proxy Service and have to transform that using Xquery. Earlier the transformation was done on the database but now it is to be done on the proxy itself. So I have been given the SQL queries which were used and have to generate Xquery expressions corresponding to those.
Here is the SQL query which is supposed to find the difference between 2 dates.
SELECT ROUND((CAST(DATEATTRIBUTE2 AS DATE) -
CAST(DATEATTRIBUTE1 AS DATE) ) * 86400 ) AS result
FROM SONY_TEST_TABLE;
DATEATTRIBUTE1 and DATEATTRIBUTE2 are both of TIMESTAMP type.
As per my understanding, this query first casts the TIMESTAMP to DATE so that the time part is stripped, then subtracts the dates. That difference in days is multiplied by 86400 to get the duration in seconds.
However, when I take DATEATTRIBUTE2 as 23-02-17 01:17:19.399000000 AM and DATEATTRIBUTE1 as 23-02-17 01:17:18.755000000 AM, the result should ideally be 0, as the dates are the same and I'm ignoring the time difference, but surprisingly the result comes out as 1. After checking, I found that the ( CAST(DATEATTRIBUTE2 AS DATE) - CAST(DATEATTRIBUTE1 AS DATE) ) part apparently does not give an integer value but a fractional one. How does this work?
Any help is appreciated. Cheers!
EDIT : So got the problem thanks to all the answers! Even after casting to DATE it still has time so the time difference is also calculated. Now how do I implement this in XQuery? See this other question.
Oracle DATE datatype is actually a datetime. So casting something as a date doesn't remove the time element. To do that we need to truncate the value:
( trunc(DATEATTRIBUTE2) - trunc(DATEATTRIBUTE1) )
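A Python sketch of the two behaviours, using the timestamps from the question. It is only an analogy, not Oracle code: drop_fraction stands in for the cast to DATE under the assumption that fractional seconds are dropped (which is consistent with the result of 1 reported in the question), and the date-only subtraction stands in for trunc(). All function names are hypothetical.

```python
from datetime import datetime

ts2 = datetime(2017, 2, 23, 1, 17, 19, 399000)  # DATEATTRIBUTE2
ts1 = datetime(2017, 2, 23, 1, 17, 18, 755000)  # DATEATTRIBUTE1

def drop_fraction(ts: datetime) -> datetime:
    """Keep date and time to the second (assumed effect of the cast)."""
    return ts.replace(microsecond=0)

def seconds_after_cast(a: datetime, b: datetime) -> int:
    """Difference in days times 86400, rounded, as in the original query."""
    days = (drop_fraction(a) - drop_fraction(b)).total_seconds() / 86400
    return round(days * 86400)

def days_after_trunc(a: datetime, b: datetime) -> int:
    """Difference in whole days once the time of day is removed."""
    return (a.date() - b.date()).days

print(seconds_after_cast(ts2, ts1))  # 1: the time of day still counts
print(days_after_trunc(ts2, ts1))    # 0: same calendar day
```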
You should try this to find the difference in days:
SELECT (trunc(DATEATTRIBUTE2) -
trunc(DATEATTRIBUTE1) ) AS result
FROM SONY_TEST_TABLE;
Alternative 2:
You can use EXTRACT like below:
SELECT ROUND (
EXTRACT (MINUTE FROM INTERVAL_DIFFERENCE) / (24 * 60)
+ EXTRACT (HOUR FROM INTERVAL_DIFFERENCE) / 24
+ EXTRACT (DAY FROM INTERVAL_DIFFERENCE))
FROM (SELECT ( TO_TIMESTAMP ('23-02-17 01:17:19', 'dd-mm-yy hh24:mi:ss')
- TO_TIMESTAMP ('23-02-17 01:17:17', 'dd-mm-yy hh24:mi:ss'))
INTERVAL_DIFFERENCE
FROM DUAL)

Date format in Oracle- fetching Date of certain range

I have a date table in my db in Oracle. When I run a query I get the date format as '01-05-2015', but when I run a similar query in BIRT, I get the date format as '01-MAY-2015 12:00 AM'. How can I get the date format in dd/mm/yyyy while keeping the data type of the date field as date?
here is sample of my database.
EQ_DT
05-07-2015
06-06-2015
15-02-2015
19-09-2015
28-12-2015
also my query is :
select to_date(to_char(to_date(enquiry_dt,'DD/MM/YYYY'),'DD/MM/YY'),'DD/MM/YY') as q from xxcus.XXACL_SALES_ENQ_DATAMART where to_date(to_char(to_date(enquiry_dt,'DD/MM/YY'),'DD/MM/YY'),'DD/MM/YY')>'21-06-2012' order by q
I am getting error of NOT A VALID Month also
If enquiry_dt is already a date column, why are you trying to convert it to date (and then to char and to date again)?
SELECT to_char(enquiry_dt, 'DD/MM/YYYY') AS q
FROM xxcus.xxacl_sales_enq_datamart
WHERE enquiry_dt > to_date('21-06-2012', 'dd-mm-yyyy')
ORDER BY enquiry_dt
In BIRT, where you place the field on the report, set the field type to date. Then, in the properties for that field, go to Format DateTime and specify the date formatting you want for that field.
I prefer to always pass date parameters as strings to BIRT, using a known date format. This is for report parameters as well as for DataSet parameters.
Then, inside the query, I convert to date like this:
with params as
( select to_date(pi_start_date_str, 'DD.MM.YYYY') as start_date_incl,
to_date(pi_end_date_str, 'DD.MM.YYYY') + 1 as end_date_excl
from dual
)
select whatever
from my_table, params
where ( my_table.event_date >= params.start_date_incl
and
my_table.end_date < params.end_date_excl
)
This works independent of the time of day.
This way, e.g. to select all events for January 2016, I could pass the query parameters '01.01.2016' and '31.01.2016' (I'm using the German date format here).
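The reason an inclusive start plus an exclusive "end date + 1 day" works regardless of the time of day can be sketched in Python. This is an illustration of the range logic only, not BIRT or Oracle code; date_window and in_window are hypothetical helper names.

```python
from datetime import datetime, timedelta

def date_window(start_str: str, end_str: str, fmt: str = "%d.%m.%Y"):
    """Return (start inclusive, end exclusive) for two date strings."""
    start_incl = datetime.strptime(start_str, fmt)
    end_excl = datetime.strptime(end_str, fmt) + timedelta(days=1)
    return start_incl, end_excl

def in_window(ts: datetime, window) -> bool:
    """Half-open test: start <= ts < end."""
    start_incl, end_excl = window
    return start_incl <= ts < end_excl

january = date_window("01.01.2016", "31.01.2016")
print(in_window(datetime(2016, 1, 31, 23, 59, 59), january))  # True
print(in_window(datetime(2016, 2, 1, 0, 0, 0), january))      # False
```

An event late in the evening of the last day is still included, which a plain "<= end date" comparison at midnight would miss.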

How do I get millisecond precision in hive?

The documentation says that timestamps support the following conversion:
• Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert this to a numeric type that retains the milliseconds? Because that is what I want.
More generally, I need to do comparisons between timestamps such as
select maxts - mints as latency from mytable
where maxts and mints are timestamp columns. Currently, this gives me NullPointerException using Hive 0.11.0. I am able to perform queries if I do something like
select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable
but this only works for seconds, not millisecond precision.
Any help appreciated. Tell me if you need additional information.
If you want to work with milliseconds, don't use the unix_timestamp functions, because they treat a date as whole seconds since the epoch.
hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.
Instead, convert the JDBC compliant timestamp to double.
E.g:
Given a tab delimited data:
cat /user/hive/ts/data.txt :
a 2013-01-01 12:00:00.423 2013-01-01 12:00:00.433
b 2013-01-01 12:00:00.423 2013-01-01 12:00:00.733
CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';
Then you may query the difference between startTime(st) and endTime(et) in milliseconds as follows:
select
txt,
cast(
round(
cast((e-s) as double) * 1000
) as int
) latency
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;
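The same idea (fractional epoch seconds, subtract, multiply by 1000, round) can be tried in Python on the two sample rows. This is a sketch rather than Hive code; to_epoch_seconds and latency_ms are invented names, and the timestamps are treated as UTC purely as an assumption for the example.

```python
from datetime import datetime, timezone

def to_epoch_seconds(text: str) -> float:
    """Parse 'YYYY-MM-DD HH:MM:SS.fff' as UTC and return fractional epoch seconds."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return dt.replace(tzinfo=timezone.utc).timestamp()

def latency_ms(st: str, et: str) -> int:
    """Difference et - st in milliseconds, rounded to the nearest integer."""
    return int(round((to_epoch_seconds(et) - to_epoch_seconds(st)) * 1000))

rows = [
    ("a", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.433"),
    ("b", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.733"),
]
for txt, st, et in rows:
    print(txt, latency_ms(st, et))  # a 10, then b 310
```

Rounding before the integer cast matters because the subtraction of two doubles is rarely exact.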

Selecting the minimum difference between two dates in Oracle when the dates are represented as UNIX timestamps

There are many questions posted about getting the difference between two dates in Oracle. My question requires the query to do a couple more things.
Here's how far I have got at the moment
select m_bug_t.date_submitted, m_bug_history_t.date_modified
from m_bug_t, m_bug_history_t
where m_bug_t.id = m_bug_history_t.bug_id
and field_name = 'status'
and new_value = '100'
So far I get a set of date pairs returned like this
date_submitted | date_modified
1314894774 | 1315906468
...
...
I want to convert these numbers to dates, find the difference between them and then get the minimum of all the results. I want the difference to be represented as days.
Any ideas how you do this?
Thanks very much :).
Well, Unix timestamps are expressed as a number of seconds since 01 Jan 1970, so if you subtract one from the other you get the difference in seconds. The difference in days is then simply a matter of dividing by the number of seconds in a day:
(date_modified - date_submitted) / (24*60*60)
or
(date_modified - date_submitted) / 86400
To convert UNIX time to a date you can use:
DATE '1970-01-01' + numtodsinterval(:unix_time_stamp, 'second')
In Oracle SQL, when you subtract two dates you get the difference in days, so you could write:
SELECT MIN(dt_mod - dt_sub)
FROM (SELECT DATE '1970-01-01'
+ numtodsinterval(m_bug_t.date_submitted, 'second') dt_sub,
DATE '1970-01-01'
+ numtodsinterval(m_bug_history_t.date_modified, 'second') dt_mod
FROM m_bug_t, m_bug_history_t
WHERE m_bug_t.id = m_bug_history_t.bug_id
AND field_name = 'status'
AND new_value = '100')
Of course, as others have suggested, you don't really need to do this DATE conversion; you could just subtract your two timestamps (difference in seconds) and convert the result to days.
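Both routes can be compared with a quick Python sketch (not Oracle code), using the one pair of values shown in the question. The names days_between and to_utc are hypothetical, and only that single pair is included because the other rows are elided in the question.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def days_between(submitted: int, modified: int) -> float:
    """Difference between two Unix timestamps, expressed in days."""
    return (modified - submitted) / SECONDS_PER_DAY

def to_utc(unix_ts: int) -> datetime:
    """Convert a Unix timestamp (seconds since 1970-01-01) to a UTC datetime."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc)

pairs = [(1314894774, 1315906468)]  # (date_submitted, date_modified)

# Route 1: subtract the raw timestamps and divide.
print(min(days_between(s, m) for s, m in pairs))

# Route 2: convert to datetimes first, then subtract.
print(min((to_utc(m) - to_utc(s)).total_seconds() / SECONDS_PER_DAY
          for s, m in pairs))
```

Both prints give the same value, roughly 11.7 days for this pair, which is why the date conversion is optional.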
