How to convert a string date to bigint in Hive with milliseconds

I have the string 2013-01-01 12:00:01.546, which represents a timestamp with milliseconds, and I need to convert it to a bigint without losing the milliseconds.
I tried unix_timestamp, but I lose the milliseconds:
unix_timestamp('2013-01-01 12:00:01.546','yyyy-MM-dd HH:mm:ss') ==> 1357059601
unix_timestamp('2013-01-01 12:00:01.786','yyyy-MM-dd HH:mm:ss') ==> 1357059601
I also tried a format with milliseconds, but it made no difference:
unix_timestamp('2013-01-01 12:00:01.786','yyyy-MM-dd HH:mm:ss:SSS') ==> 1357059601
Is there any way to keep the millisecond difference in Hive?

This is what I have come up with so far.
If all your timestamps have a 3-digit fraction, it can be simplified.
with t as (select timestamp '2013-01-01 12:00:01.546' as ts)
select cast ((to_unix_timestamp(ts) + coalesce(cast(regexp_extract(ts,'\\.\\d*',0) as decimal(3,3)),0)) * 1000 as bigint)
from t
1357070401546
Verification of the result:
select from_utc_timestamp (1357070401546,'UTC')
2013-01-01 12:00:01.546000
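The query above adds the whole epoch seconds and the fractional part, then scales the sum by 1000. The following minimal Python sketch mirrors that arithmetic as a sanity check. It assumes the string is a UTC wall-clock time. Hive interprets the literal in the session's time zone, so the value shown above reflects the zone it was run in. The helper name `to_epoch_millis` is hypothetical.

```python
from datetime import datetime, timezone

def to_epoch_millis(ts: str) -> int:
    """Convert 'yyyy-MM-dd HH:mm:ss[.fraction]' to epoch milliseconds.

    Assumption: the string is a UTC wall-clock time.
    """
    whole, _, frac = ts.partition(".")
    dt = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    seconds = int(dt.timestamp())
    # Treat the fraction as a decimal: ".5" -> 500 ms, ".546" -> 546 ms.
    # Digits finer than one millisecond are truncated.
    millis = int((frac + "000")[:3]) if frac else 0
    return seconds * 1000 + millis

print(to_epoch_millis("2013-01-01 12:00:01.546"))  # 1357041601546
print(to_epoch_millis("2013-01-01 12:00:01.786"))  # 1357041601786
```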

So apparently unix_timestamp doesn't convert milliseconds. You can use the following approach.
hive> select unix_timestamp(cast(regexp_replace('2013-01-01 12:00:01.546', '(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})\\.(\\d{3})', '$1-$2-$3 $4:$5:$6.$7' ) as timestamp));
OK
1357063201

Hive's unix_timestamp() function doesn't convert the millisecond part, so you may want to scale the seconds to milliseconds and add the fractional digits (this assumes a 3-digit fraction):
unix_timestamp('2013-01-01 12:00:01.546') * 1000 + cast(split('2013-01-01 12:00:01.546','\\\.')[1] as int) => 1357066801546
unix_timestamp('2013-01-01 12:00:01.786') * 1000 + cast(split('2013-01-01 12:00:01.786','\\\.')[1] as int) => 1357066801786
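The split-based approach only works if the whole seconds are scaled to milliseconds before the fractional digits are added. Short fractions also need padding, because splitting "12:00:01.5" yields "5", which means 500 ms, not 5 ms. The minimal Python sketch below takes the epoch seconds as an input. In Hive those seconds would come from unix_timestamp in the session time zone. Here the UTC value of 2013-01-01 12:00:01 is assumed. The function name is hypothetical.

```python
def combine_seconds_and_fraction(epoch_seconds: int, ts: str) -> int:
    """Combine whole epoch seconds with the fractional part of `ts` into milliseconds."""
    parts = ts.split(".", 1)
    if len(parts) == 1:
        return epoch_seconds * 1000  # no fractional part in the string
    # Right-pad to 3 digits so '5' -> 500 ms; drop anything finer than 1 ms.
    millis = int(parts[1].ljust(3, "0")[:3])
    return epoch_seconds * 1000 + millis

print(combine_seconds_and_fraction(1357041601, "2013-01-01 12:00:01.546"))  # 1357041601546
print(combine_seconds_and_fraction(1357041601, "2013-01-01 12:00:01.5"))    # 1357041601500
```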

Related

Difference between two dates not as expected in Oracle 11g

I get some data through an OSB proxy service and have to transform it using XQuery. Earlier the transformation was done in the database, but now it has to be done in the proxy itself. So I have been given the SQL queries that were used, and I have to write the corresponding XQuery expressions.
Here is the SQL query that is supposed to find the difference between two dates.
SELECT ROUND((CAST(DATEATTRIBUTE2 AS DATE) -
CAST(DATEATTRIBUTE1 AS DATE) ) * 86400 ) AS result
FROM SONY_TEST_TABLE;
DATEATTRIBUTE1 and DATEATTRIBUTE2 are both of TIMESTAMP type.
As I understand it, this query first casts the TIMESTAMP values to DATE so that the time part is stripped, then subtracts the dates. That difference in days is multiplied by 86400 to get the duration in seconds.
However, when I take DATEATTRIBUTE2 as 23-02-17 01:17:19.399000000 AM and DATEATTRIBUTE1 as 23-02-17 01:17:18.755000000 AM, the result should ideally be 0, because the dates are the same and I'm ignoring the time difference. Surprisingly, the result is 1. After checking, I found that the ( CAST(DATEATTRIBUTE2 AS DATE) - CAST(DATEATTRIBUTE1 AS DATE) ) part apparently does not give an integer value but a fractional one. How does this work?
Any help is appreciated. Cheers!
EDIT: I understand the problem now, thanks to all the answers! Even after casting to DATE, the value still has a time part, so the time difference is also calculated. Now how do I implement this in XQuery? See this other question.
The Oracle DATE datatype is actually a datetime. So casting something as a date doesn't remove the time element. To do that we need to truncate the value:
( trunc(DATEATTRIBUTE2) - trunc(DATEATTRIBUTE1) )
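Subtracting two Oracle DATE values yields a number of days that can be fractional, because DATE still carries hours, minutes and seconds. TRUNC zeroes the time of day first. The minimal Python sketch below reproduces that arithmetic. It assumes the fractional seconds are already gone after the cast to DATE, which is what the asker's result of 1 implies. The helper names are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400

def oracle_date_minus(d2: datetime, d1: datetime) -> float:
    """Mimic Oracle DATE - DATE: the result is a (possibly fractional) number of days."""
    return (d2 - d1).total_seconds() / SECONDS_PER_DAY

def trunc(d: datetime) -> datetime:
    """Mimic Oracle TRUNC(date): keep the calendar day, zero the time of day."""
    return d.replace(hour=0, minute=0, second=0, microsecond=0)

# Values after CAST(... AS DATE), with whole seconds only (assumption stated above).
d1 = datetime(2017, 2, 23, 1, 17, 18)
d2 = datetime(2017, 2, 23, 1, 17, 19)

print(oracle_date_minus(d2, d1))                            # a small fraction of a day, not 0
print(round(oracle_date_minus(d2, d1) * SECONDS_PER_DAY))   # 1
print(oracle_date_minus(trunc(d2), trunc(d1)))              # 0.0
```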
You should try this to find the difference in days:
SELECT (trunc(DATEATTRIBUTE2) -
trunc(DATEATTRIBUTE1) ) AS result
FROM SONY_TEST_TABLE;
Alternative 2:
You can use EXTRACT like below:
SELECT ROUND (
EXTRACT (MINUTE FROM INTERVAL_DIFFERENCE) / (24 * 60)
+ EXTRACT (HOUR FROM INTERVAL_DIFFERENCE) / 24
+ EXTRACT (DAY FROM INTERVAL_DIFFERENCE))
FROM (SELECT ( TO_TIMESTAMP ('23-02-17 01:17:19', 'dd-mm-yy hh24:mi:ss')
- TO_TIMESTAMP ('23-02-17 01:17:17', 'dd-mm-yy hh24:mi:ss'))
INTERVAL_DIFFERENCE
FROM DUAL)
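The EXTRACT-based query turns the interval's day, hour and minute fields into a fractional number of days before rounding, and it ignores the seconds field. The following minimal Python sketch does the same arithmetic. It assumes a non-negative interval and uses Decimal with ROUND_HALF_UP to stand in for Oracle's ROUND. The function name is hypothetical.

```python
from datetime import timedelta
from decimal import Decimal, ROUND_HALF_UP

def interval_to_rounded_days(diff: timedelta) -> int:
    """Mirror ROUND(MINUTE/(24*60) + HOUR/24 + DAY) for a non-negative interval.

    Seconds are ignored, as in the query. ROUND_HALF_UP rounds a positive .5 up;
    Python's built-in round() would round half to even instead.
    """
    hours, rem = divmod(diff.seconds, 3600)
    minutes = rem // 60
    total = Decimal(minutes) / (24 * 60) + Decimal(hours) / 24 + diff.days
    return int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(interval_to_rounded_days(timedelta(seconds=2)))  # 0, like the 01:17:19 - 01:17:17 example
print(interval_to_rounded_days(timedelta(hours=12)))   # 1
```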

How to generate diff between TIMESTAMP and DATE in SELECT in Oracle 10

I need to query two tables: one contains a TIMESTAMP(6) column and the other contains a DATE column. I want to write a select statement that prints both values and the difference between them in a third column.
SB_BATCH.B_CREATE_DT - timestamp
SB_MESSAGE.M_START_TIME - date
SELECT SB_BATCH.B_UID, SB_BATCH.B_CREATE_DT, SB_MESSAGE.M_START_TIME,
to_date(to_char(SB_BATCH.B_CREATE_DT), 'DD-MON-RR HH24:MI:SS') as time_in_minutes
FROM SB_BATCH, SB_MESSAGE
WHERE
SB_BATCH.B_UID = SB_MESSAGE.M_B_UID;
Result:
Error report -
SQL Error: ORA-01830: date format picture ends before converting entire input string
01830. 00000 - "date format picture ends before converting entire input string"
You can subtract two timestamps to get an INTERVAL DAY TO SECOND, from which you calculate how many minutes elapsed between the two timestamps. In order to convert SB_MESSAGE.M_START_TIME to a timestamp you can use CAST.
Note that I have also replaced your implicit table join with an explicit INNER JOIN, moving the join condition to the ON clause.
SELECT t.B_UID,
t.B_CREATE_DT,
t.M_START_TIME,
EXTRACT(DAY FROM t.diff)*24*60 +
EXTRACT(HOUR FROM t.diff)*60 +
EXTRACT(MINUTE FROM t.diff) +
ROUND(EXTRACT(SECOND FROM t.diff) / 60.0) AS diff_in_minutes
FROM
(
SELECT SB_BATCH.B_UID,
SB_BATCH.B_CREATE_DT,
SB_MESSAGE.M_START_TIME,
SB_BATCH.B_CREATE_DT - CAST(SB_MESSAGE.M_START_TIME AS TIMESTAMP) AS diff
FROM SB_BATCH
INNER JOIN SB_MESSAGE
ON SB_BATCH.B_UID = SB_MESSAGE.M_B_UID
) t
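The diff_in_minutes expression adds up the interval's fields as whole minutes and rounds only the seconds part. The minimal Python sketch below mirrors that arithmetic. It assumes a non-negative interval and keeps fractional seconds, as EXTRACT(SECOND ...) does for an INTERVAL DAY TO SECOND. Decimal with ROUND_HALF_UP stands in for Oracle's ROUND. The function name is hypothetical.

```python
from datetime import timedelta
from decimal import Decimal, ROUND_HALF_UP

def interval_to_minutes(diff: timedelta) -> int:
    """Mirror DAY*24*60 + HOUR*60 + MINUTE + ROUND(SECOND/60.0) for a non-negative interval."""
    hours, rem = divmod(diff.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    # The SECOND field includes fractional seconds, so add the microseconds back.
    secs = Decimal(seconds) + Decimal(diff.microseconds) / Decimal(1_000_000)
    rounded = (secs / 60).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return diff.days * 24 * 60 + hours * 60 + minutes + int(rounded)

print(interval_to_minutes(timedelta(days=1, hours=2, minutes=3, seconds=40)))  # 1564
```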
Convert the timestamp to a date using cast(... as date). Then take the difference between the dates, which is a number - expressed in days, so if you want it in minutes, multiply by 24*60. Then round the result as needed. I made up a small example below to isolate just the steps needed to answer your question. (Note that your query has many other problems, for example you didn't actually take a difference of anything anywhere. If you need help with your query in general, please post it as a separate question.)
select ts, dt, round( (sysdate - cast(ts as date))*24*60, 2) as time_diff_in_minutes
from (select to_timestamp('2016-08-23 03:22:44.734000', 'yyyy-mm-dd hh24:mi:ss.ff') as ts,
sysdate as dt from dual )
;
TS DT TIME_DIFF_IN_MINUTES
-------------------------------- ------------------- --------------------
2016-08-23 03:22:44.734000000 2016-08-23 08:09:15 286.52
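The 286.52 in the output can be checked by hand. A DATE difference is in days, so multiplying by 24*60 gives minutes. The minimal Python sketch below redoes that arithmetic. It assumes the cast to DATE dropped the .734 fractional seconds. The shown result only comes out as 286.52 under that assumption, and keeping the fraction would give 286.5. The function name is hypothetical.

```python
from datetime import datetime

def minutes_between(later: datetime, earlier: datetime, ndigits: int = 2) -> float:
    """(later - earlier) in days, times 24*60, rounded to `ndigits` decimals."""
    days = (later - earlier).total_seconds() / 86400
    return round(days * 24 * 60, ndigits)

ts_as_date = datetime(2016, 8, 23, 3, 22, 44)  # the timestamp with .734 dropped (assumption)
now = datetime(2016, 8, 23, 8, 9, 15)          # the sysdate value shown above
print(minutes_between(now, ts_as_date))  # 286.52
```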

Hive Current date function

I want to get the current date in beeline.
I tried to use this:
FROM_UNIXTIME(UNIX_TIMESTAMP())
it outputs this:
16-03-21
What I was looking to get is:
2016-03-21 09:34
How do I do it? I see the beeline documentation here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
But it didn't work for me.
You can get it by passing the expected format as the second parameter of the from_unixtime function. Note that in the pattern, MM is the month and mm is the minutes.
Example:
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm');
Result:
2016-03-21 16:03
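In Hive's format patterns, as in Java's SimpleDateFormat, letter case matters: MM is the month and mm is the minutes. A pattern ending in HH:MM therefore prints the hour followed by the month. The minimal Python sketch below illustrates this by translating a small, hypothetical subset of pattern letters into strftime directives. It is not Hive's formatter.

```python
import re
from datetime import datetime

# Hypothetical subset of SimpleDateFormat letters mapped to strftime directives.
# Case matters: MM is month, mm is minutes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)))

def java_to_strftime(pattern: str) -> str:
    """Translate the supported tokens; other characters pass through unchanged."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)

moment = datetime(2016, 3, 21, 16, 45)
print(moment.strftime(java_to_strftime("yyyy-MM-dd HH:mm")))  # 2016-03-21 16:45
print(moment.strftime(java_to_strftime("yyyy-MM-dd HH:MM")))  # 2016-03-21 16:03 (month, not minutes)
```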
Try this:
Select to_date(from_unixtime(unix_timestamp())) from my table ...
Results in '2016-03-21'
There are many date functions you can use in Hive (taken from http://atiblog.com/date-function-hive/):
1) from_unixtime:
This function converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone, in the format "1970-01-01 00:00:00". The following example returns the current date including the time.
hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP());
OK
2015-05-18 05:43:37
Time taken: 0.153 seconds, Fetched: 1 row(s)
2) from_utc_timestamp:
This function assumes that the string in the first argument is in UTC and converts it to the time zone given in the second argument. This function and the to_utc_timestamp function do time zone conversions. In the following example, the first argument is a string.
hive> SELECT from_utc_timestamp('1970-01-01 07:00:00', 'JST');
OK
1970-01-01 16:00:00
Time taken: 0.148 seconds, Fetched: 1 row(s)
3) to_utc_timestamp:
This function assumes that the string in the first argument is in the time zone specified in the second argument, and converts the value to UTC. This function and the from_utc_timestamp function do time zone conversions.
hive> SELECT to_utc_timestamp('1970-01-01 00:00:00', 'America/Denver');
OK
1970-01-01 07:00:00
Time taken: 0.153 seconds, Fetched: 1 row(s)
4) unix_timestamp:
This function parses the date string using the specified date format and returns the number of seconds between that date and the Unix epoch. If it fails, it returns 0. The following example returns the value 1237487400 (the exact value depends on the session time zone; this one corresponds to UTC+05:30).
hive> SELECT unix_timestamp('2009-03-20', 'yyyy-MM-dd');
OK
1237487400
Time taken: 0.156 seconds, Fetched: 1 row(s)
5) unix_timestamp():
This function returns the current time as the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).
hive> SELECT UNIX_TIMESTAMP();
6) unix_timestamp( string date ):
This function converts a date in the format 'yyyy-MM-dd HH:mm:ss', interpreted in the default time zone, into a Unix timestamp. It returns the number of seconds between the specified date and the Unix epoch. If it fails, it returns 0.
hive> select UNIX_TIMESTAMP('2000-01-01 00:00:00');
OK
946665000
Time taken: 0.147 seconds, Fetched: 1 row(s)
7) unix_timestamp( string date, string pattern ):
This function parses the date string using the specified pattern and returns the number of seconds between that date and the Unix epoch. If it fails, it returns 0.
hive> select UNIX_TIMESTAMP('2000-01-01 10:20:30','yyyy-MM-dd');
OK
946665000
Time taken: 0.148 seconds, Fetched: 1 row(s)
8) from_unixtime( bigint number_of_seconds [, string format] ):
The FROM_UNIXTIME function converts the specified number of seconds since the Unix epoch into a date string in the format 'yyyy-MM-dd HH:mm:ss', or in the given format if one is passed.
hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP());
9) to_date( string timestamp ):
The TO_DATE function returns the date part of a timestamp string.
hive> select TO_DATE('2000-01-01 10:20:30');
OK
2000-01-01
10) WEEKOFYEAR( string date )
The WEEKOFYEAR function returns the week number of the date.
hive> SELECT WEEKOFYEAR('2000-03-01 10:20:30');
OK
9
11) DATEDIFF( string date1, string date2 )
The DATEDIFF function returns the number of days between the two given dates.
hive> SELECT DATEDIFF('2000-03-01', '2000-01-10');
OK
51
Time taken: 0.156 seconds, Fetched: 1 row(s)
12) DATE_ADD( string date, int days )
The DATE_ADD function adds the number of days to the specified date.
hive> SELECT DATE_ADD('2000-03-01', 5);
OK
2000-03-06
13) DATE_SUB( string date, int days )
The DATE_SUB function subtracts the number of days from the specified date.
hive> SELECT DATE_SUB('2000-03-01', 5);
OK
2000-02-25
14) Date conversions: convert the MMddyyyy format to a date (via Unix time)
Note: M must always be uppercase in the MMddyyyy format (lowercase m means minutes).
select cast(substring(from_unixtime(unix_timestamp(dt, 'MMddyyyy')),1,10) as date) from sample;
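The semantics described in the list above can be reproduced in plain Python as a cross-check of the example outputs. This is a minimal sketch, not Hive's implementation. Fixed UTC offsets stand in for named zones: IST +05:30 matches the unix_timestamp outputs, and JST +09:00 and Denver in January at MST -07:00 match the conversion examples. This ignores daylight saving time in general. Week numbers are assumed to follow ISO-8601, which agrees with the example. Unlike Hive, a parse failure here raises an exception instead of returning 0. All function names are hypothetical stand-ins.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed offsets instead of named zones (no DST handling).
IST = timezone(timedelta(hours=5, minutes=30))
JST = timezone(timedelta(hours=9))
MST = timezone(timedelta(hours=-7))
FMT = "%Y-%m-%d %H:%M:%S"

def unix_timestamp(s: str, fmt: str, tz: timezone) -> int:
    """Parse `s` as wall-clock time in `tz`; return whole seconds since the epoch."""
    return int(datetime.strptime(s, fmt).replace(tzinfo=tz).timestamp())

def from_unixtime(seconds: int, tz: timezone) -> str:
    """Render epoch seconds as a 'yyyy-MM-dd HH:mm:ss' string in `tz`."""
    return datetime.fromtimestamp(seconds, tz).strftime(FMT)

def from_utc_timestamp(s: str, tz: timezone) -> str:
    """Read `s` as UTC and return the same instant as wall-clock time in `tz`."""
    return datetime.strptime(s, FMT).replace(tzinfo=timezone.utc).astimezone(tz).strftime(FMT)

def to_utc_timestamp(s: str, tz: timezone) -> str:
    """Read `s` as wall-clock time in `tz` and return the same instant in UTC."""
    return datetime.strptime(s, FMT).replace(tzinfo=tz).astimezone(timezone.utc).strftime(FMT)

def datediff(d1: str, d2: str) -> int:
    """Days from d2 to d1, using only the date part of each string."""
    return (date.fromisoformat(d1[:10]) - date.fromisoformat(d2[:10])).days

def date_add(d: str, days: int) -> str:
    return (date.fromisoformat(d[:10]) + timedelta(days=days)).isoformat()

def date_sub(d: str, days: int) -> str:
    return date_add(d, -days)

def weekofyear(d: str) -> int:
    """ISO-8601 week number (assumption)."""
    return date.fromisoformat(d[:10]).isocalendar()[1]

print(unix_timestamp("2009-03-20", "%Y-%m-%d", IST))          # 1237487400
print(from_utc_timestamp("1970-01-01 07:00:00", JST))        # 1970-01-01 16:00:00
print(to_utc_timestamp("1970-01-01 00:00:00", MST))          # 1970-01-01 07:00:00
print(datediff("2000-03-01", "2000-01-10"))                  # 51
print(date_add("2000-03-01", 5), date_sub("2000-03-01", 5))  # 2000-03-06 2000-02-25
print(weekofyear("2000-03-01 10:20:30"))                      # 9
```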

How to calculate Date difference in Hive

I'm a novice. I have an employee table with a column specifying the joining date, and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do I calculate the date difference? Is there a built-in DATEDIFF() function like in MS SQL? Please advise!
datediff(to_date(String timestamp), to_date(String timestamp))
For example:
SELECT datediff(to_date('2019-08-03'), to_date('2019-08-01')) <= 2;
If you need the difference in seconds (i.e.: you're comparing dates with timestamps, and not whole days), you can simply convert two date or timestamp strings in the format 'YYYY-MM-DD HH:MM:SS' (or specify your string date format explicitly) using unix_timestamp(), and then subtract them from each other to get the difference in seconds. (And can then divide by 60.0 to get minutes, or by 3600.0 to get hours, etc.)
Example:
UNIX_TIMESTAMP('2017-12-05 10:01:30') - UNIX_TIMESTAMP('2017-12-05 10:00:00') AS time_diff -- This will return 90 (seconds). Unix_timestamp converts string dates into BIGINTs.
More on what you can do with unix_timestamp() here, including how to convert strings with different date formatting: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
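Subtracting two unix_timestamp values gives seconds, and dividing by 60.0 or 3600.0 converts to minutes or hours. The minimal Python sketch below reproduces the 90-second example. It assumes both strings are in the same time zone with no daylight-saving change between them, so a naive subtraction is enough. The function name is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def seconds_between(later: str, earlier: str) -> int:
    """Like UNIX_TIMESTAMP(later) - UNIX_TIMESTAMP(earlier) for same-zone strings."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return int(delta.total_seconds())

diff = seconds_between("2017-12-05 10:01:30", "2017-12-05 10:00:00")
print(diff)         # 90
print(diff / 60.0)  # 1.5 (minutes)
```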
Yes, datediff is implemented; see:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
By the way, I found this by Google-searching "hive datediff"; it was the first result ;)
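I would try this first:
select * from employee where month(current_date)-3 = month(joining_date)
Note that this only matches employees who joined in the single calendar month three months ago, and it ignores the year (in January, month(current_date)-3 is -2). A range check such as joining_date >= add_months(current_date, -3) is more robust.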
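The difference between comparing month numbers and checking a date range shows up at a year boundary. The minimal Python sketch below contrasts the two checks. `months_back` is a hypothetical helper that clamps to the end of a shorter month. It is not Hive code.

```python
import calendar
from datetime import date

def months_back(d: date, n: int) -> date:
    """Same day n months earlier, clamped to the last day of a shorter month."""
    total = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def joined_in_last_3_months(joining: date, today: date) -> bool:
    return months_back(today, 3) <= joining <= today

def naive_month_check(joining: date, today: date) -> bool:
    # The month(current_date) - 3 = month(joining_date) comparison.
    return today.month - 3 == joining.month

today = date(2020, 1, 15)
joined = date(2019, 12, 1)
print(joined_in_last_3_months(joined, today))  # True
print(naive_month_check(joined, today))        # False: 1 - 3 = -2 never equals 12
```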

How do I get millisecond precision in hive?

The documentation says that timestamps support the following conversion:
•Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert this to a numeric type that retains the milliseconds? Because that is what I want.
More generally, I need to do comparisons between timestamps such as
select maxts - mints as latency from mytable
where maxts and mints are timestamp columns. Currently, this gives me a NullPointerException using Hive 0.11.0. I am able to perform queries if I do something like:
select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable
but this only works for seconds, not millisecond precision.
Any help appreciated. Tell me if you need additional information.
If you want to work with milliseconds, don't use the unix timestamp functions because these consider date as seconds since epoch.
hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.
Instead, convert the JDBC-compliant timestamp to a double.
E.g.:
Given tab-delimited data:
cat /user/hive/ts/data.txt :
a 2013-01-01 12:00:00.423 2013-01-01 12:00:00.433
b 2013-01-01 12:00:00.423 2013-01-01 12:00:00.733
CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';
Then you may query the difference between startTime(st) and endTime(et) in milliseconds as follows:
select
txt,
cast(
round(
cast((e-s) as double) * 1000
) as int
) latency
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;
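Casting a timestamp to double gives epoch seconds with a fractional part, so the difference times 1000 is the latency in milliseconds. The minimal Python sketch below mirrors that for the two sample rows. It assumes UTC for the conversion. The difference does not depend on the zone as long as both values are read in the same fixed zone. Python's round() rounds exact halves to even, which does not matter for these values. Function names are hypothetical.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S.%f"

def as_double_seconds(ts: str) -> float:
    """Like CAST(ts AS DOUBLE): epoch seconds with a fractional part (UTC assumed)."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp()

def latency_ms(start: str, end: str) -> int:
    """Mirror CAST(ROUND((e - s) * 1000) AS INT)."""
    return int(round((as_double_seconds(end) - as_double_seconds(start)) * 1000))

rows = [
    ("a", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.433"),
    ("b", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.733"),
]
for txt, st, et in rows:
    print(txt, latency_ms(st, et))  # a 10, then b 310
```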
