How do I get millisecond precision in Hive?

The documentation says that timestamps support the following conversion:
•Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert it to a numeric type that retains the milliseconds? That is what I want.
More generally, I need to compute differences between timestamps, such as
select maxts - mints as latency from mytable
where maxts and mints are timestamp columns. Currently, this gives me a NullPointerException in Hive 0.11.0. I am able to run the query if I do something like
select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable
but this only gives second precision, not millisecond precision.
Any help appreciated. Tell me if you need additional information.

If you want to work with milliseconds, don't use the UNIX timestamp functions, because they treat a date as whole seconds since the epoch.
hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.
Instead, cast the JDBC-compliant timestamp to double, which keeps the fractional seconds.
For example, given this tab-delimited data:
cat /user/hive/ts/data.txt
a 2013-01-01 12:00:00.423 2013-01-01 12:00:00.433
b 2013-01-01 12:00:00.423 2013-01-01 12:00:00.733
CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';
Then you can query the difference between the start time (st) and the end time (et) in milliseconds as follows:
select
txt,
cast(
round(
cast((e-s) as double) * 1000
) as int
) latency
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;
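To see why the query rounds before casting to int, here is a small Python sketch of the same arithmetic (an illustration, not Hive itself). It assumes the timestamps are UTC; the zone does not affect the difference as long as both values share it. A double holding seconds since the epoch has a resolution of only about a quarter of a microsecond at this magnitude, so the scaled difference is usually not an exact integer until it is rounded.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S.%f"  # %f accepts 1 to 6 fractional digits

def to_epoch_double(text: str) -> float:
    """Seconds since 1970-01-01 as a float, similar in spirit to cast(ts as double).
    Assumption: the string is interpreted as UTC."""
    dt = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return dt.timestamp()

def latency_ms(st: str, et: str) -> int:
    # Subtract the doubles, scale to milliseconds, then round away the
    # floating-point noise before converting to an integer.
    return int(round((to_epoch_double(et) - to_epoch_double(st)) * 1000))

rows = [
    ("a", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.433"),
    ("b", "2013-01-01 12:00:00.423", "2013-01-01 12:00:00.733"),
]

for txt, st, et in rows:
    print(txt, latency_ms(st, et))  # a 10, then b 310
```

Casting straight to int without rounding could turn a value like 9.99998 into 9, which is why the rounding step matters.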

Related

range between interval in Hive

Good day,
Please give me some advice.
How can I replace this Oracle syntax:
sum(fact) over(partition by name order by rep_date range between interval '20' month preceding and current row) as w_sum
so that it works in Hive? I get an error related to interval '20'.
Convert rep_date into seconds since the Unix epoch using unix_timestamp, compute the number of seconds in 20 months (approximating a month as 30 days: 20 * 30 * 86400 = 51840000), and use that value in the RANGE BETWEEN clause. Hive does not support specifying an interval type in the range.
sum(fact) over(
partition by name
order by unix_timestamp(rep_date,'MM-dd-yyyy') -- Specify the rep_date format here
range between 51840000 preceding and current row) as w_sum
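The constant 51840000 only works if every month is treated as 30 days. This Python sketch (not HiveQL) checks that arithmetic and compares it with a true 20-calendar-month window for a hypothetical rep_date. The helper months_back is made up for illustration, and its rule of clamping the day to the target month's length is an assumption of this sketch, not Oracle's exact interval semantics.

```python
import calendar
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60

# The answer's constant: 20 months approximated as 20 * 30 days.
WINDOW_SECONDS = 20 * 30 * SECONDS_PER_DAY

def months_back(d: date, months: int) -> date:
    """Hypothetical helper: move back whole calendar months, clamping the
    day to the length of the target month."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

current = date(2017, 9, 1)  # hypothetical rep_date
calendar_days = (current - months_back(current, 20)).days

print(WINDOW_SECONDS)                     # 51840000
print(WINDOW_SECONDS // SECONDS_PER_DAY)  # 600 days in the approximated window
print(calendar_days)                      # 609 days in 20 calendar months
```

So the numeric RANGE is an approximation; if rows near the window edge matter, the difference of several days can change which rows are summed.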

How to convert a string date to bigint in Hive with milliseconds

I have a string 2013-01-01 12:00:01.546 which represents a timestamp with milliseconds that I need to convert to a bigint without losing the milliseconds.
I tried unix_timestamp, but I lose the milliseconds:
unix_timestamp('2013-01-01 12:00:01.546','yyyy-MM-dd HH:mm:ss') ==> 1357059601
unix_timestamp('2013-01-01 12:00:01.786','yyyy-MM-dd HH:mm:ss') ==> 1357059601
I also tried a format with milliseconds, but it made no difference:
unix_timestamp('2013-01-01 12:00:01.786','yyyy-MM-dd HH:mm:ss:SSS') ==> 1357059601
Is there any way to keep the milliseconds in Hive?
This is what I came up with so far.
If all your timestamps have a 3-digit fraction, it can be simplified.
with t as (select timestamp '2013-01-01 12:00:01.546' as ts)
select cast ((to_unix_timestamp(ts) + coalesce(cast(regexp_extract(ts,'\\.\\d*',0) as decimal(3,3)),0)) * 1000 as bigint)
from t
1357070401546
Verification of the result:
select from_utc_timestamp (1357070401546,'UTC')
2013-01-01 12:00:01.546000
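The idea is: whole seconds from to_unix_timestamp, plus the fraction extracted from the string as an exact decimal, times 1000. Here is a Python sketch of that arithmetic (an illustration, not Hive). It assumes the string is UTC; Hive interprets it in a local time zone, which is likely why the numbers quoted in this thread differ from one another by whole hours. Decimal is used so the fraction is added exactly, as the decimal(3,3) cast does.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

def to_epoch_millis(ts: str) -> int:
    """(whole seconds + fractional part) * 1000, assuming a UTC
    'yyyy-MM-dd HH:mm:ss' prefix."""
    whole = datetime.strptime(ts[:19], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    seconds = int(whole.timestamp())
    # Similar to regexp_extract(ts, '\\.\\d*', 0); \d+ skips a bare trailing dot.
    match = re.search(r"\.\d+", ts)
    fraction = Decimal(match.group(0)) if match else Decimal(0)  # coalesce(..., 0)
    return int((seconds + fraction) * 1000)

print(to_epoch_millis("2013-01-01 12:00:01.546"))  # 1357041601546
print(to_epoch_millis("2013-01-01 12:00:01"))      # 1357041601000
```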
So apparently unix_timestamp doesn't convert milliseconds. You can use the following approach:
hive> select unix_timestamp(cast(regexp_replace('2013-01-01 12:00:01.546', '(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})\\.(\\d{3})', '$1-$2-$3 $4:$5:$6.$7' ) as timestamp));
OK
1357063201
The Hive function unix_timestamp() doesn't convert the millisecond part, so you may want to multiply the seconds by 1000 and add the fractional digits as milliseconds (this assumes exactly three fractional digits):
unix_timestamp('2013-01-01 12:00:01.546') * 1000 + cast(split('2013-01-01 12:00:01.546','\\.')[1] as int) => 1357066801546
unix_timestamp('2013-01-01 12:00:01.786') * 1000 + cast(split('2013-01-01 12:00:01.786','\\.')[1] as int) => 1357066801786
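Adding the fractional digits directly to the seconds would add 546 seconds rather than 546 milliseconds, so the seconds must be scaled by 1000 first. Splitting on the dot also assumes exactly three fractional digits: a value such as .5 would be read as 5 ms. This Python sketch (assuming UTC; the function name is hypothetical) shows the arithmetic and pads shorter fractions on the right.

```python
from datetime import datetime, timezone

def epoch_millis_via_split(ts: str) -> int:
    """seconds * 1000 + milliseconds, treating the timestamp as UTC."""
    whole, _, frac = ts.partition(".")  # similar to split(ts, '\\.')
    seconds = int(datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
                  .replace(tzinfo=timezone.utc).timestamp())
    # Pad to 3 digits so ".5" means 500 ms; longer fractions are truncated.
    millis = int((frac + "000")[:3])
    return seconds * 1000 + millis

for ts in ["2013-01-01 12:00:01.546", "2013-01-01 12:00:01.786", "2013-01-01 12:00:01.5"]:
    print(ts, epoch_millis_via_split(ts))
```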

Difference in two dates not coming as expected in Oracle 11g

I get some data through an OSB proxy service and have to transform it using XQuery. Earlier the transformation was done in the database, but now it has to be done in the proxy itself. So I have been given the SQL queries that were used and have to write the corresponding XQuery expressions.
Here is the SQL query that is supposed to find the difference between two dates.
SELECT ROUND((CAST(DATEATTRIBUTE2 AS DATE) -
CAST(DATEATTRIBUTE1 AS DATE) ) * 86400 ) AS result
FROM SONY_TEST_TABLE;
DATEATTRIBUTE1 and DATEATTRIBUTE2 are both of TIMESTAMP type.
As I understand it, this query first casts the TIMESTAMP to DATE so that the time part is stripped, then subtracts the dates. That difference in days is multiplied by 86400 to get the duration in seconds.
However, when DATEATTRIBUTE2 is 23-02-17 01:17:19.399000000 AM and DATEATTRIBUTE1 is 23-02-17 01:17:18.755000000 AM, the result should ideally be 0, since the dates are the same and I'm ignoring the time difference, but surprisingly the result is 1. After checking, I found that the ( CAST(DATEATTRIBUTE2 AS DATE) - CAST(DATEATTRIBUTE1 AS DATE) ) part apparently does not give an integer value but a fractional one. How does this work? o_O
Any help is appreciated. Cheers!
EDIT: I understand the problem now, thanks to all the answers! Even after casting to DATE, the value still has a time part, so the time difference is also calculated. Now how do I implement this in XQuery? See this other question.
The Oracle DATE datatype is actually a datetime, so casting something as a date doesn't remove the time element. To do that, we need to truncate the value:
( trunc(DATEATTRIBUTE2) - trunc(DATEATTRIBUTE1) )
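The behaviour can be mimicked in Python (an illustration, not Oracle): casting to DATE keeps the time of day but drops the fractional seconds, while trunc() also drops the time of day. The sketch assumes the fractional seconds are discarded rather than rounded, which is consistent with the result of 1 reported in the question (rounding would have turned both values into 01:17:19). The helper names are hypothetical.

```python
from datetime import datetime

attr2 = datetime(2017, 2, 23, 1, 17, 19, 399000)  # 23-02-17 01:17:19.399 AM
attr1 = datetime(2017, 2, 23, 1, 17, 18, 755000)  # 23-02-17 01:17:18.755 AM

SECONDS_PER_DAY = 86400

def as_oracle_date(ts: datetime) -> datetime:
    # DATE has second precision: keep the time, drop the fraction (assumed truncation)
    return ts.replace(microsecond=0)

def trunc(ts: datetime) -> datetime:
    # trunc() with no format model: midnight of the same day
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# DATE subtraction yields a (fractional) number of days.
day_diff = (as_oracle_date(attr2) - as_oracle_date(attr1)).total_seconds() / SECONDS_PER_DAY
print(round(day_diff * SECONDS_PER_DAY))  # 1, as observed in the question
print((trunc(attr2) - trunc(attr1)).total_seconds() / SECONDS_PER_DAY)  # 0.0 days
```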
You should try this to find the difference in days:
SELECT (trunc(DATEATTRIBUTE2) -
trunc(DATEATTRIBUTE1) ) AS result
FROM SONY_TEST_TABLE;
Alternative 2
You can use EXTRACT like below:
SELECT ROUND (
EXTRACT (MINUTE FROM INTERVAL_DIFFERENCE) / (24 * 60)
+ EXTRACT (HOUR FROM INTERVAL_DIFFERENCE) / 24
+ EXTRACT (DAY FROM INTERVAL_DIFFERENCE))
FROM (SELECT ( TO_TIMESTAMP ('23-02-17 01:17:19', 'dd-mm-yy hh24:mi:ss')
- TO_TIMESTAMP ('23-02-17 01:17:17', 'dd-mm-yy hh24:mi:ss'))
INTERVAL_DIFFERENCE
FROM DUAL)
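The EXTRACT expression rebuilds a day count from the interval's DAY, HOUR and MINUTE fields and never reads SECOND, so a sub-minute difference contributes nothing. A Python sketch of the same formula over a timedelta (an illustration, assuming a non-negative interval, since Python normalizes negative timedeltas differently from Oracle's EXTRACT):

```python
from datetime import timedelta

def interval_to_days(diff: timedelta) -> float:
    """day + hour/24 + minute/(24*60), mirroring the EXTRACT-based query.
    Seconds are ignored, as in that query. Assumes diff >= 0."""
    day = diff.days
    hour, rem = divmod(diff.seconds, 3600)
    minute = rem // 60
    return minute / (24 * 60) + hour / 24 + day

print(round(interval_to_days(timedelta(seconds=2))))     # 0, the case in the query
print(interval_to_days(timedelta(days=1, hours=12)))     # 1.5
```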

How to generate diff between TIMESTAMP and DATE in SELECT in oracle 10

I need to query two tables: one contains a TIMESTAMP(6) column, the other contains a DATE column. I want to write a SELECT statement that prints both values and the difference between them in a third column.
SB_BATCH.B_CREATE_DT - timestamp
SB_MESSAGE.M_START_TIME - date
SELECT SB_BATCH.B_UID, SB_BATCH.B_CREATE_DT, SB_MESSAGE.M_START_TIME,
to_date(to_char(SB_BATCH.B_CREATE_DT), 'DD-MON-RR HH24:MI:SS') as time_in_minutes
FROM SB_BATCH, SB_MESSAGE
WHERE
SB_BATCH.B_UID = SB_MESSAGE.M_B_UID;
Result:
Error report -
SQL Error: ORA-01830: date format picture ends before converting entire input string
01830. 00000 - "date format picture ends before converting entire input string"
You can subtract two timestamps to get an INTERVAL DAY TO SECOND, from which you can calculate how many minutes elapsed between them. To convert SB_MESSAGE.M_START_TIME to a timestamp, you can use CAST.
Note that I have also replaced your implicit table join with an explicit INNER JOIN, moving the join condition to the ON clause.
SELECT t.B_UID,
t.B_CREATE_DT,
t.M_START_TIME,
EXTRACT(DAY FROM t.diff)*24*60 +
EXTRACT(HOUR FROM t.diff)*60 +
EXTRACT(MINUTE FROM t.diff) +
ROUND(EXTRACT(SECOND FROM t.diff) / 60.0) AS diff_in_minutes
FROM
(
SELECT SB_BATCH.B_UID,
SB_BATCH.B_CREATE_DT,
SB_MESSAGE.M_START_TIME,
SB_BATCH.B_CREATE_DT - CAST(SB_MESSAGE.M_START_TIME AS TIMESTAMP) AS diff
FROM SB_BATCH
INNER JOIN SB_MESSAGE
ON SB_BATCH.B_UID = SB_MESSAGE.M_B_UID
) t
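The outer SELECT turns the interval into whole minutes. Here is a Python sketch of the same formula over a timedelta (an illustration, not Oracle; it assumes a non-negative interval, and the timestamp values used are hypothetical). Oracle's ROUND rounds halves away from zero, whereas Python's round() rounds halves to even, so the sketch uses int(x + 0.5), which matches for non-negative x.

```python
from datetime import datetime, timedelta

def interval_minutes(diff: timedelta) -> int:
    """DAY*24*60 + HOUR*60 + MINUTE + ROUND(SECOND/60), as in the query.
    Assumes diff >= 0."""
    hours, rem = divmod(diff.seconds, 3600)
    minutes, secs = divmod(rem, 60)
    seconds = secs + diff.microseconds / 1_000_000
    return diff.days * 24 * 60 + hours * 60 + minutes + int(seconds / 60.0 + 0.5)

b_create_dt = datetime(2016, 8, 23, 8, 9, 15, 250000)  # hypothetical TIMESTAMP value
m_start_time = datetime(2016, 8, 23, 3, 22, 44)        # hypothetical DATE value
print(interval_minutes(b_create_dt - m_start_time))    # 287 (4h 46m 31.25s)
```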
Convert the timestamp to a date using cast(... as date). Then take the difference between the dates, which is a number expressed in days; if you want it in minutes, multiply by 24*60. Then round the result as needed. I made up a small example below to isolate just the steps needed to answer your question. (Note that your query has many other problems; for example, you didn't actually take a difference of anything anywhere. If you need help with your query in general, please post it as a separate question.)
select ts, dt, round( (sysdate - cast(ts as date))*24*60, 2) as time_diff_in_minutes
from (select to_timestamp('2016-08-23 03:22:44.734000', 'yyyy-mm-dd hh24:mi:ss.ff') as ts,
sysdate as dt from dual )
;
TS DT TIME_DIFF_IN_MINUTES
-------------------------------- ------------------- --------------------
2016-08-23 03:22:44.734000000 2016-08-23 08:09:15 286.52
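The same day-number arithmetic can be checked in Python (an illustration, not Oracle). The sketch assumes cast(ts as date) simply drops the .734 fractional seconds; that assumption reproduces the 286.52 shown, whereas rounding up to 03:22:45 would give 286.5.

```python
from datetime import datetime

ts = datetime(2016, 8, 23, 3, 22, 44, 734000)  # the to_timestamp(...) value
dt = datetime(2016, 8, 23, 8, 9, 15)           # the sysdate shown in the output

# cast(ts as date): assume the fractional seconds are dropped
ts_as_date = ts.replace(microsecond=0)

# Date subtraction gives days; multiply by 24*60 to get minutes
days = (dt - ts_as_date).total_seconds() / 86400
print(round(days * 24 * 60, 2))  # 286.52
```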

How to calculate Date difference in Hive

I'm a novice. I have an employee table with a column specifying the joining date, and I want to retrieve the list of employees who have joined in the last 3 months. I understand we can get the current date using from_unixtime(unix_timestamp()). How do I calculate the date difference? Is there a built-in DATEDIFF() function like in MS SQL? Please advise!
datediff(to_date(String timestamp), to_date(String timestamp))
For example:
SELECT datediff(to_date('2019-08-03'), to_date('2019-08-01')) <= 2;
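Hive's datediff(enddate, startdate) counts the days from startdate to enddate using only the date part. A Python sketch of those semantics (an illustration with a hypothetical function of the same name, assuming 'yyyy-MM-dd' strings optionally followed by a time):

```python
from datetime import date, datetime

def to_date(s: str) -> date:
    # Keep only the 'yyyy-MM-dd' part, like Hive's to_date()
    return datetime.strptime(s[:10], "%Y-%m-%d").date()

def datediff(end: str, start: str) -> int:
    """Days from start to end; the time of day is ignored."""
    return (to_date(end) - to_date(start)).days

print(datediff("2019-08-03", "2019-08-01"))                    # 2
print(datediff("2019-08-03 00:00:01", "2019-08-02 23:59:59"))  # 1
```

The second call shows that two values only two seconds apart still count as one day apart, because only the dates are compared.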
If you need the difference in seconds (i.e.: you're comparing dates with timestamps, and not whole days), you can simply convert two date or timestamp strings in the format 'YYYY-MM-DD HH:MM:SS' (or specify your string date format explicitly) using unix_timestamp(), and then subtract them from each other to get the difference in seconds. (And can then divide by 60.0 to get minutes, or by 3600.0 to get hours, etc.)
Example:
UNIX_TIMESTAMP('2017-12-05 10:01:30') - UNIX_TIMESTAMP('2017-12-05 10:00:00') AS time_diff -- This will return 90 (seconds). Unix_timestamp converts string dates into BIGINTs.
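Because both strings are parsed the same way, subtracting their epoch values gives elapsed seconds regardless of the time zone used. A Python sketch of the same subtraction (an illustration; naive datetimes carry no zone, so a DST change between the two values is ignored here):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # equivalent of Hive's default 'yyyy-MM-dd HH:mm:ss'

def seconds_between(later: str, earlier: str) -> int:
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return int(delta.total_seconds())

diff = seconds_between("2017-12-05 10:01:30", "2017-12-05 10:00:00")
print(diff)         # 90 seconds
print(diff / 60.0)  # 1.5 minutes
```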
More on what you can do with unix_timestamp() here, including how to convert strings with different date formatting: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
Yes, datediff is implemented; see:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
By the way I found this by Google-searching "hive datediff", it was the first result ;)
I would try this first:
select * from employee where joining_date >= add_months(current_date, -3)
