How to convert a string to timestamp with milliseconds in Hive - hadoop

I have a string '20141014123456789' which represents a timestamp with milliseconds that I need to convert to a timestamp in Hive (0.13.0) without losing the milliseconds.
I tried this but unix_timestamp returns an integer, so I lose the milliseconds:
from_unixtime(unix_timestamp('20141014123456789', 'yyyyMMddHHmmssSSS')) >> 2014-10-14 12:34:56
Casting a string works:
cast('2014-10-14 12:34:56.789' as timestamp) >> 2014-10-14 12:34:56.789
but my string isn't in that form.
I think I need to reformat my string from '20141014123456789' to '2014-10-14 12:34:56.789'. My challenge is how to do that without a messy concatenation of substrings.

I found a way to avoid the messy concatenation of substrings using the following code:
select cast(regexp_replace('20141014123456789',
'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})',
'$1-$2-$3 $4:$5:$6.$7') as timestamp)
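To sanity-check the pattern outside Hive, the same rewrite can be sketched in Ruby. This is only an illustration under assumptions: reformat_compact_timestamp is a hypothetical helper name, and Ruby writes backreferences as \1 where Hive's Java-style regexp_replace uses $1.

```ruby
# Matches exactly 17 digits: yyyy MM dd HH mm ss SSS.
COMPACT_TS = /\A(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{3})\z/

# Hypothetical helper: turns 'yyyyMMddHHmmssSSS' into 'yyyy-MM-dd HH:mm:ss.SSS'.
# Returns nil when the input is not exactly 17 digits.
def reformat_compact_timestamp(raw)
  return nil unless raw.match?(COMPACT_TS)
  raw.sub(COMPACT_TS, '\1-\2-\3 \4:\5:\6.\7')
end

puts reformat_compact_timestamp('20141014123456789')
```

The anchored pattern rejects malformed input instead of passing it through unchanged, which is the one behavioural difference from the Hive expression above.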

Newer Hive versions (1.2.0 and later) also provide date_format(arg1, arg2), where arg1 is a date, a timestamp, or a string in the standard 'yyyy-MM-dd HH:mm:ss.SSS' form, and arg2 is the output pattern. Refer to the Java SimpleDateFormat documentation for the pattern letters that arg2 accepts.
Note that arg2 does not describe the layout of arg1, so in this case:
date_format('20141014123456789', 'yyyyMMddHHmmssSSS')
should not be expected to yield '2014-10-14 12:34:56.789'. The compact string is not in a form Hive reads as a timestamp, so the call is expected to return NULL. The cast still needs a string that has already been reformatted:
cast('2014-10-14 12:34:56.789' as timestamp)
Once the value is a timestamp, date_format can render it in whatever pattern is needed.

I had the date field in this form, stored as a string: 2015-07-22T09:00:32.956443Z. I needed to do some date manipulations.
The following command, although a little messy, worked fine for me:
select cast(concat(substr(date_created,1,10),' ',substr(date_created,12,15)) as timestamp) from tablename;
This looks confusing, but it is quite easy if you break it down.
Extract the date (characters 1 to 10) and the time with its fractional seconds (15 characters starting at position 12), concatenate them with a space in between, and cast the result to a timestamp. The result can then be used for date or timestamp manipulations.

Let's say you have a column 'birth_date' in your table which is in string format.
You can use the following expression to filter on birth_date:
date_format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS')
You can use it in a query in the following way:
select * from yourtable
where
date_format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS') = '2019-04-16 07:12:59999';

I don't think this can be done without being messy, because according to its documentation the unix_timestamp() function returns the time in seconds and hence omits the milliseconds part.
"Convert time string with given pattern to Unix time stamp (in seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400."
The best option here would be to write a UDF if you want to avoid messy concatenations. However, the concatenation (though messy) is probably the better fit for the job.

Related

I've attempted a case when statement

I have attempted to use
case
when cc.create_date_time < '04/17/2019'
then 'counted'
when cc.create_date_time > '04/17/2019'
then 'need counted'
else null
end as TIME_GAP
with no luck. Whatever is on the first line ('counted') is returned for all rows, even though some results were created before that date and should say 'need counted'. How do I fix this?
If something was counted on 04/17/2019 or later, it's good; if it was counted before that date, I need the query to tell me that. Thanks.
If the CREATE_DATE_TIME column's datatype is DATE (it should be), why are you comparing it to strings? '04/17/2019' is a string. Either use a DATE literal, or convert that string to a date with the TO_DATE function and an appropriate format mask, e.g.
case when cc.create_date_time < date '2019-04-17' then ...
or
case when cc.create_date_time < to_date('04/17/2019', 'mm/dd/yyyy') then ...

Hanami validate year less than X

I want to check that a date object I have in a validator.rb file has a year field that is less than the year 10000.
required(:my_date_object).maybe(
:date?,
lt?: '10000-01-01'
)
When running system tests, the following error shows up:
ArgumentError:
comparison of Date with String failed
Should I look into converting the date field into a string using to_s or something similar and then doing a regexp format check? Or is there a more straightforward way of checking that the date is less than the year 10000?
You need to create a Date for the lt?.
You can write it like follows:
required(:my_date_object) { lt?(Date.new(10000, 1, 1)) }
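The underlying behaviour can be reproduced in plain Ruby, without Hanami. A minimal sketch; the variable names limit and sample are illustrative only:

```ruby
require 'date'

limit = Date.new(10000, 1, 1)   # first day of year 10000
sample = Date.new(2019, 4, 16)  # arbitrary example value

# Date against Date compares as expected.
puts(sample < limit)

# Date against String raises ArgumentError, which is the failure
# reported in the question.
begin
  sample < '10000-01-01'
rescue ArgumentError => e
  puts e.message
end
```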

Select in ADO (vb6) with a numeric variable

Apologies if this has already been answered elsewhere; in any case, I would appreciate a clarification.
I have a TariffeEstere table with the fields country, Min, Max, tariff.
From it I need to extract the tariff for the country concerned, depending on whether a value falls between the minimum and the maximum. The query should return a single record from which I read the tariff.
The query is:
stsql = "Select * from QPagEstContanti Where country = 'Spain'
and min <= ImpAss and max >= ImpAss"
where ImpAss is a variable of type Double.
When I do
rstariffa.open stsql,.....
the recordset contains a record if, for example, ImpAss = 160 (an integer with no decimals), so the query works. But if ImpAss contains 21,77 (Italian format), it no longer works and gives me a syntax error.
When I check the contents of the query string (stsql), I find:
Select * from QPagEstContanti Where country = 'Spain' and min <= 21,77 and max >= 21,77
In practice the comma is what causes the trouble and a decimal point is wanted instead, but I don't know how to get one.
I even tried passing
Format(ImpAss, "####0.00"),
but the value that ends up in stsql is still 21,77.
How can I fix the problem?
It sounds like the underlying language setting in SQL is expecting '.' decimals instead of ',' decimal notation.
To check this out - run the DBCC useroptions command and see what the 'language' value is set to. If the language is set to English or another '.' decimal notation - it explains why your SQL string is failing with values of double.
If that's the problem, the simplest way to fix it is to insert the following line after your stsql = statement:
  stsql = REPLACE(stsql, ",", ".")
Another way to fix it would be to change the DEFAULT_LANGUAGE for the login using the ALTER LOGIN command (but this changes the setting permanently)
Another way to fix it would be to add this command to the beginning of your stsql, which should change the language for the duration of the rs.Open:
  "SET LANGUAGE Italian;"

Ruby local_to_utc returns invalid year

I have the following date string ('US/Eastern'), which I need to convert to UTC:
date_src = '2014-07-07T23:10:00+0'
First I convert it to a "valid" format so I can operate on it in later processes. I use the following to get an ISO version of the date:
date = DateTime.parse(date_src).iso8601
At this point date is a nice '2014-07-07T23:10:00+00:00'. The last step on my process is to translate this date to UTC. I'm using the following:
TZInfo::Timezone.get('US/Eastern').local_to_utc(date)
The problem is this is giving me 20014 as output, instead of the UTC version of the original date. If I try:
TZInfo::Timezone.get('UTC').local_to_utc(date)
I get 2014, which is the correct year but still unexpected output.
Any ideas about what I'm doing wrong, and what I could use to solve the problem?
local_to_utc actually expects a Time or a DateTime instance:
TZInfo::Timezone.get('US/Eastern').local_to_utc(DateTime.parse(date_src))
# => #<DateTime: 2014-07-08T03:10:00+00:00 ((2456847j,11400s,0n),+0s,2299161j)>
From the documentation, you can have a hint on what actually happened:
All methods in TZInfo that operate on a time can be used with either Time or DateTime instances or with Integer timestamps (i.e. as returned by Time#to_i). The type of the values returned will match the type passed in.
What actually happens is that local_to_utc calls to_i on the input parameter, which on a string returns the integer parsed from the beginning of the string (2014 in your case, since date is the string 2014-07-07T23:10:00+00:00), and adds the time difference to it: 18000 for "US/Eastern" (5 hour difference), and 0 for UTC:
date.to_i
# => 2014
TZInfo::Timezone.get('US/Eastern').local_to_utc(date) - date.to_i
# => 18000
TZInfo::Timezone.get('UTC').local_to_utc(date) - date.to_i
# => 0
So the bottom line is that you stumbled on this weird behavior by accident; it stems from a combination of surprising quirks in the APIs you used.
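The arithmetic behind the odd 20014 can be checked without TZInfo. A small sketch using only core Ruby, with the 18000-second offset taken from the explanation above:

```ruby
date = '2014-07-07T23:10:00+00:00'

# String#to_i reads the leading digits and stops at the first non-digit.
as_integer = date.to_i

# 18000 seconds is the five-hour US/Eastern difference quoted above.
shifted = as_integer + 18_000

puts as_integer
puts shifted
```

The second value is what looked like a year in the question: it is the integer 2014 treated as a timestamp in seconds, plus the offset.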

How do you store a string in MongoDB as a Date type using Ruby?

I have a string that I'm parsing out from log files that looks like the following:
"[22/May/2011:23:02:21 +0000]"
What's the best way (examples in Ruby would be most appreciated, as I'm using the Mongo Ruby driver) to get that stashed into MongoDB as a native Date type?
require 'date' # this is just to get the ABBR_MONTHNAMES list
input = "[22/May/2011:23:02:21 +0000]"
# this regex captures the numbers and month name
pattern = %r{^\[(\d{2})/(\w+)/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]$}
match = input.match(pattern)
# MatchData can be splatted, which is very convenient
_, date, month_name, year, hour, minute, second, tz_offset = *match
# ABBR_MONTHNAMES contains "Jan", "Feb", etc.
month = Date::ABBR_MONTHNAMES.index(month_name)
# we need to insert a colon in the tz offset, because Time.new expects it
tz = tz_offset[0, 3] + ':' + tz_offset[3, 2]
# this is your time object, put it into Mongo and it will be saved as a Date
Time.new(year.to_i, month, date.to_i, hour.to_i, minute.to_i, second.to_i, tz)
A few things to note:
I assumed that the month names are the same as in the ABBR_MONTHNAMES list, otherwise, just make your own list.
Never ever use Date.parse to parse dates; it is incredibly slow. The same goes for DateTime.parse and Time.parse, which use the same implementation.
If you parse a lot of different date formats check out the home_run gem.
If you do a lot of these (like you often do when parsing log files), consider not using a regex. Use String#index, #[] and #split to extract the parts you need.
If you want to do this as fast as possible, something like the following is probably more appropriate. It doesn't use regexes (which are useful, but not fast):
date = input[1, 2].to_i
month_name = input[4, 3]
month = Date::ABBR_MONTHNAMES.index(month_name)
year = input[8, 4].to_i
hour = input[13, 2].to_i
minute = input[16, 2].to_i
second = input[19, 2].to_i
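# apply the sign to hours and minutes together, so "-0530" becomes -19800 seconds
tz_sign = input[22] == '-' ? -1 : 1
tz_offset = tz_sign * (input[23, 2].to_i * 60 * 60 + input[25, 2].to_i * 60)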
Time.new(year, month, date, hour, minute, second, tz_offset)
It takes advantage of the fact that all fields have fixed width (at least I assume they do). So all you need to do is extract the substrings. It also calculates the timezone offset as a number instead of a string.
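For reuse across many log lines, the fixed-width extraction can be wrapped in a method. This is a sketch under assumptions: parse_log_time is a hypothetical helper name, every line is assumed to follow the "[dd/Mon/yyyy:HH:MM:SS +hhmm]" layout exactly, and the sign of the offset is applied to hours and minutes together so that offsets such as -0530 come out right.

```ruby
require 'date' # only for the ABBR_MONTHNAMES list

# Hypothetical helper: parses "[dd/Mon/yyyy:HH:MM:SS +hhmm]" by position.
def parse_log_time(input)
  month = Date::ABBR_MONTHNAMES.index(input[4, 3])
  sign = input[22] == '-' ? -1 : 1
  # Offset in seconds, with the sign covering both hours and minutes.
  offset = sign * (input[23, 2].to_i * 3600 + input[25, 2].to_i * 60)
  Time.new(input[8, 4].to_i, month, input[1, 2].to_i,
           input[13, 2].to_i, input[16, 2].to_i, input[19, 2].to_i, offset)
end

t = parse_log_time('[22/May/2011:23:02:21 -0530]')
puts t.utc_offset
puts t.getutc
```

The returned Time keeps its original offset; getutc gives a UTC copy without modifying it.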
