Elasticsearch fails to index long custom date format - elasticsearch

For timestamps I am using the ISO format with zone ID included, defined as:
yyyy-MM-dd'T'HH:mm:ss.SSS'Z['z']'
This format for example matches these two timestamps:
2015-02-20T09:46:56.336Z[UTC]
2015-02-20T10:46:55.221+01:00[Europe/Berlin]
For indexing data into elasticsearch I also defined this date format in a mapping like this (using elastic4s DSL):
create index indexName mappings {
"/exampleType" as (
"exampleField" typed DateType
) dynamicDateFormats "yyyy-MM-dd'T'HH:mm:ss.SSS'Z['z']'"
}
Basically this mapping works as expected, but I run into problems when the formatted date string gets too long because of the zone ID. For example, the timestamp from above
2015-02-20T09:46:56.336Z[UTC]
having 29 chars works fine, whereas
2015-02-20T10:46:55.221+01:00[Europe/Berlin]
having 44 chars fails to index with the following error:
...
Caused by: java.io.IOException: Cannot read numeric data larger than 32 chars
at org.elasticsearch.index.analysis.NumericTokenizer.incrementToken(NumericTokenizer.java:78) ~[elasticsearch-1.4.2.jar:na]
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:618) ~[lucene-core-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:51:56]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) ~[lucene-core-4.10.2.jar:4.10.2 1634293 - mike - 2014-10-26 05:51:56]
...
Is there a way to get around this problem, e.g. by means of configuration, or am I forced to change my date format so that formatted dates never exceed 32 chars?
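The 32-character limit from the stack trace can be checked against both timestamps outside Elasticsearch. The Ruby sketch below is only an illustration: strip_zone_id is a hypothetical helper, and dropping the bracketed zone ID is one possible way to shorten the value, not a confirmed fix. The date format in the mapping would have to change to match.

```ruby
# Limit quoted in the error message: "Cannot read numeric data larger than 32 chars".
MAX_NUMERIC_CHARS = 32

SHORT_TS = "2015-02-20T09:46:56.336Z[UTC]"
LONG_TS  = "2015-02-20T10:46:55.221+01:00[Europe/Berlin]"

# Hypothetical helper: drop a trailing "[Zone/Id]" suffix and keep the offset.
def strip_zone_id(timestamp)
  timestamp.sub(/\[[^\]]*\]\z/, "")
end

[SHORT_TS, LONG_TS].each do |ts|
  stripped = strip_zone_id(ts)
  fits = stripped.length <= MAX_NUMERIC_CHARS
  puts "#{ts.length} chars -> #{stripped} (#{stripped.length} chars, fits: #{fits})"
end
```

The offset (+01:00) is kept, so the instant in time is preserved; only the named zone is lost.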

Related

Hanami validate year less than X

I want to check that a date object I have in a validator.rb file has a year field that is less than the year 10000.
required(:my_date_object).maybe(
:date?,
lt?: '10000-01-01'
)
When running system tests, the following error shows up:
ArgumentError:
comparison of Date with String failed
Should I look into converting the date field into a string using to_s or something similar and then doing a regexp format check? Or is there a more straightforward way of checking that the date is less than the year 10000?
The lt? predicate needs a Date to compare against, not a String.
You can write it as follows:
required(:my_date_object) { lt?(Date.new(10000, 1, 1)) }
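The underlying comparison can be reproduced in plain Ruby without Hanami. This is a minimal sketch using only the standard library; before_year_10000? and compare_with_string are hypothetical helpers that mimic what a less-than check does with each kind of argument.

```ruby
require "date"

YEAR_LIMIT = Date.new(10000, 1, 1)

# Hypothetical helper: Date compared with Date works as expected.
def before_year_10000?(date)
  date < YEAR_LIMIT
end

# Hypothetical helper: Date compared with String raises ArgumentError,
# which is the error reported in the question. The exception is returned
# so that it can be inspected.
def compare_with_string(date)
  date < "10000-01-01"
rescue ArgumentError => e
  e
end

sample = Date.new(2015, 2, 20)
puts before_year_10000?(sample)
puts compare_with_string(sample).class
```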

How to convert a string to timestamp with milliseconds in Hive

I have a string '20141014123456789' which represents a timestamp with milliseconds that I need to convert to a timestamp in Hive (0.13.0) without losing the milliseconds.
I tried this but unix_timestamp returns an integer, so I lose the milliseconds:
from_unixtime(unix_timestamp('20141014123456789', 'yyyyMMddHHmmssSSS')) >> 2014-10-14 12:34:56
Casting a string works:
cast('2014-10-14 12:34:56.789' as timestamp) >> 2014-10-14 12:34:56.789
but my string isn't in that form.
I think I need to reformat my string from '20141014123456789' to '2014-10-14 12:34:56.789'. My challenge is how to do that without a messy concatenation of substrings.
I found a way to avoid the messy concatenation of substrings using the following code:
select cast(regexp_replace('20141014123456789',
'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{3})',
'$1-$2-$3 $4:$5:$6.$7') as timestamp)
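To see how the capture groups map onto the output, the same pattern can be tried outside Hive. The Ruby sketch below is only an illustration of the regex; reformat_compact_timestamp is a hypothetical name, and Ruby writes backreferences as \1 where Hive uses $1.

```ruby
RAW_TIMESTAMP = "20141014123456789"

# Same seven groups as the Hive pattern: year, month, day, hour, minute, second, millis.
COMPACT_PATTERN = /\A(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{3})\z/

# Hypothetical helper: rewrite yyyyMMddHHmmssSSS as yyyy-MM-dd HH:mm:ss.SSS.
def reformat_compact_timestamp(raw)
  raw.sub(COMPACT_PATTERN, '\1-\2-\3 \4:\5:\6.\7')
end

puts reformat_compact_timestamp(RAW_TIMESTAMP)
```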
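A simple strategy would be to use date_format(arg1, arg2), where arg1 is the timestamp (as a formatted string, a date, or a timestamp) and arg2 is the format of the string in arg1. Refer to the Java SimpleDateFormat documentation for what is acceptable in the format argument. Note that the Hive documentation describes the second argument as the output format and lists date_format as available from Hive 1.2.0, so verify the result on your version before relying on it.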
So, in this case:
date_format('20141014123456789', 'yyyyMMddHHmmssSSS')
would yield the following string: '2014-10-14 12:34:56.789' which can then be cast as timestamp:
cast(date_format('20141014123456789', 'yyyyMMddHHmmssSSS') as timestamp)
The above statement would return a timestamp (as desired).
I had the date field in this form: 2015-07-22T09:00:32.956443Z (stored as a string). I needed to do some date manipulations.
The following command, even though a little messy, worked fine for me :)
select cast(concat(concat(substr(date_created,1,10),' '),substr(date_created,12,15)) as timestamp) from tablename;
This looks confusing, but it is quite easy if you break it down:
extract the date and the time with its fractional seconds, concatenate them with a space in between, and cast the whole thing to a timestamp. The result can then be used for date or timestamp manipulations.
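The substring positions are easier to follow on a concrete value. The Ruby sketch below mirrors the two substr calls; to_timestamp_text is a hypothetical helper, and the only adjustment is that Hive's substr is 1-based while Ruby's String#[] is 0-based.

```ruby
DATE_CREATED = "2015-07-22T09:00:32.956443Z"

# substr(date_created, 1, 10)  -> date_created[0, 10]   (the date)
# substr(date_created, 12, 15) -> date_created[11, 15]  (time plus fractional seconds)
def to_timestamp_text(date_created)
  date_part = date_created[0, 10]
  time_part = date_created[11, 15]
  "#{date_part} #{time_part}"
end

puts to_timestamp_text(DATE_CREATED)
```

The "T" separator and the trailing "Z" fall outside both slices, which is why the result can be cast directly.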
Let's say you have a column 'birth_date' in your table which is in string format.
You can use the following expression to filter on birth_date:
date_Format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS')
You can use it in a query as follows:
select * from yourtable
where
date_Format(birth_date, 'yyyy-MM-dd HH:mm:ssSSS') = '2019-04-16 07:12:59999';
I don't think this can be done without being messy, because according to the documentation the unix_timestamp() function returns the time in seconds and hence omits the milliseconds part.
"Convert time string with given pattern to Unix time stamp (in seconds), return 0 if fail: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400."
The best option here would be to write a UDF if you want to avoid messy concatenations. However, the concatenation (though messy) would do the job.

Select in ADO (vb6) with a numeric variable

Apologies if I am raising a problem that has already been solved elsewhere. In any case, I would appreciate a clarification.
I have a TariffeEstere table with the fields country, Min, Max, tariff,
from which I need to extract the rate for a given country. The amount must fall between the minimum and the maximum, and the query should return a single record from which I read the tariff.
The query is:
stsql = "Select * from QPagEstContanti Where country = 'Spain' and min <= " & ImpAss & " and max >= " & ImpAss
Where ImpAss is a variable of type double.
When I do
rstariffa.Open stsql,.....
the recordset contains a record if, e.g., ImpAss = 160 (i.e. an integer without decimals), so the query works. But if ImpAss contains 21,77 (Italian format), it no longer works and gives me a syntax error.
When I inspect the contents of the query string (stsql), I find:
Select * from QPagEstContanti Where country = 'Spain' and min <= 21,77 and max >= 21,77
In practice the decimal comma is the problem and a decimal point is needed, but I do not know how to get one.
I even tried passing
Format(ImpAss, "####0.00"),
but the value that ends up in stsql is still 21,77.
How can I fix the problem?
It sounds like the underlying language setting in SQL is expecting '.' decimals instead of ',' decimal notation.
To check this out - run the DBCC useroptions command and see what the 'language' value is set to. If the language is set to English or another '.' decimal notation - it explains why your SQL string is failing with values of double.
If that's the problem, the simplest way to fix it is to insert the following line after your stsql = statement:
  stsql = REPLACE(stsql, ",", ".")
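That replacement rewrites every comma in the query text. A narrower variation is to normalise only the numeric value before it is placed in the string. The Ruby sketch below only illustrates the idea; sql_decimal and tariff_query are hypothetical helpers, and the table and column names are taken from the question.

```ruby
# Hypothetical helper: turn a decimal-comma number such as "21,77" into "21.77".
def sql_decimal(text)
  text.strip.tr(",", ".")
end

# Builds the query text from the question with a normalised amount.
# Illustration only: it mirrors the string-built SQL of the original;
# bound parameters would avoid the formatting issue altogether.
def tariff_query(country, amount_text)
  amount = sql_decimal(amount_text)
  "Select * from QPagEstContanti Where country = '#{country}' " \
    "and min <= #{amount} and max >= #{amount}"
end

puts tariff_query("Spain", "21,77")
```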
Another way to fix it would be to change the DEFAULT_LANGUAGE for the login using the ALTER LOGIN command (but this changes the setting permanently)
Another way to fix it would be to add this command to the beginning of your stsql, which should change the language for the duration of the rs.Open:
  "SET LANGUAGE Italian;"

Ruby local_to_utc returns invalid year

I have the following date string ('US/Eastern'), which I need to convert to UTC:
date_src = '2014-07-07T23:10:00+0'
First I convert it to a "valid" format so I can operate on it in later processes. I use the following to get an ISO version of the date:
date = DateTime.parse(date_src).iso8601
At this point date is a nice '2014-07-07T23:10:00+00:00'. The last step in my process is to translate this date to UTC. I'm using the following:
TZInfo::Timezone.get('US/Eastern').local_to_utc(date)
The problem is this is giving me 20014 as output, instead of the UTC version of the original date. If I try:
TZInfo::Timezone.get('UTC').local_to_utc(date)
I get 2014, which is the correct year but still unexpected output.
Any ideas about what I'm doing wrong, and what I could use to solve the problem?
local_to_utc actually expects a Time or a DateTime instance:
TZInfo::Timezone.get('US/Eastern').local_to_utc(DateTime.parse(date_src))
# => #<DateTime: 2014-07-08T03:10:00+00:00 ((2456847j,11400s,0n),+0s,2299161j)>
From the documentation, you can have a hint on what actually happened:
All methods in TZInfo that operate on a time can be used with either Time or DateTime instances or with Integer timestamps (i.e. as returned by Time#to_i). The type of the values returned will match the type passed in.
What actually happens is that local_to_utc calls to_i on its input. On a string, to_i returns the integer parsed from the beginning of the string (2014 in your case, since date is the string 2014-07-07T23:10:00+00:00). local_to_utc then adds the time difference to it: 18000 for "US/Eastern" (a 5 hour difference) and 0 for UTC:
date.to_i
# => 2014
TZInfo::Timezone.get('US/Eastern').local_to_utc(date) - date.to_i
# => 18000
TZInfo::Timezone.get('UTC').local_to_utc(date) - date.to_i
# => 0
So the bottom line is that, kind of serendipitously, you saw this weird behavior, which stems from the combination of some surprising quirks of the APIs you used...
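The to_i step described above can be reproduced with plain Ruby, without TZInfo. This is a minimal sketch; the 18000-second figure is the US/Eastern difference quoted in the answer, hard-coded here rather than looked up from a time zone database.

```ruby
DATE_STRING = "2014-07-07T23:10:00+00:00"

# String#to_i reads the leading digits and stops at the first non-digit ("-").
LEADING_INTEGER = DATE_STRING.to_i

# Assumption taken from the answer: a 5 hour difference, in seconds.
EASTERN_DIFFERENCE_SECONDS = 5 * 60 * 60

puts LEADING_INTEGER                               # => 2014
puts LEADING_INTEGER + EASTERN_DIFFERENCE_SECONDS  # => 20014
```

So the string was treated as the integer timestamp 2014, and the 20014 in the question is that value shifted by five hours.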

Oracle date calculation issue

I have the following requirement.
The string format is dd hh:mm:ss, meaning days hours:minutes:seconds, where the day part is optional.
The string is added to the value "1/1/4000": if the incoming value is "00:15:00", the resulting value is 1/1/4000 00:15:00 (15 minutes added to 1/1/4000). If the incoming value is "2 00:15:00", the resulting value is 1/3/4000 00:15:00 (2 days and 15 minutes added to 1/1/4000). If the incoming value is "32 00:15:00", the resulting value is 2/2/4000 00:15:00.
Is there a simple way to implement this requirement?
You can convert your input string to INTERVAL DAY TO SECOND datatype using TO_DSINTERVAL and then add it to your default date. The result will be a date.
SELECT DATE '4000-01-01' + TO_DSINTERVAL('2 23:23:12') FROM dual;
But this requires your input string to be in DD HH:MI:SS format. Since the day is optional in your input, you should prepend 0 days ('0 ') to the string when the day part isn't present.
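The optional-day handling can be sketched outside the database to check the arithmetic. The Ruby code below is only an illustration of the same idea, not Oracle behavior; add_interval is a hypothetical helper and the base date comes from the question.

```ruby
require "date"

BASE_DATE = DateTime.new(4000, 1, 1)
SECONDS_PER_DAY = 86_400

# Hypothetical helper: parse "[dd ]hh:mm:ss" (day part optional) and add it to BASE_DATE.
def add_interval(text)
  match = text.strip.match(/\A(?:(\d+)\s+)?(\d{1,2}):(\d{2}):(\d{2})\z/)
  raise ArgumentError, "unexpected interval: #{text.inspect}" unless match

  # A missing day group is captured as nil, and nil.to_i is 0,
  # which plays the role of defaulting the day part to 0.
  days, hours, minutes, seconds = match.captures.map(&:to_i)
  BASE_DATE + days + Rational(hours * 3600 + minutes * 60 + seconds, SECONDS_PER_DAY)
end

puts add_interval("00:15:00").strftime("%m/%d/%Y %H:%M:%S")
puts add_interval("2 00:15:00").strftime("%m/%d/%Y %H:%M:%S")
```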
