No exact match for RANGE query for a specific time - elasticsearch

Question
Why does the Elasticsearch range query not return an exact match for the time "2017-11-30T13:23:23.063657+11:00"? Kindly suggest whether there is a mistake in the query or this is expected behavior.
Query
curl -XGET 'https://hostname/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"range" : {
"time" : {
"gte": "2017-11-30T13:23:23.063657+11:00",
"lte": "2017-11-30T13:23:23.063657+11:00"
}
}
}
}
'
The only document expected to match is shown below.
{
"_index": "***",
"_source": {
"time": "2017-11-30T13:23:23.063657+11:00",
"log_level": "INFO",
"log_time": "2017-11-30 13:23:23,042"
},
"fields": {
"time": [
1512008603063
]
}
}
Result
However, it matched multiple records whose times are merely close to the requested one.
"hits" : {
"total" : 11,
"max_score" : 1.0,
"hits" : [ {
"_index" : "***",
"_score" : 1.0,
"_source" : {
"time" : "2017-11-30T13:23:23.063612+11:00",
"log_level" : "INFO",
"log_time" : "2017-11-30 13:23:23,016"
}
}, {
"_index" : "core-apis-non-prod.97d5f1ee-a570-11e6-b038-02dc30517283.2017.11.30",
"_score" : 1.0,
"_source" : {
"time" : "2017-11-30T13:23:23.063722+11:00",
"log_level" : "INFO",
"log_time" : "2017-11-30 13:23:23,046"
}
}
...

Elasticsearch uses Joda-Time for parsing dates, and the problem is that Joda-Time only stores date/time values down to the millisecond.
From the docs:
The library internally uses a millisecond instant which is identical
to the JDK and similar to other common time representations. This
makes interoperability easy, and Joda-Time comes with out-of-the-box
JDK interoperability.
This means that the last 3 digits of the fractional seconds (the microsecond part) are not taken into account when parsing the date.
2017-11-30T13:23:23.063612+11:00
2017-11-30T13:23:23.063657+11:00
2017-11-30T13:23:23.063722+11:00
are all interpreted as:
2017-11-30T13:23:23.063+11:00
And the corresponding epoch time is 1512008603063 for all these values.
You can see this too by adding explain to the query like this:
{
"query": {
"range" : {
"time" : {
"gte": "2017-11-30T13:23:23.063657+11:00",
"lte": "2017-11-30T13:23:23.063657+11:00"
}
}
},
"explain": true
}
That is basically the reason all those documents match your query.
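To double-check, here is a hedged variant of the same query (assuming time is mapped as a date field): expressing the bounds directly in milliseconds with the epoch_millis format should match exactly the same set of documents, since all three timestamps collapse to 1512008603063.
curl -XGET 'https://hostname/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "range" : {
      "time" : {
        "gte": 1512008603063,
        "lte": 1512008603063,
        "format": "epoch_millis"
      }
    }
  }
}
'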

Related

Convert two repeated values in array into a string

I have some old documents where a field contains an array of the same value repeated twice, something like this:
"task" : [
"first_task",
"first_task"
],
I'm trying to convert this array into a string because both elements are the same value. I've seen the following script: Convert array with 2 equal values to single value, but in my case this can't be fixed through Logstash because it only affects documents that are already stored.
I was thinking to do something like this:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"description": "Change task field from array to first element of this one",
"lang": "painless",
"source": """
if (ctx['task'][0] == ctx['task'][1]) {
ctx['task'] = ctx['task'][0];
}
"""
}
}
]
},
"docs": [
{
"_index" : "tasks",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"#timestamp" : "2022-05-03T07:33:44.652Z",
"task" : ["first_task", "first_task"]
}
}
]
}
The result document is the following:
{
"docs" : [
{
"doc" : {
"_index" : "tasks",
"_type" : "_doc",
"_id" : "1",
"_source" : {
"#timestamp" : "2022-05-03T07:33:44.652Z",
"task" : "first_task"
},
"_ingest" : {
"timestamp" : "2022-05-11T09:08:48.150815183Z"
}
}
}
]
}
We can see that the task field is reassigned to the first element of the array.
Is there a way to manipulate actual data from Elasticsearch and convert all the documents with this characteristic using DSL queries?
Thanks.
You can achieve this with the _update_by_query endpoint. Here is an example:
POST tasks/_update_by_query
{
"script": {
"source": """
if (ctx._source['task'][0] == ctx._source['task'][1]) {
ctx._source['task'] = ctx._source['task'][0];
}
""",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
The match_all query updates every document in the index; you can restrict the update to a subset of documents by changing the query conditions (a refined sketch follows below).
Keep in mind that running a script to update all documents in the index may cause some performance issues while the update process is running.
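Since Elasticsearch flattens arrays at index time, a query alone cannot distinguish a two-element array from a single value, so a hedged refinement guards inside the script instead: the instanceof check makes the script safe to re-run even on documents where task is already a plain string, and conflicts=proceed keeps the operation going past version conflicts.
POST tasks/_update_by_query?conflicts=proceed
{
  "script": {
    "source": """
      // Only rewrite when the field is still a two-element array of equal values;
      // documents that already hold a plain string are left untouched.
      if (ctx._source['task'] instanceof List
          && ctx._source['task'].size() == 2
          && ctx._source['task'][0] == ctx._source['task'][1]) {
        ctx._source['task'] = ctx._source['task'][0];
      } else {
        ctx.op = 'noop';   // skip the write entirely for unaffected documents
      }
    """,
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}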

ElasticSearch - Search between a range of dates to compare them

I am new to Elasticsearch (using version 7.6) and trying to find out how to search between two periods in time. One query I'm trying out is to query week 12 of 2019 and week 12 of 2020, the idea being to compare the results. While reading the documentation and searching for samples I have come close to what I'm looking for.
The easy way would be to fire two queries with different dates, but I would like to limit the number of queries. The latest query I have written, based on reading the docs, uses aggregations, but I'm not sure this is the right way:
GET sample-data_*/_search/
{
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "2020-03-20 08:00:00",
"lte": "2020-03-27 08:00:00"
}
}
}
]
}
},
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "8yyyy-MM-dd",
"ranges": [
{
"from": "2019-03-20",
"to": "2019-03-27",
"key": "last_years_week"
},
{
"from": "2020-03-20",
"to": "2020-03-27",
"key": "this_years_week"
}
],
"keyed": true
}
}
}
}
The results come back followed by the aggregations, but the aggregations do not contain the data I am looking for. One of the returned hits:
{
"_index" : "sample-data_2020_03_26",
"_type" : "_doc",
"_id" : "JyhcfWFFz0s1vwizjgxh",
"_score" : 1.0,
"_source" : {
"#timestamp" : "2020-03-26 00:00:00",
"name" : "TEST0001",
"count" : "150",
"total" : 3000
}
}
...
"aggregations" : {
"range" : {
"buckets" : {
"last_years_week" : {
"from" : 1.55304E12,
"from_as_string" : "2019-03-20",
"to" : 1.5536448E12,
"to_as_string" : "2019-03-27",
"doc_count" : 0
},
"this_years_week" : {
"from" : 1.5846624E12,
"from_as_string" : "2020-03-20",
"to" : 1.5852672E12,
"to_as_string" : "2020-03-27",
"doc_count" : 0
}
}
}
}
My question is: what could be an efficient way to query data between two dates in different years using Elasticsearch, so that the numbers can be compared?
I would be happy to read more about the, for me complex, Elasticsearch query if you could point me in the right direction.
Thank you!
I'm not posting the full working Elasticsearch query, but, as discussed in the question comments, I'm summarizing the approach in the form of an answer with some useful links.
Range queries on date fields are very useful for quickly searching between date ranges, and they also support various math operations on date fields.
An aggregation over the date ranges will also be useful here. The main difference between the date_range aggregation and the normal range aggregation is that the from and to values can be expressed as date math expressions, which is handy when you want aggregations over your date ranges.
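For reference, a hedged sketch of fetching both weeks in a single request, assuming the timestamp field is @timestamp throughout (the original aggregation targeted a separate date field, and its top-level range filter excluded all 2019 documents, which is why last_years_week came back with doc_count 0). A bool/should query selects documents from either week, and a date_range aggregation splits them into the two buckets for comparison:
GET sample-data_*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        { "range": { "@timestamp": { "gte": "2019-03-20 08:00:00", "lte": "2019-03-27 08:00:00" } } },
        { "range": { "@timestamp": { "gte": "2020-03-20 08:00:00", "lte": "2020-03-27 08:00:00" } } }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "weeks": {
      "date_range": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "from": "2019-03-20", "to": "2019-03-27", "key": "last_years_week" },
          { "from": "2020-03-20", "to": "2020-03-27", "key": "this_years_week" }
        ],
        "keyed": true
      }
    }
  }
}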

How to make Elastic Engine understand a field is not to be analyzed for an exact match?

This question is based on a previous post where exact search did not work with either Match or MatchPhrasePrefix.
Then I found a similar kind of post here, where the search field is set to not_analyzed in the mapping definition (by @Russ Cam).
But I am using
package id="Elasticsearch.Net" version="7.6.0" targetFramework="net461"
package id="NEST" version="7.6.0" targetFramework="net461"
and that might be why the solution did not work.
Because if I pass "SOME", it matches both "SOME" and "SOME OTHER LOAN", which should not be the case (for the "product value" in my earlier post).
How can I do the same using NEST 7.6.0?
Well, I'm not aware of how your current mapping looks, and I don't know NEST well either, but I will explain how to make the engine understand that a field is not to be analyzed for an exact match, by way of an example using the Elasticsearch DSL.
For an exact match (case sensitive), all you need to do is define the field type as keyword. For a field of type keyword, the data is indexed as-is, without applying any analyzer, which makes it perfect for exact matching.
PUT test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
}
}
}
}
Now let's index some docs:
POST test/_doc/1
{
"field1":"SOME"
}
POST test/_doc/2
{
"field1": "SOME OTHER LOAN"
}
For exact matching we can use a term query. Let's search for "SOME"; we should get only document 1.
GET test/_search
{
"query": {
"term": {
"field1": "SOME"
}
}
}
The output we get:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"field1" : "SOME"
}
}
]
}
}
So the crux is: make the field type keyword and use a term query.
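As a quick sanity check (a hedged example against the same test index): a term query must supply the full indexed value, so searching for "SOME OTHER LOAN" returns only document 2, while the earlier query for "SOME" cannot partially match it, because the keyword field is stored as a single unanalyzed token.
GET test/_search
{
  "query": {
    "term": {
      "field1": "SOME OTHER LOAN"
    }
  }
}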

Syntax for function_score in Elastic Search

I am new to Elasticsearch. I have indexed movie artists (actors and directors) in Elasticsearch, and a simple text search works fine; e.g., if I search for 'steven' with the following syntax
{"query":
{"query_string":
{"query":"steven"}
}
}
... I get the following results, which are fine:
1. Steven Conrad - Popularity (from document) = 487 - elasticsearch _score = 3.2589545
2. Steven Knight - Popularity (from document) = 487 - elasticsearch _score = 3.076738
3. Steven Waddington - Popularity (from document) = 431 - elasticsearch _score = 2.4931839
4. Steven E. De Souza - Popularity (from document) = 534 - elasticsearch _score = 2.4613905
5. Steven R. Monroe - Popularity (from document) = 293 - elasticsearch _score = 2.4613905
6. Steven Mackintosh - Popularity (from document) = 363 - elasticsearch _score = 2.2812681
7. Steven Wright - Popularity (from document) = 356 - elasticsearch _score = 2.2812681
8. Steven Soderbergh - Popularity (from document) = 5947 - elasticsearch _score = 2.270944
9. Steven Seagal - Popularity (from document) = 1388 - elasticsearch _score = 2.270944
10. Steven Bauer - Popularity (from document) = 714 - elasticsearch _score = 2.270944
However, as you can see above, I have a popularity numeric field in my document, and, when searching for 'steven', I would like the most popular artists (Steven Soderbergh, Steven Seagal ...) to come first.
Ideally, I'd like to sort the results above by popularity * _score
I am pretty sure I have to use the function_score feature of Elasticsearch, but I can't figure out the exact syntax.
I've tried to do my "improved" search with the following syntax:
{
"query": {
"custom_score": {
"query": {
"query_string": {
"query": "steven"
}
},
"script": "_score * doc['popularity']"
}
}
}
But I get an exception (extract from the error message below):
org.elasticsearch.search.query.QueryPhaseExecutionException: [my_index][4]: query[filtered(function score (_all:steven,function=script[_score * doc['popularity']], params [null]))->cache(_type:artist)],from[0],size[10]: Query Failed [Failed to execute main query]
// ....
Caused by: java.lang.RuntimeException: uncomparable values <<1.9709579>> and <<org.elasticsearch.index.fielddata.ScriptDocValues$Longs@7c5b73bc>>
// ...
... 9 more
Caused by: java.lang.ClassCastException: org.elasticsearch.index.fielddata.ScriptDocValues$Longs cannot be cast to java.lang.Float
at java.lang.Float.compareTo(Float.java:33)
at org.elasticsearch.common.mvel2.math.MathProcessor.doOperationNonNumeric(MathProcessor.java:266)
I have the impression the syntax I'm using is incorrect.
What should be the right syntax? Or is there something else that I am missing? Thanks a lot in advance.
Edit
My index mapping is defined as follows:
"mappings" : {
"artist" : {
"_all" : {
"auto_boost" : true
},
"properties" : {
"first_name" : {
"type" : "string",
"index" : "not_analyzed",
"analyzer" : "standard"
},
"last_name" : {
"type" : "string",
"boost" : 2.0,
"index" : "not_analyzed",
"norms" : {
"enabled" : true
},
"analyzer" : "standard"
},
"popularity" : {
"type" : "integer"
}
}
}
}
Have you missed the .value near doc['...']?
This works for me (I stored integers without a mapping):
$ curl -XPUT localhost:9200/test/test/a -d '{"name":"steven", "popularity": 666}'
{"_index":"test","_type":"test","_id":"a","_version":1,"created":true}
$ curl -XPUT localhost:9200/test/test/b -d '{"name":"steven", "popularity": 42}'
{"_index":"test","_type":"test","_id":"b","_version":1,"created":true}
$ curl -XPOST localhost:9200/test/test/_search\?pretty -d '{ "query": { "custom_score": { "query": { "match_all": {}}, "script": "_score * doc[\"popularity\"].value" } } }'
{
"took" : 83,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 666.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "a",
"_score" : 666.0, "_source" : {"name":"steven", "popularity": 666}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "b",
"_score" : 42.0, "_source" : {"name":"steven", "popularity": 42}
} ]
}
}
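For what it's worth, custom_score comes from older Elasticsearch releases and was later folded into function_score. A hedged equivalent on newer versions (the exact script syntax varies by version) keeps the same scoring logic, including the .value fix:
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "steven"
        }
      },
      "script_score": {
        "script": "_score * doc['popularity'].value"
      }
    }
  }
}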

ElasticSearch doesn't seem to support array lookups

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "unit-test_project600",
"_type" : "recordDefinition505",
"_id" : "400",
"_score" : 1.0, "_source" : {
"field900": "test string",
"field901": "500",
"field902": "2050-01-01T00:00:00",
"field903": [
"Open"
]
}
} ]
}
}
I would like to filter specifically on field903 with a value of "Open", so I perform the following query:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "field903": "Open"
        }
      }
    }
  }
}
This returns no results. However, I can use this with other fields and it will return the record:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "field901": "500"
        }
      }
    }
  }
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
"unit-test_project600" : {
"recordDefinition505" : {
"properties" : {
"field900" : {
"type" : "string"
},
"field901" : {
"type" : "string"
},
"field902" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"field903" : {
"type" : "string"
}
}
}
}
}
However, the Elasticsearch docs indicate that there is no difference between a string and an array mapping, so I don't think I need to make any changes here.
Try searching for "open" rather than "Open". By default, Elasticsearch applies the standard analyzer when indexing string fields, and the standard analyzer includes a lowercase token filter, so "Open" is indexed as the token "open". From my experience, Elasticsearch does search arrays; a hedged version of your filter with the lowercased term is shown below.
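The adjusted filter (assuming the default standard analyzer on field903), which should now match the document, since term filters are not analyzed and must equal the indexed token exactly:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "field903": "open"
        }
      }
    }
  }
}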
