Strange behavior of range query in Elasticsearch - elasticsearch

My question is pretty simple. I have an ES index that contains a field updated, which is a UNIX timestamp. The index holds only test records (documents), all of which were created today.
I have the following query, which works well and (rightfully) doesn't return any results when executed:
GET /test_index/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"range": {
"updated": {
"lt": "159525360"
}
}
}
]
}
},
"sort": [
{
"updated": {
"order": "desc",
"mode": "avg"
}
}
]
}
So this is all OK. However, when I change the timestamp in my query to a lower number, I get multiple results! And these results all have values in the updated field that are much larger than 5000! Even more bafflingly, I only get results when the value in the query is in the range 1971 to 9999. Numbers like 1500 or 10000 behave correctly and return no results. The query that behaves strangely is below.
GET /test_index/_search
{
"size": 100,
"query": {
"bool": {
"must": [
{
"range": {
"updated": {
"lt": "5000"
}
}
}
]
}
},
"sort": [
{
"updated": {
"order": "desc",
"mode": "avg"
}
}
]
}
By the way, this is what a typical document stored in this index looks like:
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "V6LDyHMBAUKhWZ7lxRtb",
"_score" : null,
"_source" : {
"councilId" : 111,
"chargerId" : "15",
"unitId" : "a",
"connectorId" : "2",
"status" : 10,
"latitude" : 77.7,
"longitude" : 77.7,
"lastStatusChange" : 1596718920,
"updated" : 1596720720,
"dataType" : "recorded"
},
"sort" : [
1596720720
]
}
Here is a mapping of this index:
PUT /test_index/_mapping
{
"properties": {
"chargerId": { "type": "text"},
"unitId": { "type": "text"},
"connectorId": { "type": "text"},
"councilId": { "type": "integer"},
"status": {"type": "integer"},
"longitude" : {"type": "double"},
"latitude" : {"type": "double"},
"lastStatusChange" : {"type": "date"},
"updated": {"type": "date"}
}
}
Is there any explanation for this?

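The default format for a date field in ES is strict_date_optional_time||epoch_millis. Since you haven't specified epoch_second, your dates were parsed incorrectly (treated as milliseconds since the epoch), so every document ended up with a date in January 1970. The same default format is applied to the query value: a four-digit string such as "5000" is a valid strict_date_optional_time date and is read as the year 5000, which is why values from 1971 to 9999 match every document, while 1500 (a year before 1970) and 10000 (parsed as epoch milliseconds) match none. You can verify how the dates were parsed by running this script: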
GET test_index/_search
{
"script_fields": {
"updated_pretty": {
"script": {
"lang": "painless",
"source": """
LocalDateTime.ofInstant(
Instant.ofEpochMilli(doc['updated'].value.millis),
ZoneId.of('Europe/Vienna')
).format(DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm"))
"""
}
}
}
}
Quick fix: update your mapping as follows:
{
...
"updated":{
"type":"date",
"format":"epoch_second"
}
}
and reindex.
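The behaviour can be reproduced outside Elasticsearch. The following is a minimal Python sketch, assuming a deliberately simplified model of strict_date_optional_time||epoch_millis in which a bare four-digit string is a year and any other digit string is epoch milliseconds. The function names are hypothetical and this is not Elasticsearch's actual parser.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_default_format(value: str) -> datetime:
    """Simplified stand-in for strict_date_optional_time||epoch_millis (assumption)."""
    if len(value) == 4 and value.isdigit():
        # A bare four-digit string is read as a year, e.g. "5000" -> 5000-01-01.
        return datetime(int(value), 1, 1, tzinfo=timezone.utc)
    # Every other digit string falls through to epoch_millis.
    return EPOCH + timedelta(milliseconds=int(value))


def range_lt_matches(stored_value: int, query_value: str) -> bool:
    """Model of {"range": {"updated": {"lt": query_value}}} for one document."""
    stored = parse_default_format(str(stored_value))
    return stored < parse_default_format(query_value)


stored_updated = 1596720720  # seconds since epoch, but indexed as milliseconds
print(parse_default_format(str(stored_updated)).isoformat())
for query_value in ["159525360", "5000", "1500", "10000"]:
    print(query_value, range_lt_matches(stored_updated, query_value))
```

Under this model only "5000" matches, which is consistent with the results described in the question.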

Related

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, depending on whether the country is local or global.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If either the local or the global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: This is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because they cannot boost the doc. I know that filter does not work with boost, so what is the solution for dynamic field mapping with a range query and boost?
Please help me solve this.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries. Here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
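Since the country and the minimum price are dynamic in the question, the same query body can be generated from parameters. This is a minimal Python sketch that only builds the request body as a dict; the function name build_boosted_price_query and its parameters are hypothetical, and sending the body to Elasticsearch is left out.

```python
import json


def build_boosted_price_query(country_code: str, min_price: float, boost: float = 2.0) -> dict:
    """Build the 'match everything, boost price matches' body for one country."""
    # Local case: the document belongs to the country and its local price qualifies.
    local_match = {
        "bool": {
            "must": [
                {"match": {"country_en_name": country_code}},
                {"range": {"lowest_local_price": {"gte": min_price}}},
            ]
        }
    }
    # Global case: the document has a qualifying global rate for the country.
    global_match = {
        "range": {
            f"global_rates.{country_code}.lowest_global_price": {"gte": min_price}
        }
    }
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_all": {}},  # keeps documents that match neither case
                    {"bool": {"should": [local_match, global_match], "boost": boost}},
                ]
            }
        }
    }


body = build_boosted_price_query("UK", 1000)
print(json.dumps(body, indent=2))
```

With "UK" and 1000 this produces the same structure as the hand-written query above.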
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id:2 is at the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping uses data types that do not permit a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings instead (although the query is more expensive).
In our case, country_en_name is indexed as text, which is suited for full-text search and not for exact lookups. We should rather use keyword. This is how the query might look with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Elasticsearch filter by epoch_millis not work

I have an index with the following mapping for the property "key.lastEvent":
{
"mappings": {
"_doc": {
"properties": {
"key": {
"properties": {
"lastEvent": {
"type": "date"
My data looks like this:
"hits" : [
{
"_index" : "stat-index",
"_type" : "_doc",
"_id" : "07f8d7bc3c4846e359e3122c411619f4",
"_score" : 0.0,
"_source" : {
"id" : "07f8d7bc3c4846e359e3122c411619f4",
"timestamp" : "2021-12-08T00:00:00+03:00",
"key" : {
"lastEvent" : "2021-12-08T00:00:00+03:00",
"id" : "07f8d7bc3c4846e359e3122c411619f4"
},
"count" : 20
}
}
]
And I want to filter it like this (it's actually a filter from Grafana, so I can't adjust it):
GET stat-index/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"key.lastEvent": {
"gte": 1607288400000,
"lte": 1607461199000,
"format": "epoch_millis"
}
}
}
]
}
}
}
It returns 0 hits. But if I use a filter with another date format:
GET stat-index/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"key.lastEvent": {
"gte": "2021-12-06T00:00:00.000Z",
"lte": "2021-12-08T00:00:00.000Z",
"format": "date_time"
}
}
}
]
}
}
}
it works as expected. So is it a problem with my mapping? How can I force the first variant to work?
There is no problem with your query. I think you used the wrong epoch values: both of them fall in 2020, not 2021.
The epoch value 1607288400000 in your query
is not 2021-12-06T00:00:00.000Z
but 2020-12-06T21:00:00.000Z (2020-12-07T00:00:00+03:00).
The epoch value 1607461199000 in your query
is not 2021-12-08T00:00:00.000Z
but 2020-12-08T20:59:59.000Z (2020-12-08T23:59:59+03:00).
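The conversion is easy to check with a few lines of Python. This is a small sketch using only the standard library; the helper names millis_to_utc and to_epoch_millis are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(epoch_millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=epoch_millis)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


gte, lte = 1607288400000, 1607461199000  # bounds sent by the dashboard
print(millis_to_utc(gte).isoformat())
print(millis_to_utc(lte).isoformat())

# The sample document's key.lastEvent value from the question.
last_event = datetime.fromisoformat("2021-12-08T00:00:00+03:00")
last_event_millis = to_epoch_millis(last_event)
print(last_event_millis, gte <= last_event_millis <= lte)
```

Both bounds resolve to December 2020, so the December 2021 document falls outside the range. A time picker that covers December 2021 would send values around 1638910800000.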

How to Query elasticsearch index with nested and non nested fields

I have an elastic search index with the following mapping:
PUT /student_detail
{
"mappings" : {
"properties" : {
"id" : { "type" : "long" },
"name" : { "type" : "text" },
"email" : { "type" : "text" },
"age" : { "type" : "text" },
"status" : { "type" : "text" },
"tests":{ "type" : "nested" }
}
}
}
Data stored is in form below:
{
"id": 123,
"name": "Schwarb",
"email": "abc@gmail.com",
"status": "current",
"age": 14,
"tests": [
{
"test_id": 587,
"test_score": 10
},
{
"test_id": 588,
"test_score": 6
}
]
}
I want to be able to query the students where name is like '%warb%' AND email is like '%gmail.com%' AND the test with id 587 has a score > 5, etc. At a high level, what I need is something like the query below. I don't know what the actual query would be, so apologies for the messy pseudo-query:
GET developer_search/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": IN [587]
}
},
{
"term": {
"tests.test_score": >= some value
}
}
]
}
}
}
}
]
}
}
}
The query must be flexible so that we can enter dynamic test IDs and their respective score filters, along with non-nested fields like age, name and status.
Something like this?
GET student_detail/_search
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"name": {
"value": "*warb*"
}
}
},
{
"wildcard": {
"email": {
"value": "*gmail.com*"
}
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Inner hits is what you are looking for.
Use the Ngram Tokenizer instead: wildcard searches are expensive, so I wouldn't recommend them for performance reasons.
Change your mapping to the one below, in which I've created a custom analyzer.
The way Elasticsearch (or rather Lucene) indexes a statement is that it first breaks the statement or paragraph into words or tokens, and then indexes these tokens in the inverted index for that particular field. This process is called analysis, and it only applies to the text datatype.
So you only get a document back if the tokens you search for are available in the inverted index.
By default, the standard analyzer is applied. What I've done is create my own analyzer that uses the Ngram Tokenizer, which produces many more tokens than just the plain words.
With the default analyzer, Life is beautiful becomes the tokens life, is, beautiful.
With ngrams (min_gram 3, max_gram 4), the tokens for life would be lif, life and ife.
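To see which tokens an n-gram tokenizer with min_gram 3 and max_gram 4 would produce, here is a rough Python sketch. It is a plain re-implementation for a single word, not Elasticsearch code, and the function name ngram_tokens is hypothetical. It preserves case, as a tokenizer on its own does; lowercasing would require an additional lowercase token filter in the analyzer.

```python
from typing import List


def ngram_tokens(word: str, min_gram: int = 3, max_gram: int = 4) -> List[str]:
    """Return every substring of length min_gram..max_gram, ordered by start position."""
    tokens = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(word):
                tokens.append(word[start:start + size])
    return tokens


print(ngram_tokens("life"))
print(ngram_tokens("Schwarb"))
# A search term is analyzed the same way, so "war" becomes the single token "war".
print("war" in ngram_tokens("Schwarb"))
```

Because "war" is one of the tokens indexed for "Schwarb", a plain match query for "war" can find the document without a wildcard.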
Mapping:
PUT student_detail
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 4,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"email" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"age" : {
"type" : "text" <--- I am not sure why this is text. Change it to long or int. Would leave this to you
},
"status" : {
"type" : "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"tests":{
"type" : "nested"
}
}
}
}
Note that in the above mapping I've created a sibling field in the form of keyword for name, email and status as below:
"name":{
"type":"text",
"analyzer":"my_analyzer",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
Now your query could be as simple as below.
Query:
POST student_detail/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "war" <---- Note this. This would even return documents having "Schwarb"
}
},
{
"match": {
"email": "gmail" <---- Note this
}
},
{
"nested": {
"path": "tests",
"query": {
"bool": {
"must": [
{
"term": {
"tests.test_id": 587
}
},
{
"range": {
"tests.test_score": {
"gte": 5
}
}
}
]
}
}
}
}
]
}
}
}
Note that for exact matches I would use Term Queries on keyword fields, while for normal searches (LIKE in SQL) I would use simple Match Queries on text fields, provided they use the Ngram Tokenizer.
Also note that for >= and <= you need to use a Range Query.
Response:
{
"took" : 233,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.7260926,
"hits" : [
{
"_index" : "student_detail",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.7260926,
"_source" : {
"id" : 123,
"name" : "Schwarb",
"email" : "abc@gmail.com",
"status" : "current",
"age" : 14,
"tests" : [
{
"test_id" : 587,
"test_score" : 10
},
{
"test_id" : 588,
"test_score" : 6
}
]
}
}
]
}
}
Note that the document you mentioned in your question appears in my response when I run the query.
Please do read the links I've shared. It is vital that you understand the concepts. Hope this helps!
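The question also asks for dynamic test IDs with their own score filters. One possible way to handle that is to generate one nested clause per test, so that each ID and score pair has to hold within the same nested object. The Python sketch below only builds the request body; the helper names nested_test_clause and build_student_query are hypothetical.

```python
import json
from typing import Dict, List


def nested_test_clause(test_id: int, min_score: float) -> dict:
    """One nested clause: a test with this id AND a score >= min_score."""
    return {
        "nested": {
            "path": "tests",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"tests.test_id": test_id}},
                        {"range": {"tests.test_score": {"gte": min_score}}},
                    ]
                }
            },
        }
    }


def build_student_query(text_filters: Dict[str, str], test_filters: Dict[int, float]) -> dict:
    """Combine match clauses on top-level fields with one nested clause per test."""
    must: List[dict] = [{"match": {field: text}} for field, text in text_filters.items()]
    must += [nested_test_clause(test_id, score) for test_id, score in test_filters.items()]
    return {"query": {"bool": {"must": must}}}


body = build_student_query({"name": "war", "email": "gmail"}, {587: 5, 588: 6})
print(json.dumps(body, indent=2))
```

Putting both conditions for a test inside a single nested clause matters: two separate nested clauses for the ID and the score could be satisfied by different entries of the tests array.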

Elasticsearch query_string filter with Fields when not empty string

I'm trying to build a query_string query with the Elasticsearch DSL. In SQL style, my query looks like this:
SELECT NAME, DESCRIPTION, URL, FACEBOOK_URL, YEAR_CREATION FROM MY_INDEX WHERE FACEBOOK_URL <> '' AND (MATCH('NAME: sometext OR DESCRIPTION: sometext')) AND YEAR_CREATION > 2000
I don't know how to include a filter for a non-empty FACEBOOK_URL value.
Thanks for help...
@Kamal's point is very clear. You should check the type of your "FACEBOOK" field: it must be of type keyword, not text.
Please see the below mapping, sample documents, the request query and response.
Note that I may not have added all the fields but only the concerned fields so as to mirror the query you've added.
Mapping:
PUT facebook
{
"mappings": {
"properties": {
"name":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"description":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"facebook_url":{
"type": "keyword"
},
"year_creation":{
"type": "date"
}
}
}
}
Sample Docs:
Of the 4 documents below, only the 3rd is one that you would want returned.
Docs 1 and 2 have empty values of facebook_url, while doc 4 does not have the field at all.
POST facebook/_doc/1
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/2
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/3
{
"name" : "sometext",
"description" : "sometext",
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01"
}
POST facebook/_doc/4
{
"name": "sometext",
"description": "sometext",
"year_creation": "2019-01-01"
}
Request Query:
POST facebook/_search
{
"_source": ["name", "description","facebook_url","year_creation"],
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"name": "sometext"
}
},
{
"match": {
"description": "sometext"
}
}
]
}
},
{
"exists": {
"field": "facebook_url"
}
},
{
"range": {
"year_creation": {
"gte": "2000-01-01"
}
}
}
],
"must_not": [
{
"term": {
"facebook_url": {
"value": ""
}
}
}
]
}
}
}
I think the query is self-explanatory.
I have added an Exists query so that a document that does not have the field does not appear in the result, while for empty values I've added a clause in must_not.
Notice that in my design I've mapped facebook_url as keyword, as it makes no sense to have it as text. For that reason, I've used a Term Query.
Also note that for date filtering I've used a Range Query. Do go through the links for more clarification, as it is important to understand how each of these queries works.
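To make the interplay of the exists clause and the must_not term clause concrete, here is a plain-Python model of the two conditions applied to the four sample documents. It only imitates the clause semantics described above and is not Elasticsearch code; the function name has_non_empty_facebook_url is made up.

```python
sample_docs = {
    "1": {"name": "sometext", "facebook_url": ""},
    "2": {"name": "sometext", "facebook_url": ""},
    "3": {"name": "sometext", "facebook_url": "http://mytest.fb.link"},
    "4": {"name": "sometext"},  # field missing entirely
}


def has_non_empty_facebook_url(source: dict) -> bool:
    value = source.get("facebook_url")
    exists = value is not None  # models the exists clause: an empty string still counts
    is_empty = value == ""      # models the term clause on the keyword field with value ""
    return exists and not is_empty


matching_ids = [doc_id for doc_id, source in sample_docs.items()
                if has_non_empty_facebook_url(source)]
print(matching_ids)
```

Docs 1 and 2 pass the exists check but are excluded by must_not, doc 4 fails the exists check, and only doc 3 remains.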
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.148216,
"hits" : [
{
"_index" : "facebook",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.148216,
"_source" : {
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01",
"name" : "sometext",
"description" : "sometext"
}
}
]
}
}
Updated Answer:
Change the type of the ANNEE_CREATION field from integer to date, as that is the correct type for date fields.
Based on the query in your question, you have not applied a range query on the date field.
Note that the must_not logic must be applied on the keyword field of FACEBOOK that you have, not on the text field.
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":" Bordeaux",
"fields":[
"VILLE",
"ADRESSE",
"FACEBOOK"
]
}
},
{
"exists":{
"field":"FACEBOOK"
}
}
],
"must_not":[
{
"term":{
"FACEBOOK.keyword":{ <------ Make sure this is a keyword field
"value":""
}
}
}
],
"filter":[
{
"range":{
"FONDS_LEVEES_TOTAL":{
"gt":0
}
}
},
{
"range":{ <----- Apply the range query here based on what you've mentioned in question
"ANNEE_CREATION":{ <----- Make sure this is the date field
"gte": "2015" <----- Make sure you apply correct query parameter in range query
}
}
}
]
}
},
"track_total_hits":true,
"from":0,
"size":8,
"_source":[
"FACEBOOK",
"NOM",
"ANNEE_CREATION",
"FONDS_LEVEES_TOTAL"
]
}
As expected only the document having Id 3 is returned as result.

Analyzers in ElasticSearch not working

I am using ElasticSearch to store the Tweets I receive from the Twitter Streaming API. Before storing them I'd like to apply an english stemmer to the Tweet content, and to do that I'm trying to use ElasticSearch analyzers with no luck.
This is the current template I am using:
PUT _template/twitter
{
"template": "139*",
"settings" : {
"index":{
"analysis":{
"analyzer":{
"english":{
"type":"custom",
"tokenizer":"standard",
"filter":["lowercase", "en_stemmer", "stop_english", "asciifolding"]
}
},
"filter":{
"stop_english":{
"type":"stop",
"stopwords":["_english_"]
},
"en_stemmer" : {
"type" : "stemmer",
"name" : "english"
}
}
}
}
},
"mappings": {
"tweet": {
"_timestamp": {
"enabled": true,
"store": true,
"index": "analyzed"
},
"_index": {
"enabled": true,
"store": true,
"index": "analyzed"
},
"properties": {
"geo": {
"properties": {
"coordinates": {
"type": "geo_point"
}
}
},
"text": {
"type": "string",
"analyzer": "english"
}
}
}
}
}
When I start the Streaming and the index is created, all the mappings I've defined seem to apply correctly, but the text is stored as it comes from Twitter, completely raw. The index metadata shows:
"settings" : {
"index" : {
"uuid" : "xIOkEcoySAeZORr7pJeTNg",
"analysis" : {
"filter" : {
"en_stemmer" : {
"type" : "stemmer",
"name" : "english"
},
"stop_english" : {
"type" : "stop",
"stopwords" : [
"_english_"
]
}
},
"analyzer" : {
"english" : {
"type" : "custom",
"filter" : [
"lowercase",
"en_stemmer",
"stop_english",
"asciifolding"
],
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "1",
"number_of_shards" : "5",
"version" : {
"created" : "1010099"
}
}
},
"mappings" : {
"tweet" : {
[...]
"text" : {
"analyzer" : "english",
"type" : "string"
},
[...]
}
}
What am I doing wrong? The analyzer seems to be applied correctly, but nothing is happening :/
Thank you!
PS: The search query I use to check that the analyzer is not being applied:
curl -XGET 'http://localhost:9200/_all/_search?pretty' -d '{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "_index:1397574496990"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"match_all": {}
},
{
"exists": {
"field": "geo.coordinates"
}
}
]
}
}
}
},
"fields": [
"geo.coordinates",
"text"
],
"size": 50000
}'
This should return the stemmed text as one of the fields, but the response is:
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 47,
"successful": 47,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.97402453,
"hits": [
{
"_index": "1397574496990",
"_type": "tweet",
"_id": "456086643423068161",
"_score": 0.97402453,
"fields": {
"geo.coordinates": [
-118.21122533,
33.79349318
],
"text": [
"Happy turtle Tuesday ! The week is slowly crawling to Wednesday good morning everyone 🌊🐢🐢🐢☀️#turtles… http://t.co/wAVmcxnf76"
]
}
},
{
"_index": "1397574496990",
"_type": "tweet",
"_id": "456086701451259904",
"_score": 0.97333175,
"fields": {
"geo.coordinates": [
-81.017636,
33.998741
],
"text": [
"Tuesday is Twins Day over here, apparently (it's a far too often occurrence) #tuesdaytwinsday… http://t.co/Umhtp6SoX6"
]
}
}
]
}
}
The text field is exactly the same as what came from Twitter (I'm using the streaming API). What I expected is a stemmed text field, since the analyzer is applied.
Analyzers don't affect the way data is stored. No matter which analyzer you use, you will get the same text back from _source and from stored fields. The analyzer only determines the terms written to the inverted index at index time, and it is applied again to the query text when you search. So if you search for something like text:twin and find records with the word Twins, you will know that the stemmer was applied.
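The distinction between stored text and indexed terms can be illustrated with a toy model. The Python sketch below is not how Elasticsearch is implemented: the analyzer is a crude stand-in (lowercase, then strip a trailing "s") for the real english stemmer, and all names are hypothetical.

```python
import re
from collections import defaultdict


def toy_english_analyzer(text):
    """Crude stand-in for lowercase + stemmer: lowercase and drop a trailing 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


stored_source = {}                 # what a search response returns
inverted_index = defaultdict(set)  # what a search actually matches against


def index_doc(doc_id, text):
    stored_source[doc_id] = text   # the original text is kept untouched
    for term in toy_english_analyzer(text):
        inverted_index[term].add(doc_id)


def search(query_text):
    hits = set()
    for term in toy_english_analyzer(query_text):  # the query is analyzed too
        hits |= inverted_index.get(term, set())
    return [stored_source[doc_id] for doc_id in sorted(hits)]


index_doc("1", "Tuesday is Twins Day over here")
print(sorted(inverted_index))  # analyzed terms, including "twin"
print(search("twin"))          # returns the original, unstemmed text
```

The search for "twin" finds the document through the analyzed term, yet the text returned is the original tweet, which is the behaviour observed in the question.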
