Range query not working as intended [elasticsearch]

I am executing a simple range query, but an empty result is returned, even though I know there are many records/documents that satisfy the query.
Below are the 3 queries I have tried
(the third one is the intended query).
1)
"query": {
"range" : {
"endTime" : {
"gte" : 1559076400.0
}
}
}
2)
"query": {
"bool": {
"must": [
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
3)
"query": {
"bool": {
"filter": [
{"range" : {
"startTime" : {
"gt" : 1356873300.0
}
}
},
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
All 3 queries return an empty response.
Hope you people can help. Thank you.

In an Elasticsearch index, you need to define the field mappings as date or numeric types before inserting data, so that range searches can be applied.
Alternatively, keep dynamic mapping ON so that Elasticsearch can identify the field types automatically from the inserted data.
In the latter case, do check the auto-generated mappings on your index.
Also check the date/timestamp format.
Steps to check mappings:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html
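As a rough illustration of that check, the sketch below reads the field type out of a get-mapping response that has already been fetched and parsed into a dict. It assumes the typeless response shape (index name, then "mappings", then "properties"); the index name my_index and the sample mapping are hypothetical and only there to make the example runnable.

```python
from typing import Optional

# Field types on which a numeric or date range query behaves as expected.
RANGE_FRIENDLY = {
    "date", "long", "integer", "short", "byte",
    "double", "float", "half_float", "scaled_float",
}


def field_type(mapping_response: dict, index: str, field: str) -> Optional[str]:
    """Return the mapped type of a top-level field, or None if it is unmapped."""
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    return properties.get(field, {}).get("type")


# Hypothetical response: startTime was dynamically mapped as text,
# which would explain a range query that silently matches nothing.
sample = {
    "my_index": {
        "mappings": {
            "properties": {
                "startTime": {"type": "text"},
                "endTime": {"type": "date"},
            }
        }
    }
}

for name in ("startTime", "endTime"):
    mapped = field_type(sample, "my_index", name)
    print(name, mapped, "ok" if mapped in RANGE_FRIENDLY else "check mapping")
```

On older versions that still nest a type name under "mappings", the lookup path would need one more level.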

Since you are using epoch time, you need to declare that in the mapping. What matters here is the mapping and the format in which the data is stored. I am not sure whether data can be stored in one format and queried in another; I will do some more research and update the answer if that can be done. This is what I did:
1) created the mapping, to show how the endTime mapping is done
2) inserted a few sample documents
3) queried the documents using epoch time, the way you wanted
Mapping
PUT so_test24
{
"mappings" : {
"_doc" : {
"properties" : {
"id" : {
"type" : "long"
},
"endTime" : {
"type" : "date",
"format": "epoch_second"
}
}
}
}
}
Inserting the documents
POST /so_test24/_doc
{
"id": 1,
"endTime": "1546300800"
}
POST /so_test24/_doc
{
"id": 2,
"endTime": "1514764800"
}
POST /so_test24/_doc
{
"id": 3,
"endTime": "1527811200"
}
POST /so_test24/_doc
{
"id": 4,
"endTime": "1535760000"
}
The search Query
GET /so_test24/_search
{
"query": {
"range": {
"endTime": {"gte": "1532883892"}
}
}
}
The result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "uFIq42sB4TH56W1h-jGu",
"_score" : 1.0,
"_source" : {
"id" : 1,
"endTime" : "1546300800"
}
},
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "u1Iq42sB4TH56W1h-zEK",
"_score" : 1.0,
"_source" : {
"id" : 4,
"endTime" : "1535760000"
}
}
]
}
}
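If the timestamps originate in application code, a small helper can keep the unit consistent between what is indexed and what is queried. This is only a sketch: it assumes endTime holds whole seconds since the epoch (as the sample values do) and that the query body is built as a plain dict; the function names are made up for illustration.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    if moment.tzinfo is None:
        # Naive datetimes would be interpreted in local time; refuse them.
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def end_time_range_query(since: datetime) -> dict:
    """Build a range query body matching documents with endTime >= since."""
    return {"query": {"range": {"endTime": {"gte": to_epoch_seconds(since)}}}}


body = end_time_range_query(datetime(2019, 1, 1, tzinfo=timezone.utc))
print(body)
```

The same helper can be used when indexing, so documents and queries never mix seconds and milliseconds.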

Related

How to perform nested queries on Elasticsearch?

I am trying to perform a nested query on Elasticsearch: I have 2 queries, and the output of the first query must be used as input to the second. I went through the Elasticsearch documentation but couldn't find a way to do this.
The first query is:
GET index1/_search
{
"query": {
"query_string": {
"query": "(imageName: xyz.jpg)"
}
}
}
The output of this query is JSON, for example:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.2682955,
"hits" : [
{
"_index" : "index1",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.2682955,
"_source" : {
"assetId" : "0",
"descriptor" : "randomString",
"bucketId" : [randomArray],
"imageName" : "xyz.jpg"
}
}
]
}
}
The second query is:
GET index2/_search
{
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"constant_score": {
"filter": {
"terms": {
"bucketId": [randomArray that came as an output of the first query]
}
}
}
},
"pqcode_score": {
"descriptors": [
{
"descriptor": "randomString that came as an output of the first query"
}
]
}
}
}
}
How can we use the output of the first query inside the second query?
Can anyone help me in this regard?
This is not possible in Elasticsearch. You need to implement it on the application side.
The only option is to run the first query, read its result, and then run the second query with the values taken from that result.
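A minimal sketch of the application-side step, assuming the first response has already been parsed into a dict. It only builds the body of the second query from the first hit; sending the two requests (with whatever HTTP or Elasticsearch client you use) is left out. The function name and the sample values are hypothetical, and pqcode_score is copied from the question as-is.

```python
def build_second_query(first_response: dict) -> dict:
    """Build the second query body from the first hit of the first response."""
    hits = first_response["hits"]["hits"]
    if not hits:
        raise ValueError("first query returned no hits")
    source = hits[0]["_source"]
    return {
        "query": {
            "function_score": {
                "boost_mode": "replace",
                "query": {
                    "constant_score": {
                        "filter": {"terms": {"bucketId": source["bucketId"]}}
                    }
                },
                # Custom score clause taken verbatim from the question.
                "pqcode_score": {
                    "descriptors": [{"descriptor": source["descriptor"]}]
                },
            }
        }
    }


# Hypothetical first response, trimmed to the fields that are used.
first = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "descriptor": "randomString",
                    "bucketId": [1, 2, 3],
                    "imageName": "xyz.jpg",
                }
            }
        ]
    }
}

second_body = build_second_query(first)
print(second_body)
```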

document_missing_exception while performing ElasticSearch update

I went through several questions about the same "document_missing_exception" problem, but it looks like they don't match my case. I can query the document, but the request fails when I try to update it.
My query:
# search AuthEvent by sessionID
GET events-*/_search
{
"size": "100",
"query": {
"bool": {
"must": [{
"term": {
"type": {
"value": "AuthEvent"
}
}
},
{
"term": {
"client.sessionID.raw": {
"value": "067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu"
}
}
}
]
}
}
}
Query result:
{
"took" : 18,
"timed_out" : false,
"_shards" : {
"total" : 76,
"successful" : 76,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 6.705622,
"hits" : [
{
"_index" : "events-2020.10.06",
"_type" : "doc",
"_id" : "2c675295b27a225ce243d2f13701b14222074eaf",
"_score" : 6.705622,
"_routing" : "067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu",
"_source" : {
# some data
}
}
]
}
}
Update request:
POST events-2020.10.06/_doc/2c675295b27a225ce243d2f13701b14222074eaf/_update
{
"doc" : {
"custom" : {
"testField" : "testData"
}
}
}
And update result:
{
"error" : {
"root_cause" : [
{
"type" : "document_missing_exception",
"reason" : "[_doc][2c675295b27a225ce243d2f13701b14222074eaf]: document missing",
"index_uuid" : "5zhQy6W6RnWscDz7Av4_bA",
"shard" : "1",
"index" : "events-2020.10.06"
}
],
"type" : "document_missing_exception",
"reason" : "[_doc][2c675295b27a225ce243d2f13701b14222074eaf]: document missing",
"index_uuid" : "5zhQy6W6RnWscDz7Av4_bA",
"shard" : "1",
"index" : "events-2020.10.06"
},
"status" : 404
}
I'm quite new to ElasticSearch and couldn't find any reason for this behaviour. I use the ElasticSearch 6.7.1 oss version + Kibana for working with the data. I also tried a bulk update but ended up with the same error.
As you can see in the query results, your document was indexed with a routing value, and that value is missing from your update request.
Try this instead:
POST events-2020.10.06/_doc/2c675295b27a225ce243d2f13701b14222074eaf/_update?routing=067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu
{
"doc" : {
"custom" : {
"testField" : "testData"
}
}
}
If a document is indexed with a routing value, all subsequent get, update and delete operations need to happen with that routing value as well.
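When updates are issued from code, the routing value can be carried over from the search hit automatically. The sketch below builds the update path from a hit dict and appends the routing parameter only when the hit has a _routing field; the helper name is hypothetical, and the path layout simply mirrors the request shown above.

```python
from urllib.parse import quote


def update_path(hit: dict) -> str:
    """Build the update path for a search hit, including routing if present."""
    index = quote(hit["_index"], safe="")
    doc_id = quote(hit["_id"], safe="")
    path = f"{index}/_doc/{doc_id}/_update"
    routing = hit.get("_routing")
    if routing is not None:
        # Without this parameter the request may be sent to the wrong shard.
        path += f"?routing={quote(routing, safe='')}"
    return path


hit = {
    "_index": "events-2020.10.06",
    "_id": "2c675295b27a225ce243d2f13701b14222074eaf",
    "_routing": "067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu",
}
print(update_path(hit))
```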

Elasticsearch returns 0.0 for metrics sum aggregation

Elasticsearch returns 0.0 for a metrics sum aggregation. The expected output is the sum of the metric probe_http_duration_seconds.
Elasticsearch version: 7.1.1
Query used for aggregation:
GET some_metric/_search
{
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-1m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
},
"aggs" : {
"sum_is" : { "sum": { "field" : "value" } }
}
}
The above query returns the matching documents, followed by:
"aggregations" : {
"sum_is" : {
"value" : 0.0
}
}
Each document in the index looks like:
{
"_index" : "some_metric-2019.12.03-000004",
"_type" : "_doc",
"_id" : "_wCjz24Bk6FPpmW1lC31",
"_score" : 5.3475914,
"_source" : {
"name" : "probe_http_duration_seconds",
"time" : 1575441630181,
"value" : 0,
"labels" : {
"__name__" : "probe_http_duration_seconds",
"app" : "some-events",
"i" : "some_metric",
"instance" : "some-instance",
"job" : "someproject-k8s-service",
"kubernetes_name" : "some-events",
"kubernetes_namespace" : "deploytest",
"phase" : "connect",
"t" : "type",
"v" : "1"
}
}
}
If I change must to should in the query, I get:
"aggregations" : {
"sum_is" : {
"value" : 1.5389155527088604E16
}
}
The index dynamic mapping looks something like this:
"mappings" : {
"dynamic_templates" : [
{
"strings" : {
"unmatch" : "*seconds*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
},
{
"to_float" : {
"match" : "*seconds*",
"mapping" : {
"type" : "float"
}
}
}
],
However, our requirement is to get results that match all of the clauses in the query.
For metrics aggregations Elasticsearch converts everything to double, but this still doesn't explain a result of zero.
Any pointers will be helpful. Thanks for your attention.
NOTE: I see that the value field is zero in the example document. Maybe I made a mistake while drafting/editing.
Below is the result for the past 2 minutes. It shows that the value field actually holds floats.
Query:
GET some_metric/_search?size=3
{
"_source": ["value"],
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-2m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
}
}
Result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 14.551308,
"hits" : [
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "7oog0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 0.040022423
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "74og0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 3.734E-5
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "A4og0G4Bk6EPplW1ibH1",
"_score" : 14.551308,
"_source" : {
"value" : 0.015694122
}
}
]
}
}
What you see in _source is just what you indexed; ES will never modify your source document. However, if the value field is mapped as long, as I suspected, ES indexes each float value as a long and not as a float, so fractional values such as 0.04 are indexed as 0 and the sum comes out as 0.0.
This usually happens when the very first document to be indexed has an integer value, such as 0.
You can either reindex your data with the proper mapping, or, since you have time-based indexes, just modify the dynamic template so that tomorrow's index is created correctly.
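The effect can be mimicked in a few lines. This is an approximation, not what Elasticsearch runs internally: it assumes the long field drops the fractional part of each value, and it uses the three sample values from the question. The template at the end is a hypothetical addition to dynamic_templates (its name value_as_float is made up) that would map a field called value as float in newly created indexes.

```python
# Values as they appear in _source in the question.
values = [0.040022423, 3.734e-05, 0.015694122]

# Assumed behaviour of a long-mapped field: the fractional part is dropped.
indexed_as_long = [int(v) for v in values]

print(sum(indexed_as_long))  # what a sum over the long field would add up
print(sum(values))           # what a sum over a float field would add up

# Hypothetical dynamic template entry: match the field by name and map it
# as float, regardless of the first value that happens to be indexed.
value_as_float_template = {
    "value_as_float": {
        "match": "value",
        "mapping": {"type": "float"},
    }
}
```

The existing to_float template only matches field names containing "seconds", which is why it never applied to the value field.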

Aggregation on .keyword to return only the keys that contain a specific string

I am new to aggregations in Elasticsearch and am using 7.2. I am trying to write an aggregation on Tree.keyword that returns only the count of documents that have a key containing the word "Branch". I have tried sub-aggregations, bucket_selector (which doesn't work for key strings) and scripts. Does anyone have any ideas or suggestions on how to approach this?
Mapping:
{
"testindex" : {
"mappings" : {
"properties" : {
"Tree" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
}
}
}
}
}
Example query that returns all the keys. What I need is to limit it to keys containing "Branch", or better yet, to return just the count of "Branch" keys:
GET testindex/_search
{
"aggs": {
"bucket": {
"terms": {
"field": "Tree.keyword"
}
}
}
}
Returns:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"Tree" : [
"Car:76",
"Branch:yellow",
"Car:one",
"Branch:blue"
]
}
}
]
},
"aggregations" : {
"bucket" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Car:76",
"doc_count" : 1
},
{
"key" : "Branch:yellow",
"doc_count" : 1
},
{
"key" : "Car:one",
"doc_count" : 1
},
{
"key" : "Branch:blue",
"doc_count" : 1
}
]
}
}
}
You have to add includes to limit the result. Here's a code sample; hopefully it helps.
GET testindex/_search
{
"_source": {
"includes": [
"Branch"
]
},
"aggs": {
"bucket": {
"terms": {
"field": "Tree.keyword"
}
}
}
}
It is possible to filter the values for which buckets will be created. This can be done using the include and exclude parameters, which accept regular expression strings or arrays of exact values. Additionally, include clauses can filter using partition expressions.
For your case, it should look like this:
GET testindex/_search
{
"aggs": {
"bucket": {
"terms": {
"field": "Tree.keyword",
"include": "Branch:.*"
}
}
}
}
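The include pattern is a regular expression that has to match the whole key, not a shell-style wildcard. A quick local sanity check is sketched below; it uses Python's re.fullmatch as a stand-in for the Lucene regular expression engine, which is an assumption that holds for simple patterns like these but not for every Lucene-specific operator.

```python
import re


def matching_keys(keys, pattern):
    """Return the keys that the pattern matches in full (anchored match)."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.fullmatch(key)]


keys = ["Car:76", "Branch:yellow", "Car:one", "Branch:blue"]

# "Branch:.*" means "Branch:" followed by anything.
print(matching_keys(keys, "Branch:.*"))

# "Branch:*" means "Branch" followed by zero or more colons,
# so it matches none of the sample keys.
print(matching_keys(keys, "Branch:*"))
```

The length of the first list is the number of "Branch" keys among the sample values.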
Thanks for all the help! Unfortunately, none of those solutions worked for me. I ended up using a script that returns all the branches and maps everything else onto a single new key, then used a bucket script to subtract 1 for that extra key, giving Total_Branches. There is probably a better solution out there, but hopefully this helps someone.
GET testindex/_search
{
"aggs": {
"bucket": {
"cardinality": {
"field": "Tree.keyword",
"script": {
"lang": "painless",
"source": "if(_value.contains('Branches:')) { return _value} return 1;"
}
}
},
"Total_Branches": {
"bucket_script": {
"buckets_path": {
"my_var1": "bucket.value"
},
"script": "return params.my_var1-1"
}
}
}
}

Returning the timestamp field in elasticsearch

Why can't I see the _timestamp field even though I am able to filter a query by it?
The following query returns the correct documents, but not the timestamp itself. How can I return the timestamp?
{
"fields": [
"_timestamp",
"_source"
],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"_timestamp": {
"from": "2013-01-01"
}
}
}
}
}
}
The mapping is:
{
"my_doctype": {
"_timestamp": {
"enabled": "true"
},
"properties": {
"cards": {
"type": "integer"
}
}
}
}
sample output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "HjfryYQEQL6RkEX3VOiBHQ",
"_score" : 1.0, "_source" : {"cards": "5"}
}, {
"_index" : "test1",
"_type" : "doctype1",
"_id" : "sDyHcT1BTMatjmUS0NSoEg",
"_score" : 1.0, "_source" : {"cards": "2"}
}]
}
}
When the _timestamp field is enabled, it is indexed but not stored by default. So, while you can search and filter by the _timestamp field, you cannot easily retrieve it with your records. To be able to retrieve it, you need to recreate your index with the following mapping:
{
"my_doctype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
},
"properties": {
...
}
}
}
This way you will be able to retrieve the timestamp as the number of milliseconds since the epoch.
It is not necessary to store the timestamp field, since its exact value is preserved as a term, which is also more likely to already be present in RAM, especially if you are querying on it. You can access the timestamp via its term using a script field:
{
"query": {
...
},
"script_fields": {
"timestamp": {
"script": "_doc['_timestamp'].value"
}
}
}
The resulting value is expressed in milliseconds since the UNIX epoch. It's quite obscene that ElasticSearch can't do this for you, but hey, nothing's perfect.
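On the client side the millisecond value is easy to turn into a readable date. A small sketch, assuming the value arrives as an integer number of milliseconds since the epoch in UTC; the helper name is made up for illustration.

```python
from datetime import datetime, timezone


def millis_to_datetime(millis: int) -> datetime:
    """Convert milliseconds since the UNIX epoch to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# 1356998400000 ms corresponds to the "2013-01-01" bound used in the question.
print(millis_to_datetime(1356998400000).isoformat())
```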
