Elastic query by time for the current day - elasticsearch

I have a query which gets all the data for the dates that I set, and it works well so far.
When I remove the date and change the format, I expected Elasticsearch to fall back to the current day, but it doesn't.
**GET /BLA*/_search**
{
"size" : 1,
"sort" : "#timestamp",
"_source" : ["#timestamp", "details"],
"query" : {
"bool" : {
"must" : [
{"term" : {"FIELD" : "TRUC"}},
{"regexp" : { "details" : ".*TRUC.*" }},
{"range": {
"#timestamp": {
"format" : "yyyy-MM-dd HH:mm",
"from": "2020-05-05 07:00",
"to": "2020-05-05 07:30",
"time_zone" : "Europe/Paris"
}
}}
]
}
}
}
I tried to remove the date and change the format, but it doesn't work.
Any ideas? The goal is to get the data for the current day.
**GET /BLA*/_search**
{
"size" : 1,
"sort" : "#timestamp",
"_source" : ["#timestamp", "details"],
"query" : {
"bool" : {
"must" : [
{"term" : {"FIELD" : "TRUC"}},
{"regexp" : { "details" : ".*TRUC.*" }},
{"range": {
"#timestamp": {
"format" : "HH:mm",
"from": "07:00",
"to": "07:30",
"time_zone" : "Europe/Paris"
}
}}
]
}
}
}

TL;DR: not possible. The only way I know of to work with relative datetimes in range queries is the following:
{
"size": 1,
"sort": "#timestamp",
"_source": [
"#timestamp",
"details"
],
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "now-2h",
"lte": "now+3h",
"time_zone": "Europe/Paris"
}
}
}
]
}
}
}
but it's all relative to now.
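That said, relative-to-now is enough for the original goal of "the current day", because range queries support date math rounding. A sketch, reusing the same index and field names and keeping the other clauses as in the first query:

```json
{
  "size": 1,
  "sort": "#timestamp",
  "_source": ["#timestamp", "details"],
  "query": {
    "range": {
      "#timestamp": {
        "gte": "now/d",
        "lt": "now+1d/d",
        "time_zone": "Europe/Paris"
      }
    }
  }
}
```

Here `now/d` rounds down to the start of the current day and `now+1d/d` to the start of the next one; the `time_zone` parameter affects that rounding (though not the value of `now` itself).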
Here's the explanation of why your query doesn't seem to work.
Let's set up an index whose #timestamp will be of the format HH:mm:
PUT my_index
{
"mappings": {
"properties": {
"#timestamp": {
"type": "date",
"format": "HH:mm"
}
}
}
}
then ingest a doc
POST my_index/_doc
{
"#timestamp": "07:00"
}
all nice and dandy. Now let's investigate what the actual indexed date is:
GET my_index/_search
{
"script_fields": {
"ts_full": {
"script": {
"source": """
LocalDateTime.ofInstant(
Instant.ofEpochMilli(doc['#timestamp'].value.millis),
ZoneId.of('Europe/Paris')
).format(DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss"))
"""
}
}
}
}
which yields
"01/01/1970 08:00:00"
since Paris is UTC+1.
So summing up, you can surely index & search by 'dates' (strictly speaking 'times') of the format HH:mm but they're all going to pertain to Jan 1 1970.
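The same arithmetic can be sketched in plain Python (not Elasticsearch; a fixed +01:00 offset stands in for Europe/Paris as of January 1970):

```python
from datetime import datetime, timedelta, timezone

# "07:00" indexed with format HH:mm becomes 7 hours after the epoch start
millis = 7 * 60 * 60 * 1000
indexed = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(indexed.isoformat())  # 1970-01-01T07:00:00+00:00

# rendered in Paris local time (UTC+1 back in January 1970) it reads 08:00
paris = indexed.astimezone(timezone(timedelta(hours=1)))
print(paris.strftime("%d/%m/%Y %H:%M:%S"))  # 01/01/1970 08:00:00
```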

Related

How can I convert the date field in aggregate response to a different timezone?

I am running an aggregation query against Elasticsearch like the one below. There is a date-type field timestampUtc in the document which stores the UTC time. In the query, I use an input parameter with an offset, like 2022-06-01T00:00:00+10:00, and calendar_interval to support different timezones. I believe Elasticsearch converts 2022-06-01T00:00:00+10:00 to UTC internally when comparing it with the value saved in the document.
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"range": {
"timestampUtc": {
"gte": "2022-06-01T00:00:00+10:00",
"lte": "2022-07-05T23:59:59+10:00"
}
}
}
]
}
}
}
},
"aggs": {
"value": {
"date_histogram": {
"field": "timestampUtc",
"calendar_interval": "day"
},
"aggs": {
"amount": {
"sum": {
"field": "amount.value"
}
}
}
}
}
The response from above query is:
"buckets" : [
{
"key_as_string" : "2022-05-31T00:00:00.000Z",
"key" : 1653955200000,
"doc_count" : 897,
"amount" : {
"value" : 4.3873789E7
}
},
{
"key_as_string" : "2022-06-01T00:00:00.000Z",
"key" : 1654041600000,
"doc_count" : 1395,
"amount" : {
"value" : 5.6002755E7
}
},
As you can see, the key_as_string field is in UTC format, which comes from the value saved in the database. The question is how I can make the date format in the response match the one I put in the query.
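In case it helps, the date_histogram aggregation accepts time_zone and format parameters, which shift the bucket boundaries and change how key_as_string is rendered. A sketch of just the aggregation part, reusing the field names from the question:

```json
"aggs": {
  "value": {
    "date_histogram": {
      "field": "timestampUtc",
      "calendar_interval": "day",
      "time_zone": "+10:00",
      "format": "yyyy-MM-dd"
    }
  }
}
```

With time_zone set, the day buckets start at midnight +10:00 rather than midnight UTC, and key_as_string follows the given format.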

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, based on the local or global country.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If either the local or the global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither boosts the doc. I know filter does not work with boost, so what is the solution for dynamic field mapping with a range query and a boost?
Please help me solve this.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries, here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but for the sake of demonstrating the use of runtime_mappings I kept it as text; I'll show later how to overcome this limitation.
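For completeness, if reindexing is an option, the standard way around this is a keyword multi-field in the mapping (a sketch; not required for the runtime_mappings approach shown later):

```json
"country_en_name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}
```

A term query can then target country_en_name.keyword directly for exact lookups.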
bool is Elasticsearch's equivalent of if
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price > X)
OR
(document with global_rates.UK.lowest_global_price > X)
)[boost this part of OR]
The match_all is needed so that documents which do not match the other queries are also returned.
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that document with _id:2 is on the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping has data types that do not permit executing a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings (at the cost of a more expensive query).
In our case, country_en_name is indexed as text, which is suited for full-text search, not for exact lookups; we should rather use keyword. This is how the query may look with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Elasticsearch filter by epoch_millis not work

I have an index with this mapping for the property "key.lastEvent":
{
"mappings": {
"_doc": {
"properties": {
"key": {
"properties": {
"lastEvent": {
"type": "date"
My data looks like this:
"hits" : [
{
"_index" : "stat-index",
"_type" : "_doc",
"_id" : "07f8d7bc3c4846e359e3122c411619f4",
"_score" : 0.0,
"_source" : {
"id" : "07f8d7bc3c4846e359e3122c411619f4",
"timestamp" : "2021-12-08T00:00:00+03:00",
"key" : {
"lastEvent" : "2021-12-08T00:00:00+03:00",
"id" : "07f8d7bc3c4846e359e3122c411619f4"
},
"count" : 20
}
}
]
And I want to filter it like this (it's actually a filter from Grafana, so I can't adjust it):
GET stat-index/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"key.lastEvent": {
"gte": 1607288400000,
"lte": 1607461199000,
"format": "epoch_millis"
}
}
}
]
}
}
}
And it returns 0 hits. But if I use a filter with another date format,
GET stat-index/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"key.lastEvent": {
"gte": "2021-12-06T00:00:00.000Z",
"lte": "2021-12-08T00:00:00.000Z",
"format": "date_time"
}
}
}
]
}
}
}
it works as expected. So... is it a problem with my mapping? How can I force the first variant to work?
There is no problem with your query; I think you wrote the wrong epoch values.
The epoch value of 1607288400000 in your query
is not 2021-12-06
but 2020-12-06T21:00:00.000Z.
The epoch value of 1607461199000 in your query
is not 2021-12-08
but 2020-12-08T20:59:59.000Z.
Both fall in December 2020, a year before your data.
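A quick way to check which instant an epoch-millis value actually represents, in plain Python:

```python
from datetime import datetime, timezone

def from_millis(ms):
    # epoch milliseconds -> timezone-aware UTC datetime
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

print(from_millis(1607288400000).isoformat())  # 2020-12-06T21:00:00+00:00
print(from_millis(1607461199000).isoformat())  # 2020-12-08T20:59:59+00:00
```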

I need to get average document count by date in elasticsearch

I want to get the average document count by date without fetching the whole bunch of bucket data and computing the average by hand, because there are years of data, and when I group by date I get a too_many_buckets_exception.
So my current query is
{
"query": {
"bool": {
"must": [],
"filter": []
}
},
"aggs": {
"groupByChannle": {
"terms": {
"field": "channel"
},
"aggs": {
"docs_per_day": {
"date_histogram": {
"field": "message_date",
"fixed_interval": "1d"
}
}
}
}
}
}
How can I get an average doc count grouped by message_date (day) and channel without retrieving the buckets array of this data:
"buckets" : [
{
"key_as_string" : "2018-03-17 00:00:00",
"key" : 1521244800000,
"doc_count" : 4027
},
{
"key_as_string" : "2018-03-18 00:00:00",
"key" : 1521331200000,
"doc_count" : 10133
},
...thousands of rows
]
my index structure looks like this
"mappings" : {
"properties" : {
"channel" : {
"type" : "keyword"
},
"message" : {
"type" : "text"
},
"message_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
}
}
With this query, I want to get just an average doc count by date and nothing else.
"avg_count": {
"avg_bucket": {
"buckets_path": "docs_per_day>_count"
}
}
Add this after the closing brace of the docs_per_day aggregation.
avg_count provides the average count;
_count refers to each bucket's document count.
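Putting it together, the aggregation section would look like this (a sketch based on the query from the question; avg_bucket is a pipeline aggregation placed as a sibling of docs_per_day inside groupByChannle):

```json
"aggs": {
  "groupByChannle": {
    "terms": { "field": "channel" },
    "aggs": {
      "docs_per_day": {
        "date_histogram": {
          "field": "message_date",
          "fixed_interval": "1d"
        }
      },
      "avg_count": {
        "avg_bucket": {
          "buckets_path": "docs_per_day>_count"
        }
      }
    }
  }
}
```

The avg_bucket result is returned once per channel, so the per-day buckets still exist internally but only the average needs to be read from the response.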
I think you can use a stats aggregation with a script:
{
"size": 0,
"aggs": {
"term": {
"terms": {
"field": "chanel"
},
"aggs": {
"stats": {
"stats": {
"field": "message_date"
}
},
"result": {
"bucket_script": {
"buckets_path": {
"max" : "stats.max",
"min" : "stats.min",
"count" : "stats.count"
},
"script": "params.count/(params.max - params.min)/1000/86400)"
}
}
}
}
}
}
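The bucket_script above divides the document count by the time span in days; the arithmetic, checked in plain Python with hypothetical numbers:

```python
# stats.min / stats.max are epoch millis; count is the number of docs
min_ms = 1521244800000   # hypothetical span start, 2018-03-17T00:00:00Z
max_ms = 1521331200000   # hypothetical span end, 2018-03-18T00:00:00Z
count = 14160            # e.g. 4027 + 10133 docs over the span

# millis -> seconds -> days, then docs per day
span_days = (max_ms - min_ms) / 1000 / 86400
print(count / span_days)  # 14160.0 docs per day over a 1-day span
```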

ElasticSearch Query aggregations per #timestamp hour

I'm making a query on Elasticsearch over Metricbeat data to rank the most-used processes per hour. At the moment I'm aggregating per process start time and process name; I need to split these groups hourly using the "#timestamp" field.
This is my current query:
GET metricbeat*/_search?
{"query": {
"bool": {
"must": [
{ "wildcard" : { "beat.hostname" : "ibmcx*" }},
{ "range": {
"#timestamp": {
"gte": "2019-03-22T00:00:00",
"lte": "2019-03-23T00:00:00"}}},
{"terms" : { "beat.hostname" : ["ibmcxapp101", "ibmcxapp102", "ibmcxapp103",
"ibmcxapp104", "ibmcxapp105", "ibmcxapp106", "ibmcxapp107",
"ibmcxapp108", "ibmcxapp109", "ibmcxapp110", "ibmcxapp111",
"ibmcxapp112", "ibmcxapp113", "ibmcxapp114", "ibmcxapp115",
"ibmcxapp116", "ibmcxapp117", "ibmcxapp118", "ibmcxapp119",
"ibmcxapp120", "ibmcxapp121", "ibmcxapp122", "ibmcxxaa100",
"ibmcxxaa101", "ibmcxxaa102", "ibmcxxaa103", "ibmcxxaa104",
"ibmcxxaa105", "ibmcxxaa106", "ibmcxxaa107", "ibmcxxaa108",
"ibmcxxaa109", "ibmcxxaa110", "ibmcxxaa111", "ibmcxxaa112",
"ibmcxxaa201", "ibmcxxaa202", "ibmcxxaa203", "ibmcxxaa204"
] }},
{"exists": {"field": "system.process.cmdline"}}
],
"must_not": [
{"term" : { "system.process.username" : "NT AUTHORITY\\SYSTEM" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\NETWORK SERVICE" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\LOCAL SERVICE" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\Servicio de red"}},
{"term" : { "system.process.username" : "" }}
]
}
},
"size": 0,
"aggs": {
"group_by_start_time": {
"terms": {
"field": "system.process.cpu.start_time"
},
"aggs": {
"group_by_name": {
"terms": {
"field": "system.process.name.keyword"
}
}
}
}
},
"size": 0,
"sort" : [
{ "system.process.cpu.start_time" : {"order" : "asc"}},
{ "#timestamp" : {"order" : "asc"}},
{ "system.process.pid" : {"order" : "desc"}}
]}
It's a bit hard to follow and reproduce; a minimal example (I think the entire query is not really needed) and sample docs would go a long way.
If you want an hourly aggregation, the first thing you'll need to do is that hourly aggregation, and then run the others inside it.
The minimal example for an hourly aggregation would be:
POST /metricbeat*/_search?size=0
{
"aggs" : {
"metrics_per_hour" : {
"date_histogram" : {
"field" : "#timestamp",
"interval" : "hour"
}
}
}
}
Folding in the other aggregation would look like this:
POST /metricbeat*/_search?size=0
{
"aggs" : {
"metrics_per_hour" : {
"date_histogram" : {
"field" : "#timestamp",
"interval" : "hour"
},
"aggs" : {
...
}
}
}
}
PS: If you are using a daily index pattern, you could just query the right day's index instead of the wildcard one and then skip this part of the query:
"range": {
"#timestamp": {
"gte": "2019-03-22T00:00:00",
"lte": "2019-03-23T00:00:00"
}
}
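Deriving the day-specific index name is straightforward; a sketch in Python, where the metricbeat-YYYY.MM.DD naming is an assumption about the setup:

```python
from datetime import date

def daily_index(d, prefix="metricbeat-"):
    # assumed daily naming convention, e.g. metricbeat-2019.03.22
    return f"{prefix}{d:%Y.%m.%d}"

print(daily_index(date(2019, 3, 22)))  # metricbeat-2019.03.22
```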
