ElasticSearch Query aggregations per #timestamp hour - elasticsearch

I'm making a query on Elasticsearch, over metricbeat data, to rank the most used processes per hour. At the moment I'm aggregating by process start time and process name; I need to "divide" these groups using the "#timestamp" field, hourly.
This is my current query:
GET metricbeat*/_search
{"query": {
"bool": {
"must": [
{ "wildcard" : { "beat.hostname" : "ibmcx*" }},
{ "range": {
"#timestamp": {
"gte": "2019-03-22T00:00:00",
"lte": "2019-03-23T00:00:00"}}},
{"terms" : { "beat.hostname" : ["ibmcxapp101", "ibmcxapp102", "ibmcxapp103",
"ibmcxapp104", "ibmcxapp105", "ibmcxapp106", "ibmcxapp107",
"ibmcxapp108", "ibmcxapp109", "ibmcxapp110", "ibmcxapp111",
"ibmcxapp112", "ibmcxapp113", "ibmcxapp114", "ibmcxapp115",
"ibmcxapp116", "ibmcxapp117", "ibmcxapp118", "ibmcxapp119",
"ibmcxapp120", "ibmcxapp121", "ibmcxapp122", "ibmcxxaa100",
"ibmcxxaa101", "ibmcxxaa102", "ibmcxxaa103", "ibmcxxaa104",
"ibmcxxaa105", "ibmcxxaa106", "ibmcxxaa107", "ibmcxxaa108",
"ibmcxxaa109", "ibmcxxaa110", "ibmcxxaa111", "ibmcxxaa112",
"ibmcxxaa201", "ibmcxxaa202", "ibmcxxaa203", "ibmcxxaa204"
] }},
{"exists": {"field": "system.process.cmdline"}}
],
"must_not": [
{"term" : { "system.process.username" : "NT AUTHORITY\\SYSTEM" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\NETWORK SERVICE" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\LOCAL SERVICE" }},
{"term" : { "system.process.username" : "NT AUTHORITY\\Servicio de red"}},
{"term" : { "system.process.username" : "" }}
]
}
},
"size": 0,
"aggs": {
"group_by_start_time": {
"terms": {
"field": "system.process.cpu.start_time"
},
"aggs": {
"group_by_name": {
"terms": {
"field": "system.process.name.keyword"
}
}
}
}
},
"size": 0,
"sort" : [
{ "system.process.cpu.start_time" : {"order" : "asc"}},
{ "#timestamp" : {"order" : "asc"}},
{ "system.process.pid" : {"order" : "desc"}}
]}

It's a bit hard to follow and reproduce; a minimal example (I think the entire query is not really needed) and sample docs would go a long way.
If you want an hourly aggregation, the first thing you'll need to do is add that aggregation at the top level and then run the others inside it.
The minimal example for an hourly aggregation would be:
POST /metricbeat*/_search?size=0
{
  "aggs": {
    "metrics_per_hour": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "hour"
      }
    }
  }
}
Folding in the other aggregation would look like this:
POST /metricbeat*/_search?size=0
{
  "aggs": {
    "metrics_per_hour": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "hour"
      },
      "aggs": {
        ...
      }
    }
  }
}
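Concretely, nesting the process-name terms aggregation from your query inside the hourly histogram might look roughly like this (a sketch reusing your field names, so double-check it against your mapping):
POST /metricbeat*/_search?size=0
{
  "aggs": {
    "metrics_per_hour": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "hour"
      },
      "aggs": {
        "group_by_name": {
          "terms": {
            "field": "system.process.name.keyword"
          }
        }
      }
    }
  }
}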
PS: If you are using a daily index pattern, you could just use the right day instead of the wildcard one and then skip this part of the query:
"range": {
"#timestamp": {
"gte": "2019-03-22T00:00:00",
"lte": "2019-03-23T00:00:00"
}
}
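For example, assuming a daily index naming scheme like metricbeat-YYYY.MM.DD (the exact pattern depends on your Metricbeat version and template), that would be something like:
POST /metricbeat-2019.03.22/_search?size=0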

Related

I need to get average document count by date in elasticsearch

I want to get the average document count by date without fetching the whole bunch of bucket data and computing the average by hand, because there are years of data, and when I group by date I get a too_many_buckets_exception.
So my current query is
{
"query": {
"bool": {
"must": [],
"filter": []
}
},
"aggs": {
"groupByChannle": {
"terms": {
"field": "channel"
},
"aggs": {
"docs_per_day": {
"date_histogram": {
"field": "message_date",
"fixed_interval": "1d"
}
}
}
}
}
}
How can I get an average doc count grouped by message_date (day) and channel without pulling back the buckets array of this data?
"buckets" : [
{
"key_as_string" : "2018-03-17 00:00:00",
"key" : 1521244800000,
"doc_count" : 4027
},
{
"key_as_string" : "2018-03-18 00:00:00",
"key" : 1521331200000,
"doc_count" : 10133
},
...thousands of rows
]
My index structure looks like this:
"mappings" : {
"properties" : {
"channel" : {
"type" : "keyword"
},
"message" : {
"type" : "text"
},
"message_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
}
}
}
With this query, I want to get JUST AN AVERAGE DOC COUNT BY DATE and nothing else.
"avg_count": {
"avg_bucket": {
"buckets_path": "docs_per_day>_count"
}
}
after docs_per_day ending this.
avg_count provides average count.
_count refers the bucket count
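Put together with the query from the question, the full request might look roughly like this (a sketch; the daily buckets are still computed internally, but avg_count gives the per-channel average directly):
{
  "size": 0,
  "aggs": {
    "groupByChannle": {
      "terms": {
        "field": "channel"
      },
      "aggs": {
        "docs_per_day": {
          "date_histogram": {
            "field": "message_date",
            "fixed_interval": "1d"
          }
        },
        "avg_count": {
          "avg_bucket": {
            "buckets_path": "docs_per_day>_count"
          }
        }
      }
    }
  }
}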
I think you can use a stats aggregation combined with a bucket_script:
{
"size": 0,
"aggs": {
"term": {
"terms": {
"field": "chanel"
},
"aggs": {
"stats": {
"stats": {
"field": "message_date"
}
},
"result": {
"bucket_script": {
"buckets_path": {
"max" : "stats.max",
"min" : "stats.min",
"count" : "stats.count"
},
"script": "params.count/(params.max - params.min)/1000/86400)"
}
}
}
}
}
}

Elastic query by time for the current day

I have a query which gets all the data for the dates that I set. This query works very well so far.
When I remove the date and change the format, I thought Elasticsearch would take the current day, but it doesn't.
GET /BLA*/_search
{
"size" : 1,
"sort" : "#timestamp",
"_source" : ["#timestamp", "details"],
"query" : {
"bool" : {
"must" : [
{"term" : {"FIELD" : "TRUC"}},
{"regexp" : { "details" : ".*TRUC.*" }},
{"range": {
"#timestamp": {
"format" : "yyyy-MM-dd HH:mm",
"from": "2020-05-05 07:00",
"to": "2020-05-05 07:30",
"time_zone" : "Europe/Paris"
}
}}
]
}
}
}
I tried to remove the date and change the format, but it doesn't work.
Any idea? The goal is to get the data for the current day.
GET /BLA*/_search
{
"size" : 1,
"sort" : "#timestamp",
"_source" : ["#timestamp", "details"],
"query" : {
"bool" : {
"must" : [
{"term" : {"FIELD" : "TRUC"}},
{"regexp" : { "details" : ".*TRUC.*" }},
{"range": {
"#timestamp": {
"format" : "HH:mm",
"from": "07:00",
"to": "07:30",
"time_zone" : "Europe/Paris"
}
}}
]
}
}
}
TL;DR not possible. The only way I know of to work w/ relative datetimes in range queries is the following:
{
"size": 1,
"sort": "#timestamp",
"_source": [
"#timestamp",
"details"
],
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": "now-2h",
"lte": "now+3h",
"time_zone": "Europe/Paris"
}
}
}
]
}
}
}
but it's all relative to now.
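If the goal is specifically the current day, Elasticsearch date math can also round now down to the start of the day; still relative to now, a sketch of the range clause would be:
"range": {
  "#timestamp": {
    "gte": "now/d",
    "lte": "now",
    "time_zone": "Europe/Paris"
  }
}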
Here's the explanation of why your query doesn't seem to work.
Let's set up an index whose #timestamp will be of the format HH:mm:
PUT my_index
{
"mappings": {
"properties": {
"#timestamp": {
"type": "date",
"format": "HH:mm"
}
}
}
}
then ingest a doc
POST my_index/_doc
{
"#timestamp": "07:00"
}
all nice and dandy. Now let's investigate what the actual indexed date is:
GET my_index/_search
{
"script_fields": {
"ts_full": {
"script": {
"source": """
LocalDateTime.ofInstant(
Instant.ofEpochMilli(doc['#timestamp'].value.millis),
ZoneId.of('Europe/Paris')
).format(DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss"))
"""
}
}
}
}
which yields
"01/01/1970 08:00:00"
since Paris is UTC+1.
So summing up, you can surely index & search by 'dates' (strictly speaking 'times') of the format HH:mm but they're all going to pertain to Jan 1 1970.

Elasticsearch Filter Query

I am using Elasticsearch 1.5.2. I stored some products with a field named "allergic" and some others without this field. The values of this field can be fish, milk, nuts, etc. I want to make a query that returns only products which don't have the "allergic" field at all, and integrate this into another aggregation query. I want to make just one query: first eliminate products which have the "allergic" field, then execute the aggregation query of the second block.
How can I integrate this:
{
"constant_score" : {
"filter" : {
"missing" : { "field" : "allergic" }
}
}
}
to this aggregation query:
POST tes1/_search?search_type=count
{
"aggs" : {
"fruits" : {
"filter" : {
"query":{
"query_string": {
"query": "Fruits",
"fields": [
"category"
]
}
}},
"aggs" : {
"minprice": {
"top_hits": {
"sort": [
{
"prix en €/kg": {
"order": "asc"
}
}
], "size":400
}
}
}
}} }
You need to add the query part before the aggregation call. This will filter the results and then run the aggregation on the result set.
POST tes1/_search
{
"_source": false,
"size": 1000,
"query":
{ "constant_score" : {
"filter" : {
"missing" : { "field" : "allergic" }
}
}
},
"aggs" : {
"fruits" : {
"filter" : {
"query":{
"query_string": {
"query": "Fruits",
"fields": [
"category"
]
}
}},
"aggs" : {
"minprice": {
"top_hits": {
"sort": [
{
"prix en €/kg": {
"order": "asc"
}
}
], "size":400
}
}
}
}} }
On a side note, please consider upgrading Elasticsearch to the latest version, as 1.x is no longer supported.
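If you do upgrade, note that the missing filter was removed in later versions; the equivalent there is a bool query with must_not + exists, roughly:
"query": {
  "bool": {
    "must_not": {
      "exists": { "field": "allergic" }
    }
  }
}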

Converting SQL query to ElasticSearch Query

I want to convert the following SQL query to an Elasticsearch one. Can anyone help with this?
select csgg, sum(amount) from table1
where type in ('a','b','c') and year=2016 and fc="33" group by csgg having sum(amount)=0
I tried it the following way:
{
"size": 500,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{"term" : {"fc" : "33"}},
{"term" : {"year" : 2016}}
],
"should" : [
{"terms" : {"type" : ["a","b","c"] }}
]
}
}
}
},
"aggs": {
"group_by_csgg": {
"terms": {
"field": "csgg"
},
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
}
}
}
}
}
but I'm not sure if I am doing it right, as the results don't validate.
It seems the query needs to be added inside the aggregation.
Assuming that you use Elasticsearch 2.x, there is a way to get the HAVING semantics in Elasticsearch.
I'm not aware of a possibility prior to 2.0.
You can use the new Bucket Selector pipeline aggregation, which only selects the buckets that meet a certain criterion:
POST test/test/_search
{
"size": 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{"term" : {"fc" : "33"}},
{"term" : {"year" : 2016}},
{"terms" : {"type" : ["a","b","c"] }}
]
}
}
}
},
"aggs": {
"group_by_csgg": {
"terms": {
"field": "csgg",
"size": 100
},
"aggs": {
"sum_amount": {
"sum": {
"field": "amount"
}
},
"no_amount_filter": {
"bucket_selector": {
"buckets_path": {"sumAmount": "sum_amount"},
"script": "sumAmount == 0"
}
}
}
}
}
}
However there are two caveats. Depending on your configuration, it might be necessary to enable scripting like that:
script.aggs: true
script.groovy: true
Moreover, as it works on the buckets returned by the parent terms aggregation, it is not guaranteed that you get all buckets with sum(amount) = 0: if the terms aggregation selects only terms whose sum of amount != 0, you will get no result.
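For reference, on Elasticsearch 5.x and later Groovy is gone, so the same bucket_selector would be written with a Painless script instead, roughly:
"no_amount_filter": {
  "bucket_selector": {
    "buckets_path": { "sumAmount": "sum_amount" },
    "script": "params.sumAmount == 0"
  }
}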

Elasticsearch Facets: Search on _index returned no results

I want to search data on ES in this order: by index -> by index type -> text search data.
When I use the query below on "_index" I expected to get a list of index types under that particular _index and also the related data, but it returned nothing. On the other hand, when I searched by _type I got the data pertaining to the index type. Where have I gone wrong?
curl -XGET 'http://localhost:9200/_all/_search?pretty' -d '{
"facets": {
"terms": {
"terms": {
"field": "_index",
"size": 10,
"order": "count",
"exclude": []
},
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"_index": [
"<index_name>"
]
}
}
]
}
}
}
}
}
}
}
},
"size": 0
}'
Note: I faced this problem first on Kibana, where I used the filter "_index":"name_of_index"; it returned no results but "_type":"name_of_index_type" returned the expected result. I found Kibana uses the above query behind the scenes to get the results of the filter I tried.
This is an example of a query with a pre-filter ("query": "*") and then a must & must_not query. The result is then used to build the aggregations:
curl -XGET 'http://localhost:9200/YOUR_INDEX_NAME/_search?size=10' -d '{
"query" : {
"filtered" : {
"query" : {
"query_string" : {
"query" : "*"
}
},
"filter" : {
"bool" : {
"must" : [
{ "term" : { "E_RECORDEDBY" : "malençon, g."} },
{ "term" : { "T_SCIENTIFICNAME" : "peniophora incarnata" } }
],
"must_not" : [
{"term" : { "L_CONTINENT" : "africa" } },
{"term" : { "L_CONTINENT" : "europe" } }
]
}
}
}
},
"aggs" : {
"L_CONTINENT" : {
"terms" : {
"field" : "L_CONTINENT",
"size" : 20
}
}
},
"sort" : "_score"
}'
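Note that facets were removed in Elasticsearch 2.0. On current versions, the per-index breakdown itself would be a plain terms aggregation on the _index field, along these lines:
GET _all/_search?size=0
{
  "aggs": {
    "per_index": {
      "terms": {
        "field": "_index",
        "size": 10
      }
    }
  }
}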
