Sub-aggregation or aggregation filter in Elasticsearch

I have a list of records, each with a name and a timestamp. For each name I want its maximum timestamp, but only for names whose maximum timestamp is older than one hour. In other words, the result should list each name together with its maximum timestamp, and a name should be left out entirely if it has any record with a timestamp within the last hour.
I tried to solve this with aggregations: a terms aggregation over name, then a max aggregation over timestamp, and then a filter to drop names whose max timestamp falls within the last hour, as follows:
{
  "size": 0,
  "aggs": {
    "names_aggs": {
      "terms": {
        "field": "name",
        "size": 10
      },
      "aggs": {
        "max_timestamp": {
          "max": {
            "field": "timestamp"
          },
          "aggs": {
            "sub-agg": {
              "filter": {
                "range": {
                  "timestamp": {
                    "lt": "now-1h"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
However, this query produces the following error:
{
"type": "aggregation_initialization_exception",
"reason": "Aggregator [max_timestamp] of type [max] cannot accept sub-aggregations"
}
I can get basically the same functionality by applying the timestamp filter before the max aggregation, as follows:
{
  "size": 0,
  "aggs": {
    "names_aggs": {
      "terms": {
        "field": "name",
        "size": 10
      },
      "aggs": {
        "maximals": {
          "filter": {
            "range": {
              "timestamp": {
                "lt": "now-1h"
              }
            }
          },
          "aggs": {
            "max_timestamp": {
              "max": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
  }
}
Indeed, I get a result with a name and max_timestamp for every name that passes the maximals filter, and a null max_timestamp for every name that doesn't. I could work with this solution; however, the query does not return when there is a large number of records, because the maximals filter runs for every name bucket.
Thanks in advance for your help.
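For reference, the bucket-level filtering described above can also be expressed with a bucket_selector pipeline aggregation, which keeps only the terms buckets whose max timestamp is older than the cutoff. This is just a sketch, assuming Elasticsearch 5.6+ with Painless scripting; the cutoff (epoch milliseconds for "one hour ago") has to be computed on the client side and passed as a script parameter, since date math such as now-1h is not available inside the pipeline script, and 1700000000000 below is only a placeholder value:
{
  "size": 0,
  "aggs": {
    "names_aggs": {
      "terms": {
        "field": "name",
        "size": 10
      },
      "aggs": {
        "max_timestamp": {
          "max": {
            "field": "timestamp"
          }
        },
        "older_than_one_hour": {
          "bucket_selector": {
            "buckets_path": {
              "maxTs": "max_timestamp"
            },
            "script": {
              "source": "params.maxTs < params.cutoff",
              "params": {
                "cutoff": 1700000000000
              }
            }
          }
        }
      }
    }
  }
}
With this, buckets whose max_timestamp is newer than the cutoff are removed from the response before it is returned, so no separate filter sub-aggregation has to run per name.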

Related

Query returns a result with a smaller size than I intended in Elasticsearch

I am using the REST API to query Elasticsearch.
Below is the query:
GET /..../_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "time_zone": "+09:00",
              "gte": "2023-01-24T00:00:00.000Z",
              "lt": "2023-01-24T03:03:00.000Z"
            }
          }
        },
        {
          "term": {
            "serviceid.keyword": {
              "value": "430011397"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_day": {
      "auto_date_histogram": {
        "field": "#timestamp",
        "minimum_interval": "minute"
      },
      "aggs": {
        "agg-type": {
          "terms": {
            "field": "nxlogtype.keyword",
            "size": 100000
          },
          "aggs": {
            "my-sub-agg-name": {
              "avg": {
                "field": "size"
              }
            }
          }
        }
      }
    }
  }
}
As you can see, I specified a time range of about three hours in the gte and lt fields.
However, the result returns only 6 buckets, each with a 30-minute interval.
I expected many one-minute buckets to be returned across the time range I specified, but the result stays the same even when I extend the range.
Since I am quite new to Elasticsearch, I am not familiar with query usage.
How can I resolve this issue?
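A likely explanation: auto_date_histogram does not honour a fixed interval; it chooses whatever interval keeps the number of buckets at or below a target count (10 by default), and minimum_interval only sets a floor on the rounding unit. If fixed one-minute buckets are needed, a sketch of just the aggregation part using a plain date_histogram might look like the following (assuming Elasticsearch 7.2+, where the parameter is called fixed_interval; older versions use interval):
"aggs": {
  "by_minute": {
    "date_histogram": {
      "field": "#timestamp",
      "fixed_interval": "1m"
    },
    "aggs": {
      "agg-type": {
        "terms": {
          "field": "nxlogtype.keyword",
          "size": 100000
        },
        "aggs": {
          "my-sub-agg-name": {
            "avg": {
              "field": "size"
            }
          }
        }
      }
    }
  }
}
Alternatively, keeping auto_date_histogram but raising its buckets parameter (for example to 180 for a three-hour window of minutes) also allows finer intervals.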

Elasticsearch aggregate on term multiple times per different time range

I'm trying to aggregate a field by each half of the time-range given in the query. For example, here's the query:
{
  "query": {
    "simple_query_string": {
      "query": "+sitetype:(redacted) +sort_date:[now-2h TO now]"
    }
  }
}
...and I want to aggregate on term "product1.keyword" from now-2h to now-1h and aggregate on the same term "product1.keyword" from now-1h to now, so like:
"terms": {
"field": "product1",
"size": 10,
}
^ aggregate the top 10 results on product1 in now-2h TO now-1h,
and aggregate the top 10 results on product1 in now-1h TO now.
Clarification: product1 is not a date or time-related field. It would be like a type of car, phone, etc.
If you want to use now in your query, you must make the product1 field a date type; then you can try the following:
GET index1/_search
{
  "size": 0,
  "aggs": {
    "dataAgg": {
      "date_range": {
        "field": "product1",
        "ranges": [
          {
            "from": "now-2h",
            "to": "now-1h"
          },
          {
            "from": "now-1h",
            "to": "now"
          }
        ]
      },
      "aggs": {
        "top10": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
And if you can't change product1's type, you can try a range aggregation instead, but then you must write the times explicitly instead of using now.
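A possible alternative that keeps product1 as a regular keyword field is to run the date_range aggregation on the date field already used in the query (sort_date, assumed here to be mapped as a date) and nest the terms aggregation on product1.keyword under it, so each half of the window gets its own top-10 list. A sketch (the aggregation names halves and top_products are arbitrary):
GET index1/_search
{
  "size": 0,
  "query": {
    "simple_query_string": {
      "query": "+sitetype:(redacted) +sort_date:[now-2h TO now]"
    }
  },
  "aggs": {
    "halves": {
      "date_range": {
        "field": "sort_date",
        "ranges": [
          { "from": "now-2h", "to": "now-1h" },
          { "from": "now-1h", "to": "now" }
        ]
      },
      "aggs": {
        "top_products": {
          "terms": {
            "field": "product1.keyword",
            "size": 10
          }
        }
      }
    }
  }
}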

Elasticsearch aggregate field between dates

I want to compare two buckets against each other and find new occurrences that appear in the second bucket. The query below returns all entries in the "query.keyword" field between the two UNIX timestamps provided, but I want the UNIX timestamps to be part of the aggregation section itself.
GET _search
{
  "size": 0,
  "query": {
    "range": {
      "ts": {
        "gte": 1535155200,
        "lte": 1535414399
      }
    }
  },
  "aggs": {
    "domains": {
      "terms": {
        "field": "query.keyword"
      }
    }
  }
}
I've also tried this but received the error:
"Found two aggregation type definitions in [domains_prev]: [range] and [terms]",
GET _search
{
  "size": 0,
  "aggs": {
    "domains_prev": {
      "range": {
        "field": "ts",
        "ranges": [
          { "to": 1535414399 },
          { "from": 1535155200 }
        ]
      },
      "terms": {
        "field": "query.keyword"
      }
    }
  }
}
The goal is to have something similar to this:
Agg1 "domains_prev":
  "field": "query.keyword"
  date: gte:timestamp, lte:timestamp
Agg2 "domains_today":
  "field": "query.keyword"
  date: today
Then show all "query.keyword" values in agg2 that do not appear in agg1.
This is the SQL query that I use to achieve the intended result:
select domains FROM table WHERE date >= 20171123 and domains NOT IN (SELECT domains FROM table WHERE date < 20171123 group by domains)
You'll want to do a nested bucket aggregation starting with date range:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
From their page, start with an aggregation like this at the top level:
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "ranges": [
          { "to": "now-10M/M" },
          { "from": "now-10M/M" }
        ]
      }
    }
  }
}
Then nest your existing terms aggregation using query.keyword under that.
The end result should be something like:
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "ranges": [
          { "to": "now-10M/M" },
          { "from": "now-10M/M" }
        ]
      },
      "aggs": {
        "domains": {
          "terms": {
            "field": "query.keyword"
          }
        }
      }
    }
  }
}

How to limit a date histogram aggregation of nested documents to a specific date range?

Version
Using Elasticsearch 1.7.2
Objective
I would like to create a graph of the number of predictions made by users per day for the last n days. In this case, 10 days.
Current query
{
  "size": 0,
  "aggs": {
    "predictions": {
      "nested": {
        "path": "user_answers"
      },
      "aggs": {
        "predictions_over_time": {
          "date_histogram": {
            "field": "user_answers.created",
            "interval": "day",
            "format": "yyyy-MM-dd",
            "min_doc_count": 0
          }
        }
      }
    }
  }
}
Issue
This query returns a histogram, but with buckets for all available dates across all documents; it is not restricted to a specific date range.
What have I tried?
I've tried a number of approaches to solving this, all of which have failed.
* Range filter, then histogram that
* Date range aggregation, then histogram the buckets
* Using extended_bounds with full dates, now-10d, and also timestamps
* Trying a range filter inside the histogram aggregation
Any guidance would be appreciated! Thanks.
A query didn't work for me in that situation; what I used is a third aggs:
{
  "size": 0,
  "aggs": {
    "user_answers": {
      "nested": { "path": "user_answers" },
      "aggs": {
        "timed_user_answers": {
          "filter": {
            "range": {
              "user_answers.created": {
                "gte": "now-10d",
                "lte": "now"
              }
            }
          },
          "aggs": {
            "predictions_over_time": {
              "date_histogram": {
                "field": "user_answers.created",
                "interval": "day",
                "format": "yyyy-MM-dd",
                "min_doc_count": 0
              }
            }
          }
        }
      }
    }
  }
}
One aggs specifies nested, one specifies the filter, and the last specifies the actual aggregation. I don't know why this syntax makes sense, but you don't seem to be able to combine two of them in the same aggs.
You need to add a query. The query can be anything except a post_filter; it should be nested and contain the date range. One way is to define a constant_score query and, inside it, use a nested filter that wraps a range filter.
{
  "query": {
    "constant_score": {
      "filter": {
        "nested": {
          "path": "user_answers",
          "filter": {
            "range": {
              "user_answers.created": {
                "gte": "now-10d",
                "lte": "now"
              }
            }
          }
        }
      }
    }
  }
}
Confirm if this works for you.

ElasticSearch filtering by field1 THEN field2 THEN take max of field3

I am struggling to get the information that I need from ElasticSearch.
My log statements are like this:
field1: Example
field2: Example2
field3: Example3
I would like to search a timeframe (using last 24 hours) to find all data that has this in field1 and that in field2.
There may then be multiple this.that.[field3] entries, so I only want to return the maximum of that field.
In fact, in my data, field3 is actually the key of the entry.
What is the best way of retrieving the information I need? I have managed to get the results returned using aggs, but the data is in buckets, and I am only interested in the data with the max value of field3.
I have added an example of the query that I am looking to do: https://jsonblob.com/54535d49e4b0d117eeaf6bb4
{
  "size": 0,
  "aggs": {
    "agg_129": {
      "filters": {
        "filters": {
          "CarName: Toyota": {
            "query": {
              "query_string": {
                "query": "CarName: Toyota"
              }
            }
          }
        }
      },
      "aggs": {
        "agg_130": {
          "filters": {
            "filters": {
              "Attribute: TimeUsed": {
                "query": {
                  "query_string": {
                    "query": "Attribute: TimeUsed"
                  }
                }
              }
            }
          },
          "aggs": {
            "agg_131": {
              "terms": {
                "field": "#timestamp",
                "size": 0,
                "order": {
                  "_count": "desc"
                }
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "#timestamp": {
                  "gte": "2014-10-27T00:00:00.000Z",
                  "lte": "2014-10-28T23:59:59.999Z"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}
So, the example above shows only documents that have CarName = Toyota and Attribute = TimeUsed.
My data is as follows:
There are x cars (CarName), each car has y Attributes, and each of those Attributes has documents with a timestamp.
To begin with, I was looking for a query for CarName.Attribute.timestamp (latest); however, if I can use just ONE query to get the latest timestamp for EVERY Attribute of EVERY CarName, that would reduce the number of query calls from ~50 to one.
If you are using Elasticsearch v1.3+, you can add a top_hits aggregation with size: 1 and a descending sort on the field3 value.
This will return the whole document with maximum value on the field, as you wish.
This example in the documentation might do the trick.
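A minimal sketch of that top_hits approach (the aggregation name latest_entry is arbitrary, and field3 stands for whatever field holds the value to maximise); in a real query it would sit under the CarName/Attribute buckets, as in the aggregation shown further below:
{
  "size": 0,
  "aggs": {
    "latest_entry": {
      "top_hits": {
        "size": 1,
        "sort": [
          { "field3": { "order": "desc" } }
        ]
      }
    }
  }
}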
Edit:
Ok, it seems you don't need the whole document, but only the maximum timestamp value. You can use a max aggregation instead of using a top_hits one.
The following query (not tested) should give you the maximum timestamp for each of the top 10 Attribute values of each of the top 10 CarName values, in a single request.
The terms aggregation is like a GROUP BY clause, and you should not have to query 50 times to retrieve the values of each CarName/Attribute combination: this is the point of nesting a terms aggregation for Attribute inside the CarName aggregation.
Note that, to work properly, the CarName and Attribute fields should be not_analyzed. If it's not the case, you will have "funny" results in your buckets. The problem (and possible solution) is very well described here.
Feel free to change the size parameter of the terms aggregation to fit to your case.
{
  "size": 0,
  "aggs": {
    "by_carnames": {
      "terms": {
        "field": "CarName",
        "size": 10
      },
      "aggs": {
        "by_attribute": {
          "terms": {
            "field": "Attribute",
            "size": 10
          },
          "aggs": {
            "max_timestamp": {
              "max": {
                "field": "#timestamp"
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "#timestamp": {
                  "gte": "2014-10-27T00:00:00.000Z",
                  "lte": "2014-10-28T23:59:59.999Z"
                }
              }
            }
          ]
        }
      }
    }
  }
}
