Elasticsearch aggregate on term multiple times per different time range - elasticsearch

I'm trying to aggregate a field by each half of the time-range given in the query. For example, here's the query:
{
"query": {
"simple_query_string": {
"query": "+sitetype:(redacted) +sort_date:[now-2h TO now]"
}
}
}
...and I want to aggregate on term "product1.keyword" from now-2h to now-1h and aggregate on the same term "product1.keyword" from now-1h to now, so like:
"terms": {
"field": "product1",
"size": 10
}
^ aggregate the top 10 results on product1 in now-2h TO now-1h,
and aggregate the top 10 results on product1 in now-1h TO now.
Clarification: product1 is not a date or time-related field. It would be like a type of car, phone, etc.

If you want to use now in your query, the product1 field must be a date type. Then you can try the following:
GET index1/_search
{
"size": 0,
"aggs": {
"dataAgg": {
"date_range": {
"field": "product1",
"ranges": [
{
"from": "now-2h",
"to": "now-1h"
},
{
"from": "now-1h",
"to": "now"
}
]
},
"aggs": {
"top10": {
"top_hits": {
"size": 10
}
}
}
}
}
}
If you can't change the type of product1, you can try a range aggregation, but you must write the times explicitly instead of using now.
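Since the question says product1 is not a date field, another option is to run the date_range over sort_date (the date field from the question's query) and nest a terms aggregation on product1.keyword under it. Below is a minimal Python sketch that only builds the request body. The function name, the aggregation names by_window and top_products, and the range query standing in for the original simple_query_string are assumptions for illustration, and the sitetype filter is left out.

```python
import json


def build_half_window_terms_query(date_field="sort_date",
                                  term_field="product1.keyword",
                                  size=10):
    """Build a search body with one terms aggregation per one-hour window.

    Hypothetical helper: field names default to the ones in the question.
    """
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "query": {"range": {date_field: {"gte": "now-2h", "lte": "now"}}},
        "aggs": {
            "by_window": {
                # One bucket per half of the two-hour range.
                "date_range": {
                    "field": date_field,
                    "ranges": [
                        {"from": "now-2h", "to": "now-1h"},
                        {"from": "now-1h", "to": "now"},
                    ],
                },
                # The terms aggregation is computed separately inside each bucket.
                "aggs": {
                    "top_products": {
                        "terms": {"field": term_field, "size": size}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_half_window_terms_query(), indent=2))
```

Each date_range bucket in the response should then carry its own top_products list, which gives the top 10 terms per hour rather than the top 10 documents that top_hits returns.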

Related

Elasticsearch aggregate field between dates

I want to compare two buckets against each other and find new occurrences that appear in the second bucket. The query below returns all entries in the "query.keyword" field between the two UNIX timestamps provided, but I want the UNIX timestamps to be a part of the aggregation section itself.
GET _search
{
"size": 0,
"query": {
"range" :{
"ts": {
"gte":1535155200,
"lte":1535414399
}
}
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
I've also tried this but received the error:
"Found two aggregation type definitions in [domains_prev]: [range] and [terms]",
GET _search
{
"size": 0,
"aggs": {
"domains_prev": {
"range" :{
"field":"ts",
"ranges": [
{"to" : 1535414399},
{"from" : 1535155200}
]
},
"terms": {
"field":"query.keyword"
}
}
}
}
The goal is to have something similar to this:
Agg1
"domains_prev"
"field":"query.keyword"
date:gte:timestamp, lte:timestamp
Agg2
"domains_today"
"field":"query.keyword"
date:today
Show all "query.keyword" values in agg2 that do not appear in agg1.
This is the SQL query that I use to achieve the intended result:
select domains FROM table WHERE date >= 20171123 and domains NOT IN (SELECT domains FROM table WHERE date < 20171123 group by domains)
You'll want to do a nested bucket aggregation starting with date range:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
From their page, start with an aggregation like this at the top level:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
}
}
}
}
Then nest your existing terms aggregation using query.keyword under that.
The end result should be something like:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
},
"aggs": {
"domains": {
"terms": {
"field":"query.keyword"
}
}
}
}
}
}
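The nested aggregation returns both buckets but does not compute the "in the second bucket but not the first" difference by itself. One way to finish the job is to compare the bucket keys on the client. The Python sketch below assumes the aggregation names range and domains from the example above, assumes the first returned bucket is the older range, and uses a hand-written sample response rather than real output.

```python
def new_terms(response, range_agg="range", terms_agg="domains"):
    """Return term keys present in the second range bucket but not the first.

    Hypothetical helper: expects the response shape of a date_range
    aggregation with a nested terms aggregation.
    """
    buckets = response["aggregations"][range_agg]["buckets"]
    older, newer = buckets[0], buckets[1]
    seen_before = {b["key"] for b in older[terms_agg]["buckets"]}
    return [b["key"] for b in newer[terms_agg]["buckets"]
            if b["key"] not in seen_before]


# Hand-written sample in the shape of a search response (not real data).
sample = {
    "aggregations": {
        "range": {
            "buckets": [
                {"domains": {"buckets": [{"key": "a.example", "doc_count": 3},
                                         {"key": "b.example", "doc_count": 1}]}},
                {"domains": {"buckets": [{"key": "b.example", "doc_count": 2},
                                         {"key": "c.example", "doc_count": 5}]}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(new_terms(sample))
```

Note that a terms aggregation returns only its top buckets (10 by default), so the comparison is only complete if size is raised high enough to cover every term in both ranges.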

How to group by month in Elastic search

I am using Elasticsearch version 6.0.0.
To group by month, I am using a date_histogram aggregation.
Here is the example I tried:
{
"from":0,
"size":2000,
"_source":{
"includes":[
"cost",
"date"
],
"excludes":[
]
},
"aggregations":{
"date_hist_agg":{
"date_histogram":{
"field":"date",
"interval":"month",
"format":"M",
"order":{
"_key":"asc"
},
"min_doc_count":1
},
"aggregations":{
"cost":{
"sum":{
"field":"cost"
}
}
}
}
}
}
As a result, I got 1 (January) multiple times.
I have data for January 2016, January 2017, and January 2018, so January is returned 3 times. But I want January only once, containing the sum across all years.
Instead of using a date_histogram aggregation you could use a terms aggregation with a script that extracts the month from the date.
{
"from": 0,
"size": 2000,
"_source": {"includes": ["cost","date"],"excludes": []},
"aggregations": {
"date_hist_agg": {
"terms": {
"script": "doc['date'].date.monthOfYear",
"order": {
"_key": "asc"
},
"min_doc_count": 1
},
"aggregations": {
"cost": {
"sum": {
"field": "cost"
}
}
}
}
}
}
Note that scripting is not optimal. If you know you'll need the month information, create another field with that information so you can use a simple terms aggregation on it without scripting.
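The extra-field suggestion could look roughly like the Python sketch below: derive the month before indexing, then aggregate on it with a plain terms aggregation. The field name month, the helper names, and the assumption that date is an ISO yyyy-MM-dd string are all illustrative and do not come from the question.

```python
from datetime import date


def with_month(doc, date_field="date", month_field="month"):
    """Return a copy of the document with a numeric month field (1-12).

    Hypothetical helper: assumes the date is an ISO 'YYYY-MM-DD' string.
    """
    enriched = dict(doc)
    enriched[month_field] = date.fromisoformat(doc[date_field]).month
    return enriched


def build_month_sum_query(month_field="month", cost_field="cost"):
    """Build a search body that sums cost per month across all years."""
    return {
        "size": 0,
        "aggs": {
            "by_month": {
                # 12 buckets are enough to cover every month.
                "terms": {"field": month_field, "size": 12,
                          "order": {"_key": "asc"}},
                "aggs": {"cost": {"sum": {"field": cost_field}}},
            }
        },
    }


if __name__ == "__main__":
    print(with_month({"cost": 12.5, "date": "2017-01-15"}))
    print(build_month_sum_query())
```

Documents already in the index would need to be reindexed or updated to carry the new field before this query returns anything useful.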
We can use the calendar_interval with month value:
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#calendar_interval_examples
GET my_index/_search
{
"size": 0,
"query": {},
"aggs": {
"over_time": {
"date_histogram": {
"field": "yourDateAttribute",
"calendar_interval": "month",
"format": "yyyy-MM" // <--- control the output format
}
}
}
}

Sub-aggregation or aggregation filter in elastic

I have a list of records with name and timestamp. For each name, I want to get the maximal timestamp, but I only want names whose max timestamp is more than an hour old. In other words, my result should be a list of names and their max timestamps, limited to names whose max timestamp is before an hour ago; if a name has any record with a timestamp within the last hour, I don't want to see that name in my result.
I tried to solve this issue using aggregation, by creating a term aggregation over name, and then aggregating over max timestamp and then filtering records with max timestamp after one hour ago, as follows:
{
"size": 0,
"aggs": {
"names_aggs": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"max_timestamp": {
"max": {
"field": "timestamp"
},
"aggs": {
"sub-agg": {
"filter": {
"range": {
"timestamp": {
"lt": "now-1h"
}
}
}
}
}
}
}
}
}
}
However, this query produces the following error:
{
"type": "aggregation_initialization_exception",
"reason": "Aggregator [max_timestamp] of type [max] cannot accept sub-aggregations"
}
I can basically get a similar functionality by using the timestamp filter before the max aggregation as follows:
{
"size": 0,
"aggs": {
"names_aggs": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"maximals": {
"filter": {
"range": {
"timestamp": {
"lt": "now-1h"
}
}
},
"aggs": {
"max_timestamp": {
"max": {
"field": "timestamp"
}
}
}
}
}
}
}
}
Indeed, I get a set of results with name and max_timestamp for each name that passed the maximals filter, and a null max_timestamp for each name that didn't pass it. This is a solution I can work with; however, the query does not return when there is a large number of records, because the maximals filter runs for each name.
Thanks in advance for your help.
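One approach that may fit here, though it is not from this thread, is a bucket_selector pipeline aggregation placed next to max_timestamp (as a sibling, not a child), so that whole name buckets are dropped based on their max value. The Python sketch below builds such a request body. The aggregation name only_stale, the variable name maxTs, and the helper name are assumptions; the cutoff is passed as epoch milliseconds because the script compares numbers and cannot evaluate now-1h, and the params.* script syntax assumes a version that uses Painless.

```python
import time


def build_stale_names_query(cutoff_ms, size=10):
    """Build a search body that keeps only names whose max timestamp < cutoff.

    Hypothetical helper: cutoff_ms is an epoch-milliseconds threshold.
    """
    return {
        "size": 0,
        "aggs": {
            "names_aggs": {
                "terms": {"field": "name", "size": size},
                "aggs": {
                    "max_timestamp": {"max": {"field": "timestamp"}},
                    # Sibling pipeline aggregation: removes buckets for which
                    # the script returns false.
                    "only_stale": {
                        "bucket_selector": {
                            "buckets_path": {"maxTs": "max_timestamp"},
                            "script": {
                                "source": "params.maxTs < params.cutoff",
                                "params": {"cutoff": cutoff_ms},
                            },
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    one_hour_ago_ms = int((time.time() - 3600) * 1000)
    print(build_stale_names_query(one_hour_ago_ms))
```

The selector runs after the terms buckets are collected, so with "size": 10 the response may contain fewer than 10 names; a larger size may be needed to see every qualifying name.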

Elasticsearch get n ordered records and then apply grouping

Here's an example of what I'm looking for. Let's say I have records of some purchases. I want to get records where price is > $50 and order by price descending. I want to limit those ordered records to 100 and then group them by zip code.
The final result should have counts of hits for each zip code, where the sum of those counts totals 100 records.
ES v2.1.1
What do you mean by "group them by zip code"?
1. Do you just want to know the number of docs in each group?
2. Or do you want a hash with the zip code as the key, associated with its docs?
If 1:
{
"size": 100,
"query": {
"filtered": {
"filter": {
"range": {
"price": {
"gt": 50
}
}
}
}
},
"sort": {
"price": "desc"
},
"aggs": {
"by_zip_code": {
"terms": {
"field": "zip_code"
}
}
}
}
If 2, you can use the top_hits aggregation. However, sorting the buckets by price is not possible (how could we do that?), and by default Elasticsearch orders terms buckets by _count (see the intrinsic sorts). If the sort order is not a big deal, the following will work:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"range": {
"price": {
"gt": 50
}
}
}
}
},
"sort": {
"price": "desc"
},
"aggs": {
"by_zip_code": {
"terms": {
"field": "zip_code",
"size": 100
},
"aggs": {
"hits": {
"top_hits": {}
}
}
}
}
}
You need to use the Search API to get the 100 results and then post-process to perform the aggregation (since an aggregation of top hits cannot be done directly using the ES API).
1. "I want to get records where price is > $50" - You need a range filter.
2. "...order by price descending" - You need a sort.
3. "I want to limit those ordered records to 100" - You need to specify the size parameter.
4. "...then group them by zip code" - You need to post-process the "hits":"hits" array to do this (e.g. inserting into a hash table / dictionary with zip code as the key values).
For steps 1-3 you need:
$ curl -XGET 'http://localhost:9200/my_index/_search?pretty' -d '{"query":
{"filtered" : {"filter" : { "range": { "price": { "gt": 50 }}}}},
"size" : 100,
"sort": { "price": { "order": "desc" }}
}'
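The post-processing step could be sketched in Python as follows. It assumes the search response has already been parsed into a dict and that each hit's _source contains a zip_code field; the helper name and the sample response are hypothetical.

```python
from collections import Counter


def count_by_zip(response, field="zip_code"):
    """Count the returned hits per zip code.

    Hypothetical helper: reads the hits.hits array of a parsed search response.
    """
    return Counter(hit["_source"][field] for hit in response["hits"]["hits"])


# Hand-written sample in the shape of a search response (not real data).
sample = {
    "hits": {
        "hits": [
            {"_source": {"price": 90, "zip_code": "10001"}},
            {"_source": {"price": 75, "zip_code": "94105"}},
            {"_source": {"price": 60, "zip_code": "10001"}},
        ]
    }
}

if __name__ == "__main__":
    for zip_code, count in count_by_zip(sample).most_common():
        print(zip_code, count)
```

Because the counts are computed only over the hits that came back, they sum to the number of returned records (at most the size of 100 set in the request).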

how to build a range aggregation on parent by minimum value in children docs

I have a parent/child relationship created between Product and Pricing documents. A Product has many Pricing documents, each with its own subtotal field, and I'd simply like to create a range aggregation that only considers the minimum subtotal for each product and filters out the others.
I think this is possible using nested aggregations and filters, but this is the closest I've gotten:
POST /test_index/Product/_search
{
"aggs": {
"offered-at": {
"children": {
"type": "Pricing"
},
"aggs": {
"prices": {
"aggs": {
"min_price": {
"min": {
"field": "subtotal"
},
"aggs": {
"min_price_buckets": {
"range": {
"field": "subtotal",
"ranges": [
{
"to": 100
},
{
"from": 100,
"to": 200
},
{
"from": 200
}
]
}
}
}
}
}
}
}
}
}
}
However this results in the error nested: AggregationInitializationException[Aggregator [min_price] of type [min] cannot accept sub-aggregations]; }], which sort of makes sense because once you reduce to a single value there is nothing left to aggregate.
But how can I structure this so that the range aggregation is only pulling the minimum value from each set of children?
(here is a sense with mappings and test data : http://sense.qbox.io/gist/01b072b4566ef6885113dc94a796f3bdc56f19a9)

Resources