ElasticSearch: how to display all documents matching a date range aggregation

Following the Elasticsearch docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
Question: how can I build a date histogram aggregation and display all documents that fall into each date bucket, not just the doc_count?
The aggregation:
{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "1M",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
Response:
{
  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2013-02-02",
          "key": 1328140800000,
          "doc_count": 1
        },
        {
          "key_as_string": "2013-03-02",
          "key": 1330646400000,
          "doc_count": 2 // here I want to display the whole documents as an array, not only doc_count: 2
        },
        ...
      ]
    }
  }
}
Maybe I need to do some sub-aggregation or something else?
Any ideas?

You have to add a top_hits sub-aggregation to the date_histogram aggregation; all of its options are described in the top_hits documentation.
Your final aggregation would look like this:
{
  "aggs": {
    "articles_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "1M",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
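Each bucket in the response will then carry the matching documents under the documents key. A sketch of the resulting shape (the _source contents are elided, and the exact hit metadata varies by Elasticsearch version):
{
  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2013-03-02",
          "key": 1330646400000,
          "doc_count": 2,
          "documents": {
            "hits": {
              "total": 2,
              "hits": [
                { "_source": { ... } },
                { "_source": { ... } }
              ]
            }
          }
        },
        ...
      ]
    }
  }
}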

Like Sumit says; however, I think what you really want is to filter with a date range query:
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-range-query.html#ranges-on-dates
That way you filter out documents outside the date range and keep only the right ones. Then you can do whatever you want with the results.
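A minimal sketch of such a range query, assuming the date field from the question and an arbitrary two-month window:
{
  "query": {
    "range": {
      "date": {
        "gte": "2013-02-01",
        "lte": "2013-03-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}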

Related

Elasticsearch: How to do 'group by' with Painless in scripted fields?

I would like to do something like the following using Painless:
select day, sum(price) / sum(quantity) as ratio
from data
group by day
Is it possible?
I want to do this in order to visualize the ratio field in Kibana, since Kibana itself can't divide aggregated values, but I would gladly hear alternative solutions beyond scripted fields.
Yes, it's possible, you can achieve this with the bucket_script pipeline aggregation:
{
  "aggs": {
    "days": {
      "date_histogram": {
        "field": "dateField",
        "interval": "day"
      },
      "aggs": {
        "price": {
          "sum": {
            "field": "price"
          }
        },
        "quantity": {
          "sum": {
            "field": "quantity"
          }
        },
        "ratio": {
          "bucket_script": {
            "buckets_path": {
              "sumPrice": "price",
              "sumQuantity": "quantity"
            },
            "script": "params.sumPrice / params.sumQuantity"
          }
        }
      }
    }
  }
}
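Each daily bucket in the response will then contain the two sums plus the computed ratio, along these lines (the doc_count is illustrative):
{
  "key_as_string": "2020-02-01T00:00:00.000Z",
  "key": 1580515200000,
  "doc_count": 3,
  "price": { "value": 1000.0 },
  "quantity": { "value": 12.0 },
  "ratio": { "value": 83.33333333333333 }
}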
UPDATE:
You can use the above query through the Transform API, which will create an aggregated index out of the source index.
For instance, I've indexed a few documents in a test index; we can then dry-run the above aggregation query to see what the target aggregated index would look like:
POST _transform/_preview
{
  "source": {
    "index": "test2",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "transtest"
  },
  "pivot": {
    "group_by": {
      "days": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "day"
        }
      }
    },
    "aggregations": {
      "price": {
        "sum": {
          "field": "price"
        }
      },
      "quantity": {
        "sum": {
          "field": "quantity"
        }
      },
      "ratio": {
        "bucket_script": {
          "buckets_path": {
            "sumPrice": "price",
            "sumQuantity": "quantity"
          },
          "script": "params.sumPrice / params.sumQuantity"
        }
      }
    }
  }
}
The response looks like this:
{
  "preview" : [
    {
      "quantity" : 12.0,
      "price" : 1000.0,
      "days" : 1580515200000,
      "ratio" : 83.33333333333333
    }
  ],
  "mappings" : {
    "properties" : {
      "quantity" : {
        "type" : "double"
      },
      "price" : {
        "type" : "double"
      },
      "days" : {
        "type" : "date"
      }
    }
  }
}
What you see in the preview array are the documents that will be indexed into the transtest target index, which you can then visualize in Kibana like any other index.
So what a transform actually does is run the aggregation query given above and store each resulting bucket as a document in another index, ready to be used.
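To go beyond the preview and actually build the target index, you create the transform under an id and start it; a sketch, where my_ratio_transform is an arbitrary name:
PUT _transform/my_ratio_transform
{
  "source": {
    "index": "test2"
  },
  "dest": {
    "index": "transtest"
  },
  "pivot": {
    // same group_by and aggregations as in the _preview body above
  }
}

POST _transform/my_ratio_transform/_start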
I found a solution to get the ratio of sums with a TSVB visualization in Kibana.
First you create two sum aggregations, one that sums price and another that sums quantity. Then you choose the 'Bucket Script' aggregation to divide the aforementioned sums, using a Painless script.
The only drawback I found is that you cannot aggregate on multiple columns.

Elasticsearch - Pagination on Aggregations

I have an index and I query an aggregation. Instead of returning the whole aggregation at once, I want to have it returned in chunks, i.e. in small blocks. Is it possible to do so in Elasticsearch?
Try using the bucket_sort aggregation:
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "sales_bucket_sort": {
          "bucket_sort": {
            "sort": [
              { "total_sales": { "order": "desc" } }
            ],
            "size": 3,
            "from": 10
          }
        }
      }
    }
  }
}
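Here from and size act like an offset and a limit over the sorted buckets, so the request above skips the first 10 buckets and returns the next 3. To page through the chunks, re-run the query and advance from by size each time; for instance, the second page of three would use this bucket_sort portion (a sketch):
"sales_bucket_sort": {
  "bucket_sort": {
    "sort": [
      { "total_sales": { "order": "desc" } }
    ],
    "size": 3,
    "from": 3
  }
}
Note that Elasticsearch still computes all buckets on every request; bucket_sort only truncates what is returned.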

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
The SQL equivalent of what I would like to do is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
I know I need an aggregation, but for now I've tried the basic search query below and I'm getting a malformed error:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "00:00:01.000",
            "to": "15:00:00.000"
          }
        }
      }
    }
  }
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like
{
  "query": { "match_all": {} },
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "order_sum": {
          "sum": { "field": "foo" }
        }
      }
    }
  }
}
First you have a bucketing aggregation that groups your documents by date; then, inside that, a metric aggregation computes a value (in this case a sum) for each bucket. This returns data of the form:
{
  ...
  "aggregations": {
    "by_date": {
      "buckets": [
        {
          "key_as_string": "2015-03-01T00:00:00.000Z",
          "key": 1425168000000,
          "doc_count": 8644,
          "order_sum": {
            "value": 1234
          }
        },
        {
          "key_as_string": "2015-03-02T00:00:00.000Z",
          "key": 1425254400000,
          "doc_count": 8819,
          "order_sum": {
            "value": 45678
          }
        },
        ...
      ]
    }
  }
}
There is a good intro to aggregations on the elasticsearch blog (part 1 and part 2) if you want to do some more reading.

Elasticsearch: aggregation min_doc_count for weeks doesn't work

I have the following aggregation with interval=week and min_doc_count=0:
{
  "aggs": {
    "scores_by_date": {
      "date_histogram": {
        "field": "date",
        "format": "yyyy-MM-dd",
        "interval": "week",
        "min_doc_count": 0
      }
    }
  }
}
and a date filter from 2015-01-01 to 2015-02-23:
{
  "range": {
    "document.date": {
      "from": "2015-01-01",
      "to": "2015-02-23"
    }
  }
}
I expected Elasticsearch to fill in all seven weeks, even the empty ones, and return a bucket for each, but I end up with only one item:
{
  "aggregations": {
    "scores_by_date": {
      "buckets": [
        {
          "key_as_string": "2015-01-05",
          "key": 1420416000000,
          "doc_count": 5
        }
      ]
    }
  }
}
Elasticsearch version: 1.4.0
What is wrong with my aggregation, or how can I tell Elasticsearch to fill in the missing weeks?
You can try specifying extended bounds (there's documentation discussing this feature on the official doc page for histogram aggregations). The most relevant nugget from those docs is this:
With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min values and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).
So your aggregation may have to look something like this to force ES to return empty buckets in that range:
{
  "aggs": {
    "scores_by_date": {
      "date_histogram": {
        "field": "date",
        "format": "yyyy-MM-dd",
        "interval": "week",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2015-01-01",
          "max": "2015-02-23"
        }
      }
    }
  }
}
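With min_doc_count: 0 and extended_bounds together, the response should then contain one bucket per week across the whole range, including the empty ones. A sketch (the zero-count weeks are illustrative, and the exact keys depend on where Elasticsearch places the week boundaries):
{
  "aggregations": {
    "scores_by_date": {
      "buckets": [
        {
          "key_as_string": "2014-12-29",
          "key": 1419811200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-01-05",
          "key": 1420416000000,
          "doc_count": 5
        },
        {
          "key_as_string": "2015-01-12",
          "key": 1421020800000,
          "doc_count": 0
        },
        ...
      ]
    }
  }
}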

ElasticSearch: min_doc_count on lower/lowest level nested aggregation

I have this query with some nested aggregations
{
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "new_users": {
          "filter": {
            "query": {
              "match": {
                "action": "USER_ADD"
              }
            }
          },
          "aggs": {
            "unique_users": {
              "cardinality": {
                "field": "user"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
It yields results that look like this
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "1970-01-07T00:00:00.000Z",
"key": 518400000,
"doc_count": 210,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
{
"key_as_string": "1970-01-09T00:00:00.000Z",
"key": 691200000,
"doc_count": 6,
"new_users": {
"doc_count": 0,
"unique_users": {
"value": 0
}
}
},
......
What I want is to apply min_doc_count to the most deeply nested sub-aggregation, so that buckets with zero values for unique_users (in this case) aren't returned.
The issue is that min_doc_count can't be applied anywhere in my query other than on the top-level date_histogram.
Does the ES query language support something like this? Any known workarounds?
Thanks,
George
As per the Elasticsearch documentation, min_doc_count can be used with any bucket aggregation, including the histogram.
For example:
{
  "aggs": {
    "tags": {
      "terms": {
        "field": "tag"
      }
    }
  }
}
The above query is not a date_histogram, but you can still apply min_doc_count:
{
  "aggs": {
    "tags": {
      "terms": {
        "field": "tag",
        "min_doc_count": 1
      }
    }
  }
}
The point is that min_doc_count can be applied to any bucket aggregation, not just the date_histogram.
