elasticsearch - equivalent of "facets" "statistical" for aggregations

What is the equivalent of "facets" "statistical" fields using aggregations?
"facets": {
"text": {
"statistical": {
"script": "doc['text'].values.size()"
}
}
}

You need to use the stats aggregation:
"aggs": {
"text": {
"stats": {
"script": "doc['text'].values.size()"
}
}
}
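For what it's worth, here is a minimal Python sketch of the same request body. It assumes a recent Elasticsearch version where the script is passed as an object with a "source" key, and the script itself may need adjusting for your version. The helper names stats_agg and local_stats are made up for illustration. local_stats only mimics, on a plain Python list, the five numbers a stats aggregation reports (count, min, max, avg, sum). The old statistical facet also reported variance and standard deviation; if you need those, the extended_stats aggregation is the closer match.

```python
import json


def stats_agg(name, script_source):
    """Build the "aggs" part of a search body: a stats aggregation fed by a script."""
    return {"aggs": {name: {"stats": {"script": {"source": script_source}}}}}


def local_stats(values):
    """Mimic the fields a stats aggregation returns, for a non-empty list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Script copied from the question; it yields the number of values per document.
body = stats_agg("text", "doc['text'].values.size()")
print(json.dumps(body, indent=2))

# Hypothetical per-document value counts, standing in for the script's output.
print(local_stats([2, 1, 3]))
```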

Related

Filter and sort based on attributes in Terms lookup document in Elastic Search

I have some documents in my index:
POST "/index/thing/_bulk" -s -d'
{ "index":{ "_id": 1 } }
{ "title":"One thing"}
{ "index":{ "_id": 2 } }
{ "title":"Second thing"}
{ "index":{ "_id": 3 } }
{ "title":"Three things"}
{ "index":{ "_id": 4 } }
{ "title":"And so fourth"}
{ "index":{ "_id": 5 } }
{ "title":"Five things"}
'
I also have documents that hold a user's collection. Each entry is linked to one of the other documents (things) through that document's id attribute, like so:
PUT /index/collection/1
{
"items": [
{"id": 1, "time_added": "2017-08-07T09:07:15.000Z", "condition": "fair"},
{"id": 3, "time_added": "2019-08-07T09:07:15.000Z", "condition": "good"},
{"id": 4, "time_added": "2016-08-07T09:07:15.000Z", "condition": "poor"}
]
}
I then use a terms lookup to get all the things in a user's collection like so:
GET /documents/_search
{
"query" : {
"terms" : {
"_id" : {
"index" : "index",
"type" : "collection",
"id" : 1,
"path" : "items.id"
}
}
}
}
This works fine. I get the three documents in the collection and can search, sort and use aggregations like I want.
But is there a way to aggregate, filter and sort those documents based on the attributes stored in the collection document (time_added or condition in this case)? Say I wanted to sort by time_added, or keep only the items with condition == "good".
Maybe a script could be applied to the collection to sort or filter its items? This feels pretty close to a SQL-like left join, so maybe Elasticsearch is the wrong tool?

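It looks like you need the nested data type. Without it, an array of objects is flattened into independent multi-value fields, so the link between one item's time_added and its condition is lost.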
Taking your data as an example:
Without nested type:
POST collection/_bulk?filter_path=_
{"index":{}}
{"items":[{"id":11,"time_added":"2017-08-07T09:07:15.000Z","condition":"fair"},{"id":13,"time_added":"2019-08-07T09:07:15.000Z","condition":"good"},{"id":14,"time_added":"2016-08-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":21,"time_added":"2017-09-07T09:07:15.000Z","condition":"fair"},{"id":23,"time_added":"2019-09-07T09:07:15.000Z","condition":"good"},{"id":24,"time_added":"2016-09-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":31,"time_added":"2017-10-07T09:07:15.000Z","condition":"fair"},{"id":33,"time_added":"2019-10-07T09:07:15.000Z","condition":"good"},{"id":34,"time_added":"2016-10-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":41,"time_added":"2017-11-07T09:07:15.000Z","condition":"fair"},{"id":43,"time_added":"2019-11-07T09:07:15.000Z","condition":"good"},{"id":44,"time_added":"2016-11-07T09:07:15.000Z","condition":"poor"}]}
{"index":{}}
{"items":[{"id":51,"time_added":"2017-12-07T09:07:15.000Z","condition":"fair"},{"id":53,"time_added":"2019-12-07T09:07:15.000Z","condition":"good"},{"id":54,"time_added":"2016-12-07T09:07:15.000Z","condition":"poor"}]}
Query (you'd get incorrect results - expected one, got five):
GET collection/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
Aggregation (incorrect results: look at the first bucket "2016-08-01T00:00:00.000Z", which contains 3 CONDITION sub-buckets, one for every condition type):
GET collection/_search
{
"size": 0,
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
With nested type
DELETE collection
PUT collection
{
"mappings": {
"properties": {
"items": {
"type": "nested"
}
}
}
}
# and POST the same data from above
Query (returns just one result)
GET collection/_search
{
"query": {
"nested": {
"path": "items",
"query": {
"bool": {
"must": [
{
"term": {
"items.condition": {
"value": "good"
}
}
},
{
"range": {
"items.time_added": {
"lte": "2019-09-01"
}
}
}
]
}
}
}
}
}
Aggregation (the first date bucket contains just one CONDITION sub-bucket)
GET collection/_search
{
"size": 0,
"aggs": {
"ITEMS": {
"nested": {
"path": "items"
},
"aggs": {
"DATE": {
"date_histogram": {
"field": "items.time_added",
"calendar_interval": "month"
},
"aggs": {
"CONDITION": {
"terms": {
"field": "items.condition.keyword",
"size": 10
}
}
}
}
}
}
}
}
Hope that helps :)
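To make the difference concrete, here is a small Python simulation of the two matching behaviours. It runs locally and does not talk to Elasticsearch. The function names are invented for this sketch, and it only approximates the semantics: the flattened version tests each condition against any item in the document, the nested version requires a single item to satisfy both.

```python
from datetime import datetime


def make_doc(k):
    """Rebuild sample document k (1..5) from the bulk request above."""
    month = 7 + k
    return [
        {"id": 10 * k + 1, "time_added": datetime(2017, month, 7, 9, 7, 15), "condition": "fair"},
        {"id": 10 * k + 3, "time_added": datetime(2019, month, 7, 9, 7, 15), "condition": "good"},
        {"id": 10 * k + 4, "time_added": datetime(2016, month, 7, 9, 7, 15), "condition": "poor"},
    ]


DOCS = [make_doc(k) for k in range(1, 6)]
CUTOFF = datetime(2019, 9, 1)  # cut-off taken from the range clause


def matches_flattened(items):
    """Object arrays: each clause may be satisfied by a different item."""
    has_good = any(i["condition"] == "good" for i in items)
    has_early = any(i["time_added"] <= CUTOFF for i in items)
    return has_good and has_early


def matches_nested(items):
    """Nested type: one and the same item must satisfy both clauses."""
    return any(
        i["condition"] == "good" and i["time_added"] <= CUTOFF for i in items
    )


flat_hits = [d for d in DOCS if matches_flattened(d)]
nested_hits = [d for d in DOCS if matches_nested(d)]
print(len(flat_hits), len(nested_hits))  # 5 1
```

Only the first document has a "good" item added before the cut-off (id 13), which is why the nested query returns a single hit.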

ElasticSearch - How can I reuse script_fields field in aggregation?

Is it possible to use a script_field to compute a field, 'emp_salary', and use it in an aggregation query? Here's an example.
I have a script_fields script that computes 'emp_salary', and I want to use it in the aggregation sub-query:
{
"query": {
"term": {
"name.keyword": "John"
}
},
"script_fields": {
"emp_salary": {
"script": {
"lang": "painless",
"source": """return 1"""
}
}
},
"aggs": {
"average": {
"avg": {
"field": "_field['emp_salary']"
}
}
}
}
but I get null for the 'emp_salary'. Am I accessing the field value wrong?
"aggregations": {
"average": {
"value": null
}
}
Thanks
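A likely explanation: script_fields are only evaluated for the hits that are returned, so aggregations never see them, and "_field['emp_salary']" is not a valid field reference. One possible workaround, sketched in Python below, is to declare the computed value as a runtime field in the search request and aggregate on that. This assumes Elasticsearch 7.11 or later, where runtime_mappings is available; the function name is made up, and the constant script simply mirrors the "return 1" placeholder from the question.

```python
import json


def avg_of_runtime_field(name_value):
    """Build a search body that averages a runtime field instead of a script field."""
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"term": {"name.keyword": name_value}},
        "runtime_mappings": {
            "emp_salary": {
                "type": "long",
                # Placeholder computation, mirroring "return 1" in the question.
                "script": {"source": "emit(1L)"},
            }
        },
        "aggs": {"average": {"avg": {"field": "emp_salary"}}},
    }


body = avg_of_runtime_field("John")
print(json.dumps(body, indent=2))
```

On older versions, putting the script directly inside the avg aggregation (a "script" key in place of "field") is the usual alternative.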

Can We Apply Bucket Selector Aggregation on Nested Aggregation in ElasticSearch?

I want to apply a pipeline aggregation (bucket selector aggregation) to a nested field aggregation in Elasticsearch 2.4. I tried something like the query below, but without success. Is it possible to use a pipeline aggregation on a nested field?
{
"size": 0,
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 5000000"
},
"buckets_path": {
"amount": "paidAmountTotal"
}
}
}
}
}
}
}

I found the solution. The bucket selector aggregation has to be a sibling of the nested aggregation (both sit under a parent multi-bucket aggregation, here a terms aggregation), and the path to the metric is written with '>', as shown below:
{
"size": 0,
"aggregations": {
"amount": {
"terms": {
"field": "countId",
"size": 0
},
"aggregations": {
"totalPaidAmount": {
"nested": {
"path": "count"
},
"aggregations": {
"paidAmountTotal": {
"sum": {
"field": "count.totalPaidAmount"
}
}
}
},
"paidAmount_filter": {
"bucket_selector": {
"script": {
"inline": "amount > 1000"
},
"buckets_path": {
"amount": "totalPaidAmount>paidAmountTotal"
}
}
}
}
}
}
}

You are missing params in the script value. On versions that use Painless (5.0 and later), the script reads the variable through params, so paidAmount_filter should look like:
"paidAmount_filter": {
"bucket_selector": {
"buckets_path": {
"amount": "paidAmountTotal"
},
"script": "params.amount > 5000000"
}
}
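To illustrate how a buckets_path such as "totalPaidAmount>paidAmountTotal" is followed, here is a simplified local sketch in Python. The bucket data is invented, the function names are hypothetical, and the real buckets_path syntax has more features than the plain '>' steps handled here.

```python
def resolve_buckets_path(bucket, path):
    """Follow 'agg>sub_agg' steps through one bucket and return the metric value."""
    node = bucket
    for part in path.split(">"):
        node = node[part]
    return node["value"]


def bucket_selector(buckets, path, keep):
    """Keep only the buckets whose resolved metric passes the predicate."""
    return [b for b in buckets if keep(resolve_buckets_path(b, path))]


# Invented terms buckets, shaped like a response with a nested sub-aggregation.
buckets = [
    {"key": "a", "doc_count": 2,
     "totalPaidAmount": {"doc_count": 3, "paidAmountTotal": {"value": 1500.0}}},
    {"key": "b", "doc_count": 1,
     "totalPaidAmount": {"doc_count": 1, "paidAmountTotal": {"value": 200.0}}},
]

# Same threshold as the "amount > 1000" script in the accepted solution.
kept = bucket_selector(buckets, "totalPaidAmount>paidAmountTotal", lambda amount: amount > 1000)
print([b["key"] for b in kept])  # ['a']
```

The selector needs a parent that produces several buckets to choose from, which is why it sits next to the nested aggregation inside the terms aggregation rather than inside the nested aggregation itself.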

How to return only aggregation stats in an ElasticSearch query?

Is it possible to exclude documents from an aggregation query? I just need to know "count" and "sum" and do not need hits. I did it like this:
{
"query": {
"match_all": {
}
},
"aggs": {
"my_agg": {
"stats": {
"field": "country_id"
}
}
}
}

To focus only on aggregation with a match_all query, you could simply use "size":0 (this specifies you want no query results) with no query:
curl -XPOST "http://localhost:9200/indexname/doctype/_search" -d'
{
"size": 0,
"aggs": {
"my_agg": {
"stats": {
"field": "country_id"
}
}
}
}'

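Add ?search_type=count to your query. Note that this applies to Elasticsearch 1.x only: the count search type was deprecated in 2.0 and removed in 5.0, where "size": 0 is the replacement.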
For example:
GET /my_index/countries/_search?search_type=count
{
"query": {
"match_all": {
}
},
"aggs": {
"my_agg": {
"stats": {
"field": "country_id"
}
}
}
}

Multiple filters and an aggregate in elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filter and for aggregations and no formal description of the query dsl - compare it e.g. with postgres documentation.
By trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries, with at least an aggregate plus multiple filters?

I ended up using a filter aggregation, not a filtered query, so I now have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}

Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
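On newer versions the same idea is usually written as a bool query with a filter clause, since the filtered query was deprecated in 2.0 and removed in 5.0. Below is a rough Python sketch of such a request body. The function name and the TIMESTAMP_FIELD constant are assumptions made for this example, and "fixed_interval" assumes 7.2 or later (older versions use "interval").

```python
import json

# Assumption: a Logstash-style timestamp field name; adjust to your mapping.
TIMESTAMP_FIELD = "@timestamp"


def aggregate_with_filters(filters, timestamp_field, interval):
    """Filter in the query, so hits and aggregations see the same documents."""
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "time_histo": {
                "date_histogram": {"field": timestamp_field, "fixed_interval": interval},
                "aggs": {
                    "name": {
                        "percentiles": {
                            "field": "upstream_response_time",
                            "percents": [98.0],
                        }
                    }
                },
            }
        },
    }


body = aggregate_with_filters(
    [
        {"term": {"dc": "eu-west-12"}},
        {"term": {"status": "204"}},
        {"range": {TIMESTAMP_FIELD: {"gte": 1398176502000, "lte": 1400768502000}}},
    ],
    TIMESTAMP_FIELD,
    "1h",
)
print(json.dumps(body, indent=2))
```

Clauses under "filter" are not scored, which fits exact-value conditions like these.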

More on @geekQ's answer: to support filter strings that contain spaces in a multiple-term search, use the following:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }

Just for reference, as of version 7.2, I used the following to apply multiple filters to an aggregation. It uses a filter aggregation to restrict the documents being aggregated, and a bool query to combine the conditions:
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
