I have a sample database of 1000 bank accounts.
{"account_number":1,"balance":39225,...,"state":"IL"}
What I want is a list of the highest-balance accounts in each state. Using a terms aggregation, I only got the count of accounts for each state.
e.g.
"aggregations" : {
"states" : {
"buckets" : [ {
"key" : "tx",
"doc_count" : 30
}, ....
But this doesn't return the required list. Any suggestions?
Use a max aggregation. The documentation example below uses a price field; for this data the field would be balance:
{
"aggs" : {
"max_price" : { "max" : { "field" : "price" } }
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-max-aggregation.html
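On its own, a max aggregation returns a single maximum for the whole index. To get one maximum per state, it can be nested under the terms aggregation from the question. Below is a minimal Python sketch of that idea, not a definitive implementation: it only builds the request body and reads a response dict, so it runs without a cluster. The field names state and balance come from the sample document; the aggregation names (states, max_balance, top_account) and both helper functions are made up for illustration, and depending on the mapping the terms field may need to be a keyword (not analyzed) field.

```python
def build_top_balance_query(max_states=50):
    """Build a search body: one bucket per state, each with its top balance."""
    return {
        "size": 0,  # skip the search hits, only the aggregations are needed
        "aggs": {
            "states": {
                "terms": {"field": "state", "size": max_states},
                "aggs": {
                    # highest balance within the state bucket
                    "max_balance": {"max": {"field": "balance"}},
                    # the account document holding that balance
                    "top_account": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"balance": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


def top_balance_per_state(response):
    """Map each state key to the max balance reported in its bucket."""
    buckets = response["aggregations"]["states"]["buckets"]
    return {b["key"]: b["max_balance"]["value"] for b in buckets}
```

The max sub-aggregation gives the number, while the top_hits sub-aggregation returns the matching account document itself.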
You should look at the significant terms aggregation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html. It generates buckets of related terms, so it is worth exploring.
In an eshop with thousands of products we have a searchbar at the top. The expected output of the search is a list of categories in which there are products matching the query.
For example searching for 'iphone' should return a list of categories where there are products with that keyword.
e.g.
- Mobile phones
- Batteries for phones
- Case for phones
- etc.
What I did is search the products index for the keyword, take the results, pluck the category_id of each product, remove duplicates, and do an /_mget on the categories index with the IDs I should display.
This seems inefficient, however, since the first search might return 10k results (if the query is too generic), which I then loop through to get each category_id.
Any ideas on how to make this more efficient?
Take a look at Elasticsearch aggregations: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
A good place to start would be the terms aggregation, which is a bucket aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
An example:
GET /_search
{
"query": {...},
"aggs" : {
"categories" : {
"terms" : { "field" : "category_name" }
}
}
}
The response should look something like this, with each field value and its document count placed into a bucket:
{
...
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets" : [
{
"key" : "Mobile phones",
"doc_count" : 6
},
{
"key" : "Batteries for phones",
"doc_count" : 3
},
{
"key" : "Cases for phones",
"doc_count" : 2
}
]
}
}
}
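To connect this to the search bar, the bucket keys can be read straight from the response instead of looping over product hits. Here is a minimal Python sketch, offered as an illustration rather than a finished implementation. The product field name ("name") and both helper functions are assumptions; category_name and the categories aggregation name come from the example above.

```python
def build_category_query(keyword, max_categories=20):
    """Search products by keyword and bucket the matches by category."""
    return {
        "size": 0,  # no product hits needed, only the category buckets
        "query": {"match": {"name": keyword}},  # "name" is an assumed field
        "aggs": {
            "categories": {
                "terms": {"field": "category_name", "size": max_categories}
            }
        },
    }


def categories_from_response(response):
    """Return (category, matching product count) pairs in bucket order."""
    buckets = response["aggregations"]["categories"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```

Setting size to 0 keeps the 10k product hits out of the response, so only the distinct categories travel over the wire.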
Is there an Elasticsearch plugin out there that would let me classify the documents that I put into an index?
The best solution for me would be a classification of the most frequent terms (or concepts), displayed as a sort of tag cloud that the user can navigate.
Is there a way to achieve this? Any suggestions?
Thanks
The basic idea is to use a terms aggregation, which will yield one bucket per term.
POST /_search
{
"aggs" : {
"genres" : {
"terms" : { "field" : "genre" }
}
}
}
The response you'll get will be ordered by decreasing document count per term:
{
...
"aggregations" : {
"genres" : {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets" : [
{
"key" : "jazz",
"doc_count" : 10
},
{
"key" : "rock",
"doc_count" : 5
},
{
"key" : "electronic",
"doc_count" : 2
}
]
}
}
}
If you're using Kibana, you can directly create a tag cloud visualization based on those terms.
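Without Kibana, the buckets can be turned into tag cloud sizes on the client. The following is a rough Python sketch under stated assumptions: the tag_weights helper and the 10 to 40 font size range are hypothetical, and linear scaling is just one possible choice.

```python
def tag_weights(buckets, min_size=10.0, max_size=40.0):
    """Scale each bucket's doc_count linearly into a font size."""
    if not buckets:
        return {}
    counts = [b["doc_count"] for b in buckets]
    lo, hi = min(counts), max(counts)
    span = hi - lo
    weights = {}
    for b in buckets:
        # when all counts are equal, give every tag the largest size
        ratio = (b["doc_count"] - lo) / span if span else 1.0
        weights[b["key"]] = min_size + ratio * (max_size - min_size)
    return weights
```

With the buckets from the response above, the most frequent genre gets the largest size and the least frequent gets the smallest.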
I am new to Elasticsearch. I am trying to get the total word frequency count across a set of documents, but I cannot figure out how to do it. I know aggregations can give me a document count, and with a term vector I can find the frequency of a term in a single document, but what about the total frequency of terms across a set of documents?
Term vector for a single document:
GET /test/product/3/_termvector
Aggregated document count:
GET /test/product/_search?pretty=true
{
"size" : 0,
"query" : {
"match_all" : {}
},
"aggs" : {
"phrases" : {
"terms" : {
"field" : "title",
"size" : 10000
}
}
}
}
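The terms aggregation above counts documents containing each term, not occurrences. One possible workaround, sketched here in Python under stated assumptions, is to fetch the term vectors of the documents of interest and sum term_freq on the client. The total_term_frequencies helper is hypothetical, and it assumes each response has the usual term vectors shape (term_vectors, then the field, then terms). This is only practical for a modest number of documents.

```python
from collections import Counter


def total_term_frequencies(term_vector_responses, field="title"):
    """Sum term_freq for every term across several term vector responses."""
    totals = Counter()
    for resp in term_vector_responses:
        # documents without vectors for this field simply contribute nothing
        terms = resp.get("term_vectors", {}).get(field, {}).get("terms", {})
        for term, stats in terms.items():
            totals[term] += stats["term_freq"]
    return totals
```

Calling totals.most_common(10) on the result then gives the ten most frequent words across the set.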
In Elasticsearch I have an index containing documents with a timestamp and the number of observed requests to a webservice.
I would like to perform an aggregation to get, for each day, the hour in which the maximum number of requests was observed (the peak hour).
I managed to get the result with the following request:
{
"aggregations" : {
"week_summary" : {
"filter" : {"range": {"#timestamp": {"gte": "2015-01-20||-7d","lte": "2015-01-20"}}},
"aggregations" : {
"oneday_interval" : {
"date_histogram" : {"field" : "#timestamp", "interval" : "1d","order" : { "_key" : "desc" }},
"aggregations" : {
"peak_hour_histogram" : {
"date_histogram" : {"field" : "#timestamp", "interval" : "1h","order" : { "peak_request_count.value" : "desc" }},
"aggregations" : {
"peak_request_count" : {
"sum" : { "field" : "request_count"}
}
}
}
}
}
}
}
},
"size" : 0
}
This works in a sense: because a date histogram can be sorted on a sub-aggregation value, the first item in the peak_hour_histogram buckets array is indeed the peak hour.
Nevertheless, I don't need the other bucket items (i.e. the other 23 hours of the day), and I'd like to receive only the first item. I tried top_hits without any success.
Do you know a way to perform this filtering?
NB: In the real use case my aggregation returns about 3MB of data, so filtering out all those useless values becomes important.
Thanks for your answers.
I think the feature tracked in this issue should answer your requirement: https://github.com/elasticsearch/elasticsearch/issues/6704. It started from this one: https://github.com/elasticsearch/elasticsearch/issues/7103
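Until something like that is available server-side, the peak hour can be picked out on the client. The sketch below is in Python and comes with a clear caveat: it does not shrink the response, it only keeps the first hourly bucket per day. The peak_hour_per_day helper is hypothetical, and it assumes the response mirrors the aggregation names in the request above, with buckets that carry key_as_string.

```python
def peak_hour_per_day(response):
    """Map each day to (peak hour, request count), using the first hourly bucket."""
    days = response["aggregations"]["week_summary"]["oneday_interval"]["buckets"]
    peaks = {}
    for day in days:
        hours = day["peak_hour_histogram"]["buckets"]
        if not hours:
            continue  # no data for that day
        top = hours[0]  # buckets are already sorted by peak_request_count desc
        peaks[day["key_as_string"]] = (
            top["key_as_string"],
            top["peak_request_count"]["value"],
        )
    return peaks
```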
I'm trying to make a search that both limits and offsets (the from keyword in Elasticsearch) the facet result set, something like:
'{
"query" : {
"nested" : {
"_scope" : "my_scope",
"path" : "related_award_vendors",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : {
"text" : {"related_award_vendors.title" : "inc"}
}
}
}
}
},
"facets" : {
"facet1" : {
"terms_stats" : {
"key_field" : "related_award_vendors.django_id",
"value_field" : "related_award_vendors.award_amount",
"order":"term",
"size": 5,
"from":2
},
"scope" : "my_scope" }
}
}'
In the above, it returns IDs 1,2,3,4,5, and if I remove "from" it still returns 1,2,3,4,5 in the result set.
The "size" is working correctly. In this case, it's returning five items in the result set.
My understanding is that Solr can do this. Can this be done in Elasticsearch?
The terms stats facet doesn't support the from parameter. The only way to achieve what you want is to set size to size + offset and ignore the first offset entries on the client side. In your example, that means requesting 7 entries and ignoring the first 2.
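The client-side part can be sketched in a few lines of Python. Both helper names are made up for illustration, and the entries are assumed to be the list of facet entries already pulled out of the response.

```python
def facet_request_size(offset, size):
    """Size to send in the facet request so the wanted page is included."""
    return offset + size


def page_facet_entries(entries, offset, size):
    """Drop the first `offset` entries and keep at most `size` of the rest."""
    return entries[offset:offset + size]
```

For the example in the question, facet_request_size(2, 5) gives 7, and page_facet_entries then skips the first 2 of those 7 entries.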