How do I use doc_count in an aggregations range query in ElasticSearch 1.0 - syntax

I have a bunch of user generated events in my ES cluster. Each event contains the user's UUID.
I'm trying to write a query that buckets users into low, medium and high activity based on the number of events each user generates.
I'm using this query to get the number of events generated by each user:
{
  "aggs" : {
    "users" : {
      "terms" : { "field" : "user_id.raw" }
    }
  }
}
This works fine, but I need to further bucket the results with a range query over each bucket's "doc_count" from the previous result, so that I can sort each user into a low, medium or high activity bucket.
I tried a bunch of ways to access the doc_count field from a sub-aggregation but never managed to get it to work. I figured this would be a fairly common use case, but I can't seem to crack it, so any help would be much appreciated.

I have also raised this issue at https://github.com/elasticsearch/elasticsearch/issues/4983?_pjax=%23js-repo-pjax-container.
It looks like a minor enhancement to the aggregation framework, but it would be really useful.

You can probably do something like:
{
  "aggs" : {
    "tally" : {
      "sum" : {
        "script" : "1"
      }
    },
    "aggs" : {
      // refer to tally here, as its value would be the same as doc_count
    }
  }
}
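If no sub-aggregation can reach doc_count on your version, one fallback is to do the low/medium/high split on the client from the terms aggregation response. Below is a minimal Python sketch of that idea. The thresholds (10 and 100) and the helper names are made up for illustration, and the sample response only imitates the "aggregations" > "users" > "buckets" shape that the query above returns.

```python
from collections import defaultdict

# Hypothetical thresholds: fewer than 10 events is "low", fewer than 100 is "medium".
LOW_MAX = 10
MEDIUM_MAX = 100


def activity_level(doc_count):
    """Map an event count to a (hypothetical) activity label."""
    if doc_count < LOW_MAX:
        return "low"
    if doc_count < MEDIUM_MAX:
        return "medium"
    return "high"


def bucket_users(response):
    """Group user ids by activity level from a terms aggregation response."""
    levels = defaultdict(list)
    for bucket in response["aggregations"]["users"]["buckets"]:
        levels[activity_level(bucket["doc_count"])].append(bucket["key"])
    return dict(levels)


# Made-up response in the shape returned for the "users" terms aggregation.
sample_response = {
    "aggregations": {
        "users": {
            "buckets": [
                {"key": "user-a", "doc_count": 250},
                {"key": "user-b", "doc_count": 42},
                {"key": "user-c", "doc_count": 3},
            ]
        }
    }
}

print(bucket_users(sample_response))
```

Keep in mind that a terms aggregation only returns the top buckets (10 by default), so the "size" setting would need to be raised for this to cover all users.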

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for this for days and am getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So given an example query like this:
POST /_search
{
  "query" : {
    "match" : {
      "message" : {
        "operator" : "or",
        "query" : "the quick brown"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match_phrase" : {
          "message" : {
            "query" : "the quick brown",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}
Say just for simplicity's sake that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final returned results set to only be the documents with a score above 0.90. In other words, take those newly-rescored docs, and hand them off to a filter where it drops all documents scored below 0.90.
I have tried many different approaches, but nothing works. post_filter is apparently meant to run after the main query but before the rescore, so that one doesn't work. min_score does not work with rescore at all; it only applies to the original ES scores from the main query. Aggs are the one piece of functionality that I can get to work after the rescore, but aggregating is not what I need to do here. At least it shows me that ES is able to keep operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

Creating histogram in Elasticsearch

I have an index with several documents. Each document has a field called "id", and there can be several documents for each id, just as a store can have many transactions for each customer. I want to know how many ids there are for each document count.
For instance, I want to get something like: "There are 5 ids with 1 document. There are 10 ids with 2 documents" and so on.
How can I write that aggregation in Elasticsearch?
I believe this would be a classic terms aggregation. Something along these lines should work for you:
GET /_search
{
  "aggs" : {
    "ids" : {
      "terms" : { "field" : "id" }
    }
  }
}
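The terms aggregation above returns one bucket per id with its doc_count, which is not yet the "N ids with M documents" view the question asks for. A small Python sketch of that last step, done on the client, could look like the following. The function name and the sample response are made up; only the "aggregations" > "ids" > "buckets" shape is taken from the query above.

```python
from collections import Counter


def ids_per_doc_count(response):
    """Count how many ids share each doc_count in a terms aggregation response."""
    buckets = response["aggregations"]["ids"]["buckets"]
    counts = Counter(bucket["doc_count"] for bucket in buckets)
    # Sort by document count so the result reads like a histogram.
    return dict(sorted(counts.items()))


# Made-up response: three ids with one document each, two ids with two documents each.
sample_response = {
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": "a", "doc_count": 2},
                {"key": "b", "doc_count": 2},
                {"key": "c", "doc_count": 1},
                {"key": "d", "doc_count": 1},
                {"key": "e", "doc_count": 1},
            ]
        }
    }
}

for doc_count, id_count in ids_per_doc_count(sample_response).items():
    print(f"There are {id_count} ids with {doc_count} document(s).")
```

This only sees the buckets that come back, so the "size" of the terms aggregation would have to be large enough to include every id.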

Delete by Query with Sort in Elasticsearch

I want to delete the most current item in my Elasticsearch index sorted by myDateField which is a date type. Is that possible? I want something like this query but this would delete all matching items even though I have the size at 1.
{
  "query" : {
    "match_all" : {
    }
  },
  "size" : "1",
  "sort" : [
    {
      "myDateField" : {
        "order" : "desc"
      }
    }
  ]
}
Delete by query is unlikely to support any sorting features.
If you try it with delete by query, you'll get the error: request does not support [sort]. However, I couldn't find any documentation saying that the "sort" parameter is not supported in delete by query.
I have one idea for doing it, though I don't know whether it's the best way:
Step 1: Run a normal search with your conditions and sorting, and collect the ids of the hits.
Step 2: Build a bulk request that deletes, by id, the documents retrieved in Step 1.
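To make Step 2 a bit more concrete, here is a rough Python sketch that turns the hits from Step 1 into a newline-delimited body for the _bulk endpoint. The function name, the "events" index and the sample hit are made up, and nothing is sent to a cluster. It also assumes each delete action only needs "_index" and "_id"; on versions that still use mapping types you would have to add "_type" as well.

```python
import json


def build_bulk_delete_body(search_response):
    """Build a newline-delimited bulk body that deletes every hit in the response."""
    lines = []
    for hit in search_response["hits"]["hits"]:
        # One "delete" action line per document; delete actions have no source line.
        action = {"delete": {"_index": hit["_index"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up result of the sorted search from Step 1 (size 1, newest document first).
sample_search_response = {
    "hits": {
        "hits": [
            {"_index": "events", "_id": "42", "_source": {"myDateField": "2016-01-01"}}
        ]
    }
}

print(build_bulk_delete_body(sample_search_response), end="")
```

The resulting string would be sent as the body of a POST to /_bulk. Note that the search and the delete are two separate requests, so a newer document could be indexed in between.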

How can I get options for filtering by a field directly from elasticsearch?

I want to populate a filtering field based on the data I have indexed inside Elasticsearch. How can I retrieve this data? For example, my documents inside index "test" and type "doc" could be
{"id":1, "tag":"foo", "name":"foothing"}
{"id":2, "tag":"bar", "name":"barthing"}
{"id":3, "tag":"foo", "name":"something"}
{"id":4, "tag":"quux", "name":"quuxthing"}
I'm looking for something like GET /test/doc/_magic?q=tag that would return [foo,bar,quux] from my data. I don't know what this is called or whether it is even possible. I don't want to load all index entries into memory and do this programmatically, because I have millions of documents in the index with around a hundred different tags.
Is this possible with ES?
Yes, that's possible, and it is called a terms aggregation.
You can do it like this:
GET /test/doc/_search
{
  "size": 0,
  "aggs" : {
    "tags" : {
      "terms" : {
        "field" : "tag.keyword",
        "size": 100
      }
    }
  }
}
Note that depending on the cardinality of your tag field, you can increase/decrease the size setting (10 by default).
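For completeness, pulling the filter options out of that response is a one-liner on the client. The Python sketch below assumes the usual "aggregations" > "tags" > "buckets" response shape; the function name and the sample response are invented for illustration.

```python
def tag_options(response):
    """Return the distinct tag values from the "tags" terms aggregation, in bucket order."""
    return [bucket["key"] for bucket in response["aggregations"]["tags"]["buckets"]]


# Made-up response matching the four example documents from the question.
sample_response = {
    "aggregations": {
        "tags": {
            "buckets": [
                {"key": "foo", "doc_count": 2},
                {"key": "bar", "doc_count": 1},
                {"key": "quux", "doc_count": 1},
            ]
        }
    }
}

print(tag_options(sample_response))
```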

Unique values - Terms aggregation or Wildcard query

What's the best way to get all the unique terms for a field?
Should I use a terms aggregation, or a wildcard query
(and then reduce the results to unique terms on the application side)?
{
  "query": {
    "wildcard" : { "text" : "**" }
  }
}
or
{
  "aggs" : {
    "genres" : {
      "terms" : { "field" : "text" }
    }
  }
}
Terms aggregation lets elasticsearch reduce the terms to unique values (in a distributed manner) and thereby reduce the response payload. But is it going to put too much load on elasticsearch?
I'm aware of the shard size aspect of the terms aggregation. Other than that, is one internally better optimized than the other? What's the execution plan for each?
"An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents"
What you're trying to achieve falls under analytical information. Since aggregations provide this out of the box, they are optimized for it.
Use "explain": true to get a description of the score calculation (this is not useful for aggregations, as they do not score documents).
See: Aggregations in ElasticSearch
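To show what the wildcard route leaves for the application to do, here is a rough Python sketch of the client-side reduction. The function name and the sample hits are made up. It also deduplicates whole stored field values from "_source" rather than indexed terms, which is an assumption about what "unique terms" means here.

```python
def unique_field_values(search_response, field):
    """Reduce search hits to the sorted set of distinct values of one _source field."""
    values = {
        hit["_source"][field]
        for hit in search_response["hits"]["hits"]
        if field in hit.get("_source", {})
    }
    return sorted(values)


# Made-up search response: every matching document comes back, duplicates included.
sample_search_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"text": "rock"}},
            {"_id": "2", "_source": {"text": "jazz"}},
            {"_id": "3", "_source": {"text": "rock"}},
        ]
    }
}

print(unique_field_values(sample_search_response, "text"))
```

With this approach every matching document has to be fetched (and paged through) before duplicates can be dropped, whereas the terms aggregation returns one bucket per distinct value.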
