How to mention from and size for the first level of elastic search aggregation in nested aggregation? - elasticsearch

I have written a query to get the buckets based on id and then sort it. This works fine. But how to make it return buckets from position 100 till 200 for aggregation_by_id rule?
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"aggregation_by_id": {
"terms": {
"field": "id.keyword"
"size" : 200
},
"aggs": {
"sort_timestamp": {
"top_hits": {
"sort": [{
"timestamp": {
"order": "desc",
"unmapped_type": "long"
}
}],
"size": 1
}
}
}
}
}
}

Related

How to define percentage of result items with specific field in Elasticsearch query?

I have a search query that returns all items matching users that have type manager or lead.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{
"terms": {
"type": ["manager", "lead"]
}
}
]
}
}
}
Is there a way to define what percentage of the results should be of type "manager"?
In other words, I want the results to have 80% of users with type manager and 20% with type lead.
I want to make a suggestion to use bucket_path aggregation. As I know this aggregation needs to be run in sub-aggs of a histogram aggregation. As you have such field in your mapping so I think this query should work for you:
{
"size": 0,
"aggs": {
"NAME": {
"date_histogram": {
"field": "my_datetime",
"interval": "month"
},
"aggs": {
"role_type": {
"terms": {
"field": "type",
"size": 10
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"role_1_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_1 / (params.role_1+params.role_2)*100"
}
},
"role_2_ratio": {
"bucket_script": {
"buckets_path": {
"role_1": "role_type['manager']>count",
"role_2": "role_type['lead']>count"
},
"script": "params.role_2 / (params.role_1+params.role_2)*100"
}
}
}
}
}
}
Please let me know if it didn't work well for you.

How to diversify the result of top-hits aggregation?

Let's start with a concrete example. I have a document with these fields:
{
"template": {
"mappings": {
"template": {
"properties": {
"tid": {
"type": "long"
},
"folder_id": {
"type": "long"
},
"status": {
"type": "integer"
},
"major_num": {
"type": "integer"
}
}
}
}
}
}
I want to aggregate the query result by field folder_id, and for each group divided by folder_id, retrieve the top-N documents' _source detail. So i write query DSL like:
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"top_hit":{
"top_hits": {
"size": 5,
"_source": ["major_num"]
}
}
}
}
}
}
However, now comes a requirement that the top hits documents for each folder_id must be diversified on the field major_num. For each folder_id, the top hits documents retrieve by the sub top_hits aggregation under the terms aggregation, must be unique on field major_num, and for each major_num value, return at most 1 document in the sub top hits aggregation result.
top_hits aggregation cannot accept sub-aggregations, so how should i solve the question?
Why not simply adding another terms aggregation on the major_num field ?
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"majornum": {
"terms": {
"field": "major_num",
"size": 10
},
"aggs": {
"top_hit": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}

Paging the top_hits aggregation in ElasticSearch

Right now I'm doing a top_hits aggregation in Elastic Search that groups my data by a field, sorts the groups by a date, and chooses the top 1.
I need to somehow page this aggregation results in a way that I can pass through the pageSize and the pageNumber, but I don't know how.
In addition to this, I also need the total results of this aggregation so we can show it in a table in our web interface.
The aggregation looks like this:
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"terms": {
"field": "artifactId.keyword"
},
"aggs": {
"top_artifacts_hits": {
"top_hits": {
"size": 1,
"sort": [{
"date": {
"order": "desc"
}
}]
}
}
}
}
}
}
If I understand what you want, you should be able to do pagination through a Composite Aggregation. You can still pass your size parameter in your pagination, but your from would be the key for the bucket.
POST my_index/_search
{
"size": 0,
"aggs": {
"top_artifacts": {
"composite": {
"sources": [
{
"artifact": {
"terms": {
"field": "artifactId.keyword"
}
}
}
]
,
"size": 1, // OPTIONAL SIZE (How many buckets)
"after": {
"artifact": "FOO_BAZ" // Buckets after this bucket key
}
},
"aggs": {
"hits": {
"top_hits": {
"size": 1,
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}

Compute the "fill rate" of a field in Elasticsearch

I would like to compute the ratio of fields that have a value in my index.
I managed to count how many documents miss the field:
GET profiles/_search
{
"aggs": {
"profiles_wo_country": {
"missing": {
"field": "country"
}
}
},
"size": 0
}
I also managed to count how many documents have the filed:
GET profiles/_search
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"exists": {
"field": "country"
}
}
}
},
"size": 0
}
Naturally I can also get the total number of documents in the index. How can I compute the ratio?
An easy way to get the numbers you need out of a query is using the following query
POST profiles/_search?filter_path=hits.total,aggregations.existing.doc_count
{
"size": 0,
"aggs": {
"existing": {
"filter": {
"exists": {
"field": "tag"
}
}
}
}
}
You'll get an response like this one:
{
"hits": {
"total": 37258601
},
"aggregations": {
"existing": {
"doc_count": 9287160
}
}
}
And then in your client code, you can simply do
fill_rate = (aggregations.existing.doc_count / hits.total) * 100
And you're good to go.

how to bucket empty and non empty fields in nested aggregation in elasticsearch?

I have the following set of nested subaggregations in elasticsearch (field2 is a subaggregation of field1 and field3 is a subaggregation of field2).
It turns out however that the terms aggregation for field3 will not bucket documents that dont have field3.
My understanding is that I have to use a Missing subaggregation query to bucket those in addition to the term query for field3.
But I am not sure how can I add it to the query below to bucket both.
{
"size": 0,
"aggregations": {
"f1": {
"terms": {
"field": "field1",
"size": 0,
"order": {
"_count": "asc"
},
"include": [
"123"
]
},
"aggregations": {
"field2": {
"terms": {
"field": "f2",
"size": 0,
"order": {
"_count": "asc"
},
"include": [
"tr"
]
},
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
}
}
In version 2.1.2 and later, you can use the missing parameter of the terms aggregation, which allows you to specify a default value for documents that are missing that field. (FYI, the missing parameter was available starting 2.0, but there was a bug which prevented it from working on sub-aggregations, which is how you would use it here.)
...
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0,
"missing": "n/a" <----- provide a default here
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
}
}
However, if you are working with a pre-2.x ES cluster, you can use the missing aggregation at the same depth as your field3 aggregation to bucket the documents that are missing "f3" like this:
...
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
},
"missing_field3": {
"missing" : {
"field": "f3"
},
"aggregations": {
"aggTopMissingHit": {
"top_hits": {
"size": 1
}
}
}
}
}

Resources