ELASTICSEARCH - Total doc_count aggregations - elasticsearch

I am looking for a way to sum up the total of an aggregation that I have defined in the query.
For example:
{
"name" : false,
"surname" : false
},
{
"name" : false,
"surname" : false
}
Query:
GET index/_search?size=0
{"query": {
"bool": {
"must": [
{"term": {"name": false}},
{"term": {"surname": false}}
]
}
},
"aggs": {
"name": {
"terms": {
"field": "name"
}
},
"surname": {
"terms": {
"field": "surname"
}
}
}
}
The query returns the value for each field "name" and "surname" with value "false".
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 2 <---------
}
]
},
"surname" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 2 <---------
}
]
}
}
}
Is it possible to return the total sum of doc_count, so that in this situation it would be "doc_count" : 2 + "doc_count" : 2 == 4?
I've been trying to do it with script but since they are boolean values it doesn't work.

The functionality that most closely resembles the solution I am looking for is sum_bucket.
GET index/_search?filter_path=aggregations
{
"aggs": {
"surname_field": {
"terms": {
"field": "surname",
"size": 1
}
},
"sum": {
"sum_bucket" : {
"buckets_path":"surname_field>_count"
}
}
}
}
For this specific case where it is a simple JSON, the result of the query is the same as the hits.total.value (number of documents) with filtering to boolean field surname:false or name:false.
But for situations with Json with more fields we can specify the number of times we have a result in our database.
With this result I wanted to find the total number of hits and not the number of documents in the result.

Related

Elasticsearch How to count total docs by date

As my theme, I wanna count docs the day and before by date, it's sample to understand that the chart.
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.23","_type":"logs"}}
{"uid":"2","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-23T12:00:00Z"}
{"index":{"_index":"login-2015.12.24","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-24T12:00:00Z"}
{"index":{"_index":"login-2015.12.25","_type":"logs"}}
{"uid":"1","register_time":"2015-12-23T12:00:00Z","login_time":"2015-12-25T12:00:00Z"}
As you see, index login-2015.12.23 has two docs, index login-2015.12.24 has one doc, index login-2015.12.23 has one doc.
And now I wanna get the result
{
"hits" : {
"total" : 6282,
"max_score" : 1.0,
"hits" : []
},
"aggregations" : {
"group_by_date" : {
"buckets" : [
{
"key_as_string" : "2015-12-23T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 2,
},
{
"key_as_string" : "2015-12-24T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 3,
},
{
"key_as_string" : "2015-12-25T12:00:00Z",
"key" : 1662163200000,
"doc_count" : 4,
}
]
}
If I count the date 2015-12-24T12:00:00Z and it means I must count day 2015-12-23T12:00:00Z and 2015-12-24T12:00:00Z at the same time.
In my project I have many indices like that, and I searching many ways to make this goal come true but not, this is my demo:
{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"intersect": {
"scripted_metric": {
"init_script": "state.inner=[]",
"map_script": "state.inner.add(params.param1 == 3 ? params.param2 * params.param1 : params.param1 * params.param2)",
"combine_script": "return state.inner",
"reduce_script": "return states",
"params": {
"param1": 3,
"param2": 5
}
}
}
}
}
}
}
I wanna group by date, and use scripted_metric to iter the date list, not the second iteration just can in its bucket and not for all the document, so do anyone has better idea to solve this problem?
You can simply use the cumulative sum pipeline aggregation
{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "login_time",
"interval": "day"
},
"aggs": {
"cumulative_docs": {
"cumulative_sum": {
"buckets_path": "_count"
}
}
}
}
}
}
And the results will look like this:
"aggregations" : {
"group_by_date" : {
"buckets" : [
{
"key_as_string" : "2015-12-23T00:00:00.000Z",
"key" : 1450828800000,
"doc_count" : 2,
"cumulative_docs" : {
"value" : 2.0
}
},
{
"key_as_string" : "2015-12-24T00:00:00.000Z",
"key" : 1450915200000,
"doc_count" : 1,
"cumulative_docs" : {
"value" : 3.0
}
},
{
"key_as_string" : "2015-12-25T00:00:00.000Z",
"key" : 1451001600000,
"doc_count" : 1,
"cumulative_docs" : {
"value" : 4.0
}
}
]
}
}

Elasticsearch Query with subquery

I'm relatively new to elasticsearch. I can able to make simple query in dev tools. I need a help on converting the following sql into es query
select c.conversationid from conversations c
where c.conversationid not in
(select s.conversationid from conversations s
where s.type='end' and s.conversationid=c.conversationid)
Index looks like below.
conversationid
type
1
start
2
start
1
end
3
start
If I execute above query I will get the following results.
conversationid
2
3
I have used following
Terms aggregation
Bucket Selector
Query
{
"aggs": {
"conversations": {
"terms": {
"field": "conversationid",
"size": 10
},
"aggs": { --> subaggregation where type == end
"types": {
"terms": {
"field": "type.keyword",
"include": [
"end"
],
"size": 10
}
},
"select": { --> select those terms where there is no bucket for "end"
"bucket_selector": {
"buckets_path": {
"path": "types._bucket_count"
},
"script": "params.path==0"
}
}
}
}
}
}
Result
"aggregations" : {
"conversations" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2,
"doc_count" : 1,
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key" : 3,
"doc_count" : 1,
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}

Nested Aggregation for AND Query Not Working

Please can someone help with the below Question.
https://discuss.elastic.co/t/nested-aggregation-with-and-always-return-0-match/315722?u=chattes
I have used following aggregations
1. Terms aggregation
2. Bucket selector
3. Nested aggregation
First I have grouped by user id using terms aggregation. Then further grouped by skill Id. Using bucket selector I have filtered users which have documents under two skills.
Query
GET index5/_search
{
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"skills": {
"nested": {
"path": "skills"
},
"aggs": {
"filter_skill": {
"terms": {
"field": "skills.id",
"size": 10,
"include": [
553,
426
]
}
}
}
},
"bucket_count": {
"bucket_selector": {
"buckets_path": {
"skill_count": "skills>filter_skill._bucket_count"
},
"script": "params.skill_count ==2"
}
}
}
}
}
}
Results
"aggregations" : {
"users" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"skills" : {
"doc_count" : 3,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
},
{
"key" : 2,
"doc_count" : 1,
"skills" : {
"doc_count" : 2,
"filter_skill" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "426",
"doc_count" : 1
},
{
"key" : "553",
"doc_count" : 1
}
]
}
}
}
]
}

Counting comma saperated elements as value in elasticsearch

Is it possible to count the values ​​separated by commas from the same field with aggregations or any other way?
i.e
JSON
{
"ID":"example.1",
"Sports":{
"1":
{,
"Football Teams":" Real Madrid, Manchester United, Juventus"
"Basket Teams":"Chicago Bulls"
},
"2":
{
"Football Teams":"F.C Barcelona, Milan"
"Basket Teams":"Lakers"
}
}
}
query
GET xxx/_search
{
"query": {
"match_all": {
}
},
"aggs": {
"NAME": {
"value_count": {
"field": "Sports.1."Football Teams.keyword"
}}
}
}
Desired Output
"aggregations" : {
"Count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Real Madrid, Manchester United, Juventus",
"doc_count" : 3
}
The objective of the query I am looking for is to be able to determine how many values ​​separated by commas a field has.
You can do it with a value script in a terms sub-aggregation:
GET xxx/_search
{
"aggs": {
"count": {
"terms": {
"field": "team.keyword"
},
"aggs": {
"count": {
"terms": {
"field": "team.keyword",
"script": {
"source": "/,/.split(_value).length",
"lang": "painless"
}
}
}
}
}
}
}
The top-level buckets will be the values of the football team and each top-level bucket will have a sub-bucket with the number of tokens, like this:
{
"key" : "Real Madrid, Manchester United, Juventus", <-- team name
"doc_count" : 1,
"count" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "3", <-- number of teams
"doc_count" : 1
}
]
}
}
In order for this to work, you need to add the following line in elasticsearch.yml and restart your server:
script.painless.regex.enabled: true

Elasticsearch aggregations: how to get bucket with 'other' results of terms aggregation?

I use aggregation to collect data from nested field and stuck a little
Example of document:
{
...
rectangle: {
attributes: [
{_id: 'some_id', ...}
]
}
ES allows group data by rectangle.attributes._id, but is there any way to get some 'other' bucket to put there documents that were not added to any of groups? Or maybe there is a way to create query to create bucket for documents by {"rectangle.attributes._id": {$ne: "{currentDoc}.rectangle.attributes._id"}}
I think bucket would be perfect because i need to do further aggregations with 'other' docs.
Or maybe there's some cool workaround
I use query like this for aggregation
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword"
}
}
}
}
}
And get this result
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 27616,
"attributes" : {
"doc_count" : 45,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45,
"attributeOptionsCount" : {
"value" : 2
}
}
]
}
}
}
]
result like this would be perfect:
"buckets" : [
{
"key" : "some_parent_id",
"doc_count" : 1000,
"attributes" : {
"doc_count" : 145,
"entries" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "some_id",
"doc_count" : 45
},
{
"key" : "other",
"doc_count" : 100
}
]
}
}
}
]
You can make use of missing value parameter. Update aggregation as below:
"aggs": {
"attributes": {
"nested": {
"path": "rectangle.attributes"
},
"aggs": {
"attributesCount": {
"cardinality": {
"field": "rectangle.attributes._id.keyword"
}
},
"entries": {
"terms": {
"field": "rectangle.attributes._id.keyword",
"missing": "other"
}
}
}
}
}

Resources