Nested Aggregation in NEST Elasticsearch - elasticsearch

In my Elasticsearch documents I have CityId, RootId, RootName, and Price. I need to find the top 7 roots in a city, with the following conditions:
The name and id of the root that has the minimum price in a city.
Top 7 roots: the roots with the maximum number of entries in a city.
For example:
CityId  RootId  RootName  Price
11      1       ABC       90
11      1       ABC       100
11      2       DEF       80
11      2       DEF       90
11      2       DEF       60

Answer for CityId = 11:

RootId  RootName  Price
2       DEF       60
1       ABC       90

I am not familiar with NEST syntax, so I am adding a working example in JSON format.
Index Mapping:
{
  "mappings": {
    "properties": {
      "listItems": {
        "type": "nested"
      }
    }
  }
}
Index Data:
{
  "RootId": 2,
  "CityId": 11,
  "RootName": "DEF",
  "listItems": [
    { "Price": 60 },
    { "Price": 90 },
    { "Price": 80 }
  ]
}
{
  "RootId": 1,
  "CityId": 11,
  "RootName": "ABC",
  "listItems": [
    { "Price": 100 },
    { "Price": 90 }
  ]
}
Search Query:
{
  "size": 0,
  "aggs": {
    "id_terms": {
      "terms": {
        "field": "RootId"
      },
      "aggs": {
        "nested_entries": {
          "nested": {
            "path": "listItems"
          },
          "aggs": {
            "min_position": {
              "min": {
                "field": "listItems.Price"
              }
            }
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
  "id_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1,
        "doc_count": 1,
        "nested_entries": {
          "doc_count": 2,
          "min_position": {
            "value": 90.0
          }
        }
      },
      {
        "key": 2,
        "doc_count": 1,
        "nested_entries": {
          "doc_count": 3,
          "min_position": {
            "value": 60.0
          }
        }
      }
    ]
  }
}

.Query(query => query.Bool(bQuery => bQuery.Filter(
    fQuery => fQuery.Terms(ter => ter.Field(f => f.CityId).Terms(cityId))
)))
.Aggregations(agg => agg.Terms("group_by_rootId", st => st
    .Field(o => o.RootId)
    .Order(TermsOrder.CountDescending)
    .Aggregations(childAgg => childAgg
        .Min("min_price_in_group", m => m.Field(p => p.Price))
        .TopHits("stocks", t11 => t11
            .Source(sfd => sfd.Includes(fd => fd.Fields(Constants.IncludedFieldsFromElastic)))
            .Size(1)
        )
    )
))
.Size(_popularStocksCount)
.From(0)
.Take(0);
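Whichever client issues the query, the buckets still have to be turned into the expected per-city table (cheapest price per root, ordered by number of entries). A minimal post-processing sketch in Python, assuming the response has already been parsed into a dict shaped like the JSON search result above; the `RootName` field on each bucket is a hypothetical addition for illustration (in practice it would come from a `top_hits` sub-aggregation such as the "stocks" one in the NEST code):

```python
# Hypothetical parsed response, mirroring the "Search Result" JSON above,
# with RootName added to each bucket for illustration.
response = {
    "aggregations": {
        "id_terms": {
            "buckets": [
                {"key": 1, "RootName": "ABC",
                 "nested_entries": {"doc_count": 2,
                                    "min_position": {"value": 90.0}}},
                {"key": 2, "RootName": "DEF",
                 "nested_entries": {"doc_count": 3,
                                    "min_position": {"value": 60.0}}},
            ]
        }
    }
}

def top_roots(response, limit=7):
    """Return (RootId, RootName, min price) ordered by entry count, desc."""
    buckets = response["aggregations"]["id_terms"]["buckets"]
    # Sort by how many entries each root has in the city, descending.
    top = sorted(buckets,
                 key=lambda b: b["nested_entries"]["doc_count"],
                 reverse=True)[:limit]
    return [(b["key"],
             b["RootName"],
             b["nested_entries"]["min_position"]["value"])
            for b in top]

print(top_roots(response))  # → [(2, 'DEF', 60.0), (1, 'ABC', 90.0)]
```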

Related

Interval search for messages in Elasticsearch

I need to split the found messages into intervals. Can this be done with Elasticsearch?
For example, there are 10 messages and you need to divide them into 3 intervals. It should look like this:
[0,1,2,3,4,5,6,7,8,9] => {[0,1,2], [3,4,5,6], [7,8,9]}
I'm only interested in the beginning of each interval. For example: {[count: 3, min: 0], [count: 4, min: 3], [count: 3, min: 7]}
Example.
PUT /test_index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      }
    }
  }
}
POST /test_index/_doc/0
{ "id": 0 }
POST /test_index/_doc/1
{ "id": 1 }
POST /test_index/_doc/2
{ "id": 2 }
POST /test_index/_doc/3
{ "id": 3 }
POST /test_index/_doc/4
{ "id": 4 }
POST /test_index/_doc/5
{ "id": 5 }
POST /test_index/_doc/6
{ "id": 6 }
POST /test_index/_doc/7
{ "id": 7 }
POST /test_index/_doc/8
{ "id": 8 }
POST /test_index/_doc/9
{ "id": 9 }
It is necessary to divide the values into 3 intervals with the same number of elements in each interval:
{
  ...
  "aggregations": {
    "result": {
      "buckets": [
        { "min": 0.0, "doc_count": 3 },
        { "min": 3.0, "doc_count": 4 },
        { "min": 7.0, "doc_count": 3 }
      ]
    }
  }
}
There is a similar feature, the variable_width_histogram aggregation:
GET /test_index/_search?size=0
{
  "aggs": {
    "result": {
      "variable_width_histogram": {
        "field": "id",
        "buckets": 3
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
But variable_width_histogram separates documents by id value, not by the number of elements in each bucket.
Assuming your mapping is like:
{
  "some_numeric_field": { "type": "integer" }
}
Then you can build histograms out of it with fixed interval sizes:
POST /my_index/_search?size=0
{
  "aggs": {
    "some_numeric_field": {
      "histogram": {
        "field": "some_numeric_field",
        "interval": 7
      }
    }
  }
}
Results:
{
  ...
  "aggregations": {
    "some_numeric_field": {
      "buckets": [
        { "key": 0.0, "doc_count": 7 },
        { "key": 7.0, "doc_count": 7 },
        { "key": 14.0, "doc_count": 7 }
      ]
    }
  }
}
To get the individual values inside each bucket, just add a sub-aggregation, maybe top_hits or anything else like a terms aggregation.
Without knowing more about your data, I really cannot help further.
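Since neither histogram nor variable_width_histogram guarantees an equal number of elements per bucket, one option is to fetch the sorted values and split them client-side. A minimal sketch in Python, assuming the values fit in memory; how the remainder is distributed is a free choice (the question's example gives the extra element to the middle interval, while this sketch gives it to the first):

```python
def equal_count_intervals(values, n):
    """Split sorted values into n intervals with near-equal element counts,
    reporting each interval's minimum and size (the shape asked for)."""
    values = sorted(values)
    size, extra = divmod(len(values), n)
    out, start = [], 0
    for i in range(n):
        # The first `extra` intervals absorb the remainder.
        count = size + (1 if i < extra else 0)
        chunk = values[start:start + count]
        out.append({"min": float(chunk[0]), "doc_count": len(chunk)})
        start += count
    return out

print(equal_count_intervals(range(10), 3))
# → [{'min': 0.0, 'doc_count': 4}, {'min': 4.0, 'doc_count': 3},
#    {'min': 7.0, 'doc_count': 3}]
```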

Get group by and distinct count of values using other field in Elasticsearch

I have an index with documents structured as below:
{
  "key": ["10", "20"],
  "keywords": [
    {
      "case": 1,
      "word": "abc"
    },
    {
      "case": 2,
      "word": "def"
    },
    {
      "case": 1,
      "word": "abcd"
    }
  ]
}
I need to apply a filter on key = 10 and get the count of distinct words for each case across the documents. There are 20 distinct cases, so this query will return at most 20 buckets.
Filter condition:
key = 10
Expected result set:
[
  { "case": 1, "value": 2 },
  { "case": 2, "value": 1 }
]
The equivalent SQL query for this is:
select case, count(distinct words) as value
from <table> where key = 10 and case in (1, 2, 3, 4) group by case;
First map the nested structure as the nested datatype in the ES index; see the nested mapping reference in the Elasticsearch docs.
{
  "query": {
    "match": {
      "key": "10"
    }
  },
  "aggs": {
    "keywords": {
      "nested": {
        "path": "keywords"
      },
      "aggs": {
        "subjects": {
          "terms": {
            "field": "keywords.case.keyword"
          },
          "aggs": {
            "count": {
              "cardinality": {
                "field": "keywords.word.keyword",
                "precision_threshold": 4000
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
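The terms buckets from this query map directly onto the expected result set. A small client-side sketch in Python, using a hypothetical parsed response shaped like the aggregation above; note that cardinality is an approximate count above its precision_threshold:

```python
# Hypothetical parsed response mirroring the terms + cardinality query above.
response = {
    "aggregations": {
        "keywords": {
            "subjects": {
                "buckets": [
                    {"key": "1", "count": {"value": 2}},
                    {"key": "2", "count": {"value": 1}},
                ]
            }
        }
    }
}

# Reshape each bucket into the {'case': ..., 'value': ...} records expected.
result = [
    {"case": int(b["key"]), "value": b["count"]["value"]}
    for b in response["aggregations"]["keywords"]["subjects"]["buckets"]
]
print(result)  # → [{'case': 1, 'value': 2}, {'case': 2, 'value': 1}]
```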

Nested array of objects aggregation in Elasticsearch

Documents in Elasticsearch are indexed as follows:
Document 1
{
  "task_completed": 10,
  "tagged_object": [
    {
      "category": "cat",
      "count": 10
    },
    {
      "category": "cars",
      "count": 20
    }
  ]
}
Document 2
{
  "task_completed": 50,
  "tagged_object": [
    {
      "category": "cars",
      "count": 100
    },
    {
      "category": "dog",
      "count": 5
    }
  ]
}
As you can see, the value of the category key is dynamic. I want to perform an aggregation similar to a SQL GROUP BY on category and return the sum of the count for each category.
In the above example, the aggregation should return:
cat: 10
cars: 120
dog: 5
I would like to know how to write this aggregation query in Elasticsearch, if it is possible. Thanks in advance.
You can achieve your required result using nested, terms, and sum aggregations.
Adding a working example with index mapping, search query, and search result.
Index Mapping:
{
  "mappings": {
    "properties": {
      "tagged_object": {
        "type": "nested"
      }
    }
  }
}
Search Query:
{
  "size": 0,
  "aggs": {
    "resellers": {
      "nested": {
        "path": "tagged_object"
      },
      "aggs": {
        "books": {
          "terms": {
            "field": "tagged_object.category.keyword"
          },
          "aggs": {
            "sum_of_count": {
              "sum": {
                "field": "tagged_object.count"
              }
            }
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
  "resellers": {
    "doc_count": 4,
    "books": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "cars",
          "doc_count": 2,
          "sum_of_count": {
            "value": 120.0
          }
        },
        {
          "key": "cat",
          "doc_count": 1,
          "sum_of_count": {
            "value": 10.0
          }
        },
        {
          "key": "dog",
          "doc_count": 1,
          "sum_of_count": {
            "value": 5.0
          }
        }
      ]
    }
  }
}
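Flattening the terms buckets into the per-category totals the question asks for is a one-liner on the client. A sketch in Python, using the bucket values from the search result above:

```python
# Hypothetical parsed response, mirroring the "Search Result" above.
response = {
    "aggregations": {
        "resellers": {
            "books": {
                "buckets": [
                    {"key": "cars", "sum_of_count": {"value": 120.0}},
                    {"key": "cat", "sum_of_count": {"value": 10.0}},
                    {"key": "dog", "sum_of_count": {"value": 5.0}},
                ]
            }
        }
    }
}

# category -> summed count, the GROUP BY result the question describes
totals = {
    b["key"]: b["sum_of_count"]["value"]
    for b in response["aggregations"]["resellers"]["books"]["buckets"]
}
print(totals)  # → {'cars': 120.0, 'cat': 10.0, 'dog': 5.0}
```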

Elasticsearch count doc_count occurrences on aggs

I have an Elasticsearch aggregation query like this:
{
  "size": 0,
  "aggs": {
    "Domains": {
      "terms": {
        "field": "domains",
        "size": 0
      },
      "aggs": {
        "Identifier": {
          "terms": {
            "field": "alertIdentifier",
            "size": 0
          }
        }
      }
    }
  }
}
It results in a bucket aggregation like the following:
"aggregations": {
  "Domains": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "IT",
        "doc_count": 147,
        "Identifier": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "-2623493027134706869", "doc_count": 7 },
            { "key": "-6590617724257725266", "doc_count": 7 },
            { "key": "1106147277275983835", "doc_count": 4 },
            { "key": "-3070527890944301111", "doc_count": 4 },
            { "key": "-530975388352676402", "doc_count": 3 },
            { "key": "-6225620509938623294", "doc_count": 2 },
            { "key": "1652134630535374656", "doc_count": 1 },
            { "key": "4191687133126999365", "doc_count": 8 },
            { "key": "6882920925888555081", "doc_count": 2 }
          ]
        }
      }
    ]
  }
}
What I need is to count the occurrences of each doc_count value, like this:
1 times: 0
2 times: 2
3 times: 1
equal or more than 4 times: 5
Any idea how to build the ES query to count the occurrences of doc_count?
Thanks in advance.
Below is the ES query:
POST /xt-history*/_search
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "term": { "type": "10" } }
        ]
      }
    }
  },
  "size": 0,
  "aggs": {
    "repetitions": {
      "scripted_metric": {
        "init_script": "_agg['all'] = []; _agg['all2'] = [];",
        "map_script": "_agg['all'].add(_source['alert']['alertIdentifier'])",
        "combine_script": "for (alertId in _agg['all']) { _agg['all2'].add(alertId); }; return _agg['all2']",
        "reduce_script": "all3 = []; answer = {}; answer['one'] = []; answer['two'] = []; answer['three'] = []; answer['four'] = []; answer['five'] = []; answer['five_plus'] = []; for (alertIds in _aggs) { for (alertId1 in alertIds) { all3.add(alertId1); }; }; for (alertId in all3) { if (answer['five_plus'].contains(alertId)) { } else if(answer['five'].contains(alertId)) {answer['five'].remove(alertId); answer['five_plus'].add(alertId);} else if(answer['four'].contains(alertId)) {answer['four'].remove(alertId); answer['five'].add(alertId);} else if(answer['three'].contains(alertId)) {answer['three'].remove(alertId); answer['four'].add(alertId);} else if(answer['two'].contains(alertId)) {answer['two'].remove(alertId); answer['three'].add(alertId);} else if(answer['one'].contains(alertId)) {answer['one'].remove(alertId); answer['two'].add(alertId);} else {answer['one'].add(alertId);}; }; fans = []; fans.add(answer['one'].size()); fans.add(answer['two'].size()); fans.add(answer['three'].size()); fans.add(answer['four'].size()); fans.add(answer['five'].size()); fans.add(answer['five_plus'].size()); return fans"
      }
    }
  }
}
Query output:
{
  "took": 4770,
  "timed_out": false,
  "_shards": {
    "total": 190,
    "successful": 189,
    "failed": 0
  },
  "hits": {
    "total": 334,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "repetitions": {
      "value": [63, 39, 3, 10, 2, 13]
    }
  }
}
where the first value is the number of identifiers with doc_count = 1, the second value is the number with doc_count = 2, and so on; the next-to-last value covers doc_count = 5, and the last value is the number of identifiers with doc_count of 6 or more.
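If the identifier buckets fit in a single terms response, the same tally can also be computed client-side without a scripted_metric. A sketch in Python using the doc_count values from the "IT" bucket shown earlier (note that, unlike the question's expected "1 times: 0", those sample buckets actually contain one identifier with doc_count 1):

```python
from collections import Counter

# doc_count values taken from the "Identifier" sub-buckets shown earlier
doc_counts = [7, 7, 4, 4, 3, 2, 1, 8, 2]

# Count how many identifiers occurred exactly 1, 2, or 3 times, and 4+ times;
# clamping every value >= 4 folds them into a single bin.
tally = Counter(min(c, 4) for c in doc_counts)
result = {
    "1 times": tally[1],
    "2 times": tally[2],
    "3 times": tally[3],
    "4+ times": tally[4],
}
print(result)  # → {'1 times': 1, '2 times': 2, '3 times': 1, '4+ times': 5}
```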

Elasticsearch analytics percent

I am using Elasticsearch 1.7.3 to accumulate data for analytics reports.
I have an index that holds documents where each document has a numeric field called 'duration' (how many milliseconds the request took), and a string field called 'component'. There can be many documents with the same component name.
E.g.:
{"component": "A", "duration": 10}
{"component": "B", "duration": 27}
{"component": "A", "duration": 5}
{"component": "C", "duration": 2}
I would like to produce a report that states for each component:
1. The sum of all 'duration' fields for this component:
A: 15
B: 27
C: 2
2. The percentage of this sum out of the total duration of all documents. In my example:
A: (10+5) / (10+27+5+2) * 100
B: 27 / (10+27+5+2) * 100
C: 2 / (10+27+5+2) * 100
3. The percentage of the documents for each component, out of the total number of documents:
A: 2 / 4 * 100
B: 1 / 4 * 100
C: 1 / 4 * 100
How do I do that with Elasticsearch 1.7.3?
With ES 1.7.3, there is no way to compute data based on the results of two different aggregations; that is something that can be done in ES 2.0 with pipeline aggregations, though.
However, what you're asking is not too complicated to do on the client side with 1.7.3. If you use the query below, you'll get everything you need to compute the figures you expect:
POST components/_search
{
  "size": 0,
  "aggs": {
    "total_duration": {
      "sum": {
        "field": "duration"
      }
    },
    "components": {
      "terms": {
        "field": "component"
      },
      "aggs": {
        "duration_sum": {
          "sum": {
            "field": "duration"
          }
        }
      }
    }
  }
}
The results would look like this:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "total_duration": {
      "value": 44
    },
    "components": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "a", "doc_count": 2, "duration_sum": { "value": 15 } },
        { "key": "b", "doc_count": 1, "duration_sum": { "value": 27 } },
        { "key": "c", "doc_count": 1, "duration_sum": { "value": 2 } }
      ]
    }
  }
}
Now all you need to do is the following. I'm using JavaScript, but you can do it in any other language that can read JSON.
var response = ...the JSON response above...
var total_duration = response.aggregations.total_duration.value;
var total_docs = response.hits.total;
response.aggregations.components.buckets.forEach(function(comp_stats) {
    // total duration for the component
    var total_duration_comp = comp_stats.duration_sum.value;
    // percentage of the total duration for this component
    var perc_duration_comp = total_duration_comp / total_duration * 100;
    // percentage of the documents for this component
    var perc_doc_comp = comp_stats.doc_count / total_docs * 100;
});
In Elasticsearch 2.x, you can use the bucket_script aggregation, which perfectly meets your needs!
E.g.:
{
  "bucket_script": {
    "buckets_path": {
      "my_var1": "the_sum",
      "my_var2": "the_value_count"
    },
    "script": "my_var1 / my_var2"
  }
}
In detail:
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "t-shirts": {
          "filter": {
            "term": {
              "type": "t-shirt"
            }
          },
          "aggs": {
            "sales": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "t-shirt-percentage": {
          "bucket_script": {
            "buckets_path": {
              "tShirtSales": "t-shirts>sales",
              "totalSales": "total_sales"
            },
            "script": "params.tShirtSales / params.totalSales * 100"
          }
        }
      }
    }
  }
}
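The bucket_script above simply divides two sibling aggregation values inside each date_histogram bucket. The same per-bucket arithmetic, sketched in Python with hypothetical monthly values for illustration:

```python
# Hypothetical per-month bucket values; in the real query, total_sales and
# t_shirt_sales would come from the sibling sum aggregations per bucket.
buckets = [
    {"key": "2021-01", "total_sales": 550.0, "t_shirt_sales": 200.0},
    {"key": "2021-02", "total_sales": 60.0, "t_shirt_sales": 10.0},
]

for b in buckets:
    # Equivalent of "params.tShirtSales / params.totalSales * 100"
    b["t_shirt_percentage"] = b["t_shirt_sales"] / b["total_sales"] * 100

print([round(b["t_shirt_percentage"], 2) for b in buckets])  # → [36.36, 16.67]
```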
