I need to build a heatmap from data I have in Elasticsearch. Each cell of the heatmap is the count of documents where two specific fields have a given pair of values. For the data
{'name': 'john', 'age': '10', 'car': 'peugeot'}
{'name': 'john', 'age': '10', 'car': 'audi'}
{'name': 'john', 'age': '12', 'car': 'fiat'}
{'name': 'mary', 'age': '3', 'car': 'mercedes'}
I would like to get, for each unique pair of name and age values, the number of documents with that pair. That would be
john, 10, 2
john, 12, 1
mary, 3, 1
I could fetch all the events and do the counting myself, but I was hoping there would be some magical aggregation that could provide this.
It would not be a problem to have it in a nested form, such as
{
'john':
{
'10': 2,
'12': 1
},
'mary':
{
'3': 1
},
}
or whatever is practical.
You can use a nested (inner) terms aggregation. Use a query like this:
POST count-test/_search
{
"size": 0,
"aggs": {
"group By Name": {
"terms": {
"field": "name"
},
"aggs": {
"group By age": {
"terms": {
"field": "age"
}
}
}
}
}
}
The output won't be exactly in the form you mentioned, but rather like this:
"aggregations": {
"group By Name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "john",
"doc_count": 3,
"group By age": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "10",
"doc_count": 2
},
{
"key": "12",
"doc_count": 1
}
]
}
},
{
"key": "mary",
"doc_count": 1,
"group By age": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "3",
"doc_count": 1
}
]
}
}
]
}
}
Hope this helps!!
You can use a terms aggregation with a script:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_multi_field_terms_aggregation
This way you can "concat" whatever fields you want, for example:
{
"aggs" : {
"data" : {
"terms" : {
"script" : {
"source": "doc['name'].value + doc['name'].age",
"lang": "painless"
}
}
}
}
}
(Not sure about the string concat syntax).
I have an index where a field (category) is a list field. I want to fetch all the distinct category lists within the index.
The following is an example.
Doc1 -
{
"category": [1,2,3,4]
}
Doc2 -
{
"category": [5,6]
}
Doc3 -
{
"category": [1,2,3,4]
}
Doc4 -
{
"category": [1,2,7]
}
My output should be
[1,2,3,4]
[5,6]
[1,2,7]
I am using the below query:
GET /products/_search
{
"size": 0,
"aggs" : {
"category" : {
"terms" : { "field" : "category", "size" : 1500 }
}
}
}
This returns [1], [2], [3], [4], [5], [6], [7]. I don't want the individual unique items of the list field; I'm looking for the complete unique lists.
What am I missing in the above query? I'm using Elasticsearch v7.10.
You can use a terms aggregation with a script:
{
"size": 0,
"aggs": {
"category":{
"terms": {
"script": {
"source": """
def cat="";
for(int i=0;i<doc['category'].length;i++){
cat+=doc['category'][i];}
return cat;
"""
}
}
}
}
}
The above query will return a result like the one below:
"aggregations": {
"category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1234",
"doc_count": 2
},
{
"key": "127",
"doc_count": 1
},
{
"key": "56",
"doc_count": 1
}
]
}
}
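One thing to watch out for: concatenating the values with no separator can make different lists collide, e.g. [1, 23] and [12, 3] would both produce the key "123". A small variation of the same script appends a separator between items (untested sketch, assuming category holds numeric values as in your example):
{
  "size": 0,
  "aggs": {
    "category": {
      "terms": {
        "script": {
          "source": """
            // build a key like "1_2_3_4_" so list boundaries stay unambiguous
            def cat = "";
            for (int i = 0; i < doc['category'].length; i++) {
              cat += doc['category'][i] + "_";
            }
            return cat;
          """
        }
      }
    }
  }
}
The bucket keys then change accordingly (e.g. "1_2_3_4_" instead of "1234"), but the doc_count values stay the same.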
Query
GET /_search
{
"size" : 0,
"query" : {
"ids" : {
"types" : [ ],
"values" : [ "someId1", "someId2", "someId3" ... ]
}
},
"aggregations" : {
"how_to_merge" : {
"terms" : {
"field" : "country",
"size" : 50
}
}
}
}
Result
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "KR",
"doc_count": 90
},
{
"key": "JP",
"doc_count": 83
},
{
"key": "US",
"doc_count": 50
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
I want to merge "KR", "JP", and "US" and change the key name to "NEW_RESULT".
So the result must look like this:
{
...
"aggregations": {
"how_to_merge": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "NEW_RESULT",
"doc_count": 223
},
{
"key": "BE",
"doc_count": 9
}
]
}
}
}
Is this possible in an Elasticsearch query?
I cannot use a client-side solution since there are too many entities, and retrieving all of them and merging would probably be too slow for my application.
Thanks for your help and comments!
You can try writing a script for that, though I would recommend benchmarking this approach against client-side processing since it might be quite slow.
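For example, something along these lines might work (untested sketch; it assumes country is a keyword field and keeps the names from your example, so adjust to your mapping). The terms aggregation is keyed by a Painless script that maps the countries you want to merge onto a single label:
GET /_search
{
  "size": 0,
  "aggs": {
    "how_to_merge": {
      "terms": {
        "size": 50,
        "script": {
          "lang": "painless",
          "source": "def c = doc['country'].value; if (c == 'KR' || c == 'JP' || c == 'US') { return 'NEW_RESULT'; } return c;"
        }
      }
    }
  }
}
Combine it with your ids query as before. The script runs for every document in scope, which is exactly why benchmarking it against the client-side merge is worthwhile.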
My query is a nested aggregation
aggs: {
src: {
terms: {
field: "dst_ip",
size: 1000,
},
aggs: {
dst: {
terms: {
field: "a_field_which_changes",
size: 2000,
},
},
},
},
},
A typical doc the query is run against is below (the mappings are all of type keyword):
{
"_index": "honey",
"_type": "event",
"_id": "AWHzRjHrjNgIX_EoDcfV",
"_score": 1,
"_source": {
"dst_ip": "10.101.146.166",
"src_ip": "10.10.16.1",
"src_port": "38",
}
},
There are actually two queries I make, one after the other. They differ by the value of a_field_which_changes, which is "src_ip" in one query and "src_port" in the other.
In the first query all the results are fine: the aggregation is one element large, and the buckets specify what that element matched:
{
"key": "10.6.17.218", <--- "dst_ip" field
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "-1", <--- "src_port" field
"doc_count": 1
}
]
}
},
The other query yields two different kinds of results:
{
"key": "10.6.17.218",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "10.237.78.19",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "10.12.67.89",
"doc_count": 1
}
]
}
},
The first result is problematic: it does not give the details of the buckets. It is no different from the other one, yet somehow the details are missing.
Why is this, and most importantly, how can I force Elasticsearch to display the details of the buckets?
The documentation goes into detail on how to tune the aggregation, but I could not find anything relevant there.
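One guess: empty sub-buckets usually mean that the documents in that dst_ip bucket simply do not contain the field the inner terms aggregation runs on (here src_ip). If that is the case, the missing parameter of the terms aggregation will group such documents under an explicit placeholder bucket instead of silently dropping them, for example (sketch, with an arbitrary placeholder value):
aggs: {
  src: {
    terms: {
      field: "dst_ip",
      size: 1000,
    },
    aggs: {
      dst: {
        terms: {
          field: "a_field_which_changes",
          size: 2000,
          missing: "N/A", // arbitrary placeholder key for docs without the field
        },
      },
    },
  },
},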
I have an Elasticsearch aggregation query like this:
{
"size":0,
"aggs": {
"Domains": {
"terms": {
"field": "domains",
"size": 0
},
"aggs":{
"Identifier": {
"terms": {
"field":"alertIdentifier",
"size": 0
}
}
}
}
}
}
It results in a bucket aggregation like the following:
"aggregations": {
"Domains": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "IT",
"doc_count": 147,
"Identifier": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "-2623493027134706869",
"doc_count": 7
},
{
"key": "-6590617724257725266",
"doc_count": 7
},
{
"key": "1106147277275983835",
"doc_count": 4
},
{
"key": "-3070527890944301111",
"doc_count": 4
},
{
"key": "-530975388352676402",
"doc_count": 3
},
{
"key": "-6225620509938623294",
"doc_count": 2
},
{
"key": "1652134630535374656",
"doc_count": 1
},
{
"key": "4191687133126999365",
"doc_count": 8
},
{
"key": "6882920925888555081",
"doc_count": 2
}
]
}
}
]
}
}
What I need is to count how many times each doc_count value occurs, like this:
1 time: 0
2 times: 2
3 times: 1
4 or more times: 5
Any idea how to build the ES query to count the occurrences of doc_count?
Thanks in advance.
Below is the ES query:
POST /xt-history*/_search
{
"query": {
"filtered": {"query": {"match_all": {} },
"filter": {
"and": [
{"term": {"type": "10"}}
]
}
}
},
"size": 0,
"aggs": {
"repetitions": {
"scripted_metric": {
"init_script" : "_agg['all'] = []; _agg['all2'] = [];",
"map_script" : "_agg['all'].add(_source['alert']['alertIdentifier'])",
"combine_script" : "for (alertId in _agg['all']) { _agg['all2'].add(alertId); }; return _agg['all2']",
"reduce_script" : "all3 = []; answer = {}; answer['one'] = []; answer['two'] = []; answer['three'] = []; answer['four'] = []; answer['five'] = []; answer['five_plus'] = []; for (alertIds in _aggs) { for (alertId1 in alertIds) { all3.add(alertId1); }; }; for (alertId in all3) { if (answer['five_plus'].contains(alertId)) { } else if(answer['five'].contains(alertId)) {answer['five'].remove(alertId); answer['five_plus'].add(alertId);} else if(answer['four'].contains(alertId)) {answer['four'].remove(alertId); answer['five'].add(alertId);} else if(answer['three'].contains(alertId)) {answer['three'].remove(alertId); answer['four'].add(alertId);} else if(answer['two'].contains(alertId)) {answer['two'].remove(alertId); answer['three'].add(alertId);} else if(answer['one'].contains(alertId)) {answer['one'].remove(alertId); answer['two'].add(alertId);} else {answer['one'].add(alertId);}; }; fans = []; fans.add(answer['one'].size()); fans.add(answer['two'].size()); fans.add(answer['three'].size()); fans.add(answer['four'].size()); fans.add(answer['five'].size()); fans.add(answer['five_plus'].size()); return fans"
}
}
}
}
Query output:
{
"took": 4770,
"timed_out": false,
"_shards": {
"total": 190,
"successful": 189,
"failed": 0
},
"hits": {
"total": 334,
"max_score": 0,
"hits": []
},
"aggregations": {
"repetitions": {
"value": [
63,
39,
3,
10,
2,
13
]
}
}
}
where the first value is the number of alert identifiers with doc_count=1, the second value is the number with doc_count=2, ..., and the last value is the number with doc_count >= 5.
My index has a log-like structure: I insert a version of a document whenever an event occurs. For example, here are documents in the index:
{ "key": "a", subkey: 0 }
{ "key": "a", subkey: 0 }
{ "key": "a", subkey: 1 }
{ "key": "a", subkey: 1 }
{ "key": "b", subkey: 0 }
{ "key": "b", subkey: 0 }
{ "key": "b", subkey: 1 }
{ "key": "b", subkey: 1 }
I'm trying to construct a query in Elasticsearch which is basically equivalent to the following SQL query:
SELECT COUNT(*), key, subkey
FROM (SELECT DISTINCT key, subkey FROM t)
The answer to this query would obviously be
(1, a, 0)
(1, a, 1)
(1, b, 0)
(1, b, 1)
How would I replicate this query in Elasticsearch? I came up with the following:
GET test_index/test_type/_search?search_type=count
{
"aggregations": {
"count_aggr": {
"terms": {
"field": "concatenated_key"
},
"aggs": {
"sample_doc": {
"top_hits": {
"size": 1
}
}
}
}
}
}
concatenated_key is a concatenation of key and subkey. This query would create a bucket for each (key, subkey) combination and return a sample document from each bucket. However, I don't know how I can aggregate over the fields of _source.
Would appreciate any ideas. Thanks!
If you don't have the option to re-index the documents and add your own concatenated key field, this is a way of doing it:
GET /my_index/my_type/_search?search_type=count
{
"aggs": {
"key_agg": {
"terms": {
"field": "key",
"size": 10
},
"aggs": {
"sub_key_agg": {
"terms": {
"field": "subkey",
"size": 10
}
}
}
}
}
}
It will give you something like this:
"buckets": [
{
"key": "a",
"doc_count": 4,
"sub_key_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0,
"doc_count": 2
},
{
"key": 1,
"doc_count": 2
}
]
}
},
{
"key": "b",
"doc_count": 4,
"sub_key_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0,
"doc_count": 2
},
{
"key": 1,
"doc_count": 2
}
]
}
}
]
where you have the key ("key": "a") and then, for each combination with this key, the number of docs that match key=a and subkey=0 or key=a and subkey=1:
"buckets": [
{
"key": 0,
"doc_count": 2
},
{
"key": 1,
"doc_count": 2
}
]
The same goes for the other key.
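If all you actually need is the total number of distinct (key, subkey) pairs rather than the per-pair breakdown, a cardinality aggregation over a script that concatenates the two fields is another option (untested sketch, assuming both fields are aggregatable; the separator is arbitrary):
GET /my_index/my_type/_search?search_type=count
{
  "aggs": {
    "distinct_pairs": {
      "cardinality": {
        "script": "doc['key'].value + '|' + doc['subkey'].value"
      }
    }
  }
}
Keep in mind that cardinality counts are approximate at very high cardinalities, so for an exact per-pair breakdown the nested terms approach above is still the way to go.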