I have imported millions of documents into Elasticsearch. The documents look like this:
"_source": {
"mt": "w",
"hour": 1
}
I want to find the number of hour values that have occurred more than 5 times.
For example, using a terms aggregation I get the following result:
"aggregations": {
"hours": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 7
},
{
"key": 4,
"doc_count": 5
},
{
"key": 5,
"doc_count": 2
}
]
}
}
How do I find the count of hours that occur more than 5 times? Here it would be 1, because only hour=1 occurs more than 5 times.
You can use "min_doc_count" in the terms aggregation (see the Elastic docs). Note that min_doc_count is inclusive, so for "more than 5" you want "min_doc_count": 6.
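A minimal sketch of both pieces (assuming an index named my_index and a numeric hour field; the sibling stats_bucket aggregation is my addition to also get the count of qualifying hours in the same request):

POST /my_index/_search
{
    "size": 0,
    "aggs": {
        "hours": {
            "terms": {
                "field": "hour",
                "min_doc_count": 6
            }
        },
        "qualifying_hours": {
            "stats_bucket": {
                "buckets_path": "hours>_count"
            }
        }
    }
}

In the response, qualifying_hours.count is the number of hour buckets that survived the min_doc_count filter, i.e. the number of hours occurring more than 5 times.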
I have a field slices.code in my Elasticsearch mapping. slices is an array element, and slices.code has various values like "ATF", "ORW", "HKL". slices is not a nested type field, and I want to avoid adding the nested type to it. In each document there could be multiple occurrences of slices.code = ATF/ORW. So I want to get all possible values of slices.code along with the total occurrences of each value across all the documents. Something like this, where HKL appeared in 2 documents but 3 times in total:
{
"key": "HKL",
"doc_count": 2,
"total": {
"value": 3
}
},
{
"key": "ATF",
"doc_count": 3,
"total": {
"value": 7
}
},
{
"key": "ORW",
"doc_count": 2,
"total": {
"value": 5
}
}
I tried using a terms aggregation, but with that I only get doc_count; I don't get the total occurrences of the field value. Below is the terms aggregation that I tried:
{
"size": 0,
"aggs": {
"distinct_colors": {
"terms": {
"field": "slices.code.keyword",
"size": 65535
}
}
}
}
Output that I received:
"buckets": [
{
"key": "HKG",
"doc_count": 1
},
{
"key": "MNL",
"doc_count": 1
},
{
"key": "PVG",
"doc_count": 1
},
{
"key": "TPE",
"doc_count": 1
}
]
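For reference (a sketch on my part, not from the question): with the array kept as a plain object field, this cannot work with aggregations alone, because a terms aggregation counts each document once per term, and keyword doc values are deduplicated within a document, so within-document repeats are invisible. Counting true occurrences needs the nested mapping you are trying to avoid. If slices were mapped as nested, a terms aggregation inside a nested aggregation would count each slice occurrence, and a reverse_nested sub-aggregation would recover the per-document count:

{
    "size": 0,
    "aggs": {
        "all_slices": {
            "nested": {
                "path": "slices"
            },
            "aggs": {
                "codes": {
                    "terms": {
                        "field": "slices.code.keyword",
                        "size": 65535
                    },
                    "aggs": {
                        "docs": {
                            "reverse_nested": {}
                        }
                    }
                }
            }
        }
    }
}

Here each codes bucket's doc_count is the total number of occurrences of that code, and docs.doc_count is the number of parent documents containing it, i.e. the same two numbers as in the desired output, just swapped relative to that example.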
Please help me understand nested bucket aggregations in Elasticsearch. I have the following aggregation results:
[...]
{
"key": "key1",
"doc_count": 1166,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "happy",
"doc_count": 1166
}
]
}
},
{
"key": "key2",
"doc_count": 1123,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cookies",
"doc_count": 1122
},
{
"key": "happy",
"doc_count": 1
}
]
}
},
[...]
As you can see, I have results with only "happy", but I need to get only the results that contain both "happy" and "cookies".
To achieve this I tried the "size" argument, but it only limits how many buckets are returned; it does not filter the buckets themselves.
How can I determine the "bucket" length (the number of sub-buckets) in a nested aggregation?
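If the goal is to keep only parent buckets whose inner range_keys aggregation produced at least two sub-buckets, a bucket_selector pipeline aggregation can filter on the special _bucket_count path. A sketch, where "my_key_field" and "my_label_field" are placeholders for the actual fields behind "key1"/"key2" and "happy"/"cookies":

{
    "size": 0,
    "aggs": {
        "keys": {
            "terms": {
                "field": "my_key_field"
            },
            "aggs": {
                "range_keys": {
                    "terms": {
                        "field": "my_label_field"
                    }
                },
                "two_or_more": {
                    "bucket_selector": {
                        "buckets_path": {
                            "count": "range_keys._bucket_count"
                        },
                        "script": "params.count >= 2"
                    }
                }
            }
        }
    }
}

Note that "size" only caps how many buckets come back; it never filters them, which is why it did not help here.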
My query is a nested aggregation:
"aggs": {
    "src": {
        "terms": {
            "field": "dst_ip",
            "size": 1000
        },
        "aggs": {
            "dst": {
                "terms": {
                    "field": "a_field_which_changes",
                    "size": 2000
                }
            }
        }
    }
}
A typical doc the query is run against is below (the mappings are all of type keyword):
{
"_index": "honey",
"_type": "event",
"_id": "AWHzRjHrjNgIX_EoDcfV",
"_score": 1,
"_source": {
"dst_ip": "10.101.146.166",
"src_ip": "10.10.16.1",
"src_port": "38",
}
},
There are actually two queries I make, one after the other. They differ by the value of a_field_which_changes, which is "src_ip" in one query and "src_port" in the other.
In the first query all the results are fine. The aggregation is one element large, and the buckets specify what that element matched with:
{
"key": "10.6.17.218", <--- "dst_ip" field
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "-1", <--- "src_port" field
"doc_count": 1
}
]
}
},
The other query yields two different kinds of results:
{
"key": "10.6.17.218",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "10.237.78.19",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "10.12.67.89",
"doc_count": 1
}
]
}
},
The first result is problematic: it does not give the details of the buckets. It is no different from the other one, yet somehow the details are missing.
Why is that, and most importantly, how can I force Elasticsearch to display the details of the buckets?
The documentation goes into detail on how to influence the aggregation, but I could not find anything relevant there.
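One plausible cause (an assumption, since the source documents behind that bucket are not shown): the document counted under "10.6.17.218" simply has no value for the field the sub-aggregation runs on (for example, no src_ip field at all), so there is nothing to put into a sub-bucket. If that is the case, the terms aggregation's missing parameter will group such documents under a placeholder key instead of dropping them; a sketch using the src_ip variant of the query:

"aggs": {
    "src": {
        "terms": {
            "field": "dst_ip",
            "size": 1000
        },
        "aggs": {
            "dst": {
                "terms": {
                    "field": "src_ip",
                    "size": 2000,
                    "missing": "N/A"
                }
            }
        }
    }
}

With this, a document lacking src_ip shows up as a sub-bucket with key "N/A" rather than yielding an empty buckets array.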
I have an aggregation which gives back a number of buckets. I'd like to further run a script which analyzes all the doc_counts and groups them into categories. Can someone give me an example of how to do this?
For example:
"aggregations": {
"updates_by_account": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fbf60d008f0b2f9b3d1d8f1f7fe6e4262662a04c9bcbcc20d92316daade3c25c",
"doc_count": 2
},
{
"key": "916129338fb099792f7b1f414868d45c3fd1a0feb89e1bbeafe24bdb496bec0b",
"doc_count": 1
},
{
"key": "f1b256be780d983549e968f187daef882999fd05889dcab7f1c8c4769ed0996b",
"doc_count": 1
}
]
}
}
My query looks like this:
{
    "aggs": {
        "updates_by_account": {
            "terms": {
                "script": "doc['account_number'].value"
            }
        }
    }
}
I'd like to do something like:
Number of users with 0-5 updates: 4
Number of users with 6-10 updates: 7
Number of users with 11 or more updates: 12
etc.
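As far as I know, terms buckets cannot be re-bucketed by their doc_count with a plain pipeline aggregation, so one option is a scripted_metric aggregation that counts updates per account and then bins the accounts. A sketch only, assuming Elasticsearch 6.4+ (where scripted_metric scripts use state) and the Kibana Dev Tools triple-quote syntax; the bin boundaries are the ones from your example:

{
    "size": 0,
    "aggs": {
        "update_categories": {
            "scripted_metric": {
                "init_script": "state.counts = [:]",
                "map_script": "def k = doc['account_number'].value; state.counts[k] = state.counts.containsKey(k) ? state.counts[k] + 1 : 1;",
                "combine_script": "return state.counts;",
                "reduce_script": """
                    def merged = [:];
                    for (s in states) {
                        for (e in s.entrySet()) {
                            merged[e.getKey()] = merged.containsKey(e.getKey()) ? merged[e.getKey()] + e.getValue() : e.getValue();
                        }
                    }
                    def bins = ['0-5': 0, '6-10': 0, '11+': 0];
                    for (e in merged.entrySet()) {
                        def c = e.getValue();
                        if (c <= 5) { bins['0-5'] = bins['0-5'] + 1; }
                        else if (c <= 10) { bins['6-10'] = bins['6-10'] + 1; }
                        else { bins['11+'] = bins['11+'] + 1; }
                    }
                    return bins;
                """
            }
        }
    }
}

In practice it is often simpler (and cheaper) to fetch the terms buckets as you already do and bin the doc_counts client-side.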
I want to get the ten most frequent patterns in a search with Elasticsearch. For example:
"cgn:4189, dfsdkfldslfs"
"cgn:4210, aezfvdsvgds"
"cgn:4189, fdsmpfjdjs"
"cgn:4195, cvsf"
"cgn:4189, mkpjd"
"cgn:4210, mfsfgkpjd"
I want to get:
4189 : 3
4210 : 2
4195 : 1
I know how to do that in MySQL or via awk/sort/head... but with Elasticsearch I'm lost.
Exactly how it will work depends on your analyzer, but if you are just using the default, standard analyzer, you can probably get what you want pretty easily with a terms aggregation.
As a simple example, I set up a trivial index:
PUT /test_index
{
"settings": {
"number_of_shards": 1
}
}
Then I indexed the data you posted, using the bulk API:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"msg":"cgn:4189, dfsdkfldslfs"}
{"index":{"_id":2}}
{"msg":"cgn:4210, aezfvdsvgds"}
{"index":{"_id":3}}
{"msg":"cgn:4189, fdsmpfjdjs"}
{"index":{"_id":4}}
{"msg":"cgn:4195, cvsf"}
{"index":{"_id":5}}
{"msg":"cgn:4189, mkpjd"}
{"index":{"_id":6}}
{"msg":"cgn:4210, mfsfgkpjd"}
Then I can run a simple terms aggregation to get back all the terms and how often they occur (ordered descending by term frequency by default):
POST /test_index/_search?search_type=count
{
"aggs": {
"msg_terms": {
"terms": {
"field": "msg"
}
}
}
}
which returns:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"msg_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cgn",
"doc_count": 6
},
{
"key": "4189",
"doc_count": 3
},
{
"key": "4210",
"doc_count": 2
},
{
"key": "4195",
"doc_count": 1
},
{
"key": "aezfvdsvgds",
"doc_count": 1
},
{
"key": "cvsf",
"doc_count": 1
},
{
"key": "dfsdkfldslfs",
"doc_count": 1
},
{
"key": "fdsmpfjdjs",
"doc_count": 1
},
{
"key": "mfsfgkpjd",
"doc_count": 1
},
{
"key": "mkpjd",
"doc_count": 1
}
]
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/a827095b675596c4e3d545ce963cde3fae932156
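If you only care about the numeric codes and not the surrounding tokens, the terms aggregation also accepts an include filter (a regular expression over the terms), which would drop "cgn" and the random strings from the buckets. A sketch against the same test index:

POST /test_index/_search?search_type=count
{
    "aggs": {
        "msg_terms": {
            "terms": {
                "field": "msg",
                "include": "[0-9]+"
            }
        }
    }
}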