I have imported millions of documents into Elasticsearch. The documents look like this:
"_source": {
"mt": "w",
"hour": 1
}
I want to find the number of hour values that have occurred more than 5 times.
For example, using a terms aggregation I get the following result:
"aggregations": {
"hours": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 7
},
{
"key": 4,
"doc_count": 5
},
{
"key": 5,
"doc_count": 2
}
]
}
}
How do I find the count of hours that occur more than 5 times? Here it would be 1, because only hour=1 occurs more than 5 times.
You can use "min_doc_count" in the terms aggregation (see the Elastic docs). Note that min_doc_count is inclusive, so for "more than 5" you want "min_doc_count": 6.
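A minimal sketch of both pieces (assuming an index named my_index and a numeric hour field; the sibling stats_bucket aggregation is my addition to also get the count of qualifying hours in the same request):

POST /my_index/_search
{
    "size": 0,
    "aggs": {
        "hours": {
            "terms": {
                "field": "hour",
                "min_doc_count": 6
            }
        },
        "qualifying_hours": {
            "stats_bucket": {
                "buckets_path": "hours>_count"
            }
        }
    }
}

In the response, qualifying_hours.count is the number of hour buckets that survived the min_doc_count filter, i.e. the number of hours occurring more than 5 times.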
I have a field slices.code in my Elasticsearch mapping. slices is an array element, and slices.code has various values like "ATF", "ORW", "HKL". slices is not a nested type field, and I want to avoid adding the nested type to it. In each document there could be multiple occurrences of slices.code = ATF/ORW. So I want to get all possible values of slices.code along with the total occurrences of each value across all the documents. Something like this, where HKL appeared in 2 documents but 3 times in total:
{
"key": "HKL",
"doc_count": 2,
"total": {
"value": 3
}
},
{
"key": "ATF",
"doc_count": 3,
"total": {
"value": 7
}
},
{
"key": "ORW",
"doc_count": 2,
"total": {
"value": 5
}
}
I tried using a terms aggregation, but with that I only get doc_count; I don't get the total occurrences of the field value. Below is the terms aggregation that I tried:
{
"size": 0,
"aggs": {
"distinct_colors": {
"terms": {
"field": "slices.code.keyword",
"size": 65535
}
}
}
}
Output that I received:
"buckets": [
{
"key": "HKG",
"doc_count": 1
},
{
"key": "MNL",
"doc_count": 1
},
{
"key": "PVG",
"doc_count": 1
},
{
"key": "TPE",
"doc_count": 1
}
]
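For reference (a sketch on my part, not from the question): with the array kept as a plain object field, this cannot work with aggregations alone, because a terms aggregation counts each document once per term, and keyword doc values are deduplicated within a document, so within-document repeats are invisible. Counting true occurrences needs the nested mapping you are trying to avoid. If slices were mapped as nested, a terms aggregation inside a nested aggregation would count each slice occurrence, and a reverse_nested sub-aggregation would recover the per-document count:

{
    "size": 0,
    "aggs": {
        "all_slices": {
            "nested": {
                "path": "slices"
            },
            "aggs": {
                "codes": {
                    "terms": {
                        "field": "slices.code.keyword",
                        "size": 65535
                    },
                    "aggs": {
                        "docs": {
                            "reverse_nested": {}
                        }
                    }
                }
            }
        }
    }
}

Here each codes bucket's doc_count is the total number of occurrences of that code, and docs.doc_count is the number of parent documents containing it, i.e. the same two numbers as in the desired output, just swapped relative to that example.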
Please help me understand nested bucket aggregations in Elasticsearch. I have the following aggregation results:
[...]
{
"key": "key1",
"doc_count": 1166,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "happy",
"doc_count": 1166
}
]
}
},
{
"key": "key2",
"doc_count": 1123,
"range_keys": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cookies",
"doc_count": 1122
},
{
"key": "happy",
"doc_count": 1
}
]
}
},
[...]
As you can see, I have results with only "happy", but I need to get only the results that contain both "happy" and "cookies".
To achieve this I tried the "size" argument, but it only limits how many buckets are returned; it does not filter the buckets themselves.
How can I determine the "bucket" length (the number of sub-buckets) in a nested aggregation?
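If the goal is to keep only parent buckets whose inner range_keys aggregation produced at least two sub-buckets, a bucket_selector pipeline aggregation can filter on the special _bucket_count path. A sketch, where "my_key_field" and "my_label_field" are placeholders for the actual fields behind "key1"/"key2" and "happy"/"cookies":

{
    "size": 0,
    "aggs": {
        "keys": {
            "terms": {
                "field": "my_key_field"
            },
            "aggs": {
                "range_keys": {
                    "terms": {
                        "field": "my_label_field"
                    }
                },
                "two_or_more": {
                    "bucket_selector": {
                        "buckets_path": {
                            "count": "range_keys._bucket_count"
                        },
                        "script": "params.count >= 2"
                    }
                }
            }
        }
    }
}

Note that "size" only caps how many buckets come back; it never filters them, which is why it did not help here.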
My query is a nested aggregation:
"aggs": {
    "src": {
        "terms": {
            "field": "dst_ip",
            "size": 1000
        },
        "aggs": {
            "dst": {
                "terms": {
                    "field": "a_field_which_changes",
                    "size": 2000
                }
            }
        }
    }
}
A typical doc the query is run against is below (the mappings are all of type keyword):
{
"_index": "honey",
"_type": "event",
"_id": "AWHzRjHrjNgIX_EoDcfV",
"_score": 1,
"_source": {
"dst_ip": "10.101.146.166",
"src_ip": "10.10.16.1",
"src_port": "38",
}
},
There are actually two queries I make, one after the other. They differ by the value of a_field_which_changes, which is "src_ip" in one query and "src_port" in the other.
In the first query all the results are fine. The aggregation is one element large, and the buckets specify what that element matched with:
{
"key": "10.6.17.218", <--- "dst_ip" field
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "-1", <--- "src_port" field
"doc_count": 1
}
]
}
},
The other query yields two different kinds of results:
{
"key": "10.6.17.218",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "10.237.78.19",
"doc_count": 1,
"dst": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "10.12.67.89",
"doc_count": 1
}
]
}
},
The first result is problematic: it does not give the details of the buckets. It is no different from the other one, yet somehow the details are missing.
Why is that, and most importantly, how can I force Elasticsearch to display the details of the buckets?
The documentation goes into detail on how to influence the aggregation, but I could not find anything relevant there.
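One plausible cause (an assumption, since the source documents behind that bucket are not shown): the document counted under "10.6.17.218" simply has no value for the field the sub-aggregation runs on (for example, no src_ip field at all), so there is nothing to put into a sub-bucket. If that is the case, the terms aggregation's missing parameter will group such documents under a placeholder key instead of dropping them; a sketch using the src_ip variant of the query:

"aggs": {
    "src": {
        "terms": {
            "field": "dst_ip",
            "size": 1000
        },
        "aggs": {
            "dst": {
                "terms": {
                    "field": "src_ip",
                    "size": 2000,
                    "missing": "N/A"
                }
            }
        }
    }
}

With this, a document lacking src_ip shows up as a sub-bucket with key "N/A" rather than yielding an empty buckets array.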
I have an aggregation which gives back a number of buckets. I'd like to further run a script which analyzes all the doc_counts and groups them into categories. Can someone give me an example of how to do this?
For example:
"aggregations": {
"updates_by_account": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "fbf60d008f0b2f9b3d1d8f1f7fe6e4262662a04c9bcbcc20d92316daade3c25c",
"doc_count": 2
},
{
"key": "916129338fb099792f7b1f414868d45c3fd1a0feb89e1bbeafe24bdb496bec0b",
"doc_count": 1
},
{
"key": "f1b256be780d983549e968f187daef882999fd05889dcab7f1c8c4769ed0996b",
"doc_count": 1
}
]
}
}
My query looks like this:
{
    "aggs": {
        "updates_by_account": {
            "terms": {
                "script": "doc['account_number'].value"
            }
        }
    }
}
I'd like to do something like:
Number of users with 0-5 updates: 4
Number of users with 6-10 updates: 7
Number of users with 11 or more updates: 12
etc.
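As far as I know, terms buckets cannot be re-bucketed by their doc_count with a plain pipeline aggregation, so one option is a scripted_metric aggregation that counts updates per account and then bins the accounts. A sketch only, assuming Elasticsearch 6.4+ (where scripted_metric scripts use state) and the Kibana Dev Tools triple-quote syntax; the bin boundaries are the ones from your example:

{
    "size": 0,
    "aggs": {
        "update_categories": {
            "scripted_metric": {
                "init_script": "state.counts = [:]",
                "map_script": "def k = doc['account_number'].value; state.counts[k] = state.counts.containsKey(k) ? state.counts[k] + 1 : 1;",
                "combine_script": "return state.counts;",
                "reduce_script": """
                    def merged = [:];
                    for (s in states) {
                        for (e in s.entrySet()) {
                            merged[e.getKey()] = merged.containsKey(e.getKey()) ? merged[e.getKey()] + e.getValue() : e.getValue();
                        }
                    }
                    def bins = ['0-5': 0, '6-10': 0, '11+': 0];
                    for (e in merged.entrySet()) {
                        def c = e.getValue();
                        if (c <= 5) { bins['0-5'] = bins['0-5'] + 1; }
                        else if (c <= 10) { bins['6-10'] = bins['6-10'] + 1; }
                        else { bins['11+'] = bins['11+'] + 1; }
                    }
                    return bins;
                """
            }
        }
    }
}

In practice it is often simpler (and cheaper) to fetch the terms buckets as you already do and bin the doc_counts client-side.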
I want to get the ten most frequent patterns in a search with Elasticsearch. For example:
"cgn:4189, dfsdkfldslfs"
"cgn:4210, aezfvdsvgds"
"cgn:4189, fdsmpfjdjs"
"cgn:4195, cvsf"
"cgn:4189, mkpjd"
"cgn:4210, mfsfgkpjd"
I want to get:
4189 : 3
4210 : 2
4195 : 1
I know how to do that in MySQL or via awk/sort/head... but with Elasticsearch I'm lost.
Exactly how it will work depends on your analyzer, but if you are just using the default, standard analyzer, you can probably get what you want pretty easily with a terms aggregation.
As a simple example, I set up a trivial index:
PUT /test_index
{
"settings": {
"number_of_shards": 1
}
}
Then I indexed the data you posted, using the bulk API:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"msg":"cgn:4189, dfsdkfldslfs"}
{"index":{"_id":2}}
{"msg":"cgn:4210, aezfvdsvgds"}
{"index":{"_id":3}}
{"msg":"cgn:4189, fdsmpfjdjs"}
{"index":{"_id":4}}
{"msg":"cgn:4195, cvsf"}
{"index":{"_id":5}}
{"msg":"cgn:4189, mkpjd"}
{"index":{"_id":6}}
{"msg":"cgn:4210, mfsfgkpjd"}
Then I can run a simple terms aggregation to get back all the terms and how often they occur (ordered descending by term frequency by default):
POST /test_index/_search?search_type=count
{
"aggs": {
"msg_terms": {
"terms": {
"field": "msg"
}
}
}
}
which returns:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"msg_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "cgn",
"doc_count": 6
},
{
"key": "4189",
"doc_count": 3
},
{
"key": "4210",
"doc_count": 2
},
{
"key": "4195",
"doc_count": 1
},
{
"key": "aezfvdsvgds",
"doc_count": 1
},
{
"key": "cvsf",
"doc_count": 1
},
{
"key": "dfsdkfldslfs",
"doc_count": 1
},
{
"key": "fdsmpfjdjs",
"doc_count": 1
},
{
"key": "mfsfgkpjd",
"doc_count": 1
},
{
"key": "mkpjd",
"doc_count": 1
}
]
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/a827095b675596c4e3d545ce963cde3fae932156
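If you only care about the numeric codes and not the surrounding tokens, the terms aggregation also accepts an include filter (a regular expression over the terms), which would drop "cgn" and the random strings from the buckets. A sketch against the same test index:

POST /test_index/_search?search_type=count
{
    "aggs": {
        "msg_terms": {
            "terms": {
                "field": "msg",
                "include": "[0-9]+"
            }
        }
    }
}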