I have an Elasticsearch (v2.3) backend storing IP addresses in multiple indices. The document type of an ip looks like this:
{
  "ip": {
    "properties": {
      "ip": { "type": "string", "index": "not_analyzed" },
      "categories": { "type": "string", "index": "not_analyzed" }
    }
  }
}
My goal is to group all ip documents by their unique ip field, so I can apply an operation to the categories (and all other fields) of all matching records.
There is an easy way to do this: aggregate all unique ip values with the aggregation below, then iterate over each result in my script, making another search query per ip.
{
  "size": 0,
  "aggs": {
    "uniq": {
      "terms": { "field": "ip", "size": 0 }
    }
  }
}
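In a script, that two-step approach looks roughly like the sketch below. The helper names and response fragment are illustrative, not part of any client library; each generated query body would be sent as a second `_search` call.

```python
def unique_ips(agg_response):
    """Extract the unique ip keys from the terms aggregation response."""
    buckets = agg_response["aggregations"]["uniq"]["buckets"]
    return [b["key"] for b in buckets]

def per_ip_query(ip):
    """Build the follow-up query fetching every document for one ip."""
    return {"query": {"term": {"ip": ip}}}

# A response fragment shaped like what the aggregation above returns:
response = {
    "aggregations": {
        "uniq": {
            "buckets": [
                {"key": "10.0.0.1", "doc_count": 3},
                {"key": "10.0.0.2", "doc_count": 1},
            ]
        }
    }
}

queries = [per_ip_query(ip) for ip in unique_ips(response)]
# each entry in `queries` would be the body of a second search call
```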
But that is not very efficient. Is there a way to do this in a single search query?
I've seen a workaround here: Elasticsearch filter document group by field, with a top_hits aggregation:
{
  "size": 0,
  "aggs": {
    "uniq": {
      "terms": {
        "field": "ip",
        "size": 0
      },
      "aggs": {
        "tops": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
However, I can't set the top_hits size to 0, which is what I would want in order to handle cases where the same ip appears in N different indices.
I've taken a look at pipeline aggregations, but they don't seem to be able to perform raw searches.
Thanks for your help!
I have an Elasticsearch index where I store internet traffic flow objects, with each object containing an IP address. I want to aggregate the data so that all objects with the same IP prefix are collected in the same bucket (but without specifying a specific prefix), something like a histogram aggregation. Is this possible?
I have tried this:
GET flows/_search
{
  "size": 0,
  "aggs": {
    "ip_ranges": {
      "histogram": {
        "field": "ipAddress",
        "interval": 256
      }
    }
  }
}
But this doesn't work, probably because histogram aggregations aren't supported for ip type fields. How would you go about doing this?
Firstly, as suggested here, the best approach would be to:
categorize the IP address at index time and then use a simple keyword field to store the class c information, and then use a term aggregation on that field to do the count.
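As a minimal sketch of that index-time categorization (the function name is mine), deriving the /24 prefix before indexing could be as simple as:

```python
def class_c_prefix(ip):
    """Return the class C (/24) prefix of a dotted-quad IPv4 address.

    e.g. "192.168.1.57" -> "192.168.1"
    """
    return ".".join(ip.split(".")[:3])

# The returned prefix would be stored in a keyword field at index time,
# and a plain terms aggregation on that field then counts per prefix.
```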
Alternatively, you could simply add a multi-field keyword mapping:
PUT myindex
{
  "mappings": {
    "properties": {
      "ipAddress": {
        "type": "ip",
        "fields": {
          "keyword": {        <---
            "type": "keyword"
          }
        }
      }
    }
  }
}
and then extract the prefix at query time (⚠️ highly inefficient!):
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "my_prefixes": {
      "terms": {
        "script": "/\\./.split(doc['ipAddress.keyword'].value)[0]",
        "size": 10
      }
    }
  }
}
As a final option, you could define the intervals of interest in advance and use an ip_range aggregation:
{
  "size": 0,
  "aggs": {
    "my_ip_ranges": {
      "ip_range": {
        "field": "ipAddress",
        "ranges": [
          { "to": "192.168.1.1" },
          { "from": "192.168.1.1" }
        ]
      }
    }
  }
}
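Note that the ip_range aggregation also accepts CIDR mask notation, which can be more convenient than explicit from/to bounds; a sketch with example ranges:

```
{
  "size": 0,
  "aggs": {
    "my_ip_ranges": {
      "ip_range": {
        "field": "ipAddress",
        "ranges": [
          { "mask": "192.168.1.0/25" },
          { "mask": "192.168.1.128/25" }
        ]
      }
    }
  }
}
```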
Apply a match phrase prefix query on the result of a terms aggregation in Elasticsearch.
I have a terms query and the result looks something like this:
"buckets": [
  {
    "key": "KEY",
    "count": 20
  },
  {
    "key": "LOCK",
    "count": 30
  }
]
Now the requirement is to filter those buckets whose key starts with a certain prefix, similar to a match phrase prefix query. For example, if the input to the match phrase prefix query is "LOC", then only one bucket should be returned (the 2nd one). So effectively it's a filter on the terms aggregation. Thanks for your thoughts.
You could use the include parameter on your terms aggregation to filter out values based on a regex.
Something like this should work:
GET stackoverflow/_search
{
  "_source": false,
  "aggs": {
    "groups": {
      "terms": {
        "field": "text.keyword",
        "include": "LOC.*"
      }
    }
  }
}
Example: Let's say you have three different documents with three different terms (LOCK, KEY & LOL) in an index. If you perform the following request:
GET stackoverflow/_search
{
  "_source": false,
  "aggs": {
    "groups": {
      "terms": {
        "field": "text.keyword",
        "include": "L.*"
      }
    }
  }
}
You'll get the following buckets:
"buckets" : [
  {
    "key" : "LOCK",
    "doc_count" : 1
  },
  {
    "key" : "LOL",
    "doc_count" : 1
  }
]
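Note that include patterns are matched against the entire term (the regex is anchored), which is why "L.*" behaves like a prefix filter. If you ever need the same filtering client-side, a sketch (the helper name is mine) would be:

```python
import re

def filter_buckets(buckets, pattern):
    """Keep only buckets whose key fully matches the given regex."""
    rx = re.compile(pattern)
    return [b for b in buckets if rx.fullmatch(b["key"])]

buckets = [
    {"key": "LOCK", "doc_count": 1},
    {"key": "KEY", "doc_count": 1},
    {"key": "LOL", "doc_count": 1},
]
```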
Hope it is helpful.
I am trying to get a list of users whose last activity is "connect". Ideally, I want this as a metric viz or a data table in Kibana, showing the number of users that connected last and the list of them, respectively. I have, however, given up on being able to do this in Kibana. I can get something similar directly from Elasticsearch using a terms aggregation followed by top_hits, as below. But the problem is, even though I am sorting the top_hits by #timestamp, the resulting document is NOT the most recent.
{
  "size": 0,
  "sort": { "#timestamp": { "order": "desc" } },
  "aggs": {
    "by_user": {
      "terms": {
        "field": "fields.username.keyword",
        "size": 1
      },
      "aggs": {
        "last_message": {
          "top_hits": {
            "sort": [
              {
                "#timestamp": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": ["fields.username.keyword", "#timestamp", "status"]
            },
            "size": 1
          }
        }
      }
    }
  }
}
Is there a way to do this directly in Kibana?
How can I make sure top_hits gives me the latest results, rather than the "most relevant"?
I think what you want is field collapsing, which is faster than an aggregation.
Something like this should work for your use case:
GET my-index/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "fields.username.keyword"
  },
  "sort": [
    {
      "#timestamp": {
        "order": "desc"
      }
    }
  ]
}
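With collapsing, each returned hit represents one user (the newest document, given the sort), and the collapse field value comes back under the hit's fields key. Extracting a per-user map could be sketched as follows; the response fragment and function name are illustrative:

```python
def latest_per_user(search_response):
    """Map each collapsed username to the _source of its newest hit."""
    result = {}
    for hit in search_response["hits"]["hits"]:
        user = hit["fields"]["fields.username.keyword"][0]
        result[user] = hit["_source"]
    return result

# Illustrative response fragment:
response = {
    "hits": {
        "hits": [
            {
                "fields": {"fields.username.keyword": ["alice"]},
                "_source": {"status": "connect"},
            },
            {
                "fields": {"fields.username.keyword": ["bob"]},
                "_source": {"status": "disconnect"},
            },
        ]
    }
}
```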
I might be missing something, but I don't think Kibana supports this at the moment.
I would like to group documents based on a group field G. I use the "field aggregation" strategy described in the Elasticsearch documentation to sort the buckets by the maximal score of the contained documents (called the 'field collapse example' in the Elastic docs), like this:
{
  "query": {
    "match": {
      "body": "elections"
    }
  },
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": {
          "top_hit": "desc"
        }
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit": {
          "max": {
            "script": {
              "source": "_score"
            }
          }
        }
      }
    }
  }
}
This query also includes the top hits in each bucket.
If the maximal score is not unique across buckets, I would like to specify a second order criterion. From the application context I know that inside a bucket all documents share the same value for a field F. Therefore, this field should be employed as the second order criterion.
How can I realize this in Elasticsearch? Is there a way to make a field from the top_hits sub-aggregation usable in the enclosing aggregation?
Any ideas? Many thanks!
It seems you can. That page lists all the sorting strategies for the terms aggregation.
And there is an example of multi-criteria bucket sorting:
Multiple criteria can be used to order the buckets by providing an
array of order criteria such as the following:
GET /_search
{
  "aggs": {
    "countries": {
      "terms": {
        "field": "artist.country",
        "order": [
          { "rock>playback_stats.avg": "desc" },
          { "_count": "desc" }
        ]
      },
      "aggs": {
        "rock": {
          "filter": { "term": { "genre": "rock" } },
          "aggs": {
            "playback_stats": { "stats": { "field": "play_count" } }
          }
        }
      }
    }
  }
}
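Applied to the question, that would mean adding a second entry to the order array plus a small metric sub-aggregation on F. A sketch, assuming F is an aggregatable field (since all documents in a bucket share the same F, min of F is just that shared value):

```
{
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": [
          { "top_hit": "desc" },
          { "f_value": "asc" }
        ]
      },
      "aggs": {
        "top_hit": { "max": { "script": { "source": "_score" } } },
        "f_value": { "min": { "field": "F" } }
      }
    }
  }
}
```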
I have a list of documents organized as follows:
{
  "date": "2010-12-12",          // Some valid datetime string
  "category": "some_category"    // This can be any string
}
I need to create a frequency distribution for the data within buckets of time. I have looked at the date_histogram API but that only gets me halfway there.
{
  "size": 0,
  "aggs": {
    "my_search": {
      "date_histogram": {
        "field": "date",
        "interval": "1s"
      }
    }
  }
}
This returns me the count of documents that fall into each 1-second bucket. Within those 1-second buckets, I also need to aggregate all of the data into per-category buckets, so that I'm left with buckets of time, each containing counts per category. Is there a built-in method to do this?
You're on the right path, you simply need to add another terms sub-aggregation for the category field:
{
  "size": 0,
  "aggs": {
    "my_search": {
      "date_histogram": {
        "field": "date",
        "interval": "1s"
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "category"
          }
        }
      }
    }
  }
}
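Parsing the nested response into a frequency table is then straightforward; a sketch, with an illustrative response fragment:

```python
def frequency_table(agg_response):
    """Flatten date_histogram -> terms buckets into {time: {category: count}}."""
    table = {}
    for time_bucket in agg_response["aggregations"]["my_search"]["buckets"]:
        cats = time_bucket["categories"]["buckets"]
        table[time_bucket["key_as_string"]] = {
            c["key"]: c["doc_count"] for c in cats
        }
    return table

# Illustrative response fragment:
response = {
    "aggregations": {
        "my_search": {
            "buckets": [
                {
                    "key_as_string": "2010-12-12T00:00:00Z",
                    "categories": {
                        "buckets": [
                            {"key": "some_category", "doc_count": 2},
                            {"key": "other", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}
```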