Bodybuilder.js: Bucket Filter Aggregation - Elasticsearch

I would like to create buckets based on keyword occurrences in a field.
I checked the Elasticsearch documentation and found that the Filters Aggregation should be a good fit:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html#search-aggregations-bucket-filters-aggregation
We're currently using bodybuilder.js to build queries. In the source code (https://github.com/danpaz/bodybuilder/blob/master/src/aggregation-builder.js#L87) I found an undocumented function:
const bodybuilder = require('bodybuilder');
bodybuilder()
.aggregation('terms', 'title', {
_meta: { color: 'blue' }
}, 'titles')
.build()
which results in:
{
"aggs": {
"titles": {
"terms": {
"field": "title"
},
"meta": {
"color": "blue"
}
}
}
}
But that's not the same structure as the one described in the ES documentation:
GET logs/_search
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"filters" : [
{ "match" : { "body" : "error" }},
{ "match" : { "body" : "warning" }}
]
}
}
}
}
Any idea how to build a Filters Aggregation with bodybuilder.js?

This is not a bodybuilder.js solution, but one using elastic-builder.js, a JavaScript library that is very similar to bodybuilder.js.
For your question, this code:
const esb = require('elastic-builder');
const requestBody = esb.requestBodySearch()
.agg(esb.filtersAggregation('messages')
.filter('errors', esb.matchQuery('body', 'error'))
.filter('warnings', esb.matchQuery('body', 'warning')))
.toJSON();
will give you this request body:
{
"aggs": {
"messages": {
"filters": {
"filters": {
"errors": { "match": { "body": "error" }},
"warnings": { "match": { "body": "warning" }}
}
}
}
}
}
It looks like bodybuilder.js is mostly built for queries and is easier to use for that purpose. If you want more control without the overhead of writing raw JSON, elastic-builder could be a solution.

We can do it like this in bodybuilder.js:
const bodybuilder = require('bodybuilder');
bodybuilder().aggregation('filters', {
"filters": {
"errors": { "term": { "body": "error" } },
"warnings": { "term": { "body": "warning" } }
}
}, 'messages').build();
The above code generates this request body:
{
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"errors" : { "term" : { "body" : "error" }},
"warnings" : { "term" : { "body" : "warning" }}
}
}
}
}
}
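Since the original goal was one bucket per keyword occurrence, the same kind of body can also be generated from a plain list of keywords. This is a minimal sketch in dependency-free JavaScript; keywordFiltersAgg is a hypothetical helper name (it is not part of bodybuilder.js or elastic-builder), and it uses match queries as in the ES documentation example. The returned object could be sent as the search body with whichever client you already use.

```javascript
// Hypothetical helper (not part of bodybuilder.js): builds a search body with
// a "filters" aggregation containing one named bucket per keyword.
function keywordFiltersAgg(aggName, field, keywords) {
  const filters = {};
  for (const keyword of keywords) {
    // Bucket key = the keyword itself; each bucket matches it in `field`.
    filters[keyword] = { match: { [field]: keyword } };
  }
  return {
    size: 0, // only the bucket counts are needed, not the hits
    aggs: {
      [aggName]: { filters: { filters } },
    },
  };
}

const body = keywordFiltersAgg("messages", "body", ["error", "warning"]);
console.log(JSON.stringify(body, null, 2));
```

Using named filters (an object rather than an array) means the response buckets come back keyed by keyword, which is usually easier to read than positional buckets.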

Related

Aggregating properties in Elasticsearch

I have indexed entries with optional properties. For example, I have entries like these:
{
"id": 1,
"field1": "XYZ"
},
{
"id": 2,
"field2": "XYZ"
},
{
"id": 3,
"field1": "XYZ"
}
I would like an aggregation that tells me how many entries have field1 populated and how many have field2 populated.
The expected result is:
{
"field1": 2,
"field2": 1
}
Is this possible with Elasticsearch?
Yes, you can do it like this:
POST myindex/_search
{
"size": 0,
"aggs": {
"field_exists": {
"filters": {
"filters": {
"field1": {
"exists": {
"field": "field1"
}
},
"field2": {
"exists": {
"field": "field2"
}
}
}
}
}
}
}
You'll get an answer like this one:
"aggregations" : {
"field_exists" : {
"buckets" : {
"field1" : {
"doc_count" : 2
},
"field2" : {
"doc_count" : 1
}
}
}
}
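To get exactly the flat { "field1": 2, "field2": 1 } shape asked for in the question, the buckets still have to be reshaped on the client side. A minimal sketch in plain JavaScript; fieldExistsAgg and existsCounts are hypothetical helper names, and the aggregations object in the usage is the sample response shown above.

```javascript
// Hypothetical helpers (names are illustrative, not from any library).

// Builds the request body: one "exists" filter bucket per field name.
function fieldExistsAgg(aggName, fields) {
  const filters = {};
  for (const field of fields) {
    filters[field] = { exists: { field } };
  }
  return { size: 0, aggs: { [aggName]: { filters: { filters } } } };
}

// Flattens the "aggregations" part of the response into { field: count }.
function existsCounts(aggregations, aggName) {
  const buckets = aggregations[aggName].buckets;
  const counts = {};
  for (const [field, bucket] of Object.entries(buckets)) {
    counts[field] = bucket.doc_count;
  }
  return counts;
}

const request = fieldExistsAgg("field_exists", ["field1", "field2"]);
console.log(JSON.stringify(request));

// The "aggregations" object from the sample response above.
const aggregations = {
  field_exists: {
    buckets: { field1: { doc_count: 2 }, field2: { doc_count: 1 } },
  },
};
console.log(existsCounts(aggregations, "field_exists")); // { field1: 2, field2: 1 }
```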

Count the percentage of character fields

I want to calculate the percentage breakdown of the values in a specific field.
This is my REST API request:
GET _search
{
"_source": {
"includes": [ "FIRST_SWITCHED","LAST_SWITCHED","IPV4_DST_ADDR","L4_DST_PORT","IPV4_SRC_ADDR","L7_PROTO_NAME","IN_BYTES","IN_PKTS","OUT_BYTES","OUT_PKTS"]
},
"from" : 0, "size" : 10000,
"query": {
"bool": {
"must": [
{
"match" : { "_index" : "logstash-2017.12.22" }
},
{
"match_phrase":{"IPV4_SRC_ADDR":"192.168.0.159"}
},
{
"range" : {
"LAST_SWITCHED" : {
"gte" : 1513683600
}
}
}
]
}
},
"aggs": {
"IN_PKTS": {
"sum": {
"field": "IN_PKTS"
}
},
"IN_BYTES": {
"sum": {
"field": "IN_BYTES"
}
},
"OUT_BYTES": {
"sum": {
"field": "OUT_BYTES"
}
},
"OUT_PKTS": {
"sum": {
"field": "OUT_PKTS"
}
},
"percent":{
"significant_terms" : {
"field" : "L7_PROTO_NAME",
"percentage":{}
}},
"protocol" : {
"terms" : {
"field" : "PROTOCOL",
"include" : ["17", "6"]
}
},
"Using_port_count" : {
"cardinality" : {
"field" : "L4_SRC_PORT"
}
}
}
}
But I get an error. This is the error message:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [L7_PROTO_NAME] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
Thank you in advance!
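OK, I found the answer!
Just use the .keyword sub-field instead, and the query runs:
"field" : "L7_PROTO_NAME.keyword"
This works because, since Elasticsearch 5.0, dynamic mapping indexes strings as a text field plus a keyword sub-field, and keyword fields can be aggregated on without enabling fielddata.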

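A note on the original goal: with the percentage heuristic, significant_terms scores each term as the number of matching (foreground) documents containing it divided by the number of background documents containing it, which is not the same as each value's share of the matched documents. If the share is what you want, one option is to replace significant_terms with a plain terms aggregation on L7_PROTO_NAME.keyword and divide each bucket's doc_count by the total hit count on the client side. The sketch below assumes that setup; bucketPercentages and the aggregation name "protocols" are hypothetical. Keep in mind that terms returns only the top 10 buckets by default (see its size parameter).

```javascript
// Hypothetical helper: turns "terms" aggregation buckets into each value's
// share (in percent) of all matching documents.
// totalHits is response.hits.total: a number before ES 7, an object
// { value, relation } from ES 7 on (the value may then be a lower bound).
function bucketPercentages(buckets, totalHits) {
  const total = typeof totalHits === "object" ? totalHits.value : totalHits;
  return buckets.map((bucket) => ({
    key: bucket.key,
    percent: total === 0 ? 0 : (bucket.doc_count / total) * 100,
  }));
}

// Usage, assuming the search response is in `response` and the aggregation
// is named "protocols" (hypothetical name):
// const shares = bucketPercentages(
//   response.aggregations.protocols.buckets,
//   response.hits.total
// );
```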
Getting "Field data loading is forbidden" when trying to aggregate

I'm trying to do a simple unique-count aggregation, but I'm getting this error:
java.lang.IllegalStateException: Field data loading is forbidden on eid
This is my query:
POST /logstash-2016.06.*/Nginx/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"pid": "1"
}
},
{
"term": {
"cvprogress": "0"
}
},
{
"range" : {
"ServerTime" : {
"gte" : "2016-06-28T00:00:00"
}
}
}
]
}
},
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid"
}
}
}
}
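After going through the entire thread at https://github.com/elastic/elasticsearch/issues/15267, what worked was aggregating on the .raw sub-field (the not-analyzed copy of the string that Logstash's default index template creates) instead, like this: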
"aggs": {
"distinct_colors" : {
"cardinality" : {
"field" : "eid.raw"
}
}
}
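The same idea fixes the fielddata error in the previous question; only the sub-field name differs between versions. As a small sketch, assuming the common defaults (Logstash's older default template adds a not-analyzed .raw sub-field to string fields, while dynamic mapping in Elasticsearch 5.x and later adds .keyword), a hypothetical helper distinctCountAgg can take the suffix as a parameter. Check your actual mapping with GET <index>/_mapping before relying on either name.

```javascript
// Hypothetical helper: cardinality (distinct count) aggregation on an
// exact-value sub-field instead of the analyzed string field.
// "raw" is the sub-field added by Logstash's older default template;
// pass "keyword" for fields created by ES 5.x+ dynamic mapping.
function distinctCountAgg(aggName, field, subField = "raw") {
  return {
    [aggName]: {
      cardinality: { field: `${field}.${subField}` },
    },
  };
}

console.log(JSON.stringify({ aggs: distinctCountAgg("distinct_colors", "eid") }));
// prints {"aggs":{"distinct_colors":{"cardinality":{"field":"eid.raw"}}}}
```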

Watcher alert if no records matching filter in x minutes

I need an Elasticsearch Watcher to alert if no record matching a pattern has been inserted into the index within a time frame. It needs to do this while grouping on another pair of fields.
The records have this pattern:
Date Timestamp Level Message Client Site
It needs to check that Message matches "is running" for each client's site(s) (e.g. Google Maps and Bing Maps have the same site, Maps). I think the best(?) way to do this right now is to run one watcher per client site.
So far I have this, assuming the task writes "is running" into the log every 20 minutes:
{
"trigger" : {
"schedule" : {
"interval" : "25m"
}
},
"input" : {
"search" : {
"request" : {
"search_type" : "count",
"indices" : "<logstash-{now/d}>",
"body" : {
"filtered" : {
"query" : {
"match_phrase" : { "Message" : "Is running" }
},
"filter" : {
"match" : { "Client" : "Example" } ,
"match" : { "Site" : "SomeSite" }
}
}
}
}
}
},
"condition" : {
"script" : "return ctx.payload.hits.total < 1"
},
"actions" : {
},
"email_administrator" : {
"email" : {
"to" : "me#host.tld",
"subject" : "Tasks are not running for {{ctx.payload.client}} on their site {{ctx.payload.site}}",
"body" : "Too many error in the system, see attached data",
"attach_data" : true,
"priority" : "high"
}
}
}
}
For anyone looking for how to do this in the future: the filtered clause has to be nested inside query, the filters go inside its filter clause, and match becomes term. Fun!...
{
"trigger": {
"schedule": {
"interval": "25m"
}
},
"input": {
"search": {
"request": {
"search_type": "count",
"indices": "<logstash-{now/d}>",
"body": {
"query": {
"filtered": {
"query": {
"match_phrase": {
"Message": "Its running"
}
}
"filter": {
"bool": {
"must": [
{ "term": { "Client": "Example" } },
{ "term": { "Site": "SomeSite" } },
{ "range": { "event_timestamp": { "gte": "now-25m", "lte": "now" } } }
]
}
}
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"lt": 1
}
}
},
"actions": {
"email_administrator": {
"email": {
"to": "me#host.tld",
"subject": "Tasks are not running for {{ctx.payload.client}} on their site {{ctx.payload.site}}",
"body": "Tasks are not running for {{ctx.payload.client}} on their site {{ctx.payload.site}}",
"attach_data": true,
"priority": "high"
}
}
}
}
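Because the plan is one watch per client site, the definition can be generated per pair instead of being copied by hand. The sketch below is plain JavaScript; buildNoRecordsWatch and its parameter names are hypothetical. It assumes a Watcher version where the bool query's filter clause is available and ctx.payload.hits.total is a plain number (roughly Elasticsearch 2.x to 6.x), an event_timestamp date field, and not-analyzed Client and Site fields. It uses "size": 0 instead of the count search type, which was removed in 5.0.

```javascript
// Hypothetical helper: builds a "no matching records" watch for one client/site.
// Assumes ES 2.x-6.x Watcher (bool "filter" clause, numeric hits.total),
// an "event_timestamp" date field and not-analyzed "Client"/"Site" fields.
function buildNoRecordsWatch({ client, site, message, intervalMinutes, email }) {
  const period = `${intervalMinutes}m`;
  return {
    trigger: { schedule: { interval: period } },
    input: {
      search: {
        request: {
          indices: "<logstash-{now/d}>",
          body: {
            size: 0, // only the hit count is needed
            query: {
              bool: {
                must: [{ match_phrase: { Message: message } }],
                filter: [
                  { term: { Client: client } },
                  { term: { Site: site } },
                  // Only look at records written during the last period.
                  { range: { event_timestamp: { gte: `now-${period}`, lte: "now" } } },
                ],
              },
            },
          },
        },
      },
    },
    // Fire only when nothing matched in the window.
    condition: { compare: { "ctx.payload.hits.total": { lt: 1 } } },
    actions: {
      email_administrator: {
        email: {
          to: email,
          subject: `Tasks are not running for ${client} on their site ${site}`,
          body: `No "${message}" record for ${client} / ${site} in the last ${period}.`,
          priority: "high",
        },
      },
    },
  };
}

// Usage (hypothetical values): one watch per client/site pair.
const watch = buildNoRecordsWatch({
  client: "Example",
  site: "SomeSite",
  message: "is running",
  intervalMinutes: 25,
  email: "admin@example.com",
});
console.log(JSON.stringify(watch, null, 2));
```

Baking the client and site into the subject and body sidesteps the {{ctx.payload.client}} placeholders, which, as far as I can tell, are not fields of a search input's payload and would render empty.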
You have to change your condition. It supports JSON format:
"condition" : {
"script" : "return ctx.payload.hits.total < 1"
}
Please refer below link,
https://www.elastic.co/guide/en/watcher/current/condition.html

Elasticsearch match list against field

I have a list (an array, or whatever your language calls it), e.g. names : ["John","Bas","Peter"], and I want to query the name field for documents that match one of those names.
One way is with an or filter, e.g.:
{
"filtered" : {
"query" : {
"match_all": {}
},
"filter" : {
"or" : [
{
"term" : { "name" : "John" }
},
{
"term" : { "name" : "Bas" }
},
{
"term" : { "name" : "Peter" }
}
]
}
}
}
Is there a fancier way? Ideally a query rather than a filter.
{
"query": {
"filtered" : {
"filter" : {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
which Elasticsearch rewrites as if you had used this one:
{
"query": {
"filtered" : {
"filter" : {
"bool": {
"should": [
{
"term": {
"name": "John"
}
},
{
"term": {
"name": "Bas"
}
},
{
"term": {
"name": "Peter"
}
}
]
}
}
}
}
}
When combining filters, it is usually better to use the bool filter than the and or or filters. The reason is explained on the Elasticsearch blog: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
When I tried the filtered query, I got "no [query] registered for [filtered]". The filtered query was deprecated in 2.0 and removed in ES 5.0, so I suggest using:
{
"query": {
"bool": {
"filter": {
"terms": {
"name": ["John","Bas","Peter"]
}
}
}
}
}
Example query that filters by a keyword and a list of values:
{
"query": {
"bool": {
"must": [
{
"term": {
"fguid": "9bbfe844-44ad-4626-a6a5-ea4bad3a7bfb.pdf"
}
}
],
"filter": {
"terms": {
"page": [
"1",
"2",
"3"
]
}
}
}
}
}
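Both of the last two answers build the same kind of body, so a small hypothetical helper (termsFilterQuery, plain JavaScript, not part of any client library) can produce either one. Keep in mind that term and terms are not analyzed: the values must match the indexed terms exactly, which is why they are normally used on keyword (or, before 5.0, not-analyzed string) fields.

```javascript
// Hypothetical helper: bool query whose "filter" keeps documents where
// `field` equals any of `values`; optional exact-match clauses go in "must".
function termsFilterQuery(field, values, mustTerms = {}) {
  const query = { bool: { filter: { terms: { [field]: values } } } };
  const must = Object.entries(mustTerms).map(([name, value]) => ({
    term: { [name]: value },
  }));
  if (must.length > 0) {
    query.bool.must = must;
  }
  return { query };
}

// Equivalent to the bool/terms answer above.
console.log(JSON.stringify(termsFilterQuery("name", ["John", "Bas", "Peter"])));
// Equivalent to the "keyword and a list of values" example.
console.log(
  JSON.stringify(
    termsFilterQuery("page", ["1", "2", "3"], {
      fguid: "9bbfe844-44ad-4626-a6a5-ea4bad3a7bfb.pdf",
    })
  )
);
```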
