Aggregation with range and match_phrase - elasticsearch

I want to count the total failed logins each day (between 2021-07-18 and 2021-07-26).
The query here will return only hits for failed logins between those dates. If I removed the 'match_phrase' component that looks for failed logins, it will count just the total logins each day. But including the 'match_phrase' part will only return the 'failed login' documents, and not count how many failed per day.
Aggregation Query
GET index/_search
{
"size": 8,
"query": {
"bool": {
"must": [
{"match_phrase": {
"message": "[Failed login"
}
},
{"range": {
"#timestamp":{
"gte":"2021-07-18",
"lte":"2021-07-26"
}
}
}
]
}
},
"aggs": {
"hit_count_per_day": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
}
}
}
}
I think I think I need to add something to the 'aggs' section? or is using 'match_phrase' fundamentally not correctly applied for aggregation purposes? Mapping for 'message'...
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
},
"analyzer" : "whitespace"
}
Sample 'failed login' Document from hits, trimmed for brevity...
{
"_index" : "log-c0001_log-004444",
"_type" : "_doc",
"_id" : "DAwerFPXIHfH_4rWX",
"_score" : 29.337955,
"_source" : {
"message" : "<14>1 2021-07-22T07:56:01.598708+00:00 - - Event [247458] [1-1] [2021-07-22T07:56:01.598364Z] [info] [] [247458] [Failed login]",
"#timestamp" : "2021-07-22T07:56:01.596Z"
}
Response:
{
"took" : 886,
"timed_out" : false,
"_shards" : {
"total" : 162,
"successful" : 162,
"skipped" : 144,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 29.337955,
"hits" : [
{..},
{..},
{..}
]
},
"aggregations" : {
"hit_count_per_day" : {
"buckets" : [ ]
}
}
}

Related

Why does elastic search wildcard query return no results?

Query #1 in Kibana returns results, however Query #2 returns no results. I search for only "bob" and get results, but when searching for "bob smith", no results, even though "Bob Smith" exists in the index. Any reason why?
Query #1: returns results
GET people/_search
{
"query": {
"wildcard" : {
"name" : "*bob*"
}
}
}
Results:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 23,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "people",
"_type" : "_doc",
"_id" : "xxxxx",
"_score" : 1.0,
"_source" : {
"name" : "Bob Smith",
...
Query #2: returns nothing.. why(?)
GET people/_search
{
"query": {
"wildcard" : {
"name" : "*bob* *smith*"
}
}
}
results...nothing
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Look like the reason of the empty result is your index mapping. If you use "text" type field, you actually search in the inverted index, mean you search in the token "bob" and token "smith" (standard analyzer) and not in the "Bob Smith". If you want to search in "Bob Smith" as one token, you need to use "keyword" type (maybe with lowercase normalizer, if you want to use not key sensetive search)
For example:
PUT test
{
"settings": {
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "lowercase_normalizer"
}
}
}
}
PUT test/_doc/1
{
"name" : "Bob Smith"
}
GET test/_search
{
"query": {
"wildcard": {
"name": "*bob* *Smith*"
}
}
}

document_missing_exception while performing ElasticSearch update

I went through several questions with the same "document_missing_exception" problem but looks like they aren't the same problem in my case. I can query the document, but failed when I tried to updated it.
My query:
# search AuthEvent by sessionID
GET events-*/_search
{
"size": "100",
"query": {
"bool": {
"must": [{
"term": {
"type": {
"value": "AuthEvent"
}
}
},
{
"term": {
"client.sessionID.raw": {
"value": "067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu"
}
}
}
]
}
}
}
Query result:
{
"took" : 18,
"timed_out" : false,
"_shards" : {
"total" : 76,
"successful" : 76,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 6.705622,
"hits" : [
{
"_index" : "events-2020.10.06",
"_type" : "doc",
"_id" : "2c675295b27a225ce243d2f13701b14222074eaf",
"_score" : 6.705622,
"_routing" : "067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu",
"_source" : {
# some data
}
}
]
}
}
Update request:
POST events-2020.10.06/_doc/2c675295b27a225ce243d2f13701b14222074eaf/_update
{
"doc" : {
"custom" : {
"testField" : "testData"
}
}
}
And update result:
{
"error" : {
"root_cause" : [
{
"type" : "document_missing_exception",
"reason" : "[_doc][2c675295b27a225ce243d2f13701b14222074eaf]: document missing",
"index_uuid" : "5zhQy6W6RnWscDz7Av4_bA",
"shard" : "1",
"index" : "events-2020.10.06"
}
],
"type" : "document_missing_exception",
"reason" : "[_doc][2c675295b27a225ce243d2f13701b14222074eaf]: document missing",
"index_uuid" : "5zhQy6W6RnWscDz7Av4_bA",
"shard" : "1",
"index" : "events-2020.10.06"
},
"status" : 404
}
I'm quite new to ElasticSearch and couldn't find any reason for such behaviour. I use ElasticSearch 6.7.1 oss version + Kibana for operating with data. I also tried with bulk update but ended with same error.
As you can see in the query results, your document has been indexed with a routing value and you're missing it in your update request.
Try this instead:
POST events-2020.10.06/_doc/2c675295b27a225ce243d2f13701b14222074eaf/_update?routing=067d660a1504Y67FOuiiRIEkVNG8uYIlnK87liuZGLBcSmEW0aHoDXAHfu
{
"doc" : {
"custom" : {
"testField" : "testData"
}
}
}
If a document is indexed with a routing value, all subsequent get, update and delete operations need to happen with that routing value as well.

ElasticSearch join data within the same index

I am quite new with ElasticSearch and I am collecting some application logs within the same index which have this format
{
"_index" : "app_logs",
"_type" : "_doc",
"_id" : "JVMYi20B0a2qSId4rt12",
"_source" : {
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "STARTED",
"ts" : "2019-10-02T08:11:53Z"
}
I can have different event types. In this case I am interested in STARTED and FINISHED. I would like to query ES in order to get all the app that started in a certain day and enrich them with their end time. Basically I want to create couples of start/end (an end might also be missing, but that's fine).
I have realized join relations in sql cannot be used in ES and I was wondering if I can exploit some other feature in order to get this result in one query.
Edit: these are the details of the index mapping
{
“app_logs" : {
"mappings" : {
"_doc" : {
"properties" : {
"event_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
“app_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"ts" : {
"type" : "date"
},
“event_type” : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}}}}
What I understood is that you would want to collate list of documents having same app_id along with the status as either STARTED or FINISHED.
I do not think Elasticsearch is not meant to perform JOIN operations. I mean you can but then you have to design your documents as mentioned in this link.
What you would need is an Aggregation query.
Below is the sample mapping, documents, the aggregation query and the response as how it appears, which would actually help you get the desired result.
Mapping:
PUT mystatusindex
{
"mappings": {
"properties": {
"username":{
"type": "keyword"
},
"app_id":{
"type": "keyword"
},
"event_type":{
"type":"keyword"
},
"ts":{
"type": "date"
}
}
}
}
Sample Documents
POST mystatusindex/_doc/1
{
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "STARTED",
"ts" : "2019-10-02T08:11:53Z"
}
POST mystatusindex/_doc/2
{
"username" : "mapred",
"app_id" : "application_1569623930006_490200",
"event_type" : "FINISHED",
"ts" : "2019-10-02T08:12:53Z"
}
POST mystatusindex/_doc/3
{
"username" : "mapred",
"app_id" : "application_1569623930006_490201",
"event_type" : "STARTED",
"ts" : "2019-10-02T09:30:53Z"
}
POST mystatusindex/_doc/4
{
"username" : "mapred",
"app_id" : "application_1569623930006_490202",
"event_type" : "STARTED",
"ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/5
{
"username" : "mapred",
"app_id" : "application_1569623930006_490202",
"event_type" : "FINISHED",
"ts" : "2019-10-02T09:45:53Z"
}
POST mystatusindex/_doc/6
{
"username" : "mapred",
"app_id" : "application_1569623930006_490203",
"event_type" : "STARTED",
"ts" : "2019-10-03T09:30:53Z"
}
POST mystatusindex/_doc/7
{
"username" : "mapred",
"app_id" : "application_1569623930006_490203",
"event_type" : "FINISHED",
"ts" : "2019-10-03T09:45:53Z"
}
Query:
POST mystatusindex/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"ts": {
"gte": "2019-10-02T00:00:00Z",
"lte": "2019-10-02T23:59:59Z"
}
}
}
],
"should": [
{
"match": {
"event_type": "STARTED"
}
},
{
"match": {
"event_type": "FINISHED"
}
}
]
}
},
"aggs": {
"application_IDs": {
"terms": {
"field": "app_id"
},
"aggs": {
"ids": {
"top_hits": {
"size": 10,
"_source": ["event_type", "app_id"],
"sort": [
{ "event_type": { "order": "desc"}}
]
}
}
}
}
}
}
Notice that for filtering I've made use of Range Query as you only want to filter documents for that date and also added a bool should logic to filter based on STARTED and FINISHED.
Once I have the documents, I've made use of Terms Aggregation and Top Hits Aggregation to get the desired result.
Result
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"application_IDs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "application_1569623930006_490200", <----- APP ID
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "1", <--- Document with STARTED status
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490200"
},
"sort" : [
"STARTED"
]
},
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "2", <--- Document with FINISHED status
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490200"
},
"sort" : [
"FINISHED"
]
}
]
}
}
},
{
"key" : "application_1569623930006_490202",
"doc_count" : 2,
"ids" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490202"
},
"sort" : [
"STARTED"
]
},
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "5",
"_score" : null,
"_source" : {
"event_type" : "FINISHED",
"app_id" : "application_1569623930006_490202"
},
"sort" : [
"FINISHED"
]
}
]
}
}
},
{
"key" : "application_1569623930006_490201",
"doc_count" : 1,
"ids" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "mystatusindex",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"event_type" : "STARTED",
"app_id" : "application_1569623930006_490201"
},
"sort" : [
"STARTED"
]
}
]
}
}
}
]
}
}
}
Note that the last document with only STARTED appears in the aggregation result as well.
Updated Answer
{
"size":0,
"query":{
"bool":{
"must":[
{
"range":{
"ts":{
"gte":"2019-10-02T00:00:00Z",
"lte":"2019-10-02T23:59:59Z"
}
}
}
],
"should":[
{
"term":{
"event_type.keyword":"STARTED" <----- Changed this
}
},
{
"term":{
"event_type.keyword":"FINISHED" <----- Changed this
}
}
]
}
},
"aggs":{
"application_IDs":{
"terms":{
"field":"app_id.keyword" <----- Changed this
},
"aggs":{
"ids":{
"top_hits":{
"size":10,
"_source":[
"event_type",
"app_id"
],
"sort":[
{
"event_type.keyword":{ <----- Changed this
"order":"desc"
}
}
]
}
}
}
}
}
}
Note the changes I've made. Whenever you would need exact matches or want to make use of aggregation, you would need to make use of keyword type.
In the mapping you've shared, there is no username field but two event_type fields. I'm assuming its just a human err and that one of the field should be username.
Now if you notice carefully, the field event_type has a text and its sibling keyword field. I've just modified the query to make use of the keyword field and when I am doing that, I'm use Term Query.
Try this out and let me know if it helps!

Nested boolean aggregation in elastic

I have json payloads as such
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 61,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "CAojVWwBO8H0jj7a_j3P",
"_score" : 1.0,
"_source" : {
"appName" : "BigApp",
"appVer" : "1.0",
"reviews" : {
"reviewer" : {
"value" : "Bob"
},
"testsPass" : [
{
"name" : "unit",
"pass" : false
},
{
"name" : "integraton",
"pass" : false
},
{
"name" : "ui",
"pass" : false
}
]
}
}
}
]
}
}
In elastic I want to aggregate the boolean values under testsPass to return true if all of the pass values are true.
I am new to Elastic and struggling to write a query in that shape, can someone please help?
So far I have tried nested aggregators but can't get the syntax right.
Looking at your data, I'm assuming the structure of your mapping is as follow:
Mapping:
PUT myindex
{
"mappings": {
"properties": {
"appName":{
"type": "keyword"
},
"appVer": {
"type": "keyword"
},
"reviews":{
"properties": {
"reviewer":{
"properties":{
"value": {
"type": "keyword"
}
}
},
"testsPass":{
"type": "nested"
}
}
}
}
}
}
Sample Documents:
POST myindex/_doc/1
{
"appName":"BigApp",
"appVer":"1.0",
"reviews":{
"reviewer":{
"value":"Bob"
},
"testsPass":[
{
"name":"unit",
"pass":false
},
{
"name":"integraton",
"pass":false
},
{
"name":"ui",
"pass":false
}
]
}
}
POST myindex/_doc/2
{
"appName":"MidApp",
"appVer":"1.0",
"reviews":{
"reviewer":{
"value":"Bob"
},
"testsPass":[
{
"name":"unit",
"pass":true
},
{
"name":"integraton",
"pass":true
},
{
"name":"ui",
"pass":true
}
]
}
}
POST myindex/_doc/3
{
"appName":"SmallApp",
"appVer":"1.0",
"reviews":{
"reviewer":{
"value":"Bob"
},
"testsPass":[
{
"name":"unit",
"pass":true
},
{
"name":"integraton",
"pass":true
},
{
"name":"ui",
"pass":false
}
]
}
}
Note that in the list of the above documents, only the document having appName: MidApp(2nd document) has the list of all true values.
Aggregation Query:
POST myindex/_search
{
"size":0,
"aggs":{
"pass_reviewers":{
"filter":{
"bool":{
"must":[
{
"nested":{
"path":"reviews.testsPass",
"query":{
"match":{
"reviews.testsPass.pass":"true"
}
}
}
}
],
"must_not":[
{
"nested":{
"path":"reviews.testsPass",
"query":{
"match":{
"reviews.testsPass.pass":"false"
}
}
}
}
]
}
},
"aggs":{
"myhits":{
"top_hits":{
"size":10
}
}
}
}
}
}
Note that the above returns only the concerned document as result of Top Hits aggregation. The main aggregation over here is in filter section which is just a Filter Aggregation
Response:
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"pass_reviewers" : {
"doc_count" : 1, <------ Note this. Returns count of docs. This is result of filtered aggregation
"myhits" : { <------ Start of top hits aggregation
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "2", <----- Document
"_score" : 1.0,
"_source" : {
"appName" : "MidApp",
"appVer" : "1.0",
"reviews" : {
"reviewer" : {
"value" : "Bob"
},
"testsPass" : [
{
"name" : "unit",
"pass" : true
},
{
"name" : "integraton",
"pass" : true
},
{
"name" : "ui",
"pass" : true
}
]
}
}
}
]
}
}
}
}
}
Just in case if you just want the query to return the documents having all true, and not necessarily make use of aggregation, you can simply make use of the below query:
Query:
POST myindex/_search
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"reviews.testsPass",
"query":{
"match":{
"reviews.testsPass.pass":"true"
}
}
}
}
],
"must_not":[
{
"nested":{
"path":"reviews.testsPass",
"query":{
"match":{
"reviews.testsPass.pass":"false"
}
}
}
}
]
}
}
}
Basically the core execution logic is the same in both the queries, I've just narrowed down the logic you are looking for.
Response:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.597837,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.597837,
"_source" : {
"appName" : "MidApp",
"appVer" : "1.0",
"reviews" : {
"reviewer" : {
"value" : "Bob"
},
"testsPass" : [
{
"name" : "unit",
"pass" : true
},
{
"name" : "integraton",
"pass" : true
},
{
"name" : "ui",
"pass" : true
}
]
}
}
}
]
}
}
Hope this helps!

enabled fielddata on text field in ElasticSearch but aggregation is not working

According to the documentation you can run ElasticSearch aggregations on fields that are type keyword or not a text field or which have fielddata set to true in the index mapping.
I am trying to count city_names in an nginx log. It works fine with the int field result. But it does not work with the field city_name even when I updated the index mapping for that to put fielddata=true. The should have been not required as it was of type keyword.
To say it does not work means that:
"aggregations" : {
"cities" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
Here is the field mapping:
"city_name" : {
"type" : "text",
"fielddata" : true
},
And here is the aggression query:
curl -XGET --user $pwd --header 'Content-Type: application/json' https://58571402f5464923883e7be42a037917.eu-central-1.aws.cloud.es.io:9243/logstash/_search?pretty -d '{
"aggs" : {
"cities": {
"terms" : { "field": "city_name"}
}
}
}'
If you don't get any error when executing your search it seems that is more like a problem with the data. Are you sure you have, at least, one document with the field city_name filled?
I tried to reproduce your issue with ElasticSearch 6.6.2.
I created an index
PUT cities
{
"mappings": {
"city": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
},
"city_name": {
"type": "text",
"fielddata": true
}
}
}
}
}
I added one document without the city_name
PUT cities/city/1
{
"id": "1"
}
When i performed the search:
GET cities/_search
{
"aggs": {
"cities": {
"terms" : { "field": "city_name"}
}
}
}
I got no buckets in the cities aggregation. But when I added one document with the city name filled:
PUT cities/city/2
{
"id": "2",
"city_name": "London"
}
I got the expected result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "cities",
"_type" : "city",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : "2",
"city_name" : "london"
}
},
{
"_index" : "cities",
"_type" : "city",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : "1"
}
}
]
},
"aggregations" : {
"cities" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "london",
"doc_count" : 1
}
]
}
}
}

Resources