Get events count by last minute and event level - elasticsearch

I have parsed events with a field named "level" (DEBUG, INFO, ERROR, FATAL). How can I get the count of events from the last minute whose level is ERROR?
I'm trying this:
curl -XGET 'mysite.com:9200/myindex/_count?pretty=true' -d '
{
  "query": {
    "term": {
      "level": "error"
    }
  },
  "filter": {
    "range": {
      "_timestamp": {
        "gt": "now-1m"
      }
    }
  }
}'

Your events must have a timestamp field. If they do, write a count query that filters on both the level and a timestamp range (Elasticsearch supports range queries on date fields using date math such as 'now-1m').
The confusing part is that you didn't say what kind of count you want: the total event count, or a count per type or per some other field (in that case, use a terms aggregation on that field).
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-date-format.html#date-math
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "level": "error"
              }
            },
            {
              "range": {
                "timestamp": {
                  "gt": "now-1m"
                }
              }
            }
          ]
        }
      }
    }
  }
}
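On newer versions the filtered query no longer exists (it was deprecated in 2.0 and removed in 5.0); the same conditions go in the filter clause of a bool query. The Python sketch below builds such request bodies as plain dicts: one for the _count API with a single level, and one for _search that returns a count per level through a terms aggregation with "size": 0 (which is also what replaced the old search_type=count). The field names timestamp, level and level.keyword are assumptions; level.keyword is the keyword sub-field that default dynamic mapping creates in 5.x and later. The lowercase term "error" assumes level is an analysed text field; on a keyword field you would need the exact stored value, "ERROR".

```python
import json
from typing import Dict

def level_count_body(level: str, window: str = "1m", time_field: str = "timestamp") -> Dict:
    """Body for POST /myindex/_count: events with `level` in the last `window`."""
    return {
        "query": {
            "bool": {
                # filter context: no scoring, results can be cached
                "filter": [
                    {"term": {"level": level}},
                    {"range": {time_field: {"gt": f"now-{window}"}}},
                ]
            }
        }
    }

def level_breakdown_body(window: str = "1m", time_field: str = "timestamp",
                         level_field: str = "level.keyword") -> Dict:
    """Body for POST /myindex/_search: one bucket (with doc_count) per level."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "query": {"bool": {"filter": [{"range": {time_field: {"gt": f"now-{window}"}}}]}},
        "aggs": {"by_level": {"terms": {"field": level_field}}},
    }

print(json.dumps(level_count_body("error"), indent=2))
```

The dicts serialize directly with json.dumps, so either one can be sent as the request body with curl or any HTTP client.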

Related

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to extract records that match any of several email_address values, passed to a bool query, within a date range. For example, I want information about a few employees identified by their email_address (annie#test.com, charles#test.com, heman#test.com) for the period starting at project_date 2019-01-01.
I used a should clause, but unfortunately it returns every record in the date range, i.e. it also returns other employees' information from project_date 2019-01-01 onward.
{
  "query": {
    "bool": {
      "should": [
        { "match": { "email_address": "annie#test.com" }},
        { "match": { "email_address": "chalavadi#test.com" }}
      ],
      "filter": [
        { "range": { "project_date": { "gte": "2019-08-01" }}}
      ]
    }
  }
}
I also tried a must clause but got no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
should (OR) clauses are optional
Quoting from this article:
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, the should clause only influences the score and doesn't actually filter documents. Wrap the should in a must, or move it into filter (if scoring isn't required).
GET employeeindex/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "project_date": {
            "gte": "2019-01-01"
          }
        }
      },
      "must": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "email_address.keyword": "abc#text.com"
                }
              },
              {
                "term": {
                  "email_address.keyword": "efg#text.com"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
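To see why the original query returned every employee, here is a simplified Python model of how a bool query decides whether a document matches. It ignores scoring and the filter-context special case from the quote above, and representing clauses as plain predicates on a dict is an assumption for illustration. The default for minimum_should_match follows the documented rule: 1 when there are should clauses but no must or filter clauses, otherwise 0.

```python
from typing import Callable, Dict, Optional, Sequence

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]

def bool_matches(
    doc: Doc,
    must: Sequence[Clause] = (),
    filters: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Simplified model of bool-query matching (no scoring, no filter context)."""
    # Every must and filter clause has to match.
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if minimum_should_match is None:
        # Default: should is only required when there is no must/filter clause.
        minimum_should_match = 1 if should and not (must or filters) else 0
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


docs = [
    {"email_address": "annie#test.com", "project_date": "2019-08-02"},
    {"email_address": "other#test.com", "project_date": "2019-08-02"},
]
in_range: Clause = lambda d: d["project_date"] >= "2019-08-01"  # ISO dates compare as strings
is_annie: Clause = lambda d: d["email_address"] == "annie#test.com"

# The question's query: should + filter, so should is optional and both docs match.
original = [d["email_address"] for d in docs
            if bool_matches(d, should=[is_annie], filters=[in_range])]
# The answer's query: the should list wrapped in a nested bool inside must.
fixed = [d["email_address"] for d in docs
         if bool_matches(d, must=[lambda x: bool_matches(x, should=[is_annie])],
                         filters=[in_range])]
print(original)  # ['annie#test.com', 'other#test.com']
print(fixed)     # ['annie#test.com']
```

Passing minimum_should_match=1 to the outer call gives the same result as the nested bool; Elasticsearch's bool query also accepts a minimum_should_match parameter for this purpose.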
You can also replace the should clause with a terms query, as in AlwaysSunny's answer.
You can do it more concisely by putting terms and range, along with the rest of your existing query, inside filter. Your existing query doesn't work as expected because of the should clause: next to a filter it becomes optional, which makes your filter weaker. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "email_address.keyword": [
              "annie#test.com", "chalavadi#test.com"
            ]
          }
        },
        {
          "range": {
            "project_date": {
              "gte": "2019-08-01"
            }
          }
        }
      ]
    }
  }
}

ElasticSearch query with MUST and SHOULD

I have this query to get data from an AWS Elasticsearch instance (v6.2):
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {"logLevel": "error"}
        },
        {
          "bool": {
            "should": [
              {
                "match": {"EventCategory": "Home Management"}
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "range": { "timestamp": { "gte": 155254550880 }}
        }
      ]
    }
  },
  "size": 10,
  "from": 0
}
My data has multiple EventCategory values, for example 'Home Management' and 'User Account Management'. The problem is that the match inside should returns all data, because the word 'Management' appears in both categories. If I use term instead of match, it doesn't return anything at all, even when the given value is exactly the same as in the document.
I need documents where any of the given categories matches, together with the rest of the filters.
EDIT:
None, one, or more than one EventCategory may be passed to the should clause.
I'm not sure why you added a should within a must. Do you expect to have more than one should case? It looks a bit odd.
As for your question, the term query doesn't analyse its input, so it won't match an analysed (text) field; use it only on keyword fields. If your EventCategory field has the default mapping, you can run the term query against the default non-analysed multi-field of EventCategory as follows:
...
{
"term": { "EventCategory.keyword": "Home Management" }
}
...
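Both symptoms in the question follow from how text is indexed. This rough Python model approximates the standard analyzer as lowercase word tokens (the real analyzer uses Unicode text segmentation, so treat it only as an illustration): a term query compares its unanalysed input against those tokens, while match analyses the input and, with its default OR operator, succeeds on any shared token. The keyword sub-field stores the whole value as a single token.

```python
import re
from typing import List

def text_tokens(value: str) -> List[str]:
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())

def keyword_tokens(value: str) -> List[str]:
    """A keyword field indexes the whole value as one exact token."""
    return [value]

def term_matches(indexed: List[str], value: str) -> bool:
    # term: the query value is NOT analysed; it must equal one indexed token.
    return value in indexed

def match_matches(indexed: List[str], text: str) -> bool:
    # match (operator OR): analyse the query text, succeed on any shared token.
    return any(token in indexed for token in text_tokens(text))

home = "Home Management"
user = "User Account Management"
print(term_matches(text_tokens(home), home))     # False: no token equals "Home Management"
print(match_matches(text_tokens(user), home))    # True: both contain "management"
print(term_matches(keyword_tokens(home), home))  # True: exact value on the keyword field
print(term_matches(keyword_tokens(user), home))  # False
```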
Furthermore, if you just want to filter documents in or out without caring about their relevance, I'd recommend moving all the conditions into the filter block, to speed up your query and make better use of the cache.
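Following that advice, and the EDIT saying that none, one, or several categories may be given, one way to build the body is to put every condition in filter and use a terms query on EventCategory.keyword (terms matches any of the listed values), leaving it out entirely when no category is passed. This is a Python sketch; treating "no categories" as "no category restriction" is an assumption about the desired behaviour, and the field names are taken from the question.

```python
import json
from typing import Dict, List, Sequence

def events_query(log_level: str, categories: Sequence[str], since_ms: int,
                 size: int = 10, offset: int = 0) -> Dict:
    """Build a _search body with every condition in filter context (no scoring)."""
    filters: List[Dict] = [
        {"term": {"logLevel": log_level}},
        {"range": {"timestamp": {"gte": since_ms}}},
    ]
    if categories:
        # terms = OR over exact values of the non-analysed keyword sub-field
        filters.append({"terms": {"EventCategory.keyword": list(categories)}})
    return {"query": {"bool": {"filter": filters}}, "size": size, "from": offset}

print(json.dumps(events_query("error", ["Home Management"], 155254550880), indent=2))
```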
The query below should work.
I've removed the should and created two must clauses, one each for "home" and "management". Note that the query is meant for text fields.
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "logLevel": "error"
          }
        },
        {
          "match": {
            "EventCategory": "home"
          }
        },
        {
          "match": {
            "EventCategory": "management"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": 155254550880
            }
          }
        }
      ]
    }
  },
  "size": 10,
  "from": 0
}
Hope it helps!

Elasticsearch DSL query - Get all matching results

I am trying to search an index using a DSL query. Many documents match the log criterion and the timestamp range.
I pass in dates converted to epoch milliseconds.
I am also specifying the size parameter in the query.
If I specify 5000, it returns 5000 records from the time range, but there are more records in that range.
How can I retrieve all the data matching the time range without having to specify the size?
My DSL query is below:
curl -s -XGET 'localhost:9200/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "log": "SOME_VALUE" } },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  },
  "size": 5000
}'
where, for example:
fromDate=1519842600000
toDate=1520533800000
I couldn't get the scan API or the scroll pattern working; they also weren't showing the expected results.
I finally figured out a way to capture the number of hits and then pass it as a parameter to extract the data.
curl -s -XGET 'localhost:9200/_count' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "log": "SOME_VALUE" } },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}' > count_size.txt
size_count=`cat count_size.txt | cut -d "," -f1 | cut -d ":" -f2`
echo "Total hits matching these criteria: ${size_count}"
From this I get the size_count value.
If it is 10,000 or less (the default index.max_result_window limit on from + size), I extract the data with size set to that value; otherwise, I reduce the time range for the extraction.
curl -s -XGET 'localhost:9200/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "log": "SOME_VALUE" } },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  },
  "size": '"${size_count}"'
}'
If a large data set is needed over an extended period, I run this with different sets of dates and combine the results to produce the overall report.
The complete piece of code is written as a shell script, which makes it much simpler for me to use.
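The "count first, then narrow the range" loop described above can be automated. Below is a Python sketch of that idea, not the author's script: count_fn is a hypothetical callable that would send the _count request for an inclusive epoch-millisecond range and return the count, and each range is halved until its count fits within the default index.max_result_window of 10,000. For very large exports, the scroll API or search_after is usually the better tool.

```python
import json
from typing import Callable, List, Tuple

MAX_RESULT_WINDOW = 10_000  # default index.max_result_window (limit on from + size)

def parse_count(response_text: str) -> int:
    """Read the hit count from a _count response instead of cutting the raw text."""
    return json.loads(response_text)["count"]

def plan_windows(from_ms: int, to_ms: int, count_fn: Callable[[int, int], int],
                 limit: int = MAX_RESULT_WINDOW) -> List[Tuple[int, int, int]]:
    """Split the inclusive range [from_ms, to_ms] until each part has <= limit hits.

    Returns (from_ms, to_ms, count) tuples; each can then be fetched with size=count.
    """
    count = count_fn(from_ms, to_ms)
    if count <= limit or from_ms >= to_ms:
        # Either it fits, or the range is one millisecond and cannot shrink further.
        return [(from_ms, to_ms, count)]
    mid = (from_ms + to_ms) // 2
    # gte/lte are inclusive, so the halves are [from, mid] and [mid + 1, to].
    return (plan_windows(from_ms, mid, count_fn, limit)
            + plan_windows(mid + 1, to_ms, count_fn, limit))

# Usage with a fake counter standing in for the real _count request.
timestamps = list(range(100))
fake_count = lambda lo, hi: sum(lo <= t <= hi for t in timestamps)
print(plan_windows(0, 99, fake_count, limit=30))
# [(0, 24, 25), (25, 49, 25), (50, 74, 25), (75, 99, 25)]
```

Halving keeps every window's range contiguous and non-overlapping, so concatenating the per-window results gives the full set without duplicates.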

Elasticsearch terms query on array of values

I have data in an Elasticsearch index that looks like this:
{
  "title": "cubilia",
  "people": [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker"
  ]
}
Is there a way to search for all the people whose names start with "ling" (case-insensitively) and get the distinct terms with their original casing, i.e. "Ling Deponte" rather than "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it's a really bad query:
{
  "size": 0,
  "aggs": {
    "person": {
      "filter": {
        "bool": {
          "should": [
            {
              "regexp": {
                "people.raw": "(.* )?[lL][iI][nN][gG].*"
              }
            }
          ]
        }
      },
      "aggs": {
        "top-colors": {
          "terms": {
            "size": 10,
            "field": "people.raw",
            "include": {
              "pattern": ["(.* )?[lL][iI][nN][gG].*"]
            }
          }
        }
      }
    }
  }
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  }
}
Note: this could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text, while match simply looks for any of the values. Since you only have one value, the difference is pretty much irrelevant.
The problem is that you cannot limit the response to just that name, because the search API returns whole documents. That said, you can use nested documents and get the desired behavior via inner_hits.
Avoid leading wildcards whenever possible, because they simply do not work at scale. In SQL terms, that's like doing a full table scan: you effectively lose the benefit of the inverted index, because it has to be walked entirely to find the matching terms.
Combining the two should work pretty well, though. Here, I use the query to whittle the results down to what you are interested in, then reuse your aggregation's include pattern so that only the matching values are kept.
{
  "size": 0,
  "query": {
    "match_phrase": {
      "people": "Ling"
    }
  },
  "aggs": {
    "person": {
      "terms": {
        "size": 10,
        "field": "people.raw",
        "include": {
          "pattern": ["(.* )?[lL][iI][nN][gG].*"]
        }
      }
    }
  }
}
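The include regex used in both queries can be generated for any prefix. This Python sketch (the helper names are hypothetical) turns each letter into a two-case character class and prepends "(.* )?" so the prefix may start any word, reproducing "(.* )?[lL][iI][nN][gG].*" for "ling". Lucene regular expressions must match the whole term, so the local check uses re.fullmatch; for this simple pattern Python's syntax behaves the same way. Non-letter characters that are reserved in Lucene regexp syntax are backslash-escaped; that reserved-character list is an assumption to verify against the regexp syntax docs for your version.

```python
import re
from typing import Iterable, List

# Characters reserved in Lucene regexp syntax (assumed list; verify for your version).
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def word_prefix_pattern(prefix: str) -> str:
    """Case-insensitive 'some word starts with prefix' regex for a whole term."""
    parts = []
    for ch in prefix:
        if ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch in LUCENE_RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "(.* )?" + "".join(parts) + ".*"

def keep_matching(terms: Iterable[str], pattern: str) -> List[str]:
    # Lucene regexps are anchored to the whole term, hence fullmatch rather than search.
    return [t for t in terms if re.fullmatch(pattern, t)]

pattern = word_prefix_pattern("ling")
print(pattern)  # (.* )?[lL][iI][nN][gG].*
print(keep_matching(["Ling Deponte", "Dana Madin", "Bennie Ling", "Darling Smith"], pattern))
# ['Ling Deponte', 'Bennie Ling']
```

"Darling Smith" is excluded because the optional group must end in a space, so "ling" only matches at the start of a word.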
Hi, please find a query below; it may help with your request:
GET skills/skill/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "wildcard": {
                "skillNames.raw": "jav*"
              }
            }
          ]
        }
      }
    }
  }
}
My intention is to find documents whose skillNames start with "jav".

Counting with range and terms in Elasticsearch

I'm attempting to write a count query that returns the number of unsuccessful attempts to log into my system within the last 10 minutes. I created this query:
{
  "term": {
    "success": false
  },
  "range": {
    "_timestamp": {
      "gt": "now-10m"
    }
  }
}
However, this returns all of the unsuccessful attempts for any time, disregarding the range filter in my query. Am I structuring this query correctly? The query works when I do a search with terms and ranges.
In other words, the output of the above query and curl -XGET localhost:9200/application/_count is the same (I have only tested unsuccessful attempts).
Try using the search_type parameter instead of the count API. This is actually preferred:
curl -XGET 'localhost:9200/application/_search?search_type=count' -d '{
  "query": ...
}'
Documentation:
http://www.elasticsearch.org/guide/reference/api/search/search-type/
The range is a filter, so I think you have to create a filtered query for it to be taken into account correctly:
{
  "filtered": {
    "query": {
      "term": {
        "success": false
      }
    },
    "filter": {
      "range": {
        "_timestamp": {
          "gt": "now-10m"
        }
      }
    }
  }
}
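For a current cluster, here is a hedged Python sketch of the same count. The _timestamp meta-field was deprecated in 2.0 and removed in 5.0, so this assumes the events carry an ordinary date field, hypothetically named timestamp, and that success is a boolean field as in the question. The range could keep server-side date math ("now-10m"); the helper below instead computes the cutoff on the client as epoch milliseconds, which is handy when the same cutoff must be reused across several requests. Client and server clocks may differ slightly.

```python
import time
from typing import Dict, Optional

UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def cutoff_ms(amount: int, unit: str, now_ms: Optional[int] = None) -> int:
    """Client-side equivalent of the date-math expression now-<amount><unit>."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - amount * UNIT_MS[unit]

def failed_logins_count_body(minutes: int = 10, now_ms: Optional[int] = None) -> Dict:
    """Body for POST /application/_count: failed logins within the last `minutes`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"success": False}},
                    {"range": {"timestamp": {
                        "gt": cutoff_ms(minutes, "m", now_ms),
                        "format": "epoch_millis",
                    }}},
                ]
            }
        }
    }

print(cutoff_ms(10, "m", now_ms=1_700_000_000_000))  # 1699999400000
```

Both conditions sit in the bool filter, so they are ANDed, which is exactly what the original two-key body failed to express.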
