Elasticsearch query results return wrong results - elasticsearch

I'm trying to do a query for server logs. The search is returning results but there are a couple of issues.
1) I'm specifying the server name, yet I'm getting results back for other servers in the same domain.
2) Even though I'm specifying that the query should return results from the past hour, they're coming back from two hours before, i.e. if I perform the search at 1pm, the results are returned from 12pm. The search returns the correct results if I sort by timestamp, but the results then seem to take longer to appear, so I would rather not do that unless I have to.
Any help you can give is greatly appreciated.
Here's my query (with edited log name and server name):
var searchParams = {
  index: 'logs*',
  "body": {
    "from" : 0, "size": 50,
    "sort": [
      {
        "timestamp": {
          "order": "desc",
          "unmapped_type": "boolean"
        }
      }
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match" : {"gl2_source_input" : "579f7b6696d78a4f6cbfa745"},
            "match" : {"source" : "server01.fakedomain.com"},
            "match" : {"EventID" : "5145"}
          },
          {
            "range": {
              "timestamp": {
                "gte": "now-1h",
                "lte": "now/m",
                "time_zone": "-05:00"
              }
            }
          }
        ],
        "must_not": []
      }
    },
  }
}

A couple of things here:
If you want to match a keyword exactly, use a term query on a keyword-type field.
Unless you need your results to be scored, use a filter clause instead of the must clause.
So your query could look something like this (assuming that your filter fields are keyword-type fields):
var searchParams = {
  index: 'logs*',
  "body": {
    "from" : 0, "size": 50,
    "sort": [
      {
        "timestamp": {
          "order": "desc",
          "unmapped_type": "boolean"
        }
      }
    ],
    "query": {
      "bool": {
        "filter": [
          { "term" : {"gl2_source_input" : "579f7b6696d78a4f6cbfa745"} },
          { "term" : {"source" : "server01.fakedomain.com"} },
          { "term" : {"EventID" : "5145"} },
          {
            "range": {
              "timestamp": {
                "gte": "now-1h",
                "lte": "now/m",
                "time_zone": "-05:00"
              }
            }
          }
        ]
      }
    },
  }
}
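
It may also be worth checking how the first element of must was written in the question. The three match clauses sit inside a single object literal, and in JavaScript a repeated key in an object literal silently overwrites the earlier one, so only the last match (on EventID) would survive. That could be one reason hits come back for other servers. A minimal sketch of the effect, reusing the field names and values from the question (the variable names are only for illustration):

```javascript
// Three "match" keys in ONE object literal: each later key overwrites the earlier one.
const collapsed = {
  "match": { "gl2_source_input": "579f7b6696d78a4f6cbfa745" },
  "match": { "source": "server01.fakedomain.com" },
  "match": { "EventID": "5145" }
};

// Only the EventID clause is left in the object.
console.log(JSON.stringify(collapsed)); // {"match":{"EventID":"5145"}}

// One object per clause keeps all three conditions.
const separate = [
  { "match": { "gl2_source_input": "579f7b6696d78a4f6cbfa745" } },
  { "match": { "source": "server01.fakedomain.com" } },
  { "match": { "EventID": "5145" } }
];
console.log(separate.length); // 3
```

The filter version above already uses one object per clause, so it does not have this problem.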

Related

Optimize ES query with too many terms elements

We are processing a dataset of billions of records, currently all of the data are saved in ElasticSearch, and all of the queries and aggregations are performed with ElasticSearch.
The simplified query body is shown below. We put the device ids in terms clauses and then concatenate them with should to avoid the limit of 1024 values per terms clause. The total number of terms values is up to 100,000, and the query has become very slow.
{
"_source": {
"excludes": [
"raw_msg"
]
},
"query": {
"filter": {
"bool": {
"must": [
{
"range": {
"create_ms": {
"gte": 1664985600000,
"lte": 1665071999999
}
}
}
],
"should": [
{
"terms": {
"device_id": [
"1328871",
"1328899",
"1328898",
"1328934",
"1328919",
"1328976",
"1328977",
"1328879",
"1328910",
"1328902",
... # more values; since terms does not support more than 1024 values, we concatenate all of them with should
]
}
},
{
"terms": {
"device_id": [
"1428871",
"1428899",
"1428898",
"1428934",
"1428919",
"1428976",
"1428977",
"1428879",
"1428910",
"1428902",
...
]
}
},
... # concatenate more terms until all of the 100,000 values are included
],
"minimum_should_match": 1
}
}
},
"aggs": {
"create_ms": {
"date_histogram": {
"field": "create_ms",
"interval": "hour",
}
}
},
"size": 0}
My question is: is there a way to optimize this case? Or is there a better way to do this kind of search?
Realtime or near-realtime is a must; another engine is acceptable.
Simplified schema of the data:
"id" : {
"type" : "long"
},
"content" : {
"type" : "text"
},
"device_id" : {
"type" : "keyword"
},
"create_ms" : {
"type" : "date"
},
... # more field
You can use the terms query with a terms lookup to specify a larger list of values.
Store your ids in a dedicated document with an id such as 'device_ids', then reference that document from the query:
"should": [
  {
    "terms": {
      "device_id": {
        "index": "your-index-name",
        "id": "device_ids",
        "path": "field-name"
      }
    }
  }
]
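
As a rough sketch of how the two pieces fit together, the helpers below build the lookup document and a search body that references it. The function names, the "device-lists" index, and the "ids" field are hypothetical; only the shape of the terms lookup (index, id, path) and the device_id and create_ms fields come from the thread. The code only builds plain objects, so sending them to the cluster is left to whichever client you use.

```javascript
// Hypothetical helper: the document that stores the id list.
function buildLookupDocument(deviceIds) {
  // De-duplicate so the stored list is no longer than it has to be.
  return { ids: Array.from(new Set(deviceIds)) };
}

// Hypothetical helper: a search body whose terms clause points at that document.
function buildTermsLookupQuery(lookupIndex, lookupId, path, fromMs, toMs) {
  return {
    size: 0,
    query: {
      bool: {
        filter: [
          { range: { create_ms: { gte: fromMs, lte: toMs } } },
          { terms: { device_id: { index: lookupIndex, id: lookupId, path: path } } }
        ]
      }
    }
  };
}

const lookupDoc = buildLookupDocument(["1328871", "1328899", "1328871"]);
const lookupQuery = buildTermsLookupQuery(
  "device-lists", "device_ids", "ids", 1664985600000, 1665071999999
);
console.log(JSON.stringify(lookupDoc));
console.log(JSON.stringify(lookupQuery, null, 2));
```

Both clauses go in filter here because the original query does not need scoring; the aggregation from the question can be added to the body unchanged.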

Elastic search bool query

My objective is to find the 10 most recent documents whose message id is MSG-1013 and whose Severity field is Info. Both conditions must be satisfied, and the text match should be exact. I have tried the search query below, but it does not give me the expected results. What am I doing wrong here?
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": { "messageId": "MSG-1013" }
        },
        {
          "match": { "Severity": "Info" }
        }
      ]
    }
  }
}
If I have understood you correctly, you want the 10 most recent documents whose "messageId" and "Severity" fields match exactly. I assume you don't need a relevance score, because the ordering you care about comes from the document timestamp or some other date field. For this purpose, you could use a bool filter in combination with a sort clause.
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "messageId": "MSG-1013" } },
        { "term": { "Severity": "Info" } }
      ]
    }
  },
  "sort" : [
    { "documentTimestamp" : {"order" : "desc"}}
  ],
  "size": 10
}
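
If the same pattern is needed for other field combinations, the body can be generated rather than written by hand. The helper below is a minimal sketch: recentExactMatches is a hypothetical name, and documentTimestamp is the assumed date field from the answer, not something confirmed by the question. As with the hand-written version, term only matches exactly when the fields are indexed as keyword (or not_analyzed) values.

```javascript
// Hypothetical helper: one term filter per field, newest documents first.
function recentExactMatches(exactFields, sortField, size) {
  const filter = Object.entries(exactFields).map(
    ([field, value]) => ({ term: { [field]: value } })
  );
  return {
    size: size,
    query: { bool: { filter: filter } },
    sort: [{ [sortField]: { order: "desc" } }]
  };
}

const recentBody = recentExactMatches(
  { messageId: "MSG-1013", Severity: "Info" },
  "documentTimestamp", // assumed date field
  10
);
console.log(JSON.stringify(recentBody, null, 2));
```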

Elasticsearch - Aggregations on part of bool query

Say I have this bool query:
"bool" : {
  "should" : [
    { "term" : { "FirstName" : "Sandra" } },
    { "term" : { "LastName" : "Jones" } }
  ],
  "minimum_should_match" : 1
}
meaning I want to match all the people with first name Sandra OR last name Jones.
Now, is there any way that I can perform an aggregation on only the documents that matched the first term?
For example, I want to get all of the unique values of "Prizes" that anybody named Sandra has. Normally I'd just do:
"query": {
  "match": {
    "FirstName": "Sandra"
  }
},
"aggs": {
  "Prizes": {
    "terms": {
      "field": "Prizes"
    }
  }
}
Is there any way to combine the two so I only have to perform a single query which returns all of the people with first name Sandra or last name Jones, AND an aggregation only on the people with first name Sandra?
Thanks a lot!
Use post_filter.
Please refer to the following query. post_filter makes sure that your bool should clause doesn't affect your aggregation scope.
Aggregations are scoped by the main query, but they are unaffected by post_filter.
{
  "from": 0,
  "size": 20,
  "aggs": {
    "filtered_lastname": {
      "filter": {
        "query": {
          "match": {
            "FirstName": "sandra"
          }
        }
      },
      "aggs": {
        "prizes": {
          "terms": {
            "field": "Prizes",
            "size": 10
          }
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "should": [{
        "term": {
          "FirstName": "Sandra"
        }
      }, {
        "term": {
          "LastName": "Jones"
        }
      }],
      "minimum_should_match": 1
    }
  }
}
Running a filter inside the aggs before aggregating on prizes should give you the result you are after.
Thanks
Hope this helps
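
To make the two scopes concrete, here is a small in-memory imitation of what the request above does. It does not talk to Elasticsearch at all, and the sample people and prizes are invented for illustration; it only mirrors the idea that post_filter decides the hits while the filter aggregation decides what gets counted.

```javascript
// Made-up sample documents, for illustration only.
const people = [
  { FirstName: "Sandra", LastName: "Smith", Prizes: "gold" },
  { FirstName: "Sandra", LastName: "Jones", Prizes: "silver" },
  { FirstName: "Peter", LastName: "Jones", Prizes: "bronze" },
  { FirstName: "Maria", LastName: "Lopez", Prizes: "gold" }
];

const isSandra = (p) => p.FirstName === "Sandra";
const isJones = (p) => p.LastName === "Jones";

// post_filter: decides which documents come back as hits (Sandra OR Jones).
const hits = people.filter((p) => isSandra(p) || isJones(p));

// filter aggregation: narrows to Sandra before the terms aggregation counts prizes.
const prizeCounts = {};
for (const p of people.filter(isSandra)) {
  prizeCounts[p.Prizes] = (prizeCounts[p.Prizes] || 0) + 1;
}

console.log(hits.length);  // 3
console.log(prizeCounts);  // { gold: 1, silver: 1 }
```

Peter Jones appears in the hits but his prize is not counted, and Maria Lopez affects neither, which is the behaviour the question asks for.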

elasticsearch inner join

I have an index with several fields. Some of my documents contain valid "category" data and "url" data (an analyzed field) but no "respsize".
On the other hand, documents that contain "respsize" data (greater than 0) also contain "url" data but no "category" data.
I think you get the point: I need a join or intersection, i.e. a query that returns all documents containing respsize and category for the same url.
Here is what I have done so far (the url field is analyzed, the rest are not_analyzed).
Here are documents that have category:
and here are other documents that have respsize, which I need to combine with them based on url.
I need a DSL query that returns records sharing the same url token (in this scenario, www.domainname.com), merging category and respsize.
I simply want the field "category":"27" from the first image to appear in the documents of the second image, along with all of their other fields.
Here is my query, but it does not work:
GET webproxylog/accesslog/_search
{
"query": {
"filtered": {
"filter" : {
"and" : {
"filters": [
{
"not": {
"filter": {
"terms": {
"category": [
"-",
"-1",
"0"
]
},
"term": {
"respsize": "0"
}
}
},
"term": {
"category": "www.hurriyet.com.tr"
}
}
],
"_cache" : true
}
}
}
},
"sort": [
{
"respsize": {
"order": "desc"
}
}
]
}
You can try the query below. It requires the url field to match the one you specify (the must clause), and then at least one of the next two clauses (the should clause) must be true: either category is not one of the given terms, or respsize is greater than 0.
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "url": "www.hurriyet.com.tr"
              }
            }
          ],
          "should": [
            {
              "not": {
                "terms": {
                  "category": [
                    "-",
                    "-1",
                    "0"
                  ]
                }
              }
            },
            {
              "range": {
                "respsize": {
                  "gt": 0
                }
              }
            }
          ]
        }
      }
    }
  }
}
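
Note that this query returns both kinds of documents for the url but does not merge them. As far as I know, a search like this will not copy a field from one document onto another, so the merge the question asks for would probably have to happen in the client after the search. The sketch below shows one way to do that on the returned _source objects; mergeByUrl is a hypothetical helper, and the sample documents are invented apart from the url, the category value "27", and the placeholder categories listed in the question.

```javascript
// Category values the question treats as "no category".
const INVALID_CATEGORIES = new Set(["-", "-1", "0"]);

// Hypothetical helper: copy "category" onto documents with the same url
// that carry a positive "respsize".
function mergeByUrl(docs) {
  const categoryByUrl = new Map();
  for (const doc of docs) {
    const hasCategory =
      doc.category !== undefined && !INVALID_CATEGORIES.has(String(doc.category));
    if (hasCategory && !categoryByUrl.has(doc.url)) {
      categoryByUrl.set(doc.url, doc.category);
    }
  }
  return docs
    .filter((doc) => Number(doc.respsize) > 0)
    .map((doc) =>
      categoryByUrl.has(doc.url)
        ? { ...doc, category: categoryByUrl.get(doc.url) }
        : doc
    );
}

// Invented sample hits for illustration.
const merged = mergeByUrl([
  { url: "www.hurriyet.com.tr", category: "27" },
  { url: "www.hurriyet.com.tr", category: "-", respsize: 512 },
  { url: "www.example.org", respsize: 100 }
]);
console.log(JSON.stringify(merged));
```

Documents whose url has no valid category anywhere in the result set are returned unchanged, so nothing is invented for them.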

elasticsearch combine multiple queries

I have an elasticsearch index for storing information about people.
To find specific persons I have some queries, each of them works alone but when I combine them using Bool Query I get an error.
One of the queries is a fuzzy search for the name
{
"query": {
"fuzzy_like_this": {
"fields": [
"firstname",
"lastname"
],
"like_text": "Peter"
}
}
}
Another query is for searching people who are born in a specific date range
{
"query": {
"range": {
"birthdate": {
"from": "1988-12-30",
"to": "1993-12-30"
}
}
}
}
Now I want to combine these two queries. My bool query:
{
  "query": {
    "bool": {
      "must": [
        {
          "query": {
            "fuzzy_like_this": {
              "fields": [
                "firstname",
                "lastname"
              ],
              "like_text": "Peter"
            }
          }
        },
        {
          "query": {
            "range": {
              "birthdate": {
                "from": "1988-12-30",
                "to": "1993-12-30"
              }
            }
          }
        }
      ]
    }
  }
}
Although both queries work fine when I use them separately, when combining them I get an error.
There are people in my index whose firstname is Peter AND are born in this date range, but even if there were no people found I should get 0 results instead of an error.
The error says:
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;
nested: QueryParsingException[[indexname] No query registered for [query]]
Is combining queries the way I want to not possible with a bool query or did I just use the wrong syntax?
I think that you have a syntax error: the query keyword is not needed for the clauses inside must. In other words, it should be as follows:
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy_like_this": {
            "fields": [
              "firstname",
              "lastname"
            ],
            "like_text": "Peter"
          }
        },
        {
          "range": {
            "birthdate": {
              "from": "1988-12-30",
              "to": "1993-12-30"
            }
          }
        }
      ]
    }
  }
}
More info about boolean queries here
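
If the standalone queries already exist as separate search bodies, the unwrapping can be done mechanically. The helper below is a small sketch (combineWithMust is a hypothetical name): it takes bodies of the form { query: ... } and places only the inner query of each one inside must, which is the change described above.

```javascript
// Hypothetical helper: strip the outer "query" wrapper from each standalone
// body and combine the inner queries under bool.must.
function combineWithMust(...searchBodies) {
  return {
    query: {
      bool: {
        must: searchBodies.map((body) => body.query)
      }
    }
  };
}

// The two standalone queries from the question.
const nameSearch = {
  query: {
    fuzzy_like_this: { fields: ["firstname", "lastname"], like_text: "Peter" }
  }
};
const birthdateSearch = {
  query: {
    range: { birthdate: { from: "1988-12-30", to: "1993-12-30" } }
  }
};

const combined = combineWithMust(nameSearch, birthdateSearch);
console.log(JSON.stringify(combined, null, 2));
```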
