Elasticsearch: count documents from query match

I would like to get the number of documents that match a specific string within a time range.
How can I add a time range to this query?
GET myindex/_count
{
"query": {
"match" : {
"log" : "ERROR"
}
}
}
To get a time range:
{
"query": {
"range": {
"msgSubmissionTime": {
"gte": "now-10m",
"lt": "now"
}
}
}
}
Is there a way to combine both queries?

The other answers are correct, but both wrap the must clause in an array ([ ]), which is only needed when must holds more than one clause. With a single match it can be a plain object:
GET myindex/_count
{
"query": {
"bool" : {
"must" : {
"match" : { "log": "ERROR" }
},
"filter":
{
"range": {
"msgSubmissionTime": {
"gte": "now-10m",
"lte": "now"
}
}
}
}
}
}

Sure you can.
It can be done by combining a boolean query with a filter.
Elastic recommends using filters to narrow down results: filter clauses skip scoring and can be cached, so they are usually faster than scored queries.
Boolean queries combine other queries with AND, OR and NOT logic (must, should and must_not).
The official Elasticsearch documentation for the bool query has an example that almost fits your question.
So your query will look like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"log": "ERROR"
}
}
],
"filter": {
"range": {
"msgSubmissionTime": {
"gte": "now-10m",
"lte": "now"
}
}
}
}
}
}
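For readers calling this from Python, the combined body can be assembled as a plain dictionary before it is sent to the _count endpoint. This is only a minimal sketch: the helper name build_count_body and its parameters are made up for illustration and are not part of any Elasticsearch client; the field names come from the question.

```python
import json


def build_count_body(text_field, text, time_field, window="10m"):
    """Return a _count body: full-text match in `must`, time range in `filter`.

    Hypothetical helper for illustration; `window` uses Elasticsearch
    date-math units such as "10m" or "1h".
    """
    return {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": {"match": {text_field: text}},
                # Unscored, cacheable time restriction.
                "filter": {
                    "range": {time_field: {"gte": f"now-{window}", "lt": "now"}}
                },
            }
        }
    }


body = build_count_body("log", "ERROR", "msgSubmissionTime")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET myindex/_count as-is.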

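Try this. Note that term does not analyze its input, so it only matches when log is a keyword field or the indexed token is exactly ERROR; for an analyzed text field, use match instead.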
{
"query": {
"bool": {
"filter": {
"range": {
"msgSubmissionTime": {
"gte": "now-10m",
"lt": "now"
}
}
},
"must": [
{
"term": {
"log" : "ERROR"
}
}
]
}
}
}

Related

Compound query with Elasticsearch

I'm trying to perform a search with the intended criteria being (activationDate in range 1598889600 to 1602051579) or someFlag=true.
Below is the query I tried, but it does not yield any records with someFlag=true (even with a big size, e.g. 5000). My Elasticsearch does have a lot of records with someFlag=true.
There are about 3000 total documents and this query returns around 280 documents.
{
"query": {
"bool": {
"must": [
{
"range": {
"activationDate": {
"gte": 1598889600
}
}
},
{
"range": {
"activationDate": {
"lte": 1602051579
}
}
}
],
"should": {
"match": {
"someFlag": true
}
}
}
},
"from": 1,
"size": 1000
}
Am I missing something?
This should work:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"range": {
"activationDate": {
"gte": 1598889600,
"lte": 1602051579
}
}
},
{
"term": {
"someFlag": true
}
}
]
}
}
]
}
}
}
In theory this should do the same:
{
"query": {
"bool": {
"should": [
{
"range": {
"activationDate": {
"gte": 1598889600,
"lte": 1602051579
}
}
},
{
"term": {
"someFlag": true
}
}
]
}
}
}
However, the first query wraps the bool clause in a filter context, so no scores are computed and the result can be cached.
Your original query most likely missed the someFlag=true records because, when a bool query has must clauses, its should clauses are optional: they only raise the score of documents that already satisfy must. You were also using a match query rather than term; match is normally meant for full-text search, while term suits exact values such as booleans.
Replace the must with a should and set minimum_should_match to 1: this is an OR query, so it is enough for a record to meet just one of the criteria. Then reduce the two range clauses to a single one that combines gte and lte.
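The difference between the two query shapes can be seen with a toy model of bool matching in plain Python. This is not how Elasticsearch is implemented; it only mimics the rule that should clauses become optional once a must clause is present. The sample documents and the matches helper are invented for illustration.

```python
def matches(doc, must=(), should=(), minimum_should_match=None):
    """Toy model of a bool query; clauses are plain Python predicates."""
    if minimum_should_match is None:
        # Mimics the default: should is required only when there is no must.
        minimum_should_match = 1 if (should and not must) else 0
    if not all(clause(doc) for clause in must):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


def in_range(doc):
    return 1598889600 <= doc["activationDate"] <= 1602051579


def flagged(doc):
    return doc["someFlag"] is True


# Hypothetical documents.
docs = [
    {"id": "a", "activationDate": 1600000000, "someFlag": False},  # in range
    {"id": "b", "activationDate": 1500000000, "someFlag": True},   # flag only
    {"id": "c", "activationDate": 1500000000, "someFlag": False},  # neither
]

# Shape of the question's query: range in must, flag in should.
original = [d["id"] for d in docs if matches(d, must=[in_range], should=[flagged])]
# Shape of the answers: both conditions in should.
fixed = [d["id"] for d in docs if matches(d, should=[in_range, flagged])]

print(original)
print(fixed)
```

With the question's shape, document b is dropped because it fails the must clause; with both conditions in should, it is returned.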

Difference between the result of two date fields then getting average

I want to get the average of the difference between two date fields in an Elasticsearch index. I have written a query that returns the last 1000 results, but I am not sure how to compute the difference for each result and then an overall average.
Elastic query below:
POST my_index/_search
{
"size":1000,
"_source": ["date.time.received","date.time.sent"],
"query": {
"bool": {
"must": [
{
"range": {
"date.time.received": {
"gte": "2019-06-19"
}
}
},
{
"range": {
"date.time.sent": {
"gte": "2019-06-19"
}
}
}
]
}
}
}
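You can use an avg aggregation with a script. The script below converts the difference from milliseconds to days; note that dividing long values is integer division, so each per-document difference is truncated to whole days: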
POST testindex5/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"date.time.received": {
"gte": "2019-06-19"
}
}
},
{
"range": {
"date.time.sent": {
"gte": "2019-06-19"
}
}
}
]
}
},
"aggs": {
"avg_resp": {
"avg": {
"script": "(doc['date.time.received'].value.toInstant().toEpochMilli()- doc['date.time.sent'].value.toInstant().toEpochMilli())/1000/86400"
}
}
}
}
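To check the arithmetic of the script, the same calculation can be reproduced client-side on a couple of documents. The sketch below uses two made-up documents and flattened field names (sent, received), not the real mapping; it compares the truncating integer division used in the script with a floating-point division.

```python
from datetime import datetime

# Hypothetical documents; real hits would carry date.time.sent / date.time.received.
hits = [
    {"sent": "2019-06-19T08:00:00", "received": "2019-06-20T20:00:00"},  # 36 hours
    {"sent": "2019-06-20T00:00:00", "received": "2019-06-22T12:00:00"},  # 60 hours
]


def diff_millis(doc):
    """Difference received - sent, in whole milliseconds."""
    sent = datetime.fromisoformat(doc["sent"])
    received = datetime.fromisoformat(doc["received"])
    return int((received - sent).total_seconds() * 1000)


# Integer division as in the script: each value is cut down to whole days
# (equivalent for the non-negative differences used here).
truncated = [diff_millis(d) // 1000 // 86400 for d in hits]
# Floating-point division keeps the fractional part of each day.
exact = [diff_millis(d) / 86_400_000 for d in hits]

avg_truncated = sum(truncated) / len(truncated)
avg_exact = sum(exact) / len(exact)
print(avg_truncated, avg_exact)  # prints: 1.5 2.0
```

If fractional days matter, dividing by a floating-point constant in the script avoids the truncation.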

Putting two queries together

How can I put these two queries together? Query one brings back all the data from today, and query two brings back data for all users whose name contains test.
So I want to bring back all of today's data for users whose name contains test.
Could someone show me how this is done, please?
Query one:
{
"_source":["VT"],
"query": {
"range": {
"VT": {
"gte": "now/d",
"lt": "now/d+13h"
}
}}
}
Query two:
from elasticsearch import Elasticsearch
es = Elasticsearch(["9200"])
res = es.search(index="search", body=
{
"_source": ["DTDT", "TRDT"],
"query": {
"bool": {
"should": [
{"wildcard": {"N": "TEST*"}}
]
}
}
}, size=10
)
for doc in res['hits']['hits']:
    print(doc)
You can use a bool query with two must clauses, like this:
{
"_source": ["DTDT", "TRDT", "VT"],
"query": {
"bool": {
"must": [
{
"range": {
"VT": {
"gte": "now/d",
"lt": "now/d+13h"
}
}
},
{
"wildcard": {
"N": "TEST*"
}
}
]
}
}
}
Check out the docs for the bool query.
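Since the question uses the Python client, here is a sketch of how the combined body might be built and sent from Python. It is a variation on the answer above rather than a copy: the range goes into filter instead of must, because the date restriction does not need to contribute to the score. The helper names are hypothetical, and run_search expects a client object created elsewhere; the es.search call mirrors the one in the question.

```python
def build_today_prefix_body(name_prefix, source_fields):
    """Body combining a wildcard on N with today's 00:00-13:00 range on VT."""
    return {
        "_source": source_fields,
        "query": {
            "bool": {
                "must": [{"wildcard": {"N": f"{name_prefix}*"}}],
                "filter": [
                    {"range": {"VT": {"gte": "now/d", "lt": "now/d+13h"}}}
                ],
            }
        },
    }


body = build_today_prefix_body("TEST", ["DTDT", "TRDT", "VT"])


def run_search(es, index="search", size=10):
    """Run the search; `es` is an Elasticsearch client created by the caller."""
    res = es.search(index=index, body=body, size=size)
    return [hit["_source"] for hit in res["hits"]["hits"]]
```

Calling run_search(es) then returns only the _source of each hit, which is usually what the loop in the question is after.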
This will help you:
POST _search
{
"query": {
"bool": {
"must": [
{
"range": {
"VT": {
"gte": "now/d",
"lt": "now/d+13h"
}
}
},
{
"match": {
"N": {
"query": "TEST",
"operator": "and"
}
}
}]
}
}
}

Elasticsearch 5.*: query for a field that does not exist or, if it exists, has a given value

Consider that the stop field below is a timestamp field.
I want to filter data with the following condition:
the stop field does not exist,
or the stop field value is >= now.
I know I should use must_not, but I cannot figure out how.
I want to do some scoring on the child type and use this score to sort the parents, then filter out parents using the stop field.
GET indexName/parentType/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "child-type",
"score_mode": "max",
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": {
"file": "score-analytics",
"lang": "expression"
}
}
}
]
}
}
}
},
{
"bool": {
"should": [
{
"range": {
"stop": {
"gte": "now"
}
}
}
]
}
}
]
}
}
}
You need an exists query, negated with must_not, as the second should clause:
"bool": {
"should": [
{
"range": {
"stop": { "gte": "now" }
}
},
{
"bool": {
"must_not": {
"exists": { "field": "stop" }
}
}
}
]
}
What if you use a must_not bool condition to match documents where the field does not exist? Your query could look something like this, where exists names the field that must be absent (the filtered query was removed in Elasticsearch 5.0, so the bool query is used directly):
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "stop"
}
}
]
}
}
The above is only a sample for you to reproduce. For the second condition, the range query you already have should do; I am not aware of a better way to express a timestamp range. Hope it helps!
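Putting the two conditions together, a "missing or not yet passed" clause can be sketched as a small Python builder and attached as a filter next to the scoring query. The function names are hypothetical, and the match_all below is only a stand-in for the has_child query from the question.

```python
def missing_or_gte(field, value="now"):
    """Clause matching docs where `field` is absent OR `field` >= value."""
    return {
        "bool": {
            "should": [
                {"bool": {"must_not": {"exists": {"field": field}}}},
                {"range": {field: {"gte": value}}},
            ],
            "minimum_should_match": 1,
        }
    }


def with_stop_filter(scoring_query, field="stop"):
    """Keep the scoring query in must; apply the stop condition unscored."""
    return {
        "query": {
            "bool": {
                "must": [scoring_query],
                "filter": [missing_or_gte(field)],
            }
        }
    }


# Stand-in for the has_child / function_score query of the question.
body = with_stop_filter({"match_all": {}})
```

Keeping the stop condition in filter means it restricts the parents without changing the score that comes from the children.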

Filter with match_all VS query

I have 2 types of queries. They are logically identical, but I'm not sure whether there is any performance difference between the two.
I would be glad if someone could enlighten me.
Using a term query with a filter:
{
"query": {
"filtered": {
"query": {
"term": {
"user_id": "1234567"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Using match_all with a filter:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
}
}
Looking at your queries, it seems you don't care how documents are scored on the user_id field being "1234567". In other words, if more than one document has user_id set to "1234567", you don't care about their order in the result. If that is the case, the 2nd option performs better, because scoring in the 1st query has a computation cost while the 2nd query does no scoring. By the way, your 2nd query can also be simplified to the following:
{
"filter": {
"bool": {
"must": [
{
"term": {
"user_id": "1234567"
}
},
{
"range": {
"ephoc_date": {
"lt": 1437033590,
"gte": 1437026390
}
}
}
]
}
}
}
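Note that the filtered query used in this thread was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions the same intent is expressed with a bool query whose filter clause carries the unscored conditions. The converter below is a rough sketch of that rewrite for simple cases; filtered_to_bool is a made-up helper, not a library function.

```python
import json


def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as an equivalent {"bool": {...}} query."""
    filtered = query["filtered"]
    bool_body = {}
    inner = filtered.get("query")
    # A match_all inner query adds nothing once a filter is present.
    if inner and "match_all" not in inner:
        bool_body["must"] = inner
    if "filter" in filtered:
        bool_body["filter"] = filtered["filter"]
    return {"bool": bool_body}


# The second query from the question, in its legacy form.
legacy = {
    "filtered": {
        "query": {"match_all": {}},
        "filter": {
            "bool": {
                "must": [
                    {"term": {"user_id": "1234567"}},
                    {"range": {"ephoc_date": {"lt": 1437033590, "gte": 1437026390}}},
                ]
            }
        },
    }
}

modern = filtered_to_bool(legacy)
print(json.dumps({"query": modern}, indent=2))
```

The performance argument from the answer carries over: clauses under filter are not scored.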
