Filter with match_all VS query - performance

I have two types of queries. They are logically identical, but I'm not sure whether there is any performance difference between them.
I would be glad if someone could enlighten me.
Using a term query and a filter:
{
  "query": {
    "filtered": {
      "query": {
        "term": {
          "user_id": "1234567"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "ephoc_date": {
                  "lt": 1437033590,
                  "gte": 1437026390
                }
              }
            }
          ]
        }
      }
    }
  }
}
Using match_all and a filter:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "user_id": "1234567"
              }
            },
            {
              "range": {
                "ephoc_date": {
                  "lt": 1437033590,
                  "gte": 1437026390
                }
              }
            }
          ]
        }
      }
    }
  }
}

Looking at your queries, it seems you don't care how documents are scored on the user_id field being "1234567". In other words, if more than one document has user_id set to "1234567", you don't care about their order in the results. If that is the case, the 2nd option performs better: the 1st query pays a computation cost for scoring the term match, while the 2nd query runs everything as a filter and does no scoring. By the way, your 2nd query can also be simplified to the following:
{
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "user_id": "1234567"
          }
        },
        {
          "range": {
            "ephoc_date": {
              "lt": 1437033590,
              "gte": 1437026390
            }
          }
        }
      ]
    }
  }
}
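The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; on later versions the same score-free search is written as a bool query with filter clauses. A minimal sketch in Python that builds such a request body, assuming the field names from the question (the function name build_user_window_query is hypothetical):

```python
import json


def build_user_window_query(user_id, start, end):
    """Build a score-free query body: both conditions run in filter context."""
    return {
        "query": {
            "bool": {
                # Clauses under "filter" must match but contribute nothing
                # to the score, so no scoring work is done for them.
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"range": {"ephoc_date": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = build_user_window_query("1234567", 1437026390, 1437033590)
print(json.dumps(body, indent=2))
```

The printed body can be sent to the _search endpoint as-is; only the dictionary construction is shown here.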

Related

need something like coalesce in elasticsearch

My current elasticsearch query is-
{
  "must": [
    {
      "range": {
        "firstClosedAt": {
          "gte": 1667948400000,
          "lte": 1668034800000
        }
      }
    },
    {
      "term": {
        "status": "CLOSED"
      }
    }
  ]
}
I want to modify it so that if "firstClosedAt" is null or not present, it looks at "closedAt" instead.
Just like coalesce("firstClosedAt","closedAt") in SQL.
Help would be appreciated.
There's no coalesce equivalent in ES, but you can write the query as below, which reads as: "either firstClosedAt is in the range, OR firstClosedAt does not exist and closedAt is in the range":
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status": "CLOSED"
          }
        },
        {
          "bool": {
            "minimum_should_match": 1,
            "should": [
              {
                "range": {
                  "firstClosedAt": {
                    "gte": 1667948400000,
                    "lte": 1668034800000
                  }
                }
              },
              {
                "bool": {
                  "must_not": {
                    "exists": {
                      "field": "firstClosedAt"
                    }
                  },
                  "filter": {
                    "range": {
                      "closedAt": {
                        "gte": 1667948400000,
                        "lte": 1668034800000
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
You could, however, write a much simpler query if you add another date field at indexing time that takes the value of firstClosedAt, or of closedAt if firstClosedAt does not exist.
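The index-time approach could look roughly like the sketch below, applied to each document before it is sent to Elasticsearch. The field name effectiveClosedAt and the helper with_effective_closed_at are assumptions made for illustration; they do not come from the question.

```python
def with_effective_closed_at(doc):
    """Return a copy of doc with a derived date field for simple range queries."""
    enriched = dict(doc)
    first = doc.get("firstClosedAt")
    # Fall back to closedAt only when firstClosedAt is missing or null,
    # which mirrors SQL's COALESCE(firstClosedAt, closedAt).
    enriched["effectiveClosedAt"] = first if first is not None else doc.get("closedAt")
    return enriched


docs = [
    {"status": "CLOSED", "firstClosedAt": 1667950000000, "closedAt": 1668000000000},
    {"status": "CLOSED", "closedAt": 1668000000000},
    {"status": "CLOSED", "firstClosedAt": None, "closedAt": 1668010000000},
]
for d in docs:
    print(with_effective_closed_at(d)["effectiveClosedAt"])
```

With such a field in place, the search needs only the status term and a single range clause on effectiveClosedAt.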

Compound query with Elasticsearch

I'm trying to perform a search with the intended criteria being (activationDate in range 1598889600 to 1602051579) or someFlag=true.
Below is the query I tried, but it does not yield any records with someFlag=true (even with a big size, e.g. 5000). My Elasticsearch does have a lot of records with someFlag=true.
There are about 3000 total documents and this query returns around 280 documents.
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "activationDate": {
              "gte": 1598889600
            }
          }
        },
        {
          "range": {
            "activationDate": {
              "lte": 1602051579
            }
          }
        }
      ],
      "should": {
        "match": {
          "someFlag": true
        }
      }
    }
  },
  "from": 1,
  "size": 1000
}
Am I missing something?
This should work:
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "range": {
                  "activationDate": {
                    "gte": 1598889600,
                    "lte": 1602051579
                  }
                }
              },
              {
                "term": {
                  "someFlag": true
                }
              }
            ]
          }
        }
      ]
    }
  }
}
In theory this should do the same:
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "activationDate": {
              "gte": 1598889600,
              "lte": 1602051579
            }
          }
        },
        {
          "term": {
            "someFlag": true
          }
        }
      ]
    }
  }
}
However, the first query I've given wraps the bool clause in filter context, so it does not need to compute scores and the query becomes cacheable.
Your bool query did not work because the range clauses were in must: when a bool query has a must clause, its should clauses become optional and only influence the score. You were also using a match query rather than term; match is normally meant for full-text search.
Replace the must with a should and set minimum_should_match to 1, since this is an OR query and it is enough for a record to meet just one of the criteria. Then reduce the two range clauses to a single one that combines gte and lte.
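The difference between the two shapes can be checked with a small model of the bool query in plain Python. This is a simplified sketch under stated assumptions: clauses are ordinary predicates, scoring and filter/must_not clauses are ignored, and the helper bool_matches is hypothetical rather than part of any Elasticsearch client.

```python
def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Evaluate a simplified bool query; clauses are plain predicates on a dict."""
    if minimum_should_match is None:
        # Default: should clauses are optional as soon as a must clause
        # exists; with only should clauses, at least one has to match.
        minimum_should_match = 1 if (should and not must) else 0
    if not all(clause(doc) for clause in must):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


def in_range(doc):
    return 1598889600 <= doc.get("activationDate", 0) <= 1602051579


def flagged(doc):
    return doc.get("someFlag") is True


docs = [
    {"id": 1, "activationDate": 1600000000, "someFlag": False},  # in range only
    {"id": 2, "activationDate": 1500000000, "someFlag": True},   # flag only
    {"id": 3, "activationDate": 1500000000, "someFlag": False},  # neither
]

# Shape of the question's query: range in must, flag in should.
original = [d["id"] for d in docs if bool_matches(d, must=[in_range], should=[flagged])]
# Shape of the answers' query: both conditions in should.
fixed = [d["id"] for d in docs if bool_matches(d, should=[in_range, flagged])]
print(original)  # [1]
print(fixed)     # [1, 2]
```

In the first shape the flagged document outside the range is rejected by must before should is even considered, which matches the behaviour described in the question.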

With Elasticsearch, how to use an OR instead of AND within filter->terms query?

I have the following query in Elasticsearch:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "participants.group": ["group1", "group2"]
          }
        },
        {
          "range": {
            "recordDate": {
              "gte": "2020-05-14 00:00:00.000",
              "lte": "2020-07-22 20:30:56.566"
            }
          }
        }
      ]
    }
  }
}
Currently, this finds records with participants with group "group1" and "group2".
How can I change the query so it finds records with participants from "group1" or "group2"?
Is it possible to do it without changing the structure of the query?
I'm assuming that the field participants.group is of keyword type and not text type.
Assuming that, the query you have roughly translates to (group1) or (group2) or (group1 and group2).
All you need to do is add a must_not clause that excludes records containing both groups, as below:
POST my_filter_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "range": {
                  "recordDate": {
                    "gte": "2020-05-14 00:00:00.000",
                    "lte": "2020-07-22 20:30:56.566"
                  }
                }
              }
            ],
            "minimum_should_match": 1,
            "should": [
              {
                "terms": {
                  "participants.group": ["group1", "group2"]
                }
              }
            ]
          }
        }
      ],
      "must_not": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "participants.group": "group1"
                }
              },
              {
                "term": {
                  "participants.group": "group2"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
Let me know if that works!
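To see which records this combination keeps, the group logic can be modeled in plain Python. This is only a sketch of the boolean logic, not of how Elasticsearch executes the query: the date range is left out, participants.group is modeled as a list of strings per record, and the helper names are hypothetical.

```python
WANTED = {"group1", "group2"}


def terms_match(groups, wanted=WANTED):
    """Model of the terms query: true if any value of the field is in `wanted`."""
    return any(g in wanted for g in groups)


def has_both(groups):
    """Model of the must_not body: both term clauses match the same record."""
    return "group1" in groups and "group2" in groups


def keep(groups):
    # Kept when the terms clause matches and the must_not clause does not.
    return terms_match(groups) and not has_both(groups)


records = {
    "r1": ["group1"],
    "r2": ["group2", "group3"],
    "r3": ["group1", "group2"],
    "r4": ["group3"],
}
kept = [name for name, groups in records.items() if keep(groups)]
print(kept)  # ['r1', 'r2']
```

Without the must_not part, r3 would also be returned, since terms alone matches any record that holds at least one of the listed values.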

Query elasticsearch where a key's value is at least some number

I am processing files to recognize whether they contain certain labels and with what confidence each label was recognized.
I created a nested mapping called tags, which contains label (text) and confidence (float between 0 and 100).
Here is an example of how I think the query would work (I know it's invalid). It should be something like "Find documents that have the tags labelled A and B. A must have a confidence of at least 37 and B must have a confidence of at least 80".
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "tags.label": "A"
              },
              "range": {
                "tags.confidence": {
                  "gte": 37
                }
              }
            },
            {
              "match": {
                "tags.label": "B"
              },
              "range": {
                "tags.confidence": {
                  "gte": 80
                }
              }
            }
          ]
        }
      }
    }
  }
}
Any ideas? I am pretty sure I need to approach it differently (different mapping). I am not sure how to accomplish this in ElasticSearch. Is this possible?
Let's say your parent document would contain two nested documents, something like below:
{
  "tags": [
    {
      "label": "A",
      "confidence": 40
    },
    {
      "label": "B",
      "confidence": 85
    }
  ]
}
If that is the case, below is how your query would be:
Nested Query:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "tags",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "tags.label": "A"
                    }
                  },
                  {
                    "range": {
                      "tags.confidence": {
                        "gte": 37
                      }
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "tags",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "tags.label": "B"
                    }
                  },
                  {
                    "range": {
                      "tags.confidence": {
                        "gte": 80
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Note that each nested object is indexed as a separate document, which is why you need two separate nested queries. With a single nested query, Elasticsearch would try to match all four conditions against one nested object of the parent document, and no single object has both label A and label B.
Hope this helps!
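The reason for the two nested clauses can be illustrated with a plain-Python model, in which a nested query succeeds only if one single nested object satisfies its whole inner query. This is a sketch of the matching rule, not of Elasticsearch internals, and nested_match is a hypothetical helper.

```python
def nested_match(doc, path, predicate):
    """True if at least one nested object under `path` satisfies the predicate."""
    return any(predicate(obj) for obj in doc.get(path, []))


doc = {
    "tags": [
        {"label": "A", "confidence": 40},
        {"label": "B", "confidence": 85},
    ]
}

# One nested clause holding all four conditions: a single object would have
# to be labelled both A and B, so this never matches.
single = nested_match(
    doc,
    "tags",
    lambda t: t["label"] == "A" and t["confidence"] >= 37
    and t["label"] == "B" and t["confidence"] >= 80,
)

# Two nested clauses combined with must: each may match a different object.
a_ok = nested_match(doc, "tags", lambda t: t["label"] == "A" and t["confidence"] >= 37)
b_ok = nested_match(doc, "tags", lambda t: t["label"] == "B" and t["confidence"] >= 80)

print(single)         # False
print(a_ok and b_ok)  # True
```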

Putting two queries together

How can I put these two queries together? Query one brings back all the data from today, and query two brings back data for all users that have test in their name.
I want to bring back all of today's data for users whose name has test in it.
Could someone show me how this is done, please?
Query one:
{
  "_source": ["VT"],
  "query": {
    "range": {
      "VT": {
        "gte": "now/d",
        "lt": "now/d+13h"
      }
    }
  }
}
Query two:
from elasticsearch import Elasticsearch

es = Elasticsearch(["9200"])
res = es.search(
    index="search",
    body={
        "_source": ["DTDT", "TRDT"],
        "query": {
            "bool": {
                "should": [
                    {"wildcard": {"N": "TEST*"}}
                ]
            }
        }
    },
    size=10,
)
for doc in res['hits']['hits']:
    print(doc)
You can use a bool query with two must clauses, like this:
{
  "_source": ["DTDT", "TRDT", "VT"],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "VT": {
              "gte": "now/d",
              "lt": "now/d+13h"
            }
          }
        },
        {
          "wildcard": {
            "N": "TEST*"
          }
        }
      ]
    }
  }
}
Check out the docs for the bool query.
This will help you:
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "VT": {
              "gte": "now/d",
              "lt": "now/d+13h"
            }
          }
        },
        {
          "match": {
            "N": {
              "query": "TEST",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
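Since query two in the question is issued from Python, the combined body can also be assembled there. A minimal sketch that only builds the dictionary; the helper combine_with_must is hypothetical, and the field names are the ones from the question.

```python
import json


def combine_with_must(clauses, source_fields):
    """Wrap query clauses in one bool query so a hit must satisfy all of them."""
    return {
        "_source": list(source_fields),
        "query": {"bool": {"must": list(clauses)}},
    }


today = {"range": {"VT": {"gte": "now/d", "lt": "now/d+13h"}}}
name_prefix = {"wildcard": {"N": "TEST*"}}

body = combine_with_must([today, name_prefix], ["DTDT", "TRDT", "VT"])
print(json.dumps(body, indent=2))

# With a reachable cluster, the body would be passed to the client as in the
# question, e.g. es.search(index="search", body=body, size=10).
```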

Resources