How does the flow work in Elasticsearch queries?

I have written a query with a couple of conditions, as shown below.
GET /agreement/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "T-0668",
"fields": [
"agreecondition.agreementId",
"agreecondition.conditionContractId"
]
}
},
{
"range": {
"agreecondition.validFrom": {
"gte": "02/18/2019"
}
}
},
{
"range": {
"agreecondition.validTo": {
"lte": "03/07/2019"
}
}
}
],
"filter": [
{
"terms": {
"agreecondition.promotionId.keyword": [
"x",
"y"
]
}
}
]
}
}
}
My question is: how does the flow work?
For example, does ES first get the results for the multi_match in the must clause, then apply the range conditions to that output, and finally apply the filter on top of the output of the range conditions?
I just want some clarity on this. If my assumption is wrong, I need to rewrite the query.

You can check the official Elasticsearch blog post on query execution order to understand this in detail, but you might not get all the details you are looking for, because of a limitation Elastic mentions at the end of the post:
Q: How can I check which query/filter got executed first?
A: We don't really expose this information, which is very internal. However if you
check the output of the profile API, you can count how many times
nextDoc/advance have been called on the one hand, and matches on the
other hand. Query nodes that have the higher counts have been run
first.
Note: The Profile API will be very handy for you, as the blog also suggests.
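To make the counting described in the quote concrete, here is a minimal Python sketch. It assumes the search was sent with "profile": true and that the response follows the documented profile layout (profile.shards[].searches[].query[], where each node has a breakdown object with next_doc_count, advance_count and match_count, plus optional children). The function names and the sample response are hypothetical, and the numbers are made up purely to show the shape.

```python
def iter_query_nodes(nodes):
    """Yield every query node in a profile tree, including nested children."""
    for node in nodes:
        yield node
        yield from iter_query_nodes(node.get("children", []))


def rank_query_nodes(profile_response):
    """Return (description, iteration_calls, match_calls) rows, highest counts first.

    iteration_calls is next_doc_count + advance_count, as the quoted answer suggests.
    Rows are listed per shard; nothing is aggregated across shards.
    """
    rows = []
    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in iter_query_nodes(search["query"]):
                breakdown = node.get("breakdown", {})
                iteration_calls = (
                    breakdown.get("next_doc_count", 0)
                    + breakdown.get("advance_count", 0)
                )
                match_calls = breakdown.get("match_count", 0)
                rows.append((node["description"], iteration_calls, match_calls))
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Made-up sample, trimmed to the fields the functions above actually read.
sample_response = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "description": "bool",
                                "breakdown": {"next_doc_count": 3, "advance_count": 0, "match_count": 0},
                                "children": [
                                    {
                                        "description": "terms filter",
                                        "breakdown": {"next_doc_count": 0, "advance_count": 12, "match_count": 0},
                                    },
                                    {
                                        "description": "validFrom range",
                                        "breakdown": {"next_doc_count": 40, "advance_count": 0, "match_count": 0},
                                    },
                                ],
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

ranking = rank_query_nodes(sample_response)
for description, iteration_calls, match_calls in ranking:
    print(description, iteration_calls, match_calls)
```

Nodes near the top of the ranking are the ones that, by the blog's rule of thumb, were iterated first; treat it as a hint rather than an exact execution trace.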

Related

Elasticsearch: How to write an 'OR' clause in filter context?

I'm looking for a syntax example compatible with ES version 6.7.
I have read the docs, but I don't see any examples for this, and the explanation isn't clear enough to me. I have tried writing a query based on it, but I keep getting syntax errors. I have already seen the questions below on SO, but they don't help me:
Filter context for should in bool query (Elasticsearch)
It doesn't have any example.
Multiple OR filter in Elasticsearch
I get a syntax error
"type": "parsing_exception",
"reason": "no [query] registered for [filtered]",
"line": 1,
"col": 31
Maybe it's for a different version of ES.
All I need is a simple example with two 'or'ed conditions (mine is one range and one term but I guess that shouldn't matter much), both I would like to have in filter context (I don't care about scores, nor text search).
If you really need it, I can show my attempts (need to remove some 'sensitive'(duh) parts from it before posting), but they give parsing/syntax errors so I don't think there is any sense in them. I am aware that questions which don't show any efforts are considered bad for SO but I don't see any logic in showing attempts that aren't even parsed successfully, and any example would help me understand the syntax.
You need to wrap your should query in a filter query.
{
"query":{
"bool":{
"filter":[{
"bool":{
"should":[
{ /* Query 1 */ },
{ /* Query 2 */ }
]
}
}]
}
}
}
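As a filled-in illustration of that wrapper, here is a small Python sketch that builds the request body for one range and one term condition. The field names (price, status) and values are hypothetical, and minimum_should_match is set explicitly only to make the "at least one must match" intent visible.

```python
import json


def or_filter(*clauses):
    """Wrap clauses so that at least one must match, evaluated in filter context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "bool": {
                            "should": list(clauses),
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        }
    }


# Hypothetical fields and values, standing in for "Query 1" and "Query 2".
body = or_filter(
    {"range": {"price": {"gte": 10, "lte": 20}}},
    {"term": {"status": "active"}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a _search request.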
I had a similar scenario (even the range and match filters), with one more level of nesting: two conditions to be 'or'ed (as in your case) and another condition to be logically 'and'ed with their result. As @Pierre-Nicolas Mougel suggested in another answer, I nested the bool clauses, with one more level around the should clause.
{
"_source": [
"my_field"
],
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"range": {
"start": {
"gt": "1558878457851",
"lt": "1557998559147"
}
}
},
{
"range": {
"stop": {
"gt": "1558898457851",
"lt": "1558899559147"
}
}
}
]
}
},
{
"match": {
"my_id": "<My_Id>"
}
}
],
"must_not": []
}
}
}
},
"from": 0,
"size": -1,
"sort": [],
"aggs": {}
}
I read in the docs that minimum_should_match can also be set on the bool query to require that at least one of the should clauses matches. This might help you if this query doesn't work.

Is there a way for me to conditionally link multiple indices in a query?

Within my cluster I have two indices, one containing employee information and group memberships and another containing group information such as id, names, proxy_id etc. These will be referred to as the employee index and the group index respectively.
I would like to construct a query which will only return documents to users provided they are a part of the correct group. The unfortunate part is that documents will only have one of several proxy_ids attached to them. In order to return the correct document, my query will need to check the user index and verify that their group memberships are appropriate by checking each group they are a part of and comparing each group's proxy_id against the current document.
I understand that this needs to fall inside a 'must' condition; however, I am unsure how to structure this, or whether it is even possible.
Any advice would be greatly appreciated. Thanks!
This is for an Elasticsearch cluster running version 6.6.
There is no efficient way of doing joins in Elasticsearch. You could, however, add the information you need to the user documents.
You should try to denormalize your data as much as possible. Try to store all the information you need in one document, and don't worry about duplicated data.
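A rough Python sketch of what that denormalization could look like at indexing time: each employee document gets the proxy_ids of its groups copied in, so a later search only needs a terms filter. The document shapes and the field names group_ids and proxy_ids are assumptions for illustration; only proxy_id is mentioned in the question.

```python
def denormalize_employee(employee, groups_by_id):
    """Return a copy of the employee document with its groups' proxy_ids embedded."""
    proxy_ids = sorted(
        {
            groups_by_id[group_id]["proxy_id"]
            for group_id in employee["group_ids"]
            if group_id in groups_by_id  # skip memberships with no known group
        }
    )
    return {**employee, "proxy_ids": proxy_ids}


def documents_visible_to(employee_doc):
    """Build a filter that keeps only documents whose proxy_id the employee holds."""
    return {
        "query": {
            "bool": {
                "filter": [{"terms": {"proxy_id": employee_doc["proxy_ids"]}}]
            }
        }
    }


# Hypothetical sample data.
groups_by_id = {
    "g1": {"name": "sales", "proxy_id": "p-10"},
    "g2": {"name": "ops", "proxy_id": "p-20"},
}
employee = {"name": "alice", "group_ids": ["g2", "g1", "g9"]}

enriched = denormalize_employee(employee, groups_by_id)
search_body = documents_visible_to(enriched)
print(enriched)
print(search_body)
```

The trade-off is that employee documents have to be re-indexed whenever a group's proxy_id or a membership changes.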
You can also search multiple indices in one request, for example from Node.js, like this:
elasticClient.search({
index: 'pubmed-index,biorxiv-index',
type: 'pubmed-paper,biorxiv-paper',
"size": 10,
body: {
"query": {
"bool": {
"must": [
{
"query_string": {
"query": `${searchKeyword}`,
"default_operator": "AND",
"fields":[]
}
},
{
"query_string": {
"query": `${pubType[0]}`,
"fields": [
"Publication Type"
]
}
},
{
"query_string": {
"query": `${searchJournal}`,
"fields": [
"Journal"
]
}
}
],
"filter": [
{
"range": {
"Date Revised": {
"gte":`${searchGreaterYear}`,
"lte": `${searchLesserYear}`,
"format": "yyyy"
}
}
}
]
}
}
}
}).then(function (resp) {
// Log the search parameters for debugging
console.log(pubType[0], searchLesserYear, searchGreaterYear, searchKeyword);
return res.json({source:resp.hits.hits,total:resp.hits})
}, function (err) {
console.log(err.message);
return res.json(err.message)
})

ElasticSearch Query DSL Combine Terms and Wildcard

I have two distinct queries which work well enough on their own:
{"wildcard":{"city":"*Beach*"}}
{"terms":{"state":["Florida","Georgia"]}}
but trying to combine them into one query is proving to be quite the challenge.
I had thought that simply writing {{"wildcard":{"city":"*Beach*"}},{"terms":{"state":["Florida","Georgia"]}}} would do it, but it does not. I then tried a few different iterations using arrays, bool queries, etc. Can someone point me in the right direction?
Bool query should be the right way to go.
Below is an example for your use case:
{
"query": {
"bool": {
"must": [
{
"wildcard": { "city": "*Beach*" }
},
{
"terms": {
"state": [ "Florida", "Georgia" ]
}
}
]
}
}
}
If there are no results, it means that no document matches both criteria.

Elasticsearch returning distinct results

Here is my ES query:
{
"fields": [
"news.authorname.raw",
"news.authorid"
],
"query": {
"filtered": {
"filter": {
"terms": {
"news.authorid": [
1,
2
]
}
}
}
}
}
With this query I get a list of {authorid, authorname} pairs. The list contains repeated {authorid, authorname} values, and I just need the same list without repetitions. This seemed not that difficult, or at least that is what I thought this morning. My limited knowledge of ES, together with the lack of documentation, is making me desperate to find a solution to such a trivial problem.
Of course I could get the whole list and remove repetitions in code, but if possible I would prefer not to receive unnecessary data only to remove it afterwards.
Can anyone give me a hand with that? Should I use some other approach?
Thanks in advance!!
I would suggest using source filtering:
{
"_source": [ "news.authorname.raw", "news.authorid" ],
"query": {
"filtered": {
"filter": {
"terms": {
"news.authorid": [
1,
2
]
}
}
}
}
}
It is generally easier to handle than fields, which sometimes do look like a cartesian product.
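Source filtering controls which fields come back for each hit. If repeated pairs still show up, the client-side clean-up mentioned in the question can be sketched in Python as follows. The flat _source keys authorid and authorname are an assumption for illustration; adapt the lookups to however your documents are actually nested.

```python
def distinct_authors(hits):
    """Return unique (authorid, authorname) pairs in first-seen order."""
    seen = set()
    unique = []
    for hit in hits:
        source = hit["_source"]
        pair = (source["authorid"], source["authorname"])
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


# Hypothetical hits, shaped like the entries of response["hits"]["hits"].
sample_hits = [
    {"_source": {"authorid": 1, "authorname": "Ann"}},
    {"_source": {"authorid": 2, "authorname": "Bob"}},
    {"_source": {"authorid": 1, "authorname": "Ann"}},
]

authors = distinct_authors(sample_hits)
print(authors)
```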

Can _score from different queries be compared?

In my application, I issue multiple queries, each against a different index. Then I merge the results from these queries and sort them by the _score attribute, in order to rank them according to their relevance. But I wonder whether this makes sense at all, since the results came from different queries.
I guess my question is: can _scores from different queries be compared?
Instead of issuing multiple queries, it would be a good idea to combine them into a single query.
You can use the indices query to do index-specific operations.
So something like:
{
"bool": {
"should": [
{
"indices": {
"indices": [
"index1"
],
"query": {
"term": {
"tag": "wow"
}
}
}
},
{
"indices": {
"indices": [
"index2"
],
"query": {
"term": {
"name": "laptop"
}
}
}
}
]
}
}
Once this is done, the results will be sorted by _score.
Hope that helps.
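For comparison, the client-side merge described in the question amounts to something like the Python sketch below. The sample hits are made up, and the sketch only shows the mechanics of merging; it does not by itself make scores from separate queries comparable.

```python
def merge_by_score(*result_lists):
    """Flatten several hit lists and order them by _score, highest first."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit["_score"], reverse=True)


# Hypothetical hits from two separate searches.
hits_index1 = [{"_index": "index1", "_id": "a", "_score": 2.5}]
hits_index2 = [
    {"_index": "index2", "_id": "b", "_score": 3.1},
    {"_index": "index2", "_id": "c", "_score": 0.7},
]

merged_hits = merge_by_score(hits_index1, hits_index2)
print([hit["_id"] for hit in merged_hits])
```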
