Elastic search query feasibility - Can we control execution of one filter, based on the output of another filter in elasticsearch? - elasticsearch

I have multiple filters inside a function as shown below in the sample query. I do not want to execute the second filter, if first filter is able to return me greater than 0 results. Can a control like this be achieved query level or if there are any options to achieve the same at query level only? My query looks like this :
"functions": [
{
"filter": {
"term": {
"display_text.keyword": "trai"
}
},
"weight": 10
},
{
"filter": {
"match": {
"display_text": {
"query": "trai",
"fuzziness": "AUTO:3,5"
}
}
},
"weight": 1
}
]

Related

multi fields search query for elasticsearch golang

I have a situation where I need to do elastic search based on multi-field. For Example: I have multiple fields in my postindex and I want to apply condition on four these fields (i.e. userid, channelid, createat, teamid) to meet my search requirement. When value of all these fields matched then search query displays results and if one of these is not match with values in postindex then it display no result.
I am trying to make a multifield search query for go-elasticsearch to search data from my post index. For the searcquery result four field must match otherwise it display 0 hit/no-result.
So, I think you need to write a following query :
GET postindex/_search
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"bool": {
"must": [
{
"term": {
"userid": {
"value": "mcqmycxpyjrddkie9mr13txaqe"
}
}
},
{
"term": {
"channelid": {
"value": "dnoihmrinins3qrm6bb9175ume"
}
}
},
{
"range": {
"createat": {
"gt": 1672909114890
}
}
}
]
}
},
{
"term": {
"teamid": {
"value": "qomrg11o8b8ijxoy8hrcnweoay"
}
}
}
]
}
}
}
In here, there is a bool query with should in parent scope, which is like OR. And inside the should there is another bool query with must which is like AND. We can also write the query shorter, but this will be better for you to understand.

Is it possible to affect execution order of filters in Elasticsearch?

We have a query of the form:
{
"query": {
"bool": {
"filter": [
{
"term": {
"userId": {
"value": "a_user_id",
"boost": 1
}
}
},
{
"range": {
"date": {
"from": 1648598400000,
"to": 1648684799999,
"boost": 1
}
}
},
{
"query_string": {
"query": "*MyQuery*",
"fields": [
"aField^1.0",
"anotherField^1.0",
"thirdField^1.0"
],
"boost": 1
}
}
],
"boost": 1
}
}
}
If we remove the third filter (the query_string one), performance is dramatically improved (typically going from around 2000 to 20 ms) for different variants of the above query.
The thing is, the first two filters (on userId and the date range) will always result in only a handful of search hits (say 50 or so).
So, if it was possible to hint that to Elasticsearch, or otherwise affect the query plan, it could solve our issue.
In old (1.x) versions of ES it seems that this was affected by the order of filters. from Elasticsearch: Order of filters for best performance:
"The order of filters in a bool clause is important for performance. More-specific filters should be placed before less-specific filters in order to exclude as many documents as possible, as early as possible. If Clause A could match 10 million documents, and Clause B could match only 100 documents, then Clause B should be placed before Clause A."
But newer versions are smarter - https://www.elastic.co/blog/elasticsearch-query-execution-order:
Q: Does the order in which I put my queries/filters in the query DSL matter?
A: No, because they will be automatically reordered anyway based on their respective costs and match costs.
But is it still possible to reach the desired outcome here by modifying the ES search request somehow?
Your query should be like below, so that filters run first and will only select ~50 or so documents and then your costly query_string (because of the leading wildcard) will only run on those 50 docs.
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*MyQuery*",
"fields": [
"aField^1.0",
"anotherField^1.0",
"thirdField^1.0"
],
"boost": 1
}
}
],
"filter": [
{
"term": {
"userId": {
"value": "a_user_id",
"boost": 1
}
}
},
{
"range": {
"date": {
"from": 1648598400000,
"to": 1648684799999,
"boost": 1
}
}
}
],
"boost": 1
}
}
}

How can I score Elasticsearch matches for particular field names higher when using a full text search on _all?

I've setup an index that has many types representing user data such as a ShoppingList, Playlist, etc. Each type has an "identity_id" field for the user's unique identifier. I use the following query to search across all types and fields for a user (for a search function in a website):
GET _search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix": {
"_all": "awesome"
}
},
"filter": {
"match": {
"identity_id": 1
}
}
}
}
}
My questions are:
Is there a way to give a higher score to matches on fields that have "name" in the field name? For example, the ShoppingList type will have a shopping_list_name field, and I want a match on that to be higher than its other fields.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
How about this query that boosts certain fields:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name",
"field*"
]
}
},
"functions": [
{
"weight": 2,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name"
]
}
}
},
{
"weight": 1,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"field*"
]
}
}
}
]
}
}
}
What the query above does is to boost (weigth: 2) the *_name fields query and not do apply any boosting to fields called field*.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
Regarding this ^ question, that's more complicated and you also need to consider how many users you have, the hardware resources the cluster has, structure of data, queries used etc.

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if i have just 1 I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I havenĀ“t found a way to search with two values with must.
is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
term or terms query is the perfect way to fetch the exact text or id, using match query result in search inside the id or text
Ex:
id = '4'
id = '44'
Search using match query with id = 4 return both 4 & 44 since it matches 4 in both. This is where terms query come into play.
same search using terms query will return 4 only.
So the accepted is absolutely wrong. Use the #Rahul answer. Just one more thing you need to do, Instead of text you need to analyse the field as a keyword
Example for indexing a field both as a text and keyword (mapping is for flat level for nested change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
using #Rahul's answer doesn't worked because you might be analysed as a text.
id - access a text field
id.keyword - access a keyword field
it would be
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say accepted answer will return falsy results Please use #Rahul's answer with the corresponding mapping.

Elastic Search - Sort By Doc Type

I have an elastic search index with 2 different doc types: 'a' and 'b'. I would like to sort my results by type and give preference to type='b' (even if it has a low score). I had been consuming the results of the search below at the client end and sorting them but I've realized that this approach does not work well since I am only inspecting the first 10 results which often does not contain any b's. Increasing the return results is not ideal. I'd like to get the elastic search to do the work.
http://<server>:9200/my_index/_search?q=london
You would need to play with function_score and, depending on how you already score your documents, test some weight values, boost_modes and score_modes for each type. For example:
GET /some_index/a,b/_search
{
"query": {
"function_score": {
"query": {
# your query here
},
"functions": [
{
"filter": {
"type": {
"value": "b"
}
},
"weight": 3
},
{
"filter": {
"type": {
"value": "a"
}
},
"weight": 1
}
],
"score_mode": "first",
"boost_mode": "multiply"
}
}
}
Its working for me.you will execute below commands at command Prompt.
curl -XGET localhost:9200/index_v1,index_v2/_search?pretty -d #boost.json
boost.json
{
"indices_boost" : {
"index_v2" : 1.4,
"index_v1" : 1.3
}
}

Resources