Elasticsearch ORing and ANDing mixed in "must" query

I want to mix required property values, some of them ANDed and some of them ORed. After several helpful comments, I reworded the question and came up with this:
GET /index/docs/_search
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "category": ["cat1", "cat2"]
          }
        },
        {
          "terms": {
            "category": ["cat3"]
          }
        }
      ]
    }
  }
}
In the query above (with a "bool" "must" form) does this say that a doc MUST have cat1 AND cat2 OR cat3 in the "category" property?
NOTE: This is significantly modified from the original question.

You use the bool query to combine several different queries. You would only need it the way you used it if you wanted to add another clause, such as a filter or must_not, alongside your terms query. Your query will still work, but it will simply be rewritten into a plain terms query anyway, which takes a bit more time.
The terms query is sufficient for what you want to do, and it will return all the documents that contain either cat1 or cat2 in category.
So your query could simply be:
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": { "category": ["cat1", "cat2"] }
      }
    }
  }
}
You can use this same query inside a must clause as well if you want to combine it with some other query.
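To make the OR semantics of terms concrete, here is a toy, pure-Python model of terms matching (no Elasticsearch client involved). It assumes the field holds exact, unanalyzed values, as a keyword field would; the function name and sample documents are hypothetical. It also shows what happens when two terms clauses are separate entries of one bool/must array, which is the shape of the query in the question.

```python
# Toy, in-memory model of `terms` matching (no Elasticsearch involved).
# Assumption: "category" holds exact, unanalyzed values, like a keyword field.

def terms_matches(doc: dict, field: str, values: list) -> bool:
    """A terms clause matches if the field holds at least one listed value (OR)."""
    return any(v in doc.get(field, []) for v in values)

# Hypothetical sample documents.
docs = {
    "a": {"category": ["cat1"]},
    "b": {"category": ["cat2", "cat3"]},
    "c": {"category": ["cat3"]},
}

# One terms clause: cat1 OR cat2.
hits = sorted(d for d, doc in docs.items()
              if terms_matches(doc, "category", ["cat1", "cat2"]))

# Two terms clauses as separate entries of a bool/must array:
# every clause must match, i.e. (cat1 OR cat2) AND cat3.
must_hits = sorted(d for d, doc in docs.items()
                   if all(terms_matches(doc, "category", vals)
                          for vals in (["cat1", "cat2"], ["cat3"])))

print(hits)       # ['a', 'b']
print(must_hits)  # ['b']
```

In other words, values inside one terms clause are ORed, while clauses inside must are ANDed.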
Take a look at this and this for reference.
EDIT: For the other case, i.e. AND, you could use the following query:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "category": "cat1" } },
        { "match": { "category": "cat2" } }
      ],
      "minimum_should_match": 2
    }
  }
}
You can combine all sorts of queries depending on what you really want to do. The Elasticsearch query DSL documentation is quite comprehensive in my opinion; you should take a look. Also look at this: there are many different ways of doing what you want. Keep in mind that term queries look for exact matches, i.e. they are case sensitive and the query value is not analyzed, while match queries rely on analyzed data, so the results can differ between the two.
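The term vs. match difference can be sketched with a toy analyzer. This is only an approximation: `simple_analyze` is a hypothetical stand-in for the standard analyzer (lowercase, split on non-word characters), not Elasticsearch's actual tokenizer, and the field value "Cat1" is made up.

```python
import re

def simple_analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

# Tokens a text field would store for the value "Cat1".
stored_tokens = simple_analyze("Cat1")  # ['cat1']

def term_matches(value: str, tokens: list) -> bool:
    # term/terms: the query value is NOT analyzed; it must equal a stored token.
    return value in tokens

def match_matches(text: str, tokens: list) -> bool:
    # match: the query text is analyzed the same way first; with the default
    # OR operator, one matching token is enough.
    return any(t in tokens for t in simple_analyze(text))

print(term_matches("Cat1", stored_tokens))   # False
print(match_matches("Cat1", stored_tokens))  # True
```

This is why a term query for "Cat1" against a text field can return nothing even though the document was indexed with exactly "Cat1".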

According to your latest comment, if you want to AND your categories, the correct way to go is by using bool/must, like this:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "category": "cat1" } },
        { "match": { "category": "cat2" } }
      ]
    }
  }
}
bool/should + minimum_should_match would work, too, but bool/must is more straightforward and doesn't require you to know how many clauses you have.
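The equivalence of the two forms can be checked with a toy matching model. It is a sketch, not Elasticsearch itself: scoring and filter context are ignored, `has_category` assumes match on "category" behaves like an exact token comparison for values such as "cat1", and the sample documents are hypothetical. The default for minimum_should_match follows the rule in the bool query documentation (1 when there are should clauses but no must/filter clauses, otherwise 0).

```python
def has_category(value):
    # Assumption: match on "category" acts like an exact token comparison here.
    return lambda doc: value in doc["category"]

def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Toy bool query: matching only, no scoring, no filter context."""
    if minimum_should_match is None:
        # Default: with should clauses but no must/filter clauses, at least one
        # should clause has to match; otherwise should clauses are optional.
        minimum_should_match = 1 if should and not must else 0
    if not all(clause(doc) for clause in must):
        return False
    return sum(clause(doc) for clause in should) >= minimum_should_match

docs = {
    "only_cat1": {"category": ["cat1"]},
    "both": {"category": ["cat1", "cat2"]},
    "only_cat2": {"category": ["cat2"]},
}
clauses = [has_category("cat1"), has_category("cat2")]

via_must = sorted(d for d, doc in docs.items() if bool_matches(doc, must=clauses))
via_should = sorted(d for d, doc in docs.items()
                    if bool_matches(doc, should=clauses, minimum_should_match=2))
via_default_should = sorted(d for d, doc in docs.items()
                            if bool_matches(doc, should=clauses))

print(via_must)            # ['both']
print(via_should)          # ['both']
print(via_default_should)  # ['both', 'only_cat1', 'only_cat2']
```

With must, the AND does not depend on counting clauses, which is why it is the simpler choice.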

Related

ElasticSearch: obtaining individual scores from each query inside of a bool query

Assume I have a compound bool query with various "must" and "should" clauses, each of which may contain different leaf queries, including "multi_match" and "match_phrase" queries, such as the one below.
How can I get the score from individual queries packed into a single query?
I know one way could be to break it down into multiple queries, execute each, and then aggregate the results at the code level (not the query level). However, I suppose that is less efficient; plus, I lose the sorting, pagination, etc. features of Elasticsearch.
I think the "Explanation API" is also not useful for me, since it provides very low-level details of scoring (inefficient and hard to parse), while I just need to know the score of each specific leaf query (I've already named each of them).
If I'm wrong on any terminology (e.g. compound, leaf), please correct me. The big picture is how to obtain individual scores from each sub-query inside a bool query.
PS: I came across Different score functions in bool query. However, it does not return the scores. If I wrap my queries in "function_score", I want the scoring to stay the default but still obtain the individual scores in the response to the query.
Please see the snippet below:
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "...",
            "fields": ["field1^3", "field2^5"],
            "_name": "must1_mm",
            "boost": 3
          }
        }
      ],
      "should": [
        {
          "multi_match": {
            "query": "...",
            "fields": ["field3^2", "field4^5"],
            "_name": "should1_mm",
            "boost": 2
          }
        },
        {
          "match_phrase": {
            "field5": {
              "query": "...",
              "_name": "phrase1",
              "boost": 1.5
            }
          }
        },
        {
          "match_phrase": {
            "field6": {
              "query": "...",
              "_name": "phrase2",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

Give more score to documents that contains all query terms

I have a problem with scoring in Elasticsearch. When a user enters a query that contains three terms, a document that contains two of the words many times sometimes outscores a document that contains all three words. For example, if a user enters "elasticsearch query tutorial", I want documents that contain all of these words to score higher than a document with many occurrences of "tutorial" and "elasticsearch".
PS: I am using minimum_should_match and shingles in my query. Although they made ranking a lot better, they did not solve this problem completely. I need something like the query coordination factor in Lucene's practical scoring function. Is there anything like that in Elasticsearch with BM25?
One of the possible solutions could be using function score:
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "match": { "title": "elasticsearch" } },
          "weight": 1
        },
        {
          "filter": { "match": { "title": "query" } },
          "weight": 1
        },
        {
          "filter": { "match": { "title": "tutorial" } },
          "weight": 1
        }
      ],
      "score_mode": "sum"
    }
  }
}
With this, documents that match more of the terms clearly rank higher. However, this approach completely ignores TF-IDF (or BM25) and any other relevance parameters.
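The effect of this query can be sketched with a toy scorer. It is a simplified model, not Elasticsearch's scoring code: each filter is treated as a yes/no "does the title contain this term" check, matching filters add their weight (score_mode "sum"), and match_all supplies a base score of 1 that the default boost_mode ("multiply") multiplies by that sum. The titles are hypothetical, and the case of a document matching none of the filters is not modeled.

```python
import re

# Terms from the question's example query.
QUERY_TERMS = ["elasticsearch", "query", "tutorial"]

def function_score_sum(title: str, terms=QUERY_TERMS, weight: float = 1.0) -> float:
    tokens = set(re.findall(r"\w+", title.lower()))
    base = 1.0  # match_all score
    # Each filter is yes/no, so repeating a term does not add anything.
    return base * sum(weight for t in terms if t in tokens)

# Hypothetical titles.
docs = {
    "repetitive": "Elasticsearch tutorial: elasticsearch tutorial, tutorial",
    "all_three": "Elasticsearch query tutorial",
}
ranked = sorted(docs, key=lambda d: function_score_sum(docs[d]), reverse=True)
print(ranked)  # ['all_three', 'repetitive']
```

The score is effectively a count of matched terms, which is the coordination-like behavior asked for, at the cost of dropping term-frequency information.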

Elasticsearch query to search across indexes for geo and non-geo data?

I have two indexes, businesses and categories. Each has different mappings, one of the differences being that businesses has a geofield (lat/lng) associated with it. I would like to perform a query where a user could run an autocomplete search that spans the two indexes (think Yelp.com). Additionally, the user's location would be provided so that only businesses within some distance x would appear. However, any categories that match the search should appear, since it doesn't matter where a user is located when returning categories. Because I do not associate categories with a geofield, I'm getting an error that the geofield property can't be found, and rightfully so (it works when I query only businesses). Is there a way to structure my query so that one statement only looks at one index and another statement only looks at the other index? Or do I need to give the categories a "dummy" geofield that would be ignored, by adding a type property and using an OR operator for "type":"category"? I.e., "matches the geoquery OR is type:category".
If I understand what you want to do, it can be rephrased as a boolean expression, like:
("index == businesses" AND "<geoquery> is OK") OR ("index == categories" AND "<categoryquery> is OK")
Here are a few hints for building this query:
An OR query can be defined in elasticsearch as a "bool" query with 2 or more "should" clauses and "minimum_should_match" set to 1
An AND query can be defined in elasticsearch as a "bool" query with 2 or more "must" clauses
You can check the index in your 2 subqueries using the "_index" field:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index-field.html
You will have to define the geopoint field in the mapping of "categories" index (not necessarily in the json documents of "categories" index)
You did not provide the geoquery and categoryquery, so I'll leave them as placeholders; you will just have to replace them.
You should try something like this (Elasticsearch v5.2.2 syntax; it should work in Elasticsearch v2.0 too):
GET businesses,categories/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "_index": "businesses"
                }
              },
              {
                "<your_geoip_query>": {
                  <your_geoip_query_params>
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "_index": "categories"
                }
              },
              {
                "<your_category_query>": {
                  <your_category_query_params>
                }
              }
            ]
          }
        }
      ]
    }
  }
}
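If the request body is assembled in application code, the same OR-of-ANDs structure can be built with a small helper. This is a sketch only: the helper names are hypothetical, and the geo_distance clause (its "location" field, 10km radius and coordinates) and the match on a "name" field are made-up stand-ins for the placeholders above, not details from the question.

```python
import json

def scoped(index: str, clause: dict) -> dict:
    """AND: the hit must come from `index` and satisfy `clause`."""
    return {"bool": {"must": [{"term": {"_index": index}}, clause]}}

def cross_index_query(business_clause: dict, category_clause: dict) -> dict:
    """OR of the two index-scoped sub-queries (minimum_should_match = 1)."""
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    scoped("businesses", business_clause),
                    scoped("categories", category_clause),
                ],
            }
        }
    }

# Hypothetical stand-ins for the placeholders: field names, radius,
# coordinates and search text are all made up.
geo_clause = {"geo_distance": {"distance": "10km",
                               "location": {"lat": 40.73, "lon": -73.99}}}
category_clause = {"match": {"name": "pizza"}}

body = cross_index_query(geo_clause, category_clause)
print(json.dumps(body, indent=2))  # body for GET businesses,categories/_search
```

As noted above, the geo field still has to exist in the categories mapping for the geo clause not to fail there.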

Different boosting for the same field in different types in Elasticsearch 2.x with multi_match query

I am trying to do the following, as described in the documentation (which may be outdated by now):
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html
I will adapt the scenario described there to what I want to achieve.
Imagine that we have two types in our index: blog_t1 for blog posts about Topic 1, and blog_t2 for blog posts about Topic 2. Both types have a title field.
Then, I want to apply query boosting to the title field for blog_t1 only.
In previous versions of Elasticsearch, you could reference the field from the type by using blog_t1.title and blog_t2.title. So boosting one of them was as simple as blog_t1.title^2.
But since Elasticsearch 2.x, some of the old support for types has been removed (for good reasons, like removing ambiguity). Those changes are described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html
So my question is, how can I do that boosting for the title, just for the type blog_t1, and not blog_t2, with Elasticsearch 2.x, in a multi_match query?
The query would be something like this, but this obviously does not work as type.field is not a thing anymore.
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "Hello World",
      "fields": [
        "blog_t1.title^2",
        "blog_*.title",
        "author",
        "content"
      ]
    }
  }
}
FYI, the only solution I found so far is to give the titles different names, like title_boosted for blog_t1 and just title for the others, which is problematic when making use of the information, as I can no longer use the "title" as a unique thing.
Thanks.
What about adding another "optional" constraint on the document type, so that docs matching it score higher (you can tune this with boosting), like this:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Hello world" } }
      ],
      "should": [
        { "match": { "_type": "blog_t1" } }
      ]
    }
  }
}
Or with score functions, multiplying the score of blog_t1 documents by 2 and leaving blog_t2 documents unchanged:
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "Hello world"
        }
      },
      "boost_mode": "multiply",
      "functions": [
        {
          "filter": {
            "term": {
              "_type": "blog_t1"
            }
          },
          "weight": 2
        },
        {
          "filter": {
            "term": {
              "_type": "blog_t2"
            }
          },
          "weight": 1
        }
      ]
    }
  }
}

Must match multiple values

I have a query that works fine when I need the property of a document to match just one value.
However, I also need to be able to search with must using two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow, I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        { "match": { "fruit.color": "yellow" } }
      ],
      "must": [
        { "match": { "fruit.id": "1" } }
      ]
    }
  }
}
I havenĀ“t found a way to search with two values with must.
is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "fruit.color": "yellow"
        }
      },
      "must": {
        "bool": {
          "should": [
            {
              "match": {
                "fruit.id": "1"
              }
            },
            {
              "match": {
                "fruit.id": "2"
              }
            }
          ]
        }
      }
    }
  }
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
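The nested structure can be sketched in plain Python to show which clause filters and which only affects ranking. This toy model uses hypothetical documents (a green "lime" with id 2 is made up so the color bonus is visible) and reduces the outer should clause to a 0/1 bonus instead of a real relevance score.

```python
# Hypothetical documents, flattened to "fruit.id"/"fruit.color" keys.
fruits = [
    {"name": "banana", "fruit.id": "1", "fruit.color": "yellow"},
    {"name": "lime", "fruit.id": "2", "fruit.color": "green"},
    {"name": "apple", "fruit.id": "3", "fruit.color": "red"},
]

def id_clause(doc, wanted=("1", "2")):
    # Inner bool with only should clauses: at least one id must match (OR).
    return any(doc["fruit.id"] == i for i in wanted)

def color_bonus(doc):
    # Outer should next to a must clause: optional, it only raises the score.
    return 1 if doc["fruit.color"] == "yellow" else 0

hits = [d for d in fruits if id_clause(d)]  # the outer must filters the hits
hits.sort(key=color_bonus, reverse=True)    # yellow fruit ranks first
print([d["name"] for d in hits])  # ['banana', 'lime']
```

The apple is excluded by the must clause, while the lime is still returned even though it is not yellow.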
Instead of a match query, you could simply use the terms query to OR between multiple terms.
Match queries are generally used for analyzed fields; for exact matching, you should use term queries:
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        { "match": { "fruit.color": "yellow" } }
      ],
      "must": [
        { "terms": { "fruit.id": ["1", "2"] } }
      ]
    }
  }
}
A term or terms query is the right way to fetch an exact text value or id; a match query searches inside the analyzed id or text.
For example:
id = '4'
id = '44'
Depending on how the field is analyzed (for example, with an n-gram analyzer), a match query for id = 4 can return both 4 and 44, since 4 appears among the tokens of both. This is where the terms query comes into play.
The same search using a terms query on the exact (keyword) value will return 4 only.
So the accepted answer is absolutely wrong. Use Rahul's answer. There is just one more thing you need to do: map the field as a keyword instead of text.
Here is an example that indexes a field both as text and as keyword (the mapping is for a top-level field; for nested fields, change it accordingly).
{
  "index_patterns": ["test"],
  "mappings": {
    "kb_mapping_doc": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Rahul's answer may not work as-is because the field might be analyzed as text.
id: accesses the text field
id.keyword: accesses the keyword field
So the query would be:
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "color": "yellow"
          }
        }
      ],
      "must": [
        {
          "terms": {
            "id.keyword": ["1", "2"]
          }
        }
      ]
    }
  }
}
So I would say the accepted answer can return false results. Please use Rahul's answer with the corresponding mapping.
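The difference between the text field and its keyword sub-field can be sketched with a toy analyzer. `standard_like` is a hypothetical approximation of the standard analyzer (lowercase, split on non-word characters), and the id value "AB-4" is made up to show how analysis splits an identifier into tokens.

```python
import re

def standard_like(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

# Hypothetical id value, indexed with the text + keyword multi-field mapping.
raw_id = "AB-4"
id_tokens = standard_like(raw_id)  # "id" (text): ['ab', '4']
id_keyword = raw_id                # "id.keyword" (keyword): 'AB-4'

def terms_on_id(values):          # terms is not analyzed: compares raw values to tokens
    return any(v in id_tokens for v in values)

def terms_on_id_keyword(values):  # exact comparison against the whole keyword value
    return id_keyword in values

def match_on_id(text):            # match analyzes the query text first
    return any(t in id_tokens for t in standard_like(text))

print(terms_on_id(["AB-4"]))          # False: no single token equals "AB-4"
print(terms_on_id_keyword(["AB-4"]))  # True
print(match_on_id("4"))               # True: a partial hit via the token "4"
```

Querying id.keyword avoids both problems: the exact value matches, and fragments of it do not.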
