Different boosting for the same field in different types in Elasticsearch 2.x with multi_match query

I am trying to do the following, as described in the documentation (which may be outdated by now):
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping.html
I will adapt the scenario described there to what I want to achieve.
Imagine that we have two types in our index: blog_t1 for blog posts
about Topic 1, and blog_t2 for blog posts about Topic 2. Both types
have a title field.
Then, I want to apply query boosting to the title field for blog_t1
only.
In previous versions of Elasticsearch, you could reference the field
from the type by using blog_t1.title and blog_t2.title. So boosting
one of them was as simple as blog_t1.title^2.
But since Elasticsearch 2.x, some of the old support for types has been removed (for good reasons, like removing ambiguity). Those changes are described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html
So my question is, how can I do that boosting for the title, just for the type blog_t1, and not blog_t2, with Elasticsearch 2.x, in a multi_match query?
The query would be something like this, but it obviously does not work, because type.field references are no longer supported:
GET /my_index/_search
{
"query": {
"multi_match": {
"query": "Hello World",
"fields": [
"blog_t1.title^2",
"blog_*.title",
"author",
"content"
]
}
}
}
FYI, the only solution I have found so far is to give the titles different names, like title_boosted for blog_t1 and just title for the others. That is problematic when using the data, as I can no longer treat "title" as a single field.
Thanks.

What about adding another "optional" constraint on the document type, so that docs matching it get a higher score (you can tune it with boosting), like:
{
"query" : {
"bool" :
{
"must" :
[
{"match" : {"title" : "Hello world"}}
],
"should" :
[
{"match" : {"_type" : "blog_t1"}}
]
}
}
}
Or with score functions:
{
"query": {
"function_score": {
"query": {
"match": {
"title": "Hello world"
}
},
"boost_mode": "multiply",
"functions": [
{
"filter": {
"term": {
"_type": "blog_t1"
}
},
"weight": 2
},
{
"filter": {
"term": {
"_type": "blog_t2"
}
},
"weight": 3
}
]
}
}
}
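Applied to the original question, the same function_score can wrap the multi_match instead of a single match, so title keeps one name in every type while blog_t1 documents get a higher score. Below is a minimal sketch, assuming Python and a plain dict as the request body. The field and type names come from the question; the helper name `type_boosted_query` and the weight are illustrative. Documents of types that match no filter should keep their plain multi_match score.

```python
import json

def type_boosted_query(text, fields, type_weights):
    """Wrap a multi_match in function_score and multiply its score by a
    per-type weight, using term filters on the _type meta field (ES 2.x)."""
    functions = [
        {"filter": {"term": {"_type": doc_type}}, "weight": weight}
        for doc_type, weight in type_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"multi_match": {"query": text, "fields": fields}},
                "boost_mode": "multiply",
                "functions": functions,
            }
        }
    }

# Hypothetical usage: only blog_t1 gets a weight; "title" stays one field name.
body = type_boosted_query("Hello World", ["title", "author", "content"], {"blog_t1": 2})
print(json.dumps(body, indent=2))
```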

Related

How to write ElasticSearch query with AND condition

I am trying to write an Elasticsearch query that searches the data with two conditions, something like the following:
{
"query": {
"match": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
}
The above query won't work because the [match] query doesn't support multiple fields. Is there a way to achieve this?
You can use a bool query with a must clause.
must means: The clause (query) must appear in matching documents. These clauses must match, like logical AND.
For the difference between must and should, refer to this SO answer.
Here is a working example with sample docs and a search query:
Index Sample Data:
{
"trackingId":"track4324234234244",
"log_message":"downstream request-response"
}
{
"trackingId":"track4324234234244",
"log_message":"downstream"
}
{
"trackingId":"tracks4324234234244",
"log_message":"downstream request-response"
}
Search query:
{
"query": {
"bool": {
"must": [
{
"match": {
"trackingId": "track4324234234244"
}
},
{
"match": {
"log_message": {
"query": "downstream request-response",
"operator": "and"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.8570712,
"_source": {
"trackingId": "track4324234234244",
"log_message": "downstream request-response"
}
}
]
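To see why only the first document comes back, here is a rough in-memory simulation of that bool/must query. It is a sketch under simplifying assumptions: the `analyze` function only approximates the standard analyzer (lowercase, split on non-alphanumerics, so "request-response" becomes two tokens), and the document ids 1 to 3 are assigned in the order the sample docs were listed.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase and split on
    # anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def match(doc, field, query, operator="or"):
    """Approximate a match query on an analyzed text field."""
    doc_tokens = set(analyze(doc.get(field, "")))
    hits = [t in doc_tokens for t in analyze(query)]
    return all(hits) if operator == "and" else any(hits)

def bool_must(doc, clauses):
    # bool/must: every clause has to match (logical AND).
    return all(clause(doc) for clause in clauses)

docs = {
    1: {"trackingId": "track4324234234244", "log_message": "downstream request-response"},
    2: {"trackingId": "track4324234234244", "log_message": "downstream"},
    3: {"trackingId": "tracks4324234234244", "log_message": "downstream request-response"},
}

clauses = [
    lambda d: match(d, "trackingId", "track4324234234244"),
    lambda d: match(d, "log_message", "downstream request-response", operator="and"),
]

matching_ids = [doc_id for doc_id, doc in docs.items() if bool_must(doc, clauses)]
print(matching_ids)  # [1]
```

Doc 2 fails because operator "and" requires both "request" and "response" in log_message; doc 3 fails because "tracks4324234234244" is a different token.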
Apart from bool, you can also use a simple_query_string query, as shown below:
POST <your_index_name>/_search
{
"query": {
"simple_query_string": {
"fields": ["trackingId", "log_message"],
"query": "track4324234234244 downstream request-response",
"default_operator": "AND"
}
}
}
Note how I've just added all the terms and made use of default_operator: AND so that it returns only documents having all the terms present in the fields.
There is also query_string, but I would recommend the simple_query_string above instead: query_string is strict, meaning it throws an error if the query string contains any syntax errors, whereas simple_query_string does not.
POST <your_index_name>/_search
{
"query": {
"query_string": {
"fields": ["trackingId", "log_message"],
"query": "(track4324234234244) AND (downstream request-response)",
"default_operator": "AND"
}
}
}
As for when to use simple_query_string: mostly when you want to expose the query string or search terms directly to end users, since it won't fail on malformed input.
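For the end-user case, a sketch of building the request body from raw input might look like this, assuming Python and a plain dict body; the helper name `user_search_body` is hypothetical. The text is passed through unchanged, relying on simple_query_string discarding invalid syntax (such as an unbalanced quote) rather than rejecting the request the way query_string would.

```python
import json

def user_search_body(user_input, fields, operator="AND"):
    """Build a simple_query_string request from raw end-user text.

    No escaping is done here: simple_query_string ignores invalid parts
    of the syntax instead of failing the whole request."""
    return {
        "query": {
            "simple_query_string": {
                "fields": list(fields),
                "query": user_input,
                "default_operator": operator,
            }
        }
    }

# An unbalanced quote: the kind of input query_string would reject.
body = user_search_body('track4324234234244 "downstream', ["trackingId", "log_message"])
print(json.dumps(body, indent=2))
```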
Hope that helps!

Elastic Search Query (a like x and y) or (b like x and y)

Some background info: in the example below, the user searched for "HTML CSS". I split each word from the search string and created the SQL query seen below.
Now I am trying to make an elastic search query that has the same logic as the following SQL query:
SELECT
title, description
FROM `classes`
WHERE
(`title` LIKE '%html%' AND `title` LIKE '%css%') OR
(description LIKE '%html%' AND description LIKE '%css%')
I'm currently halfway there but can't seem to get it right yet.
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "html"
}
},
{
"term": {
"title": "css"
}
}
]
}
},
"_source": [
"title"
],
"size": 30
}
Now I need to figure out how to add the following logic:
OR (description LIKE '%html%' AND description LIKE '%css%')
One important point is that I only need to fetch documents that have both words in either title or description. I don't want to fetch documents that have only one of the words.
I will update questions as I find more info.
Update: The chosen answer also provides a way to boost scoring based on the field.
Can you try the following query? You can use should for the OR operation.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": { // Use term instead if your field is not analyzed (e.g. keyword)
"title": {
"query": "html css",
"operator": "and",
"boost" : 2
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"description": {
"query": "html css",
"operator": "and"
}
}
}
]
}
}
],
"minimum_should_match": 1
}
},
"_source": [
"title",
"description"
]
}
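Since the question builds its SQL from the split search words, the same query can be generated from a list of fields. This is a sketch assuming Python and a plain dict body; the helper name `all_terms_in_any_field` and the boost map are illustrative. Unlike SQL LIKE '%html%', match is token based: it matches the token html, not substrings such as html5.

```python
import json

def all_terms_in_any_field(search, fields, boosts=None):
    """Translate (f1 has all words) OR (f2 has all words) ... into a
    bool/should of per-field match queries with operator "and"."""
    boosts = boosts or {}
    should = []
    for field in fields:
        clause = {"query": search, "operator": "and"}
        if field in boosts:
            clause["boost"] = boosts[field]  # e.g. rank title hits higher
        should.append({"match": {field: clause}})
    return {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "_source": list(fields),
    }

body = all_terms_in_any_field("HTML CSS", ["title", "description"], {"title": 2})
print(json.dumps(body, indent=2))
```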
Hope this helps!!
I feel the most appropriate query in this case is multi_match.
The multi_match query is a convenient way of running the same query on
multiple fields.
So your query can be written as:
GET /_search
{
"_source": ["title", "description"],
"query": {
"multi_match": {
"query": "html css",
"fields": ["title^2", "description"],
"operator":"and"
}
}
}
_source filters the response so that only the fields listed in the array
are returned in the results.
^2 boosts the title field by a factor of 2.
operator: and requires all terms in the query to match; with the default best_fields type
this is applied to each field separately, so all terms must appear in title or all in description.
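The per-field behaviour of operator "and" is what makes this equivalent to the SQL. The sketch below simulates it in Python, with a simplified tokenizer standing in for the standard analyzer and made-up sample documents. For contrast it also simulates the cross_fields type (not used in the answer above), where each term may come from a different field.

```python
import re

def tokens(text):
    # Simplified stand-in for the standard analyzer.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}

def best_fields_and(doc, query, fields):
    """best_fields + operator "and": some single field must hold every term."""
    terms = tokens(query)
    return any(terms <= tokens(doc.get(f, "")) for f in fields)

def cross_fields_and(doc, query, fields):
    """cross_fields + operator "and": each term may come from any field."""
    combined = set().union(*(tokens(doc.get(f, "")) for f in fields))
    return tokens(query) <= combined

fields = ["title", "description"]
both_in_title = {"title": "HTML and CSS basics", "description": "Intro course"}
split_across = {"title": "HTML basics", "description": "CSS tricks"}

print(best_fields_and(both_in_title, "html css", fields))   # True
print(best_fields_and(split_across, "html css", fields))    # False
print(cross_fields_and(split_across, "html css", fields))   # True
```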
From the elasticsearch 5.2 doc:
One option is to use the nested datatype instead of the object datatype.
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/nested.html
Hope this helps

Elasticsearch 5.1: applying additional filters to the "more like this" query

Building a search engine on top of emails. MLT is great at finding emails with similar bodies or subjects, but sometimes I want to do something like: show me the emails with similar content to this one, but only from joe#yahoo.com and only during this date range. This seems to have been possible with ES 2.x, but it seems that 5.x doesn't allow filtering on fields other than the one being considered for similarity. Am I missing something?
I still can't figure out how to do what I described. Imagine, for the sake of simplicity, that I have an index of emails with two fields: body and sender. I now know how to find messages restricted to a sender; the posted query would be something like:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"sender": "mike#foo.com"
}
}
]
}
}
}
}
}
Similarly, to find messages that are similar to a single hero message using the contents of the body, I can issue a query like:
{
"query": {
"more_like_this": {
"fields" : ["body"],
"like" : [{
"_index" : "foo",
"_type" : "email",
"_id" : "a1af33b9c3dd436dabc1b7f66746cc8f"
}],
"min_doc_freq" : 2,
"min_word_length" : 2,
"max_query_terms" : 12,
"include" : true
}
}
}
Both of these queries constrain the results by adding clauses inside the query clause of the root object. However, every way I try to combine them gives me parse exceptions. I can't find any examples in the documentation of something like "give me emails that are similar to this hero, but only from mike#foo.com".
You're almost there: you can combine them using a bool/filter query, i.e. make your filter an array and put both constraints in it:
{
"query": {
"bool": {
"filter": [
{
"term": {
"sender": "mike#foo.com"
}
},
{
"more_like_this": {
"fields": [
"body"
],
"like": [
{
"_index": "foo",
"_type": "email",
"_id": "a1af33b9c3dd436dabc1b7f66746cc8f"
}
],
"min_doc_freq": 2,
"min_word_length": 2,
"max_query_terms": 12,
"include": true
}
}
]
}
}
}
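One thing to be aware of: clauses in filter context are not scored, so with more_like_this inside filter every hit gets the same score and results are no longer ranked by similarity. If ranking matters, a variant keeps the MLT clause in must and only the hard constraints in filter. Below is a sketch in Python building the body as a dict; the helper name and the `date` field are hypothetical, the latter added to cover the date range from the question, so adjust it to your mapping.

```python
import json

def similar_emails_from(sender, like_id, start=None, end=None):
    filters = [{"term": {"sender": sender}}]
    if start or end:
        # Hypothetical "date" field; only the given bounds are added.
        date_range = {}
        if start:
            date_range["gte"] = start
        if end:
            date_range["lte"] = end
        filters.append({"range": {"date": date_range}})
    return {
        "query": {
            "bool": {
                # Scored clause: hits are ranked by similarity to the hero email.
                "must": [{
                    "more_like_this": {
                        "fields": ["body"],
                        "like": [{"_index": "foo", "_type": "email", "_id": like_id}],
                        "min_doc_freq": 2,
                        "min_word_length": 2,
                        "max_query_terms": 12,
                        "include": True,
                    }
                }],
                # Non-scoring restrictions: sender and optional date range.
                "filter": filters,
            }
        }
    }

body = similar_emails_from("mike#foo.com", "a1af33b9c3dd436dabc1b7f66746cc8f",
                           start="2017-01-01", end="2017-01-31")
print(json.dumps(body, indent=2))
```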

Elasticsearch ORing and ANDing mixed in "must" query

I want to mix required property values, with some of them ANDed and some ORed. After several helpful comments, I reworded the question and came up with this:
GET /index/docs/_search
{
"size": 20,
"query": {
"bool": {
"must": [
{
"terms": {
"category": ["cat1", "cat2"]
}
},
{
"terms": {
"category": ["cat3"]
}
}
]
}
}
}
In the query above (with a "bool" "must" form) does this say that a doc MUST have cat1 AND cat2 OR cat3 in the "category" property?
NOTE: This is significantly modified from the original question.
You use a bool query to combine several different types of queries. You would only need it here if you wanted to add another query, like a filter or must_not clause, alongside your terms query. Wrapping a single terms query in bool still works, but it is simply rewritten into the plain terms query anyway, so it only adds overhead.
The terms query is sufficient for what you want to do: it returns all the documents that contain either cat1 or cat2 in category.
So your query could just be:
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "category" : ["cat1", "cat2"]}
}
}
}
}
You can use this same query inside a must clause as well if you want to combine it with some other query.
Take a look at this and this for reference.
EDIT: For the other case, i.e. AND, you could use the following query:
{
"query": {
"bool": {
"should": [
{ "match": { "category": "cat1" }},
{ "match": { "category": "cat2" }}
],
"minimum_should_match": 2
}
}
}
You can combine all sorts of queries depending on what you really want to do. The Elasticsearch query DSL documentation is quite comprehensive in my opinion; you should take a look. Also look at this: there are many different ways of doing what you want. Keep in mind that the term query looks for exact matches, i.e. it is case sensitive, while match queries rely on analyzed data, so the results can differ between the two.
According to your latest comment, if you want to AND your categories, the correct way to go is by using bool/must, like this:
{
"query": {
"bool": {
"must": [
{ "match": { "category": "cat1" }},
{ "match": { "category": "cat2" }}
]
}
}
}
bool/should + minimum_should_match would work, too, but bool/must is more straightforward and doesn't require you to know how many clauses you have.
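The difference between the terms (OR) form, the bool/must (AND) form, and the mixed case from the question can be checked with a small in-memory simulation. This is a sketch assuming exact (keyword-like) category values; the sample documents a to d are made up for illustration.

```python
docs = {
    "a": ["cat1"],
    "b": ["cat1", "cat2"],
    "c": ["cat2", "cat3"],
    "d": ["cat3"],
}

def terms_query(values):
    # terms: matches if the category field contains ANY of the values.
    return lambda cats: any(v in cats for v in values)

def must(*clauses):
    # bool/must: every clause has to match.
    return lambda cats: all(clause(cats) for clause in clauses)

def run(query):
    return sorted(doc_id for doc_id, cats in docs.items() if query(cats))

print(run(terms_query(["cat1", "cat2"])))                               # ['a', 'b', 'c']
print(run(must(terms_query(["cat1"]), terms_query(["cat2"]))))          # ['b']
print(run(must(terms_query(["cat1", "cat2"]), terms_query(["cat3"]))))  # ['c']
```

So two terms clauses inside must, as in the reworded question, mean (cat1 OR cat2) AND cat3.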

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search for two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
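How must and should interact here can be simulated in a few lines. This sketch follows the query-context default described in the bool query docs: when a must (or filter) clause is present, should clauses are optional and only raise the score; with no must, at least one should clause has to match. Filter-context behaviour in older versions differed, as noted above. Flat field names and the apple document are illustrative assumptions, and exact equality stands in for terms on a keyword field.

```python
def bool_query(doc, must, should, minimum_should_match=None):
    # Query-context default: 0 if a must clause exists (or there is no
    # should clause), otherwise at least one should clause must match.
    if minimum_should_match is None:
        minimum_should_match = 0 if (must or not should) else 1
    if not all(clause(doc) for clause in must):
        return False
    return sum(clause(doc) for clause in should) >= minimum_should_match

def terms(field, values):
    return lambda doc: doc[field] in values

def term(field, value):
    return lambda doc: doc[field] == value

fruits = [
    {"id": "1", "name": "banana", "color": "yellow"},
    {"id": "2", "name": "lemon", "color": "yellow"},
    {"id": "3", "name": "apple", "color": "red"},
]

def search(ids):
    must = [terms("id", ids)]
    should = [term("color", "yellow")]  # only affects scoring here
    return [f["name"] for f in fruits if bool_query(f, must, should)]

print(search(["1", "2"]))  # ['banana', 'lemon']
print(search(["1"]))       # ['banana']
```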
A term or terms query is the right way to fetch an exact text value or id. A match query analyzes the search text and looks for the resulting tokens, so on a text field it can also return documents whose value only partially matches.
Ex:
id = 'fruit 4'
id = 'fruit 44'
Searching with a match query for 'fruit 4' returns both, because both contain the token 'fruit' and match combines tokens with OR by default. This is where the terms query comes into play.
The same search using a terms query on a keyword field returns only 'fruit 4'.
So the accepted answer is absolutely wrong. Use #Rahul's answer. There is just one more thing you need to do: instead of text, you need to map the field as a keyword.
Example of mapping a field as both text and keyword (the mapping is for a top-level field; for nested fields, change it accordingly):
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
#Rahul's answer may not have worked for you because the field is probably analyzed as text.
id - access a text field
id.keyword - access a keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer can return incorrect results. Please use #Rahul's answer with the corresponding mapping.
