How to boost individual words in an Elasticsearch match query

Suppose I want to query "Best holiday places to visit during summer" in an Elasticsearch cluster, but I want holiday, visit and summer to have higher priority than the other words.
Something like this: Best holiday^4 places to visit^3 during summer^2.
I know about field boosting, but what I want to do is not achievable with field boosts.
Basically, I want to boost individual words.
Does anyone know how to do this in Elasticsearch 5.6 or above?

You could use query_string to boost individual terms like this:
{
"query" : {
"query_string" : {
"fields" : ["content", "name"],
"query" : "Best holiday^4 places to visit^3 during summer^2"
}
}
}
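If the weights live in application code, the query string can be assembled instead of hand-written. Below is a minimal Python sketch that only builds the request body (it does not send it). The helper name `boosted_query_string` is hypothetical, and the `content`/`name` fields are taken from the example above.

```python
import json


def boosted_query_string(text, boosts, fields):
    """Build a query_string body, appending ^boost to the words listed in `boosts`.

    `boosts` is a hypothetical word -> weight mapping; words not listed keep the default boost.
    """
    parts = []
    for word in text.split():
        weight = boosts.get(word)
        parts.append(f"{word}^{weight}" if weight is not None else word)
    return {
        "query": {
            "query_string": {
                "fields": fields,
                "query": " ".join(parts),
            }
        }
    }


body = boosted_query_string(
    "Best holiday places to visit during summer",
    {"holiday": 4, "visit": 3, "summer": 2},
    ["content", "name"],
)
print(json.dumps(body, indent=2))
```

Words that contain query_string syntax characters (such as `:` or `(`) would need escaping. This sketch does not handle that.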

Related

Elasticsearch - Conditional field query if no match found for another field

Is it possible to query a field conditionally, only when no match was found in another field?
For example, if I have three fields in the index, local_rating, global_rating and default_rating, I need to check local_rating first; if there is no match, try global_rating, and finally default_rating.
Is it possible to do this with one query, or is there another way to achieve it?
I'm not sure any existing Elasticsearch feature fulfills this requirement exactly, but you can try a multi_match query over all three fields with per-field boosting: individual fields can be boosted with the caret (^) notation. I'm also not sure whether boosting works with a numeric value.
GET /_search
{
"query": {
"multi_match" : {
"query" : 10,
"fields" : [ "local_rating^6", "global_rating^3","default_rating"]
}
}
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#field-boost
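The same request body can be generated from a priority-ordered list of fields. Below is a Python sketch; `prioritized_multi_match` is a hypothetical helper, and the weights are the ones used in the answer. Note that this approach ranks rather than strictly falls back: a document that matches only `default_rating` is still returned, just with a lower score.

```python
import json


def prioritized_multi_match(value, weighted_fields):
    """Build a multi_match body from (field, boost) pairs.

    A boost of 1 is the default, so that field is written without a ^ suffix.
    """
    fields = [f if boost == 1 else f"{f}^{boost}" for f, boost in weighted_fields]
    return {"query": {"multi_match": {"query": value, "fields": fields}}}


body = prioritized_multi_match(
    10,
    [("local_rating", 6), ("global_rating", 3), ("default_rating", 1)],
)
print(json.dumps(body))
```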

How to decrease the TF score in Elasticsearch?

I have two documents: 1. "Some Important Company" and 2. "Some Important Company Important branch".
Since "Important" has a high doc count (many documents contain the word), when I search for "Some Important Company",
the second document gets a higher score, even though the first document is an exact match.
So my question is: how can I boost the score of exact matches, or decrease the TF score?
My query is a multi_match over customerName and usedName, but usedName is "" in all documents in this case.
I assume the field of your document is indexed using a standard text analyzer or something similar. I would combine a match query and a match_phrase query using a dis_max compound query.
This would look something like this:
{
"query": {
"dis_max" : {
"queries" : [
{ "match" : { "myField" : "Some Important Company" }},
{ "match_phrase" : { "myField" : "Some Important Company" }}
],
"tie_breaker" : 0.7
}
}
}
There's no notion of "matching an exact phrase" with the match query; for that you need the match_phrase query, which is why the two are combined here. dis_max scores a document by its best-matching clause plus tie_breaker times the scores of its other matching clauses, so documents that match both queries get a boost. You can read more about dis_max and match_phrase:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
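To see how the tie_breaker rewards a document that matches both clauses, here is the dis_max combination rule worked through in Python. The clause scores are made-up numbers for illustration, not output from a real index.

```python
def dis_max_score(clause_scores, tie_breaker):
    """Combine the scores of the matching clauses the way dis_max does:
    the best clause score plus tie_breaker times the sum of the other clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical clause scores.
both_clauses = dis_max_score([1.0, 1.2], tie_breaker=0.7)  # matches match and match_phrase
match_only = dis_max_score([1.3], tie_breaker=0.7)         # matches only the match clause
print(both_clauses, match_only)
```

With tie_breaker 0.0, only the best clause would count. With 1.0, dis_max would simply sum the clause scores like a bool query. The 0.7 in the answer sits between those two extremes.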

Elasticsearch queries for n-grams

I am trying to build a search for the following case.
For example, I have these documents:
1) "There are a lot of diesel cars in the city"
2) "Cars have diesel engines"
3) "Bob sold diesel car"
and I want to find doc 1 and doc 3.
If I write this query:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "fields": ["text"],
                "query": "\"diesel car\"~1^5"
              }
            }
          ]
        }
      }
    }
  }
}
I find doc 1 but not doc 3.
If I use an n-gram analyzer, will this query also work for doc 3?
Or are there other solutions?
Proximity search only works for exact terms: if even one character in a word changes, it no longer matches. Does ES have other solutions for that?
I found a solution:
1) Add an English stemmer to the index settings and mapping.
2) Use a simple query like:
(diesel AND car)^5
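Below is a sketch of those two steps as request bodies, built in Python. The filter and analyzer names (`english_stemmer`, `english_stemmed`) are hypothetical. The field name `text` comes from the question. The mapping uses the typeless format of Elasticsearch 7.x and later; in 5.x/6.x, `properties` is nested under a type name. Because the analyzer stems at both index and search time, "cars" and "car" reduce to the same token. Since AND does not require the words to be adjacent, doc 2 would also match. If doc 2 must be excluded, running the original proximity query against the stemmed field may be closer to what was asked.

```python
import json

# Step 1: index settings with a custom analyzer that lowercases and applies the
# built-in English stemmer. The names "english_stemmer" and "english_stemmed"
# are hypothetical choices for this sketch.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "english_stemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    },
    # Typeless mapping (Elasticsearch 7.x+); the analyzer applies at index and search time.
    "mappings": {
        "properties": {
            "text": {"type": "text", "analyzer": "english_stemmed"}
        }
    },
}

# Step 2: the simple boolean query from the answer, run against the stemmed field.
query = {
    "query": {
        "query_string": {
            "fields": ["text"],
            "query": "(diesel AND car)^5",
        }
    }
}

print(json.dumps(settings, indent=2))
print(json.dumps(query, indent=2))
```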

Finding fields Elasticsearch has matched on

I am using Elasticsearch to search for a group a user should join. I have the user data nested into the search query. In the response I get back the closest-matching group for that user.
The field I am searching on is a nested field as follows:
`{"interests": [
{"topics":["python", "stackoverflow", "elasticsearch"]},
{"topics":["arts", "textiles"]}
]}`
However, if I want to understand why a document matched, how do I do this?
Elasticsearch does have an explain function, which shows how the score is made up (using TF-IDF), but not specifically which terms were used.
For example, if I search for 'Textile', the doc should match on 'textiles', so I want the term 'textiles' to be returned by explain or in some other way.
The only way I can see to meet this need is to store both the search and the retrieved document, then process them to discover which words ES most likely matched on.
EDIT: to clarify the question.
An example group in my index has "interests": ['arts', 'fine arts', 'art painting', 'arts and crafts', 'sports'].
In my search I am looking for "arts" among many other things. The term appears in this list many times, so it should always contribute to the score.
What I want in the response is to be told that these words were matched: ['arts', 'fine arts', 'art painting', 'arts and crafts'], along with the degree to which they match, i.e. 'arts' should score higher than the others, but all the others are also relevant.
Elasticsearch allows you to specify a _name for all queries and filters. This means you can split your query into separately named parts, which lets you determine which parts matched.
For example:
{
"query" : {
"bool" : {
"should" : [
{"match" : { "interests.topics" : {"query" : "python", "_name" : "py-topic"} }},
{"match" : { "interests.topics" : {"query" : "arts", "_name" : "arts-topic"} }}
]
}
}
}
Then, in the response, each hit includes a matched_queries array listing which named queries (or filters) it matched, so you can determine whether the py-topic query and/or the arts-topic query above matched.
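Below is a Python sketch that generates one named match clause per search term and then reads `matched_queries` from each hit. The `<term>-topic` naming scheme and the sample response are hypothetical. The response is shaped like a search response but is not taken from a real cluster. The sketch assumes `interests` uses a plain object mapping. If it is mapped with the `nested` type, the match clauses would also need to go inside a `nested` query.

```python
import json


def named_topic_query(terms, field="interests.topics"):
    """Build a bool/should query with one named match clause per search term."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": term, "_name": f"{term}-topic"}}}
                    for term in terms
                ]
            }
        }
    }


def matched_names(response):
    """Map each hit's _id to the names of the clauses it matched (empty list if none)."""
    return {
        hit["_id"]: hit.get("matched_queries", [])
        for hit in response["hits"]["hits"]
    }


body = named_topic_query(["python", "arts"])
print(json.dumps(body, indent=2))

# Hypothetical response fragment, for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "group-1", "_score": 2.1, "matched_queries": ["arts-topic"]},
            {"_id": "group-2", "_score": 1.4, "matched_queries": ["python-topic", "arts-topic"]},
        ]
    }
}
print(matched_names(sample_response))
```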

How to enable fuzziness for phrase queries in Elasticsearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -H 'Content-Type: application/json' -d '
{
"query" : {
"query_string" : {
"query" : "some test query +bools -included",
"default_operator" : "AND"
}
},
"suggest" : {
"text" : "some test query +bools -included",
"simple_phrase" : {
"phrase" : {
"field" : "my_tags_field",
"size" : 1
}
}
}
}'
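The fallback flow described in the question (search, and if nothing comes back, search again with the top phrase suggestion) can be sketched in Python. `search_with_fallback`, `build_body` and the injected `search` callable are hypothetical. Any client function that takes a request body and returns the parsed JSON response would fit. The suggestion is read from `suggest.simple_phrase[0].options[0].text`, following the suggester name used in the query above. The fake search function exists only to make the sketch runnable.

```python
def build_body(text, field="my_tags_field"):
    """Request body with the query_string query and a phrase suggester, as in the question."""
    return {
        "query": {"query_string": {"query": text, "default_operator": "AND"}},
        "suggest": {
            "text": text,
            "simple_phrase": {"phrase": {"field": field, "size": 1}},
        },
    }


def search_with_fallback(search, text):
    """Run the query; if it returns no hits, retry once with the top phrase suggestion.

    `search` is a hypothetical callable: request body (dict) -> parsed response (dict).
    Returns (response, text_actually_searched).
    """
    response = search(build_body(text))
    if response["hits"]["hits"]:
        return response, text
    entries = response.get("suggest", {}).get("simple_phrase", [])
    options = entries[0]["options"] if entries else []
    if not options:
        return response, text
    corrected = options[0]["text"]
    return search(build_body(corrected)), corrected


# Fake search function standing in for a real client, for illustration only.
def fake_search(body):
    text = body["query"]["query_string"]["query"]
    if text == "stackoverflow":
        return {"hits": {"hits": [{"_id": "1"}]}}
    return {
        "hits": {"hits": []},
        "suggest": {"simple_phrase": [{"text": text, "options": [{"text": "stackoverflow", "score": 0.5}]}]},
    }


response, used = search_with_fallback(fake_search, "stackoverfolw")
print(used)
```

The sketch checks `hits.hits` rather than `hits.total`, because `hits.total` is a number in older versions and an object in 7.x and later.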
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: Which method performs better for "correcting" spelling errors? Currently we have to perform two subsequent requests: first with the original search term, then with the term suggested by ES.
The query_string query does support some fuzziness, but only when using the ~ operator, which I don't think fits your use case. I would add a fuzzy query and combine it with the existing query_string using OR semantics: for instance, use a bool query with both the original query_string and the fuzzy query as should clauses. If the query_string is kept as a must clause instead, the fuzzy clause only adjusts scoring, and documents that match only the fuzzy query are not returned.
As for your additional question about correcting spelling mistakes: I would use fuzzy queries to correct them automatically, and use two subsequent requests only if you want the user to pick the right correction from a list (e.g. "Did you mean...?"). Your approach sounds good too, though.
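Below is a sketch of the combined request in Python, with both clauses as should so that a document matching either one is returned. The field name comes from the question. `fuzziness: "AUTO"` is one reasonable setting, not something the answer prescribes, and `query_with_fuzzy_fallback` is a hypothetical helper. The fuzzy query is term-level and its input is not analyzed, so the sketch lowercases the term to match the lowercase tokens a standard analyzer produces. Multi-word input would need one fuzzy clause per word, or a match query with fuzziness instead.

```python
import json


def query_with_fuzzy_fallback(user_query, term, field="my_tags_field"):
    """bool query: the original query_string OR a fuzzy match on a single term.

    Both clauses are `should` and there is no `must`, so at least one clause has
    to match; documents matching both score higher.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"query_string": {"query": user_query, "default_operator": "AND"}},
                    # fuzzy is term-level: its value is not analyzed, so lowercase it here.
                    {"fuzzy": {field: {"value": term.lower(), "fuzziness": "AUTO"}}},
                ]
            }
        }
    }


body = query_with_fuzzy_fallback("stackoverfolw", "StackOverfolw")
print(json.dumps(body, indent=2))
```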
