Elasticsearch match query and tokenization - elasticsearch

I wrote the following query concerning a field that is tokenized by whitespace :
"match" {
"field" : {
"query" : "bora"
}
}
I have two documents that matches the query on my index, one with "bora" on that field, another with "bora bora".
My problem is that "bora bora" document ends up with a better score than the other and this is not the required behaviour.
Do you see a way to do the same query but prioritizing the records which are not a repetition of the searched word ?
I can't update the index / remove the tokenization.

Related

Elastic Search - Conditional field query if no match found for another field

Is it possible to do conditional field query if match was not found for another field ?
for eg: if I have a 3 fields in the index local_rating , global_rating and default_rating , I need to first check in local_rating and if there is no match then try for global_rating and finally for default_rating .
is this possible to do with one query ? or any other ways to achieve this
thanks in advance
Not sure about any existing features of Elasticsearh to fulfill your current requirements but you can try with fields and per-fields boosting, Individual fields can be boosted with the caret (^)notation. Also I don't know boosting is possible with numeric value or not?
GET /_search
{
"query": {
"multi_match" : {
"query" : 10,
"fields" : [ "local_rating^6", "global_rating^3","default_rating"]
}
}
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#field-boost

Elasticsearch more like this returns too many documents

I have documents like this:
{
title:'...',
body: '...'
}
I want to get documents which are more than 90% similar to the with a specific document. I have used this query:
query = {
"query": {
"more_like_this" : {
"fields" : ["title", "body"],
"like" : "body of another document",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
How to change this query to check for 90% similarity with specified doc?
Take a look at the Query Formation Parameter minimum_should_match
You should specify minimun_should_match
minimum_should_match
After the disjunctive query has been formed, this parameter controls
the number of terms that must match. The syntax is the same as the
minimum should match. (Defaults to "30%").
It form query using this
The MLT query simply extracts the text from the input document,
analyzes it, usually using the same analyzer at the field, then
selects the top K terms with the highest tf-idf to form a disjunctive
query of these terms
So if you would like to boost you title field you should boost your title field because if the title contains most of the terms present in the term frequency/ Inverse document frequency. the result should be boosted because it has more relevance. You can boost your title field by 1.5.
Refer this document for referenceren on the more_like_this query

elastic search fetch the exact match first followed by others

I am newbie to elastic search
I have an education index in es
index creation
when i search 'btech' with match query as
"match" : { "name" : "btech" }
the result is like
result json object
but i need btech(exact match word) as the first document and remaining documents followed by it.
so for that what i have to change in my index creation
can anybody please help me
You can use term query
"term" : { "name" : "btech" }
Or regexp query
"regexp" : { "name" : "btech" }
You are using text type, make sure to check keyword type too
from documentation
If you need to index structured content such as email addresses,
hostnames, status codes, or tags, it is likely that you should rather
use a keyword field.

Finding fields Elasticsearch has matched on

I am using Elasticsearch to search for a group a user should join. I have the user data nested into the search query. On return I get back the closest matched group that user should be in.
The field I am searching on is a nested field as follows:
`{"interests": [
{"topics":["python", "stackoverflow", "elasticsearch"]},
{"topics":["arts", "textiles"]}
]}`
However if you want an understanding of a match - how do you do this?
Elasticsearch does have an explain function which says what the scoring is made up of using tfidf, but not specifically what terms were used.
For example, if I search for 'Textile', the doc should match on 'textiles'. Thus I want the term 'textiles' to be returned in explain or some other way.
The only way I see that provides this need, is to store the search and the document retrieved and then process both to discover words ES has most likely matched on.
EDIT - for some more clarity of the question
An example in my index of a group which has "interests": ['arts', 'fine arts', 'art painting', 'arts and crafts', 'sports']
Now my search, I am looking for Arts and many other things. Now the term I am searching for comes up in this list many times, thus should always be a contributor.
What I want in the response is to say these words were matched ['arts', 'fine arts', 'art painting', 'arts and crafts']along with the degree to which they match i..e 'arts' should be higher than the others, but all others are also relevant
Elasticsearch allows you to specify the _name field for all queries and
filters. This means that you can separate your query into different parts with
separate names, which will allow you to determine which parts matched.
For example:
{
"query" : {
"bool" : {
"should" : [
{"match" : { "interests.topics" : {"query" : "python", "_name" : "py-topic"} }},
{"match" : { "interests.topics" : {"query" : "arts", "_name" : "arts-topic"} }}
]
}
}
}
Then, in your response, you will get back any array of which queries (or
filters) matched and you can determine if the py-topic query and/or the
arts-topic query matched above.

How to enable fuzziness for phrase queries in ElasticSearch

We're using ElasticSearch for searching through millions of tags. Our users should be able to include boolean operators (+, -, "xy", AND, OR, brackets). If no hits are returned, we fall back to a spelling suggestion provided by ES and search again. That's our query:
$ curl -XGET 'http://127.0.0.1:9200/my_index/my_type/_search' -d '
{
"query" : {
"query_string" : {
"query" : "some test query +bools -included",
"default_operator" : "AND"
}
},
"suggest" : {
"text" : "some test query +bools -included",
"simple_phrase" : {
"phrase" : {
"field" : "my_tags_field",
"size" : 1
}
}
}
}
Instead of only providing a fallback to spelling suggestions, we'd like to enable fuzzy matching. If, for example, a user searches for "stackoverfolw", ES should return matches for "stackoverflow".
Additional question: What's the better performing method for "correcting" spelling errors? As it is now, we have to perform two subsequent requests, first with the original search term, then with the by ES suggested term.
The query_string does support some fuzziness but only when using the ~ operator, which I think doesn't your usecase. I would add a fuzzy query then and put it in or with the existing query_string. For instance you can use a bool query and add the fuzzy query as a should clause, keeping the original query_string as a must clause.
As for your additional question about how to correct spelling mistakes: I would use fuzzy queries to automatically correct them and two subsequent requests if you want the user to select the right correction from a list (e.g. Did you mean), but your approach sounds good too.

Resources