How to give exact word matches a higher score - Elasticsearch

In Elasticsearch, I have defined two synonyms for "swim": "play" and "walk".
Say I have only two documents in Elasticsearch, with the values "I like to swim" and "I like to play".
If the user enters the query "I hope to play", I want it to match "I like to play" with a higher score than "I like to swim" (exact word matches, in this case "play", should score higher). Is there a way to achieve that?

Yes, you can!
You can achieve any boost you want with the bool query.
The general idea is that you define must clauses, which have to match, and should clauses, which may match and, when they do, add to the document's score.
So for your need, a document must match the query with synonyms applied, and should match it without synonyms.
You just need to index your searchable field with two properties (sub-fields), one analyzed with synonyms and another without synonyms.
An example could be:
{
"query": {
"bool": {
"must": [
{
"match": {
"text.withSynonyms": "I hope to play"
}
}
],
"should": [
{
"match": {
"text.withoutSynonyms": "I hope to play"
}
}
]
}
}
}
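The query above assumes that text already has the two sub-fields withSynonyms and withoutSynonyms. A minimal sketch of index settings that could produce them, written as Python dicts you could send with any client, assuming Elasticsearch 7 or later (typeless mappings). The filter name swim_synonyms and analyzer name synonym_analyzer are hypothetical, and the synonym rule simply encodes the question's "swim, play, walk".

```python
def build_index_body():
    """Index body with a synonym-analyzed and a plain sub-field under `text`."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # hypothetical filter name; equivalent synonyms from the question
                    "swim_synonyms": {
                        "type": "synonym",
                        "synonyms": ["swim, play, walk"],
                    }
                },
                "analyzer": {
                    # hypothetical analyzer name
                    "synonym_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "swim_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "text": {
                    "type": "text",
                    "fields": {
                        "withSynonyms": {"type": "text", "analyzer": "synonym_analyzer"},
                        "withoutSynonyms": {"type": "text", "analyzer": "standard"},
                    },
                }
            }
        },
    }


def build_query(user_text):
    """must: match with synonyms; should: exact words add to the score."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text.withSynonyms": user_text}}],
                "should": [{"match": {"text.withoutSynonyms": user_text}}],
            }
        }
    }
```

With this setup, both documents satisfy the must clause for "I hope to play", but only "I like to play" also matches the should clause on the plain sub-field, so it should score higher.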

Related

ElasticSearch match a word without compound words

I need to search Elasticsearch for a full word like "home", but not compound words like "homeland".
Using this query:
{
"query": {
"match": {
"text": {
"query": term,
"minimum_should_match": "100%"
}
}
}
}
matches both "home" and "homeland", "hometown", etc. How can I restrict the match to the input word token only?
Note: the word may be in a different language, so I cannot specify something like an English analyzer, because I do not know the language of the word a priori.

How can we make a few tokens a phrase in an Elasticsearch query

I want part of my query to be treated as a phrase. For example, I want to search "Can you show me documents for Hospitality and Airline Industry".
Here I want "Airline Industry" to be treated as a phrase. I can't find any such setting in multi_match.
Even when we pass "Can you show me documents for Hospitality and \"Airline Industry\"" to a multi_match query, the default analyzer breaks it into separate tokens. I don't want to change the settings of my analyzer. I have also found that we can do this with simple_query_string, but then I can't apply a filter the way I do in my multi_match bool query, and I need to filter on certain fields as well.
search_text = "Can you show me documents for Hospitality and Airline Industry". Now I want to pass "Airline Industry" as a phrase and search my indexed documents against 2 fields.
Say I have existing code like this:
if filter:
    qry = {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": search_text,
                        "type": "best_fields",
                        "fields": ["TITLE1", "TEXT"],
                        "tie_breaker": 0.3,
                    }
                },
                "filter": {"terms": {"GRP_CD": ["1234", "5678"]}},
            }
        }
    }
else:
    qry = {
        "query": {
            "multi_match": {
                "query": search_text,
                "type": "best_fields",
                "fields": ["TITLE1", "TEXT"],
                "tie_breaker": 0.3,
            }
        }
    }
But then I realised this code does not handle "Airline Industry" as a phrase, even though I pass the search string like this:
"Can you show me documents for Hospitality and \"Airline Industry\""
From the Elasticsearch documentation, I learned there is a query which might handle this:
qry={"query":{
"simple_query_string":{
"query":"Can you show me documents for Hospitality and \"Airline Industry\"",
"fields":["TITLE1","TEXT"] }
} }
But now my issue is: what if the user wants to apply a filter? With the filter query above I cannot pass a phrase, and a bool query does not seem possible with simple_query_string.
You can always combine queries using a bool query. Let's go through this case by case. Before going to the cases, I would like to clarify one thing about filter. The filter clause of a bool query behaves just like a must clause, but the difference is that any query inside the filter clause (even another bool query with must/should clauses) runs in filter context. Filter context means that this part of the query is not considered in score calculation.
Now let's move on to the cases:
Case 1: Only query and no filters.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
]
}
}
}
Notice that the query is the same as the one in your question. All I have done here is wrap it in a bool query. This doesn't change the query logically, but it makes it easier to add queries to the filter clause programmatically.
Case 2: Phrase query with filter.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
],
"filter": [
{
"terms": {
"GRP_CD": [
"1234",
"5678"
]
}
}
]
}
}
}
This way you can combine a query (query context) with filters (filter context).
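Building the body programmatically, as suggested above, could look like the following minimal sketch. The function name build_search_body and the group_codes parameter are hypothetical; the field names and GRP_CD values come from the question. simple_query_string treats double-quoted parts of the input as phrases with its default flags.

```python
def build_search_body(search_text, group_codes=None):
    """Wrap simple_query_string in a bool query; add a terms filter if given."""
    bool_query = {
        "must": [
            {
                "simple_query_string": {
                    # quoted parts such as "Airline Industry" are searched as phrases
                    "query": search_text,
                    "fields": ["TITLE1", "TEXT"],
                }
            }
        ]
    }
    if group_codes:
        # filter context: restricts the results without affecting the score
        bool_query["filter"] = [{"terms": {"GRP_CD": list(group_codes)}}]
    return {"query": {"bool": bool_query}}
```

For example, build_search_body('Can you show me documents for Hospitality and "Airline Industry"', ["1234", "5678"]) produces the Case 2 body, and omitting the codes produces the Case 1 body, so the same code path serves both.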

Elasticsearch: Must include all words in search if all exist, but ignore one or two if they don't?

I hope what I'm trying to explain makes sense, and there is a way that I could achieve it.
Currently I am searching in 40 million documents, with a query like this:
GET /all/_search
{
"query": {
"match": {
"full_text": {
"query": "insert ten or twelve words here to search",
"operator": "and"
}
}
}
}
Now I want to return only docs whose 'full_text' includes all of the words in the query. I am able to achieve that with the snippet above.
My question is: when there is no match at all, but, for example, removing "ten" would yield one result, is there a way to configure my search to do that? I.e. to tell ES "aim for a 100% match, but if nothing is found, 90% would do just fine"!
Hope this is clear :)
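You can use the minimum_should_match parameter with the match query:
{
"query": {
"match": {
"full_text": {
"query": "insert ten or twelve words here",
"minimum_should_match": "90%"
}
}
}
}
Note that this is not a fallback that only kicks in when there is no full match: documents matching at least 90% of the terms are always returned, but documents containing all of the terms will generally score higher and therefore rank first.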
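Two details may help here. First, Elasticsearch documents that a positive percentage in minimum_should_match is converted to a clause count by rounding down, so "90%" of 6 terms requires 5. Second, if you want exactly the "100% first, 90% only if nothing is found" behaviour, one option is a client-side fallback that runs two searches. A minimal sketch of both, where search is a hypothetical callable that sends a request body and returns the parsed response dict; this two-step approach is an alternative to the single query above, not part of it.

```python
def required_terms(num_terms, percent):
    """Clauses required by a positive percentage minimum_should_match (rounded down)."""
    return num_terms * percent // 100


def match_query(text, minimum_should_match=None):
    """match query on full_text: all terms by default, or a relaxed threshold."""
    match = {"query": text}
    if minimum_should_match is None:
        match["operator"] = "and"  # every term must be present
    else:
        match["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {"full_text": match}}}


def search_with_fallback(search, text, relaxed="90%"):
    """Try an all-terms match first; re-run with a relaxed threshold if empty.

    `search` is a hypothetical callable: request body in, response dict out.
    """
    response = search(match_query(text))
    if response["hits"]["hits"]:
        return response
    return search(match_query(text, relaxed))
```

The trade-off is a second round trip whenever the strict query finds nothing, in exchange for never mixing partial matches into a result set that has full matches.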

How can I achieve this type of queries in ElasticSearch?

I have added a document like this to my index
POST /analyzer3/books
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
And then I do queries like this
GET /analyzer3/_analyze
{
"analyzer": "english",
"text": "\"The * day I went with my * to the\""
}
And it successfully returns the previously added document.
My idea is to have quotes so that the query becomes exact, but also wildcards that can replace any word. Google has exactly this functionality: you can search for something like "I'm * the university" and it returns pages containing text such as "I'm studying in the university right now".
However I want to know if there's another way to do this.
My main concern is that this doesn't seem to work with other languages like Japanese and Chinese. I've tried with many analyzers and tokenizers to no avail.
Any answer is appreciated.
Exact matches on tokenized fields are not that straightforward. It is better to store your field as keyword if you have such requirements.
Additionally, the keyword data type supports the wildcard query, which can help with your wildcard searches.
So just create a keyword-type sub-field, then use the wildcard query on it.
Your search query will look something like below:
GET /_search
{
"query": {
"wildcard" : {
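"title.keyword" : "The * day I went with my * to the*"
}
}
}
In the above query, it is assumed that the title field has a sub-field named keyword of data type keyword. The wildcard pattern has to match the whole keyword value, which is why it ends with * to cover the rest of the title; matching on a keyword field is also case-sensitive.
More on the wildcard query can be found in the Elasticsearch documentation.
If you still want to do exact searches on the text data type, then read this.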
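A wildcard query on a keyword field is applied to the entire stored value, not to a substring of it. The sketch below builds the query body and uses Python's fnmatch.fnmatchcase as a local approximation of that whole-value, case-sensitive behaviour. It is only an illustration: fnmatch also understands [...] character classes, which Elasticsearch wildcards do not, and the title.keyword sub-field is the same assumption as above.

```python
from fnmatch import fnmatchcase


def wildcard_query(pattern):
    """Wildcard query on the (assumed) title.keyword sub-field."""
    return {"query": {"wildcard": {"title.keyword": pattern}}}


# Local approximation: the pattern must cover the WHOLE stored string,
# and matching is case-sensitive.
title = "The other day I went with my mom to the pool and had a lot of fun"
print(fnmatchcase(title, "The * day I went with my * to the"))   # False
print(fnmatchcase(title, "The * day I went with my * to the*"))  # True
```

The first pattern fails because the stored value continues after "to the"; the trailing * lets the pattern consume the rest of the title.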
Elasticsearch doesn't have Google-like search out of the box, but you can build something similar.
Let's assume that when someone quotes a search text, what they want is a match_phrase query. Basically, remove the quote characters and search for the remaining string as a phrase.
PUT test/_doc/1
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
GET test/_search
{
"query": {
"match_phrase": {
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
}
}
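The "remove the quotes, then search as a phrase" step could be sketched in Python as follows. The function name and the fallback to a plain match query for unquoted input are assumptions made for this illustration; only the fully quoted case comes from the answer.

```python
def quoted_to_match_phrase(user_query, field="title"):
    """If the whole input is wrapped in double quotes, search it as a phrase.

    Unquoted input falls back to a plain match query (an assumption here).
    """
    text = user_query.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        # strip the surrounding quotes and keep the inner text as the phrase
        return {"query": {"match_phrase": {field: text[1:-1]}}}
    return {"query": {"match": {field: text}}}
```

Passing '"The other day I went with my mom"' yields a match_phrase body for the inner text, while the same words without quotes yield an ordinary match query.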
The * makes it a little more interesting. You could split the text at each * into multiple phrase searches and combine them. Example:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": "The"
}
},
{
"match_phrase": {
"title": "day I went with my"
}
},
{
"match_phrase": {
"title": "to the"
}
}
]
}
}
}
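Generating that bool query from the raw input can be done by splitting on *. A minimal Python sketch (the function name is hypothetical). Keep in mind that separate match_phrase clauses under must do not constrain the order of the pieces or the distance between them, so this is looser than the original "The * day ..." pattern.

```python
def wildcard_phrase_query(phrase, field="title"):
    """Split a phrase at each * and require every remaining piece as a phrase."""
    pieces = [p.strip() for p in phrase.split("*")]
    return {
        "query": {
            "bool": {
                # empty pieces (e.g. from a leading or trailing *) are skipped
                "must": [{"match_phrase": {field: p}} for p in pieces if p]
            }
        }
    }
```

For "The * day I went with my * to the" this yields the same three clauses as the example above: "The", "day I went with my", and "to the".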
Or you could use slop in the phrase search. All the terms in your search query have to be there (unless they are removed by the tokenizer or as stop words), but the matched phrase can contain additional words. Here each * stands for one other word, so we need a slop of 2 in total. If you want more than one word in place of each *, you will need a higher slop:
GET test/_search
{
"query": {
"match_phrase": {
"title": {
"query": "The * day I went with my * to the",
"slop": 2
}
}
}
}
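The slop in that example is just the number of * characters times the number of words each one may stand for. A minimal sketch that computes it, assuming, as the answer does, that the field's analyzer discards the * characters (the standard analyzer does). Slop is a total budget for the whole phrase, so the extra words are not forced to appear exactly where the * were.

```python
def slop_phrase_query(phrase, words_per_wildcard=1, field="title"):
    """match_phrase with slop = (number of *) x (words allowed per *)."""
    slop = phrase.count("*") * words_per_wildcard
    # the * characters are left in; the analyzer is assumed to drop them
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}
```

For "The * day I went with my * to the" and one word per *, this gives a slop of 2, matching the query above.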
Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement for custom scoring on names. To keep it simple, let's say that if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post with the same requirement as mine, where the person mentioned solving it with native scripts. My question still remains: how do I actually get a score based on the similarity distance, so that it can be used in the native scripts?
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query.
Here is a possible example:
{
"query": {
"function_score": {
"query": { "match": {
"input": "Smith"
} },
"boost": "5",
"functions": [
{
"filter": { "match": { "input.keyword": "Smith" } },
"weight": 23
}
]
}
}
}
In this example, we have a mapping with the input field indexed both as text and as keyword (input.keyword is for exact matches). Documents that exactly match the term "Smith" get a higher score than the other documents matched by the inner query (a match query in the example; in your case it would be the query with fuzziness).
You can control the size of this boost by tuning the weight parameter.
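Putting the fuzzy query from the question inside the function_score could look like this minimal sketch. The input and input.keyword fields come from the answer's assumed mapping, fuzziness=1 comes from the question, and weight 23 is simply the answer's example value. A term filter is used for the exact keyword check. Note that this gives exact matches a constant extra weight; it does not scale the score by edit distance, which would still need a script as the question's linked post describes.

```python
def exact_boost_query(name, fuzziness=1, exact_weight=23):
    """Fuzzy match on `input`, with an extra weight for exact keyword matches."""
    return {
        "query": {
            "function_score": {
                # inner query: finds exact and fuzzy candidates
                "query": {
                    "match": {"input": {"query": name, "fuzziness": fuzziness}}
                },
                "functions": [
                    {
                        # applies only where the keyword value equals `name` exactly
                        "filter": {"term": {"input.keyword": name}},
                        "weight": exact_weight,
                    }
                ],
                # multiply the query score by the function result; documents
                # not matching the filter keep their original query score
                "boost_mode": "multiply",
            }
        }
    }
```

Raising exact_weight widens the gap between exact and fuzzy matches without changing which documents are returned.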
