Is there a way to score fuzzy hits with the same score as exact hits? - elasticsearch

I'm trying to use Elasticsearch as an integration tool that can match records from different sources. I'm combining filters and a query for this: the filters remove irrelevant records and let candidate matches through, and then all of those candidates are scored. I'm using fuzzy matching because some of the records might contain a misspelling (Nicolson Way/Nicholson Way). I would like them to be scored equally, regardless of whether it is a fuzzy match or an exact match.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/fuzzy-scoring.html
Is there a way to achieve this with Elasticsearch?

Use a constant_score to give it a score of your choice:
{
"query": {
"constant_score": {
"filter": {
"query": {
"fuzzy": {"text": "whatever"}
}
},
"boost": 1
}
}
}
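The same idea can be sketched as a small Python helper that builds the request body. This is a minimal sketch, assuming a recent Elasticsearch version in which the fuzzy query can sit directly under filter; the helper name constant_score_fuzzy and the sample value are hypothetical. Note that fuzzy is a term-level query, so the value is not analyzed before matching.

```python
import json


def constant_score_fuzzy(field, value, fuzziness="AUTO", boost=1.0):
    """Build a search body in which every hit, fuzzy or exact, gets `boost` as its score."""
    return {
        "query": {
            "constant_score": {
                # The fuzzy query runs in filter context, so it only selects documents.
                "filter": {
                    "fuzzy": {field: {"value": value, "fuzziness": fuzziness}}
                },
                "boost": boost,
            }
        }
    }


# Hypothetical usage: the field name "text" comes from the answer above.
body = constant_score_fuzzy("text", "nicholson")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the search body with whichever client you already use.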

Related

In Elasticsearch, how do I combine multiple filters with OR without affecting the score?

In Elasticsearch, I want to filter my results with two different clauses aggregated with OR e.g. return documents with PropertyA=true OR PropertyB=true.
I've been trying to do this using a bool query. My base query is just a text search in must. If I put both clauses in the filter occurrence type, it aggregates them with an AND. If I put both clauses in the should occurrence type with minimum_should_match set to 1, then I get the right results. But then, documents matching both conditions get a higher score because "should" runs in a query context.
How do I filter to only documents matching either of two conditions, without increasing the score of documents matching both conditions?
Thanks in advance
You need to leverage the constant_score query, so everything runs in the filter context:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"PropertyA": true
}
},
{
"term": {
"PropertyB": true
}
}
]
}
}
}
}
}
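If the text search from the question should keep contributing to the score, one option is to place the same OR logic inside the filter clause of the outer bool query instead of replacing the whole query. The sketch below builds such a body in Python; the helper name and the "title" field are assumptions for illustration, not part of the original question.

```python
def text_search_with_any_of(text_field, text, conditions):
    """Score by the text match only; require at least one of `conditions` to hold."""
    # One term query per (field, value) pair; these are OR-ed via "should".
    any_of = [{"term": {field: value}} for field, value in conditions.items()]
    return {
        "query": {
            "bool": {
                # Query context: this clause determines the score.
                "must": [{"match": {text_field: text}}],
                # Filter context: the nested bool only includes or excludes documents.
                "filter": [
                    {"bool": {"should": any_of, "minimum_should_match": 1}}
                ],
            }
        }
    }


# "title" is a hypothetical field name; PropertyA/PropertyB come from the question.
body = text_search_with_any_of(
    "title", "some text", {"PropertyA": True, "PropertyB": True}
)
```

Because the nested bool sits in filter context, a document matching both properties scores the same as one matching only one of them.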

How can we make few tokens to be phrase in elastic search query

I want part of a query to be treated as a phrase. For example, I want to search "Can you show me documents for Hospitality and Airline Industry".
Here I want Airline Industry to be treated as a phrase. I can't find any such setting in multi_match.
Even when we use a multi_match query with "Can you show me documents for Hospitality and \"Airline Industry\"", the default analyzer breaks it into separate tokens. I don't want to change my analyzer settings. I have also found that we can do this with simple_query_string, but then we cannot apply a filter the way we can with multi_match inside a boolean query, and I want to filter on certain fields as well.
search_text="Can you show me documents for Hospitality and Airline Industry" Now I want to pass Airline Industry as a phrase and search my indexed documents against 2 fields.
Say I have existing code like this:
if filter:
    # Text search plus a terms filter on GRP_CD
    qry = {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": search_text,
                        "type": "best_fields",
                        "fields": ["TITLE1", "TEXT"],
                        "tie_breaker": 0.3
                    }
                },
                "filter": {"terms": {"GRP_CD": ["1234", "5678"]}}
            }
        }
    }
else:
    # Text search only
    qry = {
        "query": {
            "multi_match": {
                "query": search_text,
                "type": "best_fields",
                "fields": ["TITLE1", "TEXT"],
                "tie_breaker": 0.3
            }
        }
    }
But then I realised this code does not handle Airline Industry as a phrase, even though I am passing the search string like this:
"Can you show me documents for Hospitality and \"Airline Industry\""
From the Elasticsearch documentation I learned that there is a query which might handle this:
qry={"query":{
"simple_query_string":{
"query":"Can you show me documents for Hospitality and \"Airline Industry\"",
"fields":["TITLE1","TEXT"] }
} }
But now my issue is: what if the user wants to apply a filter? With the filter query above I cannot pass a phrase, and a boolean query is not possible with simple_query_string.
You can always combine queries using a bool query. Let's go through this case by case. Before that, I would like to clarify one thing about filter: the filter clause of a bool query behaves just like a must clause, except that any query inside it (even another bool query with must/should clauses) runs in filter context. Filter context means that this part of the query is not considered in the score calculation.
Now let's move on to the cases:
Case 1: Only query and no filters.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
]
}
}
}
Notice that the query is the same as the one in your question; all I have done is wrap it in a bool query. This makes no logical change to the query, but it makes it easier to add queries to the filter clause programmatically.
Case 2: Phrase query with filter.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
],
"filter": [
{
"terms": {
"GRP_CD": [
"1234",
"5678"
]
}
}
]
}
}
}
This way you can combine a query (query context) with filters.
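The two cases can be folded into one function that replaces the if/else from the question. This is a minimal sketch: the function name build_query and its group_codes parameter are hypothetical, while the field names TITLE1, TEXT and GRP_CD are taken from the question.

```python
def build_query(search_text, group_codes=None):
    """Wrap simple_query_string in a bool query and add a terms filter only when requested."""
    bool_query = {
        "must": [
            {
                "simple_query_string": {
                    # Quoted parts of search_text are treated as phrases.
                    "query": search_text,
                    "fields": ["TITLE1", "TEXT"],
                }
            }
        ]
    }
    if group_codes:
        # Filter context: restricts the results without affecting the score.
        bool_query["filter"] = [{"terms": {"GRP_CD": list(group_codes)}}]
    return {"query": {"bool": bool_query}}


search_text = 'Can you show me documents for Hospitality and "Airline Industry"'
qry_without_filter = build_query(search_text)
qry_with_filter = build_query(search_text, group_codes=["1234", "5678"])
```

Since the body is always a bool query, further filters can be appended to the same list later without another branch.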

Why is Elasticsearch with Wildcard Query always 1.0?

When I run a search in Elasticsearch with a wildcard query (wildcard at the end), the score is 1.0 for all hits.
Is this by design? Can I change this behavior somewhere?
Elasticsearch is basically saying that all results are equally relevant, as you've provided an unqualified search (a wildcard, equivalent to a match_all). As soon as you add some additional context through the various types of queries, you will notice changes in the scoring.
Depending on your ultimate goal, you may want to look into the Function Score query - reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/query-dsl-function-score-query.html
The first example provided would give you essentially random scores for all documents in your cluster:
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"random_score": {},
"boost_mode":"multiply"
}
}
}
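The "additional context" mentioned above could, for example, be a scoring clause placed next to the wildcard. The sketch below builds such a body in Python under stated assumptions: the helper name and both field names are hypothetical, and the should clause is just one of several ways to add a relevance signal.

```python
def wildcard_with_relevance(wildcard_field, prefix, text_field, text):
    """Require a trailing-wildcard match and let an optional match clause vary the score."""
    return {
        "query": {
            "bool": {
                # Selects the documents; contributes the same score to every hit.
                "must": [{"wildcard": {wildcard_field: {"value": prefix + "*"}}}],
                # Optional clause: documents that also match it score higher.
                "should": [{"match": {text_field: text}}],
            }
        }
    }


# Hypothetical field names and values, for illustration only.
body = wildcard_with_relevance("name", "elast", "description", "search engine")
```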

Custom score for exact, phonetic and fuzzy matching in elasticsearch

I have a requirement for custom scoring on a name. To keep it simple, let's say that if I search for 'Smith' against names in the index, the logic should be:
if input = exact 'Smith' then score = 100%
else
if input = phonetic match then
score = <depending upon fuzziness match of input with name>%
end if
end if;
I'm able to search documents with a fuzziness of 1 but I don't know how to give it custom score depending upon how fuzzy it is. Thanks!
Update:
I went through a post that had the same requirement as mine, and it mentioned that the person solved it by using native scripts. My question still remains: how do I actually get the score based on the similarity distance, so that it can be used in the native scripts?
The post for reference:
https://discuss.elastic.co/t/fuzzy-query-scoring-based-on-levenshtein-distance/11116
The text to look for in the post:
"For future readers I solved this issue by creating a custom score query and
writing a (native) script to handle the scoring."
You can implement this search logic using the function_score query.
Here is a possible example:
{
"query": {
"function_score": {
"query": { "match": {
"input": "Smith"
} },
"boost": "5",
"functions": [
{
"filter": { "match": { "input.keyword": "Smith" } },
"weight": 23
}
]
}
}
}
In this example, the mapping indexes the input field both as text and as keyword (input.keyword is for exact matches). Documents that exactly match the term "Smith" are re-scored higher than the other documents matched by the main query (a match here, but in your case it would be the query with fuzziness).
You can control the re-scoring effect by tuning the weight parameter.
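A body of this shape can be built with a small Python helper. This is a sketch, assuming the field has a keyword sub-field as described above; the helper name, the default weight and the fuzziness of 1 are illustrative choices. Documents that match no function keep their original score, so only exact matches are multiplied by the weight.

```python
def tiered_name_query(field, name, exact_weight=10.0, fuzziness=1):
    """Fuzzy match on `field`, with exact keyword matches boosted by `exact_weight`."""
    return {
        "query": {
            "function_score": {
                # Main query: matches exact and fuzzy variants of the name.
                "query": {"match": {field: {"query": name, "fuzziness": fuzziness}}},
                "functions": [
                    {
                        # Applies only to documents whose keyword value equals `name`
                        # (case-sensitive unless the keyword field is normalized).
                        "filter": {"term": {field + ".keyword": name}},
                        "weight": exact_weight,
                    }
                ],
                # Multiply the query score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


body = tiered_name_query("input", "Smith")
```

Further tiers (for example a phonetic sub-field, if one is mapped) could be added as additional entries in the functions list with smaller weights.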

Elasticsearch - Edit distance using fuzzy is inaccurate

I am using ES 5.5 and my requirement is to allow up to two edits while matching a field.
In ES, I have the value 124456788 and the query comes in as 123456789:
"fuzzy": {
"idkey": {
"value": "123456789",
"fuzziness": "20"
}
}
To my knowledge the edit distance between these two numbers is 2, but it does not match even with fuzziness set to 20.
I ran an explain API call and here is what I am seeing:
"description": "no match on required clause (((idkey:012345789)^0.7777778 (idkey:012346789)^0.7777778 (idkey:013456789)^0.7777778 (idkey:023456789)^0.8888889 (idkey:102345678)^0.7777778 (idkey:112345678)^0.7777778 (idkey:113456789)^0.8888889 (idkey:120456589)^0.7777778 (idkey:121345678)^0.7777778 (idkey:122345678)^0.7777778 (idkey:122345679)^0.7777778 (idkey:122456789)^0.8888889 (idkey:123006789)^0.7777778 (idkey:123045678)^0.7777778 (idkey:123096789)^0.7777778 (idkey:123106789)^0.7777778 (idkey:123145678)^0.7777778 (idkey:123146789)^0.7777778 (idkey:123226789)^0.7777778 (idkey:123256789)^0.8888889 (idkey:123345678)^0.7777778 (idkey:123345689)^0.7777778 (idkey:123346789)^0.7777778 (idkey:123406784)^0.7777778 (idkey:123415678)^0.7777778 (idkey:123435678)^0.7777778 (idkey:123446789)^0.8888889 (idkey:123453789)^0.8888889 (idkey:123454789)^0.8888889 (idkey:123455789)^0.8888889 (idkey:123456289)^0.8888889 (idkey:123456489)^0.8888889 (idkey:123456709)^0.8888889 (idkey:123456779)^0.8888889 (idkey:123456780)^0.8888889 (idkey:123456781)^0.8888889 (idkey:123456783)^0.8888889 (idkey:123456785)^0.8888889 (idkey:123456786)^0.8888889 (idkey:123456787)^0.8888889 (idkey:123456889)^0.8888889 (idkey:123457789)^0.8888889 (idkey:123466789)^0.8888889 (idkey:123496789)^0.8888889 (idkey:123556789)^0.8888889 (idkey:126456789)^0.8888889 (idkey:223456789)^0.8888889 (idkey:423456789)^0.8888889 (idkey:623456789)^0.8888889 (idkey:723456789)^0.8888889)^5.0)",
The value I am expecting to match is 124456788, but the ES query does not internally expand to it as one of the candidate terms of the fuzzy query.
Do I need to use a different ES method to make this work?
This is a simple index and search:
PUT /myIndex/type1/1
{
"key":"123456789",
"name":"test"
}
GET /myIndex/_search
{
"query": {
"fuzzy": {
"key": {
"value": "124456799",
"fuzziness": 2
}
}
}
}
This always matches the given key. Fuzziness values of 2 or greater are fine.
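The edit distances quoted in this thread can be checked independently of Elasticsearch with a plain Levenshtein implementation. The sketch below is a standard dynamic-programming version written for illustration; it is not the code Elasticsearch or Lucene use internally.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


# Pair from the question, then the pair from the answer.
print(levenshtein("124456788", "123456789"))
print(levenshtein("123456789", "124456799"))
```

Both pairs differ in exactly two positions, so each call prints 2, which is consistent with a fuzziness of 2 being enough in principle.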
