Exact and fuzzy search - elasticsearch

Exact and fuzzy search - elasticsearch

My setup:
I have some documents with name "Apple", "Apple delicous", ...
This is my query:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "apple"
}},
{ "fuzzy": {
"name": "apple"
}}
]
}
}
}
I want achieve, that first the exact match is shown and then the fuzzy one:
apple
apple delicous
Second, i am wondering that i did not get any result if i enter only app in the search:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "app"
}},
{ "fuzzy": {
"name": "app"
}}
]
}
}
}

There are two problems here.
1)To give higher score to an exact match you could try adding "index" : "not_analyzed" to your name field like this.
name: {
type: 'string',
"fields": {
"raw": {
"type": "string",
"index" : "not_analyzed" <--- here
}
}
}
After that your query would look like this
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "apple"
}
},
{
"match": {
"name.raw": "apple"
},
"boost": 5
}
]
}
}
}
This will give higher score for document with "apple" than "apple delicous"
2)To better understand fuzziness you should go through this and this article.
From the Docs
The fuzziness parameter can be set to AUTO, which results in the
following maximum edit distances:
0 for strings of one or two characters
1 for strings of three, four, or five characters
2 for strings of more than five characters
So, the reason your fuzzy query did not return apple for app is because fuzziness i.e edit distance is 2 between those words and since "app" is only three letter word, fuzziness value is 1. You could achieve the desired result with following query
{
"query": {
"fuzzy": {
"name": {
"value": "app",
"fuzziness": 2
}
}
}
}
I seriously would not recommend using this query, because It will return bizarre results, the above query will return cap, arm, pip and lot of other words as they fall within edit distance of 2.
This would better query
{
"query": {
"fuzzy": {
"name": {
"value": "appl"
}
}
}
}
It will return apple.
I hope this helps.

I think ,This will help you.
{"query":{"bool":{"must":[{"function_score":{"query":{"multi_match":{"query":"airetl","fields":["brand_lower"],"boost":1,"fuzziness":Auto,"prefix_length":1}}}}}]}}

Related

Elasticsearch: alternative to cross_fields with fuzziness

I have an elasticsearch index with the standard analyzer. I would like to perfom search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present in any of the fields (e.g. 'human' in title and 'anatomy' in description, etc.). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches (for example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for the multi-match cross-fields queries, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool" : {
"must": [
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2,
}
}
},
]
}
}
},
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
]
}
}
},
]
}
}
}
The idea behind this code is the following: find the results where
either of the fields contains human (with 2-letter edit distance, e.g.: humane, humon, humanic, etc.)
and
either of the fields contains anatomy (with 2-letter edit distance, e.g.: anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between each of the words in the two queries <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
Result query (with using Nest, but it doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elastic gives a lot of full-text search options: multi_match, phrase, wildcards etc. But all of them fail in my case looking a sub-string in my text fields. (terms query and isActive one work well, I just tried to run only them).
What options do I have also or maybe where I made a mistake?
UPD: Combined wildcards worked for me, but such query looks ugly. Looking for a more elegant solution.

The elasticsearch way is to use ngram tokenizer.
The ngram analyzer will split your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub terms are generated you can use a match query an the subfield.
Another point, it is weird to use must within a filter. If you are interested in the score, you should always use must otherwise use filter. Read this article for a good understanding.

ElasticSearch - score boosting using scripting

We have a specific use-case for our ElasticSearch instance: we store documents which contain proper names, dates of birth, addresses, ID numbers, and other related info.
We use a name-matching plugin which overrides the default scoring of ES and assigns a relevancy score between 0 and 1 based on how closely the name matches.
What we need to do is boost that score by a certain amount if other fields match. I have started to read up on ES scripting to achieve this. I need assistance on the script part of the query. Right now, our query looks like this:
{
"size":100,
"query":{
"bool":{
"should":[
{"match":{"Name":"John Smith"}}
]
}
},
"rescore":{
"window_size":100,
"query":{
"rescore_query":{
"function_score":{
"doc_score":{
"fields":{
"Name":{"query_value":"John Smith"},
"DOB":{
"function":{
"function_score":{
"script_score":{
"script":{
"lang":"painless",
"params":{
"query_value":"01-01-1999"
},
"inline":"if **<HERE'S WHERE I NEED ASSISTANCE>**"
}
}
}
}
}
}
}
}
},
"query_weight":0.0,
"rescore_query_weight":1.0
}
}
The Name field will always be required in a query and is the basis for the score, which is returned in the default _score field; for ease of demonstration, we'll just add one additional field, DOB, which if matched, should boost the score by 0.1. I believe I'm looking for something along the lines of if(query_value == doc['DOB'].value add 0.1 to _score), or something along these lines.
So, what would be the correct syntax to be entered into the inline row to achieve this? Or, if the query requires other syntax revision, please advise.
EDIT #1 - it's important to highlight that our DOB field is a text field, not a date field.

Splitting to a separate answer as this solves the problem differently (i.e. - by using script_score as OP proposed instead of trying to rewrite away from scripts).
Assuming the same mapping and data as the previous answer, a scripted version of the query might look like the following:
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": "double boost = 0.0; if (params['_source']['State'] == 'FL') { boost += 0.1; } if (params['_source']['DOB'] == '1965-05-24') { boost += 0.3; } return boost;",
"lang": "painless"
}
}
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"query_weight": 0,
"rescore_query_weight": 1
}
}
}
Two notes about the script:
The script uses params['_source'][field_name] to access the document, which is the only way to get access to text fields. This is significantly slower as it requires accessing documents directly on disk, though this penalty might not be too bad in the context of a rescore. You could instead use doc[field_name].value if the field was an aggregatable type, such as keyword, date, or something numeric
DOB here is compared directly to a string. This is possible because we're using the _source field, and the JSON for the documents has the dates specified as strings. This is somewhat brittle, but likely will do the trick

Assuming static weights per additional field, you can accomplish this without using scripting (though you may need to use script_score for any more complex weighting). To solve your issue of directly adding to a document's original score, your rescoring query will need to be a function score query that:
Composes queries for additional fields in a should clause for the function score's main query (i.e. - will only produce scores for documents matching at least one additional field)
Uses one function per additional field, with the filter set to select documents with some value for that field, and a weight to specify how much the score should increase (or some other scoring function if desired)
Mapping (as template)
Adding a State and DOB field for sake of example (making sure multiple additional fields contribute to the score correctly)
PUT _template/employee_template
{
"index_patterns": ["employee"],
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"Name": {
"type": "text"
},
"State": {
"type": "keyword"
},
"DOB": {
"type": "date"
}
}
}
}
}
Sample data
POST /employee/_doc/_bulk
{"index":{}}
{"Name": "John Smith", "State": "NY", "DOB": "1970-01-01"}
{"index":{}}
{"Name": "John C. Reilly", "State": "CA", "DOB": "1965-05-24"}
{"index":{}}
{"Name": "Will Ferrell", "State": "FL", "DOB": "1967-07-16"}
Query
EDIT: Updated the query to include the original query in the new function score in an attempt to compensate for custom scoring plugins.
A few notes about the query below:
Setting the rescorers score_mode: max is effectively a replace here, since the newly computed function score should only be greater than or equal to the original score
query_weight and rescore_query_weight are both set to 1 such that they are compared on equal scales during score_mode: max comparison
In the function_score query:
score_mode: sum will add together all the scores from functions
boost_mode: sum will add the sum of the functions to the score of the query
POST /employee/_search
{
"size": 100,
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
]
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"Name": "John"
}
},
{
"match": {
"Name": "Will"
}
}
],
"filter": {
"bool": {
"should": [
{
"term": {
"State": "CA"
}
},
{
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
}
]
}
}
}
},
"functions": [
{
"filter": {
"term": {
"State": "CA"
}
},
"weight": 0.1
},
{
"filter": {
"range": {
"DOB": {
"lte": "1968-01-01"
}
}
},
"weight": 0.3
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"score_mode": "max",
"query_weight": 1,
"rescore_query_weight": 1
}
}
}

promote results in Elasticsearch

I searched in the documentation for a way to promote ElasticSearch results if a specific field has a certain value, but I didn't find any good practice, for example, I have a user that lives in Paris if the user search for a query I want the documents that are relevant to Paris to appear the first or just to be promoted.

There is a lot to this but you want to research "boosting". This can be done at the mapping level or the query level.
Mapping example:
{
"mappings": {
"_doc": {
"properties": {
"location": {
"type": "keyword",
"boost": 2 <--- 2x boost to the final score
}
}
}
}
}
Query Example:
GET /_search
{
"query": {
"bool": {
"must": {
"match": {
"content": {
"query": "full text search",
"operator": "and"
}
}
},
"should": [
{ "term": {
"location": {
"value": "xxx",
"boost": 3 <--- 3x boost if the location matches
}
}}
]
}
}
}

Elastic Search : Match Query not working in Nested Bool Filters

I am able to get data for the following elastic search query :
{
"query": {
"filtered": {
"query": [],
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"gender": "malE"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
]
}
}
}
}
}
However, If I query using "match" - I get error message with 400 status response
{
"query": {
"filtered": {
"query": [],
"filter": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"gender": "malE"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
]
}
}
}
}
}
Is match query not supported in nested bool filters ?
Since the term query looks for the exact term in the field’s inverted index and I want to query gender data as case_insensitive field - Which approach shall I try ?
Settings of the index :
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}
}
Mapping for field Gender:
{"type":"string","analyzer":"analyzer_keyword"}

The reason you're getting an error 400 is because there is no match filter, only match queries, even though there are both term queries and term filters.
Your query can be as simple as this, i.e. no need for a filtered query, simply put your term and match queries into a bool/should:
{
"query": {
"bool": {
"should": [
{
"match": {
"gender": "male"
}
},
{
"term": {
"sentiment": "positive"
}
}
]
}
}
}

This answer is for ElasticSearch 7.x. As I understand from the question, you would like to use a match query for the gender field and a term query for the sentiment field. The mappings for each of these field should look like below:
"sentiment": {
"type": "keyword"
},
"gender": {
"type": "text"
}
The corresponding search API would be:
"query": {
"bool": {
"must": [
{
"terms": {
"sentiment": [
"very positive", "positive"
]
}
},
{
"match": {
"gender": "malE"
}
}
]
}
}
This search API returns all the documents where gender is "Male"/"MALE"/"mALe" etc. So, you may have indexed the gender field holding "mALe", but, the match query for "gender": "malE" will still be able to retrieve it. In the latest version of ElasticSearch, if the query is a match type, the value (which is "gender": "malE") will be automatically lower cased internally before search begins. But, it should not be that tough for a client of the API to pass a lowercase to the match query at the onset itself. Coming to the sentiment field, since, its a keyword field, you can search for values that contain spaces too like very positive.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Exact and fuzzy search - elasticsearch

I think ,This will help you. {"query":{"bool":{"must":[{"function_score":{"query":{"multi_match":{"query":"airetl","fields":["brand_lower"],"boost":1,"fuzziness":Auto,"prefix_length":1}}}}}]}}

Related

Elasticsearch: alternative to cross_fields with fuzziness

ElasticSearch multimatch substring search

ElasticSearch - score boosting using scripting

promote results in Elasticsearch

Elastic Search : Match Query not working in Nested Bool Filters

Categories

Resources