Elasticsearch: searching across fields with boosting and fuzziness

I am creating an index in Elasticsearch and I want the ability to search across multiple fields, i.e. have those fields treated as one big search field. I've done some research and came across two different ways to do this:
The first is the cross_fields multi_match query. This allows searching across multiple fields as one big field, with the ability to boost certain fields, but it does not allow fuzziness to be added.
The second is copy_to: I can copy fields to an 'all' field so that all the searchable terms are in one big field. This allows fuzzy search but does not let me boost specific fields.
Is there another cross_fields or search option I'm unaware of that would let me fuzzy search as well as boost specific fields?

I think you could add fuzziness to multi_match, but it will be applied to all fields.
Here is an example with boost and fuzziness:
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "bjorn borg schoenen",
            "fields": [
              "title^5.0",
              "brand^2.0"
            ],
            "type": "best_fields",
            "operator": "and",
            "fuzziness": "auto"
          }
        }
      ]
    }
  }
}
If you want to be more granular, you can use a boolean query with should and a minimum should match:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "brand": {
              "query": "my query",
              "fuzziness": "auto",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "my query",
              "fuzziness": "auto",
              "boost": 5
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
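If the list of fields grows, the same request body can be generated instead of written by hand. Below is a minimal sketch in Python that only builds the JSON body; the helper name fuzzy_boosted_query and its boosts argument are made up for illustration and are not part of any Elasticsearch client.

```python
import json


def fuzzy_boosted_query(text, boosts, fuzziness="auto"):
    """Build a bool/should body with one fuzzy match clause per field.

    `boosts` maps a field name to its boost. This is a hypothetical helper
    for illustration, not an Elasticsearch API.
    """
    should = [
        {"match": {field: {"query": text, "fuzziness": fuzziness, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    # At least one of the per-field clauses has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = fuzzy_boosted_query("my query", {"brand": 2, "title": 5})
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the query above and can be sent to the _search endpoint with whatever client you already use.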
And if the query becomes too complicated, I suggest using a search template to keep the integration easy on the app side:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html

Related

Elasticsearch: find doc by id and highlight words based on query string

I would like to find a document in Elasticsearch and highlight terms based on a query string.
Is this possible?
I tried to run a query_string search and filter the result by ID. But that does not sound very efficient, because Elasticsearch first generates a huge list of all documents matching the query string (which could be millions) and then picks only one document based on the filter.
Is there a way or query construct to combine a query string and "search for term in _id field" in one boolean search?
Something like this (which is not working):
"query": {
"bool": {
"must": {
"query_string": {
"query": "red*",
"fields": [
"text",
"title"
]
},
"term": {
"_id":"fda72434fa172"
}
}
}
},
"highlight": {
"fields": {
[...]
I made a small example that can be a starting point.
Use a filter clause to retrieve the doc by id.
Then use match and highlight to highlight the terms you want.
POST test/_doc/fda72434fa172
{
  "text": "I like to find an document in elastic search an highlight terms based on an query string. Is this possible?"
}
GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "_id": "fda72434fa172"
          }
        }
      ],
      "must": [
        {
          "match": {
            "text": {
              "query": "elastic search"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
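The same request can be parameterised so that the id, the field and the text to highlight are passed in. This is only a sketch in Python that builds the body shown above; the function name highlight_in_document is an assumption made for illustration.

```python
import json


def highlight_in_document(doc_id, field, text):
    """Build a search body that restricts the search to one document by _id
    and highlights matches of `text` in `field`.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    return {
        "query": {
            "bool": {
                # filter clauses restrict the result set without affecting the score
                "filter": [{"term": {"_id": doc_id}}],
                # the match clause is what the highlighter works from
                "must": [{"match": {field: {"query": text}}}],
            }
        },
        "highlight": {"fields": {field: {}}},
    }


body = highlight_in_document("fda72434fa172", "text", "elastic search")
print(json.dumps(body, indent=2))
```

Note that with the match clause under must, the document is only returned when it actually matches the text.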

Fuzzy match words in any order in Elasticsearch

What I need to achieve is to match documents based on single field (product name, which consists of basically all possible filter values). I know it is not the most reliable solution, but I only have this one field to work with.
I need to be able to send a search query and have the words in that query matched in any order against the name field (the name should contain all words from the search query). At this point a simple match_phrase_prefix works pretty well, but what is missing is fuzziness, because we also need to allow users to make some typos and still get relevant results.
My question is, is there any way to have match_phrase_prefix-like query, but with fuzziness?
I tried some nested bool queries with match, but I don't get anything near match_phrase_prefix this way.
Examples of what I tried:
Pretty good results, but no fuzziness:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"name.standard": {
"query": "brand thing model",
"slop": 10
}
}
}
]
}
}
}
Fuzziness, but very limited matches:
{
"query": {
"bool": {
"must": [
{
"match": {
"name.standard": {
"query": "thing",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
},
{
"match": {
"name.standard": {
"query": "brand",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
]
}
}
}
Using should instead of must above, I get more results, but they are far less relevant than the ones from the first query.
This can be achieved with a simple match query. Setting operator to and means all three tokens must be present, in any order; set fuzziness to whatever value suits you:
{
  "query": {
    "match": {
      "name.standard": {
        "query": "brand thing model",
        "operator": "and",
        "fuzziness": "AUTO"
      }
    }
  }
}
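To get a feel for what "fuzziness": "AUTO" tolerates, here is a minimal sketch in Python, assuming the documented default thresholds (AUTO:3,6): terms of 1-2 characters must match exactly, terms of 3-5 characters may differ by one edit, and longer terms by two. The function names are made up for illustration, and the distance used is plain Levenshtein, whereas Elasticsearch by default also counts a swap of two adjacent characters as a single edit.

```python
def auto_fuzziness(term, low=3, high=6):
    """Maximum edit distance allowed for a query term under AUTO:low,high."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def would_match(query_term, indexed_term):
    """Rough check: is the indexed term within the AUTO distance of the query term?"""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


for typed, indexed in [("thng", "thing"), ("modle", "model"), ("brand", "brand")]:
    print(typed, indexed, would_match(typed, indexed))
```

This ignores prefix_length and max_expansions, which also limit which indexed terms a fuzzy query considers.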

Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present in any of the fields (e.g. 'human' in title and 'anatomy' in description, etc.). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for the multi-match cross-fields queries, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool" : {
"must": [
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2,
}
}
},
]
}
}
},
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
]
}
}
},
]
}
}
}
The idea behind this code is the following: find the results where
either of the fields contains human (with 2-letter edit distance, e.g.: humane, humon, humanic, etc.)
and
either of the fields contains anatomy (with 2-letter edit distance, e.g.: anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between each of the words in the two queries <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
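For reference, the intended logic (every query word must fuzzily match in at least one of the fields) can be written down as one fuzzy multi_match per word inside a bool must, without the extra "query" wrapper objects and trailing commas that appear in the snippet above. This is a minimal sketch in Python that only builds the body; the function name all_terms_any_field is hypothetical, splitting on whitespace is a simplification of what the analyzer does, and it is not claimed to explain the result counts reported above.

```python
import json


def all_terms_any_field(text, fields, fuzziness=2):
    """Require every whitespace-separated term to match (fuzzily) in any field.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    must = [
        # multi_match with the default type (best_fields) accepts fuzziness
        # and matches when at least one of the listed fields matches.
        {"multi_match": {"query": term, "fields": fields, "fuzziness": fuzziness}}
        for term in text.split()
    ]
    return {"query": {"bool": {"must": must}}}


body = all_terms_any_field("human anatomic", ["title", "subject", "description"])
print(json.dumps(body, indent=2))
```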

Get ElasticSearch simple_query_string to support fuzzy

I have a record in my ElasticSearch index with the term "cleveland". When I do this search:
"query": {
"multi_match": {
"fields": [
"firstname^3",
"lastname^3",
"home_address",
"home_city"
],
"query": "clevela",
"fuzziness": "AUTO"
}
},
it successfully finds the term. The missing two characters are within the fuzziness threshold. But I'd like to support the extended query syntax of simple_query_string (+, -, phrase search, etc.). So I tried this syntax:
"query": {
"simple_query_string": {
"query": "clevela",
"fields": [
"firstname^3",
"lastname^3",
"home_address",
"home_city"
],
"lenient": true
}
},
and it does not find the term. Fuzziness appears to be turned off. How do I turn it on?
In a simple query string, you specify fuzziness by adding ~N (N is the maximum edit distance) after the search term. Modify your search query as follows (note the ~2 after the term):
{
  "query": {
    "simple_query_string": {
      "query": "clevela~2",
      "fields": [
        "firstname^3",
        "lastname^3",
        "home_address",
        "home_city"
      ],
      "lenient": true
    }
  }
}
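If the query text comes from user input, the ~N suffix has to be added to each term before the query is sent. Here is a deliberately naive sketch in Python; the helper name add_fuzziness is made up, and it assumes plain whitespace-separated words. It does not handle quoted phrases (where ~N means slop rather than edit distance) or terms that already carry operators, so treat it as a starting point only.

```python
def add_fuzziness(query_text, max_edits=2):
    """Append ~N to every whitespace-separated term.

    Naive, hypothetical helper: assumes the input contains only plain words.
    """
    return " ".join(f"{term}~{max_edits}" for term in query_text.split())


body = {
    "query": {
        "simple_query_string": {
            "query": add_fuzziness("clevela"),
            "fields": ["firstname^3", "lastname^3", "home_address", "home_city"],
            "lenient": True,
        }
    }
}
print(body["query"]["simple_query_string"]["query"])
```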

elasticsearch multi_match vs should

Can someone tell me the difference between
"query": {
  "bool": {
    "should": [
      { "match": {"title": keyword} },
      { "match": {"description": keyword} }
    ]
  }
}
and
"query": {
  "multi_match": {
    "query": keyword,
    "fields": [ "title", "description" ]
  }
}
Is there any performance difference between the two?
It depends on the type parameter of your multi_match. In your example, since you didn't specify a type, best_fields is used. That makes use of a dis_max query and basically uses the _score from the best field.
Your example with should, on the other hand, combines the _score from each field, and it is equivalent to multi_match with type most_fields.
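The scoring difference can be illustrated with a few lines of Python. This is a simplified model, not how Lucene computes scores internally: it assumes each matching field has already produced a score, and the function names are made up for illustration. dis_max takes the best field's score plus tie_breaker times the other scores (tie_breaker defaults to 0.0), while bool should adds up the scores of the matching clauses.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """best_fields style: best score, plus tie_breaker times the remaining scores."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def bool_should_score(field_scores):
    """bool/should (most_fields style): sum of the matching clause scores."""
    return sum(field_scores)


scores = [1.5, 0.5]  # hypothetical per-field scores for title and description
print(dis_max_score(scores))
print(bool_should_score(scores))
print(dis_max_score(scores, tie_breaker=1.0))
```

With a tie_breaker of 1.0 the two approaches give the same number in this model, which is one way to see how they relate.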
