Difference between Fuzzy and Match in Elasticsearch - elasticsearch

I wanna know the difference between the kind of search in Elasticsearch: Fuzzy and Match
I mean the advantages and disadvantages of each one, If anyone is better.
Thanks in advance.

Fuzzy can help you search within a term, match will match a whole term.
Take this example:
POST index1/test1
{
"field1": "this is a full on sentence"
}
Fuzzy will match part of a term (each word is a term)
GET index1/test1/_search
{
"query": {
"fuzzy": {
"field1": "ull"
}
}
}
Term match will not find the record because "ull" is not a full term.
GET index1/test1/_search
{
"query": {
"match": {
"field1": "ull"
}
}
}

Related

fuzziness in elastic search search by letter not by word

Hey I am trying to make a fuzzy search in Elasticsearch I write this query
"query": {
"match": {
"NAME": {
"query": "data" ,
"fuzziness": "AUTO"
}
}
}
but it keeps return the best match on word not the nearest letter
what I need is actually near to google search any idea ?

Elasticsearch: how to write bool query that will contain multiple conditions on the same token?

I have a field with tokenizer that splits by dots.
on search, the following value aaa.bbb will be splitted to two terms aaa and bbb.
My question is how to write bool query that will contain multiple conditions on the same term?
For example, i want to get all docs where its field contains a term that matches a fuzzy search for gmail but also the same term must not contain gamil.
Here are some examples of what i want to achieve:
bmail // MATCH: since its matches fuzzy search and is not gamil
gamil.bmail // MATCH: since the term bmail matches fuzzy search and is not gamil
gamil // NO MATCH: since its matches fuzzy search and but equals gamil
NOTE: the following query does NOT appear to be working since it looks as if one term matches one condition and the second term matches the other, it will be considered a hit.
{
...
"body": {
"query": {
"bool": {
"must": [
{
"fuzzy": {
"my_field": {
"value": "gmail",
"fuzziness": 1,
"max_expansions": 2100000000
}
}
},
{
"bool": {
"must_not": [
{
"query_string": {
"default_field": "my_field",
"query": "*gamil*",
"analyzer": "keyword"
}
}
]
}
}
]
}
}
},
}
I ended up using Highlight by executing fuzzy (or any other) query, and then programatically filter the results by the returned highlight object.
span queries might also be a good option if you don't need regular expression or you can make sure you don't exceed the boolean query limit.
(see more details in the provided link)

Elastic search Match Pharse Prefix not working exactly for different data

In Elastic search, i am using match_phrase_prefix to search data,
it is not working as expected,
{
"match_phrase_prefix": {
"title": {
"query": "microso"
}
}
}
Above sample returning all results with title Microsoft,
but for below query data is empty.
{
"match_phrase_prefix": {
"title": {
"query": "micro"
}
}
}
If prefix search word length is 50% then results are coming.
Please suggest how to fix it.
Mapping screenshot
Thanks

AND between tokens in elasticsearch

When I'm trying to search for a documents with such query (field indexed with Standard analyzer):
"query": {
"match": {
"Book": "OG/44"
}
}
I've got terms 'OG' and '44' and the result set will contain results where could be either of these terms. What analyzer/tokenizer I should use to get results when only both of terms are present?
You can set operator in match query (by default it is or)
"query": {
"match": {
"Book": {
"query": "OG/44",
"operator" : "and"
}
}
}
You have two tokens because standard analyzer tokenized them by slash, so if you need not this behaviour you can escape it

query_string vs group match in elasticsearch

What is the difference between such query:
"query": {
"bool": {
...
"should": [
{
"match": {
"description": {
"query": "test"
}
}
},
{
"match": {
"address": {
"query": "test",
}
}
},
{
"match": {
"country": {
"query": "test"
}
}
},
{
"match": {
"city": {
"query": "test"
}
}
}
]
}}
and that one:
"query": {
"bool": {
...
"should": [
{
"query_string": {
"query": "test",
"fields": [
"description",
"address",
"country",
"city"
]
}
}
]
}}
Performance, relevance?
Thanks in advance!
The query is analyzed depending on the field analyzer (unless you specify the analyzer in the query itself), thus querying multiple fields with a single query doesn't necessarily mean analyzing the query only once.
Keep in mind that the query_string supports the lucene query syntax: AND and OR operators, querying on specific fields, wildcard, phrase queries etc. therefore it needs to be parsed, which I don't think makes a lot of difference here in terms of performance, but it is error prone and might lead to errors. If you don't need all that power, stick to the match query, and if you want to perform the same query on multiple fields, have a look at the multi_match query, which does what you did with your query_string but translates internally to multiple match queries.
Also, the scores returned if you compare the output of multiple match queries and your query_string might be quite different. Using a bool query you effectively build a lucene boolean query, while the query_string uses by default "use_dis_max":"true", which means it uses internally a dis_max query by default. Same happens using the multi_match query. If you set use_dis_max to false a bool query is going to be used internally instead.
I terms of performance, I would say that the second query will have performance benefits because, the first query requires the query string to be analyzed for all the four match sections, while in the second there is only one query string that needs to be analyzed.
Apart from that, there are some comparisons done over here that you can look at.
I am not quite sure about the relevancy differences, but that you can always fire these two queries and see if there is any difference in relevance from the results fetched.

Resources