Some weird problem with fuzzy query in elasticsearch - elasticsearch

I have one doc in ES:
{
"_type": "_doc",
"_id": "109487",
"_score": null,
"_source": {
"id": "109487",
"title": "Interstellar",
"year": 2014,
"genre": [
"Sci-Fi",
"IMAX"
]
},
"sort": [
"Interstellar"
]
}
I am searching with a fuzzy query like
{
"query": {
"fuzzy": {
"title": {"value": "intersteller", "fuzziness": 1}
}
}
}
The weird thing is that if I search with a lowercase i in intersteller, I get the desired record with the title Interstellar, but if I search with a capital I, i.e. if my query is
{
"query": {
"fuzzy": {
"title": {"value": "Intersteller", "fuzziness": 1}
}
}
}
then I do not get any docs from the db. I just want to understand what is happening behind the scenes.

The fuzzy query does not analyze the search text. It is a term-level query and behaves much like a term query.
In your case the "title" field is presumably using the standard analyzer, so "Interstellar" is indexed as the lowercase term "interstellar". The value "intersteller" is one edit away from that term (e to a), so it matches with a fuzziness of 1. The value "Intersteller" is two edits away (I to i, and e to a), so it does not match.
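To make the edit counts concrete, here is a minimal sketch that computes a plain Levenshtein distance between each query value and the indexed term. It assumes the indexed term is the lowercased "interstellar", and the helper name `levenshtein` is only illustrative. Elasticsearch also counts a transposition as a single edit by default, which this sketch does not model, but no transposition is involved in these two strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Assumption: the standard analyzer stored "Interstellar" as this term.
indexed_term = "interstellar"

for query_value in ("intersteller", "Intersteller"):
    distance = levenshtein(query_value, indexed_term)
    verdict = "match" if distance <= 1 else "no match"
    print(query_value, distance, verdict)
# prints:
# intersteller 1 match
# Intersteller 2 no match
```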
To know more about fuzzy query refer to this elasticsearch blog
It is better to use a match query with the fuzziness parameter, because the match query analyzes the search text before matching:
{
"query": {
"match": {
"title": {
"query": "Intersteller",
"fuzziness": "auto"
}
}
}
}
If you want to use the fuzzy query with the capitalized value, you need to increase the fuzziness parameter to 2, the largest edit distance Elasticsearch accepts, so that your document matches
{
"query": {
"fuzzy": {
"title": {
"value": "Intersteller",
"fuzziness": 2
}
}
}
}

Related

add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
"size": 10,
"query": {
"multi_match": {
"query": "'"+search_text+"'",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
"size": 10,
"query": {
"multi_match": {
"query": {
"fuzzy": {
"value": "'"+search_text+"'",
"fuzziness": "AUTO"
}
},
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.
There is no need to add a fuzzy query. To add fuzziness to a multi_match query, add the fuzziness parameter to the multi_match query itself.
Since you are using bool_prefix as the type of the multi_match query, it creates a match_bool_prefix query on each field, which analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
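As a rough illustration of that construction, the sketch below builds the kind of bool query that match_bool_prefix produces for a single field. The function name `bool_prefix_sketch` is hypothetical, and analysis is reduced to lowercasing and whitespace splitting, which is an assumption; the field's real analyzer may produce different terms.

```python
import json


def bool_prefix_sketch(field: str, text: str) -> dict:
    """Approximate the bool query built for one field by match_bool_prefix.

    Assumption: analysis is just lowercase + whitespace split.
    """
    terms = text.lower().split()
    if not terms:
        return {"bool": {"should": []}}
    # Every term except the last becomes a term query.
    clauses = [{"term": {field: term}} for term in terms[:-1]]
    # The last term becomes a prefix query.
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


query = bool_prefix_sketch("company_name", "sequensing how shingles")
print(json.dumps(query, indent=2))
```

Here "sequensing" and "how" end up in term clauses, while "shingles" ends up in the prefix clause.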
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"serviceTitle": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"services": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"company_name":"sequencing how shingles are actually used"
}
Search Query:
{
"size": 10,
"query": {
"multi_match": {
"query": "sequensing how shingles",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
],
"fuzziness":"auto"
}
}
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
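With bool_prefix, the final term of the query is matched as a prefix, and fuzziness is not applied to it. So if you want to query only sequensing and still get the above document, you need to change the type of the multi_match query from bool_prefix to another type according to your use case.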

How to add fuzziness to search as you type field in Elasticsearch?

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but never got the query I needed. Does anyone have an idea how to implement this?
Fuzzy Query returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- It generates an edit distance based on the length of the term.
For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
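The length thresholds listed above can be sketched as a small function. The name `auto_fuzziness` is illustrative only, and the sketch covers just the default AUTO thresholds given here.

```python
def auto_fuzziness(term: str) -> int:
    """Return the edit distance AUTO allows for a term, by term length."""
    length = len(term)
    if length <= 2:
        return 0  # 0..2 characters: must match exactly
    if length <= 5:
        return 1  # 3..5 characters: one edit allowed
    return 2      # more than 5 characters: two edits allowed


for term in ("es", "prodc", "product"):
    print(term, auto_fuzziness(term))
# prints:
# es 0
# prodc 1
# product 2
```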
Adding working example with index data and search query.
Index Data:
{
"title":"product"
}
{
"title":"prodct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prodc",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0794415,
"_source": {
"title": "produt"
}
}
]
Refer to these blogs for a detailed explanation of the fuzzy query:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
Refer to this official ES documentation:
The fuzziness , prefix_length , max_expansions , rewrite , and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
There are some open issues and discussion threads stating that fuzziness does not work with a bool_prefix multi_match (search-as-you-type):
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3
I know this question was asked long ago, but I think this worked for me.
Since Elasticsearch allows a single field to be declared with multiple data types, my mapping is like below.
PUT products
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"product_type": {
"type": "search_as_you_type"
}
}
}
}
}
}
After adding some data to the index, I queried it like this.
GET products/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "prodc",
"type": "bool_prefix",
"fields": [
"title.product_type",
"title.product_type._2gram",
"title.product_type._3gram"
]
}
},
{
"multi_match": {
"query": "prodc",
"fuzziness": 2
}
}
]
}
}
}

Returning documents that match multiple wildcard string queries

I'm new to Elasticsearch and would greatly appreciate help on this
In the query below I only want the first document to be returned, but instead both documents are returned. How can I write a query to search for two wildcard strings on two separate fields, but only return documents that match?
I think what's being returned currently is score dependent, but I don't need the score.
POST /pr/_doc/1
{
"type": "Type ONE",
"currency":"USD"
}
POST /pr/_doc/2
{
"type": "Type TWO",
"currency":"USD"
}
GET /pr/_search
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Type ON*",
"fields": ["type"],
"analyze_wildcard": true
}
},
{
"simple_query_string": {
"query": "US*",
"fields": ["currency"],
"analyze_wildcard":true
}
}
]
}
}
}
Use the query below, which is a query_string query with default_operator set to AND. See the query_string documentation for in-depth information and further reading.
Search query
{
"query": {
"query_string": {
"query": "(Type ON*) AND (US*)",
"fields" : ["type", "currency"],
"default_operator" : "AND"
}
}
}
Index your sample docs and it returns your expected doc only:
"hits": [
{
"_index": "multiplequery",
"_type": "_doc",
"_id": "1",
"_score": 2.1823215,
"_source": {
"type": "Type ONE",
"currency": "USD"
}
}
]
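To see why the operator matters, here is a deliberately simplified model of how the terms of "Type ON*" are combined against the analyzed type field. It assumes lowercasing and whitespace tokenization, and uses Python's `fnmatchcase` as a stand-in for wildcard matching. The helper names are hypothetical and this is not how Elasticsearch is implemented internally.

```python
from fnmatch import fnmatchcase


def tokens(text: str) -> list:
    """Assumption: analysis is just lowercase + whitespace split."""
    return text.lower().split()


def matches(doc_value: str, query: str, operator: str) -> bool:
    """Check each query term (wildcards allowed) against the document tokens."""
    doc_tokens = tokens(doc_value)
    hits = [
        any(fnmatchcase(token, pattern) for token in doc_tokens)
        for pattern in tokens(query)
    ]
    return all(hits) if operator == "and" else any(hits)


docs = {"1": "Type ONE", "2": "Type TWO"}

for operator in ("or", "and"):
    matched = [doc_id for doc_id, value in docs.items()
               if matches(value, "Type ON*", operator)]
    print(operator, matched)
# prints:
# or ['1', '2']
# and ['1']
```

With OR, the shared token "type" is enough for both documents to match. With AND, the document must also contain a token starting with "on".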

Elasticsearch query_string search complex keyword by its terms

Now, I know that keyword is not supposed to comprise unstructured text, but let's say that for some reason it just so happened that such text was written into keyword field.
When searching such documents using match or term queries, the document is not found, but when searched using query_string the document is found by a partial match(a "term" inside keyword). I don't understand how this is possible when the documentation for Elasticsearch clearly states that keyword is inverse-indexed as is, without terms tokenization.
Example:
My index mapping:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"full_text": {
"type": "text"
},
"exact_value": {
"type": "keyword"
}
}
}
}
}
Then I put a document in:
PUT my_index/my_type/2
{
"full_text": "full text search",
"exact_value": "i want to find this trololo!"
}
And imagine my surprise when I get a document by keyword term, not a full match:
GET my_index/my_type/_search
{
"query": {
"match": {
"exact_value": "trololo"
}
}
}
- no result;
GET my_index/my_type/_search
{
"query": {
"term": {
"exact_value": "trololo"
}
}
}
- no result;
POST my_index/_search
{"query":{"query_string":{"query":"trololo"}}}
- my document is returned(!):
"hits": {
"total": 1,
"max_score": 0.27233246,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.27233246,
"_source": {
"full_text": "full text search",
"exact_value": "i want to find this trololo!"
}
}
]
}
When you run a query_string query on Elasticsearch like the one below
POST index/_search
{
"query": {
"query_string": {
"query": "trololo"
}
}
}
This actually searches the _all field, which, unless you configure it otherwise, is analyzed by the standard analyzer.
If you specify the field in the query, as in the following, you won't get records for the keyword field.
POST my_index/_search
{
"query": {
"query_string": {
"default_field": "exact_value",
"query": "field"
}
}
}
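The difference between the keyword field and the _all field described above can be modeled roughly as follows. This is only a sketch: the tokenizer is a simple regex standing in for the standard analyzer, and `standard_like_tokens` is a hypothetical helper.

```python
import re


def standard_like_tokens(text: str) -> list:
    """Assumption: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


doc = {
    "full_text": "full text search",
    "exact_value": "i want to find this trololo!",
}

# keyword field: the whole value is indexed as one single term.
keyword_terms = {doc["exact_value"]}

# _all field: all field values are concatenated, then analyzed into terms.
all_terms = set(standard_like_tokens(" ".join(doc.values())))

print("trololo" in keyword_terms)  # False
print("trololo" in all_terms)      # True
```

So a search for "trololo" finds nothing in the keyword field's terms, but does find a term in the analyzed catch-all field.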

Elasticsearch boost

I have an index called find and a type called song.
Song type structure :
"_index": "find",
"_type": "song",
"_id": "192108",
"_source": {
"id": 192108,
"artist": "Melanie",
"title": "Dark Night",
"lyrics": "Hot air hangs like a dead man\nFrom a white oak tree",
"downloadCount": 234
}
Because multiple songs may have the same field values, I need to boost results by a popularity field such as downloadCount.
How can I change the query below to take downloadCount into account?
GET /find/song/_search
{
"query": {
"multi_match": {
"query": "like a dead hangs",
"type": "most_fields",
"fields": ["artist","title","lyrics"],
"operator": "or"
}
}
}
You can use the field_value_factor feature of Elasticsearch to boost the results by downloadCount:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
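As a hedged sketch of what that could look like for the query in the question, the request body below is written as a Python dict. The choice of the log1p modifier, a factor of 1.0, and a missing value of 0 are assumptions for illustration, not values from the question. The helper `boosted_score` is hypothetical and only mirrors the arithmetic for this configuration: the query score multiplied by log10(1 + factor * downloadCount).

```python
import math

# Request body for the _search endpoint, expressed as a Python dict.
function_score_query = {
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "like a dead hangs",
                    "type": "most_fields",
                    "fields": ["artist", "title", "lyrics"],
                    "operator": "or",
                }
            },
            "field_value_factor": {
                "field": "downloadCount",
                "factor": 1.0,        # assumed value
                "modifier": "log1p",  # assumed: dampens very large counts
                "missing": 0,         # assumed default for docs without the field
            },
            "boost_mode": "multiply",
        }
    }
}


def boosted_score(query_score: float, download_count: float, factor: float = 1.0) -> float:
    """Mirror the arithmetic of the configuration above (log1p, multiply)."""
    return query_score * math.log10(1 + factor * download_count)


# Two songs with the same text score but different popularity.
print(boosted_score(2.0, 234) > boosted_score(2.0, 12))  # True
```

A logarithmic modifier keeps a heavily downloaded song from completely overwhelming text relevance; whether that trade-off fits depends on your data.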
You can use the function score query. It provides an API for scoring documents based on document fields, for example through script_score functions.
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"term": {
"you_filter_field": {
"value": "VALUE"
}
}
}]
}
},
"functions": [{
"script_score": {
"script": "doc['downloadCount'].value"
}
}]
}
}
}
Thanks
