Elasticsearch match phrase prefix not matching all terms - elasticsearch

I am having an issue where when I use the match_phrase_prefix query in Elasticsearch, it is not returning all the results I would expect it to, particularly when the query is one word followed by one letter.
Take this index mapping (this is a contrived example to protect sensitive data):
http://localhost:9200/test/drinks/_mapping
returns:
{
"test": {
"mappings": {
"drinks": {
"properties": {
"name": {
"type": "text"
}
}
}
}
}
}
And amongst millions of other records are these:
{
"_index": "test",
"_type": "drinks",
"_id": "2",
"_score": 1,
"_source": {
"name": "Johnnie Walker Black Label"
}
},
{
"_index": "test",
"_type": "drinks",
"_id": "1",
"_score": 1,
"_source": {
"name": "Johnnie Walker Blue Label"
}
}
The following query, which is one word followed by two letters:
POST http://localhost:9200/test/drinks/_search
{
"query": {
"match_phrase_prefix" : {
"name" : "Walker Bl"
}
}
}
returns this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5753642,
"hits": [
{
"_index": "test",
"_type": "drinks",
"_id": "2",
"_score": 0.5753642,
"_source": {
"name": "Johnnie Walker Black Label"
}
},
{
"_index": "test",
"_type": "drinks",
"_id": "1",
"_score": 0.5753642,
"_source": {
"name": "Johnnie Walker Blue Label"
}
}
]
}
}
Whereas this query with one word and one letter:
POST http://localhost:9200/test/drinks/_search
{
"query": {
"match_phrase_prefix" : {
"name" : "Walker B"
}
}
}
returns no results. What could be happening here?

I will assume that you are working with Elasticsearch 5.0 or above.
I think it might be because of the default value of max_expansions.
As seen in the documentation here, the max_expansions parameter controls how many terms the last term of the query is expanded to. The default value is 50, which might explain why you find "black" and "blue" with the first two letters "Bl" but not with "B" alone: with millions of records, the first 50 terms starting with "b" may include neither of them.
The documentation is pretty clear about it:
The match_phrase_prefix query is a poor-man's autocomplete. It is very easy to use, which lets you get started quickly with search-as-you-type, but its results, which usually are good enough, can sometimes be confusing.
Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox, so the phrase quick brown fox will not be found. This usually isn't a problem as the user will continue to type more letters until the word they are looking for appears.
I can't tell you whether it's OK to raise this parameter above 50 if you need good performance, since I have never tried it myself.
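To make the effect of max_expansions easier to see, here is a rough Python sketch of the expansion step described in the quoted documentation. It is only a simulation under assumptions: the term dictionary is made up (200 hypothetical terms starting with "b" plus the real ones), expand_prefix is an illustrative helper rather than an Elasticsearch API, and the real expansion happens inside Lucene per shard.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return up to max_expansions terms starting with prefix, in dictionary order."""
    start = bisect_left(sorted_terms, prefix)
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary: 200 made-up "b" terms plus the terms from the question.
terms = sorted([f"b{i:03d}" for i in range(200)] + ["black", "blue", "walker"])

one_letter = expand_prefix(terms, "b")    # stops after 50 terms, before reaching "black"
two_letters = expand_prefix(terms, "bl")  # only two candidates, both kept

print("black" in one_letter)  # False
print(two_letters)            # ['black', 'blue']

# Request body with a larger limit; 500 is an arbitrary value chosen for illustration.
query = {
    "query": {
        "match_phrase_prefix": {
            "name": {"query": "Walker B", "max_expansions": 500}
        }
    }
}
```

With the larger limit the simulated expansion for "b" does reach "black" and "blue", at the cost of more terms being added to the phrase query.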

Related

How can I influence Elasticsearch scoring by using information from higher-scoring results?

I am upgrading my Elasticsearch server from version 1.6.0 to 7.12.1, which made me rewrite every query I had.
Those queries retrieve materials identified by three fields: nature.idCat, nature.idNat and marque.idMrq (category ID, nature ID and brand ID).
I have a search field in my application to look for specific materials, so if the user enters "photoc", the query sent to my Elasticsearch server looks like this:
{
"sort": [
"_score"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "search",
"query": "*photoc*",
"boost": 10
}
},
[...] // Some more irrelevant conditions for this question like
// if nature.idCat = 26 then idNat must be in some range and idMrq in some other range
]
}
}
}
Here are two example "hits" returned by this query:
"hits": [
{
"_index": "ref_biens",
"_type": "_doc",
"_id": "T3RrpXsBz_TibRxz0akC",
"_score": 13.0,
"_source": {
"search": "Photocopieur GENERIQUE",
"nature": {
"idCat": 26,
"idNat": 665,
"libelle": "Photocopieur",
"ekip": "U03C",
"codeINSEE": 300121,
"noteMaterielArrondi": 5
},
"marque": {
"idMrq": 16,
"libelle": "GENERIQUE",
"ekip": "Z999",
"idVRDuree": 808
}
}
},
{
"_index": "ref_biens",
"_type": "_doc",
"_id": "UHRrpXsBz_TibRxz0akC",
"_score": 13.0,
"_source": {
"search": "Photocopieur INFOTEC",
"nature": {
"idCat": 26,
"idNat": 665,
"libelle": "Photocopieur",
"ekip": "U03C",
"codeINSEE": 300121,
"noteMaterielArrondi": 5
},
"marque": {
"idMrq": 1244,
"libelle": "INFOTEC",
"ekip": "I091",
"idVRDuree": 808
}
}
}
]
This works perfectly!
My problem appears when the user types more than one word. For example, if they search specifically for "Photocopieur PANASONIC", the query returns the right material first with a _score of 23, but every other match then has the same _score of 13. As a result, totally different materials (matching only on the brand name, for example) can come next, even though I want the other "Photocopieur" results to be displayed first.
The way I'm thinking of doing it is by adding "score points" to the results that are most similar to the best match. For instance, I would add a 6-point boost for the same nature.idCat, 4 points for the same nature.idNat and finally 2 points for the same marque.idMrq.
Any idea how I can achieve that? Is this the correct approach to my problem?

Is there any way to match similar terms in Elasticsearch

I have a large Elasticsearch index.
I am searching with the query below:
{"size": 1000, "query": {"query_string": {"query": "( string1 )"}}}
Let's say string1 = Product. If someone accidentally types prduct (forgetting the o), is there any way to match that as well?
{"size": 1000, "query": {"query_string": {"query": "( prdct )"}}} should also return the results for both prdct and product.
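You can use a fuzzy query, which returns documents that contain terms similar to the search term. Refer to this blog for a detailed explanation of fuzzy queries.
Matching prdct against product takes two edits (the missing o and u). The fuzziness parameter can be set to a maximum edit distance of 0, 1 or 2, or to AUTO, which picks the distance from the length of the term:
0..2 characters = Must match exactly
3..5 characters = One edit allowed
More than 5 characters = Two edits allowed
With AUTO, the five-character term prdct is only allowed one edit, so an explicit fuzziness of 2 is needed to reach product.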
Index Data:
{
"title":"product"
}
{
"title":"prdct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prdct",
"fuzziness": 2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "my-index1",
"_type": "_doc",
"_id": "2",
"_score": 3.465736,
"_source": {
"title": "prdct"
}
},
{
"_index": "my-index1",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
}
]
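To check the edit-distance reasoning without a cluster, here is a small Python sketch. It is an approximation under assumptions: edit_distance is plain Levenshtein distance (it ignores the transpositions option, which does not matter for this pair of words), and auto_fuzziness mirrors the default AUTO length thresholds listed above. Both helper names are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: inserts, deletes and substitutions cost one edit each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness AUTO with its default length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(edit_distance("prdct", "product"))  # 2
print(auto_fuzziness("prdct"))            # 1
```

Since the distance (2) is larger than what AUTO allows for a five-character term (1), the query has to ask for two edits explicitly to return product.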
There are many solutions to this problem:
Suggestions (did you mean X instead).
Fuzziness (edits from your original search term).
Partial matching with autocomplete (if someone types "pr" and you provide the available search terms, they can click on the correct results right away) or n-grams (matching groups of letters).
All of those have tradeoffs in index / search overhead as well as the classic precision / recall problem.

Query with `field` returns nothing

I'm new to Elasticsearch and am having trouble with my queries.
When I run a match_all query, I get this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [{
"_index": "stations",
"_type": "station",
"_id": "4432",
"_score": 1,
"_source": {
"SiteName": "Abborrkroksvägen",
"LastModifiedUtcDateTime": "2015-02-13 10:34:20.643",
"ExistsFromDate": "2015-02-14 00:00:00.000"
}
},
{
"_index": "stations",
"_type": "station",
"_id": "9110",
"_score": 1,
"_source": {
"SiteName": "Abrahamsberg",
"LastModifiedUtcDateTime": "2012-03-26 23:55:32.900",
"ExistsFromDate": "2012-06-23 00:00:00.000"
}
}
]
}
}
My search query looks like this:
{
"query": {
"query_string": {
"fields": ["SiteName"],
"query": "a"
}
}
}
The problem is that when I run the query above I get empty results which is strange. I should receive both of the documents from my index, right?
What am I doing wrong? Did I index my data wrong or is my query just messed up?
Appreciate any help I can get. Thanks guys!
There is nothing wrong with either your data or your query; the result follows from how data is stored in Elasticsearch.
First, when you index the data ("SiteName": "Abborrkroksvägen" and "SiteName": "Abrahamsberg"), each value is analysed and stored as individual terms.
When you query ES with "query": "a", you are looking for the exact term "a". No such term exists in the index, so you get empty results.
When you query ES with "query": "a*" (meaning all terms that start with "a"), it returns the documents you expected.
Hope this clarifies your question!
You may also want to look at an article about search that I found recently: https://www.timroes.de/2016/05/29/elasticsearch-kibana-queries-in-depth-tutorial/
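The difference between the two queries can be sketched in a few lines of Python. This is only a simulation: analyze is a crude stand-in for the standard analyzer (split on non-word characters, lowercase), and term_match / prefix_match are hypothetical helpers, not Elasticsearch calls.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


site_names = ["Abborrkroksvägen", "Abrahamsberg"]
# The terms that end up in the (simulated) inverted index.
index_terms = {term for name in site_names for term in analyze(name)}


def term_match(query):
    """Like "query": "a": the whole term has to equal the query."""
    return sorted(term for term in index_terms if term == query)


def prefix_match(prefix):
    """Like "query": "a*": any term starting with the prefix matches."""
    return sorted(term for term in index_terms if term.startswith(prefix))


print(term_match("a"))    # []
print(prefix_match("a"))  # ['abborrkroksvägen', 'abrahamsberg']
```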

How to enable elasticsearch auto-complete return only matching word

I need to implement auto-complete but I'm not sure about the exact strategy. For example, I have the following product:
Highsound Smart Phone Watch for Android (Gray)
So when the user starts typing "s", "sm", "smar", I need the word "smart" or "smart watch" to come up rather than the whole phrase: Highsound Smart Phone Watch for Android (Gray)
I looked at how Google, Amazon etc. do it, and they don't display the whole matching record; they display either only the word ("smart") or a phrase ("smart watch").
Right now I have enabled autocomplete in Elasticsearch according to the following link, but it returns the whole name of the matching record.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html
Any suggestions?
This is expected. You get back what is inside _source field. You can use highlighting to get back only the word that was matched.
{
"query": {
"match_phrase": {
"name": "sm"
}
},
"highlight": {
"fields": {
"name": {
"fragment_size": 1,
"number_of_fragments": 2
}
}
}
}
I have used number_of_fragments: 2 in case there is more than one word starting with sm. You can also change the fragment size according to your needs. You will get something like this, and you can then use the highlight part in the frontend.
"hits": {
"total": 1,
"max_score": 0.6349302,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "5",
"_score": 0.6349302,
"_source": {
"name": "Highsound Smart Phone Watch for Android (Gray)"
},
"highlight": {
"name": [
" <em>Smart</em>"
]
}
}
]
}
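On the frontend side, pulling the matched words out of the highlight part could look roughly like the Python sketch below. It assumes the default `<em>` highlight tags shown in the response above; highlighted_terms is a hypothetical helper name, and the hit is copied from that response.

```python
import re


def highlighted_terms(hit):
    """Collect the text wrapped in <em>...</em> from every highlight fragment of a hit."""
    terms = []
    for fragments in hit.get("highlight", {}).values():
        for fragment in fragments:
            terms.extend(re.findall(r"<em>(.*?)</em>", fragment))
    return terms


# Hit copied from the response above.
hit = {
    "_id": "5",
    "_source": {"name": "Highsound Smart Phone Watch for Android (Gray)"},
    "highlight": {"name": [" <em>Smart</em>"]},
}

print(highlighted_terms(hit))  # ['Smart']
```

A hit without a highlight section simply yields an empty list, so the frontend can fall back to the full name if it wants to.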

Spurious results from elasticsearch

I suspect I can't (or I'm just not quite desperate enough to try yet!) give you enough information to work on, but I'm hoping someone may be able to give me an idea of where to investigate...
I have an Elasticsearch index which is in a live system and is working fine. I've added 3 attributes to the core entity in the index (productId). I'm getting the correct data back, but every now and then the results include spurious data.
So for example (I've cut the list of fields down, which is why it is a multi_match query).
Using Postman I am sending
{
"query" : {
"multi_match" : {
"query" : "FD41D359-1066-47C5-B930-C839F380FBDE",
"fields" : [ "softwareitem.productId" ]
}
}
}
I'm expecting 1 item to come back in this example and I'm getting 2. I've modified the result a little, but the key thing is the productId. You can see that the 2nd item returned does not have the productId being searched for.
Can anyone give me any idea where I should look next with this? Is there a fault with my query, or do you think the index might be corrupt in some way?
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 27.424479,
"hits": [
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"_score": 27.424479,
"_source": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"productId": "FD41D359-1066-47C5-B930-C839F380FBDE",
"softwareitem": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"title": "Code Library",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
},
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"_score": 1.049637,
"_source": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"productId": "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8",
"softwareitem": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"title": "Video Game",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
}
]
}
}
It seems softwareitem.productId is a string field that is being analysed. To do exact matching on a string field, use a not_analyzed string field in your mapping, something like:
"productId" : {
"type" : "string",
"index" : "not_analyzed"
}
If your field is already not_analyzed, you probably have to make one additional change.
At query time you don't need a multi_match / match query. These types of queries analyze your input string and build a more complex query out of it, which is why you are seeing a second, unexpected result (it contains 47C5; the analyzer is probably tokenising the full string and building a query in which only one token needs to match). You should use term / terms queries instead.
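A quick way to see why the second document matches is to tokenise both productId values roughly the way an analysed field would. The Python sketch below is an approximation: analyze only splits on non-alphanumeric characters and lowercases, which is an assumption about the analyzer in use, and the term query body at the end assumes the field has been remapped as not_analyzed.

```python
import re


def analyze(text):
    """Approximate an analysed string field: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[0-9A-Za-z]+", text)]


searched = "FD41D359-1066-47C5-B930-C839F380FBDE"
unexpected = "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8"

# A match query only needs one token in common, and these two values share one.
shared = set(analyze(searched)) & set(analyze(unexpected))
print(shared)  # {'47c5'}

# Exact-match alternative: a term query does not analyse its input.
exact_query = {"query": {"term": {"softwareitem.productId": searched}}}
```

The single shared token also fits the scores in the question: the expected document matches on every token (27.42), while the spurious one matches on 47c5 alone (1.05).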

Resources