Not getting the correct score for the elastic search query result.
ES Query -
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
"fields": [
"MDMGlobalData.Name1"
]
}
}
]
}
}
}
ES result -
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 798,
"relation": "eq"
},
"max_score": 9.169065,
"hits": [
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551037160",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551040507",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551076447",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551100746",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551090880",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "PAFFORD EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551106787",
"_score": 9.169065,
"_source": {
"MDMGlobalData": {
"Name1": "CAPROCK EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551021568",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "WILTON EMERGENCY"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551124137",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551125549",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY ONE"
}
}
},
{
"_index": "customermasterdata",
"_type": "_doc",
"_id": "MDMCM551133066",
"_score": 9.121077,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
}
]
}
}
Ideally, The first set in the result should be the Name1 which has value just "emergency" or start with the word "emergency"
And how could we have the same score for almost first 5 result sets? Being the Name1 value is different.
Due to wrong scoring, the results are messed up.
How to correct the score in the result?
No, That need not be the case. Because ES follows Lucene scoring function
Reason for the same score:
You have only two terms in each document - emergency and one more word
Emergency word matches as it is. Field Length is same
Number of occurrence is one. i.e Term frequencies are same.
Relevancy is same for all the terms. idf
Coord is same as your doc contains only one occurrence of Emergency
But if you have a document with Emergency X Y Z, then score of this will be lower than the other documents which you have. Because term frequency is higher for this one.
And if you have only Emergency, score of this document will be higher than all.
It is perfectly normal to have same score in your scenario as user doesn't know which emergency he/she meant.
Update:
{
"query":{
"bool":{
"must":{
"term":{
"MDMGlobalData.Name1":"emergency"
}
}
}
}
}
With the sample data, output:
"hits": [
{
"_index": "emerge",
"_type": "_doc",
"_id": "iN1hKnMBojxRtp6HNI7d",
"_score": 0.10938574,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "g91TKnMBojxRtp6Hto4q",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "PARAGON EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hN1TKnMBojxRtp6H2I6A",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "hd1TKnMBojxRtp6H_I6_",
"_score": 0.08701137,
"_source": {
"MDMGlobalData": {
"Name1": "COASTAL EMERGENCY"
}
}
},
{
"_index": "emerge",
"_type": "_doc",
"_id": "h91VKnMBojxRtp6HYI4e",
"_score": 0.07223585,
"_source": {
"MDMGlobalData": {
"Name1": "EMERGENCY MD X"
}
}
}
]
Is there a way to query starting from a particular value and get the next n records in Elasticsearch?
For example, I want to get 10 records starting from employee id "ABC_123".
The below query gives an error saying
[terms] query does not support [empId]
GET /_search
{
"from": 0, "size": 10,
"query" : {
"terms" : {
"empId" : "ABC_123"
}
}
}
What can I do about this?
You can use the prefix query, Also you can read more about the autocomplete on my blog, which discussed 4 approaches to make it work and their trade-off.
I used prefix query on your sample data and got the expected output and below is the step by step guide.
Index mapping
{
"mappings": {
"properties": {
"empId": {
"type": "keyword" --> field type `keyword`
}
}
}
}
Index sample docs
{
"empId" : "ABC_1231"
}
{
"empId" : "ABC_1232"
}
{
"empId" : "ABC_1233"
}
{
"empId" : "ABC_1234"
}
and so on
Prefix Search query
{
"from": 0,
"size": 10,
"query": {
"prefix": {
"empId": "ABC_123"
}
}
}
Search result
"hits": [
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"empId": "ABC_1231"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"empId": "ABC_1232"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"empId": "ABC_1233"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"empId": "ABC_1234"
}
}
]
I have written the following lucene query in elasticsearch for getting documents with Id field as mentioned:
GET requirements_v3/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"match": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}},
{
"match": {
"Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69"
}
},
{
"match": {
"Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa"
}
},
{
"match": {
"Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703"
}
},
{
"match": {
"Id": "8c399993-f273-4ee0-a1ab-3a85c6848113"
}
},
{
"match": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0"
}
},
{
"match": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a"
}
},
{
"match": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd"
}
},
{
"match": {
"Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a"
}
},
{
"match": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f"
}
}
]
}
}
}
}
The response for the above query is:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 18,
"max_score": 0,
"hits": [
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"_score": 0,
"_source": {
"Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "4461eb37-487e-4899-a7be-914640fab0e0",
"_score": 0,
"_source": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"_score": 0,
"_source": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"_score": 0,
"_source": {
"Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"_score": 0,
"_source": {
"Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"Name": "Detail Page - Build"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"_score": 0,
"_source": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"_score": 0,
"_source": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"Name": "HUC"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"_score": 0,
"_source": {
"Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"Name": "DAMS UpsertUnenrollPrice" }
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"_score": 0,
"_source": {
"Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"Name": "Item Setup"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"_score": 0,
"_source": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"Name": "Installments"
}
}
]
}
}
This mentions totalHits as '18'. Why is it returning more items than 10? I believe match query should be used for 'exact' matches, so why more documents are returned here?
P.S.: I know I can use the Ids query for this, but I want to know why is this not returning the correct response
Update: Setting the size to 20 returns the following response:
{
"took": 195,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 18,
"max_score": 0,
"hits": [
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"_score": 0,
"_source": {
"Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "4461eb37-487e-4899-a7be-914640fab0e0",
"_score": 0,
"_source": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"_score": 0,
"_source": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"_score": 0,
"_source": {
"Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"_score": 0,
"_source": {
"Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"Name": "Detail Page - Build"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"_score": 0,
"_source": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"_score": 0,
"_source": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"Name": "HUC"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"_score": 0,
"_source": {
"Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"Name": "DAMS UpsertUnenrollPrice"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"_score": 0,
"_source": {
"Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"Name": "Item Setup"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"_score": 0,
"_source": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"Name": "Installments"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
"_score": 0,
"_source": {
"Id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
"Name": "Detail Page - Strings "
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
"_score": 0,
"_source": {
"Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
"_score": 0,
"_source": {
"Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
"_score": 0,
"_source": {
"Id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
"Name": "Invite Platform"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
"_score": 0,
"_source": {
"Id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
"Name": "Training"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
"_score": 0,
"_source": {
"Id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
"Name": "Configure ASIN for Reporting"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
"_score": 0,
"_source": {
"Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
"_score": 0,
"_source": {
"Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
"Name": "Invite Platform"
}
}
]
}
}
Lets understand this by the following mapping e.g:
{
"_doc": {
"properties": {
"Id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
The above mapping is created dynamically by elasticsearch. Lets us now focus on Id field. Its type is text. By default the analyzer for text datatype is standard analyzer. When this analyzer is applied on the input for this field it get tokenized into terms. So for example if you input value for Id is 33f87d98-024f-4893-aa1c-8d438a98cd1f following tokens get generated:
33f87d98
024f
4893
aa1c
8d438a98cd1f
As you can see the input value is splitted by - being used as delimiter. This is because standard analyzer is applied on it.
There is another sub-field under Id which is keyword and its type is keyword. For type keyword the input is indexed as it is without applying any modification.
Now lets understand why more documents get matched and result count is more than expected. In your query you used match query on Id field as below:
{
"match": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}
}
By default match query uses the same analyzer that is applied on the field in mapping. So on the Id value in the query again the same analyzer is applied and the input is splitted into tokens in a similar way as above. The default operator that is applied between tokens of match query input string is OR and hence your query actually becomes:
b8bf49a4 OR 960b OR 4fa8 OR 8c5f OR a3fce4b4d07b
There if any of the above tokens match to any of the indexed terms stored in Id field, the document is considered a match.
Solution for the above based on above mapping:
Use the keyword field instead. So the query becomes:
{
"match": {
"Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}
}
More on how match works see here.
Also as mention by #Curious_MInd in his answer its better to use terms than using multiple match in should.
As you said that your Id is text as well as keyword so you should use Id.keyword for matching exact values like
GET requirements_v3/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"match": {
"Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}},
{
"match": {
"Id.keyword": "048b7907-2b5a-438a-ace9-f1e1fd67ca69"
}
}
]
}
}
}
}
But I guess you should use terms if you wants to match multiple exact values. Have a look here. For an example:
{
"terms" : {
"Id" : ["b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b", "048b7907-2b5a-438a-ace9-f1e1fd67ca69"]
}
}
I have a data set like below
"hits": [
{
"_index": "demotest",
"_type": "demotest",
"_id": "2",
"_score": 1,
"_source": {
"name": "Duvvuri ram gopal reddy"
}
},
{
"_index": "demotest",
"_type": "demotest",
"_id": "1",
"_score": 1,
"_source": {
"name": "ram gopal reddy"
}
},
{
"_index": "demotest",
"_type": "demotest",
"_id": "3",
"_score": 1,
"_source": {
"name": "reddy ram gopal"
}
}
]
When i try to perform a match query with value as ram gopal reddy exact matched record is not showing on top.
Query:
GET demotest/_search
{
"explain": true,
"query": {
"match": {
"name": "ram gopal reddy"
}
}
}
Result after execution of above query:
"hits": [
{
"_index": "demotest",
"_type": "demotest",
"_id": "2",
"_score": 0.8630463,
"_source": {
"name": "Duvvuri ram gopal reddy"
}
},
{
"_index": "demotest",
"_type": "demotest",
"_id": "1",
"_score": 0.7594807,
"_source": {
"name": "ram gopal reddy"
}
},
{
"_index": "demotest",
"_type": "demotest",
"_id": "3",
"_score": 0.7594807,
"_source": {
"name": "reddy ram gopal"
}
}
]
How to get exact matched record on top in search results. Also, I posted my complete test here and I want case insensitive search and not phrase search. Thanks
I'd use a bool query with multiple should clauses. One would be set with a phrase query and the other one as a match query.
I wrote a full (but complex example) here: https://gist.github.com/dadoonet/5179ee72ecbf08f12f53d4bda1b76bab
Should give something like:
GET oss/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"name": {
"query" : "ram gopal reddy",
"boost": 8.0
}
}
},
{
"match": {
"name": {
"query": "ram gopal reddy"
}
}
}
]
}
}
}
Here is the query I am sending to ElasticSearch:
http://localhost:9200/user-index/user/_search/?queryb%5Bname%5D=Richard
The returned JSON is this:
{
"hits": [
{
"_index": "user-index",
"_type": "user",
"_id": "WgrvE-DzQJminNreBIsRNA",
"_score": 1.0,
"_source": {
"name": "Richard",
"db_id": "7"
}
},
{
"_index": "user-index",
"_type": "user",
"_id": "GwMOuYbUR8y48RrG4HgXdg",
"_score": 1.0,
"_source": {
"name": "John",
"db_id": "8"
}
},
{
"_index": "user-index",
"_type": "user",
"_id": "C-bgK3pNTNiX9Cz0x8EftA",
"_score": 1.0,
"_source": {
"name": "Harold",
"db_id": "2"
}
}
]
}
Only one of those actually matches. Why is it sending them all back?
Elasticsearch returns all records with the type user in the index user-index because it cannot find the search query. The search query should be specified as a query string in the parameter "q" or as a request body.
Try http://localhost:9200/user-index/user/_search?q=name%3ARichard