I am trying to write an Elasticsearch query that finds documents matching two conditions:
Here's the mapping in use:
{
"mappings": {
"stores": {
"properties": {
"locality": {
"type": "text"
},
"city": {
"type": "text"
},
"type": {
"type": "integer"
}
}
}
}
}
And here's my filter:
{
"query": {
"constant_score": {
"filter": {
"bool" : {
"must" : [
{
"term" : { "locality": "Shivajinagar" }
}, {
"term" : { "city": "Bangalore" }
}
]
}
}
}
}
}
No matter what values I try I always get:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Even though the data exists (a match-all search over all documents shows it):
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 10742,
"max_score": 1.0,
"hits": [
{
"_index": "test_es",
"_type": "stores",
"_id": "942",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Palam Vihar",
"city": "Gurgaon"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "944",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Chirag Dilli",
"city": "Delhi"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "948",
"_score": 1.0,
"_source": {
"type": 1,
"locality": "Vashi",
"city": "Navi Mumbai"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "980",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Sector 48",
"city": "Faridabad"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "982",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Kammanahalli",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "984",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Tilak Nagar",
"city": "Delhi"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "742",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Shivajinagar",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "752",
"_score": 1.0,
"_source": {
"type": 1,
"locality": "DLF Phase 3",
"city": "Gurgaon"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "754",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Electronic City",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "778",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Bandra East",
"city": "Mumbai"
}
}
]
}
}
I tried using a query instead of a filter, even though I don't really care about scores, but nada!
Where might I be going wrong?!
Short Answer: Use match instead of term.
Long Answer:
The important thing to know here is that your search keywords, like { "locality": "Shivajinagar" } and { "city": "Bangalore" }, need to be compared in the same form in which they were stored.
In the question, the mapping specifies that the "locality" and "city" fields are of type text. According to the documentation, text fields are analyzed by the standard analyzer by default.
The default standard analyzer drops most punctuation, breaks up text
into individual words, and lower cases them. For instance, the
standard analyzer would turn the string “Quick Brown Fox!” into the
terms [quick, brown, fox]. This analysis process makes it possible to
search for individual words within a big block of full text.
The term query looks for the exact term in the field’s inverted
index — it doesn’t know anything about the field’s analyzer. This
makes it useful for looking up values in keyword fields, or in numeric
or date fields. When querying full text fields, use the match query
instead, which understands how the field has been analyzed.
So, when you search for "Bangalore" with a term query, it looks for the exact term "Bangalore" in the city field, while the analyzer had stored it in the index as "bangalore". That's why you get no matches.
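Here is the filter from the question rewritten with match queries, which analyze the search input the same way the field was analyzed at index time; a minimal sketch based on the mapping above:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "locality": "Shivajinagar" } },
            { "match": { "city": "Bangalore" } }
          ]
        }
      }
    }
  }
}
Alternatively, if you truly need exact matching, add a keyword sub-field to these fields in the mapping and run the term queries against locality.keyword and city.keyword instead.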
You can find the documentation regarding the exact question here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Side Tip: Use the _analyze endpoint to check exactly what a particular analyzer emits for a given input text.
Documentation for _analyze endpoint: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
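For example, to check how the city field's analyzer processed the stored value (a sketch; the index name test_es is taken from the search output above):
GET test_es/_analyze
{
  "field": "city",
  "text": "Bangalore"
}
This should return a single lowercased token, bangalore, which is exactly why the term query for "Bangalore" finds nothing.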
I have an Elasticsearch index containing user locations.
I need to perform an aggregation query with a geo bounding box using a geohash grid, and for buckets whose document count is less than some value, I need to return all the documents.
How can I do this?
Since you have not given any relevant information about the index you have created or the user locations, I am considering the below data:
Index Definition
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
Sample Documents
POST _bulk
{"index":{"_id":1}}
{"location":"52.37408,4.912350","name":"The golden dragon"}
{"index":{"_id":2}}
{"location":"52.369219,4.901618","name":"Burger King"}
{"index":{"_id":3}}
{"location":"52.371667,4.914722","name":"Wendys"}
{"index":{"_id":4}}
{"location":"51.222900,4.405200","name":"Taco Bell"}
{"index":{"_id":5}}
{"location":"48.861111,2.336389","name":"McDonalds"}
{"index":{"_id":6}}
{"location":"48.860000,2.327000","name":"KFC"}
According to your question:
When requesting detailed buckets a filter like geo_bounding_box
should be applied to narrow the subject area
To know more about this, you can refer to this official ES doc
Now, in order to filter data based on doc_count with aggregations, we can use bucket_selector pipeline aggregation.
From the documentation:
Pipeline aggregations work on the outputs produced from other
aggregations rather than from document sets, adding information to the
output tree.
So the amount of work that needs to be done to calculate doc_count will be the same.
Query
{
"aggs": {
"location": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 52.5225,
"lon": 4.5552
},
"bottom_right": {
"lat": 52.2291,
"lon": 5.2322
}
}
}
},
"aggs": {
"around_amsterdam": {
"geohash_grid": {
"field": "location",
"precision": 8
},
"aggs": {
"the_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count < 2"
}
}
}
}
}
}
}
}
Search Result
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "restaurant",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"location": "52.37408,4.912350",
"name": "The golden dragon"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"location": "52.369219,4.901618",
"name": "Burger King"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"location": "52.371667,4.914722",
"name": "Wendys"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"location": "51.222900,4.405200",
"name": "Taco Bell"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "5",
"_score": 1.0,
"_source": {
"location": "48.861111,2.336389",
"name": "McDonalds"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "6",
"_score": 1.0,
"_source": {
"location": "48.860000,2.327000",
"name": "KFC"
}
}
]
},
"aggregations": {
"location": {
"doc_count": 3,
"around_amsterdam": {
"buckets": [
{
"key": "u173zy3j",
"doc_count": 1
},
{
"key": "u173zvfz",
"doc_count": 1
},
{
"key": "u173zt90",
"doc_count": 1
}
]
}
}
}
}
The bucket_selector keeps only the buckets whose document count satisfies "params.the_doc_count < 2", i.e. buckets with fewer than 2 documents; all other buckets are filtered out.
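If you also need to return the documents inside each surviving bucket, as the question asks, one option is to add a top_hits sub-aggregation next to the bucket_selector, replacing the around_amsterdam block above. A minimal sketch (the size of 10 per bucket is an assumption):
"aggs": {
  "around_amsterdam": {
    "geohash_grid": {
      "field": "location",
      "precision": 8
    },
    "aggs": {
      "the_filter": {
        "bucket_selector": {
          "buckets_path": {
            "the_doc_count": "_count"
          },
          "script": "params.the_doc_count < 2"
        }
      },
      "bucket_docs": {
        "top_hits": {
          "size": 10
        }
      }
    }
  }
}
Each bucket that survives the bucket_selector will then carry its matching documents under bucket_docs.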
I'm trying to build a food search engine on Elasticsearch that should meet the following use cases:
1. If the user searches for 'coff', it should return all the documents with the phrase 'coffee' in their name, and priority should go to the food items that have 'coffee' at the start of their name.
2. If the user searches for 'green tea', it should give priority to documents that contain the whole phrase 'green tea', rather than splitting it into 'green' and 'tea'.
3. If the phrase does not exist in the 'name', it should also search in the alias field.
To manage the first case, I've used the edge n-grams analyzer.
Mapping -
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"analyzer_keyword": {
"tokenizer": "standard",
"filter": "lowercase"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"alias": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"search_analyzer": "analyzer_keyword",
"analyzer": "edge_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
This is the search query that I'm using, but it's not returning the relevant search results:
{
"query": {
"multi_match": {
"query": "coffee",
"fields": ["name^2", "alias"]
}
}
}
There are over 1500 food items with 'coffee' in their name, but the above query returns only 2:
{
"took": 745,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 8.657346,
"hits": [
{
"_index": "food-master",
"_type": "doc",
"_id": "a9uzinABb4g7LgmgoI1I",
"_score": 8.657346,
"_source": {
"id": 17463,
"name": "Rotiboy, coffee bun",
"alias": [
"Mexican Coffee Bun (Rotiboy)",
"Mexican coffee bun"
]
}
},
{
"_index": "food-master",
"_type": "doc",
"_id": "TNuzinABb4g7LgmgoFVI",
"_score": 7.0164866,
"_source": {
"id": 1344,
"name": "Coffee with sugar",
"alias": [
"Heart Friendly",
"Coffee With Sugar",
"Coffee With Milk and Sugar",
"Gluten Free",
"Hypertension Friendly"
]
}
}
]
}
}
In the mapping, if I remove the analyzer_keyword search analyzer, it returns relevant results, but the documents that start with 'coffee' are not prioritized:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1323,
"max_score": 57.561867,
"hits": [
{
"_index": "food-master-new",
"_type": "doc",
"_id": "nduzinABb4g7LgmgoINI",
"_score": 57.561867,
"_source": {
"name": "Egg Coffee",
"alias": [],
"id": 12609
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "dNuzinABb4g7LgmgoFVI",
"_score": 55.811295,
"_source": {
"name": "Coffee (Black)",
"alias": [
"Weight Loss",
"Diabetes Friendly",
"Gluten Free",
"Lactose Free",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 1341
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "NduzinABb4g7LgmgoHxI",
"_score": 54.303185,
"_source": {
"name": "Brewed Coffee",
"alias": [
"StarBucks"
],
"id": 15679
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ltuzinABb4g7LgmgoJJI",
"_score": 54.303185,
"_source": {
"name": "Coffee - Masala",
"alias": [],
"id": 11329
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "oduzinABb4g7LgmgoGpI",
"_score": 53.171227,
"_source": {
"name": "Coffee, German",
"alias": [],
"id": 12257
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "YNuzinABb4g7LgmgoFRI",
"_score": 52.929176,
"_source": {
"name": "Soy Milk Coffee",
"alias": [
"Gluten Free",
"Lactose Free",
"Weight Loss",
"Diabetes Friendly",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 978
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "8duzinABb4g7LgmgoFRI",
"_score": 52.068523,
"_source": {
"name": "Cold Coffee (Soy Milk)",
"alias": [
"Soy Milk"
],
"id": 1097
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "tNuzinABb4g7LgmgoF9I",
"_score": 50.956154,
"_source": {
"name": "Coffee Frappe",
"alias": [],
"id": 3142
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ZduzinABb4g7LgmgoF5I",
"_score": 49.810112,
"_source": {
"name": "Big Apple Coffee",
"alias": [],
"id": 3130
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "eduzinABb4g7LgmgoHtI",
"_score": 49.62197,
"_source": {
"name": "Mexican Coffee",
"alias": [],
"id": 13604
}
}
]
}
}
If I change the tokenizer from 'standard' to 'keyword', I face the same problem, and it also splits phrases into individual words: 'green tea' into 'green' and 'tea'.
Any suggestions on what I might be getting wrong with respect to the analyzers? I've tried all possible combinations, but meeting all 3 scenarios with high accuracy is proving a little difficult.
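One way to debug analyzer combinations like these is the _analyze endpoint. For instance, with the mapping above, edge_ngram_tokenizer has max_gram set to 5, so the full six-letter token 'coffee' is never indexed for the name field. A sketch (the index name food-master is taken from the search output):
GET food-master/_analyze
{
  "analyzer": "edge_ngram_analyzer",
  "text": "coffee"
}
This returns only the prefixes co, cof, coff and coffe, which would explain why a search analyzer that emits the whole token 'coffee' matches so few documents on the name field.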
To simplify:
PUT /test/vendors/1
{
"type": "doctor",
"name": "Ron",
"place": "Boston"
}
PUT /test/vendors/2
{
"type": "doctor",
"name": "Tom",
"place": "Boston"
}
PUT /test/vendors/3
{
"type": "doctor",
"name": "Jack",
"place": "San Fran"
}
Then search:
GET /test/_search
{
"query": {
"multi_match" : {
"query": "doctor in Boston",
"fields": [ "type", "place" ]
}
}
}
I understand why I get Jack, who works in San Fran -- it's because he's a doctor too. However, I can't figure out why the match score is the SAME for him. The other two matched on the place too, didn't they? Why aren't Ron and Tom scored higher?
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.9245277,
"hits": [
{
"_index": "test",
"_type": "vendors",
"_id": "2",
"_score": 0.9245277,
"_source": {
"type": "doctor",
"name": "Tom",
"place": "Boston"
}
},
{
"_index": "test",
"_type": "vendors",
"_id": "1",
"_score": 0.9245277,
"_source": {
"type": "doctor",
"name": "Ron",
"place": "Boston"
}
},
{
"_index": "test",
"_type": "vendors",
"_id": "3",
"_score": 0.9245277,
"_source": {
"type": "doctor",
"name": "Jack",
"place": "San Fran"
}
}
]
}
}
Is there a way to force it to score lower when fewer search keywords are found? Also, if I'm going the wrong way about this kind of search and there's a better pattern/way to do it -- I'd appreciate being pointed in the right direction.
Your search structure is incorrect. The search query above ignores the place property, and that's why you get the same score for all documents (only the type property is taken into account). The reason is that works_at (from the original, non-simplified question) is a nested mapping, which has to be treated differently when searching.
First, you should define works_at as a nested mapping (read more here). Then you'll have to adjust your query to work with that nested mapping; see an example here.
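For illustration, a minimal sketch of a nested mapping and a matching nested query; the works_at field and its place sub-field are assumptions, since they don't appear in the simplified documents above:
PUT /test
{
  "mappings": {
    "vendors": {
      "properties": {
        "works_at": {
          "type": "nested",
          "properties": {
            "place": { "type": "text" }
          }
        }
      }
    }
  }
}

GET /test/_search
{
  "query": {
    "nested": {
      "path": "works_at",
      "query": {
        "match": { "works_at.place": "Boston" }
      }
    }
  }
}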
GET /test/_search
{
"query": {
"multi_match" : {
"query": "doctor in Boston",
"fields": [ "type", "place" ],
"type": "most_fields" . <---- I WAS MISSING THIS
}
}
}
Once added, that gave the correct results, with the "San Fran" guy scored lower. By default, multi_match uses the best_fields type, which scores each document by its single best-matching field; most_fields adds up the scores from all matching fields, so documents matching on both type and place now score higher.
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.2122098,
"hits": [
{
"_index": "test",
"_type": "vendors",
"_id": "2",
"_score": 1.2122098,
"_source": {
"type": "doctor",
"name": "Tom",
"place": "Boston"
}
},
{
"_index": "test",
"_type": "vendors",
"_id": "1",
"_score": 1.2122098,
"_source": {
"type": "doctor",
"name": "Ron",
"place": "Boston"
}
},
{
"_index": "test",
"_type": "vendors",
"_id": "3",
"_score": 0.9245277,
"_source": {
"type": "doctor",
"name": "Jack",
"place": "San Fran"
}
}
]
}
}
I have written the following query in Elasticsearch to get documents whose Id field matches the values mentioned:
GET requirements_v3/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"match": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}},
{
"match": {
"Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69"
}
},
{
"match": {
"Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa"
}
},
{
"match": {
"Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703"
}
},
{
"match": {
"Id": "8c399993-f273-4ee0-a1ab-3a85c6848113"
}
},
{
"match": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0"
}
},
{
"match": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a"
}
},
{
"match": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd"
}
},
{
"match": {
"Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a"
}
},
{
"match": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f"
}
}
]
}
}
}
}
The response for the above query is:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 18,
"max_score": 0,
"hits": [
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"_score": 0,
"_source": {
"Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "4461eb37-487e-4899-a7be-914640fab0e0",
"_score": 0,
"_source": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"_score": 0,
"_source": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"_score": 0,
"_source": {
"Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"_score": 0,
"_source": {
"Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"Name": "Detail Page - Build"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"_score": 0,
"_source": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"_score": 0,
"_source": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"Name": "HUC"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"_score": 0,
"_source": {
"Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"Name": "DAMS UpsertUnenrollPrice" }
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"_score": 0,
"_source": {
"Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"Name": "Item Setup"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"_score": 0,
"_source": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"Name": "Installments"
}
}
]
}
}
This mentions totalHits as '18'. Why is it matching more items than the 10 IDs I queried for? I believed a match query should be used for 'exact' matches, so why are more documents returned here?
P.S.: I know I can use the ids query for this, but I want to know why this query is not returning the correct response.
Update: Setting the size to 20 returns the following response:
{
"took": 195,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 18,
"max_score": 0,
"hits": [
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"_score": 0,
"_source": {
"Id": "9d8060da-c3e2-4f6d-b4e2-17e65b266c76",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "4461eb37-487e-4899-a7be-914640fab0e0",
"_score": 0,
"_source": {
"Id": "4461eb37-487e-4899-a7be-914640fab0e0",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"_score": 0,
"_source": {
"Id": "33f87d98-024f-4893-aa1c-8d438a98cd1f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"_score": 0,
"_source": {
"Id": "d75d9a7c-e145-487e-922f-102c16d0026f",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"_score": 0,
"_source": {
"Id": "007eadb7-adda-487e-b7fe-6f6b5648de2e",
"Name": "Detail Page - Build"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"_score": 0,
"_source": {
"Id": "95816ff0-9eae-4196-99fc-86c6f43395fd",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"_score": 0,
"_source": {
"Id": "07052261-b904-4bfc-a6fd-3acd28114c6a",
"Name": "HUC"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"_score": 0,
"_source": {
"Id": "d60daf3a-4681-4bfc-a3a9-b04b5b005f73",
"Name": "DAMS UpsertUnenrollPrice"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"_score": 0,
"_source": {
"Id": "c1b367f2-a57a-487e-994c-84470e0f9db4",
"Name": "Item Setup"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"_score": 0,
"_source": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b",
"Name": "Installments"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
"_score": 0,
"_source": {
"Id": "b9437079-47c4-487e-abf0-1ff076f69e0f",
"Name": "Detail Page - Strings "
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
"_score": 0,
"_source": {
"Id": "0aa1db52-c0fb-4bf6-9223-00edccc32703",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
"_score": 0,
"_source": {
"Id": "ea8a59a6-2b2f-467a-9beb-e281b1581a0a",
"Name": "Create Configurator"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
"_score": 0,
"_source": {
"Id": "fd259359-4f6d-4530-ac29-fcebe00d66a6",
"Name": "Invite Platform"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
"_score": 0,
"_source": {
"Id": "1b2ba0bb-3e7f-46fb-b904-07460b84848b",
"Name": "Training"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
"_score": 0,
"_source": {
"Id": "8c399993-f273-4ee0-a1ab-3a85c6848113",
"Name": "Configure ASIN for Reporting"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
"_score": 0,
"_source": {
"Id": "3b385896-1207-4f6d-8ae9-f3ced84cf1fa",
"Name": "Create Extended/Limited Warranty Configuration"
}
},
{
"_index": "requirements_v3",
"_type": "_doc",
"_id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
"_score": 0,
"_source": {
"Id": "048b7907-2b5a-438a-ace9-f1e1fd67ca69",
"Name": "Invite Platform"
}
}
]
}
}
Let's understand this with the following example mapping:
{
"_doc": {
"properties": {
"Id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
The above mapping is created dynamically by Elasticsearch. Let us now focus on the Id field. Its type is text. By default, the analyzer for the text datatype is the standard analyzer. When this analyzer is applied to the input for this field, the input gets tokenized into terms. So, for example, if your input value for Id is 33f87d98-024f-4893-aa1c-8d438a98cd1f, the following tokens are generated:
33f87d98
024f
4893
aa1c
8d438a98cd1f
As you can see, the input value is split into tokens, with - used as the delimiter. This is because the standard analyzer is applied to it.
There is another sub-field under Id named keyword, and its type is keyword. For the keyword type, the input is indexed as-is, without any modification.
Now let's understand why more documents get matched and the result count is higher than expected. In your query you used a match query on the Id field, as below:
{
"match": {
"Id": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}
}
By default, a match query uses the same analyzer that is applied to the field in the mapping. So the same analyzer is applied to the Id value in the query, and the input is split into tokens in the same way as above. The default operator applied between the tokens of a match query's input string is OR, and hence your query effectively becomes:
b8bf49a4 OR 960b OR 4fa8 OR 8c5f OR a3fce4b4d07b
Therefore, if any of these tokens matches any of the indexed terms stored in the Id field, the document is considered a match.
Solution, based on the above mapping:
Use the keyword field instead. So the query becomes:
{
"match": {
"Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}
}
For more on how match works, see here.
Also, as mentioned by @Curious_MInd in their answer, it's better to use terms than multiple match clauses inside should.
Since your Id is text as well as keyword, you should use Id.keyword for matching exact values, like:
GET requirements_v3/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"match": {
"Id.keyword": "b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b"
}},
{
"match": {
"Id.keyword": "048b7907-2b5a-438a-ace9-f1e1fd67ca69"
}
}
]
}
}
}
}
But I guess you should use terms if you want to match multiple exact values. Have a look here. For example:
{
"terms" : {
"Id" : ["b8bf49a4-960b-4fa8-8c5f-a3fce4b4d07b", "048b7907-2b5a-438a-ace9-f1e1fd67ca69"]
}
}
In elasticsearch how can I replace the id of every document with the value of another field in the document?
I don't think you can change the ids of existing documents in the index, but you can reindex them using the path parameter in your mapping. Here is a trivial example.
I set up a simple index, using the path parameter in the _id definition in the mapping, and added a few docs:
PUT /test_index
{
"mappings": {
"doc":{
"_id": {
"path": "number"
},
"properties": {
"text_field": {
"type": "string"
},
"number": {
"type": "integer"
}
}
}
}
}
POST /test_index/doc/_bulk
{"index":{}}
{"text_field": "Apple TV","number":3}
{"index":{}}
{"text_field": "Apple iPhone","number":2}
{"index":{}}
{"text_field": "Apple MacBook","number":1}
Then if I search, I can see that the ids were set as I asked:
POST /test_index/_search
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"text_field": "Apple MacBook",
"number": 1
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"text_field": "Apple iPhone",
"number": 2
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"text_field": "Apple TV",
"number": 3
}
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/933bd839b2d524889e483f50c59c37ffaab2270a
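Note that the _id path setting shown above was removed in Elasticsearch 2.0. On current versions, a similar result can be achieved by reindexing with a script that rewrites each document's metadata; a minimal sketch (the destination index name test_index_v2 is an assumption):
POST _reindex
{
  "source": { "index": "test_index" },
  "dest": { "index": "test_index_v2" },
  "script": {
    "source": "ctx._id = ctx._source.number"
  }
}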
You can do this by defining a mapping for _id, as described here: id-mapping.