Extract top visited websites from logs - elasticsearch

We are storing log data containing information about sites that have been visited from our network. I'd like to query the top 10 most visited websites. How can I achieve this with Elasticsearch? The index mapping is as follows:
{
"data" : {
"properties" : {
"date": {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"status": {"type" : "string"},
"group": {"type" : "string"},
"ip": {"type" : "ip"},
"username":{"type" : "string"},
"category":{"type" : "string"},
"url":{"type" : "string"}
}
}
}
Sample Data:
"hits": {
"total": 7,
"max_score": 1,
"hits": [
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT4_ibdcNyAnt753J",
"_score": 1,
"_source": {
"date": "2015-08-16T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "ads",
"url": "https://gmail.com/mail/u/0/#inbox"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMjDpbdcNyAnt75iB",
"_score": 1,
"_source": {
"date": "2015-08-15T00:01:00.195Z",
"status": "BLOCK",
"group": "level3",
"ip": "10.249.10.51",
"username": "Fary",
"category": "ads",
"url": "https://gmail.com/details/blabla"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT94kbdcNyAnt753Y",
"_score": 1,
"_source": {
"date": "2015-08-17T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "news",
"url": "http://aol.com"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_CwTEqbdcNyAnt74RJ",
"_score": 1,
"_source": {
"date": "2015-08-15T00:00:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "Blog",
"url": "http://gmail.com"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMmUQbdcNyAnt75iQ",
"_score": 1,
"_source": {
"date": "2015-08-15T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.51",
"username": "Fary",
"category": "ads",
"url": "http://yahoo.com/vbfhghfgjfdgfd"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT1yjbdcNyAnt753B",
"_score": 1,
"_source": {
"date": "2015-08-16T00:02:00.195Z",
"status": "REDIR",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "ads",
"url": "http://news.yahoo.com/"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMV1ObdcNyAnt75hd",
"_score": 1,
"_source": {
"date": "2015-08-15T00:01:00.195Z",
"status": "BLOCK",
"group": "level3",
"ip": "10.249.10.50",
"username": "Kamal",
"category": "Blog",
"url": "http://hotmail.com/dfdgfgfdg"
}
}
]
}
What I'd like to have:
Top visited sites:
- **Sites - Hits**
- gmail.com - 3
- yahoo.com - 2
- hotmail.com - 1
- aol.com - 1

First you need to extract the base site (like gmail.com) from the url field before indexing and store it in a new field. Let's assume this new field is baseSite.
Then make the baseSite field not_analyzed and run a terms aggregation on it, as in the sketch below.
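For illustration, a minimal sketch of such a request, assuming the documents have been reindexed with a not_analyzed baseSite field (the field name is the one assumed above):
POST /squid/data/_search
{
  "size": 0,
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "baseSite",
        "size": 10
      }
    }
  }
}
Each bucket's key would be the base site and doc_count the number of hits, which gives the "gmail.com - 3, yahoo.com - 2, ..." style output directly.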

Related

How to filter an aggregation based on the values of top hits?

The mapping of the Elasticsearch documents is as follows:
{
"mappings": {
"properties": {
"vote_id": {
"type": "keyword"
},
"user_id": {
"type": "text"
},
"song_id": {
"type": "text"
},
"type": {
"type": "byte"
},
"timestamp": {
"type": "date"
}
}
}
}
I want to aggregate these votes so that the query returns songs that you AND your friends like.
So far, I have buckets of songs that you and your friends like, but some buckets may be for songs that only your friends like.
{
"query": {
"bool": {
"must": {
"terms": {
"user_id": ["you and your friend ids"]
}
}
}
},
"aggs": {
"songs": {
"terms": {
"field": "song_id"
},
"aggs": {
"docs": {
"top_hits": {
"size": "length of you and your friends",
"_source": ["vote_id", "song_id", "user_id", "type", "timestamp"]
}
},
"more_than_one": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count > 1"
}
}
},
}
}
}
I want to filter the buckets such that at least one of the documents in the top hits has your user id.
This is the current response
"aggregations": {
"songs": {
"buckets": [
{
"doc_count": 5,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotNtFBb9TCEfpk3S54q6gcMbjZB82Xc1_ZCgA6kYsUmvk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:56.118207343Z",
"type": 1,
"user_id": "iusr8keSbPjX9ZqFhX4Dei4G",
"vote_id": "ivotNtFBb9TCEfpk3S54q6gcMbjZB82Xc1_ZCgA6kYsUmvk"
},
"_type": "_doc"
},
{
"_id": "ivotEFcqOlCL5htJZJ43NslAP555DaPj0Dgkcay_Ml2jAT4",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:52.143988883Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotEFcqOlCL5htJZJ43NslAP555DaPj0Dgkcay_Ml2jAT4"
},
"_type": "_doc"
},
{
"_id": "ivotToZ-0iBiM_zF5TP1Shj5C29WV3U0ibedlxvcQccimeo",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:50.178450007Z",
"type": 1,
"user_id": "iusrKDxnm75fADEpusbmx5JM",
"vote_id": "ivotToZ-0iBiM_zF5TP1Shj5C29WV3U0ibedlxvcQccimeo"
},
"_type": "_doc"
},
{
"_id": "ivotAHBPual232E12ggibhr6GfQ5E3f9Ryov0gYKGrIRB0Y",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:52.886305925Z",
"type": 1,
"user_id": "iusrJG4GwkWa6Y70LPkuNCPg",
"vote_id": "ivotAHBPual232E12ggibhr6GfQ5E3f9Ryov0gYKGrIRB0Y"
},
"_type": "_doc"
},
{
"_id": "ivot7a8rWunlFu_q5St44PYDeNelLq4bsxr9wzYP9D80wxE",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:49.031694548Z",
"type": 1,
"user_id": "iusrxunBXT1UD0IrvjqjgWaj",
"vote_id": "ivot7a8rWunlFu_q5St44PYDeNelLq4bsxr9wzYP9D80wxE"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 5
}
}
},
"key": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8"
},
{
"doc_count": 4,
"docs": {
"hits": {
"hits": [
{
"_id": "ivot9_2eQ_3eqU7SXQBnLWGQwFI5DE99Naf8wYbFNFrj1lk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:55.761587927Z",
"type": 1,
"user_id": "iusr8keSbPjX9ZqFhX4Dei4G",
"vote_id": "ivot9_2eQ_3eqU7SXQBnLWGQwFI5DE99Naf8wYbFNFrj1lk"
},
"_type": "_doc"
},
{
"_id": "ivotUZRVSKzGbmlP4LlmBkMwMM8xcR4nGTE9KNpysVR0vXQ",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:52.555377592Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotUZRVSKzGbmlP4LlmBkMwMM8xcR4nGTE9KNpysVR0vXQ"
},
"_type": "_doc"
},
{
"_id": "ivot5Wj8pIkbO0JOV_5s2PqEvZU3sy0WSYYUSlgs2Qizfo8",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:49.756332674Z",
"type": 1,
"user_id": "iusrKDxnm75fADEpusbmx5JM",
"vote_id": "ivot5Wj8pIkbO0JOV_5s2PqEvZU3sy0WSYYUSlgs2Qizfo8"
},
"_type": "_doc"
},
{
"_id": "ivot8QNCJGsNtRiZYa-QMTUHEh5MHHHr4EKJsXm4UTAwJkg",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:53.26319105Z",
"type": 1,
"user_id": "iusrJG4GwkWa6Y70LPkuNCPg",
"vote_id": "ivot8QNCJGsNtRiZYa-QMTUHEh5MHHHr4EKJsXm4UTAwJkg"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 4
}
}
},
"key": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6"
},
{
"doc_count": 3,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotoCfJc_q5vuY27KmvZUo8s4tilI57_xJoPXqfSeJTikg",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:50.527352591Z",
"type": 1,
"user_id": "iusrL8FCabxg1YCeaakcVXG5",
"vote_id": "ivotoCfJc_q5vuY27KmvZUo8s4tilI57_xJoPXqfSeJTikg"
},
"_type": "_doc"
},
{
"_id": "ivotStjKDIRy6vfaO5dNws4wGPELywPRgg7D7uSavatfIEo",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:51.733375716Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotStjKDIRy6vfaO5dNws4wGPELywPRgg7D7uSavatfIEo"
},
"_type": "_doc"
},
{
"_id": "ivotUHj_Ebh-xIqqPJNEdWAuc_JO_mcVVG8F9wM67bJ7_6A",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:48.60900159Z",
"type": 1,
"user_id": "iusrxunBXT1UD0IrvjqjgWaj",
"vote_id": "ivotUHj_Ebh-xIqqPJNEdWAuc_JO_mcVVG8F9wM67bJ7_6A"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 3
}
}
},
"key": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc"
},
{
"doc_count": 3,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotE5xO9hZGLhrS2sL1mgf5UbcHtf_5qAwbrp3QwEtf4zk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:54.99023451Z",
"type": 1,
"user_id": "iusre80pxIMFB XfF61SHlCiz",
"vote_id": "ivotE5xO9hZGLhrS2sL1mgf5UbcHtf_5qAwbrp3QwEtf4zk"
},
"_type": "_doc"
},
{
"_id": "ivotI0OCI6gz6oEV94hgvjZmGB-qA4n-EigirDRJpgeZt68",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:50.952366924Z",
"type": 1,
"user_id": "iusr3oWy3mxsBWu6CU4mlw5L",
"vote_id": "ivotI0OCI6gz6oEV94hgvjZmGB-qA4n-EigirDRJpgeZt68"
},
"_type": "_doc"
},
{
"_id": "ivotm7GrIeyWRHamPXF9klzTZ0La8H4evCgWkCTIpx8rLl4",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:48.234881506Z",
"type": 1,
"user_id": "iusrCbCltg4nzv0b2JfUbyhj",
"vote_id": "ivotm7GrIeyWRHamPXF9klzTZ0La8H4evCgWkCTIpx8rLl4"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 3
}
}
},
"key": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W"
},
{
"doc_count": 2,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotGWwrgPehs9s7ZwZACzkVNp4-_SUaUu3noUKyBH8IBnw",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And",
"timestamp": "2022-07-08T19:41:53.785333717Z",
"type": 1,
"user_id": "iusrYvNFTaTg4RBNBxG63nkY",
"vote_id": "ivotGWwrgPehs9s7ZwZACzkVNp4-_SUaUu3noUKyBH8IBnw"
},
"_type": "_doc"
},
{
"_id": "ivotrWG5cy6vEbe0N4JO4IKzHZyahOlkyPctCdrBnBu-v9Q",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And",
"timestamp": "2022-07-08T19:41:51.303745591Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotrWG5cy6vEbe0N4JO4IKzHZyahOlkyPctCdrBnBu-v9Q"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 2
}
}
},
"key": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And"
}
],
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0
}
}
To restate: I want to keep only the aggregation buckets whose top hits contain a document with my own user id.
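One possible direction (a sketch, not from the original post): add a filter sub-aggregation that counts your own votes in each song bucket and gate the bucket_selector on it. Here "my-user-id" is a placeholder for the caller's own id; note that user_id is mapped as text in the mapping above, so the term filter may need a keyword sub-field or the analyzed (lowercased) form of the id.
{
  "query": {
    "bool": {
      "must": {
        "terms": { "user_id": ["you and your friend ids"] }
      }
    }
  },
  "aggs": {
    "songs": {
      "terms": { "field": "song_id" },
      "aggs": {
        "my_votes": {
          "filter": { "term": { "user_id": "my-user-id" } }
        },
        "liked_by_me_too": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count",
              "mine": "my_votes._count"
            },
            "script": "params.count > 1 && params.mine > 0"
          }
        }
      }
    }
  }
}
Buckets where my_votes counts zero of your own documents would then be dropped along with the single-vote buckets.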

Edge n-gram suggestions and 'starts with' keyword in Elasticsearch

I'm trying to build a food search engine on Elasticsearch that should meet the following use cases:
If the user searches for 'coff' then it should return all the documents with the phrase 'coffee' in their name, giving priority to food items that have 'coffee' at the start of their name.
If the user searches for 'green tea' then it should give priority to documents that contain the whole phrase 'green tea' rather than 'green' and 'tea' separately.
If the phrase does not exist in the name field then it should also search in the alias field.
To manage the first case, I've used the edge n-grams analyzer.
Mapping -
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"analyzer_keyword": {
"tokenizer": "standard",
"filter": "lowercase"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"alias": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"search_analyzer": "analyzer_keyword",
"analyzer": "edge_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
This is the search query I'm using, but it's not returning the relevant search results:
{
"query": {
"multi_match": {
"query": "coffee",
"fields": ["name^2", "alias"]
}
}
}
There are over 1500 food items with 'coffee' in their name, but the above query only returns 2:
{
"took": 745,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 8.657346,
"hits": [
{
"_index": "food-master",
"_type": "doc",
"_id": "a9uzinABb4g7LgmgoI1I",
"_score": 8.657346,
"_source": {
"id": 17463,
"name": "Rotiboy, coffee bun",
"alias": [
"Mexican Coffee Bun (Rotiboy)",
"Mexican coffee bun"
]
}
},
{
"_index": "food-master",
"_type": "doc",
"_id": "TNuzinABb4g7LgmgoFVI",
"_score": 7.0164866,
"_source": {
"id": 1344,
"name": "Coffee with sugar",
"alias": [
"Heart Friendly",
"Coffee With Sugar",
"Coffee With Milk and Sugar",
"Gluten Free",
"Hypertension Friendly"
]
}
}
]
}
}
In the mapping, if I remove analyzer_keyword then it returns relevant results, but the documents that start with 'coffee' are not prioritized:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1323,
"max_score": 57.561867,
"hits": [
{
"_index": "food-master-new",
"_type": "doc",
"_id": "nduzinABb4g7LgmgoINI",
"_score": 57.561867,
"_source": {
"name": "Egg Coffee",
"alias": [],
"id": 12609
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "dNuzinABb4g7LgmgoFVI",
"_score": 55.811295,
"_source": {
"name": "Coffee (Black)",
"alias": [
"Weight Loss",
"Diabetes Friendly",
"Gluten Free",
"Lactose Free",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 1341
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "NduzinABb4g7LgmgoHxI",
"_score": 54.303185,
"_source": {
"name": "Brewed Coffee",
"alias": [
"StarBucks"
],
"id": 15679
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ltuzinABb4g7LgmgoJJI",
"_score": 54.303185,
"_source": {
"name": "Coffee - Masala",
"alias": [],
"id": 11329
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "oduzinABb4g7LgmgoGpI",
"_score": 53.171227,
"_source": {
"name": "Coffee, German",
"alias": [],
"id": 12257
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "YNuzinABb4g7LgmgoFRI",
"_score": 52.929176,
"_source": {
"name": "Soy Milk Coffee",
"alias": [
"Gluten Free",
"Lactose Free",
"Weight Loss",
"Diabetes Friendly",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 978
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "8duzinABb4g7LgmgoFRI",
"_score": 52.068523,
"_source": {
"name": "Cold Coffee (Soy Milk)",
"alias": [
"Soy Milk"
],
"id": 1097
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "tNuzinABb4g7LgmgoF9I",
"_score": 50.956154,
"_source": {
"name": "Coffee Frappe",
"alias": [],
"id": 3142
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ZduzinABb4g7LgmgoF5I",
"_score": 49.810112,
"_source": {
"name": "Big Apple Coffee",
"alias": [],
"id": 3130
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "eduzinABb4g7LgmgoHtI",
"_score": 49.62197,
"_source": {
"name": "Mexican Coffee",
"alias": [],
"id": 13604
}
}
]
}
}
If I change the tokenizer from 'standard' to 'keyword' then I face the same problem, and it also splits phrases into individual words ('green tea' into 'green' and 'tea').
Any suggestions on what I might be getting wrong with respect to analyzers? I've tried all possible combinations, but meeting all three scenarios with high accuracy is proving difficult.
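One detail worth checking (an observation, not from the original post): the edge_ngram_tokenizer above is capped at max_gram 5, so the full token 'coffee' (6 letters) is never indexed for the name field, while the search analyzer emits the whole word 'coffee'; that alone would explain why only documents matching through alias come back. The _analyze endpoint makes this visible (assuming food-master is the index created with the settings above):
POST /food-master/_analyze
{
  "analyzer": "edge_ngram_analyzer",
  "text": "coffee"
}
If the emitted tokens stop at 'coffe' (co, cof, coff, coffe), raising max_gram, or additionally matching on a sub-field analyzed with the standard analyzer, is one way to cover the first use case without losing the others.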

Elasticsearch works for one term in must, but not two

I am trying to query with Elasticsearch to find documents with 2 matching conditions:
Here's the mapping in use:
{
"mappings": {
"stores": {
"properties": {
"locality": {
"type": "text"
},
"city": {
"type": "text"
},
"type": {
"type": "integer"
}
}
}
}
}
And here's my filter:
{
"query": {
"constant_score": {
"filter": {
"bool" : {
"must" : [
{
"term" : { "locality": "Shivajinagar" }
}, {
"term" : { "city": "Bangalore" }
}
]
}
}
}
}
}
No matter what values I try I always get:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Even though data exists (all-documents search):
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 10742,
"max_score": 1.0,
"hits": [
{
"_index": "test_es",
"_type": "stores",
"_id": "942",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Palam Vihar",
"city": "Gurgaon"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "944",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Chirag Dilli",
"city": "Delhi"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "948",
"_score": 1.0,
"_source": {
"type": 1,
"locality": "Vashi",
"city": "Navi Mumbai"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "980",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Sector 48",
"city": "Faridabad"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "982",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Kammanahalli",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "984",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Tilak Nagar",
"city": "Delhi"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "742",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Shivajinagar",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "752",
"_score": 1.0,
"_source": {
"type": 1,
"locality": "DLF Phase 3",
"city": "Gurgaon"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "754",
"_score": 1.0,
"_source": {
"type": 3,
"locality": "Electronic City",
"city": "Bangalore"
}
},
{
"_index": "test_es",
"_type": "stores",
"_id": "778",
"_score": 1.0,
"_source": {
"type": 2,
"locality": "Bandra East",
"city": "Mumbai"
}
}
]
}
}
I tried using query instead of filter, even though I don't really care about scores, but nada!
Where might I be going wrong?!
Short Answer: Use match instead of term.
Long Answer:
The important thing to know here is that your search keywords, like { "locality": "Shivajinagar" } and { "city": "Bangalore" }, need to be compared in the same form in which they were stored.
In the question, the mapping specifies that the "locality" and "city" fields are of type: text. According to the documentation, type: text fields are analyzed by the standard analyzer by default.
The default standard analyzer drops most punctuation, breaks up text
into individual words, and lower cases them. For instance, the
standard analyzer would turn the string “Quick Brown Fox!” into the
terms [quick, brown, fox]. This analysis process makes it possible to
search for individual words within a big block of full text.
The term query looks for the exact term in the field’s inverted
index — it doesn’t know anything about the field’s analyzer. This
makes it useful for looking up values in keyword fields, or in numeric
or date fields. When querying full text fields, use the match query
instead, which understands how the field has been analyzed.
So, when you search for "Bangalore" with a term query, it looks for the exact term "Bangalore" in the city field's inverted index, while the standard analyzer stored it as "bangalore". That's why you get no matches.
You can find the documentation regarding the exact question here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
Side Tip: Use the _analyze endpoint to check exactly what a particular analyzer emits on passing the input text.
Documentation for _analyze endpoint: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
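For illustration, the same filter rewritten with match queries (a sketch based on the query in the question; match analyzes the search input with the same analyzer as the text fields, so "Bangalore" becomes "bangalore" before it is compared):
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "locality": "Shivajinagar" } },
            { "match": { "city": "Bangalore" } }
          ]
        }
      }
    }
  }
}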

Accurate query confusion on Elasticsearch

Here I simulated a batch of mock data:
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 1,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 1,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7C",
"_score": 1,
"_source": {
"operation_name": "BAT_RMV_EPSDATA",
"hlrsn": "50",
"user_name": "boss3",
"business_type": "OVERHEAD",
"task_id": "a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460050840482507",
"msisdn": "8618178395664",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871 |{"HLRSN":"50","operationName":"BAT_RMV_EPSDATA","ISDN":"8618178395664"}"""
}
},
...
I want to query data according to a specific task_id :
GET /boss-mock/soap-mock/_search
{
"query": {
"match": {
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
}
}
response:
{
"took": 66,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 68.65554,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 68.65554,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7K",
"_score": 20.13632,
"_source": {
"operation_name": "ADD_TPLSUB",
"hlrsn": "53",
"user_name": "boss1",
"business_type": "OVERHEAD",
"task_id": "a-931b0935-a0d4-46fa-b403-7c1075a1d7a7#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "8618509192307",
"content": """2017-11-06 05:39:27,871|User:boss1| id:a-931b0935-a0d4-46fa-b403-7c1075a1d7a7#1509946767871 |{"HLRSN":"53","operationName":"ADD_TPLSUB","ISDN":"8618509192307"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P06bepN_SAQa2S9uQ",
"_score": 17.619738,
"_source": {
"operation_name": "DEA_BOICEXHC",
"hlrsn": "52",
"user_name": "boss3",
"business_type": "VOICE",
"task_id": "a-cc771389-8712-46fa-8f9b-0e64e4fc38e6#1509946485051",
"response_time": "2017-11-06T05:34:45.051Z",
"imsi": "",
"msisdn": "8618914540349",
"content": """2017-11-06 05:34:45,051|User:boss3| id:a-cc771389-8712-46fa-8f9b-0e64e4fc38e6#1509946485051 |{"HLRSN":"52","operationName":"DEA_BOICEXHC","ISDN":"8618914540349"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15kQpN_SAQa2UP6w",
"_score": 12.451507,
"_source": {
"operation_name": "LST_STNSR",
"hlrsn": "51",
"user_name": "boss1",
"business_type": "",
"task_id": "a-30e82392-8817-48ed-8c3d-f4aee6e6c61d#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "8618871203019",
"content": """2017-11-06 05:39:27,871|User:boss1| id:a-30e82392-8817-48ed-8c3d-f4aee6e6c61d#1509946767871 |{"HLRSN":"51","operationName":"LST_STNSR","ISDN":"8618871203019"}"""
}
...
It seems that ES returned all the data, but only the first hit is what I queried for.
Then I tried a `term` query:
GET /boss-mock/soap-mock/_search
{
"query": {
"term": {
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
}
}
But I get nothing:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
However, it works for other fields which have 'shorter' values, such as msisdn:
GET /boss-mock/soap-mock/_search
{
"query": {
"term": {
"msisdn": "8618882291205"
}
}
}
response:
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 1,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 1,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7C",
"_score": 1,
"_source": {
"operation_name": "BAT_RMV_EPSDATA",
"hlrsn": "50",
"user_name": "boss3",
"business_type": "OVERHEAD",
"task_id": "a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460050840482507",
"msisdn": "8618178395664",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871 |{"HLRSN":"50","operationName":"BAT_RMV_EPSDATA","ISDN":"8618178395664"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7J",
"_score": 1,
"_source": {
"operation_name": "MOD_EPS_CONTEXT",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "LTE",
"task_id": "a-b0bed660-3fca-4201-a90c-e4103f6289c5#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460039208697055",
"msisdn": "8618275883802",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-b0bed660-3fca-4201-a90c-e4103f6289c5#1509946767871 |{"HLRSN":"52","operationName":"MOD_EPS_CONTEXT","ISDN":"8618275883802"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7L",
"_score": 1,
"_source": {
"operation_name": "DEA_BAIC",
"hlrsn": "53",
"user_name": "boss3",
"business_type": "VOICE",
"task_id": "a-c5cc2332-9d81-476c-ad0a-0809c23cfe49#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-c5cc2332-9d81-476c-ad0a-0809c23cfe49#1509946767871 |{"HLRSN":"53","operationName":"DEA_BAIC","ISDN":"8618886204829"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7O",
"_score": 1,
"_source": {
"operation_name": "LST_SIFC",
"hlrsn": "51",
"user_name": "boss3",
"business_type": "",
"task_id": "a-b0f2a526-8757-4b2c-9011-674cc714fedc#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-b0f2a526-8757-4b2c-9011-674cc714fedc#1509946767871 |{"HLRSN":"51","operationName":"LST_SIFC","ISDN":"8618258093284"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7R",
"_score": 1,
"_source": {
"operation_name": "LST_COLR",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "",
"task_id": "a-348463b7-eb49-45e2-bffb-1068e706802b#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "8618557891401",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-348463b7-eb49-45e2-bffb-1068e706802b#1509946767872 |{"HLRSN":"52","operationName":"LST_COLR","ISDN":"8618557891401"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7W",
"_score": 1,
"_source": {
"operation_name": "BAT_ADD_TPLSUB",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "OVERHEAD",
"task_id": "a-db3748af-0359-40d3-b5fd-eb09cc53ba56#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "460017353100210",
"msisdn": "8618219821848",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-db3748af-0359-40d3-b5fd-eb09cc53ba56#1509946767872 |{"HLRSN":"52","operationName":"BAT_ADD_TPLSUB","ISDN":"8618219821848"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7d",
"_score": 1,
"_source": {
"operation_name": "ACT_BAOC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VOICE",
"task_id": "a-80d105e7-138f-4c48-99df-e1b6ea404f43#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-80d105e7-138f-4c48-99df-e1b6ea404f43#1509946767872 |{"HLRSN":"51","operationName":"ACT_BAOC","ISDN":"8618881023802"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7f",
"_score": 1,
"_source": {
"operation_name": "SND_CANCELC",
"hlrsn": "53",
"user_name": "boss1",
"business_type": "LOCATION",
"task_id": "a-1a26292d-0f6d-416b-ab3b-47b0c888843f#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "8618571785343",
"content": """2017-11-06 05:39:27,872|User:boss1| id:a-1a26292d-0f6d-416b-ab3b-47b0c888843f#1509946767872 |{"HLRSN":"53","operationName":"SND_CANCELC","ISDN":"8618571785343"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7g",
"_score": 1,
"_source": {
"operation_name": "MOD_MEDIAID",
"hlrsn": "53",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-8d2b037b-d346-4b89-9ab7-8f828b1bb783#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-8d2b037b-d346-4b89-9ab7-8f828b1bb783#1509946767872 |{"HLRSN":"53","operationName":"MOD_MEDIAID","ISDN":"8618458567583"}"""
}
}
]
}
}
So, what's going on here? Can't I just query on task_id?
By the way, I have a SQL background.
I need to query data like:
select * from table where task_id = ?
mapping:
```
{
"boss-mock": {
"mappings": {
"soap-mock": {
"properties": {
"business_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hlrsn": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"imsi": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"msisdn": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"operation_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"response_time": {
"type": "date"
},
"task_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
```
It's happening because you're executing an analyzed query against an analyzed field. Let me briefly explain what I mean. Each field of type text is analyzed and stored as a set of tokens for the sake of full-text search functionality.
You can check how the analyzer will process your data by sending:
POST http://localhost:9200/_analyze HTTP/1.1
Content-type: application/json
{
"tokenizer": "standard",
"text": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
The response will show that ES would index the following list of tokens:
["a", "ec0fe200", "6219", "46fa", "8f9b", "d23d3fc367a0", "1509946767871"]
The same happens with your match query, which is itself analyzed, so in fact you're querying
["a","ec0fe200","6219","46fa","8f9b","d23d3fc367a0","1509946767871"]
against
["a","ec0fe200","6219","46fa","8f9b","d23d3fc367a0","1509946767871"]
["a","931b0935","a0d4","46fa","b403","7c1075a1d7a7","1509946767871"]
["a","cc771389","8712","46fa","8f9b","0e64e4fc38e6","1509946485051"]
["a","30e82392","8817","48ed","8c3d","f4aee6e6c61d","1509946767871"]
What's more, the match query's default operator is OR, so you'll get a result if at least one token from your query matches the indexed ones ("46fa", "1509946767871", ...).
Then you tried a term query, which is not analyzed, so the problem is the opposite: you were trying to match
"a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
as one string against the same lists of tokens. That's why you get empty results.
So the short answer is: if you want something similar to SQL's WHERE, you shouldn't use analyzed fields for these properties, or you should maintain both analyzed and non-analyzed versions of them.
To solve this you can drop your index, define the mapping statically as below, index your data again, and then use a term query to match entire strings. The keyword data type is the most relevant one here; you can read more in the keyword datatype documentation.
PUT http://localhost:9200/boss_mock
Content-type: application/json
{
"mappings": {
"soap-mock": {
"properties": {
"task_id": {
"type": "keyword"
},
"msisdn": {
"type": "keyword"
}
//whatever else you need
}
}
}
}
Please note that you don't have to define the mapping statically for all your properties (the others will still be added dynamically as text).
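As a side note (based on the dynamic mapping shown in the question, not part of the original answer): the dynamic mapping already added a keyword sub-field with ignore_above: 256 to task_id, and the task_id values are far shorter than 256 characters, so an exact match also works without reindexing by pointing the term query at task_id.keyword:
GET /boss-mock/soap-mock/_search
{
  "query": {
    "term": {
      "task_id.keyword": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
    }
  }
}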

How to sort by match prioritising the most left words matched

Explanation
Sort the prefix query results by the matched word, but prioritise matches in words further to the left.
Tests I've made
Data
DELETE /test
PUT /test
PUT /test/person/_mapping
{
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {"type": "string"},
"original": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
PUT /test/person/1
{"name": "Berta Kassulke"}
PUT /test/person/2
{"name": "Kaley Bartoletti"}
PUT /test/person/3
{"name": "Kali Hahn"}
PUT /test/person/4
{"name": "Karolann Klein"}
PUT /test/person/5
{"name": "Sofia Mandez Kaloo"}
The mapping was added for the 'sort on original value' test.
Simple query
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": 1,
"_source": {
"name": "Karolann Klein"
}
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": 1,
"_source": {
"name": "Sofia Mandez Kaloo"
}
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": 1,
"_source": {
"name": "Berta Kassulke"
}
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": 1,
"_source": {
"name": "Kaley Bartoletti"
}
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": 1,
"_source": {
"name": "Kali Hahn"
}
}
]
}
}
With sorting
Request
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name": {"order": "asc"}}
}
Result
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"berta"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"kaloo"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"karolann"
]
}
]
}
}
With sort on original value
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name.original": {"order": "asc"}}
}
Result
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"Berta Kassulke"
]
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"Kaley Bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"Kali Hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"Karolann Klein"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"Sofia Mandez Kaloo"
]
}
]
}
}
Intended result
Sorted by name ASC, but prioritising matches on the leftmost words:
Kaley Bartoletti
Kali Hahn
Karolann Klein
Berta Kassulke
Sofia Mandez Kaloo
Good question. One way to achieve this would be a combination of the edge n-gram filter and the span_first query.
These are my settings:
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase",
"edge_filter",
"asciifolding"
]
}
},
"filter": {
"edge_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"person": {
"properties": {
"name": {
"type": "string",
"analyzer": "my_custom_analyzer",
"search_analyzer": "standard",
"fields": {
"standard": {
"type": "string"
}
}
}
}
}
}
}
After that I inserted your sample documents. Then I wrote the following query with dis_max. Notice that the end parameter of the first span_first query is 1, so this prioritizes (scores higher) a match on the leftmost word. I sort first by score and then by name.
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"name": "ka"
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 1
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 2
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"name.standard": {
"order": "asc"
}
}
]
}
The result I get
"hits": [
{
"_index": "esedge",
"_type": "policy_data",
"_id": "2",
"_score": 0.72272325,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
0.72272325,
"bartoletti"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "3",
"_score": 0.72272325,
"_source": {
"name": "Kali Hahn"
},
"sort": [
0.72272325,
"hahn"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "4",
"_score": 0.72272325,
"_source": {
"name": "Karolann Klein"
},
"sort": [
0.72272325,
"karolann"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "1",
"_score": 0.54295504,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
0.54295504,
"berta"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "5",
"_score": 0.2905494,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
0.2905494,
"kaloo"
]
}
]
I hope this helps.
