Here is a batch of mock data I simulated:
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 1,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 1,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7C",
"_score": 1,
"_source": {
"operation_name": "BAT_RMV_EPSDATA",
"hlrsn": "50",
"user_name": "boss3",
"business_type": "OVERHEAD",
"task_id": "a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460050840482507",
"msisdn": "8618178395664",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871 |{"HLRSN":"50","operationName":"BAT_RMV_EPSDATA","ISDN":"8618178395664"}"""
}
},
...
I want to query data by a specific task_id:
GET /boss-mock/soap-mock/_search
{
"query": {
"match": {
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
}
}
response:
{
"took": 66,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 68.65554,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 68.65554,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7K",
"_score": 20.13632,
"_source": {
"operation_name": "ADD_TPLSUB",
"hlrsn": "53",
"user_name": "boss1",
"business_type": "OVERHEAD",
"task_id": "a-931b0935-a0d4-46fa-b403-7c1075a1d7a7#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "8618509192307",
"content": """2017-11-06 05:39:27,871|User:boss1| id:a-931b0935-a0d4-46fa-b403-7c1075a1d7a7#1509946767871 |{"HLRSN":"53","operationName":"ADD_TPLSUB","ISDN":"8618509192307"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P06bepN_SAQa2S9uQ",
"_score": 17.619738,
"_source": {
"operation_name": "DEA_BOICEXHC",
"hlrsn": "52",
"user_name": "boss3",
"business_type": "VOICE",
"task_id": "a-cc771389-8712-46fa-8f9b-0e64e4fc38e6#1509946485051",
"response_time": "2017-11-06T05:34:45.051Z",
"imsi": "",
"msisdn": "8618914540349",
"content": """2017-11-06 05:34:45,051|User:boss3| id:a-cc771389-8712-46fa-8f9b-0e64e4fc38e6#1509946485051 |{"HLRSN":"52","operationName":"DEA_BOICEXHC","ISDN":"8618914540349"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15kQpN_SAQa2UP6w",
"_score": 12.451507,
"_source": {
"operation_name": "LST_STNSR",
"hlrsn": "51",
"user_name": "boss1",
"business_type": "",
"task_id": "a-30e82392-8817-48ed-8c3d-f4aee6e6c61d#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "8618871203019",
"content": """2017-11-06 05:39:27,871|User:boss1| id:a-30e82392-8817-48ed-8c3d-f4aee6e6c61d#1509946767871 |{"HLRSN":"51","operationName":"LST_STNSR","ISDN":"8618871203019"}"""
}
...
It seems that ES returned all the data, but only the first hit is what I queried for.
Then I tried a `term` query:
GET /boss-mock/soap-mock/_search
{
"query": {
"term": {
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
}
}
But I got nothing:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
However, it works for other fields with 'shorter' values, such as msisdn:
GET /boss-mock/soap-mock/_search
{
"query": {
"term": {
"msisdn": "8618882291205"
}
}
}
response:
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 9000009,
"max_score": 1,
"hits": [
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7B",
"_score": 1,
"_source": {
"operation_name": "ADD_IFC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460082279570892",
"msisdn": "8618882291205",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871 |{"HLRSN":"51","operationName":"ADD_IFC","ISDN":"8618882291205"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7C",
"_score": 1,
"_source": {
"operation_name": "BAT_RMV_EPSDATA",
"hlrsn": "50",
"user_name": "boss3",
"business_type": "OVERHEAD",
"task_id": "a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460050840482507",
"msisdn": "8618178395664",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-6dbf64ee-81e9-4ef4-8b05-664a7fc3f47b#1509946767871 |{"HLRSN":"50","operationName":"BAT_RMV_EPSDATA","ISDN":"8618178395664"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7J",
"_score": 1,
"_source": {
"operation_name": "MOD_EPS_CONTEXT",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "LTE",
"task_id": "a-b0bed660-3fca-4201-a90c-e4103f6289c5#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "460039208697055",
"msisdn": "8618275883802",
"content": """2017-11-06 05:39:27,871|User:boss2| id:a-b0bed660-3fca-4201-a90c-e4103f6289c5#1509946767871 |{"HLRSN":"52","operationName":"MOD_EPS_CONTEXT","ISDN":"8618275883802"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7L",
"_score": 1,
"_source": {
"operation_name": "DEA_BAIC",
"hlrsn": "53",
"user_name": "boss3",
"business_type": "VOICE",
"task_id": "a-c5cc2332-9d81-476c-ad0a-0809c23cfe49#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-c5cc2332-9d81-476c-ad0a-0809c23cfe49#1509946767871 |{"HLRSN":"53","operationName":"DEA_BAIC","ISDN":"8618886204829"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7O",
"_score": 1,
"_source": {
"operation_name": "LST_SIFC",
"hlrsn": "51",
"user_name": "boss3",
"business_type": "",
"task_id": "a-b0f2a526-8757-4b2c-9011-674cc714fedc#1509946767871",
"response_time": "2017-11-06T05:39:27.871Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,871|User:boss3| id:a-b0f2a526-8757-4b2c-9011-674cc714fedc#1509946767871 |{"HLRSN":"51","operationName":"LST_SIFC","ISDN":"8618258093284"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7R",
"_score": 1,
"_source": {
"operation_name": "LST_COLR",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "",
"task_id": "a-348463b7-eb49-45e2-bffb-1068e706802b#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "8618557891401",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-348463b7-eb49-45e2-bffb-1068e706802b#1509946767872 |{"HLRSN":"52","operationName":"LST_COLR","ISDN":"8618557891401"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7W",
"_score": 1,
"_source": {
"operation_name": "BAT_ADD_TPLSUB",
"hlrsn": "52",
"user_name": "boss2",
"business_type": "OVERHEAD",
"task_id": "a-db3748af-0359-40d3-b5fd-eb09cc53ba56#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "460017353100210",
"msisdn": "8618219821848",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-db3748af-0359-40d3-b5fd-eb09cc53ba56#1509946767872 |{"HLRSN":"52","operationName":"BAT_ADD_TPLSUB","ISDN":"8618219821848"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7d",
"_score": 1,
"_source": {
"operation_name": "ACT_BAOC",
"hlrsn": "51",
"user_name": "boss2",
"business_type": "VOICE",
"task_id": "a-80d105e7-138f-4c48-99df-e1b6ea404f43#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-80d105e7-138f-4c48-99df-e1b6ea404f43#1509946767872 |{"HLRSN":"51","operationName":"ACT_BAOC","ISDN":"8618881023802"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7f",
"_score": 1,
"_source": {
"operation_name": "SND_CANCELC",
"hlrsn": "53",
"user_name": "boss1",
"business_type": "LOCATION",
"task_id": "a-1a26292d-0f6d-416b-ab3b-47b0c888843f#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "8618571785343",
"content": """2017-11-06 05:39:27,872|User:boss1| id:a-1a26292d-0f6d-416b-ab3b-47b0c888843f#1509946767872 |{"HLRSN":"53","operationName":"SND_CANCELC","ISDN":"8618571785343"}"""
}
},
{
"_index": "boss-mock",
"_type": "soap-mock",
"_id": "AV-P15lDpN_SAQa2UP7g",
"_score": 1,
"_source": {
"operation_name": "MOD_MEDIAID",
"hlrsn": "53",
"user_name": "boss2",
"business_type": "VoLTE",
"task_id": "a-8d2b037b-d346-4b89-9ab7-8f828b1bb783#1509946767872",
"response_time": "2017-11-06T05:39:27.872Z",
"imsi": "",
"msisdn": "",
"content": """2017-11-06 05:39:27,872|User:boss2| id:a-8d2b037b-d346-4b89-9ab7-8f828b1bb783#1509946767872 |{"HLRSN":"53","operationName":"MOD_MEDIAID","ISDN":"8618458567583"}"""
}
}
]
}
}
So, what's going on here? Can't I just query on task_id?
By the way, I have a SQL background.
I need to query data like:
select * from table where task_id = ?
mapping:
```
{
"boss-mock": {
"mappings": {
"soap-mock": {
"properties": {
"business_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hlrsn": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"imsi": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"msisdn": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"operation_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"response_time": {
"type": "date"
},
"task_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"user_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
```
This happens because you're executing an analyzed query against an analyzed field. Let me briefly explain what I mean. Each field of type text is analyzed and stored as a set of tokens for the sake of full-text search.
You can check how the analyzer will process your data by sending:
POST http://localhost:9200/_analyze HTTP/1.1
Content-type: application/json
{
"tokenizer": "standard",
"text": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
}
The response will inform you that ES would index the following list of tokens:
["a", "ec0fe200", "6219", "46fa", "8f9b", "d23d3fc367a0", "1509946767871"]
The same happens with your match query, which is analyzed, so in fact you're querying
["a","ec0fe200","6219","46fa","8f9b","d23d3fc367a0","1509946767871"]
against
["a","ec0fe200","6219","46fa","8f9b","d23d3fc367a0","1509946767871"]
["a","931b0935","a0d4","46fa","b403","7c1075a1d7a7","1509946767871"]
["a","cc771389","8712","46fa","8f9b","0e64e4fc38e6","1509946485051"]
["a","30e82392","8817","48ed","8c3d","f4aee6e6c61d","1509946767871"]
What is more, the match query's default operator is OR, so you'll get a result if at least one token from your query matches the indexed ones ("46fa", "1509946767871", ...).
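For example, to require every token to match instead of any single one, the match query accepts an operator parameter (still analyzed, so this matches any document sharing all the tokens, not the exact string):
GET /boss-mock/soap-mock/_search
{
  "query": {
    "match": {
      "task_id": {
        "query": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871",
        "operator": "and"
      }
    }
  }
}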
Then you tried a term query, which is not analyzed, so the problem is the opposite. You were trying to match
"a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
as a single string against the same lists of tokens. That's why you got empty results.
So the short answer is: if you want something similar to SQL's WHERE on exact values, you shouldn't use analyzed fields for these properties, or you should maintain both analyzed and non-analyzed versions of them.
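In fact, the mapping you posted already keeps a non-analyzed keyword sub-field for every text property (with ignore_above: 256), so a term query against task_id.keyword should match the entire string without any reindexing:
GET /boss-mock/soap-mock/_search
{
  "query": {
    "term": {
      "task_id.keyword": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
    }
  }
}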
Alternatively, you can drop your index, define the mapping statically as below, reindex your data, and then use a term query to match entire strings. The keyword data type is the most relevant one here.
PUT http://localhost:9200/boss_mock
Content-type: application/json
{
"mappings": {
"soap-mock": {
"properties": {
"task_id": {
"type": "keyword"
},
"msisdn": {
"type": "keyword"
}
//whatever else you need
}
}
}
}
Please note that you don't have to define the mapping statically for all your properties (the others will be added dynamically as text).
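With that mapping in place, the SQL-style lookup from the question becomes a plain term query against the new index:
GET /boss_mock/soap-mock/_search
{
  "query": {
    "term": {
      "task_id": "a-ec0fe200-6219-46fa-8f9b-d23d3fc367a0#1509946767871"
    }
  }
}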
Related
The mapping of the Elasticsearch documents is as follows:
{
"mappings": {
"properties": {
"vote_id": {
"type": "keyword"
},
"user_id": {
"type": "text"
},
"song_id": {
"type": "text"
},
"type": {
"type": "byte"
},
"timestamp": {
"type": "date"
}
}
}
}
I want to aggregate these votes such that it returns songs that you AND your friends like.
So far, I have buckets of songs that you and your friends like, but some buckets may be songs that only your friends like.
{
"query": {
"bool": {
"must": {
"terms": {
"user_id": ["you and your friend ids"]
}
}
}
},
"aggs": {
"songs": {
"terms": {
"field": "song_id"
},
"aggs": {
"docs": {
"top_hits": {
"size": "length of you and your friends",
"_source": ["vote_id", "song_id", "user_id", "type", "timestamp"]
}
},
"more_than_one": {
"bucket_selector": {
"buckets_path": {
"count": "_count"
},
"script": "params.count > 1"
}
}
}
}
}
}
I want to filter the buckets such that at least one of the documents in the top hits has your user id.
This is the current response
"aggregations": {
"songs": {
"buckets": [
{
"doc_count": 5,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotNtFBb9TCEfpk3S54q6gcMbjZB82Xc1_ZCgA6kYsUmvk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:56.118207343Z",
"type": 1,
"user_id": "iusr8keSbPjX9ZqFhX4Dei4G",
"vote_id": "ivotNtFBb9TCEfpk3S54q6gcMbjZB82Xc1_ZCgA6kYsUmvk"
},
"_type": "_doc"
},
{
"_id": "ivotEFcqOlCL5htJZJ43NslAP555DaPj0Dgkcay_Ml2jAT4",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:52.143988883Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotEFcqOlCL5htJZJ43NslAP555DaPj0Dgkcay_Ml2jAT4"
},
"_type": "_doc"
},
{
"_id": "ivotToZ-0iBiM_zF5TP1Shj5C29WV3U0ibedlxvcQccimeo",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:50.178450007Z",
"type": 1,
"user_id": "iusrKDxnm75fADEpusbmx5JM",
"vote_id": "ivotToZ-0iBiM_zF5TP1Shj5C29WV3U0ibedlxvcQccimeo"
},
"_type": "_doc"
},
{
"_id": "ivotAHBPual232E12ggibhr6GfQ5E3f9Ryov0gYKGrIRB0Y",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:52.886305925Z",
"type": 1,
"user_id": "iusrJG4GwkWa6Y70LPkuNCPg",
"vote_id": "ivotAHBPual232E12ggibhr6GfQ5E3f9Ryov0gYKGrIRB0Y"
},
"_type": "_doc"
},
{
"_id": "ivot7a8rWunlFu_q5St44PYDeNelLq4bsxr9wzYP9D80wxE",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8",
"timestamp": "2022-07-08T19:41:49.031694548Z",
"type": 1,
"user_id": "iusrxunBXT1UD0IrvjqjgWaj",
"vote_id": "ivot7a8rWunlFu_q5St44PYDeNelLq4bsxr9wzYP9D80wxE"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 5
}
}
},
"key": "isngVOkuaMqJTu6eQDj73gDYGObUus3g5Qp8"
},
{
"doc_count": 4,
"docs": {
"hits": {
"hits": [
{
"_id": "ivot9_2eQ_3eqU7SXQBnLWGQwFI5DE99Naf8wYbFNFrj1lk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:55.761587927Z",
"type": 1,
"user_id": "iusr8keSbPjX9ZqFhX4Dei4G",
"vote_id": "ivot9_2eQ_3eqU7SXQBnLWGQwFI5DE99Naf8wYbFNFrj1lk"
},
"_type": "_doc"
},
{
"_id": "ivotUZRVSKzGbmlP4LlmBkMwMM8xcR4nGTE9KNpysVR0vXQ",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:52.555377592Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotUZRVSKzGbmlP4LlmBkMwMM8xcR4nGTE9KNpysVR0vXQ"
},
"_type": "_doc"
},
{
"_id": "ivot5Wj8pIkbO0JOV_5s2PqEvZU3sy0WSYYUSlgs2Qizfo8",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:49.756332674Z",
"type": 1,
"user_id": "iusrKDxnm75fADEpusbmx5JM",
"vote_id": "ivot5Wj8pIkbO0JOV_5s2PqEvZU3sy0WSYYUSlgs2Qizfo8"
},
"_type": "_doc"
},
{
"_id": "ivot8QNCJGsNtRiZYa-QMTUHEh5MHHHr4EKJsXm4UTAwJkg",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6",
"timestamp": "2022-07-08T19:41:53.26319105Z",
"type": 1,
"user_id": "iusrJG4GwkWa6Y70LPkuNCPg",
"vote_id": "ivot8QNCJGsNtRiZYa-QMTUHEh5MHHHr4EKJsXm4UTAwJkg"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 4
}
}
},
"key": "isng4hKgRPQvH0YhtBy5GaUqgCdwDoVhJuf6"
},
{
"doc_count": 3,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotoCfJc_q5vuY27KmvZUo8s4tilI57_xJoPXqfSeJTikg",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:50.527352591Z",
"type": 1,
"user_id": "iusrL8FCabxg1YCeaakcVXG5",
"vote_id": "ivotoCfJc_q5vuY27KmvZUo8s4tilI57_xJoPXqfSeJTikg"
},
"_type": "_doc"
},
{
"_id": "ivotStjKDIRy6vfaO5dNws4wGPELywPRgg7D7uSavatfIEo",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:51.733375716Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotStjKDIRy6vfaO5dNws4wGPELywPRgg7D7uSavatfIEo"
},
"_type": "_doc"
},
{
"_id": "ivotUHj_Ebh-xIqqPJNEdWAuc_JO_mcVVG8F9wM67bJ7_6A",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc",
"timestamp": "2022-07-08T19:41:48.60900159Z",
"type": 1,
"user_id": "iusrxunBXT1UD0IrvjqjgWaj",
"vote_id": "ivotUHj_Ebh-xIqqPJNEdWAuc_JO_mcVVG8F9wM67bJ7_6A"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 3
}
}
},
"key": "isngHAVCux40BgjGVZuAwZlTiEjVQFxuxurc"
},
{
"doc_count": 3,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotE5xO9hZGLhrS2sL1mgf5UbcHtf_5qAwbrp3QwEtf4zk",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:54.99023451Z",
"type": 1,
"user_id": "iusre80pxIMFB XfF61SHlCiz",
"vote_id": "ivotE5xO9hZGLhrS2sL1mgf5UbcHtf_5qAwbrp3QwEtf4zk"
},
"_type": "_doc"
},
{
"_id": "ivotI0OCI6gz6oEV94hgvjZmGB-qA4n-EigirDRJpgeZt68",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:50.952366924Z",
"type": 1,
"user_id": "iusr3oWy3mxsBWu6CU4mlw5L",
"vote_id": "ivotI0OCI6gz6oEV94hgvjZmGB-qA4n-EigirDRJpgeZt68"
},
"_type": "_doc"
},
{
"_id": "ivotm7GrIeyWRHamPXF9klzTZ0La8H4evCgWkCTIpx8rLl4",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W",
"timestamp": "2022-07-08T19:41:48.234881506Z",
"type": 1,
"user_id": "iusrCbCltg4nzv0b2JfUbyhj",
"vote_id": "ivotm7GrIeyWRHamPXF9klzTZ0La8H4evCgWkCTIpx8rLl4"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 3
}
}
},
"key": "isngOMnUKFVT1cKsH6Q9JfpF3WEZ4H4iU75W"
},
{
"doc_count": 2,
"docs": {
"hits": {
"hits": [
{
"_id": "ivotGWwrgPehs9s7ZwZACzkVNp4-_SUaUu3noUKyBH8IBnw",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And",
"timestamp": "2022-07-08T19:41:53.785333717Z",
"type": 1,
"user_id": "iusrYvNFTaTg4RBNBxG63nkY",
"vote_id": "ivotGWwrgPehs9s7ZwZACzkVNp4-_SUaUu3noUKyBH8IBnw"
},
"_type": "_doc"
},
{
"_id": "ivotrWG5cy6vEbe0N4JO4IKzHZyahOlkyPctCdrBnBu-v9Q",
"_index": "votes-index",
"_score": 1,
"_source": {
"song_id": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And",
"timestamp": "2022-07-08T19:41:51.303745591Z",
"type": 1,
"user_id": "iusr18wnuxsy8oVVK3Xic4Sy",
"vote_id": "ivotrWG5cy6vEbe0N4JO4IKzHZyahOlkyPctCdrBnBu-v9Q"
},
"_type": "_doc"
}
],
"max_score": 1,
"total": {
"relation": "eq",
"value": 2
}
}
},
"key": "isngfMv4IemhtjXX78LTqxKFBc1VMeUz6And"
}
],
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0
}
}
I want to filter out the aggregation buckets that don't contain any document with my own user id.
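One possible way to express that (a sketch, untested against this index; "your-own-user-id" is a placeholder, the top_hits size of 10 is arbitrary, and the aggregation names my_votes and me_and_friends are illustrative): add a filter sub-aggregation that counts only your own votes, and reference its count from the bucket_selector alongside the bucket's total. Note that user_id is mapped as text above, so an exact term match may require a keyword sub-field.
{
  "query": {
    "bool": {
      "must": {
        "terms": {
          "user_id": ["you and your friend ids"]
        }
      }
    }
  },
  "aggs": {
    "songs": {
      "terms": {
        "field": "song_id"
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 10,
            "_source": ["vote_id", "song_id", "user_id", "type", "timestamp"]
          }
        },
        "my_votes": {
          "filter": {
            "term": { "user_id": "your-own-user-id" }
          }
        },
        "me_and_friends": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count",
              "mine": "my_votes._count"
            },
            "script": "params.count > 1 && params.mine > 0"
          }
        }
      }
    }
  }
}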
I'm trying to build a food search engine on Elasticsearch that should meet the following use cases:
If the user searches for 'coff', it should return all the documents with the phrase 'coffee' in their name, and priority should go to food items that have 'coffee' at the start of their name.
If the user searches for 'green tea', it should give priority to documents that contain the whole phrase 'green tea' rather than splitting it into 'green' and 'tea'.
If the phrase does not exist in the name, it should also search the alias field.
To handle the first case, I've used the edge n-gram analyzer.
Mapping -
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"analyzer_keyword": {
"tokenizer": "standard",
"filter": "lowercase"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"alias": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"search_analyzer": "analyzer_keyword",
"analyzer": "edge_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
This is the search query I'm using, but it's not returning the relevant results:
{
"query": {
"multi_match": {
"query": "coffee",
"fields": ["name^2", "alias"]
}
}
}
There are over 1500 food items with 'coffee' in their name, but the above query only returns 2:
{
"took": 745,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 8.657346,
"hits": [
{
"_index": "food-master",
"_type": "doc",
"_id": "a9uzinABb4g7LgmgoI1I",
"_score": 8.657346,
"_source": {
"id": 17463,
"name": "Rotiboy, coffee bun",
"alias": [
"Mexican Coffee Bun (Rotiboy)",
"Mexican coffee bun"
]
}
},
{
"_index": "food-master",
"_type": "doc",
"_id": "TNuzinABb4g7LgmgoFVI",
"_score": 7.0164866,
"_source": {
"id": 1344,
"name": "Coffee with sugar",
"alias": [
"Heart Friendly",
"Coffee With Sugar",
"Coffee With Milk and Sugar",
"Gluten Free",
"Hypertension Friendly"
]
}
}
]
}
}
In the mapping, if I remove analyzer_keyword, then it returns relevant results, but the documents that start with 'coffee' are not prioritized:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1323,
"max_score": 57.561867,
"hits": [
{
"_index": "food-master-new",
"_type": "doc",
"_id": "nduzinABb4g7LgmgoINI",
"_score": 57.561867,
"_source": {
"name": "Egg Coffee",
"alias": [],
"id": 12609
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "dNuzinABb4g7LgmgoFVI",
"_score": 55.811295,
"_source": {
"name": "Coffee (Black)",
"alias": [
"Weight Loss",
"Diabetes Friendly",
"Gluten Free",
"Lactose Free",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 1341
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "NduzinABb4g7LgmgoHxI",
"_score": 54.303185,
"_source": {
"name": "Brewed Coffee",
"alias": [
"StarBucks"
],
"id": 15679
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ltuzinABb4g7LgmgoJJI",
"_score": 54.303185,
"_source": {
"name": "Coffee - Masala",
"alias": [],
"id": 11329
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "oduzinABb4g7LgmgoGpI",
"_score": 53.171227,
"_source": {
"name": "Coffee, German",
"alias": [],
"id": 12257
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "YNuzinABb4g7LgmgoFRI",
"_score": 52.929176,
"_source": {
"name": "Soy Milk Coffee",
"alias": [
"Gluten Free",
"Lactose Free",
"Weight Loss",
"Diabetes Friendly",
"Heart Friendly",
"Hypertension Friendly"
],
"id": 978
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "8duzinABb4g7LgmgoFRI",
"_score": 52.068523,
"_source": {
"name": "Cold Coffee (Soy Milk)",
"alias": [
"Soy Milk"
],
"id": 1097
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "tNuzinABb4g7LgmgoF9I",
"_score": 50.956154,
"_source": {
"name": "Coffee Frappe",
"alias": [],
"id": 3142
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "ZduzinABb4g7LgmgoF5I",
"_score": 49.810112,
"_source": {
"name": "Big Apple Coffee",
"alias": [],
"id": 3130
}
},
{
"_index": "food-master-new",
"_type": "doc",
"_id": "eduzinABb4g7LgmgoHtI",
"_score": 49.62197,
"_source": {
"name": "Mexican Coffee",
"alias": [],
"id": 13604
}
}
]
}
}
If I change the tokenizer from 'standard' to 'keyword', I face the same problem, and it also splits phrases into individual words: 'green tea' into 'green' and 'tea'.
Any suggestions on what I might be getting wrong with the analyzers? I've tried all possible combinations, but meeting all 3 scenarios with high accuracy is getting a little difficult.
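One detail in the mapping above that may explain the two-hit result (an observation, not a verified diagnosis): edge_ngram_tokenizer has max_gram: 5, so 'coffee' is indexed only up to the 5-character gram 'coffe', while the search analyzer emits the full token 'coffee', which then matches nothing in the name field; the two hits that do come back both contain 'Coffee' in their alias field, which is analyzed normally. This is easy to check with the _analyze API:
POST /food-master/_analyze
{
  "analyzer": "edge_ngram_analyzer",
  "text": "coffee"
}
If the returned grams stop at 'coffe', raising max_gram (or adding a plain match clause on name as a fallback) should bring back the missing documents.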
Can somebody help me with alerting via X-Pack for an energy monitoring system project? The main problem is that I can't read the 'Value' field from the search result, which I want to compare against the upper and lower thresholds.
So here is the index:
PUT /test-1
{
"mappings": {
"Test1": {
"properties": {
"Value": {
"type": "integer"
},
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
},
"UpperThreshold": {
"type": "integer"
},
"LowerThreshold": {
"type": "integer"
}
}
}
}
}
Here is an example of the input:
POST /test-1/Test1
{
"Value": "500",
"date": "2017-06-13T16:20:00.000Z",
"UpperThreshold":"450",
"LowerThreshold": "380"
}
This is my alerting code
{
"trigger": {
"schedule": {
"interval": "10s"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"logs"
],
"types": [],
"body": {
"query": {
"match": {
"message": "error"
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 0
}
}
},
"actions": {
"send_email": {
"email": {
"profile": "standard",
"to": [
"<account#gmail.com>"
],
"subject": "Watcher Notification",
"body": {
"text": "{{ctx.payload.hits.total}} error logs found"
}
}
}
}
}
Here is the response I got from the alerting plugin
{
"watch_id": "Alerting-Test",
"state": "execution_not_needed",
"_status": {
"state": {
"active": true,
"timestamp": "2017-07-26T15:27:35.497Z"
},
"last_checked": "2017-07-26T15:27:38.625Z",
"actions": {
"logging": {
"ack": {
"timestamp": "2017-07-26T15:27:35.497Z",
"state": "awaits_successful_execution"
}
}
}
},
"trigger_event": {
"type": "schedule",
"triggered_time": "2017-07-26T15:27:38.625Z",
"schedule": {
"scheduled_time": "2017-07-26T15:27:38.175Z"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"test-1"
],
"types": [
"Test1"
],
"body": {
"query": {
"match_all": {}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.hits.0.Value": {
"gt": 450
}
}
},
"metadata": {
"name": "Alerting-Test"
},
"result": {
"execution_time": "2017-07-26T15:27:38.625Z",
"execution_duration": 0,
"input": {
"type": "search",
"status": "success",
"payload": {
"_shards": {
"total": 5,
"failed": 0,
"successful": 5
},
"hits": {
"hits": [
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-22T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "350",
"UpperThreshold": "450"
},
"_id": "AV1-1P3lArbJ1tbnct4e",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-22T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "4100",
"UpperThreshold": "450"
},
"_id": "AV1-1Sq0ArbJ1tbnct4v",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-24T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "450",
"UpperThreshold": "450"
},
"_id": "AV1-1eLJArbJ1tbnct6G",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T00:00:00.000Z",
"LowerThreshold": "380",
"Value": "400",
"UpperThreshold": "450"
},
"_id": "AV1-1VUzArbJ1tbnct5A",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1X4FArbJ1tbnct5R",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1YySArbJ1tbnct5T",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-26T00:00:00.000Z",
"LowerThreshold": "380",
"Value": "4700",
"UpperThreshold": "450"
},
"_id": "AV1-1mflArbJ1tbnct67",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-26T06:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1oluArbJ1tbnct7M",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-21T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "400",
"UpperThreshold": "450"
},
"_id": "AV1-1IrZArbJ1tbnct3r",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-21T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "440",
"UpperThreshold": "450"
},
"_id": "AV1-1LwzArbJ1tbnct38",
"_score": 1
}
],
"total": 20,
"max_score": 1
},
"took": 1,
"timed_out": false
},
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"test-1"
],
"types": [
"Test1"
],
"body": {
"query": {
"match_all": {}
}
}
}
}
},
"condition": {
"type": "compare",
"status": "success",
"met": false,
"compare": {
"resolved_values": {
**"ctx.payload.hits.hits.0.Value": null**
}
}
},
"actions": []
},
"messages": []
}
I'd really appreciate your help!
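For what it's worth, the resolved_values output above points at the problem: the condition references ctx.payload.hits.hits.0.Value, but in the payload each Value sits inside _source, so the path resolves to null. A sketch of the condition with the corrected path (note the documents were indexed with Value as a JSON string such as "350", so a script condition may still be needed for a reliable numeric comparison):
"condition": {
  "compare": {
    "ctx.payload.hits.hits.0._source.Value": {
      "gt": 450
    }
  }
}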
How to sort by match, prioritising the leftmost words matched
Explanation
Sort the prefix query results by the matched word, but prioritise matches in words further to the left.
Tests I've made
Data
DELETE /test
PUT /test
PUT /test/person/_mapping
{
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {"type": "string"},
"original": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
PUT /test/person/1
{"name": "Berta Kassulke"}
PUT /test/person/2
{"name": "Kaley Bartoletti"}
PUT /test/person/3
{"name": "Kali Hahn"}
PUT /test/person/4
{"name": "Karolann Klein"}
PUT /test/person/5
{"name": "Sofia Mandez Kaloo"}
The mapping was added for the 'sort on original value' test.
Simple query
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": 1,
"_source": {
"name": "Karolann Klein"
}
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": 1,
"_source": {
"name": "Sofia Mandez Kaloo"
}
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": 1,
"_source": {
"name": "Berta Kassulke"
}
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": 1,
"_source": {
"name": "Kaley Bartoletti"
}
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": 1,
"_source": {
"name": "Kali Hahn"
}
}
]
}
}
With sorting
Request
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name": {"order": "asc"}}
}
Result
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"berta"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"kaloo"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"karolann"
]
}
]
}
}
With sort on original value
Query
POST /test/person/_search
{
"query": {
"prefix": {"name": {"value": "ka"}}
},
"sort": {"name.original": {"order": "asc"}}
}
Result
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "test",
"_type": "person",
"_id": "1",
"_score": null,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
"Berta Kassulke"
]
},
{
"_index": "test",
"_type": "person",
"_id": "2",
"_score": null,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
"Kaley Bartoletti"
]
},
{
"_index": "test",
"_type": "person",
"_id": "3",
"_score": null,
"_source": {
"name": "Kali Hahn"
},
"sort": [
"Kali Hahn"
]
},
{
"_index": "test",
"_type": "person",
"_id": "4",
"_score": null,
"_source": {
"name": "Karolann Klein"
},
"sort": [
"Karolann Klein"
]
},
{
"_index": "test",
"_type": "person",
"_id": "5",
"_score": null,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
"Sofia Mandez Kaloo"
]
}
]
}
}
Intended result
Sorted by name ASC, but prioritising matches on the leftmost words:
Kaley Bartoletti
Kali Hahn
Karolann Klein
Berta Kassulke
Sofia Mandez Kaloo
Good question. One way to achieve this would be a combination of the edge ngram filter and the span_first query.
These are my settings:
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase",
"edge_filter",
"asciifolding"
]
}
},
"filter": {
"edge_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"person": {
"properties": {
"name": {
"type": "string",
"analyzer": "my_custom_analyzer",
"search_analyzer": "standard",
"fields": {
"standard": {
"type": "string"
}
}
}
}
}
}
}
After that I indexed your sample documents. Then I wrote the following query with dis_max. Notice that the end parameter of the first span_first query is 1, so it gives a higher score to the leftmost match. I sort first by score and then by name.
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"name": "ka"
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 1
}
},
{
"span_first": {
"match": {
"span_term": {
"name": "ka"
}
},
"end": 2
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"name.standard": {
"order": "asc"
}
}
]
}
The result I get:
"hits": [
{
"_index": "esedge",
"_type": "policy_data",
"_id": "2",
"_score": 0.72272325,
"_source": {
"name": "Kaley Bartoletti"
},
"sort": [
0.72272325,
"bartoletti"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "3",
"_score": 0.72272325,
"_source": {
"name": "Kali Hahn"
},
"sort": [
0.72272325,
"hahn"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "4",
"_score": 0.72272325,
"_source": {
"name": "Karolann Klein"
},
"sort": [
0.72272325,
"karolann"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "1",
"_score": 0.54295504,
"_source": {
"name": "Berta Kassulke"
},
"sort": [
0.54295504,
"berta"
]
},
{
"_index": "esedge",
"_type": "policy_data",
"_id": "5",
"_score": 0.2905494,
"_source": {
"name": "Sofia Mandez Kaloo"
},
"sort": [
0.2905494,
"kaloo"
]
}
]
I hope this helps.
We are storing log data containing information about sites that have been visited from our network. I'd like to query the top 10 visited websites. How can I achieve this with Elasticsearch? The index mapping is as follows:
{
"data" : {
"properties" : {
"date": {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"status": {"type" : "string"},
"group": {"type" : "string"},
"ip": {"type" : "ip"},
"username":{"type" : "string"},
"category":{"type" : "string"},
"url":{"type" : "string"}
}
}
}
Sample Data:
"hits": {
"total": 7,
"max_score": 1,
"hits": [
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT4_ibdcNyAnt753J",
"_score": 1,
"_source": {
"date": "2015-08-16T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "ads",
"url": "https://gmail.com/mail/u/0/#inbox"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMjDpbdcNyAnt75iB",
"_score": 1,
"_source": {
"date": "2015-08-15T00:01:00.195Z",
"status": "BLOCK",
"group": "level3",
"ip": "10.249.10.51",
"username": "Fary",
"category": "ads",
"url": "https://gmail.com/details/blabla"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT94kbdcNyAnt753Y",
"_score": 1,
"_source": {
"date": "2015-08-17T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "news",
"url": "http://aol.com"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_CwTEqbdcNyAnt74RJ",
"_score": 1,
"_source": {
"date": "2015-08-15T00:00:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "Blog",
"url": "http://gmail.com"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMmUQbdcNyAnt75iQ",
"_score": 1,
"_source": {
"date": "2015-08-15T00:02:00.195Z",
"status": "PASS",
"group": "level3",
"ip": "10.249.10.51",
"username": "Fary",
"category": "ads",
"url": "http://yahoo.com/vbfhghfgjfdgfd"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DT1yjbdcNyAnt753B",
"_score": 1,
"_source": {
"date": "2015-08-16T00:02:00.195Z",
"status": "REDIR",
"group": "level3",
"ip": "10.249.10.49",
"username": "Hyder",
"category": "ads",
"url": "http://news.yahoo.com/"
}
},
{
"_index": "squid",
"_type": "data",
"_id": "AU_DMV1ObdcNyAnt75hd",
"_score": 1,
"_source": {
"date": "2015-08-15T00:01:00.195Z",
"status": "BLOCK",
"group": "level3",
"ip": "10.249.10.50",
"username": "Kamal",
"category": "Blog",
"url": "http://hotmail.com/dfdgfgfdg"
}
}
]
What I'd like to have:
Top visited sites:
- **Sites - Hits**
- gmail.com - 3
- yahoo.com - 2
- hotmail.com - 1
- aol.com - 1
First you need to extract the base site (like gmail.com) from the url field before indexing and add it to a new field. Let's assume this new field is baseSite.
Then make the baseSite field not_analyzed and do a terms aggregation on that field.
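A minimal sketch of both steps, assuming baseSite is populated at index time with the extracted host (the field and aggregation names here are illustrative):
PUT /squid/_mapping/data
{
  "properties": {
    "baseSite": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
POST /squid/data/_search
{
  "size": 0,
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "baseSite",
        "size": 10
      }
    }
  }
}
The terms aggregation buckets then give exactly the site/hits pairs from the example (gmail.com: 3, yahoo.com: 2, and so on).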