How to aggregate matched terms in a query_string search?

How to aggregate matched terms in a query_string search? - elasticsearch

I wish to search wildcard terms in a nested list of dict and then obtain a list of terms and its uuid grouped by matched wildcard.
I've the following mapping in my index:
"mappings": {
"properties": {
"uuid": {
"type": "keyword"
},
"urls": {
"type": "nested",
"properties": {
"url": {
"type": "keyword"
},
"is_visited": {
"type": "boolean"
}
}
}
}
}
and a lot of data such this:
{
"uuid":"afa9ac03-0723-4d66-ae18-08a51e2973bd"
"urls": [
{
"is_visited": true,
"url": "https://www.google.com"
},
{
"is_visited": false,
"url": "https://www.facebook.com"
},
{
"is_visited": true,
"url": "https://www.twitter.com"
},
]
},
{
"uuid":"4a1c695d-756b-4d9d-b3a0-cf524d955884"
"urls": [
{
"is_visited": true,
"url": "https://www.stackoverflow.com"
},
{
"is_visited": false,
"url": "https://www.facebook.com"
},
{
"is_visited": false,
"url": "https://drive.google.com"
},
{
"is_visited": false,
"url": "https://maps.google.com"
},
]
}
...
I wish to search via wildcard "*google.com OR *twitter.com" and obtain something like this:
"hits": [
"*google.com": [
{
"uuid": "4a1c695d-756b-4d9d-b3a0-cf524d955884",
"_source": {
"is_visited": false,
"url": "https://drive.google.com"
}
},
{
"id": "4a1c695d-756b-4d9d-b3a0-cf524d955884",
"_source": {
"is_visited": false,
"url": "https://maps.google.com"
}
},
{
"uuid":"afa9ac03-0723-4d66-ae18-08a51e2973bd",
"_source": {
"is_visited": true,
"url": "https://www.google.com"
}
}
]
"*twitter.com": [
{
"uuid":"afa9ac03-0723-4d66-ae18-08a51e2973bd",
"_source": {
"is_visited": true,
"url": "https://www.twitter.com"
},
},
]
]
This is my (python) search query:
body = {
#"_source": False,
"size": 100,
"query": {
"nested": {
"path": "urls",
"query":{
"query_string":{
"query": f"urls.url:{urlToSearch}",
}
}
,"inner_hits": {
"size":100 # returns top 100 results
}
}
}
}
but it returns an hit for each matched term instead of aggregate them in a list similar to what I would like to get.
EDIT
This is my setting and mapping:
{
"settings": {
"analysis": {
"char_filter": {
"my_filter": {
"type": "mapping",
"mappings": [
"- => _",
]
},
},
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_filter"
],
"filter": [
"lowercase",
]
}
}
}
},
"mappings": {
"properties": {
"uuid": {
"type": "keyword"
},
"urls": {
"type": "nested",
"properties": {
"url": {
"type": "keyword"
},
"is_visited": {
"type": "boolean"
}
}
}
}
}
}

Elasticsearch will not provide the output you want the way you set up the query.
This scenario to be an aggregation. My suggestion was to apply the nested query and use aggregation on the results.
Attention point wildcard query:
Avoid beginning patterns with * or ?. This can increase the iterations
needed to find matching terms and slow search performance.
{
"size": 0,
"query": {
"nested": {
"path": "urls",
"query": {
"bool": {
"should": [
{
"wildcard": {
"urls.url": {
"value": "*google.com"
}
}
},
{
"wildcard": {
"urls.url": {
"value": "*twitter.com"
}
}
}
]
}
}
}
},
"aggs": {
"agg_providers": {
"nested": {
"path": "urls"
},
"aggs": {
"google.com": {
"terms": {
"field": "urls.url",
"include": ".*google.com",
"size": 10
}
},
"twitter.com": {
"terms": {
"field": "urls.url",
"include": ".*twitter.com",
"size": 10
}
}
}
}
}
}
Results:
"aggregations": {
"agg_providers": {
"doc_count": 7,
"twitter.com": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "https://www.twitter.com",
"doc_count": 1
}
]
},
"google.com": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "https://drive.google.com",
"doc_count": 1
},
{
"key": "https://maps.google.com",
"doc_count": 1
},
{
"key": "https://www.google.com",
"doc_count": 1
}
]
}
}
}

Related

How to nested aggregate matched terms?

I've this mapping in my index:
{
"mappings": {
"properties": {
"uuid": {
"type": "keyword"
},
"last_visit": {
"type": "date"
},
"urls": {
"type": "nested",
"properties": {
"url": {
"type": "keyword"
},
"is_visited": {
"type": "boolean"
}
}
}
}
}
}
and hundreds of data like this:
This is my desired output when I search for *google.com and *facebook.com:
[
{
"uuid": "afa9ac03-0723-4d66-ae18-08a51e2973bd",
"urls": [
{
"is_visited": true,
"url": "https://www.google.com",
"last_visit": "2022-02-31"
},
{
"is_visited": false,
"url": "https://www.facebook.com",
"last_visit": "2022-02-03"
},
{
"is_visited": true,
"url": "https://www.twitter.com",
"last_visit": "2022-03-30"
}
]
},
{
"uuid": "4a1c695d-756b-4d9d-b3a0-cf524d955884",
"urls": [
{
"is_visited": true,
"url": "https://www.stackoverflow.com",
"last_visit": "2022-03-23"
},
{
"is_visited": false,
"url": "https://www.facebook.com",
"last_visit": "2022-02-02"
},
{
"is_visited": false,
"url": "https://drive.google.com",
"last_visit": "2022-05-01"
},
{
"is_visited": true,
"url": "https://www.google.com",
"last_visit": "2022-07-09"
}
]
}
]
and this is the code I wrote (thanks to another question where I have not explained myself well about desired output) with focus on *google.com when I try to add last_visit field to output :
{
"query": {
"nested": {
"path": "urls",
"query": {
"bool": {
"should": [
{
"wildcard": {
"urls.url": {
"value": "*google.com"
}
}
},
{
"wildcard": {
"urls.url": {
"value": "*facebook.com"
}
}
}
]
}
}
}
},
"aggs": {
"agg_providers": {
"nested": {
"path": "urls"
},
"aggs": {
"google.com": {
"terms": {
"field": "urls.url",
"include": ".*google.com",
"size": 10
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 1,
"_source": {
"includes": ["last_visit"]
}
}
}
}
},
"facebook.com": {
"terms": {
"field": "urls.url",
"include": ".*facebook.com",
"size": 10
}
}
}
}
}
}
The code above returns 2 differents buckets lists in which I have key,doc_count dict values instead of all fields (is_visited, last_visit, uuid, etc.)
Thanks.

Elasticsearch Querying Double Nested Object, Match Multiple Rows in Query Within Parent

My data model is related to patient records. At the highest level is the Patient, then their information such as Lab Panels and the individual rows of the results of the panel. So it looks like this: {Patient:{Labs:[{Results:[{}]}]}}
I am able to successfully create the two nested objects Labs nested in Patient and Results nested in Labs, populate it, and query it. What I am unable to successfully do is create a query that constrains the results to a single Lab, and then match by more than one row in the Results object.
An example is attached, where I only want labs that are "Lipid Panel" and the results are HDL <= 46 and LDL >= 140.
Any suggestions?
Example Index
PUT localhost:9200/testpipeline
{
"aliases": {},
"mappings": {
"dynamic": "false",
"properties": {
"ageAtFirstEncounter": {
"type": "float"
},
"dateOfBirth": {
"type": "date"
},
"gender": {
"type": "keyword"
},
"id": {
"type": "float"
},
"labs": {
"type": "nested",
"properties": {
"ageOnDateOfService": {
"type": "float"
},
"date": {
"type": "date"
},
"encounterId": {
"type": "keyword"
},
"id": {
"type": "keyword"
},
"isEdVisit": {
"type": "boolean"
},
"labPanelName": {
"type": "keyword"
},
"labPanelNameId": {
"type": "float"
},
"labPanelSourceName": {
"type": "text",
"store": true
},
"personId": {
"type": "keyword"
},
"processingLogId": {
"type": "float"
},
"results": {
"type": "nested",
"properties": {
"dataType": {
"type": "keyword"
},
"id": {
"type": "float"
},
"labTestName": {
"type": "keyword"
},
"labTestNameId": {
"type": "float"
},
"resultAsNumber": {
"type": "float"
},
"resultAsText": {
"type": "keyword"
},
"sourceName": {
"type": "text",
"store": true
},
"unit": {
"type": "keyword"
}
}
}
}
},
"personId": {
"type": "keyword"
},
"processingLogId": {
"type": "float"
},
"race": {
"type": "keyword"
}
}
}
}
Example Document
PUT localhost:9200/testpipeline/_doc/274746
{
"id": 274746,
"personId": "10005786.000000",
"processingLogId": 51,
"gender": "Female",
"dateOfBirth": "1945-01-01T00:00:00",
"ageAtFirstEncounter": 76,
"labs": [
{
"isEdVisit": false,
"labPanelSourceName": "Lipid Panel",
"dataType": "LAB",
"ageOnDateOfService": 76.9041,
"results": [
{
"unit": "mg/dL",
"labTestNameId": 160,
"labTestName": "HDL",
"sourceName": "HDL",
"resultAsNumber": 46.0,
"resultAsText": "46",
"id": 2150284
},
{
"unit": "mg/dL",
"labTestNameId": 158,
"labTestName": "LDL",
"sourceName": "LDL",
"resultAsNumber": 144.0,
"resultAsText": "144.00",
"id": 2150286
}
],
"id": "9ab9ba84-580b-f2d2-4d32-25658ea5f1bf",
"sourceId": 2150278,
"personId": "10003783.000000",
"encounterId": "39617217.000000",
"processingLogId": 51,
"date": "2021-11-08T00:00:00"
}
],
"lastModified": "2022-03-24T10:21:29.8682784-05:00"
}
Example Query
POST localhost:9200/testpipeline/_search
{
"fields": [
"personId",
"processingLogId",
"id",
"gender",
"ageAtFirstDOS",
"dateOfBirth"
],
"from": 0,
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 200,
"filter": {
"bool": {
"_name": "CriteriaFilterId:2068,CriteriaId:1,CriteriaClassId:1,Points:200,T5:False,SoftScore:200",
"should": [
{
"bool": {
"must": [
{
"nested": {
"path": "labs",
"inner_hits": {
"size": 3,
"name": "labs,CriteriaFilterId:2068,CriteriaId:1,CriteriaClassId:1,Points:200,T5:False,guid:8b41f346-2861-4099-b3c0-fcd6393c367b"
},
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match_phrase": {
"labs.labPanelSourceName": {
"_name": "CriteriaFilterId:2068,Pipeline.Labs.LabPanelSourceName,es_match_phrase=>'Lipid Panel' found in text",
"query": "Lipid Panel",
"slop": 100
}
}
},
{
"nested": {
"path": "labs.results",
"inner_hits": {
"size": 3,
"name": "labs.results,CriteriaFilterId:2068,CriteriaId:1,CriteriaClassId:1,Points:200,T5:False,guid:3564e83f-958b-4fe8-848e-f9edb5d7f3b2"
},
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"range": {
"labs.results.resultAsNumber": {
"lte": 46
}
}
},
{
"term": {
"labs.results.labTestNameId": {
"value": 160
}
}
}
]
}
},
{
"bool": {
"must": [
{
"range": {
"labs.results.resultAsNumber": {
"gte": 140.0
}
}
},
{
"term": {
"labs.results.labTestNameId": {
"value": 158
}
}
}
]
}
}
],
"minimum_should_match": 2
}
}
]
}
}
}
}
]
}
}
]
}
}
}
}
]
}
}
]
}
}
}
}
],
"minimum_should_match": 1,
"filter": [
]
}
},
"size": 10,
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"processingLogId": {
"order": "asc"
}
},
{
"personId": {
"order": "asc"
}
}
],
"_source": false
}

ElasticSearch won't search specific field

I have a problem searching a specific field inside my index.
little background:
On my project we need to search inside a terminology Server like FHIR but then our own.
So we have an object that contains a Code (123564/A), multiple translations as term/display (urine problem) and mapping to other codes that are equal to that code but in a different system (ICD-10, SNOMED-CT, ICPC-2,..) example what has been indexed:
{
"Code": "10008220/A1",
"EffectiveTime": "0001-01-01T00:00:00Z",
"Active": true,
"System": "ibui",
"Purpose": "",
"Descriptions": [
{
"DescriptionId": "2464cf5c-d4fc-4a61-b6bc-746d003cb4ef",
"Code": "10008220/A1",
"System": "ibui",
"Term": "gebroken arm",
"LanguageId": "3d50c237-0add-43e7-92a2-5edf1ac7c6ee",
"FSN": false,
"Preferred": true,
"EffectiveTime": "0001-01-01T00:00:00Z",
"Active": true,
"SendVersion": "2021-12-07T17:01:53.786755Z",
"Purpose": ""
},
{
"DescriptionId": "95501583-9f24-4964-bbc9-1a6e95eba30f",
"Code": "10008220/A1",
"System": "ibui",
"Term": "fracture du bras",
"LanguageId": "1238dde0-08df-4ae0-8676-59919f66737e",
"FSN": false,
"Preferred": true,
"EffectiveTime": "0001-01-01T00:00:00Z",
"Active": true,
"SendVersion": "2021-12-07T17:01:53.786755Z",
"Purpose": ""
}
],
"Mappings": [
{
"MappingId": "",
"FromSys": "ibui",
"From": "10008220/A1",
"ToSys": "icd-10",
"To": "T10",
"EffectiveTime": "0001-01-01T00:00:00Z",
"Active": true
},
{
"MappingId": "",
"FromSys": "ibui",
"From": "10008220/A1",
"ToSys": "icpc-2",
"To": "L76",
"EffectiveTime": "0001-01-01T00:00:00Z",
"Active": true
}
],
"SendVersion": "2021-12-07T17:01:53.786755Z"
}
The problem:
We can search on 2 different fields : Code & Term. and when searching we keep in mind that we have some filters for a specific language code (Dutch,..) or A system like ICD-10 or ICPC-2,..
I have a query that is working and returns the above object when searching in 1 field (Descriptions.Term) that is the following:
working query
{
"query": {
"bool": {
"must": {
"nested": {
"inner_hits": {
"highlight": {
"fields": {
"*": {}
}
}
},
"path": "Descriptions",
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"Descriptions.Term",
"Descriptions.Term._2gram",
"Descriptions.Term._3gram"
],
"query": "gebroken*~ n",
"type": "bool_prefix"
}
}
],
"filter": [
{
"bool": {
"should": [
{
"term": {
"Descriptions.System": "ibui"
}
},{
"term": {
"Descriptions.System": "icd-10"
}
},{
"term": {
"Descriptions.System": "icpc-2"
}
}
],
"minimum_should_match": "1"
}
},
{
"term": {
"Descriptions.Active": "true"
}
},
{
"term": {
"Descriptions.LanguageId": "3d50c237-0add-43e7-92a2-5edf1ac7c6ee"
}
}
]
}
}
}
}
}
}
}
But when we somethings need to search in multiple fields.
When adding the Descriptions.Code field to the fields map the query is not working and I can't figure out why. I have it decleared inside my mapping so it should be searchable?
I'm searching for the Code of the object above in both fields (Descriptions.Term & Descriptions.Code) but it doesn't returns the hit.
not working query
{
"query": {
"bool": {
"must": {
"nested": {
"inner_hits": {
"highlight": {
"fields": {
"*": {}
}
}
},
"path": "Descriptions",
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"Descriptions.Term",
"Descriptions.Term._2gram",
"Descriptions.Term._3gram",
"Descriptions.Code"
],
"query": "10008220*~ n",
"type": "bool_prefix"
}
}
],
"filter": [
{
"bool": {
"should": [
{
"term": {
"Descriptions.System": "ibui"
}
},{
"term": {
"Descriptions.System": "icd-10"
}
},{
"term": {
"Descriptions.System": "icpc-2"
}
}
],
"minimum_should_match": "1"
}
},
{
"term": {
"Descriptions.Active": "true"
}
},
{
"term": {
"Descriptions.LanguageId": "3d50c237-0add-43e7-92a2-5edf1ac7c6ee"
}
}
]
}
}
}
}
}
}
}
mapping:
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "custom_tokenizer"
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 6,
"token_chars": [
"letter",
"digit",
"symbol",
"punctuation"
]
}
}
},
"max_ngram_diff" : "5"
},
"mappings": {
"properties": {
"Descriptions": {
"type": "nested",
"properties": {
"Term": {
"type": "search_as_you_type",
"analyzer": "autocomplete"
},
"Code": {
"type": "keyword",
"index": true
},
"System": {
"type": "keyword",
"index": true
},
"LanguageId": {
"type": "keyword",
"index": true
},
"Purpose": {
"type": "keyword",
"index": true
},
"Active": {
"type": "keyword",
"index": true
}
}
},
"Mappings": {
"properties": {
"To": {
"type": "keyword",
"index": true
},
"ToSys": {
"type": "keyword",
"index": true
}
}
}
}
}
}
Thank you for helping me out!

Elasticsearch complex aggregation on match results

I'm trying to get the top 5 teams based on win rate (matches won / all matches) having the match result documents in the index (see below).
The fields I need to get in the bucket:
apiId
name
winrate (won/total)
I guess this would require complex aggregation calculation witch is far beyond my current elasticsearch skills.
Elasticsearch version: 7.15.2
Could anyone help me with such elasticsearch query?
Thanks a lot in advance!
{
"lastModified": "2022-01-14T09:33:48.232Z",
"uuid": "01234567",
"started": "2022-01-14T09:31:27.651Z",
"editing": false,
"approved": true,
"statistics": {
"teams": [
{
"name": "Team1",
"score": 0,
"winner": false,
"apiId": "1"
},
{
"name": "Team2",
"score": 2,
"winner": true,
"apiId": "2"
}
]
}
}
and the mapping:
{
"mappings": {
"properties": {
"_class": {
"type": "keyword",
"index": false,
"doc_values": false
},
"approved": {
"type": "boolean"
},
"editing": {
"type": "boolean"
},
"ended": {
"type": "date",
"format": "date_optional_time||epoch_millis"
},
"game": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastModified": {
"type": "date",
"format": "date_optional_time||epoch_millis"
},
"started": {
"type": "date",
"format": "date_optional_time||epoch_millis"
},
"statistics": {
"type": "nested",
"include_in_parent": true,
"properties": {
"_class": {
"type": "keyword",
"index": false,
"doc_values": false
},
"teams": {
"type": "nested",
"include_in_parent": true,
"properties": {
"_class": {
"type": "keyword",
"index": false,
"doc_values": false
},
"apiId": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"score": {
"type": "long"
},
"winner": {
"type": "boolean"
}
}
}
}
},
"uuid": {
"type": "keyword"
}
}
}
}
Edit:
Based on #ilvar answer, I constructed the query:
{
"query": {
"bool": {
"must": [
{
"term": {
"approved": true
}
}
]
}
},
"size": 0,
"aggs": {
"top_teams": {
"nested": {
"path": "statistics.teams"
},
"aggs": {
"the_all": {
"multi_terms": {
"terms": [
{
"field": "statistics.teams.name.keyword"
},
{
"field": "statistics.teams.apiId"
}
]
}
},
"the_won": {
"filter": {
"terms": {
"statistics.teams.winner": [
true
]
}
},
"aggs": {
"teams": {
"multi_terms": {
"terms": [
{
"field": "statistics.teams.name.keyword"
},
{
"field": "statistics.teams.apiId"
}
]
}
}
}
}
}
}
}
}
Which gives me:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"top_teams": {
"doc_count": 10,
"the_all": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": [
"Team1",
2
],
"key_as_string": "Team1|2",
"doc_count": 2
},
{
"key": [
"Team2",
3
],
"key_as_string": "Team2|3",
"doc_count": 2
},
{
"key": [
"Team3",
5
],
"key_as_string": "Team3|5",
"doc_count": 2
},
{
"key": [
"Team4",
1
],
"key_as_string": "Team4|1",
"doc_count": 1
},
{
"key": [
"Team5",
4
],
"key_as_string": "Team5|4",
"doc_count": 1
},
{
"key": [
"Team6",
7
],
"key_as_string": "Team6|7",
"doc_count": 1
},
{
"key": [
"Team7",
6
],
"key_as_string": "Team7|6",
"doc_count": 1
}
]
},
"the_won": {
"doc_count": 4,
"teams": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": [
"Team2",
5
],
"key_as_string": "Team2|5",
"doc_count": 2
},
{
"key": [
"Team4",
2
],
"key_as_string": "Team4|2",
"doc_count": 1
},
{
"key": [
"Team7",
3
],
"key_as_string": "Team7|3",
"doc_count": 1
}
]
}
}
}
}
}
But I still cannot get the winrate from two siblings, where one sibling might have missing teams that have not won any match.
Should I use some pipeline aggregation?

The final solution I came out with is bellow. the sort pipeline aggregation did the final part of the job
{
"query": {
"bool": {
"must": [
{
"term": {
"approved": true
}
}
]
}
},
"size": 0,
"aggs": {
"top_teams": {
"nested": {
"path": "statistics.teams"
},
"aggs": {
"the_all": {
"terms": {
"field": "statistics.teams.apiId"
},
"aggs": {
"the_won_by_team": {
"filter": {
"terms": {
"statistics.teams.winner": [
true
]
}
}
},
"the_lost_by_team": {
"filter": {
"terms": {
"statistics.teams.winner": [
false
]
}
}
},
"the_all_by_team": {
"filter": {
"terms": {
"statistics.teams.winner": [
true,
false
]
}
}
},
"the_winrate": {
"bucket_script": {
"buckets_path": {
"the_won_count": "the_won_by_team._count",
"the_all_count": "the_all_by_team._count"
},
"script": "params.the_won_count / params.the_all_count"
}
},
"the_sort": {
"bucket_sort": {
"sort": [
{
"the_winrate": "desc"
},
{
"the_all_by_team._count": "desc"
}
],
"size": 5
}
}
}
}
}
}
}
}

This is very similar to the example docs have for nested aggregation. The only difference would be that you'll have a terms aggregation on the top level instead of filter so you get back all of the teams.

ElasticSearch double nested sorting

I have documents which look like this (here is example):
{
"user": "xyz",
"state": "FINISHED",
"finishedTime": 1465566467161,
"jobCounters": {
"counterGroup": [
{
"counterGroupName": "org.apache.hadoop.mapreduce.FileSystemCounter",
"counter": [
{
"name": "FILE_BYTES_READ",
"mapCounterValue": 206509212380,
"totalCounterValue": 423273933523,
"reduceCounterValue": 216764721143
},
{
"name": "FILE_BYTES_WRITTEN",
"mapCounterValue": 442799895522,
"totalCounterValue": 659742824735,
"reduceCounterValue": 216942929213
},
{
"name": "HDFS_BYTES_READ",
"mapCounterValue": 207913352565,
"totalCounterValue": 207913352565,
"reduceCounterValue": 0
},
{
"name": "HDFS_BYTES_WRITTEN",
"mapCounterValue": 0,
"totalCounterValue": 89846725044,
"reduceCounterValue": 89846725044
}
]
},
{
"counterGroupName": "org.apache.hadoop.mapreduce.JobCounter",
"counter": [
{
"name": "TOTAL_LAUNCHED_MAPS",
"mapCounterValue": 0,
"totalCounterValue": 13394,
"reduceCounterValue": 0
},
{
"name": "TOTAL_LAUNCHED_REDUCES",
"mapCounterValue": 0,
"totalCounterValue": 720,
"reduceCounterValue": 0
}
]
}
]
}
}
Now I want to sort this data to get TOP 15 documents on the basis of totalCounterValue where counter.name is FILE_BYTES_READ. I have tried nested sorting on this but no matter which key name I write in counter.name, it is always sorting on the basis of HDFS_BYTES_READ. Can anyone please help me with my query.
{
"_source": true,
"size": 15,
"query": {
"bool": {
"must": [
{
"term": {
"state": {
"value": "FINISHED"
}
}
},
{
"range": {
"startedTime": {
"gte": "now - 4d",
"lte": "now"
}
}
}
]
}
},
"sort": [
{
"jobCounters.counterGroup.counter.totalCounterValue": {
"order": "desc",
"nested_path": "jobCounters.counterGroup",
"nested_filter": {
"nested": {
"path": "jobCounters.counterGroup.counter",
"filter": {
"term": {
"jobCounters.counterGroup.counter.name": "file_bytes_read"
}
}
}
}
}
}
]}
This is the mapping for jobCounters we have created:
"jobCounters": {
"type": "nested",
"include_in_parent": true,
"properties" : {
"counterGroup": {
"type": "nested",
"include_in_parent": true,
"properties": {
"counterGroupName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"counter" : {
"type": "nested",
"include_in_parent": true,
"properties": {
"reduceCounterValue": {
"type": "long"
},
"name": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"totalCounterValue": {
"type": "long"
},
"mapCounterValue": {
"type": "long"
}
}
}
}
}
}
}
I followed nested sorting documentation of ElasticSearch and came up with this query, but I don't know why it is always sorting the totalCounterValue of HDFS_BYTES_READ irrespective of jobCounters.counterGroup.counter.name's value.

you can try something like this,
curl -XGET 'http://localhost:9200/index/jobCounters/_search' -d '
{
"size": 15,
"query": {
"nested": {
"path": "jobCounters.counterGroup.counter",
"filter": {
"term": {
"jobCounters.counterGroup.counter.name": "file_bytes_read"
}
}
}
},
"sort": [
{
"jobCounters.counterGroup.counter.totalCounterValue": {
"order": "desc",
"nested_path": "jobCounters.counterGroup",
"nested_filter": {
"nested": {
"path": "jobCounters.counterGroup.counter",
"filter": {
"term": {
"jobCounters.counterGroup.counter.name": "file_bytes_read"
}
}
}
}
}
}
]
}
'
Read the end of this document. It explains that we have to repeat the same query in nested_filter too.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to aggregate matched terms in a query_string search? - elasticsearch

Related

How to nested aggregate matched terms?

Elasticsearch Querying Double Nested Object, Match Multiple Rows in Query Within Parent

ElasticSearch won't search specific field

Elasticsearch complex aggregation on match results

ElasticSearch double nested sorting

Categories

Resources