multiple fields in aggs elastic query - elasticsearch

I have an Elasticsearch index mapped as:
"mappings": {
"keywords": {
"properties": {
"Keyword": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"KeywordType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
I am trying to retrieve two fields: each keyword and its keyword type.
{
"query": {
"bool": {
"must": [{
"match": {
"Keyword": "TEXT_REQ"
}
}]
}
},
"aggs": {
"keywords": {
"terms": {
"field":"Keyword.keyword",
"size": 500
}
}
}
}
It returns all the keywords that are present in the text. I also want the keyword type along with each keyword, so I tried multiple aggregations:
{"aggs":{
"keywords":{"terms":{"field":"Keyword.keyword"}},
"keywordtype":{"terms":{"field":"KeywordType.keyword"}}
}}
But I don't get the corresponding keyword type for each keyword. I get all the keyword types present overall:
{... "aggregations":{"keywords":{... "buckets":[ {"key": "management"}]},
"keywordtype":{... "buckets":[{"key":"Tools"}, {"key":"technology"}]}
I need the output to be:
bucket:[{"keyword":"management", "keywordtype":"Tools"}]
How should I modify the Elasticsearch query?

You can use either of the queries below.
Solution 1: Using Composite Aggregation
Since you want to group by Keyword and KeywordType, you can use the composite aggregation below.
Aggregation Query:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"Keyword": "TEXT_REQ"
}
}
]
}
},
"aggs" : {
"my_buckets": {
"composite" : {
"sources" : [
{ "keyword": { "terms" : { "field": "Keyword.keyword" } } },
{ "keywordType": { "terms" : { "field": "KeywordType.keyword" } } }
]
}
}
}
}
Sample Response:
{
"took" : 40,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"my_buckets" : {
"after_key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type3"
},
"buckets" : [ <----- Required Results Start
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type1"
},
"doc_count" : 1
},
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type2"
},
"doc_count" : 2
},
{
"key" : {
"keyword" : "TEXT_REQ",
"keywordType" : "TEXT_REQ_Type3"
},
"doc_count" : 1
}
] <----- Required Results End
}
}
}
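If the goal is the flat list from the question, the composite buckets can be reshaped on the client. Below is a minimal Python sketch, assuming the response has already been parsed into a dict. The function name flatten_composite and the output key names are illustrative and not part of any Elasticsearch client.

```python
def flatten_composite(response, agg_name="my_buckets"):
    """Turn composite buckets into a flat list of keyword/keywordtype pairs."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        {
            "keyword": b["key"]["keyword"],
            "keywordtype": b["key"]["keywordType"],
            "doc_count": b["doc_count"],
        }
        for b in buckets
    ]


# Trimmed copy of the sample response above.
sample = {
    "aggregations": {
        "my_buckets": {
            "after_key": {"keyword": "TEXT_REQ", "keywordType": "TEXT_REQ_Type3"},
            "buckets": [
                {"key": {"keyword": "TEXT_REQ", "keywordType": "TEXT_REQ_Type1"}, "doc_count": 1},
                {"key": {"keyword": "TEXT_REQ", "keywordType": "TEXT_REQ_Type2"}, "doc_count": 2},
                {"key": {"keyword": "TEXT_REQ", "keywordType": "TEXT_REQ_Type3"}, "doc_count": 1},
            ],
        }
    }
}

pairs = flatten_composite(sample)
for pair in pairs:
    print(pair)
```

A composite aggregation returns 10 buckets per request unless "size" is set inside "composite". To fetch the next page, send the "after_key" value back as the "after" parameter.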
Solution 2: Using Terms Aggregation
Using a terms aggregation, I've built a parent-child structure (Keyword being the parent and KeywordType the child), laid out as the tree below.
Bool Query
Terms Aggregation on Keyword.keyword
- Terms Aggregation on KeywordType.keyword
Aggregation Query:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"Keyword": "TEXT_REQ"
}
}
]
}
},
"aggs": {
"mykeywords": {
"terms": {
"field": "Keyword.keyword",
"size": 10
},
"aggs": {
"mytypes": {
"terms": {
"field": "KeywordType.keyword",
"size": 10
}
}
}
}
}
}
Sample Response:
{
"took" : 97,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"mykeywords" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "TEXT_REQ", <----- Parent Value i.e Keyword
"doc_count" : 4,
"mytypes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ <----- Children i.e. KeywordType
{
"key" : "TEXT_REQ_Type2",
"doc_count" : 2
},
{
"key" : "TEXT_REQ_Type1",
"doc_count" : 1
},
{
"key" : "TEXT_REQ_Type3",
"doc_count" : 1
}
]
}
}
]
}
}
}
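The nested buckets of Solution 2 can be flattened into keyword/keywordtype pairs in the same way. This is a rough Python sketch that assumes the response is already a parsed dict; flatten_nested_terms is a made-up helper name.

```python
def flatten_nested_terms(response, parent="mykeywords", child="mytypes"):
    """Walk parent buckets and their child buckets, emitting one row per pair."""
    rows = []
    for kw_bucket in response["aggregations"][parent]["buckets"]:
        for type_bucket in kw_bucket[child]["buckets"]:
            rows.append(
                {
                    "keyword": kw_bucket["key"],
                    "keywordtype": type_bucket["key"],
                    "doc_count": type_bucket["doc_count"],
                }
            )
    return rows


# Trimmed copy of the sample response above.
sample = {
    "aggregations": {
        "mykeywords": {
            "buckets": [
                {
                    "key": "TEXT_REQ",
                    "doc_count": 4,
                    "mytypes": {
                        "buckets": [
                            {"key": "TEXT_REQ_Type2", "doc_count": 2},
                            {"key": "TEXT_REQ_Type1", "doc_count": 1},
                            {"key": "TEXT_REQ_Type3", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

rows = flatten_nested_terms(sample)
for row in rows:
    print(row)
```

Unlike the composite version, both terms levels are capped by their "size" values (10 in the query above), so pairs beyond those limits are not returned.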
Let me know if this helps!

Related

Can Elastic Search do aggregations for within a document?

I have a mapping like this:
mappings: {
"seller": {
"properties" : {
"overallRating": {"type" : byte}
"items": [
{
itemName: {"type": string},
itemRating: {"type" : byte}
}
]
}
}
}
Each item will only have one itemRating. Each seller will only have one overall rating. There can be many items, and at most I'm expecting maybe 50 items with itemRatings. Not all items have to have an itemRating.
I'm trying to get an average rating for each seller that combines all itemRatings and the overallRating. I have looked into aggregations but all I have seen are aggregations for across all documents. The aggregation I'm looking to do is within the document itself, and I am not sure if that is possible. Any tips would be appreciated.
Yes, this is very much possible with Elasticsearch. To produce a combined rating, you simply need to bucket by the document ID and run the rating aggregation as a sub-aggregation. Each bucket then contains exactly one document, which is what you want.
Here is an example:
Create the index:
PUT /ratings
{
"mappings": {
"properties": {
"overallRating": {"type" : "float"},
"items": {
"type" : "nested",
"properties": {
"itemName" : {"type" : "keyword"},
"itemRating" : {"type" : "float"},
"overallRating": {"type" : "float"}
}
}
}
}
}
Add some data:
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "labrador",
"itemRating" : 10,
"overallRating" : 1
},
{
"itemName" : "saint bernard",
"itemRating" : 20,
"overallRating" : 1
}
]
}
POST ratings/_doc/
{
"overallRating" : 1,
"items" : [
{
"itemName" : "cat",
"itemRating" : 5,
"overallRating" : 1
},
{
"itemName" : "rat",
"itemRating" : 10,
"overallRating" : 1
}
]
}
Query the index for a combined rating per document (composite buckets come back ordered by document ID, not by rating):
GET ratings/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"average_rating": {
"composite": {
"sources": [
{
"ids": {
"terms": {
"field": "_id"
}
}
}
]
},
"aggs": {
"average_rating": {
"nested": {
"path": "items"
},
"aggs": {
"avg": {
"avg": {
"field": "items.compound"
}
}
}
}
}
}
},
"runtime_mappings": {
"items.compound": {
"type": "double",
"script": {
"source": "emit(doc['items.overallRating'].value + doc['items.itemRating'].value)"
}
}
}
}
The result (please note that I changed the rating values between writing the answer and running it in the console, so the averages differ from what the sample data above would give):
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"average_rating" : {
"after_key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"buckets" : [
{
"key" : {
"ids" : "3_Up44EBbR3hrRYkLsrC"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 151.0
}
}
},
{
"key" : {
"ids" : "3vUp44EBbR3hrRYkA8pj"
},
"doc_count" : 1,
"average_rating" : {
"doc_count" : 2,
"avg" : {
"value" : 8.5
}
}
}
]
}
}
}
One change for convenience:
I edited your mappings to add overallRating to each item entry. This simplifies the calculations that follow, because you only look inside the nested scope and never have to step out of it.
I also had to use a runtime mapping to combine overallRating and itemRating. The script sums each itemRating with the overallRating, and the avg aggregation then averages those sums across all items of the document.
I had to use a top-level composite aggregation on _id so that we get one result per document (which is what you want).
There is some pretty heavy lifting happening here, but it is entirely possible, and the query is easy to adapt to your needs.
HTH.
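To make the arithmetic concrete, here is a small Python sketch of what the runtime field plus the avg aggregation compute for one document, using the two sample documents above. It only imitates the calculation on the client. Both function names are invented for illustration, and combined_average is an alternative reading of the question (the mean of the overall rating and every item rating), not something the query above produces.

```python
from statistics import mean


def compound_average(doc):
    """Mean of (overallRating + itemRating) over the items of one document."""
    return mean(item["overallRating"] + item["itemRating"] for item in doc["items"])


def combined_average(doc):
    """Alternative: mean of the overall rating and every item rating present."""
    ratings = [doc["overallRating"]]
    ratings += [item["itemRating"] for item in doc["items"] if "itemRating" in item]
    return mean(ratings)


dogs = {
    "overallRating": 1,
    "items": [
        {"itemName": "labrador", "itemRating": 10, "overallRating": 1},
        {"itemName": "saint bernard", "itemRating": 20, "overallRating": 1},
    ],
}
pets = {
    "overallRating": 1,
    "items": [
        {"itemName": "cat", "itemRating": 5, "overallRating": 1},
        {"itemName": "rat", "itemRating": 10, "overallRating": 1},
    ],
}

print(compound_average(dogs), compound_average(pets))
print(combined_average(dogs), combined_average(pets))
```

For the second document the sums are 6 and 11, giving 8.5, which matches one of the buckets in the result above.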

Elasticsearch _search not providing results

I'm trying to return all name and count fields from my index. However, when I search, no data is returned (as shown in the last code block). I definitely have data in my index. What am I doing wrong in my _search command?
My mappings:
PUT /visual
{
"mappings": {
"properties": {
"#timestamp": {"type": "date"},
"name": {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"count": {"type": "integer"},
"err": {"type": "integer"},
"delta1": {"type": "integer"},
"str_list": {"type": "text"}
}
}
}
My search command where I have tried to return the name field, count field and timestamp:
POST visual/_search
{
"query":{
"range":{
"order_date":{
"gte":"now-80d"
}
}
},
"aggs": {
"names":{
"terms":{"field":"name.keyword"},
"aggs": {
"counts":{
"terms":{"field":"count"},
"aggs": {
"time_buckets": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "1h",
"extended_bounds": {
"min": "now-80d"
},
"min_doc_count": 0
}
}
}
}
}
}
},"size":100
}
The Response where no data has been returned:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
}
In your range query, you're using the field order_date, which doesn't exist in your mappings. Using #timestamp instead should solve the problem:
"query":{
"range":{
"#timestamp":{
"gte":"now-80d"
}
}
}
Check the range query doc for more information.
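Because the response above shows zero hits and no error, a typo in the field name is easy to miss. The following Python sketch checks a request body against the mapping before sending it. It assumes a single top-level range query, as in the question, and unmapped_range_fields is a hypothetical helper, not a client API.

```python
def unmapped_range_fields(body, mapping):
    """Return range-query field names that are missing from the index mapping.

    Only inspects a top-level "range" query, which is all the question uses.
    """
    known = set(mapping["mappings"]["properties"])
    used = body.get("query", {}).get("range", {})
    return sorted(field for field in used if field not in known)


# Mapping from the question, trimmed to the field names and types.
mapping = {
    "mappings": {
        "properties": {
            "#timestamp": {"type": "date"},
            "name": {"type": "text"},
            "count": {"type": "integer"},
            "err": {"type": "integer"},
            "delta1": {"type": "integer"},
            "str_list": {"type": "text"},
        }
    }
}

original = {"query": {"range": {"order_date": {"gte": "now-80d"}}}}
corrected = {"query": {"range": {"#timestamp": {"gte": "now-80d"}}}}

print(unmapped_range_fields(original, mapping))
print(unmapped_range_fields(corrected, mapping))
```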

Elasticsearch get average

I'm trying to compute an average aggregation in Elasticsearch. This is the structure of my data:
document 1
{
"groupId":"TEST_01",
"lag":10,
"detectionDate":"2021-02-26T21:42:30.010Z",
"tipo":"uno",
"topics":[
{
"name":"topic_01",
"valore":2
},
{
"name":"topic_02",
"valore":4
}
]
}
document 2
{
"groupId":"TEST_01",
"lag":10,
"detectionDate":"2021-02-26T21:42:30.010Z",
"tipo":"uno",
"topics":[
{
"name":"topic_01",
"valore":4
},
{
"name":"topic_02",
"valore":8
}
]
}
I have to aggregate by groupId and by topic name, and calculate the average of the valore field within each group. With the code below, however, the average I get is wrong.
For documents 1 and 2 above, the expected result is:
groupId | topicName | average
TEST_01 | topic_01  | 3
TEST_01 | topic_02  | 6
TermsAggregationBuilder aggregation = AggregationBuilders
.terms("groupId")
.field("groupId.keyword")
.subAggregation(AggregationBuilders
.terms("topicName")
.field("topics.name.keyword").subAggregation(AggregationBuilders
.avg("avg").field("topics.valore")));
First of all, make sure your topics field is of type "nested". If it is "object", the topic names and valore values are flattened, which means you end up with one set of names and one set of values per document, with no relation between them.
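The effect of flattening can be imitated in plain Python. This is a simplified model of how the values get grouped, not Elasticsearch code. The function names are made up for illustration, and the two documents are the ones from the question.

```python
from collections import defaultdict
from statistics import mean

docs = [
    {"topics": [{"name": "topic_01", "valore": 2}, {"name": "topic_02", "valore": 4}]},
    {"topics": [{"name": "topic_01", "valore": 4}, {"name": "topic_02", "valore": 8}]},
]


def avg_flattened(docs):
    """Model of "object" mapping: a document is a bag of names plus a bag of values."""
    per_name = defaultdict(list)
    for doc in docs:
        names = {t["name"] for t in doc["topics"]}
        values = [t["valore"] for t in doc["topics"]]
        for name in names:
            # Every value of the document is counted for every name it contains.
            per_name[name].extend(values)
    return {name: mean(vals) for name, vals in per_name.items()}


def avg_nested(docs):
    """Model of "nested" mapping: each name stays paired with its own value."""
    per_name = defaultdict(list)
    for doc in docs:
        for topic in doc["topics"]:
            per_name[topic["name"]].append(topic["valore"])
    return {name: mean(vals) for name, vals in per_name.items()}


print(avg_flattened(docs))
print(avg_nested(docs))
```

In the flattened model both topics average 4.5, because all four values land in both buckets. The nested model gives the expected 3 and 6.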
Mappings
{
"test_ynsanity" : {
"mappings" : {
"properties" : {
"detectionDate" : {
"type" : "date"
},
"groupId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lag" : {
"type" : "long"
},
"tipo" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"topics" : {
"type" : "nested",
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"valore" : {
"type" : "long"
}
}
}
}
}
}
}
Ingesting data
POST test_ynsanity/_doc
{
"groupId":"TEST_01",
"lag":10,
"detectionDate":"2021-02-26T21:42:30.010Z",
"tipo":"uno",
"topics":[
{
"name":"topic_01",
"valore":2
},
{
"name":"topic_02",
"valore":4
}
]
}
POST test_ynsanity/_doc
{
"groupId":"TEST_01",
"lag":10,
"detectionDate":"2021-02-26T21:42:30.010Z",
"tipo":"uno",
"topics":[
{
"name":"topic_01",
"valore":4
},
{
"name":"topic_02",
"valore":8
}
]
}
Query
POST test_ynsanity/_search
{
"size": 0,
"aggs": {
"groups": {
"terms": {
"field": "groupId.keyword",
"size": 10
},
"aggs": {
"topics": {
"nested": {
"path": "topics"
},
"aggs": {
"topic_names": {
"terms": {
"field": "topics.name.keyword"
},
"aggs": {
"topic_avg": {
"avg": {
"field": "topics.valore"
}
}
}
}
}
}
}
}
}
}
Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"groups" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "TEST_01",
"doc_count" : 2,
"topics" : {
"doc_count" : 4,
"topic_names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "topic_01",
"doc_count" : 2,
"topic_avg" : {
"value" : 3.0
}
},
{
"key" : "topic_02",
"doc_count" : 2,
"topic_avg" : {
"value" : 6.0
}
}
]
}
}
}
]
}
}
}
I have no access to the Java DSL right now, but the query should look something like this:
// Untested sketch: terms on groupId -> nested on "topics" -> terms on topic name -> avg
TermsAggregationBuilder aggregation = AggregationBuilders
.terms("groups")
.field("groupId.keyword")
.subAggregation(AggregationBuilders
.nested("topics", "topics")
.subAggregation(AggregationBuilders
.terms("topic_names")
.field("topics.name.keyword")
.subAggregation(AggregationBuilders
.avg("topic_avg").field("topics.valore"))));

How to use composite aggregation with a single bucket

The following composite aggregation query
{
"query": {
"range": {
"orderedAt": {
"gte": 1591315200000,
"lte": 1591438881000
}
}
},
"size": 0,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{
"aggregation_target": {
"terms": {
"field": "supplierId"
}
}
}
]
},
"aggs": {
"aggregated_hits": {
"top_hits": {}
},
"filter": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count > 2"
}
}
}
}
}
}
returns something like below.
{
"took" : 67,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 34,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_buckets" : {
"after_key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"buckets" : [
{
"key" : {
"aggregation_target" : "0HQI2G0K000100G8"
},
"doc_count" : 4,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G18G00100G8"
},
"doc_count" : 11,
"aggregated_hits" : {...}
},
{
"key" : {
"aggregation_target" : "0HQI2G2HG00100G8"
},
"doc_count" : 16,
"aggregated_hits" : {...}
}
]
}
}
}
The aggregated results are put into buckets based on the condition set in the query.
Is there any way to put them in a single bucket and paginate through the whole result (i.e. 31 documents in this case)?
I don't think you can. A doc's context doesn't include information about other docs unless you perform a cardinality, scripted_metric or terms aggregation. Also, once you bucket your docs based on the supplierId, it'd sort of defeat the purpose of aggregating in the first place...
What you wrote above is as good as it gets, and you'll have to combine the aggregated_hits in a post-processing step.
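Such a post-processing step could look like the Python sketch below, which merges the hits of every bucket into one list and slices it into pages. It assumes the response is a parsed dict. The helper names and the "_id" values in the sample are hypothetical, since the original response elides the contents of aggregated_hits.

```python
def merge_bucket_hits(response, agg_name="my_buckets", hits_name="aggregated_hits"):
    """Concatenate the top_hits of every composite bucket into one list."""
    merged = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        merged.extend(bucket[hits_name]["hits"]["hits"])
    return merged


def page(items, page_number, page_size):
    """Return one zero-based page of a list."""
    start = page_number * page_size
    return items[start:start + page_size]


# Hypothetical, heavily trimmed response: two buckets with made-up hit IDs.
response = {
    "aggregations": {
        "my_buckets": {
            "buckets": [
                {
                    "key": {"aggregation_target": "0HQI2G0K000100G8"},
                    "doc_count": 3,
                    "aggregated_hits": {"hits": {"hits": [{"_id": "a1"}, {"_id": "a2"}, {"_id": "a3"}]}},
                },
                {
                    "key": {"aggregation_target": "0HQI2G18G00100G8"},
                    "doc_count": 3,
                    "aggregated_hits": {"hits": {"hits": [{"_id": "b1"}, {"_id": "b2"}, {"_id": "b3"}]}},
                },
            ]
        }
    }
}

all_hits = merge_bucket_hits(response)
print([hit["_id"] for hit in page(all_hits, 1, 4)])
```

Note that top_hits returns only three hits per bucket unless its "size" is set, so the query would need a larger value there to collect every document of a bucket.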

Elasticsearch aggregation on different search in same query

I want to make a query whose aggregations are based only on the match clause, no matter which other filters (terms, term, etc.) are applied.
To be more specific, I have an online shop where I use multiple filters (color, size, etc.). If I select a value, for example color: red, the other colors are no longer aggregated.
My current solution is to run two separate queries (one for the search, where the filters are applied, and one for the aggregations). Any idea how I can combine the two queries?
You can take advantage of post_filter which will not apply to your aggregations but will only filter the to-be-returned hits. For example:
Create a shop
PUT online_shop
{
"mappings": {
"properties": {
"color": {
"type": "keyword"
},
"size": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Populate it with a few products
POST online_shop/_doc
{"color":"red","size":35,"name":"Louboutin High heels abc"}
POST online_shop/_doc
{"color":"black","size":34,"name":"Louboutin Boots abc"}
POST online_shop/_doc
{"color":"yellow","size":36,"name":"XYZ abc"}
Apply a shared query to the hits as well as aggregations and use post_filter to ... post-filter the hits:
GET online_shop/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "abc"
}
}
]
}
},
"aggs": {
"by_color": {
"terms": {
"field": "color"
}
},
"by_size": {
"terms": {
"field": "size"
}
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"color": {
"value": "red"
}
}
}
]
}
}
}
Expected result
{
...
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.11750763,
"hits" : [
{
"_index" : "online_shop",
"_type" : "_doc",
"_id" : "cehma3IBG_KW3EFn1QYa",
"_score" : 0.11750763,
"_source" : {
"color" : "red",
"size" : 35,
"name" : "Louboutin High heels abc"
}
}
]
},
"aggregations" : {
"by_color" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "black",
"doc_count" : 1
},
{
"key" : "red",
"doc_count" : 1
},
{
"key" : "yellow",
"doc_count" : 1
}
]
},
"by_size" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 34,
"doc_count" : 1
},
{
"key" : 35,
"doc_count" : 1
},
{
"key" : 36,
"doc_count" : 1
}
]
}
}
}
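In application code, this pattern means the free-text search goes into "query" while the user's selected facet values go into "post_filter". Here is a minimal Python sketch that builds such a request body. The function name build_search_body and its parameters are assumptions for illustration; it produces a plain dict and does not call Elasticsearch.

```python
def build_search_body(text, selected_filters, facet_fields):
    """Build a search body whose aggregations ignore the selected facet filters."""
    body = {
        # Shared query: applies to both the hits and the aggregations.
        "query": {"bool": {"must": [{"match": {"name": text}}]}},
        # One terms aggregation per facet, named like the example above.
        "aggs": {f"by_{field}": {"terms": {"field": field}} for field in facet_fields},
    }
    if selected_filters:
        # Applied after aggregations are computed, so it narrows only the hits.
        body["post_filter"] = {
            "bool": {
                "must": [
                    {"term": {field: {"value": value}}}
                    for field, value in selected_filters.items()
                ]
            }
        }
    return body


body = build_search_body("abc", {"color": "red"}, ["color", "size"])
print(body)
```

With no filters selected, the body simply omits "post_filter", and hits and aggregations cover the same documents.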
