Can I pass the result of an aggregation to a range aggregation? - elasticsearch

I have the following dataset:
[
{
"rating": "10",
"subject": "maths"
},
{
"rating": "9",
"subject": "physics"
},
{
"rating": "10",
"subject": "chemistry"
},
{
"rating": "5",
"subject": "physics"
},
{
"rating": "2",
"subject": "geography"
},
{
"rating": "5",
"subject": "maths"
},
{
"rating": "1",
"subject": "geography"
},
{
"rating": "5",
"subject": "maths"
},
{
"rating": "8",
"subject": "chemistry"
}
]
I need to find the average rating for each subject, and then count the number of subjects whose average falls into each rating range (0-2, 2-5, 5-8, 8-10), using an Elasticsearch query.
The query I have so far creates a bucket for each subject and calculates the average rating of each bucket, but I can't find a way to run a range aggregation on the result of the composite aggregation. Is that even possible? Is there an alternative?
Here is my query, which buckets the data by subject and calculates the average rating:
GET kibana_sample/_search
{
"size":0,
"aggs" : {
"my_buckets": {
"composite" : {
"sources" : [
{ "subject": { "terms" : { "field": "subject" } } }
]
},
"aggs": {
"avg_rating": {
"avg" : { "field" : "rating" }
}
}
}
}
}
It results in the following.
"aggregations": {
"my_buckets": {
"buckets": [
{
"key": {
"subject": "maths"
},
"doc_count": 3,
"avg_rating": {
"value": 6.66666667
}
},
{
"key": {
"subject": "physics"
},
"doc_count": 2,
"avg_rating": {
"value": 7
}
},
{
"key": {
"subject": "chemistry"
},
"doc_count": 2,
"avg_rating": {
"value": 9
}
},
{
"key": {
"subject": "geography"
},
"doc_count": 2,
"avg_rating": {
"value": 1.5
}
}
]
}
}
That works, but now I need to run a range aggregation on top of this result to get the number of subjects in each rating range, e.g.:
ratings range: {0-2}: 1 subject, {2-5}: 0 subjects, {5-8}: 2 subjects, {8-10}: 1 subject

You can use pipeline aggregations to feed the results of one aggregation into further aggregations. You can also use a script inside a pipeline aggregation to keep only the relevant buckets.
See the script examples here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
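Since the question also asks for an alternative, one option is to count the per-subject averages into ranges on the client. The following is a minimal sketch, not part of the original answer. It assumes the response shape shown in the question. The function name `count_subjects_by_range` is hypothetical, and treating each range as lower-inclusive and upper-exclusive (with the last range also including its upper bound) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Range boundaries taken from the question: 0-2, 2-5, 5-8, 8-10.
RANGES: List[Tuple[int, int]] = [(0, 2), (2, 5), (5, 8), (8, 10)]


def count_subjects_by_range(buckets: List[dict]) -> Dict[str, int]:
    """Count how many subject buckets have an average rating in each range.

    Assumption: each range is [low, high), except the last one, which also
    includes its upper bound so that an average of exactly 10 is counted.
    """
    counts = {f"{low}-{high}": 0 for low, high in RANGES}
    for bucket in buckets:
        avg = bucket["avg_rating"]["value"]
        if avg is None:  # a bucket without values has nothing to classify
            continue
        for i, (low, high) in enumerate(RANGES):
            is_last = i == len(RANGES) - 1
            if low <= avg < high or (is_last and avg == high):
                counts[f"{low}-{high}"] += 1
                break
    return counts


# Buckets copied from the response in the question.
buckets = [
    {"key": {"subject": "maths"}, "doc_count": 3, "avg_rating": {"value": 6.66666667}},
    {"key": {"subject": "physics"}, "doc_count": 2, "avg_rating": {"value": 7}},
    {"key": {"subject": "chemistry"}, "doc_count": 2, "avg_rating": {"value": 9}},
    {"key": {"subject": "geography"}, "doc_count": 2, "avg_rating": {"value": 1.5}},
]

print(count_subjects_by_range(buckets))
# {'0-2': 1, '2-5': 0, '5-8': 2, '8-10': 1}
```

This only works while the number of subjects is small enough to page through with the composite aggregation; it trades a second aggregation step for a loop in application code.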

Related

Elasticsearch aggregation to retrieve array of values associated with another value

I am working with an Elasticsearch index with data like this:
"_source": {
"article_number": "123456",
"title": "Example item #1",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Grey"
}
]
},
"_source": {
"article_number": "654321",
"title": "Example item #2",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Red"
}
]
}
The goal is to dynamically generate search inputs in a page where there is one search input for each unique value of attributes.key and within that input one value for each corresponding value of attributes.value. So in this case I would want to render a "Type" input offering only the value "Bag" and a "Color" input offering the values "Grey" and "Red."
I am trying to accomplish this with an aggregation that will give me a unique set of all values of attributes.key along with an array of all the values of attributes.value that are associated with each key. An example of a result that would fit what I am hoping for would be this:
[
{
"key": "Type",
"values": [{
"name": "Bag",
"doc_count": 2
}]
},
{
"key": "Color",
"values": [{
"name": "Grey",
"doc_count": 1
},
{
"name": "Red",
"doc_count": 1
}]
}
]
I have tried nested and reverse nested aggregations, as well as composite aggregations, but so far without success.
Assuming your index mapping looks like this:
PUT attrs
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
you can achieve the desired results with the following combination of a nested terms aggregation and its sub-aggregation:
POST attrs/_search
{
"size": 0,
"aggs": {
"nested_context": {
"nested": {
"path": "attributes"
},
"aggs": {
"by_keys": {
"terms": {
"field": "attributes.key",
"size": 10
},
"aggs": {
"by_values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
}
}
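The response to this query nests the value buckets under each key bucket. As a rough sketch, it can be reshaped into the structure the question asks for as follows. The helper name `to_inputs` is hypothetical, and the sample response is written by hand from the two documents in the question rather than captured from a cluster.

```python
from typing import Any, Dict, List


def to_inputs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reshape the nested terms aggregation into key -> values entries."""
    key_buckets = response["aggregations"]["nested_context"]["by_keys"]["buckets"]
    return [
        {
            "key": key_bucket["key"],
            "values": [
                {"name": value_bucket["key"], "doc_count": value_bucket["doc_count"]}
                for value_bucket in key_bucket["by_values"]["buckets"]
            ],
        }
        for key_bucket in key_buckets
    ]


# Hand-written sample shaped like the aggregation response (an assumption).
sample_response = {
    "aggregations": {
        "nested_context": {
            "doc_count": 4,
            "by_keys": {
                "buckets": [
                    {
                        "key": "Color",
                        "doc_count": 2,
                        "by_values": {
                            "buckets": [
                                {"key": "Grey", "doc_count": 1},
                                {"key": "Red", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "Type",
                        "doc_count": 2,
                        "by_values": {"buckets": [{"key": "Bag", "doc_count": 2}]},
                    },
                ]
            },
        }
    }
}

for entry in to_inputs(sample_response):
    print(entry["key"], [value["name"] for value in entry["values"]])
```

Each entry can then drive one search input, with `values` supplying its options.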

Elasticsearch 7.8 Nested Aggregation not returning correct data

I have been struggling for a week to get correct data out of an Elasticsearch nested aggregation. Below are my index mapping and two sample documents. What I want to do is:
Match all documents where the field xforms.sentence.tokens.value equals 24
Within the matched set of documents, count the matches grouped by xforms.sentence.tokens.tag, considering only tokens where xforms.sentence.tokens.value equals 24
For example, for the sample documents below, the output I expect is:
{"JJ": 1, "NN": 1}
{
"_doc": {
"_meta": {},
"_source": {},
"properties": {
"originalText": {
"type": "text"
},
"testDataId": {
"type": "text"
},
"xforms": {
"type": "nested",
"properties": {
"sentence": {
"type": "nested"
},
"predicate": {
"type": "nested"
}
}
},
"corpusId": {
"type": "text"
},
"row": {
"type": "text"
},
"batchId": {
"type": "text"
},
"processor": {
"type": "text"
}
}
}
}
A sample doc inserted is as follows:
{
"_id": "28",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fdf15",
"originalText": "Some text with the word 24",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "Some",
"index": 1,
"after": " ",
"tag": "JJ",
"value": "Some"
},
{
"lemma": "text",
"index": 2,
"after": " ",
"tag": "NN",
"value": "text"
},
{
"lemma": "with",
"index": 3,
"after": " ",
"tag": "NN",
"value": "with"
},
{
"lemma": "the",
"index": 4,
"after": "",
"tag": "CD",
"value": "the"
},
{
"lemma": "word",
"index": 5,
"after": " ",
"tag": "CC",
"value": "word"
},
{
"lemma": "24",
"index": 6,
"after": " ",
"tag": "JJ",
"value": "24"
}
],
"type": "RAW"
},
"originalSentence": "Some text with the word 24 in it",
"id": "e724611d8c024bcb8f0158b60e3df87e"
}]
}
},
{
"_id": "56",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fad15",
"originalText": "24 word",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "24",
"index": 1,
"after": " ",
"tag": "NN",
"value": "24"
},
{
"lemma": "word",
"index": 2,
"after": " ",
"tag": "JJ",
"value": "word"
}
],
"type": "RAW"
},
"originalSentence": "24 word",
"id": "e724611d8c024bcb8f0158b60e3d123"
}]
}
}
Expanding on @Gibbs's answer: @N Kiram, you'll need to map the tokens as nested too:
{
"xforms":{
"type":"nested",
"properties":{
"sentence":{
"type":"nested",
"properties":{
"tokens":{ <----
"type":"nested"
}
}
},
"predicate":{
"type":"nested"
}
}
}
}
Then and only then will your aggs yield the correct counts:
{
"aggregations":{
"xforms":{
"doc_count":8,
"inner":{
"doc_count":2,
"tag_count":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"JJ",
"doc_count":1
},
{
"key":"NN",
"doc_count":1
}
]
}
}
}
}
}
Side note: you'll have to reindex in order for the changed mapping to apply.
{
"aggs": {
"xforms": {
"nested": { //Nested aggregation
"path": "xforms.sentence"
},
"aggs": {
"inner": { //Counting only within the matching doc
"filter": {
"bool": {
"filter": { //Filtering docs with value=24
"terms": {
"xforms.sentence.tokens.value": [
"24"
]
}
}
}
},
"aggs" : {
"tag_count":{ //On filtered doc, doing terms aggregation on tag's keyword version as tag is of type text
"terms":{
"field":"xforms.sentence.tokens.tag.keyword"
}
}
}
}
}
}
}
}
It produces the following output:
"aggregations": {
"xforms": {
"doc_count": 2,
"inner": {
"doc_count": 2,
"tag_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "JJ",
"doc_count": 2
},
{
"key": "NN",
"doc_count": 2
},
{
"key": "CC",
"doc_count": 1
},
{
"key": "CD",
"doc_count": 1
}
]
}
}
}
}
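To see why the output above counts every tag in the matching documents, while the mapping with nested tokens yields only JJ and NN, the two behaviours can be imitated in plain Python. This is only an illustrative sketch: the function names are made up, and it models the matching semantics rather than calling Elasticsearch.

```python
from collections import Counter
from typing import Dict, List

# Tokens reduced to the two fields that matter, copied from the sample documents.
DOCS: Dict[str, List[dict]] = {
    "28": [
        {"tag": "JJ", "value": "Some"},
        {"tag": "NN", "value": "text"},
        {"tag": "NN", "value": "with"},
        {"tag": "CD", "value": "the"},
        {"tag": "CC", "value": "word"},
        {"tag": "JJ", "value": "24"},
    ],
    "56": [
        {"tag": "NN", "value": "24"},
        {"tag": "JJ", "value": "word"},
    ],
}


def count_tags_flattened(docs: Dict[str, List[dict]], wanted: str) -> Dict[str, int]:
    """Tokens are not nested, so value and tag lose their pairing.

    A sentence matches if any of its tokens has the wanted value, and then
    every distinct tag in that sentence is counted once.
    """
    counts: Counter = Counter()
    for tokens in docs.values():
        if any(token["value"] == wanted for token in tokens):
            counts.update({token["tag"] for token in tokens})
    return dict(counts)


def count_tags_nested(docs: Dict[str, List[dict]], wanted: str) -> Dict[str, int]:
    """Tokens are nested, so each token is matched and counted on its own."""
    counts: Counter = Counter()
    for tokens in docs.values():
        counts.update(token["tag"] for token in tokens if token["value"] == wanted)
    return dict(counts)


print(count_tags_flattened(DOCS, "24"))  # JJ: 2, NN: 2, CD: 1, CC: 1
print(count_tags_nested(DOCS, "24"))     # JJ: 1, NN: 1
```

The flattened variant reproduces the counts in the output above, and the nested variant reproduces the expected {"JJ": 1, "NN": 1}.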

Doing a Range Query over particular Nested Document

I have a document structure like the one below. Each of the two documents contains nested documents under interactionInfo. I need to get only the documents that have an entry with the title duration whose value is greater than 60.
[
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "11"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "145"
}
]
},
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "120"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "60"
}
]
}
]
Is it possible to get only the documents that have an entry with title duration whose value is greater than 60? The value property in the nested document is mapped as text and keyword.
There are a few basic mistakes in your approach. To use a range query (i.e., to find documents with a value greater than 60), you need to store the values as integers rather than strings.
Also, please refer to the official guide, which has a similar example.
Let me show you a step-by-step example on how to do it.
Index def
{
"mappings" :{
"properties" :{
"interactionInfo" :{
"type" : "nested"
},
"key" : {
"type" : "keyword"
}
}
}
}
Index sample docs
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120 --> note: not wrapped in "" double quotes, so it is stored as a number
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 11
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 145
}
]
}
Search query
{
"query": {
"nested": {
"path": "interactionInfo",
"query": {
"bool": {
"must": [
{
"match": {
"interactionInfo.title": "duration"
}
},
{
"range": {
"interactionInfo.value": {
"gt": 60
}
}
}
]
}
}
}
}
}
And your expected search result
"hits": [
{
"_index": "nestedsoint",
"_type": "_doc",
"_id": "2",
"_score": 2.0296195,
"_source": {
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
}
]
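The two points made in this answer (both conditions must hold for the same nested object, and the values must be numeric) can be illustrated outside Elasticsearch. The sketch below only imitates the matching logic in plain Python. The function name `matches_nested` is hypothetical, and the documents are the samples from the answer reduced to the nested field.

```python
from typing import List


def matches_nested(doc: dict, title: str, minimum: float) -> bool:
    """Return True if a single interactionInfo entry satisfies both conditions.

    This mirrors what the nested query does: the title and the range must
    hold for the same nested object, not for any two objects in the array.
    """
    return any(
        entry["title"] == title and entry["value"] > minimum
        for entry in doc["interactionInfo"]
    )


# Sample documents from the answer, reduced to the nested field.
docs: List[dict] = [
    {"interactionInfo": [
        {"title": "duration", "value": 120},
        {"title": "timetaken", "value": 9},
        {"title": "talk_time", "value": 60},
    ]},
    {"interactionInfo": [
        {"title": "duration", "value": 11},
        {"title": "timetaken", "value": 9},
        {"title": "talk_time", "value": 145},
    ]},
]

hits = [doc for doc in docs if matches_nested(doc, "duration", 60)]
print(len(hits))  # 1: only the document whose duration is 120 matches

# Comparing the values as strings orders them character by character,
# which is why quoted numbers do not behave like a numeric range.
print("9" > "60")  # True
```

The second document is not returned even though its talk_time is 145, because that value belongs to a different nested object than the duration title.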

Is there any solution with elasticsearch parent-child join

I have an Elasticsearch mapping like the following:
PUT /test
{
"mappings": {
"doc": {
"properties": {
"status": {
"type": "keyword"
},
"counting": {
"type": "integer"
},
"join": {
"type": "join",
"relations": {
"vsim": ["pool", "package"]
}
},
"poolId": {
"type": "keyword"
},
"packageId": {
"type": "keyword"
},
"countries": {
"type": "keyword"
},
"vId": {
"type": "keyword"
}
}
}
}}
Then add data:
// add vsim
PUT /test/doc/doc1
{"counting":6, "join": {"name": "vsim"}, "content": "1", "status": "disabled"}
PUT /test/doc/doc2
{"counting":5,"join": {"name": "vsim"}, "content": "2", "status": "disabled"}
PUT /test/doc/doc3
{"counting":5,"join": {"name": "vsim"}, "content": "2", "status": "enabled"}
// add package
PUT /test/doc/ner2?routing=doc2
{"join": {"name": "package", "parent": "doc2"}, "countries":["CN", "UK"]}
PUT test/doc/ner12?routing=doc1
{"join": {"name": "package", "parent": "doc1"}, "countries":["CN", "US"]}
PUT /test/doc/ner11?routing=doc1
{"join":{"name": "package", "parent": "doc1"}, "countries":["US", "KR"]}
PUT /test/doc/ner13?routing=doc3
{"join":{"name": "package", "parent": "doc3"}, "countries":["UK", "AU"]}
// add pool
PUT /test/doc/ner21?routing=doc1
{"join": {"name": "pool", "parent": "doc1"}, "poolId": "MER"}
PUT /test/doc/ner22?routing=doc2
{"join": {"name": "pool", "parent": "doc2"}, "poolId": "MER"}
PUT /test/doc/ner23?routing=doc2
{"join": {"name": "pool", "parent": "doc2"}, "poolId": "NER"}
Then I want to aggregate counting grouped by status (vsim), poolId (pool) and countries (package). The expected result looks like:
disabled-MER-CN: 3
disabled-MER-US: 3
enabled-MR-CN: 1
... and so on.
I'm new to Elasticsearch, and I have read documentation such as
https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-children-aggregation.html
but I still have no idea how to implement this aggregation query. Please give me some suggestions, thanks!
If I understood your document structure correctly, the pool and package types are on the same level (they are siblings), and I wasn't able to produce exactly your expected results. I also highly doubt that it's possible while those types are siblings.
However, it's still possible to slice by one field of the parent document (status) and then, separately, by poolId and by countries, with a query like this:
{
"aggs": {
"status-aggs": {
"terms": {
"field": "status",
"size": 10
},
"aggs": {
"to-pool": {
"children": {
"type": "pool"
},
"aggs": {
"top-poolid": {
"terms": {
"field": "poolId",
"size": 10
}
}
}
},
"to-package": {
"children": {
"type": "package"
},
"aggs": {
"top-countries": {
"terms": {
"field": "countries",
"size": 10
}
}
}
}
}
}
}
}
The response from Elasticsearch looks like this (I've omitted some parts of the JSON for readability):
{
"status-aggs": {
"buckets": [
{
"key": "disabled",
"doc_count": 2,
"to-pool": {
"doc_count": 3,
"top-poolid": {
"buckets": [
{
"key": "MER",
"doc_count": 2
},
{
"key": "NER",
"doc_count": 1
}
]
}
},
"to-package": {
"doc_count": 3,
"top-countries": {
"buckets": [
{
"key": "CN",
"doc_count": 2
},
{
"key": "US",
"doc_count": 2
},
{
"key": "KR",
"doc_count": 1
},
{
"key": "UK",
"doc_count": 1
}
]
}
}
},
{
"key": "enabled",
"doc_count": 1,
"to-pool": {
"doc_count": 0,
"top-poolid": {
"buckets": []
}
},
"to-package": {
"doc_count": 1,
"top-countries": {
"buckets": [
{
"key": "AU",
"doc_count": 1
},
{
"key": "UK",
"doc_count": 1
}
]
}
}
}
]
}
}
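Because the two children aggregations are returned side by side, the closest approximation to the requested "status-poolId" and "status-country" keys is to flatten each branch separately on the client. A minimal sketch, assuming the response shape shown above, is given here. The function name `flatten_status_aggs` is hypothetical, and the code does not attempt the three-way combination, which the answer doubts is possible.

```python
from typing import Dict, Tuple


def flatten_status_aggs(status_aggs: dict) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Turn the nested buckets into flat 'status-poolId' and 'status-country' counts."""
    by_pool: Dict[str, int] = {}
    by_country: Dict[str, int] = {}
    for status_bucket in status_aggs["buckets"]:
        status = status_bucket["key"]
        for bucket in status_bucket["to-pool"]["top-poolid"]["buckets"]:
            by_pool[f"{status}-{bucket['key']}"] = bucket["doc_count"]
        for bucket in status_bucket["to-package"]["top-countries"]["buckets"]:
            by_country[f"{status}-{bucket['key']}"] = bucket["doc_count"]
    return by_pool, by_country


# The "status-aggs" object from the response above.
status_aggs = {
    "buckets": [
        {
            "key": "disabled",
            "doc_count": 2,
            "to-pool": {"doc_count": 3, "top-poolid": {"buckets": [
                {"key": "MER", "doc_count": 2},
                {"key": "NER", "doc_count": 1},
            ]}},
            "to-package": {"doc_count": 3, "top-countries": {"buckets": [
                {"key": "CN", "doc_count": 2},
                {"key": "US", "doc_count": 2},
                {"key": "KR", "doc_count": 1},
                {"key": "UK", "doc_count": 1},
            ]}},
        },
        {
            "key": "enabled",
            "doc_count": 1,
            "to-pool": {"doc_count": 0, "top-poolid": {"buckets": []}},
            "to-package": {"doc_count": 1, "top-countries": {"buckets": [
                {"key": "AU", "doc_count": 1},
                {"key": "UK", "doc_count": 1},
            ]}},
        },
    ]
}

by_pool, by_country = flatten_status_aggs(status_aggs)
print(by_pool)     # {'disabled-MER': 2, 'disabled-NER': 1}
print(by_country)
```

The values here are child document counts, not sums of the counting field, so they will not match the numbers in the question.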

Elastic Search multi-value field aggregation

My indexed documents have a schema:
{
...
'authors': [{'first name': 'John', 'last name': 'Smith'},
{'first name': 'Mark', 'last name': 'Spencer'}]
...
}
I would like to search them and aggregate by individual author, to get a list of the top authors that occur in my hits. A terms aggregation seems to match my needs, but I can't get it working for a field that holds a list of values. Any help?
You will probably want to use a nested type, then you can use a nested aggregation on the author names.
As an example, I set up a simple index like this:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"title": {
"type": "string"
},
"authors": {
"type": "nested",
"properties": {
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
}
}
}
}
}
}
}
Then added a couple of docs:
PUT /test_index/doc/1
{
"title": "Book 1",
"authors": [
{
"first_name": "John",
"last_name": "Smith"
},
{
"first_name": "Mark",
"last_name": "Spencer"
}
]
}
PUT /test_index/doc/2
{
"title": "Book 2",
"authors": [
{
"first_name": "Ben",
"last_name": "Jones"
},
{
"first_name": "Tom",
"last_name": "Lawrence"
}
]
}
Then I can get the list of (analyzed) author last names with:
POST /test_index/_search?search_type=count
{
"aggs": {
"nested_authors": {
"nested": {
"path": "authors"
},
"aggs": {
"author_last_names": {
"terms": {
"field": "authors.last_name"
}
}
}
}
}
}
...
{
"took": 71,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"nested_authors": {
"doc_count": 4,
"author_last_names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "jones",
"doc_count": 1
},
{
"key": "lawrence",
"doc_count": 1
},
{
"key": "smith",
"doc_count": 1
},
{
"key": "spencer",
"doc_count": 1
}
]
}
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/ca94cc11a12f8e4fed5c62c52966128b9a6f58de
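To turn that response into the list of top authors the question asks for, the buckets can be read out on the client. This is a small sketch assuming the response shape shown above. The helper name `top_authors` is hypothetical, and the keys come back lowercased because the field is analyzed.

```python
from typing import List, Tuple


def top_authors(response: dict, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (last_name, doc_count) pairs, most frequent first."""
    buckets = response["aggregations"]["nested_authors"]["author_last_names"]["buckets"]
    # sorted() is stable, so authors with equal counts keep the response order.
    ranked = sorted(buckets, key=lambda bucket: bucket["doc_count"], reverse=True)
    return [(bucket["key"], bucket["doc_count"]) for bucket in ranked[:limit]]


# The aggregations part of the response above.
response = {
    "aggregations": {
        "nested_authors": {
            "doc_count": 4,
            "author_last_names": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "jones", "doc_count": 1},
                    {"key": "lawrence", "doc_count": 1},
                    {"key": "smith", "doc_count": 1},
                    {"key": "spencer", "doc_count": 1},
                ],
            },
        }
    }
}

print(top_authors(response, limit=2))  # [('jones', 1), ('lawrence', 1)]
```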
