Elasticsearch aggregation on nested objects

I have a document with the following mappings:
{
"some_doc_name": {
"mappings": {
"_doc": {
"properties": {
"stages": {
"properties": {
"name": {
"type": "text"
},
"durationMillis": {
"type": "long"
}
}
}
}
}
}
}
}
And I would like to have an aggregation like: "The average duration of the stages whose name contains the SCM token"
I tried something like:
{
"aggs": {
"scm_stage": {
"filter": {
"bool": {
"should": [{
"match_phrase": {
"stages.name": "SCM"
}
}]
}
},
"aggs" : {
"avg_duration": {
"avg": {
"field": "stages.durationMillis"
}
}
}
}
}
}
But that's giving me the average of all stages for all documents that contain at least one stage with the SCM token. Any advice on how to get this aggregation right?

Answering my own question, thanks to help from val.
My mappings were missing "type": "nested" on the stages field. They should look something like:
...
"stages": {
"type": "nested",
"properties": {
"id": {
"type": "keyword",
"ignore_above": 256
},
...
Then I can get my aggregation working with something like this:
{
"size": 0,
"query": {
"nested": {
"path": "stages",
"query": {
"match": {
"stages.name": "scm"
}
}
}
},
"aggs": {
"stages": {
"nested": {
"path": "stages"
},
"aggs": {
"stages-filter": {
"filter": {
"terms": {
"stages.name": [
"scm"
]
}
},
"aggs": {
"avg_duration": {
"avg": {
"field": "stages.durationMillis"
}
}
}
}
}
}
}
}
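To see why the mapping change matters, here is a minimal Python sketch of the two behaviours. It is only a simulation, not Elasticsearch itself: the sample documents, the helper names and the whitespace-based token match are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical sample documents; names and durations are made up.
docs = [
    {"stages": [{"name": "SCM checkout", "durationMillis": 100},
                {"name": "Build", "durationMillis": 900}]},
    {"stages": [{"name": "Test", "durationMillis": 500}]},
]

def has_token(name, token):
    # Rough stand-in for an analyzed match on a text field.
    return token.lower() in name.lower().split()

def avg_object_mapping(docs, token):
    """Default object mapping: stage fields are flattened per document, so
    the filter selects whole documents and avg sees every duration in them."""
    values = []
    for doc in docs:
        if any(has_token(s["name"], token) for s in doc["stages"]):
            values.extend(s["durationMillis"] for s in doc["stages"])
    return mean(values) if values else None

def avg_nested_mapping(docs, token):
    """Nested mapping: each stage is matched on its own, so only the
    durations of matching stages are averaged."""
    values = [s["durationMillis"]
              for doc in docs for s in doc["stages"]
              if has_token(s["name"], token)]
    return mean(values) if values else None

# The first call averages both stages of the first document,
# the second only the SCM stage.
print(avg_object_mapping(docs, "scm"))
print(avg_nested_mapping(docs, "scm"))
```

The first function reproduces the symptom from the question (the Build stage is pulled into the average), while the second corresponds to the nested plus filter aggregation above.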

Related

Elasticsearch nested query with aggregation using nested term doesn't return any bucket

I have an ES index with this mapping:
{
"_doc": {
"dynamic": "false",
"properties": {
"original": {
"properties":{
"id": {
"type": "keyword"
},
"purchaseStatus": {
"type": "keyword"
},
"marketCode": {
"type": "keyword"
},
"salesProfiles": {
"type": "nested",
"properties": {
"marketCode": {
"type": "keyword"
},
"purchaseStatus": {
"type": "keyword"
}
}
}
}
},
"recommended": {
"properties":{
"id": {
"type": "keyword"
},
"purchaseStatus": {
"type": "keyword"
},
"marketCode": {
"type": "keyword"
},
"salesProfiles": {
"type": "nested",
"properties": {
"marketCode": {
"type": "keyword"
},
"purchaseStatus": {
"type": "keyword"
}
}
}
}
},
"distance": {
"type": "double"
},
"rank": {
"type": "double"
},
"source": {
"properties": {
"application": {
"type": "keyword"
},
"platform": {
"type": "keyword"
}
}
},
"timestamp": {
"properties": {
"createdAt": {
"type": "date"
},
"updatedAt": {
"type": "date"
}
}
}
}
},
"_default_": {
"dynamic": "false"
}
}
I need to obtain the recommended docs whose salesProfiles.marketCode equals original.marketCode, but my query doesn't return any buckets:
GET index/_search
{
"aggs": {
"similarities": {
"filter": {
"bool": {
"must": [
{
"term": {
"original.storefrontId": "12345"
}
},
{
"nested": {
"path": "recommended.salesProfiles",
"query": {
"bool": {
"must": [
{
"match": {
"recommended.salesProfiles.purchaseStatus": "PAID"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"markets": {
"nested": {
"path": "recommended.salesProfiles"
},
"aggs": {
"recommendedMarket": {
"terms": {
"field": "recommended.salesProfiles.marketCode",
"size": 100
}
}
}
}
}
}
},
"explain": false
}
Any suggestion would be really appreciated. Thanks in advance!
It's hard to debug this without any example docs, but I think this might work:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"original.storefrontId": "12345"
}
},
{
"nested": {
"path": "recommended.salesProfiles",
"query": {
"bool": {
"must": [
{
"match": {
"recommended.salesProfiles.purchaseStatus": "PAID"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"Profiles": {
"nested": {
"path": "recommended.salesProfiles"
},
"aggs": {
"by_term": {
"terms": {
"field": "recommended.salesProfiles.marketCode",
"size": 100
}
}
}
}
}
}
I don't think you can use "nested" under the filter agg without being under a nested aggregation, so I believe that's why you didn't get any docs.
I basically moved all the filtering into the query and then aggregated the terms afterwards.
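One more thing may be worth checking. The mapping as posted sets "dynamic": "false" and does not list original.storefrontId. With dynamic mapping disabled, unmapped fields stay in _source but are not indexed, so a term query on them matches nothing. Assuming the posted mapping is complete (it may have been abbreviated), a small check like the following sketch would flag the field. The helper name is_mapped is hypothetical, and the mapping below is a shortened copy of the one in the question.

```python
def is_mapped(properties, dotted_path):
    """Return True if dotted_path resolves to a field in a mapping's
    "properties" tree (object and nested fields are walked via "properties")."""
    node = properties
    parts = dotted_path.split(".")
    for i, part in enumerate(parts):
        field = node.get(part)
        if field is None:
            return False
        if i == len(parts) - 1:
            return True
        node = field.get("properties", {})
    return False

# Shortened copy of the mapping from the question.
properties = {
    "original": {"properties": {
        "id": {"type": "keyword"},
        "marketCode": {"type": "keyword"},
    }},
    "recommended": {"properties": {
        "salesProfiles": {"type": "nested", "properties": {
            "marketCode": {"type": "keyword"},
            "purchaseStatus": {"type": "keyword"},
        }},
    }},
}

for path in ("original.storefrontId",
             "recommended.salesProfiles.purchaseStatus"):
    print(path, is_mapped(properties, path))
```

If the first path really is unmapped in the live index, neither the original query nor the rewritten one can return buckets until the field is added to the mapping and the documents are reindexed.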

Find the count of nested.nested Elastic Search documents

Is it possible, using the Elasticsearch _count API and the following abbreviated template, to find the count of sponsorships for all the campaigns by brandId?
sponsorshipSets and sponsorships are optional, so they can be null.
{
"index_patterns": "campaigns*",
"order": 4,
"version": 4,
"aliases": {
"campaigns": {
}
},
"settings": {
"number_of_shards": 5
},
"mappings": {
"dynamic": "false",
"properties": {
"brandId": {
"type": "keyword"
},
"sponsorshipSets": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"sponsorships": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
}
}
}
}
}
}
A filter aggregation can be used to restrict the documents to a given brandId. Under it, two nested aggregations step down to sponsorshipSets.sponsorships, and a value_count aggregation returns the count. The nested aggregations need to sit inside the filter aggregation's "aggs"; as siblings of it they would count sponsorships across all brands.
Query
{
"size": 0,
"aggs": {
"selected_brand": {
"filter": {
"term": {
"brandId": "1"
}
},
"aggs": {
"sponsorshipSets": {
"nested": {
"path": "sponsorshipSets"
},
"aggs": {
"sponsorships": {
"nested": {
"path": "sponsorshipSets.sponsorships"
},
"aggs": {
"count": {
"value_count": {
"field": "sponsorshipSets.sponsorships.id"
}
}
}
}
}
}
}
}
}
}
I found a solution without aggregations. It seems more accurate than the one above, and I can use it with the _count API.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "sponsorshipSets.sponsorships",
"query": {
"bool": {
"filter": {
"exists": {
"field": "sponsorshipSets.sponsorships"
}
}
}
}
}
},
{
"term": {
"brandId": "b1d28821-3730-4266-8f55-eb69596004fb"
}
}
]
}
}
}
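The two approaches answer slightly different questions, which may matter here. The _count API returns the number of top-level documents that match the query, while value_count under the nested aggregations counts the nested sponsorship objects themselves. A minimal Python sketch of the difference, using made-up campaigns and hypothetical helper names:

```python
# Hypothetical campaigns; ids are made up for illustration.
campaigns = [
    {"brandId": "b1", "sponsorshipSets": [
        {"id": "s1", "sponsorships": [{"id": "a"}, {"id": "b"}]},
        {"id": "s2", "sponsorships": [{"id": "c"}]},
    ]},
    {"brandId": "b1", "sponsorshipSets": None},            # optional, may be null
    {"brandId": "b1", "sponsorshipSets": [{"id": "s3"}]},  # set without sponsorships
    {"brandId": "b2", "sponsorshipSets": [
        {"id": "s4", "sponsorships": [{"id": "d"}]},
    ]},
]

def sponsorships_of(campaign):
    """Yield every sponsorship of a campaign, tolerating missing levels."""
    for sset in campaign.get("sponsorshipSets") or []:
        yield from sset.get("sponsorships") or []

def count_campaigns_with_sponsorships(campaigns, brand_id):
    """What a document count measures: campaigns of the brand that have
    at least one sponsorship."""
    return sum(1 for c in campaigns
               if c["brandId"] == brand_id
               and any(True for _ in sponsorships_of(c)))

def count_sponsorships(campaigns, brand_id):
    """What value_count under the nested aggregations measures: the
    sponsorships themselves."""
    return sum(1 for c in campaigns if c["brandId"] == brand_id
               for _ in sponsorships_of(c))

print(count_campaigns_with_sponsorships(campaigns, "b1"))
print(count_sponsorships(campaigns, "b1"))
```

Which number is the right one depends on whether "count of sponsorships" means campaigns that have any sponsorship or the sponsorships themselves.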

Filter inside a sub-aggregation in Elasticsearch

I'm trying to extract aggregated data, but I'm a little lost when I want to further filter a set of documents. Getting the colors seems OK, but when I try to aggregate the categories with a color filter applied, the query fails. What am I doing wrong in this query?
This is the query I already have:
GET /my_index/_search
{
"_source": false,
"aggs": {
"global": {
"global": {
},
"aggs": {
"all_products": {
"nested": {
"path": "simple"
},
"aggs": {
"filter_top": {
"filter": {
"bool": {
"must": [
{
"match": {
"simple.compound_words": {
"query": "tisch",
"operator": "AND"
}
}
}
]
}
},
"aggs": {
"filter_merged": {
"aggs": {
"filter": {
"bool": {
"must": [
{
"terms": {
"simple.filter_color": [
"green",
"red"
]
}
}
]
}
},
"aggs": {
"filter_category": {
"terms": {
"field": "simple.filter_category"
}
}
}
}
},
"filter_color": {
"terms": {
"field": "simple.filter_color"
}
}
}
}
}
}
}
}
}
}
This is the relevant part of the index mappings.
{
"my_index": {
"mappings": {
"_doc": {
"properties": {
"simple": {
"type": "nested",
"properties": {
"compound_words": {
"type": "text",
"analyzer": "GermanCompoundWordsAnalyzer"
},
"filter_category": {
"type": "keyword"
},
"filter_color": {
"type": "keyword"
}
}
}
}
}
}
}
}
Thanks for your support.
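A likely cause, judging only from the query as posted: filter_merged contains nothing but an "aggs" key, so the filter definition and its sub-aggregations sit one level too deep and filter_merged has no aggregation type of its own. The Python sketch below is a simplified structural check, not Elasticsearch's real parser. The function check_aggs and the short list of known types are assumptions for illustration. It compares the posted shape with a suggested one.

```python
AGG_TYPES = {"global", "nested", "filter", "terms"}  # only the types used here
RESERVED = {"aggs", "aggregations", "meta"}

def check_aggs(aggs, path=""):
    """Return a list of problems: every named aggregation should declare
    exactly one aggregation type next to its optional sub-"aggs"."""
    problems = []
    for name, body in aggs.items():
        here = f"{path}/{name}"
        types = [k for k in body if k not in RESERVED]
        if len(types) != 1 or types[0] not in AGG_TYPES:
            problems.append(f"{here}: expected one aggregation type, found {types}")
        sub = body.get("aggs") or body.get("aggregations") or {}
        problems.extend(check_aggs(sub, here))
    return problems

color_filter = {"bool": {"must": [
    {"terms": {"simple.filter_color": ["green", "red"]}}]}}
category_terms = {
    "filter_category": {"terms": {"field": "simple.filter_category"}}}

# As posted: "filter" and its sub-"aggs" are wrapped in an extra "aggs" object.
as_posted = {"filter_merged": {"aggs": {"filter": color_filter,
                                        "aggs": category_terms}}}

# Suggested shape: "filter" sits directly under the aggregation name.
suggested = {"filter_merged": {"filter": color_filter,
                               "aggs": category_terms}}

for problem in check_aggs(as_posted):
    print(problem)
print(check_aggs(suggested))
```

In the request itself this would mean removing the extra "aggs" wrapper directly under filter_merged, so that "filter" and the inner "aggs" become its direct children.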

Elasticsearch - return aggregation to match a specific value?

Using Elasticsearch 2, is it possible to return an aggregation where a document category matches a specific field value? For example, I want to get all the categories where categories.type = "application".
My mapping looks like this:
"mappings": {
"products": {
"_all": {
"enabled": true
},
"properties": {
"title": {
"type": "string"
},
"categories": {
"type":"nested",
"properties": {
"type": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
My query looks like this, which returns all category types, but I want to filter just the ones where categories.type = "application".
{
"query":{
"multi_match": {
"query": "Sound",
"fields": [
"title"
]
}
},
"aggs":{
"Applications": {
"nested": {
"path": "categories"
},
"aggs": {
"meta": {
"terms": {
"field": "categories.type"
},
"aggs": {
"name": {
"terms": {
"field": "categories.name"
}
}
}
}
}
}
}
}
If I understand correctly, you can use a filter aggregation:
{
"size": 50,
"query":{
"multi_match": {
"query": "Sound",
"fields": [
"title"
]
}
},
"aggs":{
"Applications": {
"nested": {
"path": "categories"
},
"aggs": {
"meta": {
"filter" : {
"term" : {
"categories.type" : "application"
}
},
"aggs": {
"name": {
"terms": {
"field": "categories.name"
}
}
}
}
}
}
}
}
Hope that helps.
Alternatively, you just need to add "include": "application" to the terms aggregation on categories.type:
{
"query":{
"multi_match": {
"query": "Sound",
"fields": [
"title"
]
}
},
"aggs":{
"Applications": {
"nested": {
"path": "categories"
},
"aggs": {
"meta": {
"terms": {
"field": "categories.type",
"include": "application"
},
"aggs": {
"name": {
"terms": {
"field": "categories.name"
}
}
}
}
}
}
}
}
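Both answers should give the same name counts and differ only in the shape of the response: the filter aggregation returns the name buckets directly, while terms with include keeps a single "application" bucket around them. A rough Python simulation of the two, with made-up products and hypothetical function names:

```python
from collections import Counter

# Hypothetical products; titles and categories are made up.
products = [
    {"title": "Sound bar", "categories": [
        {"type": "application", "name": "Home cinema"},
        {"type": "brand", "name": "Acme"}]},
    {"title": "Sound card", "categories": [
        {"type": "application", "name": "Gaming"},
        {"type": "application", "name": "Home cinema"}]},
]

def names_via_filter(products, wanted_type):
    """filter aggregation inside nested: keep matching category objects,
    then bucket them by name."""
    return Counter(c["name"] for p in products for c in p["categories"]
                   if c["type"] == wanted_type)

def names_via_include(products, wanted_type):
    """terms aggregation with include: bucket by type, keep only the wanted
    bucket, then bucket by name inside it."""
    by_type = {}
    for p in products:
        for c in p["categories"]:
            by_type.setdefault(c["type"], Counter())[c["name"]] += 1
    return {t: names for t, names in by_type.items() if t == wanted_type}

print(names_via_filter(products, "application"))
print(names_via_include(products, "application"))
```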

An Elasticsearch filter to determine the absence of a value

I have a document that has students and grades for each student. It looks something like this:
"name": "bill",
"year": 2015,
"grades": [
{"subject": "math", "grade": "A"},
{"subject": "english", "grade": "B"}
], ...
I'm looking for query filter(s) that can give me:
a list of students who have studied 'math', and
a list of students who have not studied 'math'.
I'm thinking that an exists filter should do it, but I'm struggling to get my head around it.
It's a stylised example but the mappings are something like this:
"mappings": {
"student": {
"properties": {
"name": {
"type": "string"
},
"grades": {
"type": "nested",
"properties": {
"subject": {
"type": "string"
},
"grade": {
"type": "string"
}
}
}
}
}
}
You need to change your mapping a bit and, depending on your needs, I'd suggest aggregations.
First, your nested object needs "include_in_parent": true so that you can easily do the not studied 'math' part:
PUT /grades
{
"mappings": {
"student": {
"properties": {
"name": {
"type": "string"
},
"grades": {
"type": "nested",
"include_in_parent": true,
"properties": {
"subject": {
"type": "string"
},
"grade": {
"type": "string"
}
}
}
}
}
}
}
And the full query, using aggregations:
GET /grades/student/_search?search_type=count
{
"aggs": {
"studying_math": {
"filter": {
"nested": {
"path": "grades",
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"grades.subject": "math"
}
}
]
}
}
}
}
}
},
"aggs": {
"top_10": {
"top_hits": {
"size": 10
}
}
}
},
"not_studying_math": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"grades.subject": "math"
}
}
]
}
},
"aggs": {
"top_10": {
"top_hits": {
"size": 10
}
}
}
}
}
}
A term filter should do just fine. For the inverse query, just negate it with a not filter:
"query":
{
"filtered" : {
"query": {
"match_all": {}
},
"filter" : {
"term": {
"grades.subject": "math"
}
}
}
}
And for the ones who did not study math:
"query":
{
"filtered" : {
"query": {
"match_all": {}
},
"filter" : {
"not": {
"filter": {
"term": {
"grades.subject": "math"
}
}
}
}
}
}
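A caveat for the nested mapping in the question: without "include_in_parent", the grades are indexed as separate nested objects, so a plain term filter on grades.subject at the student level would probably not match and would have to be wrapped in a nested filter. Where the negation goes then matters. The Python sketch below only simulates the semantics with made-up students and hypothetical function names; it is an illustration under those assumptions, not something taken from the answers above.

```python
# Hypothetical students; "joe" has no grades at all.
students = [
    {"name": "bill", "grades": [{"subject": "math", "grade": "A"},
                                {"subject": "english", "grade": "B"}]},
    {"name": "ann", "grades": [{"subject": "english", "grade": "A"}]},
    {"name": "joe", "grades": []},
]

def studied(students, subject):
    """Nested match: the student has at least one grade in the subject."""
    return [s["name"] for s in students
            if any(g["subject"] == subject for g in s["grades"])]

def not_studied(students, subject):
    """Negation around the nested match: no grade in the subject at all."""
    return [s["name"] for s in students
            if not any(g["subject"] == subject for g in s["grades"])]

def has_other_subject(students, subject):
    """Negation inside the nested match: at least one grade in some other
    subject, which is usually not what "has not studied" means."""
    return [s["name"] for s in students
            if any(g["subject"] != subject for g in s["grades"])]

print(studied(students, "math"))
print(not_studied(students, "math"))
print(has_other_subject(students, "math"))
```

Only the second function gives the list of students who have not studied math; the third also returns bill, because his english grade is a grade that is not math.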
