Using inner_hits inside an aggregation

I have a collection of documents which all contain an array of nested objects with important data. I want to do an aggregation on these which returns the first document, the last document, and all of the nested objects in that group. I can achieve everything in that list except for the nested objects.
Mapping:
"instances": {
"properties": {
"aggField": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "integer"
},
"nestedObjs": {
"type": "nested",
"properties": {
"key": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "integer"
}
}
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
Query:
{
"size" : 0,
"aggs" : {
"agg-buckets" : {
"terms" : {
"field" : "aggField",
"size" : 10
},
"aggs": {
"last-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"first-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "asc"
}
}
],
"size": 1
}
},
"nested-objs": {
"nested": {
"path": "nestedObjs",
"inner_hits": {}
}
}
}
}
}
}
But this fails with:
Parse Failure [Unexpected token START_OBJECT in [nested-objs].]
If I remove the "inner_hits" field it works, but it only gives me the document count and not the documents themselves.
What am I doing wrong?
Edit: I'm using ES version 1.7.1.

Are you sure that inner_hits is allowed in a nested aggregation (as opposed to a nested query)? I suspect that's what's causing the error.
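
One alternative that may be worth trying (a sketch, not verified against ES 1.7.1) is to keep the `nested` aggregation but replace `inner_hits` with a `top_hits` sub-aggregation, which returns the nested objects themselves rather than only a count. The helper name `nested_objs_agg`, the sub-aggregation name `objs`, and the size of 100 are illustrative assumptions. The snippet only builds the request fragment as a Python dict.

```python
import json


def nested_objs_agg(path, size=100):
    """Build a nested aggregation with a top_hits sub-aggregation.

    `size` caps how many nested objects are returned per bucket
    (100 is an arbitrary, assumed value).
    """
    return {
        "nested": {"path": path},
        "aggs": {
            # "objs" is a hypothetical name for the sub-aggregation.
            "objs": {"top_hits": {"size": size}}
        },
    }


# Intended as a replacement for the "nested-objs" entry in the query above.
agg = {"nested-objs": nested_objs_agg("nestedObjs")}
print(json.dumps(agg, indent=2))
```

The printed JSON would go where the original "nested-objs" block sits, next to "last-report" and "first-report".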

Related

Nested aggregation in nested field?

I am new to elasticsearch and don't know a lot about aggregations but I have this ES6 mapping:
{
"mappings": {
"test": {
"properties": {
"id": {
"type": "integer"
},
"countries": {
"type": "nested",
"properties": {
"global_id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"areas": {
"type": "nested",
"properties": {
"global_id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"parent_global_id": {
"type": "keyword"
}
}
}
}
}
}
}
How can I get all documents grouped by areas and then, within each area, grouped by countries? The documents also have to be returned in full, not just the nested objects. Is this even possible?

1) Aggregation _search query:
First aggregate by area, using a nested aggregation with the path because the field is nested. Then use reverse_nested to return to the root document, and apply a second nested aggregation for countries.
{
"size": 0,
"aggs": {
"agg_areas": {
"nested": {
"path": "areas"
},
"aggs": {
"areas_name": {
"terms": {
"field": "areas.name.raw"
},
"aggs": {
"agg_reverse": {
"reverse_nested": {},
"aggs": {
"agg_countries": {
"nested": {
"path": "countries"
},
"aggs": {
"countries_name": {
"terms": {
"field": "countries.name.raw"
}
}
}
}
}
}
}
}
}
}
}
}
2) Retrieve documents:
Add a top_hits aggregation inside your aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
top_hits is slow, so read the documentation and adjust size and sort to your context.
...
"terms": {
"field": "areas.name.raw"
},
"aggregations": {
"hits": {
"top_hits": { "size": 100}
}
},
...
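
A `top_hits` placed directly under a `nested` aggregation returns the nested objects rather than their parent documents. If the full document is needed, a variant that may work is to put `top_hits` under the `reverse_nested` step, so that the hits should be root documents. The sketch below builds that request as a Python dict. The aggregation name `docs` and the size of 10 are assumptions, and `areas.name.raw` is the keyword sub-field from the mapping in the question.

```python
import json


def areas_with_documents(size=10):
    """Group by area name, then step back to the root document for top_hits."""
    return {
        "agg_areas": {
            "nested": {"path": "areas"},
            "aggs": {
                "areas_name": {
                    # Keyword sub-field, since "areas.name" itself is a text field.
                    "terms": {"field": "areas.name.raw"},
                    "aggs": {
                        "agg_reverse": {
                            "reverse_nested": {},
                            "aggs": {
                                # "docs" is a hypothetical name; size is assumed.
                                "docs": {"top_hits": {"size": size}}
                            },
                        }
                    },
                }
            },
        }
    }


body = {"size": 0, "aggs": areas_with_documents()}
print(json.dumps(body, indent=2))
```

The nested `countries` aggregation from step 1 can sit alongside `docs` under `agg_reverse` if the per-country breakdown is also needed.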

Elasticsearch query error in percolate query in ES

I am using the percolate query in ES, but I cannot combine a bool query with a sort.
My purpose:
Sort the prices of products added today.
My existing index:
PUT /product-alert
{
"mappings": {
"doctype": {
"properties": {
"product_name": { "type": "text" },
"price": { "type": "double"},
"user_id": { "type": "integer" },
"date" : { "type": "date" }
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [sort] of different type, current_type [text], merged_type [ObjectMapper]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [sort] of different type, current_type [text], merged_type [ObjectMapper]"
},
"status": 400
}
Elastic query:
PUT /product-alert/queries/1?refresh
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(product_name:iphone)"
}
},
{
"range": {
"created_at": {
"gte": "2017-05-12",
"lte": "2017-05-12",
"include_lower": true,
"include_upper": true
}
}
}
]
}
},
"from": 0,
"size": 200,
"sort": [
{
"price": {
"order": "asc"
}
},
"_score"
]
}
What am I doing wrong? Sorting works only with 'sort': '_score', which is not useful to me.
Thanks in advance.

ElasticSearch Advanced Aggregations

I currently have documents indexed with the following structure:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string"
},
"Type": {
"type": "string"
}
}
}
}
}
I want to aggregate on results when searching over this type. I initially just wanted the terms from the Source field, which was easy: I used a terms aggregation on the Source field.
Now I would like to aggregate the Type field as well. However, the types are related to the sources. For example, I could have two Sources like this:
{
"Source": "The Store",
"Type": "Purchase"
}
and
{
"Source": "The Store",
"Type": "Return"
}
I want to show the different types and their counts for each different source. In other words, I would want my response to be something like this:
{
"aggs": {
"Sources": [
{
"Key": "The Store",
"DocCount": 2,
"Aggregations": {
"Types": [
{
"Key": "Purchase",
"DocCount": 1
},
{
"Key": "Return",
"DocCount": 1
}
]
}
}
]
}
}
Is there a way to get these sub-aggregations?

Yes, there is, but you need to slightly change your mapping to make your fields `not_analyzed`:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string",
"index": "not_analyzed"
},
"Type": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can use the following aggregation in order to get what you want:
{
"aggs": {
"sources": {
"terms": {
"field": "Sources.Source"
},
"aggs": {
"types": {
"terms": {
"field": "Sources.Type"
}
}
}
}
}
}
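
To get from the aggregation response to the shape sketched in the question, the buckets can be walked client-side. This is a minimal sketch assuming the usual terms-aggregation response layout (`buckets` entries carrying `key` and `doc_count`). The function name `summarize_sources` is hypothetical, and the sample response is hand-written for illustration, not real output.

```python
def summarize_sources(response):
    """Convert nested terms buckets into a Source -> Types summary."""
    summary = []
    for source in response["aggregations"]["sources"]["buckets"]:
        types = [
            {"Key": t["key"], "DocCount": t["doc_count"]}
            for t in source["types"]["buckets"]
        ]
        summary.append(
            {"Key": source["key"], "DocCount": source["doc_count"], "Types": types}
        )
    return summary


# Hand-written sample in the assumed response layout (not real output).
sample_response = {
    "aggregations": {
        "sources": {
            "buckets": [
                {
                    "key": "The Store",
                    "doc_count": 2,
                    "types": {
                        "buckets": [
                            {"key": "Purchase", "doc_count": 1},
                            {"key": "Return", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

summary = summarize_sources(sample_response)
print(summary)
```

One caveat: `Sources` is mapped as a plain object rather than `nested`, so when a single document holds several sources, the `Source` and `Type` values are not kept paired and the per-source type counts can mix.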

Elasticsearch get the latest documents, grouped by multiple fields

Similarly to "Query the latest document of each type on Elasticsearch", I have a set of records in ES. For the sake of the example, let's say they are news articles as well, each with this mapping:
"news": {
"properties": {
"source": { "type": "string", "index": "not_analyzed" },
"headline": { "type": "object" },
"timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
"user": { "type": "string", "index": "not_analyzed" },
"newspaper": { "type": "string", "index": "not_analyzed"}
}
}
I am able to get the latest 'news article' per user with:
"size": 0,
"aggs": {
"sources" : {
"terms" : {
"field" : "user"
},
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": {
"timestamp": "desc"
}
}
}
}
}
}
However, what I am trying to achieve is to get the latest article per user, per newspaper, and I cannot get it quite right.
e.g.
John, NY Times, Title1
John, BBC, Title2
Jane, NY Times, Title3
etc.

You can add another terms sub-aggregation for the newspaper field, like this:
"size": 0,
"aggs": {
"sources" : {
"terms" : {
"field" : "user"
},
"aggs": {
"newspaper": {
"terms": {
"field": "newspaper"
},
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": {
"timestamp": "desc"
}
}
}
}
}
}
}
}
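
The response to this query is nested three levels deep (user, newspaper, then the `top_hits` result), so it usually needs flattening to get rows like the ones in the question. This is a minimal sketch assuming the standard bucket and `top_hits` response layout. The function name and the sample response, including its timestamps, are invented for illustration.

```python
def latest_per_user_and_newspaper(response):
    """Flatten user -> newspaper -> latest hit into (user, newspaper, source) rows."""
    rows = []
    for user in response["aggregations"]["sources"]["buckets"]:
        for paper in user["newspaper"]["buckets"]:
            # top_hits results live under <agg name> -> "hits" -> "hits".
            for hit in paper["latest"]["hits"]["hits"]:
                rows.append((user["key"], paper["key"], hit["_source"]))
    return rows


def _bucket(key, latest_source):
    """Build one hypothetical newspaper bucket holding a single latest hit."""
    return {"key": key, "latest": {"hits": {"hits": [{"_source": latest_source}]}}}


# Hand-written sample in the assumed response layout (not real output).
sample_response = {
    "aggregations": {
        "sources": {
            "buckets": [
                {
                    "key": "John",
                    "newspaper": {
                        "buckets": [
                            _bucket("NY Times", {"timestamp": "2016-01-02T10:00:00.000"}),
                            _bucket("BBC", {"timestamp": "2016-01-03T09:30:00.000"}),
                        ]
                    },
                },
                {
                    "key": "Jane",
                    "newspaper": {
                        "buckets": [
                            _bucket("NY Times", {"timestamp": "2016-01-01T08:00:00.000"}),
                        ]
                    },
                },
            ]
        }
    }
}

rows = latest_per_user_and_newspaper(sample_response)
for user, newspaper, source in rows:
    print(user, newspaper, source["timestamp"])
```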

Unable to drop result bucket in terms aggregation - Elasticsearch

I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
And I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file"
All the occurrences of that value of "event.file" have resulted in an error (the field "event.error.errorCode" exists in all of those documents)
For that purpose the approach I've been testing is the use of terms aggregation, so I can get a list of buckets with all documents for a single file name. What I haven't been able to achieve is to drop some of the resulting buckets in the aggregation according to the previous criteria (if at least one of them does not have an error the bucket should be discarded).
Is this the correct approach or is there a better/easier way to get this type of result?
Thanks a lot.

After trying out several queries I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
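
Where the pipeline step cannot be used, one possible workaround is to drop the `bucket_selector` from the request and apply the same condition client-side. This is a minimal sketch assuming the response layout produced by the `files` and `errors` filter aggregations above. The function name and the sample buckets (file names included) are invented for illustration.

```python
def files_always_in_error(response):
    """Return file names whose every occurrence carries an error code.

    Mirrors the bucket_selector script: total_errors == total_files.
    The "more than one document" rule is already enforced by
    min_doc_count in the terms aggregation.
    """
    keep = []
    for bucket in response["aggregations"]["file-events"]["buckets"]:
        total_files = bucket["files"]["totalFiles"]["value"]
        total_errors = bucket["errors"]["totalErrors"]["value"]
        if total_errors == total_files:
            keep.append(bucket["key"])
    return keep


def _bucket(key, files, errors):
    """Build one hypothetical terms bucket with its two value_count results."""
    return {
        "key": key,
        "doc_count": files,
        "files": {"doc_count": files, "totalFiles": {"value": files}},
        "errors": {"doc_count": errors, "totalErrors": {"value": errors}},
    }


# Hand-written sample in the assumed response layout (not real output).
sample_response = {
    "aggregations": {
        "file-events": {
            "buckets": [
                _bucket("a.xml", files=3, errors=3),  # every attempt failed
                _bucket("b.xml", files=2, errors=1),  # one attempt succeeded
            ]
        }
    }
}

failed_files = files_always_in_error(sample_response)
print(failed_files)
```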
Again, if I'm missing something feedback will be appreciated :)
