I have a collection of documents which all contain an array of nested objects with important data. I want do to an aggregation on these which returns me the first document, last document, and all of the nested objects in that group. I can achieve everything in that list except for the nested objects.
Mapping:
"instances": {
"properties": {
"aggField": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "integer"
},
"nestedObjs": {
"type": "nested",
"properties": {
"key": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "integer"
}
}
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
Query:
{
"size" : 0,
"aggs" : {
"agg-buckets" : {
"terms" : {
"field" : "aggField",
"size" : 10
},
"aggs": {
"last-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size": 1
}
},
"first-report": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "asc"
}
}
],
"size": 1
}
},
"nested-objs": {
"nested": {
"path": "nestedObjs",
"inner_hits": {}
}
}
}
}
}
But this fails with:
Parse Failure [Unexpected token START_OBJECT in [nested-objs].]
If I remove the "inner_hits" field it works ok. But it just gives me the document count and not the documents themselves.
What am I doing wrong?
E: I'm using ES version 1.7.1
Are you sure that inner_hits is allowed in a nested aggregation (as opposed to a nested query)? I suspect that's what's causing the error.
Related
I am new to elasticsearch and don't know a lot about aggregations but I have this ES6 mapping:
{
"mappings": {
"test": {
"properties": {
"id": {
"type": "integer"
}
"countries": {
"type": "nested",
"properties": {
"global_id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"areas": {
"type": "nested",
"properties": {
"global_id": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"parent_global_id": {
"type": "keyword"
}
}
}
}
}
}
}
How can I get all documents grouped by areas which is then grouped by countries. Also the document has to be returned in full, not just the nested document. Is this even possible ?
1) Aggregation _search query:
first agg by area, with the path as this is nested. Then reverse to the root document and nested agg to country.
{
"size": 0,
"aggs": {
"agg_areas": {
"nested": {
"path": "areas"
},
"aggs": {
"areas_name": {
"terms": {
"field": "areas.name"
},
"aggs": {
"agg_reverse": {
"reverse_nested": {},
"aggs": {
"agg_countries": {
"nested": {
"path": "countries"
},
"aggs": {
"countries_name": {
"terms": {
"field": "countries.name"
}
}
}
}
}
}
}
}
}
}
}
}
2) retrieve documents:
add a tophits inside your aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
top_hits is slow so you will have to read documentation and adjust size and sort to your context.
...
"terms": {
"field": "areas.name"
},
"aggregations": {
"hits": {
"top_hits": { "size": 100}
}
},
...
I am use the percolate query in ES. But I don't merge bool query and sort query:
My purpose:
Sort prices of added product today.
My existing index
PUT /product-alert
{
"mappings": {
"doctype": {
"properties": {
"product_name": { "type": "text" },
"price": { "type": "double"},
"user_id": { "type": "integer" },
"date" : { "type": "date" }
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
I have a following error.
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [sort] of different type, current_type [text], merged_type [ObjectMapper]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [sort] of different type, current_type [text], merged_type [ObjectMapper]"
},
"status": 400
}
Elastic query:
PUT /product-alert/queries/1?refresh
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(product_name:iphone)"
}
},
{
"range": {
"created_at": {
"gte": "2017-05-12",
"lte": "2017-05-12",
"include_lower": true,
"include_upper": true
}
}
}
]
}
},
"from": 0,
"size": 200,
"sort": [
{
"price": {
"order": "asc"
}
},
"_score"
]
}
Where is my fault? Sort working 'sort':'_score' only, but it is mischievous to me.
Thanks in advance
I currently have documents indexed with the following structure:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string"
},
"Type": {
"type": "string"
},
}
}
}
}
I want to aggregate on results when searching over this type. I initially just wanted the terms from the Source field, which was easy. I just used a terms aggregations for the Source field.
Now I would like to aggregate the Type field as well. However, the types are related to the sources. For example, I could have two Sources like this:
{
"Source": "The Store",
"Type": "Purchase"
}
and
{
"Source": "The Store",
"Type": "Return"
}
I want to show the different types and their counts for each different source. In other words, I would want my response to be something like this:
{
"aggs": {
"Sources": [
{
"Key": "The Store",
"DocCount": 2,
"Aggregations": {
"Types": [
{
"Key": "Purchase",
"DocCount": 1
},
{
"Key": "Return",
"DocCount": 1
}
]
}
}
]
}
}
Is there a way to get these sub-aggregations?
Yes, there is but you need to slightly change your mapping to make your fields `not_analyzed``
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string",
"index": "not_analyzed"
},
"Type": {
"type": "string",
"index": "not_analyzed"
},
}
}
}
}
Then you can use the following aggregation in order to get what you want:
{
"aggs": {
"sources": {
"terms": {
"field": "Sources.Source"
},
"aggs": {
"types": {
"terms": {
"field": "Sources.Type"
}
}
}
}
}
}
Similarly to Query the latest document of each type on Elasticsearch, I have a set of records in ES. For the sake of the example, lets say it's news as well, each with mapping:
"news": {
"properties": {
"source": { "type": "string", "index": "not_analyzed" },
"headline": { "type": "object" },
"timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
"user": { "type": "string", "index": "not_analyzed" }
"newspaper": { "type": "string", "index": "not_analyzed"}
}
}
I am able to get the latest 'news article' per user with:
"size": 0,
"aggs": {
"sources" : {
"terms" : {
"field" : "user"
},
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": {
"timestamp": "desc"
}
}
}
}
}
}
However what I am trying to achieve is to get the last article per user, per newspaper and I cannot get it quite right.
e.g.
John, NY Times, Title1
John, BBC, Title2
Jane, NY Times, Title3
etc.
You can add another terms sub-aggregation for the newspaper field like this
"size": 0,
"aggs": {
"sources" : {
"terms" : {
"field" : "user"
},
"aggs": {
"newspaper": {
"terms": {
"field": "newspaper"
},
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": {
"timestamp": "desc"
}
}
}
}
}
}
}
}
I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
And I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file"
All the occurences for that value of "event.file" have resulted in error (field "event.error.errorCode" exists in all documents)
For that purpose the approach I've been testing is the use of terms aggregation, so I can get a list of buckets with all documents for a single file name. What I haven't been able to achieve is to drop some of the resulting buckets in the aggregation according to the previous criteria (if at least one of them does not have an error the bucket should be discarded).
Is this the correct approach or is there a better/easier way to get this type of result?
Thanks a lot.
After trying out several queries I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
Again, if I'm missing something feedback will be appreciated :)