Elastic Search Nested Aggregation size - elasticsearch

I have a nested aggregation that is only returning 10 results. I want it to return 1000 results. However, I'm not sure where to specify the size. My mapping looks like this (it's in YAML but is processed to JSON, don't worry about that):
mappings:
  datainfo:
    properties:
      filterValues:
        type: string
      metadata:
        properties:
          isPrimary:
            type: boolean
          name:
            index: not_analyzed
            type: string
          source:
            enabled: false
            type: object
          type:
            index: not_analyzed
            type: string
          val:
            index: not_analyzed
            type: string
        type: nested
      source:
        enabled: false
        type: object
      title:
        type: string
My query looks something like
{
"query": "<some query>",
"aggs": {
"series": {
"nested": { "path": "metadata" },
"aggs": {
"val": {
"terms": { "field": "metadata.val" },
"aggs": {
"type": {
"terms": { "field": "metadata.name" }
}
}
}
}
}
}
}
Where do I put a "size" field in order to make this return X results? It currently returns only 10.

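To set the number of hits (documents) the query itself returns, use the top-level size, optionally together with from to select a range. This does not change the number of aggregation buckets: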
{
"query": "<some query>",
"size": 1000,
"aggs": {
"series": {
"nested": { "path": "metadata" },
"aggs": {
"val": {
"terms": { "field": "metadata.val" },
"aggs": {
"type": {
"terms": { "field": "metadata.name" }
}
}
}
}
}
}
}
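To set the number of aggregation buckets, put size inside the terms aggregation (3 in the example below). To also control how many documents are returned within each bucket, add a Top Hits Aggregation with its own size (example from the linked documentation):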
{
"aggs": {
"top-tags": {
"terms": {
"field": "tags",
"size": 3
},
"aggs": {
"top_tag_hits": {
"top_hits": {
"sort": [
{
"last_activity_date": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"title"
]
},
"size" : 100
}
}
}
}
}
}
One recommended approach if you only need the aggregated results is to specify a size of 0 for the query results which eliminates the first fetch and consequently performs better.
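Applied to the query from the question, the bucket count is governed by a size parameter inside each terms aggregation, not by the top-level size. A minimal Python sketch of such a request body follows; the helper name build_nested_terms_body and the match_all query are placeholders for illustration and are not part of the original post.

```python
import json


def build_nested_terms_body(bucket_size=1000):
    """Build a search body whose terms aggregations return up to bucket_size buckets."""
    return {
        # Placeholder query; substitute the real one.
        "query": {"match_all": {}},
        # No hits are requested because only the aggregation is of interest.
        "size": 0,
        "aggs": {
            "series": {
                "nested": {"path": "metadata"},
                "aggs": {
                    "val": {
                        # "size" here sets the number of metadata.val buckets.
                        "terms": {"field": "metadata.val", "size": bucket_size},
                        "aggs": {
                            "type": {
                                # Each sub-aggregation carries its own "size".
                                "terms": {
                                    "field": "metadata.name",
                                    "size": bucket_size,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(build_nested_terms_body(), indent=2))
```

Without an explicit size, a terms aggregation falls back to its default of 10 buckets, which matches the behaviour described in the question.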

Related

Elastic: How i can filter aggregation buckets by string key

I have some data from one provider, a very large structured JSON document:
"mappings": {
"properties": {
"field_a": { .. },
"field_b": { .. },
"field_c": { .. },
"field_d": {
"properties": {
"subfield_a": {...},
"subfield_b": {...},
"subfield_c": {...},
"subfield_d": {...},
"subfield_e": {
"properties": {
"myfield": {
"type": "keyword"
},
"another_a": {...},
"another_b": {...},
}
}
}
}
}
}
subfield_e is an array of objects with many fields; the one I am interested in is "myfield".
I need an aggregation over only those "myfield" values that contain a given string.
This is what I do now; the result is wrong (although logical):
GET /index/_search
{
"query": {
"wildcard": {
"field_d.subfield_e.myfield": "*string*"
}
},
"aggs": {
"interest": {
"terms": {
"field": "field_d.subfield_e.myfield",
"size": 10
}
}
},
"size": 0
}
The problem with this query is that it selects every document whose "subfield_e" array contains an object whose myfield matches the string, and the aggregation then runs over all objects in those documents. So I end up with every "myfield" value from the matching documents, not only the values that contain the string.
I tried adding a bucket_selector aggregation after my main aggregation, but I got this error: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [String] at aggregation [_key]"
My code is inspired by "Filter Elasticsearch Aggregation by Bucket Key Value" and now looks like this:
GET /index/_search
{
"query": {
"wildcard": {
"field_d.subfield_e.myfield": "*string*"
}
},
"aggs": {
"interest": {
"terms": {
"field": "field_d.subfield_e.myfield",
"size": 10
},
"aggs": {
"buckets": {
"bucket_selector": {
"buckets_path": {
"key": "_key"
},
"script": "params.key.contains('string')"
}
}
}
}
},
"size": 0
}
So, how can I filter the buckets of a terms aggregation by their string key?
I solved it by changing subfield_e to a nested object instead of a plain array of objects, and I reimported all data into this new mapping.
The current mapping looks like this:
"mappings": {
"properties": {
"field_a": { .. },
"field_b": { .. },
"field_c": { .. },
"field_d": {
"properties": {
"subfield_a": {...},
"subfield_b": {...},
"subfield_c": {...},
"subfield_d": {...},
"subfield_e": {
"type": "nested", <======= This line added
"properties": {
"myfield": {
"type": "keyword"
},
"another_a": {...},
"another_b": {...},
}
}
}
}
}
}
And final working query is:
GET /index/_search
{
"query": {
"nested": {
"path": "field_d.subfield_e",
"query": {
"wildcard": {
"field_d.subfield_e.myfield": {
"value": "*string*"
}
}
}
}
},
"aggs": {
"agg": {
"nested": {
"path": "field_d.subfield_e"
},
"aggs": {
"inner": {
"filter": {
"wildcard": {
"field_d.subfield_e.myfield": "*string*"
}
}, "aggs": {
"interest": {
"terms": {
"field": "field_d.subfield_e.myfield",
"size": 10
}
}
}
}
}
}
},
"size": 0
}
In my case this query is much faster than using include/exclude in the terms aggregation.
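For comparison, the include/exclude alternative mentioned above keeps the original (non-nested) mapping and filters the bucket keys with a regular expression on the terms aggregation. The following Python sketch builds that request body; the function name is made up for illustration, and it assumes the substring contains no regular-expression metacharacters.

```python
def build_include_filtered_terms(substring, size=10):
    """Build a terms aggregation that keeps only keys containing `substring`.

    Assumption: `substring` has no regex metacharacters, so it can be
    embedded in the "include" pattern without escaping.
    """
    field = "field_d.subfield_e.myfield"
    return {
        # Limit the search to documents with at least one matching value.
        "query": {"wildcard": {field: "*" + substring + "*"}},
        "aggs": {
            "interest": {
                "terms": {
                    "field": field,
                    "size": size,
                    # "include" takes a regular expression matched against
                    # the whole bucket key, hence the leading/trailing ".*".
                    "include": ".*" + substring + ".*",
                }
            }
        },
        "size": 0,
    }


if __name__ == "__main__":
    print(build_include_filtered_terms("string"))
```

Note the different pattern syntaxes: the wildcard query uses `*string*`, while include expects a regular expression such as `.*string.*`.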

ELASTICSEARCH - How can i get an aggregation in a boolean field?

I want to run an aggregation on a boolean field, but the output is an error:
query:
"""
{
"size": 0,
"aggs": {
"RecentCreated": {
"terms": {
"field": "created_at.keyword",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"nestedData": {
"nested": {
"path": "data.add.serv"
},
"aggs": {
"NAME": {
"terms": {
"field": "data.add.serv.beast"
, "include": true
}
}
}
}
}
}
}
}
"""
error:
"type" : "x_content_parse_exception",
"reason" : "[terms] include doesn't support values of type: VALUE_BOOLEAN"
I have read that it is possible to transform the true values into 1 with a script to get a count in the aggregation, but I cannot get the result for the true values.
How can I get a count of the documents where the boolean field is true?
I think what you might want to do is use a filter aggregation over your nested document rather than a terms aggregation. So in short change this bit of your query:
"aggs": {
"NAME": {
"terms": {
"field": "data.add.serv.beast",
"include": true
}
}
}
to
"aggs": {
"NAME": {
"filter": {
"term": {
"data.add.serv.beast": true
}
}
}
}
I'm not too familiar with nested aggregations, so there might still be an error with my syntax. The main point is to use a filter aggregation rather than terms, hopefully that should work for you.
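With a filter aggregation in place of the terms aggregation, the count of true values appears as the doc_count of the "NAME" aggregation inside each bucket. A small Python sketch of reading it from the response follows; the function name and the sample response are invented for illustration and only mirror the shape such a response would have.

```python
def count_true_values(response):
    """Return {created_at key: number of nested docs where beast is true}."""
    counts = {}
    for bucket in response["aggregations"]["RecentCreated"]["buckets"]:
        # nestedData is the nested aggregation; NAME is the filter
        # aggregation, whose doc_count is the number of matching nested docs.
        counts[bucket["key"]] = bucket["nestedData"]["NAME"]["doc_count"]
    return counts


# Hand-written sample with the same shape as a real response; values are invented.
sample_response = {
    "aggregations": {
        "RecentCreated": {
            "buckets": [
                {
                    "key": "2021-01-02",
                    "doc_count": 4,
                    "nestedData": {"doc_count": 7, "NAME": {"doc_count": 3}},
                }
            ]
        }
    }
}

if __name__ == "__main__":
    print(count_true_values(sample_response))
```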

How to diversify the result of top-hits aggregation?

Let's start with a concrete example. I have a document with these fields:
{
"template": {
"mappings": {
"template": {
"properties": {
"tid": {
"type": "long"
},
"folder_id": {
"type": "long"
},
"status": {
"type": "integer"
},
"major_num": {
"type": "integer"
}
}
}
}
}
}
I want to aggregate the query result by the field folder_id and, for each folder_id group, retrieve the _source of the top-N documents. So I wrote the query DSL like this:
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"top_hit":{
"top_hits": {
"size": 5,
"_source": ["major_num"]
}
}
}
}
}
}
However, there is now a requirement that the top-hits documents for each folder_id be diversified on the field major_num. That is, within each folder_id bucket, the documents returned by the top_hits sub-aggregation must be unique on major_num, with at most 1 document per major_num value.
The top_hits aggregation cannot accept sub-aggregations, so how should I solve this?
Why not simply add another terms aggregation on the major_num field?
GET /template/template/_search
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"status": 1
}
}
]
}
},
"aggs": {
"folder": {
"terms": {
"field": "folder_id",
"size": 10
},
"aggs": {
"majornum": {
"terms": {
"field": "major_num",
"size": 10
},
"aggs": {
"top_hit": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
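The response to the query above nests one top hit under each major_num bucket, under each folder_id bucket. A short Python sketch of flattening that into one list of documents per folder follows; the function name and the sample response are invented for illustration.

```python
def diversified_hits(response):
    """Return {folder_id: [one _source per distinct major_num]}."""
    result = {}
    for folder in response["aggregations"]["folder"]["buckets"]:
        docs = []
        for major in folder["majornum"]["buckets"]:
            hits = major["top_hit"]["hits"]["hits"]
            if hits:
                # top_hits was requested with size 1, so take the only hit.
                docs.append(hits[0]["_source"])
        result[folder["key"]] = docs
    return result


# Hand-written sample with the same shape as a real response; values are invented.
sample_response = {
    "aggregations": {
        "folder": {
            "buckets": [
                {
                    "key": 7,
                    "doc_count": 3,
                    "majornum": {
                        "buckets": [
                            {
                                "key": 1,
                                "doc_count": 2,
                                "top_hit": {"hits": {"hits": [
                                    {"_source": {"tid": 10, "major_num": 1}}
                                ]}},
                            },
                            {
                                "key": 2,
                                "doc_count": 1,
                                "top_hit": {"hits": {"hits": [
                                    {"_source": {"tid": 12, "major_num": 2}}
                                ]}},
                            },
                        ]
                    },
                }
            ]
        }
    }
}

if __name__ == "__main__":
    print(diversified_hits(sample_response))
```

With this approach the number of documents per folder is bounded by the size of the major_num terms aggregation (10 in the query), so to keep the earlier limit of 5 per folder that size would need to be 5.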

How to mention from and size for the first level of elastic search aggregation in nested aggregation?

I have written a query to get the buckets based on id and then sort them. This works fine. But how can I make it return the buckets from position 100 to 200 for the aggregation_by_id aggregation?
{
"query": {
"match_all": {}
},
"size": 0,
"aggregations": {
"aggregation_by_id": {
"terms": {
"field": "id.keyword",
"size" : 200
},
"aggs": {
"sort_timestamp": {
"top_hits": {
"sort": [{
"timestamp": {
"order": "desc",
"unmapped_type": "long"
}
}],
"size": 1
}
}
}
}
}
}
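One possible approach, assuming an Elasticsearch version that provides the bucket_sort pipeline aggregation, is to request the first 200 buckets and let bucket_sort skip the first 100. The Python sketch below builds such a request body; the function name and the aggregation name "page" are invented for illustration.

```python
def build_paged_terms_body(offset=100, page_size=100):
    """Build a body that returns terms buckets offset .. offset + page_size."""
    return {
        "query": {"match_all": {}},
        "size": 0,
        "aggregations": {
            "aggregation_by_id": {
                # The terms aggregation must produce every bucket up to the
                # end of the requested page before bucket_sort can trim them.
                "terms": {"field": "id.keyword", "size": offset + page_size},
                "aggs": {
                    "sort_timestamp": {
                        "top_hits": {
                            "sort": [
                                {
                                    "timestamp": {
                                        "order": "desc",
                                        "unmapped_type": "long",
                                    }
                                }
                            ],
                            "size": 1,
                        }
                    },
                    # Without a "sort" clause, bucket_sort keeps the parent
                    # order and only applies the from/size window.
                    "page": {
                        "bucket_sort": {"from": offset, "size": page_size}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(build_paged_terms_body())
```

The terms aggregation still computes all 200 buckets on every request, so this trims the response rather than the work done; paging deep into high-cardinality fields remains expensive.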

Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find the unique tag values for a certain tag key. Something similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But it gives me this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
You mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
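The distinct values, the equivalent of the SELECT DISTINCT result, are the bucket keys of the innermost terms aggregation. A minimal Python sketch of extracting them from the response follows; the function name and the sample response are invented for illustration.

```python
def distinct_tag_values(response):
    """Return the distinct tags.value keys for the filtered tag key."""
    buckets = (
        response["aggregations"]["my_tags"]["my_filter"]["my_tags_values"]["buckets"]
    )
    # Each bucket key is one distinct value; doc_count is how often it occurs.
    return [bucket["key"] for bucket in buckets]


# Hand-written sample with the same shape as a real response; values are invented.
sample_response = {
    "aggregations": {
        "my_tags": {
            "doc_count": 5,
            "my_filter": {
                "doc_count": 3,
                "my_tags_values": {
                    "buckets": [
                        {"key": "val1", "doc_count": 2},
                        {"key": "val3", "doc_count": 1},
                    ]
                },
            },
        }
    }
}

if __name__ == "__main__":
    print(distinct_tag_values(sample_response))
```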
Try a bool query inside a named filter aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 0
}
}
}
}
}
}
}
}
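BTW: on Elasticsearch versions before 5.0, writing 0 instead of 9999 as the aggregation size (as in the example above) returns all buckets. Support for size 0 in the terms aggregation was removed in 5.0, so on 5.x and later use an explicit size such as 9999.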
