Elasticsearch - How can I get an aggregation on a boolean field?

I want to run an aggregation on a boolean field, but the output is an error.
Query:
"""
{
"size": 0,
"aggs": {
"RecentCreated": {
"terms": {
"field": "created_at.keyword",
"order": {
"_key": "desc"
},
"size": 1
},
"aggs": {
"nestedData": {
"nested": {
"path": "data.add.serv"
},
"aggs": {
"NAME": {
"terms": {
"field": "data.add.serv.beast"
, "include": true
}
}
}
}
}
}
}
}
"""
error:
"type" : "x_content_parse_exception",
"reason" : "[terms] include doesn't support values of type: VALUE_BOOLEAN"
I have read that it is possible to transform the true values into 1 through a script to get a count in the aggregation, but I cannot get the count of the true values.
How can I get a count of the documents where the boolean field is true?

I think what you want is a filter aggregation over your nested documents rather than a terms aggregation. As the error says, the include parameter of a terms aggregation does not accept a boolean value. In short, change this part of your query:
"aggs": {
"NAME": {
"terms": {
"field": "data.add.serv.beast",
"include": true
}
}
}
to
"aggs": {
"NAME": {
"filter": {
"term": {
"data.add.serv.beast": true
}
}
}
}
I'm not too familiar with nested aggregations, so there might still be an error in my syntax. The main point is to use a filter aggregation rather than a terms aggregation; hopefully that works for you.
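As a minimal sketch of the suggestion above, the full request body can be assembled as a plain Python dict, with the filter aggregation placed inside the nested aggregation. The helper names build_true_count_query and true_count are made up for illustration, and sample_response is a hypothetical response used only to show where the count would appear; nothing here is sent to a cluster.

```python
import json


def build_true_count_query(nested_path, bool_field):
    """Build a search body: newest created_at bucket -> nested -> filter on true."""
    return {
        "size": 0,
        "aggs": {
            "RecentCreated": {
                "terms": {
                    "field": "created_at.keyword",
                    "order": {"_key": "desc"},
                    "size": 1,
                },
                "aggs": {
                    "nestedData": {
                        "nested": {"path": nested_path},
                        "aggs": {
                            # A filter aggregation returns one bucket whose
                            # doc_count is the number of matching nested docs.
                            "NAME": {"filter": {"term": {bool_field: True}}}
                        },
                    }
                },
            }
        },
    }


def true_count(response):
    """Read the count of true values out of the first RecentCreated bucket."""
    bucket = response["aggregations"]["RecentCreated"]["buckets"][0]
    return bucket["nestedData"]["NAME"]["doc_count"]


query = build_true_count_query("data.add.serv", "data.add.serv.beast")
print(json.dumps(query, indent=2))

# Hypothetical response (values invented), trimmed to the aggregations part.
sample_response = {
    "aggregations": {
        "RecentCreated": {
            "buckets": [
                {
                    "key": "2021-01-01",
                    "doc_count": 4,
                    "nestedData": {"doc_count": 7, "NAME": {"doc_count": 3}},
                }
            ]
        }
    }
}
print(true_count(sample_response))
```

The printed JSON can be pasted into Dev Tools as the body of a _search request.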

Related

How to do proportions in Elastic search query

I have a field in my data that has four unique values across all the records. I have to aggregate the records by each unique value and find the proportion of each value in the data, that is, (number of records with each unique value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I have used a terms aggregation to group by the field and applied a value_count metric aggregation to get the doc_count value, but I am not able to use a bucket script to do the division. I am getting this error: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]"
Below is my code:
{
"size": 0,
"aggs": {
"BAR": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day"
},
"aggs": {
"latest_version": {
"filter": {
"match_phrase": {
"log": "main_filter"
}
},
"aggs": {
"latest_version_count": {
"terms": {
"field": "field_name"
},
"aggs": {
"version_count": {
"value_count": {
"field": "field_name"
}
}
}
},
"sum_buckets": {
"sum_bucket": {
"buckets_path": "latest_version_count>_count"
}
}
}
},
"BAR-percentage": {
"bucket_script": {
"buckets_path": {
"eachVersionCount": "latest_version>latest_version_count",
"totalVersionCount": "latest_version>sum_buckets"
},
"script": "params.eachVersionCount/params.totalVersionCount"
}
}
}
}
}
}
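The error message itself points at the cause: the path "latest_version>latest_version_count" resolves to the terms aggregation (a set of buckets), not to a single number. One workaround, which is not from the thread and is only a sketch, is to drop the bucket_script and compute the proportions client-side from the buckets of the terms aggregation. The function name bucket_proportions and the sample bucket values below are invented for illustration.

```python
def bucket_proportions(buckets):
    """Return {key: doc_count / total} for a list of terms-aggregation buckets."""
    total = sum(bucket["doc_count"] for bucket in buckets)
    if total == 0:
        # No documents at all: avoid dividing by zero.
        return {}
    return {bucket["key"]: bucket["doc_count"] / total for bucket in buckets}


# Hypothetical buckets, shaped like the "buckets" array that a terms
# aggregation such as latest_version_count returns for one day.
sample_buckets = [
    {"key": "v1", "doc_count": 50},
    {"key": "v2", "doc_count": 30},
    {"key": "v3", "doc_count": 15},
    {"key": "v4", "doc_count": 5},
]

proportions = bucket_proportions(sample_buckets)
for key, share in proportions.items():
    print(f"{key}: {share:.2%}")
```

In a date_histogram response this function would be applied once per day bucket, to the buckets found under latest_version > latest_version_count.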

Filter out terms aggregation buckets in elasticsearch after applying aggregation

Below is snapshot of the dataset:
recordNo  employeeId  employeeStatus  employeeAddr
1         employeeA   Permanent       (empty)
2         employeeA   (empty)         ABC
3         employeeB   Contract        (empty)
4         employeeB   (empty)         CDE
I want to get the list of employees along with their employeeStatus and employeeAddr.
So I am using a terms aggregation on employeeId, with sub-aggregations on employeeStatus and employeeAddr to get these details.
The query below returns the results correctly.
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
Now I want only the employees that are in Permanent status, so I am applying a filter aggregation.
{
"aggregations": {
"filter_Employee_employeeID": {
"filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {"query": "Permanent"}
}
}
]
}
},
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
}
}
Now the problem is that the employeeAddr aggregation returns no buckets for employeeA because record 2 gets filtered out before the aggregation is done.
Assuming that I cannot modify the data set and I want to achieve the result with a single elastic query, how can I do it?
I checked the Bucket Selector pipeline aggregation but it only works for metric aggregations.
Is there a way to filter out term buckets after the aggregation is applied?
If I understood correctly, you want to preserve the aggregations even when you apply some kind of filter. To achieve that, try the post_filter clause.
See the post_filter documentation for details.
The clause is applied "outside" the aggregations: it filters the search hits after the aggregations have been calculated. Using your example, it should look like this:
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {
"field": "employeeStatus"
}
},
"employeeAddr": {
"terms": {
"field": "employeeAddr"
}
}
}
}
},
"post_filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {
"query": "Permanent"
}
}
}
]
}
}
}
I tested this: combining the include parameter of the terms aggregation with a bucket_selector on the bucket count gives the desired result.
Filtering term values is covered in the terms aggregation documentation.
Using a bucket selector with a document count is covered in the bucket_selector documentation.
The subtlety here is that yes, buckets_path needs a numeric value, but it can also reference the special paths that Elasticsearch provides, such as _bucket_count:
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeId.keyword"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus", "include": "Permanent"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "employeeStatus._bucket_count"
},
"script": {
"source": "params.count != 0"
}
}
}
}
}
}
}
I tested this on 7.10 and it worked, returning only employeeA, with the address included.
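To make the logic of that query easier to follow, here is a rough pure-Python imitation of what it does on the four sample records. It is only an illustration of the include plus bucket_selector idea, not Elasticsearch code; the function name permanent_employees is made up, and empty cells in the dataset are modelled as missing keys.

```python
from collections import defaultdict

# The four sample records from the question.
records = [
    {"recordNo": 1, "employeeId": "employeeA", "employeeStatus": "Permanent"},
    {"recordNo": 2, "employeeId": "employeeA", "employeeAddr": "ABC"},
    {"recordNo": 3, "employeeId": "employeeB", "employeeStatus": "Contract"},
    {"recordNo": 4, "employeeId": "employeeB", "employeeAddr": "CDE"},
]


def permanent_employees(rows, wanted_status="Permanent"):
    """Group rows by employeeId and keep groups that contain the wanted status."""
    # Equivalent of the outer terms aggregation on employeeId.
    groups = defaultdict(lambda: {"statuses": set(), "addrs": set()})
    for row in rows:
        group = groups[row["employeeId"]]
        if "employeeStatus" in row:
            group["statuses"].add(row["employeeStatus"])
        if "employeeAddr" in row:
            group["addrs"].add(row["employeeAddr"])

    result = {}
    for employee_id, group in groups.items():
        # Equivalent of "include": only the wanted status forms a bucket.
        matching = group["statuses"] & {wanted_status}
        # Equivalent of the bucket_selector: drop the employee when the
        # employeeStatus sub-aggregation ended up with zero buckets.
        if len(matching) != 0:
            result[employee_id] = sorted(group["addrs"])
    return result


print(permanent_employees(records))
```

Because the address is collected before the status check, employeeA keeps its address even though the address sits on a record without a status, which is the behaviour the question asked for.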

query return [parsing_exception] [size] query malformed, no start_object after query name, with { line=1 & col=264 }

I'm new to Elasticsearch, and I am trying to use Dev Tools to create filters.
Here is what works and what I want to use:
POST /transform_alldomain/_search
{
"size":0,
"aggs": {
"group": {
"terms": {
"field": "Email.keyword"
},
"aggs": {
"group": {
"terms": {
"field": "bln.keyword"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"extract_date.max": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}}
Now I want to use the same thing as a filter, by typing it into the filter editor via "Edit as Query DSL":
{
"size":0,
"aggs": {
"group": {
"terms": {
"field": "Email.keyword"
},
"aggs": {
"group": {
"terms": {
"field": "bln.keyword"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"extract_date.max": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}}
it returns
[parsing_exception] [size] query malformed, no start_object after query name, with { line=1 & col=324 }
I don't know what the difference is or how to make it work.
I need to create a search object from this.
The Discover app is not the right tool for aggregations; it is only useful for queries and filters. The filter editor expects a query clause, which is why a body that starts with size and aggs fails to parse.
What you want to achieve can be done with a Data Table visualization. Instead of Discover, go to Visualize, then pick "Create Visualization".
Then pick the "Data Table" visualization.
Then pick your index pattern.
Finally, define your two terms aggregations (one on Email.keyword and one on bln.keyword) as buckets.

How to count number of objects in a nested field in elastic search?

How can I count the number of objects in a nested field in Elasticsearch?
Sample mapping :
"base_keywords": {
"type": "nested",
"properties": {
"base_key": {
"type": "text"
},
"category": {
"type": "text"
},
"created_at": {
"type": "date"
},
"date": {
"type": "date"
},
"rank": {
"type": "integer"
}
}
}
I would like to count the number of objects in the nested field 'base_keywords'.
You would need to do this with an inline script. This is what worked for me (using ES 6.x):
GET your-indices/_search
{
"aggs": {
"whatever": {
"sum": {
"script": {
"inline": "params._source.base_keywords.size()"
}
}
}
}
}
Aggs are normally good for counting and grouping. For nested documents you can use nested aggs:
"aggs": {
"MyAggregation1": {
"terms": {
"field": "FieldA",
"size": 0
},
"aggs": {
"BaseKeyWords": {
"nested": { "path": "base_keywords" },
"aggs": {
"BaseKeys": {
"terms": {
"field": "base_keywords.base_key.keyword",
"size": 0
}
}
}
}
}
}
}
You don't specify what you want to count, but aggs are quite flexible for grouping and counting data.
The "doc_count" and "key" values behave much like a SQL GROUP BY with COUNT().
Update: this assumes you have a .keyword sub-field for the "keys" values, since a property of type "text" can't be aggregated or counted:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" },
"aggs": {
"NestedKeywords": {
"terms": {
"field": "keywords1.keys.keyword",
"size": 0
}
}
}
}
}
}
To just count the number of nested keys, you could do this:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" }
}
}
}
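As a small sketch of how that count is read back: the doc_count of a nested aggregation is the number of nested objects under the given path across the matching documents. The helper names and the aggregation name nested_count below are made up, the path base_keywords is taken from the mapping in the question, and sample_response is a hypothetical response with an invented count.

```python
import json


def nested_count_query(path):
    """Build a search body whose only aggregation is a nested aggregation."""
    return {
        "size": 0,
        "aggs": {"nested_count": {"nested": {"path": path}}},
    }


def nested_object_count(response):
    """Return the number of nested objects reported by the nested aggregation."""
    return response["aggregations"]["nested_count"]["doc_count"]


body = nested_count_query("base_keywords")
print(json.dumps(body, indent=2))

# Hypothetical response, trimmed to the part that matters here.
sample_response = {"aggregations": {"nested_count": {"doc_count": 42}}}
print(nested_object_count(sample_response))
```

Adding a query to the body would restrict the count to the nested objects of the documents that match it.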
If you want to group on field values of the "main" document or of the nested documents, you will have to extend your mapping / data model to include fields that are aggregatable. That covers most data types in Elasticsearch except "text", for example dates, numbers, geolocations and keywords.
Edit:
Here is an example that aggregates on a unique identifier for each top-level document, assuming it has a property called "WordMappingId" of type integer:
{
"aggs": {
"word_maping_agg": {
"terms": {
"field": "WordMappingId",
"size": 0,
"missing": -1
},
"aggs": {
"Keywords1Agg": {
"nested": { "path": "keywords1" }
}
}
}
}
}
If you don't add any properties to the "word_maping" document on the top level, there is no way to do an aggregation for each unique document. The built-in _id field is not aggregatable by default, so I suggest you include a unique identifier from the source data on the top level to aggregate on.
Note: the "missing" parameter puts all documents that don't have the WordMappingId property set into a bucket with the supplied value. This makes sure you are not missing any documents in the search results.
Aggs can support behaviour similar to a GROUP BY in SQL, but you need something to actually group by, and according to the mapping you supplied there are currently no such fields in your index.
I was trying to do something similar to understand the distribution of our production data.
The following query helped me find the top 5:
{
"query": {
"match_all": {}
},
"aggs": {
"n_base_keywords": {
"nested": { "path": "base_keywords" },
"aggs": {
"top_count": { "terms": { "field": "_id", "size" : 5 } }
}
}
}
}

ElasticSearch - Ordering aggregation by nested aggregation on nested field

{
"query": {
"match_all": {}
},
"from": 0,
"size": 0,
"aggs": {
"itineraryId": {
"terms": {
"field": "iid",
"size": 2147483647,
"order": [
{
"price>price>price.max": "desc"
}
]
},
"aggs": {
"duration": {
"stats": {
"field": "drn"
}
},
"price": {
"nested": {
"path": "prl"
},
"aggs": {
"price": {
"filter": {
"terms": {
"prl.cc.keyword": [
"USD"
]
}
},
"aggs": {
"price": {
"stats": {
"field": "prl.spl.vl"
}
}
}
}
}
}
}
}
}
}
Here, I am getting the error:
"Invalid terms aggregation order path [price>price>price.max]. Terms
buckets can only be sorted on a sub-aggregator path that is built out
of zero or more single-bucket aggregations within the path and a final
single-bucket or a metrics aggregation at the path end. Sub-path
[price] points to non single-bucket aggregation"
The query works fine if I order by the duration aggregation, like this:
"order": [
{
"duration.max": "desc"
}
]
So is there any way to order an aggregation by a nested aggregation on a nested field, i.e. something like the following?
"order": [
{
"price>price>price.max": "desc"
}
]
As Val pointed out in the comments, ES does not support this yet.
Until it does, you can run the nested aggregation first and then use a reverse_nested aggregation to aggregate on the duration, which lives at the root of the document.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html
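A different workaround, not the one proposed above and only a sketch, is to remove the order clause from the terms aggregation and sort the returned buckets client-side by the nested maximum. This assumes the number of itineraryId buckets is small enough to sort in memory. The function name sort_buckets_by_nested_max and the sample buckets are invented; the bucket layout follows the aggregation names in the question (price > price > price).

```python
def sort_buckets_by_nested_max(buckets):
    """Sort terms buckets by price > price > price.max, highest first."""

    def max_price(bucket):
        value = bucket["price"]["price"]["price"]["max"]
        # A stats aggregation over zero values reports max as null (None);
        # treat that as the lowest possible value so it sorts last.
        return float("-inf") if value is None else value

    return sorted(buckets, key=max_price, reverse=True)


# Hypothetical buckets, trimmed to the fields used for sorting.
sample_buckets = [
    {"key": "it-1", "price": {"price": {"price": {"max": 120.0}}}},
    {"key": "it-2", "price": {"price": {"price": {"max": None}}}},
    {"key": "it-3", "price": {"price": {"price": {"max": 340.5}}}},
]

ordered = sort_buckets_by_nested_max(sample_buckets)
print([bucket["key"] for bucket in ordered])
```

The trade-off is that all buckets have to be fetched before sorting, so this does not help when only the top few itineraries are wanted from a very large set.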
