elasticsearch 1.7 group by bucket with multi field concat - elasticsearch

I have an aggregate statement that groups by firstname and buckets them with necessary fields. But I want to group by concatenation of firstname+lastname. I do not want to use nested aggregates like group by firstname and then group by lastname. How do I change the field to include a string concatenation of multiple fields?
"aggs": {
"by_name": {
"terms": {
"field": "firstname"
},
"aggs": {
"source": {
"top_hits": {
"_source": {
"include": [
"id","name"
]
}
}
}
}
}
}

In ES 1.7 you can use a script in the terms aggregation:
GET _search
{
  "size": 20,
  "aggs": {
    "con": {
      "terms": {
        "script": "doc['firstName'].value + doc['lastName'].value"
      }
    }
  }
}
For more recent versions (e.g. ES 5.2) the same scripted terms aggregation still works, with Painless as the default script language; there is also a bucket_script pipeline aggregation, but that one operates on the numeric output of other aggregations rather than on document fields.
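As a minimal sketch of the 5.x syntax (assuming firstName and lastName are keyword fields with doc values, and reusing the field names from the example above; by_full_name is an arbitrary aggregation name):
GET _search
{
  "size": 0,
  "aggs": {
    "by_full_name": {
      "terms": {
        "script": {
          "lang": "painless",
          "inline": "doc['firstName'].value + ' ' + doc['lastName'].value"
        }
      }
    }
  }
}
Scripted keys are computed per document at query time, so on large indices a copy_to field that holds the combined value is usually the cheaper option.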

Related

How to do proportions in Elastic search query

I have a field in my data that has four unique values across all the records. I need to aggregate the records on each unique value and find the proportion of each value in the data, essentially (number of records for each unique value / total number of records). Is there a way to do this with Elasticsearch dashboards? I have used a terms aggregation to bucket the field and applied a value_count metric aggregation to get the doc_count value, but I am not able to use bucket_script to do the division. I am getting the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
  "size": 0,
  "aggs": {
    "BAR": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "latest_version": {
          "filter": {
            "match_phrase": {
              "log": "main_filter"
            }
          },
          "aggs": {
            "latest_version_count": {
              "terms": {
                "field": "field_name"
              },
              "aggs": {
                "version_count": {
                  "value_count": {
                    "field": "field_name"
                  }
                }
              }
            },
            "sum_buckets": {
              "sum_bucket": {
                "buckets_path": "latest_version_count>_count"
              }
            }
          }
        },
        "BAR-percentage": {
          "bucket_script": {
            "buckets_path": {
              "eachVersionCount": "latest_version>latest_version_count",
              "totalVersionCount": "latest_version>sum_buckets"
            },
            "script": "params.eachVersionCount/params.totalVersionCount"
          }
        }
      }
    }
  }
}
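The error is raised because eachVersionCount points at latest_version>latest_version_count, which is a multi-bucket terms aggregation (hence "StringTerms"); every entry in a bucket_script's buckets_path must resolve to a single number per parent bucket. A minimal sketch of a path shape that does satisfy this requirement, reusing the aggregation names above (the param names filteredCount/dayTotal are arbitrary; note it computes the per-day share of documents matching main_filter, not the per-value proportions):
"BAR-percentage": {
  "bucket_script": {
    "buckets_path": {
      "filteredCount": "latest_version>sum_buckets",
      "dayTotal": "_count"
    },
    "script": "params.filteredCount / params.dayTotal"
  }
}
The per-value proportions themselves are easiest to compute client-side from the latest_version_count buckets and sum_buckets; on 7.9+ clusters the normalize pipeline aggregation with the percent_of_sum method may also be worth a look.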

Deduplicate and perform composite aggregation on the deduplicated result

I have an index in Elasticsearch which contains daily transaction data. Each doc has the following main fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over Status and TxnType for unique TxnIds only. Basically I'm looking for something like: select unique txnIds from user_table group by status, txnType.
I have one ES query which dedups on TxnId and another ES query which performs a composite aggregation on status and txnType; I want to do both in a single query.
I tried the collapse feature, and also cardinality and dedup approaches, but the query is not giving the correct output:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "streamSource": 3
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "txnId"
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          {
            "status": {
              "terms": {
                "field": "status"
              }
            }
          },
          {
            "txnType": {
              "terms": {
                "field": "txnType"
              }
            }
          }
        ]
      }
    }
  }
}
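One thing to note: collapse only deduplicates the search hits; it has no effect on aggregations, which still see every document. A sketch of one common workaround (assuming txnId is indexed as a keyword or numeric field) is to keep the composite aggregation and add a cardinality sub-aggregation on txnId, so each (status, txnType) bucket reports the number of unique transaction ids rather than the raw document count:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "streamSource": 3 } }
      ]
    }
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          { "status": { "terms": { "field": "status" } } },
          { "txnType": { "terms": { "field": "txnType" } } }
        ]
      },
      "aggs": {
        "unique_txn_ids": {
          "cardinality": { "field": "txnId" }
        }
      }
    }
  }
}
Keep in mind that cardinality is an approximate count; it stays exact below the precision_threshold (3000 by default, configurable up to 40000).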

Excluding inner hits from top hits aggregation with source filter

In my query, I am using inner_hits to return the list of nested objects that match my query.
I then add an aggregation on my document's categoryId, and a top_hits sub-aggregation to get the display name for that category.
"aggs": {
"category": {
"terms": {
"field": "categoryId",
"size": 100
},
"aggs": {
"category_value": {
"top_hits": {
"size": 1,
"_source": {
"includes": "categoryName"
}
}
}
}
}
}
Now, when I look at the aggregation buckets, I do get a _source document with only the categoryName property, but I also get the entire inner_hits collection:
{
  ...
  "_source": {
    "categoryName": "Armchairs"
  },
  "inner_hits": {
    "my_inner_hits": {
      "hits": {
        "total": 260,
        "max_score": null,
        "hits": [
          {
            ...
            "_source": {
              //nested document here
            }
          }
        ]
      }
    }
  }
}
Is there a way to not include the inner_hits data in a top_hits aggregation?
Since you only need a single field, what I suggest is to get rid of the top_hits aggregation and use another terms aggregation for the name:
{
  ...
  "aggs": {
    "category": {
      "terms": {
        "field": "categoryId",
        "size": 100
      },
      "aggs": {
        "category_value": {
          "terms": {
            "field": "categoryName",
            "size": 1
          }
        }
      }
    }
  }
}
That will also be a little bit more efficient.
UPDATE:
Another way to keep using terms/top_hits is to leverage response filtering and only return what you need. For instance, appending this to your URL will make sure that no inner hits show up inside your aggregations:
?filter_path=hits.hits,aggregations.**.key,aggregations.**.doc_count,aggregations.**.hits.hits.hits._source
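As a usage sketch (my_index is just a placeholder index name), the parameter is appended to the _search URL while the request body stays unchanged:
GET my_index/_search?filter_path=hits.hits,aggregations.**.key,aggregations.**.doc_count,aggregations.**.hits.hits.hits._source
Note that filter_path only trims the response; the inner hits are still computed, so the terms-based variant above remains the cheaper option.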

Elasticsearch include other fields in top level aggregation

My indexed documents are as follows:
{
  "user": {
    "email": "test#test.com",
    "firstName": "test",
    "lastName": "test"
  },
  ...
  "category": "test_category"
}
Currently I have an aggregation which counts documents by the user's email, and then a sub-aggregation to count categories for each user:
"aggs": {
"users": {
"terms": {
"field": "user.email",
"order": {
"_count": "desc"
}
},
"aggs": {
"categories": {
"terms": {
"field": "category",
"order": {
"_count": "desc"
}
}
}
}
}
}
I am trying to include the user's first and last name in the buckets generated by the top-level aggregation, while still getting the same results from the categories sub-aggregation. I've tried including a top_hits aggregation, but I didn't have any luck getting the results I want.
Any advice? Thanks!
EDIT:
Let me rephrase: I actually did get the desired user data with the top_hits aggregation; I just don't know how to include it in my original aggregation so that the categories sub-aggregation still gives me the same result. I tried the following top_hits aggregation:
"aggs": {
"user": {
"top_hits": {
"size": 1,
"_source": {
"include": ["user"]
}
}
}
}
I want to have the user data in the top level agg buckets and then still have the aggregation by category below that.
If I understand correctly, the user's email and firstname/lastname map one-to-one.
So you could retrieve them using a custom script on these fields (and extract the values on the client side by splitting the bucket key on "_" or whatever separator you choose):
aggs: {
  users: {
    terms: {
      script: 'doc["user.email"].value + "_" + doc["user.firstName"].value + "_" + doc["user.lastName"].value'
    }
  }
}
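Alternatively, sticking with top_hits: sibling sub-aggregations do not interfere with each other, so the top_hits block from the edit can simply sit next to the categories aggregation under the same users bucket. A sketch reusing the field names from the question (user_info is an arbitrary aggregation name):
"aggs": {
  "users": {
    "terms": {
      "field": "user.email",
      "order": {
        "_count": "desc"
      }
    },
    "aggs": {
      "user_info": {
        "top_hits": {
          "size": 1,
          "_source": {
            "includes": ["user.firstName", "user.lastName"]
          }
        }
      },
      "categories": {
        "terms": {
          "field": "category",
          "order": {
            "_count": "desc"
          }
        }
      }
    }
  }
}
Each users bucket then carries a single hit with the name fields, and the categories buckets come back unchanged.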

Get all documents from elastic search with a field having same value

Say I have documents of type Order and they have a field bulkOrderId. The bulkOrderId represents a group or bulk of orders issued at once, and all orders in the bulk have the same id, like this:
Order {
  "bulkOrderId": "bulkOrder:12345678"
}
The id is unique and is generated using UUID.
How do I find groups of orders with the same bulkOrderId from elasticsearch when the bulkOrderId is not known? Is it possible?
You can achieve that using a terms aggregation and a top_hits sub-aggregation, like this:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "bulks": {
      "terms": {
        "field": "bulkOrderId",
        "size": 10
      },
      "aggs": {
        "orders": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
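If the goal is specifically groups, i.e. bulkOrderId values shared by more than one order, adding min_doc_count to the terms aggregation restricts the buckets to ids that occur at least twice; a small variation of the query above:
"aggs": {
  "bulks": {
    "terms": {
      "field": "bulkOrderId",
      "size": 10,
      "min_doc_count": 2
    },
    "aggs": {
      "orders": {
        "top_hits": {
          "size": 10
        }
      }
    }
  }
}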
