Get all documents from Elasticsearch that have the same value in a field

Say I have documents of type Order with a field bulkOrderId. bulkOrderId identifies a group (bulk) of orders issued at once, and all orders in a bulk share the same id, like this:
Order {
  "bulkOrderId": "bulkOrder:12345678"
}
The id is unique per bulk and is generated using a UUID.
How do I find groups of orders with the same bulkOrderId in Elasticsearch when the bulkOrderId is not known in advance? Is it possible?

You can achieve that with a terms aggregation on bulkOrderId plus a top_hits sub-aggregation, which returns the orders in each bulk:
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "bulks": {
      "terms": {
        "field": "bulkOrderId",
        "size": 10
      },
      "aggs": {
        "orders": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
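If you only care about bulks that actually contain several orders, the terms aggregation's min_doc_count option can drop ids that occur only once. Below is a minimal Python sketch, not tied to any client library, that builds such a request body and turns an already-parsed response into a dict from bulk id to orders. It assumes bulkOrderId is an aggregatable keyword field (with default dynamic mapping that would be bulkOrderId.keyword), and the helper names are hypothetical.

```python
from typing import Any


def build_bulk_query(field: str = "bulkOrderId", max_bulks: int = 10,
                     orders_per_bulk: int = 10) -> dict[str, Any]:
    """Request body: one terms bucket per bulk id, with the orders of each bulk."""
    return {
        "size": 0,  # only the aggregation is needed, not the top-level hits
        "aggs": {
            "bulks": {
                "terms": {
                    "field": field,
                    "size": max_bulks,
                    "min_doc_count": 2,  # skip ids that occur only once
                },
                "aggs": {"orders": {"top_hits": {"size": orders_per_bulk}}},
            }
        },
    }


def group_orders(response: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each bulk id (bucket key) to the _source of the orders in that bucket."""
    groups: dict[str, list[dict[str, Any]]] = {}
    for bucket in response["aggregations"]["bulks"]["buckets"]:
        hits = bucket["orders"]["hits"]["hits"]
        groups[bucket["key"]] = [hit["_source"] for hit in hits]
    return groups
```

Keep in mind that the terms size caps how many bulks come back; to walk through all of them, a composite aggregation with paging is the usual alternative.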

Related

How to compute proportions in an Elasticsearch query

I have a field in my data that takes four distinct values across all records. I have to aggregate the records by each value and find the proportion of each value in the data, i.e. (number of records with that value) / (total number of records). Is there a way to do this with Elasticsearch dashboards? I used a terms aggregation to group the records and a value_count metric aggregation to get the doc_count. But I cannot get a bucket_script to do the division; I get the error "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [StringTerms] at aggregation [latest_version]".
Below is my code:
{
  "size": 0,
  "aggs": {
    "BAR": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "latest_version": {
          "filter": {
            "match_phrase": {
              "log": "main_filter"
            }
          },
          "aggs": {
            "latest_version_count": {
              "terms": {
                "field": "field_name"
              },
              "aggs": {
                "version_count": {
                  "value_count": {
                    "field": "field_name"
                  }
                }
              }
            },
            "sum_buckets": {
              "sum_bucket": {
                "buckets_path": "latest_version_count>_count"
              }
            }
          }
        },
        "BAR-percentage": {
          "bucket_script": {
            "buckets_path": {
              "eachVersionCount": "latest_version>latest_version_count",
              "totalVersionCount": "latest_version>sum_buckets"
            },
            "script": "params.eachVersionCount/params.totalVersionCount"
          }
        }
      }
    }
  }
}
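The error arises because a bucket_script buckets_path must point to a single numeric value (a single-value metric or _count), while latest_version>latest_version_count points to the multi-bucket terms aggregation. One workaround, sketched below under the assumption that you drop the BAR-percentage bucket_script and keep the rest of the query, is to compute the proportions on the client from the doc_count values already in the response. The function name is hypothetical; the response keys follow the aggregation names in the query above. The total is the sum of the terms buckets' doc_count, which equals the filtered document count only if each matching document has exactly one field_name value and the terms size (10 by default) covers all four values.

```python
from typing import Any


def version_proportions(response: dict[str, Any]) -> dict[str, dict[str, float]]:
    """Per day, the share of each field_name value among the filtered documents.

    Assumes the response shape of the query above:
    BAR (date_histogram) > latest_version (filter) > latest_version_count (terms).
    """
    result: dict[str, dict[str, float]] = {}
    for day in response["aggregations"]["BAR"]["buckets"]:
        terms = day["latest_version"]["latest_version_count"]["buckets"]
        total = sum(b["doc_count"] for b in terms)
        # Days where the filter matched nothing have no terms buckets; skip them.
        if total == 0:
            continue
        result[day["key_as_string"]] = {
            str(b["key"]): b["doc_count"] / total for b in terms
        }
    return result
```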

Aggregation on terms and intervals in Elasticsearch

My documents look like this:
{
  "uri": "post:1130a8ef197882bc3ebd",
  "topic_list": [
    "bye",
    "hello"
  ],
  "datetime": "2010-06-06T22:08:49"
}
I want a query that aggregates on both datetime and topic_list. The output should tell me, for each time interval, how many docs have the topic hello in topic_list.
Here is what I tried:
{
  "size": 0,
  "aggs": {
    "test": {
      "terms": {
        "field": "topic_list"
      }
    }
  }
}
But the output only tells me how many docs contain each topic over all time, not per interval.
How can I create such aggregation?
You need to add two things:
a query that restricts the results to documents containing the topic "hello" (if you're only interested in the document count per time interval, you don't need the terms aggregation on the topic_list field);
a date_histogram aggregation to create the time intervals.
Here is the query:
{
  "size": 0,
  "query": {
    "term": {
      "topic_list": "hello"
    }
  },
  "aggs": {
    "intervals": {
      "date_histogram": {
        "field": "datetime",
        "calendar_interval": "1d"
      }
    }
  }
}
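To make the semantics concrete, the sketch below reproduces in plain Python what the term query plus the daily date_histogram compute: for each calendar day, the number of documents whose topic_list contains "hello". It is only an illustration on in-memory dicts shaped like the sample document, not how Elasticsearch evaluates the query. Two differences to keep in mind: it truncates the naive timestamp to its day, whereas Elasticsearch buckets in UTC unless a time_zone is set, and it omits empty days, which date_histogram returns between the first and last bucket by default. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Any


def topic_count_per_day(docs: list[dict[str, Any]], topic: str = "hello") -> dict[str, int]:
    """Count documents containing `topic`, grouped by the calendar day of `datetime`."""
    counts: Counter = Counter()
    for doc in docs:
        # Counterpart of the term query: keep only documents with the topic.
        if topic not in doc.get("topic_list", []):
            continue
        # Counterpart of a 1d calendar_interval: truncate the timestamp to its day.
        day = datetime.fromisoformat(doc["datetime"]).date().isoformat()
        counts[day] += 1
    return dict(sorted(counts.items()))
```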

Deduplicate and perform a composite aggregation on the deduplicated result

I have an index in Elasticsearch that contains daily transaction data. Each doc has these main fields:
TxnId, Status, TxnType, userId
Two documents can have the same TxnId.
I'm looking for a query that aggregates over Status and TxnType for unique TxnIds. Basically I'm looking for something like: select unique txnIds from user_table group by status, txnType.
I have one ES query that dedups on TxnId and another that performs a composite aggregation on status and txnType. I want to do both in a single query.
I tried the collapse feature, and I also tried cardinality and dedup approaches, but the query does not give the correct output:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "streamSource": 3
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "txnId"
  },
  "aggs": {
    "buckets": {
      "composite": {
        "size": 30,
        "sources": [
          {
            "status": {
              "terms": {
                "field": "status"
              }
            }
          },
          {
            "txnType": {
              "terms": {
                "field": "txnType"
              }
            }
          }
        ]
      }
    }
  }
}
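Field collapsing only affects the returned hits; aggregations are still computed over every matching document, so collapse cannot dedup the composite buckets. One common alternative, sketched here as an assumption rather than a verified fix for this index, is to keep the composite aggregation and add a cardinality sub-aggregation on txnId, so each (status, txnType) bucket reports the number of distinct transaction ids. Note that cardinality is approximate (counts below its precision_threshold are expected to be close to accurate), and a txnId that appears with two different statuses is counted in both buckets. The Python below only builds the request body and reads a parsed response; the helper names and the unique_txns aggregation name are hypothetical.

```python
from typing import Any, Optional


def build_dedup_query(after_key: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Composite buckets on (status, txnType) with a distinct count of txnId."""
    composite: dict[str, Any] = {
        "size": 30,
        "sources": [
            {"status": {"terms": {"field": "status"}}},
            {"txnType": {"terms": {"field": "txnType"}}},
        ],
    }
    if after_key is not None:
        # Continue from the previous page of composite buckets.
        composite["after"] = after_key
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"streamSource": 3}}]}},
        "aggs": {
            "buckets": {
                "composite": composite,
                # Approximate distinct count of transaction ids per bucket.
                "aggs": {"unique_txns": {"cardinality": {"field": "txnId"}}},
            }
        },
    }


def unique_txn_counts(response: dict[str, Any]) -> dict[tuple[Any, Any], int]:
    """Map (status, txnType) to the approximate number of distinct txnIds."""
    counts: dict[tuple[Any, Any], int] = {}
    for bucket in response["aggregations"]["buckets"]["buckets"]:
        key = (bucket["key"]["status"], bucket["key"]["txnType"])
        counts[key] = bucket["unique_txns"]["value"]
    return counts
```

To fetch the next page of combinations, pass the after_key from the previous response's composite aggregation back into build_dedup_query.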

Elasticsearch 1.7: group into buckets by a concatenation of multiple fields

I have an aggregation that groups by firstname and returns the necessary fields for each bucket. But I want to group by the concatenation of firstname + lastname, without nesting aggregations (group by firstname, then by lastname). How do I change the field to a string concatenation of multiple fields?
"aggs": {
  "by_name": {
    "terms": {
      "field": "firstname"
    },
    "aggs": {
      "source": {
        "top_hits": {
          "_source": {
            "include": [
              "id", "name"
            ]
          }
        }
      }
    }
  }
}
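In ES 1.7 you can use a script inside the terms aggregation. The space between the two fields keeps, for example, "ab" + "c" and "a" + "bc" in separate buckets:
GET _search
{
  "size": 20,
  "aggs": {
    "con": {
      "terms": {
        "script": "doc['firstname'].value + ' ' + doc['lastname'].value"
      }
    }
  }
}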
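For later versions (e.g. ES 5.2), the terms aggregation still accepts a script, written in Painless by default. The bucket_script pipeline aggregation does not serve this purpose: it computes a numeric value from metrics in existing buckets and cannot create buckets keyed on a concatenated string.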
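As a minimal sketch for 5.x, the Python helper below builds a request body that groups by the concatenated name. It uses the 5.x script syntax ("inline" together with "lang": "painless"; 6.x renamed "inline" to "source"), joins the two fields with a space, and assumes firstname and lastname are keyword fields, because text fields have no doc values for a script to read. The helper and the by_full_name aggregation name are hypothetical; the top_hits sub-aggregation mirrors the one in the question, using the array form of _source filtering.

```python
from typing import Any


def concat_terms_agg(first: str = "firstname", last: str = "lastname",
                     size: int = 20) -> dict[str, Any]:
    """terms aggregation keyed on "<first> <last>" via a Painless script (ES 5.x syntax)."""
    script = f"doc['{first}'].value + ' ' + doc['{last}'].value"
    return {
        "size": 0,  # only the buckets are needed
        "aggs": {
            "by_full_name": {
                "terms": {
                    "script": {"inline": script, "lang": "painless"},
                    "size": size,
                },
                "aggs": {
                    # Return the id and name of documents in each bucket.
                    "source": {"top_hits": {"_source": ["id", "name"]}}
                },
            }
        },
    }
```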

How to get the top document of each type from a search on an index with multiple types?

We have an index named "machines" in Elasticsearch with the types auto, bike, car and flight.
From a search on this index, I want to get similar brands from every type.
How do I query to get the top document of each type, from a search on an index with multiple types, via the Elasticsearch REST API?
Try this, using a top_hits aggregation:
GET /machines/_search
{
  "size": 0, // only the aggregation is needed, not the hits (replaces search_type=count)
  "query": {
    "match_all": {} // your query here
  },
  "aggs": {
    "top-types": {
      "terms": {
        "field": "_type"
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
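Each bucket of top-types then holds at most one hit in top_docs. A small Python sketch for reading that response, assuming it has already been parsed from JSON into a dict (the function name is hypothetical). On 6.x and later indices, which allow only a single mapping type, the same pattern works with a terms aggregation on an ordinary keyword field instead of _type.

```python
from typing import Any


def top_doc_per_type(response: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each _type bucket key to the _source of its highest-scoring hit."""
    best: dict[str, dict[str, Any]] = {}
    for bucket in response["aggregations"]["top-types"]["buckets"]:
        hits = bucket["top_docs"]["hits"]["hits"]
        if hits:  # top_hits size is 1, so this is the best-scoring document
            best[bucket["key"]] = hits[0]["_source"]
    return best
```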
