Getting cardinality of multiple fields? - elasticsearch

How can I get count of all unique combinations of values of 2 fields that are present in documents of my database, i.e. achieve the same functionality as the "cardinality" aggregation provides, but for more than 1 field?

You can use a script to achieve this. Assuming the character '#' does not appear in any value of either field (any other separator works too), the query you're looking for is below. Mind you, scripting comes with a performance hit.
{
"aggs" : {
"multi_field_cardinality" : {
"cardinality" : {
"script": "doc['<field1>'].value + '#' + doc['<field2>'].value"
}
}
}
}
Read more about it here.
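To make the composite-key idea concrete, here is a minimal Python sketch. It builds the request body shown above and mirrors the same counting logic on in-memory documents. The helper names (build_pair_cardinality_query, count_unique_pairs) and the sample fields "color" and "size" are made up for illustration; they are not part of Elasticsearch.

```python
def build_pair_cardinality_query(field1, field2, sep="#"):
    """Build a cardinality aggregation over a scripted composite key."""
    script = "doc['{0}'].value + '{2}' + doc['{1}'].value".format(field1, field2, sep)
    return {
        "aggs": {
            "multi_field_cardinality": {
                "cardinality": {"script": script}
            }
        }
    }


def count_unique_pairs(docs, field1, field2, sep="#"):
    """Local equivalent: count distinct field1/field2 combinations exactly."""
    keys = {str(d[field1]) + sep + str(d[field2]) for d in docs}
    return len(keys)


# Hypothetical sample documents; two of them share the same pair.
docs = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "S"},
]
print(count_unique_pairs(docs, "color", "size"))
```

Keep in mind that the cardinality aggregation returns an approximate count, whereas the local helper above is exact.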

A better solution is to use nested aggregations and then count the resulting buckets.
"aggs": {
"Group1": {
"terms": {
"field": "Field1",
"size": 0
},
"aggs": {
"Group2": {
"terms": {
"field": "Field2",
"size": 0
}
}
}
}
}
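Counting the resulting buckets happens on the client side. A rough Python sketch, assuming the response has the usual terms-aggregation shape and uses the Group1/Group2 names from the query above (the sample response is hand-written, not real output):

```python
def count_combinations(response, outer="Group1", inner="Group2"):
    """Sum the number of inner buckets across all outer buckets."""
    outer_buckets = response["aggregations"][outer]["buckets"]
    return sum(len(b[inner]["buckets"]) for b in outer_buckets)


# Hand-written sample shaped like a nested terms-aggregation response.
sample_response = {
    "aggregations": {
        "Group1": {
            "buckets": [
                {"key": "a", "doc_count": 3,
                 "Group2": {"buckets": [{"key": "x", "doc_count": 2},
                                        {"key": "y", "doc_count": 1}]}},
                {"key": "b", "doc_count": 1,
                 "Group2": {"buckets": [{"key": "x", "doc_count": 1}]}},
            ]
        }
    }
}
print(count_combinations(sample_response))
```

Each inner bucket corresponds to one Field1/Field2 combination, so the total is the number of unique pairs.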

Related

ElasticSearch range in sum aggregation

I'm a new user of Elasticsearch and I would like to apply a range to a sum aggregation.
So far I have:
{
"query": {},
"aggs": {
"group_by_trainset" : {
"terms": {
"field": "trainset",
"order": { "sum_compteur": "desc" }
},
"aggs": {
"sum_compteur": {
"sum": {
"field": "compteur"
}
}
}
}
}
}
This returns the first 10 results.
I want pagination, if that is possible with aggregations in Elasticsearch: I am trying to return the next 10 results.
That is, I want to display the 10 results whose "sum_compteur" is lower than the lowest "sum_compteur" of the first 10 results, and I don't know how.
Thanks for your help!
You will get the same aggregations on every request as long as the input parameters do not change.
To control how many buckets come back, set "size" on the terms aggregation ("size" and "order" are options of terms, not of sum):
"aggs": {
"group_by_trainset": {
"terms": {
"field": "trainset",
"size": 1000,
"order": { "_count": "asc" }
},
"aggs": {
"sum_compteur": {
"sum": {
"field": "compteur"
}
}
}
}
}
Here 1000 is the number of buckets you need.
You can also sort the buckets with "order", and then paginate the resulting array on the client side.
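Paginating the output array can be as simple as slicing the bucket list. A minimal sketch in Python, assuming the buckets were already fetched and sorted; paginate_buckets and the sample data are hypothetical:

```python
def paginate_buckets(buckets, page, page_size=10):
    """Return the buckets for a 1-based page number."""
    start = (page - 1) * page_size
    return buckets[start:start + page_size]


# Made-up buckets, already sorted by sum_compteur descending.
buckets = [
    {"key": "train-%d" % i, "sum_compteur": {"value": 100 - i}}
    for i in range(25)
]

second_page = paginate_buckets(buckets, page=2)
print([b["key"] for b in second_page])
```

The second page then holds the 10 buckets whose sums come right after the lowest sum of the first page.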

Elasticsearch term aggregation with range doc count

I want to aggregate on a field and return only those buckets whose doc count is within a range, e.g. 10 to 20.
According to the documentation, we can provide a min_doc_count parameter.
Is there a way to provide a max_doc_count as well, so that I only get the required buckets?
Thanks
You can use the following query to filter the buckets with a bucket_selector aggregation. You can take a deep look at pipeline aggregations and buckets paths here.
In the following example I am aggregating documents on the product.name field, where product is of type object in my mapping.
{
"size": 0,
"aggs": {
"values": {
"terms": {
"field": "product.name.raw",
"size": 10
},
"aggs": {
"final_filter": {
"bucket_selector": {
"buckets_path": {
"values": "_count"
},
"script": "params.values > 10 && params.values < 20"
}
}
}
}
}
}
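If the bounds need to vary, the same request body can be generated from code. A small Python sketch, where build_doc_count_range_query is a hypothetical helper and the bounds are exclusive, as in the script above:

```python
def build_doc_count_range_query(field, min_count, max_count, size=10):
    """Terms aggregation whose buckets are filtered by doc count.

    Bounds are exclusive, matching the strict comparisons in the answer.
    """
    script = "params.values > {0} && params.values < {1}".format(min_count, max_count)
    return {
        "size": 0,
        "aggs": {
            "values": {
                "terms": {"field": field, "size": size},
                "aggs": {
                    "final_filter": {
                        "bucket_selector": {
                            "buckets_path": {"values": "_count"},
                            "script": script,
                        }
                    }
                },
            }
        },
    }


query = build_doc_count_range_query("product.name.raw", 10, 20)
print(query["aggs"]["values"]["aggs"]["final_filter"]["bucket_selector"]["script"])
```

Note that the selector only sees the buckets the terms aggregation has already kept, so with a small "size" some matching buckets may never reach the filter.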
Hope this helps
Thanks

Compare IDs between two indices in elasticsearch

I have two indices in an Elasticsearch cluster, containing what ought to be the same data in two slightly different formats. However, the number of records is different. The IDs of each document should be the same. Is there a way to extract a list of the IDs that are present in one index but not the other?
If your two indices have the same type where these documents are stored, you can use something like this:
GET index1,index2/_search
{
"size": 0,
"aggs": {
"group_by_uid": {
"terms": {
"field": "_uid"
},
"aggs": {
"count_indices": {
"cardinality": {
"field": "_index"
}
},
"values_bucket_filter_by_index_count": {
"bucket_selector": {
"buckets_path": {
"count": "count_indices"
},
"script": "params.count < 2"
}
}
}
}
}
}
The query above works in 5.x. If your ID is a field inside a document, that's even better to test.
For anyone that comes across this, Scrutineer (https://github.com/Aconex/scrutineer/) provides this sort of ability if you follow convention of ID & Version concepts within Elasticsearch.
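If the ID lists are small enough to pull out of both indices, the comparison can also be done on the client. A minimal sketch in Python, assuming the IDs have already been fetched into two lists (how they are fetched is left out here, and diff_ids is a hypothetical helper):

```python
def diff_ids(ids_a, ids_b):
    """Return (only_in_a, only_in_b) as sorted lists."""
    set_a, set_b = set(ids_a), set(ids_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Made-up IDs standing in for the contents of index1 and index2.
index1_ids = ["1", "2", "3", "5"]
index2_ids = ["2", "3", "4"]

only_in_1, only_in_2 = diff_ids(index1_ids, index2_ids)
print(only_in_1, only_in_2)
```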

elastic search by day aggregation, sum of two properties

I'm trying to aggregate on the sum of two fields, but can't seem to get the syntax right.
Let's say I have the following aggregation:
{
"aggregations": {
"byDay": {
"date_histogram": {
"field": "#timestamp",
"interval": "1d"
},
"aggregations": {
"sum_a": {
"sum": {
"field": "a"
}
},
"sum_b": {
"sum": {
"field": "b"
}
},
"sum_a_and_b": {
/* what goes here? */
}
}
}
}
}
What I really want is an aggregation that is the sum of fields a and b.
It seems like something that would be simple, but I've hit a brick wall trying to get it right. Online examples have either been too simple (summing only one field) or tried to do much more than this, so I've not found them helpful.
Try a terms aggregation that generates the terms using a script:
"aggs": {
"sum_a_and_b": {
"terms": {
"script": "doc['a'].value + doc['b'].value"
}
}
}
In order to enable dynamic scripting, add the following to your config file (elasticsearch.yml by default):
script.aggs: true # enable just for aggregations
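Two other routes may be worth trying for the per-day total. Since a sum is additive, the existing sum_a and sum_b values can simply be added per bucket on the client. Alternatively, the sum aggregation itself accepts a script, which is sketched as a request fragment below. This is only an illustration in Python; add_daily_sums and the sample response are made up.

```python
# Request fragment: a sum aggregation driven by a script instead of a field.
scripted_sum = {
    "sum_a_and_b": {
        "sum": {"script": "doc['a'].value + doc['b'].value"}
    }
}


def add_daily_sums(response):
    """Return {day: sum_a + sum_b} from a byDay date_histogram response."""
    totals = {}
    for bucket in response["aggregations"]["byDay"]["buckets"]:
        day = bucket["key_as_string"]
        totals[day] = bucket["sum_a"]["value"] + bucket["sum_b"]["value"]
    return totals


# Hand-written sample shaped like the response to the question's query.
sample_response = {
    "aggregations": {
        "byDay": {
            "buckets": [
                {"key_as_string": "2015-01-01", "doc_count": 2,
                 "sum_a": {"value": 1.5}, "sum_b": {"value": 2.5}},
                {"key_as_string": "2015-01-02", "doc_count": 1,
                 "sum_a": {"value": 10.0}, "sum_b": {"value": 0.0}},
            ]
        }
    }
}
print(add_daily_sums(sample_response))
```

The client-side addition needs no scripting at all, which avoids the config change above.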

show all buckets from aggregation within a single _type where one index contains multiple _type with same field names

I created an index named "electronics" with two _type values, "mobiles" and "laptops", which have a common field named "screensize".
Since I need to show facets for all the terms present in the fields, I am using aggregations to generate the terms and its facets.
{
"aggs": {
"distinct_field": {
"terms": {
"field": "screensize",
"min_doc_count": 0,
"size": 0
}
}
}
}
In the response I am getting all the screensizes for the mobiles _type as well as for laptops (since Lucene treats the same field name from different types as a single field). I only need the terms present in mobiles, even if their count is 0.
I thought about doing a filtered query for mobiles _type before doing aggregations, but the results were still the same.
{
"query": {
"filtered": {
"filter": {
"type": {
"value": "mobiles"
}
}
}
},
"aggs": {
"distinct_field": {
"terms": {
"field": "screensize",
"min_doc_count": 0,
"size": 0
}
}
}
}
Is there any way I could possibly get only the terms from a single _type for a particular field?
I'm suggesting another approach: a terms aggregation with a script instead of a field, like this. The script returns the value of screensize only if the document's type is mobiles, and null otherwise. This should work, try it out:
{
"aggs": {
"distinct_field": {
"terms": {
"script": "doc._type.value == 'mobiles' ? doc.screensize.value : null",
"min_doc_count": 0,
"size": 0
}
}
}
}
For this to work, you also need to make sure that scripting is enabled.
