Can I limit aggregations to return only a specific list of values? I have something like this:
{ "aggs" : {
"province" : {
"terms" : {
"field" : "province"
}
}
},
"query": {
"bool": {
//my query..
But let's say I know the list of provinces I want counts for ({'province1', 'province2', 'province3'}). Is it possible to restrict the returned list of provinces without affecting my query results?
I want to get:
//list of hits..
//
"aggregations": {
"province": {
"buckets": [
{
"key": "province1",
"doc_count": 200
},
{
"key": "province2",
"doc_count": 162
},
{
"key": "province3",
"doc_count": 162
}
// even if there are more possible provinces
// I don't want to see them
Sure, just use the include parameter of the terms aggregation.
Here's an example. Let's say I have visit stats for a bunch of different IP addresses, but I only want document counts for two of them. I could do this:
POST /test_index/_search?search_type=count
{
"aggregations": {
"ip": {
"terms": {
"field": "ip",
"size": 10,
"include": [
"146.233.189.126",
"193.33.153.89"
]
}
}
}
}
and get back something like:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0,
"hits": []
},
"aggregations": {
"ip": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "146.233.189.126",
"doc_count": 3
},
{
"key": "193.33.153.89",
"doc_count": 3
}
]
}
}
}
Here is some code I used to play around with it:
http://sense.qbox.io/gist/68697646ef7afc9f0375995b6f84181a7ac4cba9
So your example might look like:
{
"aggs": {
"province": {
"terms": {
"field": "province",
"include": [
"province1",
"province2",
"province3"
]
}
}
}
}
I have an Elasticsearch index for products. Each product has a Brand attribute, and I "have to" create an aggregation that returns the Brands of the products.
My Sample Query:
GET /products/product/_search
{
"size": 0,
"aggs": {
"myFancyFilter": {
"filter": {
"match_all": {}
},
"aggs": {
"inner": {
"terms": {
"field": "Brand",
"size": 3
}
}
}
}
},
"query": {
"match_all": {}
}
}
And the result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 236952,
"max_score": 0,
"hits": []
},
"aggregations": {
"myFancyFilter": {
"doc_count": 236952,
"inner": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 139267,
"buckets": [
{
"key": "Brand1",
"doc_count": 3144
},
{
"key": "Brand2",
"doc_count": 1759
},
{
"key": "Brand3",
"doc_count": 1737
}
]
}
}
}
}
It works perfectly for me. Elasticsearch sorts buckets by doc_count, but I would like to control the bucket order in the result. For example, assume that I have Brand5 and I want to move it to position #2, so that the result comes back in the order Brand1, Brand5, Brand3.
If it was not in an aggregation, but in a query, I could use function_score, but now, I don't have an idea. Any clues?
What you are looking for is a way to define your own sort order and have Elasticsearch apply it to the aggregation. I've been able to come up with a solution by renaming the aggregation terms in the following manner:
Brand1 to a_Brand1
Brand5 to b_Brand5
Brand3 to c_Brand3
And then apply sorting on the terms so that sorting happens lexicographically.
Of course this may not be the exact or the best solution but I felt this can help.
Below is the query that I've used. Please note that my field name is brand and it is a multifield and I'm using the field brand.keyword.
POST testdataindex/_search
{
"size":0,
"query":{
"match_all":{
}
},
"aggs":{
"myFancyFilter":{
"filter":{
"match_all":{
}
},
"aggs":{
"inner":{
"terms":{
"script":{
"lang":"painless",
"inline":"if(params.newNames.containsKey(doc['brand.keyword'].value)) { return params.newNames[doc['brand.keyword'].value];} return null;",
"params":{
"newNames":{
"Brand1":"a_Brand1",
"Brand5":"b_Brand5",
"Brand3":"c_Brand3"
}
}
},
"order":{
"_term":"asc"
}
}
}
}
}
}
}
I've created some sample data with the brand names Brand1, Brand3 and Brand5, and below is how the results appear. Note the change in the term names.
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"myFancyFilter": {
"doc_count": 8,
"inner": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "a_Brand1",
"doc_count": 2
},
{
"key": "b_Brand5",
"doc_count": 4
},
{
"key": "c_Brand3",
"doc_count": 2
}
]
}
}
}
}
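Since the buckets now come back under the renamed keys, the client probably wants the original brand names again. A minimal sketch of that step, assuming the response is already parsed into Python dicts; restore_bucket_keys is a hypothetical helper name, not part of any Elasticsearch client:

```python
def restore_bucket_keys(buckets, new_names):
    """Map prefixed bucket keys (e.g. "a_Brand1") back to the original names.

    new_names is the same mapping that was passed to the script as
    params.newNames, so it is inverted here to look up the original name.
    The bucket order returned by Elasticsearch is left untouched.
    """
    original_by_alias = {alias: name for name, alias in new_names.items()}
    return [
        {**bucket, "key": original_by_alias.get(bucket["key"], bucket["key"])}
        for bucket in buckets
    ]


# Mapping and buckets copied from the query and response above.
new_names = {"Brand1": "a_Brand1", "Brand5": "b_Brand5", "Brand3": "c_Brand3"}
buckets = [
    {"key": "a_Brand1", "doc_count": 2},
    {"key": "b_Brand5", "doc_count": 4},
    {"key": "c_Brand3", "doc_count": 2},
]

restored = restore_bucket_keys(buckets, new_names)
print([bucket["key"] for bucket in restored])
```

The prefixes only exist to force the order, so stripping them on the client keeps the custom ordering while showing the real brand names.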
Hope it helps!
I want to aggregate on data, which has inner objects. For example:
{
"_index": "product_index-en",
"_type": "elasticproductmodel",
"_id": "000001111",
"_score": 6.3316255,
"_source": {
"productId": "11111111111",
"productIdOnlyLetterAndDigit": "11111111111",
"productIdOnlyDigit": "11111111111",
"productNumber": "11111111111",
"name": "Glow Plug",
"nameOnlyLetterAndDigit": "glowplug",
"productImageLarge": "11111111111.jpg",
"itemGroupId": "11111",
"relatedProductIds": [],
"dataAreaCountries": [
"fra",
"pol",
"uk",
"sie",
"sve",
"atl",
"ita",
"hol",
"dk"
],
"oemItems": [
{
"manufactorName": "BERU",
"manufacType": "0"
},
{
"manufactorName": "LUCAS",
"manufacType": "0"
}
]
}
}
I need to be able to aggregate oemItems.manufactorName values, but only where oemItems.manufacType is "0". I have tried a number of examples, such as the accepted one here ( Elastic Search Aggregate into buckets on conditions ), but I just cannot seem to wrap my head around it.
I've tried the following, hoping it would aggregate on manufacType first, which it does, and then on manufactorName for each type. The hit counts look correct, but the buckets for manufactorName are empty:
GET /product_index-en/_search
{
"size": 0,
"aggs": {
"baked_goods": {
"nested": {
"path": "oemItems"
},
"aggs": {
"test1": {
"terms": {
"field": "oemItems.manufacType",
"size": 500
},
"aggs": {
"test2": {
"terms": {
"field": "oemItems.manufactorName",
"size": 500
}
}
}
}
}
}
}
}
And the result:
{
"took": 27,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 471214,
"max_score": 0,
"hits": []
},
"aggregations": {
"baked_goods": {
"doc_count": 677246,
"test1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "0",
"doc_count": 436557,
"test2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "1",
"doc_count": 240689,
"test2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
]
}
}
}
}
I have also tried adding a nested term filter, using the following query, to look only at oemItems with manufacType 1. However, it returns whole products in which at least one oemItem has manufacType 1, so the oemItems within those products can still have manufacType 1 or 0. I don't see how aggregating on this response would return only the oemItems.manufactorName values where oemItems.manufacType is 0.
GET /product_index-en/_search
{
"query" : { "match_all" : {} },
"filter" : {
"nested" : {
"path" : "oemItems",
"filter" : {
"bool" : {
"must" : [
{
"term" : {"oemItems.manufacType" : "1"}
}
]
}
}
}
}
}
Good start so far. Just try it like this:
POST /product_index-en/_search
{
"size": 0,
"query": {
"nested": {
"path": "oemItems",
"query": {
"term": {
"oemItems.manufacType": "0"
}
}
}
},
"aggs": {
"baked_goods": {
"nested": {
"path": "oemItems"
},
"aggs": {
"test1": {
"terms": {
"field": "oemItems.manufactorName",
"size": 500
}
}
}
}
}
}
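One thing to keep in mind with the query above: the nested query selects products that have at least one oemItem with manufacType "0", but the nested aggregation then visits every oemItem of those products, including any with manufacType "1". A common way to restrict the buckets themselves is a filter aggregation inside the nested aggregation. The sketch below builds such a request body as a Python dict; the aggregation names oem_items, only_type and manufacturers are placeholders I made up, and the body would still need to be sent with whatever client you use:

```python
import json


def build_manufacturer_aggs(manufac_type):
    """Build a search body that buckets manufactorName only for nested
    oemItems whose manufacType equals manufac_type."""
    return {
        "size": 0,
        "aggs": {
            "oem_items": {
                # Step into the nested oemItems documents.
                "nested": {"path": "oemItems"},
                "aggs": {
                    "only_type": {
                        # Keep only the nested items of the requested type.
                        "filter": {
                            "term": {"oemItems.manufacType": manufac_type}
                        },
                        "aggs": {
                            "manufacturers": {
                                "terms": {
                                    "field": "oemItems.manufactorName",
                                    "size": 500,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = build_manufacturer_aggs("0")
print(json.dumps(body, indent=2))
```

With this shape the buckets would be found under aggregations.oem_items.only_type.manufacturers.buckets in the response.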
I have some test documents that look like
"hits": {
...
"_source": {
"student": "DTWjkg",
"name": "My Name",
"grade": "A"
...
"student": "ggddee",
"name": "My Name2",
"grade": "B"
...
"student": "ggddee",
"name": "My Name3",
"grade": "A"
I want to get the percentage of students that have a grade of B. The result would be "33%", assuming there were only 3 students.
How would I do this in Elasticsearch?
So far I have this aggregation, which I feel like is close:
"aggs": {
"gradeBPercent": {
"terms": {
"field" : "grade",
"script" : "_value == 'B'"
}
}
}
This returns:
"aggregations": {
"gradeBPercent": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "false",
"doc_count": 2
},
{
"key": "true",
"doc_count": 1
}
]
}
}
I'm not necessarily looking for an exact answer; terms and keywords I could google would also help. I've read over the Elasticsearch docs and have not found anything that could help.
First off, you shouldn't need a script for this aggregation. If you want to limit your results to documents where _value == 'B', then you should do that using a filter, not a script.
ElasticSearch won't return a percentage directly, but you can easily calculate it from the result of a terms aggregation.
Example:
GET devdev/audittrail/_search
{
"size": 0,
"aggs": {
"a1": {
"terms": {
"field": "uIDRequestID"
}
}
}
}
That returns:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 25083,
"max_score": 0,
"hits": []
},
"aggregations": {
"a1": {
"doc_count_error_upper_bound": 9,
"sum_other_doc_count": 1300,
"buckets": [
{
"key": 556,
"doc_count": 34
},
{
"key": 393,
"doc_count": 28
},
{
"key": 528,
"doc_count": 15
}
]
}
}
}
So what does that return mean?
The hits.total field is the total number of documents matching your query.
The doc_count field tells you how many documents are in each bucket.
So for my example here: I could say that the key "556" shows up in 34 of 25083 documents, so it has a percentage of (34 / 25083) * 100
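The same arithmetic can be done on the parsed response. This is only a sketch: bucket_percentages and total_hits are hypothetical helper names, and the response dict below is a trimmed copy of the one above. Newer Elasticsearch versions report hits.total as an object with a "value" field rather than a plain number, so the helper accepts both shapes.

```python
def total_hits(response):
    """Return hits.total as an int, whether it is a plain number or an
    object of the form {"value": ..., "relation": ...}."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def bucket_percentages(response, agg_name):
    """Map each bucket key to doc_count / hits.total * 100."""
    total = total_hits(response)
    if total == 0:
        return {}
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: 100.0 * b["doc_count"] / total for b in buckets}


# Trimmed copy of the response shown above.
response = {
    "hits": {"total": 25083, "max_score": 0, "hits": []},
    "aggregations": {
        "a1": {
            "buckets": [
                {"key": 556, "doc_count": 34},
                {"key": 393, "doc_count": 28},
                {"key": 528, "doc_count": 15},
            ]
        }
    },
}

percentages = bucket_percentages(response, "a1")
for key, percent in percentages.items():
    print(f"{key}: {percent:.2f}%")
```

For the grade question, the same helper would apply to a terms aggregation on the grade field: one B out of three documents gives 1 / 3 * 100, which is about 33%.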
I have documents like
{"domain":"US", "zipcode":"11111", "eventType":"click", "id":"1", "time":100}
{"domain":"US", "zipcode":"22222", "eventType":"sell", "id":"2", "time":200}
{"domain":"US", "zipcode":"22222", "eventType":"click", "id":"3","time":150}
{"domain":"US", "zipcode":"11111", "eventType":"sell", "id":"4","time":350}
{"domain":"US", "zipcode":"33333", "eventType":"sell", "id":"5","time":225}
{"domain":"EU", "zipcode":"44444", "eventType":"click", "id":"5","time":120}
I want to filter these documents by eventType=sell and time between 125 and 400, group by domain followed by zipcode and count the documents in each bucket. So my output would be like (first and last docs would be ignored by the filters)
US, 11111,1
US, 22222,1
US, 33333,1
In SQL, this would be straightforward, but I am not able to get it to work in ElasticSearch. Could someone please help me out here?
How do I write ElasticSearch query to accomplish the above?
This query seems to do what you want:
POST /test_index/_search
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"eventType": "sell"
}
},
{
"range": {
"time": {
"gte": 125,
"lte": 400
}
}
}
]
}
}
}
},
"aggs": {
"zipcode_terms": {
"terms": {
"field": "zipcode"
}
}
}
}
returning
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"zipcode_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "11111",
"doc_count": 1
},
{
"key": "22222",
"doc_count": 1
},
{
"key": "33333",
"doc_count": 1
}
]
}
}
}
(Note that there is only 1 "sell" at "22222", not 2).
Here is some code I used to test it:
http://sense.qbox.io/gist/1c4cb591ab72a6f3ae681df30fe023ddfca4225b
You might want to take a look at terms aggregations, the bool filter, and range filters.
EDIT: I just realized I left out the domain part, but it should be straightforward to add in a bucket aggregation on that as well if you need to.
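For the domain part, one option is to put the zipcode terms aggregation inside a terms aggregation on domain and then flatten the nested buckets into rows on the client. The sketch below assumes both fields are indexed so that a terms aggregation returns the original values (for example not analyzed); the names by_domain, by_zipcode and flatten_rows are made up for illustration, and sample_aggregations is hand-written from the expected output in the question, not copied from a real response.

```python
def build_domain_zipcode_aggs():
    """Aggregations block: bucket by domain, then by zipcode within it.
    It would replace the "aggs" section of the query above."""
    return {
        "by_domain": {
            "terms": {"field": "domain"},
            "aggs": {"by_zipcode": {"terms": {"field": "zipcode"}}},
        }
    }


def flatten_rows(aggregations):
    """Turn nested domain/zipcode buckets into (domain, zipcode, count) rows."""
    rows = []
    for domain_bucket in aggregations["by_domain"]["buckets"]:
        for zip_bucket in domain_bucket["by_zipcode"]["buckets"]:
            rows.append(
                (domain_bucket["key"], zip_bucket["key"], zip_bucket["doc_count"])
            )
    return rows


# Hand-written illustration of the expected response shape (not real output).
sample_aggregations = {
    "by_domain": {
        "buckets": [
            {
                "key": "US",
                "doc_count": 3,
                "by_zipcode": {
                    "buckets": [
                        {"key": "11111", "doc_count": 1},
                        {"key": "22222", "doc_count": 1},
                        {"key": "33333", "doc_count": 1},
                    ]
                },
            }
        ]
    }
}

rows = flatten_rows(sample_aggregations)
for domain, zipcode, count in rows:
    print(f"{domain}, {zipcode},{count}")
```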
I've got the following simple ElasticSearch query:
{
"query": {
"term": {
"categories": "1234"
}
}
}
Which returns a number of documents containing a structure like this:
{
"properties": [
{
"name": "foo",
"value": 20
},
{
"name": "bar",
"value": 30
}
]
}
How do I have to alter the above query so ElasticSearch returns a set of values in properties.name that all result documents have in common?
You can't do this with a simple query. One solution is to use a terms aggregation, like this one:
{
"query": {
"term": {
"categories": "1234"
}
},
"aggs": {
"properties_name": {
"terms": {
"field": "properties.name"
}
}
}
}
You will get a similar response:
{
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 10,
"hits": [{...}]
},
"aggregations": {
"properties_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": "foo",
"doc_count": 10
}, {
"key": "bar",
"doc_count": 4
}, {}]
}
}
}
Your usual results will be available under hits, and the aggregation results under aggregations.
You can then use hits.total (10) to find the properties.name values that are present in all documents. Simply iterate over the buckets and keep the ones with doc_count == hits.total.
In this example, only the "foo" property is present in all documents.
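A minimal sketch of that iteration, assuming the response has been parsed into a Python dict; names_in_all_hits is a hypothetical helper name, and the sample response is a trimmed version of the one above without the placeholder entries:

```python
def names_in_all_hits(response, agg_name="properties_name"):
    """Return the bucket keys whose doc_count equals hits.total, i.e. the
    values that occur in every document matched by the query."""
    total = response["hits"]["total"]
    # Newer Elasticsearch versions wrap the count in {"value": ..., "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if b["doc_count"] == total]


# Trimmed version of the response shown above.
response = {
    "hits": {"total": 10, "hits": []},
    "aggregations": {
        "properties_name": {
            "buckets": [
                {"key": "foo", "doc_count": 10},
                {"key": "bar", "doc_count": 4},
            ]
        }
    },
}

common = names_in_all_hits(response)
print(common)
```

Keep in mind that a terms aggregation only returns the top buckets (10 by default), so if documents can carry many different property names you may need to raise size on the aggregation to be sure none are missed.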