Grouped bar-chart in Kibana using Vega-lite - elasticsearch

The example at https://vega.github.io/editor/#/examples/vega-lite/bar_grouped shows how to create a grouped bar chart from a table of data.
In my case the data comes from Elasticsearch, so it is not in tabular form.
I can't figure out how to draw two bars per bucket, one for each sum metric.
"buckets" : [
{
"key_as_string" : "03/Dec/2019:00:00:00 +0900",
"key" : 1575298800000,
"doc_count" : 11187,
"deploy_agg" : {
"buckets" : {
"deploy_count" : {
"doc_count" : 43
}
}
},
"start_agg" : {
"buckets" : {
"start_count" : {
"doc_count" : 171
}
}
},
"sum_start_agg" : {
"value" : 171.0
},
"sum_deploy_agg" : {
"value" : 43.0
}
},..
I want to create two bars per bucket, one representing the value of sum_start_agg and the other the value of sum_deploy_agg.
This is what I have for a single bar chart:
"encoding": {
"x": {
"field": "key",
"type": "temporal",
"axis": {"title": "DATE"}
},
"y": {
"field": "deploy_agg.buckets.deploy_count.doc_count",
"type": "quantitative",
"axis": {"title": "deploy_count"}
},
"color": {"value": "green"},
"tooltip": [
{
"field": "deploy_agg.buckets.deploy_count.doc_count",
"type": "quantitative",
"title":"value"
}
]
}

You can use the Fold Transform to fold your two columns so that they can be referenced in an encoding. It might look something like this:
{
"data": {
"values": [
{
"key_as_string": "03/Dec/2019:00:00:00 +0900",
"key": 1575298800000,
"doc_count": 11187,
"deploy_agg": {"buckets": {"deploy_count": {"doc_count": 43}}},
"start_agg": {"buckets": {"start_count": {"doc_count": 171}}},
"sum_start_agg": {"value": 171},
"sum_deploy_agg": {"value": 43}
}
]
},
"transform": [
{
"fold": ["sum_start_agg.value", "sum_deploy_agg.value"],
"as": ["entry", "value"]
}
],
"mark": "bar",
"encoding": {
"x": {"field": "entry", "type": "nominal", "axis": null},
"y": {"field": "value", "type": "quantitative"},
"column": {"field": "key", "type": "temporal"},
"color": {"field": "entry", "type": "nominal"}
}
}
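If you would rather reshape the data before it reaches the chart, the same fold can be sketched in plain Python. This is only an illustration of what the transform does, not part of Vega-Lite or Kibana: the function name fold_buckets and the choice to store the bare aggregation name in "entry" (rather than "sum_start_agg.value") are assumptions made for the example.

```python
def fold_buckets(buckets, metrics=("sum_start_agg", "sum_deploy_agg")):
    """Turn each bucket into one row per metric (long form).

    Hypothetical helper: it mirrors the idea of the fold transform,
    producing rows with a shared "key", an "entry" label and a "value".
    """
    rows = []
    for bucket in buckets:
        for name in metrics:
            rows.append({
                "key": bucket["key"],              # date_histogram bucket key
                "entry": name,                     # which metric this row holds
                "value": bucket[name]["value"],    # the metric's value
            })
    return rows


# Sample bucket trimmed from the response shown in the question.
buckets = [{
    "key": 1575298800000,
    "doc_count": 11187,
    "sum_start_agg": {"value": 171.0},
    "sum_deploy_agg": {"value": 43.0},
}]

rows = fold_buckets(buckets)
for row in rows:
    print(row)
```

Each bucket becomes two rows, which is the tabular shape the grouped bar chart example expects.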

Related

Aggregator of type top_hits cannot accept sub-aggregations with Percentiles

I have the following documents:
{"id": 1, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 500}
{"id": 2, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 450}
{"id": 3, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 420}
{"id": 4, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 5, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 6, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 100}
{"id": 7, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 400}
{"id": 8, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 350}
{"id": 9, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 300}
I want to write a query that returns the percentiles of prices for the top 2 documents for each condition. In other words, I want to perform some calculation after getting the 2 best-scoring documents for each item condition (new, like new, used). I have tried the following, but I get the error "Aggregator of type top_hits cannot accept sub-aggregations":
{
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"top_hits": {
"size": 2
},
"aggs": {
"top_two_percentiles": {
"percentiles": {
"field": "price"
}
}
}
}
}
}
}
}
Is there another way to achieve this, or do I have to do some post-processing myself after getting the results back from ES? The end result I want is to be able to supply this data to charts to make it look like this: https://ibb.co/y5FpV80
"... the percentiles of prices for the top two documents ..." is somewhat arbitrary. What's the metric that determines the score? A terms aggregation would score the buckets equally. The only differentiating factor would be the bucket count... What I'm saying is, you'll need to first determine what puts a given bucket in the top 2 and go from there.
In any event, you can:
Order any terms aggregation by the result of one of its numeric child aggregations.
After that, you can limit it to 2 buckets.
When that's done, you can use a percentiles bucket aggregation to calculate the percentiles of the two top prices.
In concrete terms:
POST your-index/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.doc_count,aggregations.*.buckets.percentiles_top_two_prices
{
"size": 0,
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"terms": {
"field": "price",
"size": 2,
"order": {
"max_score": "desc" <-- here's how you enforce the top 2 docs
}
},
"aggs": {
"max_score": {
"max": {
"script": "_score" <-- how you determine what happens here is up to you. _score will be equal across all buckets (I believe) so pick some other metric.
}
},
"just_the_price": {
"min": {
"field": "price" <-- there's no "identity" agg in ES so I'm using min. There will be only one value per bucket because you're already under the parent which aggregates by price.
}
}
}
},
"percentiles_top_two_prices": {
"percentiles_bucket": {
"buckets_path": "top_two>just_the_price"
}
}
}
}
}
}
yielding something along the lines of:
{
"aggregations" : {
"item_conditions" : {
"buckets" : [
{
"key" : "like new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 100.0,
"5.0" : 100.0,
"25.0" : 100.0,
"50.0" : 150.0,
"75.0" : 150.0,
"95.0" : 150.0,
"99.0" : 150.0
}
}
},
{
"key" : "new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 420.0,
"5.0" : 420.0,
"25.0" : 420.0,
"50.0" : 450.0,
"75.0" : 450.0,
"95.0" : 450.0,
"99.0" : 450.0
}
}
},
{
"key" : "used",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 300.0,
"5.0" : 300.0,
"25.0" : 300.0,
"50.0" : 350.0,
"75.0" : 350.0,
"95.0" : 350.0,
"99.0" : 350.0
}
}
}
]
}
}
}
I'm frankly not sure what these stats would bring you (when based on only two values) but this is how it could be done 😉
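If you end up doing the post-processing yourself, a minimal sketch could look like the one below. Everything in it is an assumption for illustration: the function name percentiles, the hard-coded input (the two prices per condition are copied from the sample output above), and the "round half up to the nearest index, no interpolation" rule, which I picked only because it reproduces those sample numbers. It expects a non-empty list of values.

```python
import math

# Same percent levels as in the sample output above.
DEFAULT_PERCENTS = (1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0)


def percentiles(values, percents=DEFAULT_PERCENTS):
    """For each percent, pick the nearest existing value (no interpolation).

    Hypothetical helper; assumes `values` is non-empty.
    """
    data = sorted(values)
    result = {}
    for p in percents:
        # Round half up to the nearest index into the sorted data.
        index = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[str(p)] = float(data[index])
    return result


# The two prices kept per condition, taken from the sample output above.
top_two_prices = {
    "like new": [150, 100],
    "new": [450, 420],
    "used": [350, 300],
}

stats = {cond: percentiles(prices) for cond, prices in top_two_prices.items()}
for cond, values in stats.items():
    print(cond, values)
```

With only two values per condition every percentile collapses onto one of the two prices, which is the same pattern the aggregation output shows.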

How to get multiple fields returned in elasticsearch query?

How to get multiple fields returned that are unique using elasticsearch query?
All of my documents have duplicate name and job fields. I would like to use an es query to get all the unique values which include the name and job in the same response, so they are tied together.
[
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "luffy",
"job": "rubber man",
"dob": "1/2/99"
}
]
Expected result (any format is fine). I tried using aggs, but I only get one field back:
[
{
"name": "albert",
"job": "teacher"
},
{
"name": "justin",
"job": "engineer"
},
{
"name": "luffy",
"job": "rubber man"
}
]
This is what I tried so far
GET name.test.index/_search
{
"size": 0,
"aggs" : {
"name" : {
"terms" : { "field" : "name.keyword" }
}
}
}
Using the above query gets me this, which is good because the names are unique:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 95,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Justin",
"doc_count" : 56
},
{
"key" : "Luffy",
"doc_count" : 31
},
{
"key" : "Albert",
"doc_count" : 8
}
]
}
}
}
I tried doing nested aggregation but that did not work. Is there an alternative solution for getting multiple unique values or am I missing something?
That's a good start! There are a few ways to achieve what you want; each provides a different response format, so you can decide which one you prefer.
The first option is to leverage the top_hits sub-aggregation and return the two fields for each name bucket:
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "name.keyword"
},
"aggs": {
"top": {
"top_hits": {
"_source": [
"name",
"job"
],
"size": 1
}
}
}
}
}
}
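The second option is to use a script in your terms aggregation instead of a field to return a compound value (if name and job are mapped as text, the script needs their keyword sub-fields, e.g. doc['name.keyword']):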
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"script": "doc['name'].value + ' - ' + doc['job'].value"
}
}
}
}
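The third option is to use two levels of field collapsing (collapse fields must be keyword or numeric, so use name.keyword and job.keyword if the plain fields are text):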
GET name.test.index/_search
{
"collapse": {
"field": "name",
"inner_hits": {
"name": "by_job",
"collapse": {
"field": "job"
},
"size": 1
}
}
}
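For the first option, the name/job pairs sit a few levels deep in the response, so a small helper to pull them out may be handy. This is a rough sketch: the function name unique_name_job_pairs is made up, the sample response is hand-written to match the usual terms + top_hits response layout, and the aggregation names "name" and "top" are the ones used in the query above.

```python
def unique_name_job_pairs(response):
    """Collect one {"name", "job"} pair per terms bucket.

    Hypothetical helper; expects the response of the terms + top_hits
    query above (aggregation "name" with sub-aggregation "top").
    """
    pairs = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        # top_hits was requested with size 1, so there is one hit per bucket.
        for hit in bucket["top"]["hits"]["hits"]:
            source = hit["_source"]
            pairs.append({"name": source["name"], "job": source["job"]})
    return pairs


# Hand-written sample response, trimmed to the parts the helper reads.
response = {
    "aggregations": {
        "name": {
            "buckets": [
                {"key": "albert", "doc_count": 3,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "albert", "job": "teacher"}}]}}},
                {"key": "justin", "doc_count": 2,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "justin", "job": "engineer"}}]}}},
                {"key": "luffy", "doc_count": 1,
                 "top": {"hits": {"hits": [
                     {"_source": {"name": "luffy", "job": "rubber man"}}]}}},
            ]
        }
    }
}

pairs = unique_name_job_pairs(response)
for pair in pairs:
    print(pair)
```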

Elasticsearch aggregate on nested JSON data

I have to do some aggregation on JSON data. I saw multiple answers here on Stack Overflow, but nothing worked for me.
I have multiple rows, and in the timeCountry column I have an array that stores JSON objects with the keys count, country_name and s_name.
I have to find the sum of count across all the rows, grouped by s_name.
Example: if in the 1st row timeCountry holds an array like the one below
[ {
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 20,
"country_name": "India",
"s_name": "Ind"
}]
Row 2 data is like below
[{
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
},
{
"count": 20,
"country_name": "india",
"s_name": "ind"
}]
and so on.
I need a result like the one below
[{
"count": 24,
"country_name": "america",
"s_name": "us"
}, {
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 40,
"country_name": "India",
"s_name": "Ind"
}, {
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
}
]
The above data is for only one row; I have multiple rows, and timeCountry is a column.
Here is the aggregation I tried writing:
{
"query": {
"match_all": {}
},
"aggregations":{
"records" :{
"nested":{
"path":"timeCountry"
},
"aggregations":{
"ids":{
"terms":{
"field": "timeCountry.country_name"
}
}
}
}
}
}
But it's not working. Please help.
I tried this on my local Elasticsearch cluster and was able to get aggregated data on the nested documents. Depending on your index mapping, your results may differ from mine. The following is the aggregation DSL I tried:
{
"aggs" : {
"records" : {
"nested" : {
"path" : "timeCountry"
},
"aggs" : {
"ids" : { "terms" : {
"field" : "timeCountry.country_name.keyword"
},
"aggs": {"sum_name": { "sum" : { "field" : "timeCountry.count" } } }
}
}
}
}
}
Following is the mapping of my index:
{
"settings" : {
"number_of_shards" : 1
},
"mappings": {
"agg_data" : {
"properties" : {
"timeCountry" : {
"type" : "nested"
}
}
}
}
}
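To get from the aggregation response to a flat "key -> total count" result, something along these lines should do. It is a sketch under assumptions: the helper name sums_by_key is hypothetical, the sample response is hand-written in the usual nested + terms + sum layout, and the aggregation names "records", "ids" and "sum_name" are the ones from the DSL above.

```python
def sums_by_key(response):
    """Map each terms bucket key to the summed count below it.

    Hypothetical helper; expects the response of the nested aggregation
    above ("records" -> "ids" -> "sum_name").
    """
    buckets = response["aggregations"]["records"]["ids"]["buckets"]
    return {bucket["key"]: bucket["sum_name"]["value"] for bucket in buckets}


# Hand-written sample response, trimmed to the parts the helper reads.
response = {
    "aggregations": {
        "records": {
            "doc_count": 5,
            "ids": {
                "buckets": [
                    {"key": "america", "doc_count": 2, "sum_name": {"value": 24.0}},
                    {"key": "South Africa", "doc_count": 1, "sum_name": {"value": 10.0}},
                    {"key": "new zealand", "doc_count": 1, "sum_name": {"value": 10.0}},
                ]
            }
        }
    }
}

totals = sums_by_key(response)
print(totals)
```

To group by s_name instead of country_name, point the terms field in the DSL at the s_name keyword field; the helper stays the same.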

elastic search query for aggregating on sub-fields of array

My Elasticsearch index contains data in the following format: data is an array of objects, each holding the date, orders and visits for the term on that date:
{
"term": "ふるさと納税",
"data": [
{
"date": "2018-01-25",
"visits": 17670,
"ranking": 1,
"orders": 154
},
{
"date": "2018-02-14",
"visits": 13758,
"ranking": 1,
"orders": 116
},
{
"date": "2017-12-24",
"visits": 142578,
"ranking": 1,
"orders": 2565
},
{
"date": "2018-03-08",
"visits": 21799,
"ranking": 1,
"orders": 312
}
]
},{
"term": "帯 中古 振袖",
"data": [
{
"date": "2018-01-30",
"ranking": 2966,
"orders": 0,
"visits": 345
}
]
}
I would like to sum all the visits and orders for each term within a defined date range.
I have created this query:
{
"_source": [],
"query": {
"bool": {
"filter": [
{"range": {"data.date": {"gte" : "2018-03-21"}}},
{"range": {"data.date": {"lte" : "2018-03-21"}}}
]
}
},
"aggs" : {
"by_term": {
"terms": {
"field": "term",
"order":{"sum_ranking":"desc"},
"size":100
},"aggs": {
"sum_ranking": {
"sum": {
"field" : "data.visits"
}
}
}
}
},
"from" : 0,
"size" : 0
}
It seems the filter is not working.
Can anyone help?
The mapping is:
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"keyword" : {
"properties" : {
"term" : { "type" : "keyword" }
}
}
}
}

how to enable elasticsearch return aggregations including the name in the response

I'm performing some aggregations (by shop and by category) over a data set using Elasticsearch. The problem is that the response I get does not specify the name of each aggregation, which makes it difficult to parse.
Query
"aggregations" : {
"byShop" : {
"terms" : {
"field" : "shopName",
"size" : 0
}
},
"byCategory" : {
"terms" : {
"field" : "category",
"size" : 0
}
}
}
Response
"aggs": [
[
{
"name": "bucket",
"count": 5075,
"key": "shop1"
},
{
"name": "bucket",
"count": 1,
"key": "shop2"
}
],
[
{
"name": "bucket",
"count": 11,
"key": "Jewelry & Watches"
},
{
"name": "bucket",
"count": 1,
"key": "Home & Garden/Home Décor"
}
]
Ideally, I would like to see the following:
"aggregations": {
"byShop": {
"buckets": [
{
"count": 5075,
"key": "shop1"
},
{
"count": 1,
"key": "shop2"
}
]
},
"byCategory": {
"buckets": [
{
"count": 11,
"key": "Jewelry & Watches"
},
{
"count": 1,
"key": "Home & Garden/Home Décor"
}
]
}
}
EDIT
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByCategory").getBuckets());
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByShopname").getBuckets());
where searchResult holds the response from Elasticsearch. It seems that getBuckets() drops the names of the aggs, right?
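To illustrate the difference outside the Java client: appending each bucket list to a plain list throws the aggregation name away, while keying the buckets by name keeps it. The sketch below is in Python and is only an illustration; buckets_by_name is a hypothetical helper, and the sample data is hand-written in the raw Elasticsearch response layout (where the count is called doc_count).

```python
def buckets_by_name(aggregations, names):
    """Keep each bucket list under the name of its aggregation.

    Hypothetical helper; `aggregations` is the "aggregations" part of a
    raw search response, `names` the aggregation names from the query.
    """
    return {name: {"buckets": aggregations[name]["buckets"]} for name in names}


# Hand-written sample, shaped like the raw response for the query above.
aggregations = {
    "byShop": {"buckets": [
        {"key": "shop1", "doc_count": 5075},
        {"key": "shop2", "doc_count": 1},
    ]},
    "byCategory": {"buckets": [
        {"key": "Jewelry & Watches", "doc_count": 11},
        {"key": "Home & Garden/Home Décor", "doc_count": 1},
    ]},
}

# A plain list loses the names; only the position tells the aggs apart.
as_list = [aggregations[name]["buckets"] for name in ("byShop", "byCategory")]

# A mapping keyed by aggregation name keeps them.
as_mapping = buckets_by_name(aggregations, ("byShop", "byCategory"))
print(list(as_mapping.keys()))
```

The same idea would apply on the Java side by collecting the buckets into a map keyed by aggregation name instead of adding them to a list.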

Resources