The example at https://vega.github.io/editor/#/examples/vega-lite/bar_grouped shows how to create a grouped bar chart from a table of data.
In my case the data comes from Elasticsearch, so it is not in tabular form.
I can't figure out how to draw two bars per bucket, one for each sum metric.
"buckets" : [
{
"key_as_string" : "03/Dec/2019:00:00:00 +0900",
"key" : 1575298800000,
"doc_count" : 11187,
"deploy_agg" : {
"buckets" : {
"deploy_count" : {
"doc_count" : 43
}
}
},
"start_agg" : {
"buckets" : {
"start_count" : {
"doc_count" : 171
}
}
},
"sum_start_agg" : {
"value" : 171.0
},
"sum_deploy_agg" : {
"value" : 43.0
}
},..
I want to create two bars, one representing the value of sum_start_agg and the other the value of sum_deploy_agg.
This is what I had for a single bar chart.
"encoding": {
"x": {
"field": "key",
"type": "temporal",
"axis": {"title": "DATE"}
},
"y": {
"field": "deploy_agg.buckets.deploy_count.doc_count",
"type": "quantitative",
"axis": {"title": "deploy_count"}
},
"color": {"value": "green"},
"tooltip": [
{
"field": "deploy_agg.buckets.deploy_count.doc_count",
"type": "quantitative",
"title":"value"
}
]
}
You can use the Fold Transform to fold your two columns so that they can be referenced in an encoding. It might look something like this:
{
"data": {
"values": [
{
"key_as_string": "03/Dec/2019:00:00:00 +0900",
"key": 1575298800000,
"doc_count": 11187,
"deploy_agg": {"buckets": {"deploy_count": {"doc_count": 43}}},
"start_agg": {"buckets": {"start_count": {"doc_count": 171}}},
"sum_start_agg": {"value": 171},
"sum_deploy_agg": {"value": 43}
}
]
},
"transform": [
{
"fold": ["sum_start_agg.value", "sum_deploy_agg.value"],
"as": ["entry", "value"]
}
],
"mark": "bar",
"encoding": {
"x": {"field": "entry", "type": "nominal", "axis": null},
"y": {"field": "value", "type": "quantitative"},
"column": {"field": "key", "type": "temporal"},
"color": {"field": "entry", "type": "nominal"}
}
}
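If the reshaping is easier to do before the data reaches Vega-Lite, the same fold can be sketched client-side. The following is a minimal Python sketch, not part of the original answer; the function name fold_buckets is hypothetical, and it assumes each bucket carries the sum_start_agg and sum_deploy_agg objects shown in the question. Unlike the fold transform above, it stores the bare metric name (without ".value") in the entry column.

```python
def fold_buckets(buckets, metrics=("sum_start_agg", "sum_deploy_agg")):
    """Reshape histogram buckets into one row per (bucket, metric) pair."""
    rows = []
    for bucket in buckets:
        for metric in metrics:
            rows.append({
                "key": bucket["key"],              # bucket timestamp (epoch millis)
                "entry": metric,                   # which metric this row holds
                "value": bucket[metric]["value"],  # the metric's numeric value
            })
    return rows


# Sample bucket copied from the question, trimmed to the fields used here.
buckets = [
    {
        "key": 1575298800000,
        "doc_count": 11187,
        "sum_start_agg": {"value": 171.0},
        "sum_deploy_agg": {"value": 43.0},
    }
]
rows = fold_buckets(buckets)
```

The resulting rows can be passed as "values" in the spec, with the "transform" block removed, since they already have the entry and value columns the encoding refers to.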
I have the following documents:
{"id": 1, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 500}
{"id": 2, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 450}
{"id": 3, "type": "bags", "brand": "Louis Vuitton", "condition": "new", "price": 420}
{"id": 4, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 5, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 150}
{"id": 6, "type": "bags", "brand": "Louis Vuitton", "condition": "like new", "price": 100}
{"id": 7, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 400}
{"id": 8, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 350}
{"id": 9, "type": "bags", "brand": "Louis Vuitton", "condition": "used", "price": 300}
I am looking to write a query that returns the percentiles of prices for the top 2 documents for each condition. In other words, I want to perform some calculation after getting the top 2 best-scoring documents for each item condition (new, like new, used). I have tried this, but I am getting the error "Aggregator of type top_hits cannot accept sub-aggregations":
{
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"top_hits": {
"size": 2
},
"aggs": {
"top_two_percentiles": {
"percentiles": {
"field": "price"
}
}
}
}
}
}
}
}
Is there another way to achieve this, or do I have to do some post-processing myself after getting the results back from ES? The end result I want is to be able to supply this data to charts to make it look like this: https://ibb.co/y5FpV80
"... the percentiles of prices for the top two documents ..." is somewhat arbitrary. What's the metric that determines the score? A terms aggregation would score the buckets equally. The only differentiating factor would be the bucket count... What I'm saying is, you'll need to first determine what puts a given bucket in the top 2 and go from there.
In any event, you can:
Order any terms aggregation by the result of one of its numeric child aggregations.
After that, you can limit it to 2 buckets.
When that's done, you can use a percentiles bucket aggregation to calculate the percentiles of the two top prices.
In concrete terms:
POST your-index/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.doc_count,aggregations.*.buckets.percentiles_top_two_prices
{
"size": 0,
"query": {
"match": {
"brand": "Louis Vuitton"
}
},
"aggs": {
"item_conditions": {
"terms": {
"field": "condition"
},
"aggs": {
"top_two": {
"terms": {
"field": "price",
"size": 2,
"order": {
"max_score": "desc" <-- here's how you enforce the top 2 docs
}
},
"aggs": {
"max_score": {
"max": {
"script": "_score" <-- how you determine what happens here is up to you. _score will be equal across all buckets (I believe) so pick some other metric.
}
},
"just_the_price": {
"min": {
"field": "price" <-- there's no "identity" agg in ES so I'm using min. Each bucket holds only one distinct price because the parent terms aggregation already groups by price.
}
}
}
},
"percentiles_top_two_prices": {
"percentiles_bucket": {
"buckets_path": "top_two>just_the_price"
}
}
}
}
}
}
yielding something along the lines of:
{
"aggregations" : {
"item_conditions" : {
"buckets" : [
{
"key" : "like new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 100.0,
"5.0" : 100.0,
"25.0" : 100.0,
"50.0" : 150.0,
"75.0" : 150.0,
"95.0" : 150.0,
"99.0" : 150.0
}
}
},
{
"key" : "new",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 420.0,
"5.0" : 420.0,
"25.0" : 420.0,
"50.0" : 450.0,
"75.0" : 450.0,
"95.0" : 450.0,
"99.0" : 450.0
}
}
},
{
"key" : "used",
"doc_count" : 3,
"percentiles_top_two_prices" : {
"values" : {
"1.0" : 300.0,
"5.0" : 300.0,
"25.0" : 300.0,
"50.0" : 350.0,
"75.0" : 350.0,
"95.0" : 350.0,
"99.0" : 350.0
}
}
}
]
}
}
}
I'm frankly not sure what these stats would bring you (when based on only two values) but this is how it could be done 😉
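For the post-processing route the question mentions, a minimal Python sketch follows. It assumes the query keeps the top_two top_hits aggregation under item_conditions but drops the nested percentiles sub-aggregation, so the response contains the usual hits.hits list per bucket. The helper names and the nearest-rank percentile method are assumptions for illustration; Elasticsearch may interpolate differently, so the numbers need not match the output above. The sample response is illustrative only, since which two hits rank highest depends on scoring.

```python
import math


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def top_hits_percentiles(response, pcts=(25, 50, 75)):
    """Compute percentiles of price over the top hits of each condition bucket."""
    result = {}
    for bucket in response["aggregations"]["item_conditions"]["buckets"]:
        hits = bucket["top_two"]["hits"]["hits"]
        prices = [hit["_source"]["price"] for hit in hits]
        result[bucket["key"]] = {p: nearest_rank(prices, p) for p in pcts}
    return result


# Illustrative response, trimmed to the fields the sketch reads.
response = {
    "aggregations": {
        "item_conditions": {
            "buckets": [
                {
                    "key": "new",
                    "top_two": {"hits": {"hits": [
                        {"_source": {"price": 500}},
                        {"_source": {"price": 450}},
                    ]}},
                }
            ]
        }
    }
}
percentiles = top_hits_percentiles(response)
```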
How to get multiple fields returned that are unique using elasticsearch query?
All of my documents have duplicate name and job fields. I would like to use an es query to get all the unique values which include the name and job in the same response, so they are tied together.
[
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "albert",
"job": "teacher",
"dob": "11/22/91"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "justin",
"job": "engineer",
"dob": "1/2/93"
},
{
"name": "luffy",
"job": "rubber man",
"dob": "1/2/99"
}
]
Expected result (in any format). I was trying to use aggs, but I only get one field:
[
{
"name": "albert",
"job": "teacher"
},
{
"name": "justin",
"job": "engineer"
},
{
"name": "luffy",
"job": "rubber man"
}
]
This is what I tried so far
GET name.test.index/_search
{
"size": 0,
"aggs" : {
"name" : {
"terms" : { "field" : "name.keyword" }
}
}
}
Using the above query gets me the following, which is good in that the names are unique:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 95,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Justin",
"doc_count" : 56
},
{
"key" : "Luffy",
"doc_count" : 31
},
{
"key" : "Albert",
"doc_count" : 8
}
]
}
}
}
I tried doing nested aggregation but that did not work. Is there an alternative solution for getting multiple unique values or am I missing something?
That's a good start! There are a few ways to achieve what you want, each provides a different response format, so you can decide which one you prefer.
The first option is to leverage the top_hits sub-aggregation and return the two fields for each name bucket:
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"field": "name.keyword"
},
"aggs": {
"top": {
"top_hits": {
"_source": [
"name",
"job"
],
"size": 1
}
}
}
}
}
}
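With this first option, each name bucket carries one hit whose _source holds both fields, so the unique pairs have to be pulled out of the response. A minimal Python sketch of that step follows; the function name is hypothetical and the sample response is trimmed to the parts the code reads, following the standard terms plus top_hits response layout.

```python
def unique_name_job(response):
    """Collect one {name, job} pair per bucket of the 'name' terms aggregation."""
    pairs = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        # size: 1 in the top_hits aggregation means exactly one hit per bucket.
        source = bucket["top"]["hits"]["hits"][0]["_source"]
        pairs.append({"name": source["name"], "job": source["job"]})
    return pairs


# Illustrative response for two of the names in the question.
response = {
    "aggregations": {
        "name": {
            "buckets": [
                {"key": "albert", "doc_count": 3,
                 "top": {"hits": {"hits": [{"_source": {"name": "albert", "job": "teacher"}}]}}},
                {"key": "justin", "doc_count": 2,
                 "top": {"hits": {"hits": [{"_source": {"name": "justin", "job": "engineer"}}]}}},
            ]
        }
    }
}
pairs = unique_name_job(response)
```

Note that this yields one job per name; if the same name can appear with different jobs, only the top hit's job is returned.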
The second option is to use a script in your terms aggregation instead of a field to return a compound value:
GET name.test.index/_search
{
"size": 0,
"aggs": {
"name": {
"terms": {
"script": "doc['name.keyword'].value + ' - ' + doc['job.keyword'].value"
}
}
}
}
The third option is to use two levels of field collapsing:
GET name.test.index/_search
{
"collapse": {
"field": "name.keyword",
"inner_hits": {
"name": "by_job",
"collapse": {
"field": "job.keyword"
},
"size": 1
}
}
}
I have to do some aggregation on JSON data. I saw multiple answers here on Stack Overflow, but nothing worked for me.
I have multiple rows, and in the timeCountry column each row stores an array of JSON objects with the keys count, country_name, and s_name.
I have to find the sum of count across all the rows, grouped by s_name.
Example: the 1st row's timeCountry holds an array like the one below
[ {
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 20,
"country_name": "India",
"s_name": "Ind"
}]
Row 2 data is like below
[{
"count": 12,
"country_name": "america",
"s_name": "us"
},
{
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
},
{
"count": 20,
"country_name": "india",
"s_name": "ind"
}]
and so on.
I need a result like the one below
[{
"count": 24,
"country_name": "america",
"s_name": "us"
}, {
"count": 10,
"country_name": "new zealand",
"s_name": "nz"
},
{
"count": 40,
"country_name": "India",
"s_name": "Ind"
}, {
"count": 10,
"country_name": "South Africa",
"s_name": "sa"
}
]
Each array above belongs to a single row; I have many such rows, and timeCountry is the column that holds the array.
This is the aggregation I tried writing:
{
"query": {
"match_all": {}
},
"aggregations":{
"records" :{
"nested":{
"path":"timeCountry"
},
"aggregations":{
"ids":{
"terms":{
"field": "timeCountry.country_name"
}
}
}
}
}
}
But it's not working. Please help.
I tried this on my local Elasticsearch cluster and was able to get aggregated data on the nested documents. Depending on your index mapping, your results may differ from mine. This is the DSL I used for the aggregation:
{
"aggs" : {
"records" : {
"nested" : {
"path" : "timeCountry"
},
"aggs" : {
"ids" : { "terms" : {
"field" : "timeCountry.country_name.keyword"
},
"aggs": {"sum_name": { "sum" : { "field" : "timeCountry.count" } } }
}
}
}
}
}
Following is the mapping of my index:
{
"settings" : {
"number_of_shards" : 1
},
"mappings": {
"agg_data" : {
"properties" : {
"timeCountry" : {
"type" : "nested"
}
}
}
}
}
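To check the aggregation's output against the expected totals, the same grouping can be sketched in plain Python. This is only a reference computation, not part of the answer above; the function name is hypothetical. It groups by s_name, as the question asks, and lowercases the key, which is an assumption made here because the sample data mixes "Ind" and "ind". A terms aggregation on a keyword field is case-sensitive, so those two values would land in separate buckets unless they are normalized at index time.

```python
from collections import defaultdict


def sum_counts(rows, key="s_name"):
    """Sum 'count' over every entry of every row, grouped by a lowercased key."""
    totals = defaultdict(int)
    for row in rows:
        for entry in row:
            totals[entry[key].lower()] += entry["count"]
    return dict(totals)


# The two timeCountry arrays from the question.
rows = [
    [
        {"count": 12, "country_name": "america", "s_name": "us"},
        {"count": 10, "country_name": "new zealand", "s_name": "nz"},
        {"count": 20, "country_name": "India", "s_name": "Ind"},
    ],
    [
        {"count": 12, "country_name": "america", "s_name": "us"},
        {"count": 10, "country_name": "South Africa", "s_name": "sa"},
        {"count": 20, "country_name": "india", "s_name": "ind"},
    ],
]
totals = sum_counts(rows)
```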
My Elasticsearch index contains data in the following format. The data field is an array of objects, each holding the date, orders, and visits for that term on that date:
{
"term": "ふるさと納税",
"data": [
{
"date": "2018-01-25",
"visits": 17670,
"ranking": 1,
"orders": 154
},
{
"date": "2018-02-14",
"visits": 13758,
"ranking": 1,
"orders": 116
},
{
"date": "2017-12-24",
"visits": 142578,
"ranking": 1,
"orders": 2565
},
{
"date": "2018-03-08",
"visits": 21799,
"ranking": 1,
"orders": 312
}
]
},{
"term": "帯 中古 振袖",
"data": [
{
"date": "2018-01-30",
"ranking": 2966,
"orders": 0,
"visits": 345
}
]
}
I would like to sum all the visits and orders for each term within a defined date range.
I have created this query:
{
"_source": [],
"query": {
"bool": {
"filter": [
{"range": {"data.date": {"gte" : "2018-03-21"}}},
{"range": {"data.date": {"lte" : "2018-03-21"}}}
]
}
},
"aggs" : {
"by_term": {
"terms": {
"field": "term",
"order":{"sum_ranking":"desc"},
"size":100
},"aggs": {
"sum_ranking": {
"sum": {
"field" : "data.visits"
}
}
}
}
},
"from" : 0,
"size" : 0
}
It seems the filter is not working.
Can anyone help?
The mapping is:
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"keyword" : {
"properties" : {
"term" : { "type" : "keyword" }
}
}
}
}
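For reference, the intended result can be written as a small client-side computation. The Python sketch below is an illustration of the goal, not a fix for the query; the function name and the date range used are hypothetical. It relies on the dates being ISO-formatted strings, which compare correctly as plain strings. One thing it makes visible: the date check has to be applied per entry of the data array. With the mapping above, data is not declared as nested, so a range filter on data.date selects whole documents and the sum then covers every entry of each matching document, which may be why the filter appears to have no effect.

```python
def sum_in_range(docs, start, end):
    """Sum visits and orders per term, counting only entries dated within [start, end]."""
    totals = {}
    for doc in docs:
        visits = 0
        orders = 0
        for entry in doc["data"]:
            # ISO yyyy-mm-dd strings sort chronologically, so string comparison works.
            if start <= entry["date"] <= end:
                visits += entry["visits"]
                orders += entry["orders"]
        totals[doc["term"]] = {"visits": visits, "orders": orders}
    return totals


# The two documents from the question.
docs = [
    {"term": "ふるさと納税", "data": [
        {"date": "2018-01-25", "visits": 17670, "ranking": 1, "orders": 154},
        {"date": "2018-02-14", "visits": 13758, "ranking": 1, "orders": 116},
        {"date": "2017-12-24", "visits": 142578, "ranking": 1, "orders": 2565},
        {"date": "2018-03-08", "visits": 21799, "ranking": 1, "orders": 312},
    ]},
    {"term": "帯 中古 振袖", "data": [
        {"date": "2018-01-30", "ranking": 2966, "orders": 0, "visits": 345},
    ]},
]
totals = sum_in_range(docs, "2018-01-01", "2018-02-28")
```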
I'm performing some aggregations (byShop and byCategory) over a data set using Elasticsearch. The problem is that the response does not include the name of each aggregation, which makes it difficult to parse.
Query
"aggregations" : {
"byShop" : {
"terms" : {
"field" : "shopName",
"size" : 0
}
},
"byCategory" : {
"terms" : {
"field" : "category",
"size" : 0
}
}
}
Response
"aggs": [
[
{
"name": "bucket",
"count": 5075,
"key": "shop1"
},
{
"name": "bucket",
"count": 1,
"key": "shop2"
}
],
[
{
"name": "bucket",
"count": 11,
"key": "Jewelry & Watches"
},
{
"name": "bucket",
"count": 1,
"key": "Home & Garden/Home Décor"
}
]
]
Ideally, I would like to see the following:
"aggregations": {
"byShop": {
"buckets": [
{
"count": 5075,
"key": "shop1"
},
{
"count": 1,
"key": "shop2"
}
]
},
"byCategory": {
"buckets": [
{
"count": 11,
"key": "Jewelry & Watches"
},
{
"count": 11,
"key": "Home & Garden/Home Décor"
}
]
}
}
EDIT
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByCategory").getBuckets());
productResponse.getAggs().add(searchResult.getAggregations().getTermsAggregation("ByShopname").getBuckets());
where searchResult holds the response from Elasticsearch. It seems that getBuckets() drops the names of the aggs, right?
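The shape asked for above comes from keying each bucket list by its aggregation name instead of appending the lists to one array. The following is a language-neutral sketch of that idea in Python; it does not use the Java client from the EDIT, the function name is hypothetical, and the input is assumed to be the raw "aggregations" object of an Elasticsearch response, where each terms aggregation has a buckets list with key and doc_count.

```python
def buckets_by_name(aggregations):
    """Keep each aggregation's buckets under the aggregation's own name."""
    named = {}
    for name, agg in aggregations.items():
        named[name] = {
            "buckets": [
                {"count": bucket["doc_count"], "key": bucket["key"]}
                for bucket in agg["buckets"]
            ]
        }
    return named


# Illustrative raw aggregations, using the counts from the response above.
aggregations = {
    "byShop": {"buckets": [
        {"key": "shop1", "doc_count": 5075},
        {"key": "shop2", "doc_count": 1},
    ]},
    "byCategory": {"buckets": [
        {"key": "Jewelry & Watches", "doc_count": 11},
        {"key": "Home & Garden/Home Décor", "doc_count": 1},
    ]},
}
named = buckets_by_name(aggregations)
```

In the Java code, the equivalent change would be to store the buckets in a map keyed by the aggregation name rather than in a list.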