New to Kibana visualizations here...
I'm planning on publishing a JSON document (once a day) that has populations for a list of cities. The following is a sample:
{
"timestamp":"2019-10-10",
"population_stats":[
{
"city":"New York",
"population":8398748
},
{
"city":"Los Angeles",
"population":3976322
}
]
}
I'd like to set up cities on the X axis and population count on the Y axis.
I can set up my X axis (with a field aggregation); however, I just can't get the populations to show up on the Y axis.
Using "Count" on the Y axis always gives me 1 -- I guess this is because there's only one document for the given date range.
Is there a proper way to get the correct population count to display on the Y axis?
Finally managed to figure this out!
Folks are correct about Kibana not being able to detect inner fields, so you basically have to index a separate JSON document for each city (going by my example in the question above). Then, in the visualization, you pick a "Sum" or "Average" aggregation for the Y axis. That's all!
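For example, a sketch of what the reshaped documents might look like, one per city per day, carrying the timestamp from the original payload:
{
  "timestamp": "2019-10-10",
  "city": "New York",
  "population": 8398748
}
{
  "timestamp": "2019-10-10",
  "city": "Los Angeles",
  "population": 3976322
}
With this shape, the X axis becomes a Terms aggregation on city and the Y axis a Sum (or Average) of population.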
I'm posting because I have found no content surrounding this topic.
My goal is essentially to produce a time-binned graph that plots some aggregated value. Usually this would be a doddle, since there is a single timestamp for each value, making it relatively straightforward to bin.
However, my problem lies in having two timestamps for each value - a start and an end, similar to a Gantt chart. I essentially want to bin the values (average) for the periods where the timelines fall within a given bin (bin boundaries could be where a new/old task starts or ends).
I'm looking for a basic example, or an answer to whether this is even supported in Vega-Lite. My current working example would yield no benefit to this discussion.
I see that you found a Vega solution, but I think in Vega-Lite what you were looking for was something like the following: put the start field in "x" and the end field in "x2", then add "bin" and "type" to "x", and all should work.
"encoding": {
"x": {
"field": "start_time",
"bin": { "binned": true },
"type": "temporal",
"title": "Time"
},
"x2": {
"field": "end_time"
}
}
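For context, a minimal self-contained sketch around that encoding might look like this (the data values, field names, and the bar mark are placeholders, not taken from the question):
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"task": "A", "start_time": "2021-01-01T00:00:00", "end_time": "2021-01-01T06:00:00"},
      {"task": "B", "start_time": "2021-01-01T03:00:00", "end_time": "2021-01-01T09:00:00"}
    ]
  },
  "mark": "bar",
  "encoding": {
    "y": {"field": "task", "type": "nominal"},
    "x": {"field": "start_time", "bin": {"binned": true}, "type": "temporal", "title": "Time"},
    "x2": {"field": "end_time"}
  }
}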
I lost my old account, but I was the person who posted this. Here is my solution to my question. The value I am aggregating here is the total time that each datapoint's timeline is contained within each bin.
First you want to use a joinaggregate transform to get the max and min times your data extend to. You could also hardcode this.
{
  "type": "joinaggregate",
  "fields": ["startTime", "endTime"],
  "ops": ["min", "max"],
  "as": ["min", "max"]
}
Next you want to find a step size for your bins; you can hardcode this later, or use a formula and write it into a new field.
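A hedged sketch of that step calculation, assuming you simply want a fixed number of bins (the count of 20 is an arbitrary choice, not from the original answer):
{
  "type": "formula",
  "expr": "(datum.max - datum.min) / 20",
  "as": "step"
}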
You then want to create two new fields in your data: one a sequence running from the min to the max, and the other the same sequence offset by your step.
{
  "type": "formula",
  "expr": "sequence(datum.min, datum.max, datum.step)",
  "as": "startBin"
}
{
  "type": "formula",
  "expr": "sequence(datum.min + datum.step, datum.max + datum.step, datum.step)",
  "as": "endBin"
}
The new fields will be arrays. So if we go ahead and use a flatten transform we will get a row for each data value in each bin.
{
  "type": "flatten",
  "fields": ["startBin", "endBin"]
}
You then want to calculate the total time your data spans within each specific bin. To do this, clamp the start time up to the bin start and the end time down to the bin end, then take the difference between the two.
{
  "type": "formula",
  "expr": "if(datum.startTime < datum.startBin, datum.startBin, if(datum.startTime > datum.endBin, datum.endBin, datum.startTime))",
  "as": "startBinTime"
}
{
  "type": "formula",
  "expr": "if(datum.endTime < datum.startBin, datum.startBin, if(datum.endTime > datum.endBin, datum.endBin, datum.endTime))",
  "as": "endBinTime"
}
{
  "type": "formula",
  "expr": "datum.endBinTime - datum.startBinTime",
  "as": "timeInBin"
}
Finally, you just need to aggregate the data by the bins and sum up these times. Then your data is ready to be plotted.
{
  "type": "aggregate",
  "groupby": ["startBin", "endBin"],
  "fields": ["timeInBin"],
  "ops": ["sum"],
  "as": ["timeInBin"]
}
Although this solution is long, it is relatively easy to implement in the transform section of your data. In my experience it runs fast and just shows how versatile Vega can be. Freedom to visualisations!
I need help to generate a visualization. A term in one of my document indices, 'temperature', is not in the drop-down box of fields to visualize in Kibana. What must I change so that 'temperature' shows up as a field in the drop-down?
Situation:
ES 5.1
Dynamic Templates
The field is present in a portion of the documents
The index mapping interprets the field as a 'long'
In Discover, Kibana can filter the documents and show a table of "temperature" and "timestamp." I seek help to visualize the data shown in that table.
A filtered search for the term in the console yields a search result with documents.
GET /_search
{
"size" : 10,
"_source": ["temperature", "timestamp" ],
"query" : {
"term" : { "name" : "HomeThermostat" }
}
}
If you wish to visualize, let's say, a date histogram where the X-Axis is the timestamp and the Y-Axis is a numeric field (in your case: temperature), then you would have to choose the following settings from the drop-downs:
For X-Axis
Aggregation = Date Histogram
field = timestamp
Interval = Choose your desired interval
For Y-Axis
Aggregation = Median (median of a single value is the value itself)
field = temperature
If your drop-down does not show the field temperature, then a possible reason is that temperature is not being recognized as a numeric value.
Go to Management -> Index Patterns -> your index pattern and check whether the field temperature is stored as a number or not.
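You can also check the underlying mapping directly from the Console; a quick sketch (the index name my-index is a placeholder for yours):
GET /my-index/_mapping/field/temperature
If the field comes back as something other than a numeric type (long, float, etc.), you will need to fix the mapping and reindex; otherwise, refreshing the field list of the index pattern in Management is usually enough for the field to appear in the drop-down.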
So I have an index of cities, looks something like this:
{
"location": "41.388587, 2.175888",
"name": "BARCELONA",
"radius": 20
}
We have a few dozen of these. I need to be able to query this index with a single lat/lng combination and see if it falls inside one of our "cities".
The location property is the centre of the city, and the radius is the radius of the city in km (assuming all the cities are circles). We can also assume no cities overlap.
How can I return whether or not a lat/lng combination falls within a city?
For example, given the point 40.419691, -3.701254, how can I determine if this falls within BARCELONA?
You can do it easily in either Lucene, Solr or ES.
In Solr, for example:
declare a field type of SpatialRecursivePrefixTreeFieldType. This allows you to index different shapes, not just points
by using the lat/long and the radius, you create a specific circle for each city, and you index that shape in a field called 'shape', for example:
{
"location": "41.388587, 2.175888",
"name": "BARCELONA",
"shape": "CIRCLE (2.175888 41.388587, 20)"
}
then you just query for any doc that intersects with your point (untested):
fq=shape:"Intersects(40.419691 -3.701254)"
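For Elasticsearch specifically, a rough equivalent would be to map the field as a geo_shape, index a circle per city, and query with a point (an untested sketch; circle shapes are only supported by older ES versions, and the index, type and field names here are just illustrative):
PUT /cities
{
  "mappings": {
    "city": {
      "properties": {
        "name":  { "type": "keyword" },
        "shape": { "type": "geo_shape" }
      }
    }
  }
}

PUT /cities/city/1
{
  "name": "BARCELONA",
  "shape": { "type": "circle", "coordinates": [2.175888, 41.388587], "radius": "20km" }
}

GET /cities/_search
{
  "query": {
    "geo_shape": {
      "shape": {
        "shape": { "type": "point", "coordinates": [-3.701254, 40.419691] },
        "relation": "intersects"
      }
    }
  }
}
Note that GeoJSON coordinates are ordered [lon, lat], the opposite of the "lat, lon" string used by geo_point.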
Be sure to check the docs and javadocs for the specific version of Lucene/Solr/ES you are using, as APIs have been changing in this space
Kibana 4 is very cool; however, creating a line chart for the data I have below seems very confusing. Here's the problem: I have to watch the CPU usage over time on a specified date (or, say, a date range) for the below kind of data:
{
"_index": "demo-index-01-10-2016",
"_type": "mapping1",
"_id": "AVOJL8SAfhtnGcHBklKt",
"_score": 1,
"_source": {
"my_custom_id ": 165,
"MEM": 89.12,
"TIME": "2016-01-10T15:22:35",
"CPU": 68.99
  }
}
Find a bigger sample here.
On the X axis we can select a date histogram, but the problem with the Y axis is the aggregate function. It has Count, which shows the item count for the above kind of JSON documents. Selecting Sum on the Y axis (as this answer suggests) does not work for me: selecting 'CPU' then gives the sum of the CPU field, which is of course not what I want. I want to plot each individual CPU value against its individual timestamp, which is the most basic graph I'd expect. How can I get that?
I hope I've understood your question - that you want to plot CPU usage against time, showing each sample. With a Date Histogram, Kibana looks for an aggregation for each time bucket, whereas you want to plot samples.
One way would be to use a date histogram, but ensure that the "interval" is less than your sample period. Then use the "Average", "Min" or "Max" aggregation on the Y axis. Note that the sample timestamp will be adjusted to the histogram.
Generally, I'd suggest using a date histogram with the "Average" or "Max" aggregation on the Y, and not worry too much about plotting individual samples, as you're looking for a trend. (Using "Max" is good for spotting outliers) Setting the interval to "Auto" will pick a decent level of granularity, and then you can zoom in to get more detail if you need it.
Also, ensure your mapping is correct; CPU usage should be a float or a double, I believe.
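If the mapping does need fixing, a rough sketch of what it could look like for the sample document above (untested; you would need to create the index with this mapping and reindex the data):
PUT /demo-index-01-10-2016
{
  "mappings": {
    "mapping1": {
      "properties": {
        "my_custom_id": { "type": "integer" },
        "TIME": { "type": "date" },
        "CPU":  { "type": "float" },
        "MEM":  { "type": "float" }
      }
    }
  }
}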
Hope this helps
I have several hundred thousand documents in an elasticsearch index with associated latitudes and longitudes (stored as geo_point types). I would like to be able to create a map visualization that looks something like this: http://leaflet.github.io/Leaflet.markercluster/example/marker-clustering-realworld.388.html
So, I think what I want is to run a query with a bounding box (i.e., the map boundaries that the user is looking at) and return a summary of the clusters within this bounding box. Is there a good way to accomplish this in elasticsearch? A new indexing strategy perhaps? Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.
#kumetix - Good question. I'm responding to your comment here because the text was too long to put in another comment. The geohash_precision setting will dictate the maximum precision at which a geohash aggregation will be able to return. For example, if geohash_precision is set to 8, we can run a geohash aggregation on that field with at most precision 8. This would, according to the reference, return results grouped in geohash boxes of roughly 38.2m x 19m. A precision of 7 or 8 would probably be accurate enough for showing a web-based heatmap like the one I mentioned in the above example.
As far as how geohash_precision affects the cluster internals, I'm guessing the setting stores a geohash string of length <= geohash_precision inside the geo_point. Let's say we have a point at the Statue of Liberty: 40.6892,-74.0444. The length-12 geohash for this is: dr5r7p4xb2ts. Setting geohash_precision in the geo_point to 8 would internally store the strings:
d
dr
dr5
dr5r
dr5r7
dr5r7p
dr5r7p4
dr5r7p4x
and a geohash_precision of 12 would additionally internally store the strings:
dr5r7p4xb
dr5r7p4xb2
dr5r7p4xb2t
dr5r7p4xb2ts
resulting in a little more storage overhead for each geo_point. Setting the geohash_precision to a distance value (1km, 1m, etc) probably just stores it at the closest geohash string length precision value.
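For reference, a hedged sketch of what that mapping option looks like on an ES 1.x-era geo_point field (index and type names are placeholders):
PUT /my-index
{
  "mappings": {
    "my-type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": 8
        }
      }
    }
  }
}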
Note: How to calculate geohashes using python
$ pip install python-geohash
>>> import geohash
>>> geohash.encode(40.6892,-74.0444)
'dr5r7p4xb2ts'
In Elasticsearch 1.0, you can use the new Geohash Grid aggregation.
Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.
This is true, but the geohash grid aggregation handles sparse data well, so all you need is enough points on your grid and you can achieve something pretty similar to the example in that map.
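A minimal sketch of that aggregation, restricted to the map's current bounding box (the index and field names, and the ES 1.x-style filtered query, are assumptions on my part):
GET /my-index/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left":     { "lat": 45.27, "lon": -34.45 },
            "bottom_right": { "lat": -35.31, "lon": 1.84 }
          }
        }
      }
    }
  },
  "aggs": {
    "grid": {
      "geohash_grid": {
        "field": "location",
        "precision": 6
      }
    }
  }
}
Each bucket comes back with a geohash key and a doc_count, which you can place at the centre of the corresponding geohash cell on the map.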
Try this:
https://github.com/triforkams/geohash-facet
We have been using it to do server-side clustering and it's pretty good.
Example query:
GET /things/thing/_search
{
"size": 0,
"query": {
"filtered": {
"filter": {
"geo_bounding_box": {
"Location"
: {
"top_left": {
"lat": 45.274886437048941,
"lon": -34.453125
},
"bottom_right": {
"lat": -35.317366329237856,
"lon": 1.845703125
}
}
}
}
}
},
"facets": {
"places": {
"geohash": {
"field": "Location",
"factor": 0.85
}
}
}
}