Kibana 4 is very cool, however creating a line chart for the data I have below seems very confusing. Here's the problem: I have to watch the CPU usage over time on a specified date (or a date range) for the below kind of data
{
  "_index": "demo-index-01-10-2016",
  "_type": "mapping1",
  "_id": "AVOJL8SAfhtnGcHBklKt",
  "_score": 1,
  "_source": {
    "my_custom_id": 165,
    "MEM": 89.12,
    "TIME": "2016-01-10T15:22:35",
    "CPU": 68.99
  }
}
Find a bigger sample here.
On the X axis we can select a date histogram, but the problem with the Y axis is the aggregate function. It has Count, which shows the item count for the above kind of JSON documents. Selecting Sum on the Y axis (as this answer suggests) does not work for me: selecting 'CPU' gives the sum of the CPU field, which is of course not desired. I want to plot each individual CPU value against its individual timestamp, which is the most basic graph I'd expect. How can I get that?
I hope I've understood your question - that you want to plot CPU usage against time, showing each sample. With a Date Histogram, Kibana looks for an aggregation for each time bucket, whereas you want to plot samples.
One way would be to use a date histogram, but ensure that the "interval" is less than your sample period. Then use the "Average", "Min" or "Max" aggregation on the Y axis. Note that the sample timestamp will be adjusted to the histogram.
Generally, I'd suggest using a date histogram with the "Average" or "Max" aggregation on the Y, and not worry too much about plotting individual samples, as you're looking for a trend. (Using "Max" is good for spotting outliers) Setting the interval to "Auto" will pick a decent level of granularity, and then you can zoom in to get more detail if you need it.
Also, ensure your mapping is correct; CPU usage should be mapped as a float or a double, I believe.
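For reference, a minimal sketch of such a mapping, reusing the index and type names from your sample (note that if CPU was already mapped as a string you will likely have to reindex):
PUT /demo-index-01-10-2016/_mapping/mapping1
{
  "mapping1": {
    "properties": {
      "TIME": { "type": "date" },
      "CPU": { "type": "float" },
      "MEM": { "type": "float" }
    }
  }
}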
Hope this helps
I have an Elasticsearch document that looks like this:
...
{
  "title": "post 1",
  "total_likes": 100,
  "total_comments": 129,
  "updated_at": "2020-10-19"
},
...
And I use a query that boosts the likes and comments with respect to the post creation date, so it looks like this:
total_likes^6,
total_comments^4,
updated_at
Now the issue with this approach is that if a post has a huge number of likes, it will stay stuck at the top of the results forever, no matter when it was created.
How can I reduce the boost as time passes? For example, a very fresh post would get the full boost factors (6, 4), whereas a post created 1 year ago would get factors of (2, 1).
So I think what you are looking for is the function_score query in combination with a decay function [doc].
Or, if your logic is more complex, you could write it in Painless in the field_value_factor function [doc].
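A rough sketch of the first suggestion, assuming a hypothetical index called posts and treating the weights and the 180-day scale as placeholders you would tune yourself:
GET /posts/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        { "field_value_factor": { "field": "total_likes", "factor": 6, "missing": 0 } },
        { "field_value_factor": { "field": "total_comments", "factor": 4, "missing": 0 } },
        { "gauss": { "updated_at": { "origin": "now", "scale": "180d", "decay": 0.5 } } }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}
Because the gauss decay is multiplied into the score, older posts keep losing weight relative to fresh ones instead of sitting at the top forever.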
New to Kibana visualizations here...
I'm planning on publishing a JSON document (once a day) that has the populations of a list of cities. The following is a sample JSON:
{
  "timestamp": "2019-10-10",
  "population_stats": [
    {
      "city": "New York",
      "population": 8398748
    },
    {
      "city": "Los Angeles",
      "population": 3976322
    }
  ]
}
I'd like to set up the cities on the X axis and the population count on the Y axis.
I can set up my X axis properly (with field aggregations), however I just can't get the populations to show up on the Y axis.
Using "count" in the Y axis always gives me 1 -- I guess this is because there's only one document for the given date range.
Is there a proper way to get the correct population count to display on the Y axis?
Finally managed to figure this out!
Folks are correct that Kibana can't detect inner fields, so you basically have to create a separate JSON document for each city (going by my example in the question above). Then, in the visualization, you select the "sum" or "average" aggregation type. That's all!
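To illustrate, here is a sketch of the reshaped data (field names taken from the sample above, one document per city per day):
{ "timestamp": "2019-10-10", "city": "New York", "population": 8398748 }
{ "timestamp": "2019-10-10", "city": "Los Angeles", "population": 3976322 }
With that shape, a Terms aggregation on the city field works for the X axis and a Sum (or Average) of population for the Y axis.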
I have an application which writes Time Series data to Elasticsearch. The (simplified) data looks like the following:
{
  "timestamp": 1425369600000,
  "shares": 12271
},
{
  "timestamp": 1425370200000,
  "shares": 12575
},
{
  "timestamp": 1425370800000,
  "shares": 12725
},
...
I would now like to use an aggregation to calculate the change rate of the shares field per time bucket. For example, the change rate of the share values within the last 10-minute bucket could, IMHO, be calculated as:
# of shares t1
--------------
# of shares t0
I tried the Date Histogram aggregation, but I guess that's not what I need to calculate the change rates, because this would only give me the doc_count, and it's not clear to me how I could calculate the change rate from these:
{
  "aggs": {
    "shares_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "10m"
      }
    }
  }
}
Is there a way to achieve my goal with aggregations within Elasticsearch? I searched the docs, but didn't find a matching method.
Thanks a lot for any help!
I think it is hard to achieve with out-of-the-box aggregate functions. However, you could take a look at the percentile_ranks aggregation and add your own script modifications to create point-in-time rates.
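As a rough starting point (a sketch, not a complete solution), you could at least pull one representative shares value per bucket with a max sub-aggregation and then compute the t1/t0 ratios between adjacent buckets on the client side:
{
  "size": 0,
  "aggs": {
    "shares_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "10m"
      },
      "aggs": {
        "max_shares": {
          "max": { "field": "shares" }
        }
      }
    }
  }
}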
Also, sorry for going off-topic, but I wonder: is Elasticsearch the best fit for this kind of task? As I understand it, at any given point in time you only need the previous sample's data to calculate the correct rate for the current sample. This sounds to me like a better fit for a real-time sliding-window implementation (even on a relational DB like Postgres), where you keep a fixed number of time buckets and, inside each bucket, the counters you are interested in. Once a new sample arrives, you update (slide) the window and calculate the updated rate for the most recent time bucket.
I use logstash to store log files containing the speed of vehicles over time.
In Kibana 3, how can I generate a panel which displays a value over time, i.e. the x axis shows the time and the y axis the related value, e.g. vehicle speed?
Most panels I found count the occurrence of events in a given time span and display that on the y axis. My goal, however, is to directly plot a value from the JSON log entry (wheelSpeed_m_s), which looks as follows:
{
  "_index": "logstash-2013.05.07",
  "_type": "vehicle_odometry",
  "_id": "Q3b58Pi7RUKuPon0s_ihlA",
  "_score": null,
  "_source": {
    "message": " ",
    "wheelSpeed_m_s": 0.91,
    "@timestamp": "2013-05-07T17:50:04.099+02:00",
    "angularVelocity_rad_s": 0,
    "type": "vehicle_odometry",
    "@version": "1",
    "ts_ms": 1367934604099
  }
}
Any help is highly appreciated.
In the histogram panel, click the "Configure" (gear) icon, then select the "Panel" tab.
On that tab, you can select the "Chart value". This defaults to count, but can be any of the basic math set functions (mean, max, min, total). Select the function, and you'll be asked to enter the field to which the function should be applied:
OP: please don't accept this answer (rutter deserves the points for getting you straight). I leave the info here to complete the question so it's not marked as 'unanswered'.
I have several hundred thousand documents in an elasticsearch index with associated latitudes and longitudes (stored as geo_point types). I would like to be able to create a map visualization that looks something like this: http://leaflet.github.io/Leaflet.markercluster/example/marker-clustering-realworld.388.html
So, I think what I want is to run a query with a bounding box (i.e., the map boundaries that the user is looking at) and return a summary of the clusters within this bounding box. Is there a good way to accomplish this in elasticsearch? A new indexing strategy perhaps? Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.
@kumetix - Good question. I'm responding to your comment here because the text was too long to put in another comment. The geohash_precision setting dictates the maximum precision at which a geohash aggregation on that field can return results. For example, if geohash_precision is set to 8, we can run a geohash aggregation on that field with at most precision 8. This would, according to the reference, return results grouped in geohash boxes of roughly 38.2m x 19m. A precision of 7 or 8 would probably be accurate enough for showing a web-based heatmap like the one I mentioned in the example above.
As far as how geohash_precision affects the cluster internals, I'm guessing the setting stores a geohash string of length <= geohash_precision inside the geo_point. Let's say we have a point at the Statue of Liberty: 40.6892,-74.0444. The full 12-character geohash for this is dr5r7p4xb2ts. Setting geohash_precision in the geo_point to 8 would internally store the strings:
d
dr
dr5
dr5r
dr5r7
dr5r7p
dr5r7p4
dr5r7p4x
and a geohash_precision of 12 would additionally internally store the strings:
dr5r7p4xb
dr5r7p4xb2
dr5r7p4xb2t
dr5r7p4xb2ts
resulting in a little more storage overhead for each geo_point. Setting the geohash_precision to a distance value (1km, 1m, etc) probably just stores it at the closest geohash string length precision value.
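For reference, a sketch of what enabling that in the mapping might look like, using the ES 1.x-era geo_point options (the index, type, and field names here just mirror the query example further down):
PUT /things/_mapping/thing
{
  "thing": {
    "properties": {
      "Location": {
        "type": "geo_point",
        "geohash_prefix": true,
        "geohash_precision": 8
      }
    }
  }
}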
Note: How to calculate geohashes using python
$ pip install python-geohash
>>> import geohash
>>> geohash.encode(40.6892,-74.0444)
'dr5r7p4xb2ts'
In Elasticsearch 1.0, you can use the new Geohash Grid aggregation.
Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.
This is true, but the geohash grid aggregation handles sparse data well, so all you need is enough points on your grid and you can achieve something pretty similar to the example in that map.
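A minimal sketch of that approach, assuming a geo_point field called Location and treating the precision value as something you would adjust per zoom level:
GET /things/thing/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "Location": {
            "top_left": { "lat": 45.27, "lon": -34.45 },
            "bottom_right": { "lat": -35.32, "lon": 1.85 }
          }
        }
      }
    }
  },
  "aggs": {
    "clusters": {
      "geohash_grid": {
        "field": "Location",
        "precision": 6
      }
    }
  }
}
Each bucket comes back with a geohash key and a document count, which you can then draw as cluster markers on the map.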
Try this:
https://github.com/triforkams/geohash-facet
We have been using it to do server-side clustering and it's pretty good.
Example query:
GET /things/thing/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "Location": {
            "top_left": {
              "lat": 45.274886437048941,
              "lon": -34.453125
            },
            "bottom_right": {
              "lat": -35.317366329237856,
              "lon": 1.845703125
            }
          }
        }
      }
    }
  },
  "facets": {
    "places": {
      "geohash": {
        "field": "Location",
        "factor": 0.85
      }
    }
  }
}