I was just wondering if anybody knew of a way to be able to show a graph of the difference of metrics like system.network.in.bytes -
If you look at this graph you can just see that the value continuously gets bigger (at around the same speed) - but I just want to graph the difference between each value not the total.
Example
Anyone have any ideas?
Try a timeseries visualization or timelion.
Assuming your field name is 'bytesIn' (for simplicity) and taking 1 minute intervals (as IMO 30s isn't possible in timelion), your timelion expression should look something like:
.es(*,metric='avg:bytesIn').subtract(.es(*,metric='avg:bytesIn',offset='-1m'))
Explanation
.es(*,metric='avg:bytesIn') gives average of bytesIn over a time interval (here I'm assuming 1m)
Adding offset='-1m', offsets the series retrieval by -1m as if they are happening now
.subtract just subtracts value of one series from another
Related
Small question regarding Spring Boot, some of the useful default metrics, and how to properly use them in Grafana please.
Currently with a Spring Boot 2.5.1+ (question applicable to 2.x.x.) with Actuator + Micrometer + Prometheus dependencies, there are lots of very handy default metrics that come out of the box.
I am seeing many many of them with pattern _max _count _sum.
Example, just to take a few:
spring_data_repository_invocations_seconds_max
spring_data_repository_invocations_seconds_count
spring_data_repository_invocations_seconds_sum
reactor_netty_http_client_data_received_bytes_max
reactor_netty_http_client_data_received_bytes_count
reactor_netty_http_client_data_received_bytes_sum
http_server_requests_seconds_max
http_server_requests_seconds_count
http_server_requests_seconds_sum
Unfortunately, I am not sure what to do with them, how to correctly use them, and feel like my ignorance makes me miss on some great application insights.
Searching on the web, I am seeing some using like this, to compute what seems to be an average with Grafana:
irate(http_server_requests_seconds::sum{exception="None", uri!~".*actuator.*"}[5m]) / irate(http_server_requests_seconds::count{exception="None", uri!~".*actuator.*"}[5m])
But Not sure if it is the correct way to use those.
May I ask what sort of queries are possible, usually used when dealing with metrics of type _max _count _sum please?
Thank you
UPD 2022/11: Recently I've had a chance to work with these metrics myself and I made a dashboard with everything I say in this answer and more. It's available on Github or Grafana.com. I hope this will be a good example of how you can use these metrics.
Original answer:
count and sum are generally used to calculate an average. count accumulates the number of times sum was increased, while sum holds the total value of something. Let's take http_server_requests_seconds for example:
http_server_requests_seconds_sum 10
http_server_requests_seconds_count 5
With the example above one can say that there were 5 HTTP requests and their combined duration was 10 seconds. If you divide sum by count you'll get the average request duration of 2 seconds.
Having these you can create at least two useful panels: average request duration (=average latency) and request rate.
Request rate
Using rate() or irate() function you can get how many there were requests per second:
rate(http_server_requests_seconds_count[5m])
rate() works in the following way:
Prometheus takes samples from the given interval ([5m] in this example) and calculates difference between current timepoint (not necessarily now) and [5m] ago.
The obtained value is then divided by the amount of seconds in the interval.
Short interval will make the graph look like a saw (every fluctuation will be noticeable); long interval will make the line more smooth and slow in displaying changes.
Average Request Duration
You can proceed with
http_server_requests_seconds_sum / http_server_requests_seconds_count
but it is highly likely that you will only see a straight line on the graph. This is because values of those metrics grow too big with time and a really drastic change must occur for this query to show any difference. Because of this nature, it will be better to calculate average on interval samples of the data. Using increase() function you can get an approximate value of how the metric changed during the interval. Thus:
increase(http_server_requests_seconds_sum[5m]) / increase(http_server_requests_seconds_count[5m])
The value is approximate because under the hood increase() is rate() multiplied by [inverval]. The error is insignificant for fast-moving counters (such as the request rate), just be ready that there can be an increase of 2.5 requests.
Aggregation and filtering
If you already ran one of the queries above, you have noticed that there is not one line, but many. This is due to labels; each unique set of labels that the metric has is considered a separate time series. This can be fixed by using an aggregation function (like sum()). For example, you can aggregate request rate by instance:
sum by(instance) (rate(http_server_requests_seconds_count[5m]))
This will show you a line for each unique instance label. Now if you want to see only some and not all instances, you can do that with a filter. For example, to calculate a value just for nodeA instance:
sum by(instance) (rate(http_server_requests_seconds_count{instance="nodeA"}[5m]))
Read more about selectors here. With labels you can create any number of useful panels. Perhaps you'd like to calculate the percentage of exceptions, or their rate of occurrence, or perhaps a request rate by status code, you name it.
Note on max
From what I found on the web, max shows the maximum recorded value during some interval set in settings (default is 2 minutes if to trust the source). This is somewhat uncommon metric and whether it is useful is up to you. Since it is a Gauge (unlike sum and count it can go both up and down) you don't need extra functions (such as rate()) to see dynamics. Thus
http_server_requests_seconds_max
... will show you the maximum request duration. You can augment this with aggregation functions (avg(), sum(), etc) and label filters to make it more useful.
In a Graphana dashboard with several datapoints, how can I get the difference between the last value and the previouse one for the same metric?
Perhaps the tricky part is that the tiem between 2 datapoins for the same metric is not know.
so the desired result is the <metric>.$current_value - <metric>.$previouse_value for each point in the metricstring.
Edit:
The metrics are stored in graphite/Carbon DB.
thanks
You need to use the derivative function
This is the opposite of the integral function. This is useful for taking a running total metric and calculating the delta between subsequent data points.
This function does not normalize for periods of time, as a true derivative would. Instead see the perSecond() function to calculate a rate of change over time.
Together with the keepLastValue
Takes one metric or a wildcard seriesList, and optionally a limit to the number of ‘None’ values to skip over.
Continues the line with the last received value when gaps (‘None’ values) appear in your data, rather than breaking your line.
Like this
derivative(keepLastValue(your_mteric))
A good example can be found here http://www.perehospital.cat/blog/graphite-getting-derivative-to-work-with-empty-data-points
I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on a research where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of this as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date...................Result
08/10/18......52.5
08/07/18......60.2
08/06/18......58.5
08/05/18......55.4
08/04/18......55.2
and so on...
What I really want is the output to be a form of correlation between the current data points with the other set of 5 concurrent data points in history. So, something like:
Date range.....................Correlation
07/10/18-07/15/18....0.98
We’ll be getting a code written in Python for the software to do this automatically (so that as new data is added, it automatically runs and finds the closest set of numbers to match the current one).
Here’s where the difficulty sets in: Since numbers are on a general upward trend over time, we don’t want it to compare the absolute value (since the numbers might never really match). One suggestion has been to compare the delta (rate of change as a percentage over the previous day), or using a log scale.
I’m wondering: how do I go about this? What kind of calculation I can use to get the desired results? I’ve looked at the different kind of correlation equations, but they don’t account for the “shape” of the data, and they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide the data of each week by their average (i.e., normalize them to an average of 1), then sum the squares of the differences of each day of each pair of weeks. This sum is what you want to minimize.
If you don't care about how much a graph oscillates relative to its mean, you can normalize also the variance. For each week, calculate mean and variance, then subtract the mean and divide by the root of the variance. Each week will have mean 0 and variance 1. Then minimize the sum of squares of differences like before.
If the normalization of data is all you can change in your workflow, just leave out the sum of squares of differences minimization part.
I'm facing a following problem. In Kibana 4 I've created a line chart based on my input from elasticeasrch but I can only display average, min, max instead of an actual value of the field per time, e.g. sent bytes.
Most answears to that question on stackoverflow are about Kibana 3 (How to create value over time chart with Kibana 3?) and seem to include a Histogram on a X axis, yet I can't seem to find one which will enable me to apply them to Kibana 4. I was unable to find the histogram panel and once I click on the discover tab there is the constant Searching loading.
If I have the following fields in my _source:
{"timestamp":"2015-06-02T10:16:44.0855","time":587,"threadName":"Thread Group 1-957","byte":1372,"status":"false","latence":306,"registerCall":"404"}
and I would like to have the number of bytes on the Y-axis and on the X-axis my timestamp.
Any help in the right direction will be appreciated :)
To create a value over time line chart in Kibana, follow these steps:
Go to visualize tab and select line chart
In the X-axis, select X-axis, Aggregation as Date Histogram and then select your timestamp field as the date field.
Next for the Y-Axis, select Sum as the aggregation and then bytes as the field.
For the X axis, what Alcanzar said is good, but as you notice, the Y axis is problematic.
Sum (suggested by "Limit") works, but since it's aggregated, it shows the total used in each aggregated bucket, but that may be meaningless depending on what you are trying to show. Your question isn't clear on what you want, so I'm just guessing here. One hour of requests, each of which ran for one minute and sent 1 megabyte is indeed 60 megabytes-minutes, if you are trying to show total capacity used over than hour (maybe you are paying a bill based on usage per time). On the other hand, if you are trying to show peak usage in each time, it would be wrong.
You said you already looked and Max and Min and they don't meet your needs. I don't suppose Standard Deviation would be any better?
I have the same concern. The best I've been able to do so far is
display Min and Max simultaneously in the Y axis. When they diverge, I know I'm zoomed out too far, so I zoom in until they align.
This is how I know I'm seeing individual events.
In any case, I share your frustration. I too would like to be able to show time series as easily as I can in, say, Excel.
I have a very large input file with the following format:
ID \t time \t duration \t Description \t status
The status column is limited to contain either lower case a,s,i or upper case A,S,I or a mixed of the two (sample element in status col: a,si, I, asi, ASI, aSI, Asi...)
The ultimate purpose is to cluster events that start and end at a "close enough' times in order to recognize that those events contribute to a bigger event. Close enough here could be determined by a theta, let's say for now is 1 hour (or it could be 2 hours, or more, etc.). If two events that have start times within 1 hour and end times within 1 hour, we cluster them together and say that they are part of a big event.
One other thing is that I only care about events that has all lower case letters in the status
So below is my initial input processing:
I filter the input file so that it only contains rows that have all lower case letters
This task is already done in Python using MapReduce and Hadoop
Then I take the output file and split it into "boxes" where each box represents a time range (ex: 11am-12pm - box1, 12pm-1pm - box2, 1pm-2pm - box 3, etc...)
Then use MapReduce again to sort each box based on start time (ascending order)
The final output is an ordered list of start time
Now I want to develop an algorithm to group "similar events" together in the output above. Similar events are determined by start and end time.
my current algorithm is:
pick the first item in the list
find any event in the list that has start time is within 1 hour with first event start time and duration is +/- 1 hour with first item duration (duration determines the end time).
Then cluster them together (basically I want to cluster events that happen at the same time frame)
If no other event found, move to the next available event (the one that has not been clustered).
Keep doing this until no more events to be clustered.
I'm not sure if my algorithm will work or it's efficient. I'm trying to do an algorithm that is better than O (n^2), so hierarchical clustering won't work. K-means might not work also since I don't know ahead of time how many clusters I will need.
There could be some other clustering algorithms that might fit better in my case. I think I might need some equations in my algorithm to calculate the distance in my cluster in order to determine similarity. I'm pretty new to this field, so any help to direct me to the right path is greatly appreciated.
Thanks a lot in advance.
You could try DBSCAN density-based clustering algorithm which is O(n log n) (garanteed ONLY in case of using indexing data structure like kd-tree, ball-tree etc, otherwise it's O(n^2)). Events that are part of a bigger event should be in a areas of high density. It seem to be a great fit to your problem.
You might need to implement your own distance function in order to perform neighborhood query (to find nearest events).
Also, it would be better to represent event time in POSIX-time format.
Here is an example.
Depending on the environment you use, DBSCAN implementation is in Java (ELKI), Python (scikit-learn), R (fpc).