Calculate Percentage From Duration Data in Graphite/Grafana - time

I have state change duration data for my objects, in milliseconds, and I am sending this data to Graphite. I want to create a single stat panel that shows the percentage of durations that are less than 20 seconds. How can I create it? Any ideas or examples of a similar scenario would be useful.
myProjectName.FromStateToState.duration 10000ms
myProjectName.FromStateToState.duration 15000ms
myProjectName.FromStateToState.duration 21000ms
myProjectName.FromStateToState.duration 25000ms
myProjectName.FromStateToState.duration 30000ms
For the scenario above, I expect the percentage to be 40%, because I have 5 duration data points and 2 of them are less than 20 seconds. I am using Graphite as the data source and Grafana for visualization.
Temporary Solution
Because I couldn't get enough attention or any answers, I will add my temporary solution here. If I learn the exact solution in the future, I will post it as an answer too.
Basically, I created two counters, counterSuccess and counterFail. If the state change duration is less than 20 seconds, I increase counterSuccess; otherwise, I increase counterFail. The success rate is then given by the following basic formula: counterSuccess/(counterSuccess + counterFail).
Graphite queries in the Grafana panel:
A : sumSeries(myProjectName.FromStateToState.counterSuccess.count)
B : sumSeries(myProjectName.FromStateToState.counterFail.count)
C : sumSeries(#A, #B)
D : divideSeries(#A,#C)
I defined a single stat panel in Grafana to show the result as a single percentage.
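The counter arithmetic above can be checked outside Graphite with a minimal Python sketch. It uses the five durations from the question; the function name `success_ratio` and the constant `THRESHOLD_MS` are hypothetical names chosen for illustration, not part of Graphite or Grafana.

```python
# Durations in milliseconds, taken from the example in the question.
durations_ms = [10000, 15000, 21000, 25000, 30000]
THRESHOLD_MS = 20_000  # 20 seconds (hypothetical constant name)


def success_ratio(durations, threshold_ms=THRESHOLD_MS):
    """Return counterSuccess / (counterSuccess + counterFail), or None if empty."""
    counter_success = sum(1 for d in durations if d < threshold_ms)
    counter_fail = len(durations) - counter_success
    total = counter_success + counter_fail
    if total == 0:
        return None  # no data points: the ratio is undefined
    return counter_success / total


ratio = success_ratio(durations_ms)
print(f"{ratio:.0%}")  # prints 40%
```

The empty case is handled explicitly because dividing by a zero total is the one situation where the formula has no answer.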

Related

Using grafana counter to visualize weather data

I'm trying to visualize my weather data using Grafana. I've already done the Prometheus part, and now I face an issue that has haunted me for quite a while.
I created a counter that adds the indoor temperature every five minutes.
var tempIn = prometheus.NewCounter(prometheus.CounterOpts{
    Name: "tempin",
    Help: "Temperature indoor",
})
for {
    tempIn.Add(station.Body.Devices[0].DashboardData.Temperature)
    time.Sleep(time.Second*300)
}
How can I now visualize this data so that it shows the current temperature and stores it for an unlimited time, so I can look at it even 1 year later like a normal graph?
tempin{instance="localhost:9999"} will only display the added-up temperature, so it's useless for me. I need the current temperature, not the added-up one. I also tried rate(tempin{instance="localhost:9999"}[5m]).
How can I solve this issue?
Although a counter is not the best solution for this use case, you can use the increase function:
increase(tempin{instance="localhost:9999"}[5m])
This will tell you how much the counter increased in the last five minutes.
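To see why the increase over a five-minute window recovers the reading, here is a minimal Python sketch. The readings are hypothetical, and `simple_increase` is an illustrative helper, not a Prometheus API. It is a simplification: the real increase function also extrapolates to the window boundaries and handles counter resets, so its results may not match the raw readings exactly.

```python
# Hypothetical readings taken every five minutes (degrees Celsius).
readings = [21.5, 21.7, 22.0, 21.8]

# What the counter exposes: the running sum of all readings so far.
counter_samples = []
total = 0.0
for r in readings:
    total += r
    counter_samples.append(total)


def simple_increase(samples):
    """Differences between consecutive samples.

    Simplified: ignores the extrapolation and reset handling
    that Prometheus applies.
    """
    return [b - a for a, b in zip(samples, samples[1:])]


# Each difference is the reading that was added in that interval.
recovered = simple_increase(counter_samples)
print([round(x, 1) for x in recovered])
```

A gauge, which is set to the current reading rather than added to, would avoid this reconstruction step altogether.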

How to see what value is being calculated in Pine Editor

I have the following script running with the intention of closing a trade after it has been open for a period of 4 days since the trade was taken.
TimeDiff = time - time[1]
MinutesPerBar = TimeDiff / 60000
// Calculates how long one bar is in minutes
BarsSinceSwingLongCondition = barssince(SwingLongCondition)
// Calculates how many bars have passed since open of trade
CurrentSwingTradeDuration = BarsSinceSwingLongCondition * MinutesPerBar
//calculates the duration that the trade has been opened for (minutes*number of bars)
MaximumSwingTradeDuration = 4*1440
// Sets maximum trade duration. Set at 4 Days in minutes
SwingLongCloseLogic3 = CurrentSwingTradeDuration > MaximumSwingTradeDuration
// Closes trade when trade duration exceeds maximum duration set (4days)
However, the close logic isn't executing when I run the strategy, as I have trades open for longer than the maximum duration.
Is there any way to see what value each element of the formula is calculating, so that I can see where the error is (I suspect it could be the time element)? Or can anyone see where I am going wrong in the code?
The fastest way to achieve that is using the plotchar function, which would show the values in the data-window on mouse-over on each bar. The user manual contains several other techniques available for debugging.
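As a complement to inspecting values on the chart, the same formula can be evaluated by hand on hypothetical numbers. The Python sketch below mirrors the script's arithmetic; the bar timestamps and the bar count are made up for illustration, and `trade_duration_minutes` is a hypothetical helper name.

```python
MS_PER_MINUTE = 60_000
MAX_DURATION_MINUTES = 4 * 1440  # 4 days in minutes, as in the script


def trade_duration_minutes(bar_times_ms, bars_since_entry):
    """Mirror of the script: (time - time[1]) / 60000 * bars since entry."""
    time_diff = bar_times_ms[-1] - bar_times_ms[-2]  # time - time[1]
    minutes_per_bar = time_diff / MS_PER_MINUTE
    return bars_since_entry * minutes_per_bar


# Hypothetical hourly bars (timestamps in milliseconds).
bar_times_ms = [0, 3_600_000, 7_200_000]

duration = trade_duration_minutes(bar_times_ms, 97)
print(duration, duration > MAX_DURATION_MINUTES)  # prints 5820.0 True
```

Note that time - time[1] is measured on the current bar only, so a gap between two bars (for example over a weekend) changes MinutesPerBar for that bar. Comparing the plotted values against a hand calculation like this one may help locate where the numbers diverge.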

Firing Alerts for an activity which is supposed to happen during a particular time interval (using Prometheus Metrics and AlertManager)

I am fairly new to Prometheus Alertmanager and have a question about firing alerts only during a particular period.
I have a microservice which receives a file and does some processing on it. It is only invoked when it gets a message through a Kafka queue. The file is supposed to arrive every day between 5 am and 6 am (UTC). The microservice has a metric which is incremented by 1 every time it receives a file. I want to raise an alert if it does not receive a file in that interval. I have created a query like this:
expr : sum(increase(metric_name[1m]) and on() hour(vector(time()))==5) < 1
for: 1h
My questions:
1) Is it correct, or is there a better way to do it?
2) In case of no update, will it return 0 or "datapoints not found"?
3) Is increase the correct function, given that it tends to give decimal results due to extrapolation? I understand that if the increase is 0, it will show 0.
I can't really play around with scrape_intervals, which is set at 30s.
I have not run this expression but I expect it will cause an alert to fire at 06:00 only and then go off at 06:01. It is the only time the expression would hold true for one hour.
Answering your questions:
1) It is correct if what you want is a single firing of the alert (sending a mail, for example) and nothing after that. Even then, the schedule is a bit tight, and an Alertmanager delay may cause the alert to be lost.
2) In case of no increase, the expression will evaluate to 0. It will be empty when there is an update.
3) Increase is the right function. It even takes counter resets into account.
Answering if there is a better way to do it.
Regarding your expression, you can have the same result, without for clause, with:
expr: increase(metric_name[1h])==0 and on() hour()==6 and on() minute()<1
It reads as: starting at 6am and for 1 minute, alert if there was no increase of the metric over the last hour.
Alerting longer
If you want the alert to last longer (say, for the rest of the day, and you silence it when it is solved), you can use subqueries:
expr: increase((metric and on() hour()==5)[18h:])==0 and on() hour()>5
It reads as: starting at 6am (hour()>5), compute the increase over 5-6am for the next 18 hours. If you like having a pending state, you can drop the trailing on() hour()>5 and use a for: 1h clause.
If you want to alert until a file is submitted and thus detect a resolution, simply transform the expression to evaluate the increase until now:
expr: increase((metric and on() hour()>5)[18h:])==0 and on() hour()>5
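Independently of PromQL, the condition these alerts are meant to capture can be stated as a small Python sketch: once the 5-6 am UTC window has closed, was any file received inside it? This is only an illustration of the intended logic, not an emulation of how Prometheus evaluates the expressions above; `file_missing_in_window` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def file_missing_in_window(received, now, start_hour=5, end_hour=6):
    """True if today's window has closed and no file arrived inside it.

    `received` is a list of timezone-aware UTC datetimes (hypothetical input).
    """
    if now.hour < end_hour:
        return False  # the window has not finished yet, nothing to report
    window_start = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    window_end = now.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    return not any(window_start <= t < window_end for t in received)


now = datetime(2024, 1, 1, 6, 30, tzinfo=timezone.utc)
on_time = [datetime(2024, 1, 1, 5, 15, tzinfo=timezone.utc)]
print(file_missing_in_window(on_time, now))  # prints False
print(file_missing_in_window([], now))       # prints True
```

Writing the condition down this way can make it easier to decide which of the expressions above matches the desired behavior, for example whether a file arriving after 6 am should resolve the alert.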

Dropwizard metrics - How to reset counters after reporting interval

I am using codahale metrics (now dropwizard metrics) to monitor a few 'events' happening in my system. I am using the counter metrics to keep track of the number of times the 'event' happened.
I checked the values printed by the reporter for my counter metrics and it seems like the value keeps on increasing (and never goes down). This seems logical as I am always using metrics.inc() function whenever my 'event' occurs.
What I really want is to get count of my 'event' happening between two reporting times, for this I need to reset my counter every time I report my metrics, but I couldn't find any option in counter metrics to do that. Is there a way or general practice followed by codahale users to produce such metrics?
Current Behavior (reporting time 10 sec):
00:00:00 0
00:00:10 2 // event happened twice
00:00:20 2 // event did not occur
00:00:30 5 // event occurred three times
Expected metrics:
00:00:00 0
00:00:10 2
00:00:20 0
00:00:30 3
To sum up or calculate count(total) per arbitrary interval:
hitcount(perSecond(your.count), '1day')
As far as I know, it handles all the details internally, including but not limited to summarize(scaleToSeconds(nonNegativeDerivative(your.count),1), '1day'),
and there should also be scaling according to carbon's retention periods (one or many) that fall into the chosen aggregation interval.
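The core of this approach is turning an ever-increasing counter into per-interval counts by taking differences between consecutive reports. A minimal Python sketch of that idea, using the values from the question, is shown below. `per_interval_counts` is a hypothetical helper, and two choices in it are assumptions: the first report is compared against zero, and a decrease (such as a counter reset) yields None for that interval.

```python
def per_interval_counts(cumulative):
    """Convert cumulative counter reports into per-interval counts.

    Assumptions (for illustration only): the counter starts at 0 before
    the first report, and a decrease is reported as None.
    """
    counts = []
    previous = 0
    for value in cumulative:
        delta = value - previous
        counts.append(delta if delta >= 0 else None)
        previous = value
    return counts


# "Current Behavior" values from the question, reported every 10 seconds.
reported = [0, 2, 2, 5]
print(per_interval_counts(reported))  # prints [0, 2, 0, 3]
```

The output matches the "Expected metrics" column in the question, which is why deriving the difference at query time can stand in for resetting the counter in the application.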
I believe that a counter is not the correct metric for your case. Consider using a meter, which will provide you with the rate per time interval:
while(...) {
    int stuffProcessed = doStuff();
    meter.mark(stuffProcessed);
}

Can Cube (js metrics framework) return more than 1000 events?

The Cube software (https://github.com/square/cube) allows you to retrieve events.
I want to retrieve a lot of events, but it appears that I am capped at 1000. There are well over 9000 in MongoDB in the collection and time range I am querying.
Example http GET queries I issue:
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&start=2012-02-02&stop=2013-07-03
# 7 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&limit=7
# 1000 results
http://1.2.3.4:1081/1.0/event?expression=my_event_type&limit=9999
It appears that the limit is pinned at line 166, based on 'batchSize=1000':
https://github.com/square/cube/blob/28dad4af27a6680deb46077b16952590f2c21cad/lib/cube/event.js
Is it possible that you can 'page' through the data in some way? Or is this just a hard limit?
Looks like there is a hard cap on results in three places that need to be updated for large domains:
event.js - line 166
metric.js - line 11
metric.js - line 12
In addition, I was unable to find any query-string APIs for these parameters. Ideally, we could leave the cap at 1000 (to avoid server bloat for people not tuning their queries correctly) and allow the consumer to define override behavior.
