I didn't find a 'moving average' feature and I'm wondering if there's a workaround.
I'm using InfluxDB as the backend.
Grafana supports adding a movingAverage(). I also had a hard time finding it in the docs, but you can (somewhat hilariously) see its usage on the feature intro page:
As usual, click on the graph title, edit, and add the metric movingAverage() as described in the Graphite documentation:
movingAverage(seriesList, windowSize)
Graphs the moving average of a metric (or metrics) over a fixed number of past points, or a time interval.
Takes one metric or a wildcard seriesList followed by a number N of datapoints or a quoted string with a length of time like ‘1hour’ or ‘5min’ (see from / until in the render API for examples of time formats). Graphs the average of the preceding datapoints for each point on the graph. All previous datapoints are set to None at the beginning of the graph.
Example:
&target=movingAverage(Server.instance01.threads.busy,10)
&target=movingAverage(Server.instance*.threads.idle,'5min')
Grafana does no calculations itself; it just queries a backend and draws nice charts. So aggregating abilities depend solely on your backend. While Graphite supports windowing functions such as moving average, InfluxDB currently doesn't.
There are quite a lot of requests for a moving average in InfluxDB on the web. You can leave your "+1" and track progress in this ticket: https://github.com/influxdb/influxdb/issues/77
A possible (yet not so easy) workaround is to create a custom script (cron, daemon, whatever) that will pre-calculate the MA and save it in a separate InfluxDB series.
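A minimal sketch of that workaround, meant to run from cron, using the influxdb Python client (the 1.x-era API; the connection details, series names, and window size are all assumptions):

from influxdb import InfluxDBClient

# assumed connection details and database
client = InfluxDBClient('localhost', 8086, database='mydb')

N = 10  # moving-average window, in points

# fetch the last N points of the source series (names are placeholders)
res = client.query('SELECT "value" FROM "response_times" ORDER BY time DESC LIMIT {}'.format(N))
values = [p['value'] for p in res.get_points()]

if values:
    # write the average into a separate series that Grafana can chart
    client.write_points([{
        'measurement': 'response_times_ma',
        'fields': {'value': sum(values) / len(values)},
    }])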
I found myself here trying to do a moving average in Grafana with a PostgreSQL database, so I'll just add a way to do with a SQL query:
SELECT
date as time,
AVG(daily_average_column)
OVER(ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
AS value,
'5 Day Moving Average' as metric
FROM daily_average_table
ORDER BY time ASC;
This uses a "window" function to average the last 4 rows plus the current row.
MySQL 8.0+ supports the same window-function syntax, so this approach works there as well.
The method and capability for this depend on your data source.
You specified InfluxDB, so your query will need to wrap an 'Aggregation function' (such as mean($field)) within the moving_average($aggregation_function, $num_of_points) 'Transformation function'.
In the 'Metrics' tab, you will find the 'Transformation' functions in the 'select' portion of the menu.
Craft your query with the 'Aggregation function' (mean, min, max, etc.) first -- this way you can make sure the data looks as you expect it.
After this, just click the '+' button next to the 'Aggregation function', and under the 'Transformations' menu, select 'moving_average'.
The number in brackets will be the number of points you want the average taken over.
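For reference, the query the editor builds then looks roughly like this (the measurement and field names here are placeholders):

SELECT moving_average(mean("value"), 10) FROM "measurement" WHERE $timeFilter GROUP BY time($__interval) fill(null)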
If your backend is Prometheus, try avg_over_time(mymetric[5m])
InfluxDB 2 allows you to calculate the moving average in the query, e.g.:
from(bucket: "iot")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "PoolWeather")
|> filter(fn: (r) => r["_field"] == "batteryvoltage")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> movingAverage(n: 10)
|> yield(name: "average")
Another option is to report the data as "timing" metrics instead of counts.
This is especially easy to do with StatsD in your stack.
Plotting timing data (coming from StatsD) as the average of the reported data points is already built in.
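For instance, with the statsd Python client (the server address and metric name here are assumptions), each reading goes out as a timer value:

import statsd

client = statsd.StatsClient('localhost', 8125)  # assumed StatsD address

# report each reading as a "timing" value; StatsD aggregates mean and
# percentiles per flush interval, which the dashboard can plot directly
client.timing('weather.temperature_indoor', 21.5)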
I'm working with Elasticsearch, where I have to bucket results based on a user's lastActiveDate.
The current process is below:
(Search String) -> [ElasticSearch(match query + range query)] => (Response (users with scores)) => Then apply Pagination
What I need to implement is:
(Search String) -> [ElasticSearch(match query + range query)] => (Response (keep the relevancy score as-is) for [lastActiveDate in the range of 1-3 months from the current date], plus the new required addition: append the response (again keeping the relevancy score as-is) for [lastActiveDate in the range of 4-6 months from the current date]) => Then apply Pagination
Now, this could be solved if I told the API consumer to call my API multiple times with the appropriate date ranges (solving the issue at the client/consumer level). But as I am working with microservices, this logic should be implemented on my end so that other teams don't have to change their code.
My issue is with pagination and data duplication, whatever I implement.
Solutions that I thought might work:
Apply a multi-search query on the same index; but in the response I don't get one response object, I get multiple response objects, each with its own pagination object. (Not sure how to manipulate this for my implementation, as pagination is not global.)
Make the REST API stateful: send a pagination object (some sort of hash) from which I can tell whether I have to query for 1-3 months or 4-6 months. I could do this, but then what's the point of a REST API?
The known problems:
I have two sections (1-3, 4-6). During pagination, if a page comes back with fewer elements than the requested size, the consumer won't call my API again, since that signals there is no more data. To complete the response, I have to query my database again for the 4-6 month range and fill up the requested page size.
If I go with the above approach, I'll have a data duplication issue, as in the next call (4-6) I'll send some of the same data again. To mitigate this, I can send an object in the response and tell consumers to send it back when they call the API again. But this again creates a problem because I'm dependent on the previous call.
It seems I'm out of options here. Can anyone help with how to do this?
I'm trying to visualize my weather data using Grafana. I've already done the Prometheus part, and now I face an issue that has haunted me for quite a while.
I created a counter that adds the indoor temperature every five minutes.
var tempIn = prometheus.NewCounter(prometheus.CounterOpts{
    Name: "tempin",
    Help: "Temperature indoor",
})

for {
    tempIn.Add(station.Body.Devices[0].DashboardData.Temperature)
    time.Sleep(time.Second * 300)
}
How can I now visualize this data so that it shows the current temperature, and store it for an unlimited time so I can look at it even a year later, like a normal graph?
tempin{instance="localhost:9999"} only displays the added-up temperature, so it's useless for me. I need the current temperature, not the accumulated one. I also tried rate(tempin{instance="localhost:9999"}[5m]).
How to solve this issue?
Although a counter is not the best solution for this use case, you can use the function increase:
increase(tempin{instance="localhost:9999"}[5m])
This will tell you how much the counter increased in the last five minutes.
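That said, the instrument that matches "current temperature" is a gauge, which you set to the latest reading instead of adding to. A minimal sketch with the Python client (prometheus_client); the Go client's prometheus.NewGauge works the same way, and read_temperature() here is a stand-in for the Netatmo read in the question:

from prometheus_client import Gauge, start_http_server
import time

temp_in = Gauge('tempin', 'Temperature indoor')

def read_temperature():
    return 21.5  # stand-in for station.Body.Devices[0].DashboardData.Temperature

start_http_server(9999)  # expose /metrics; port taken from the question
while True:
    temp_in.set(read_temperature())  # set to the current value, don't accumulate
    time.sleep(300)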
I am coding an app that allows users to vote (market_trend_up += 1, for example). The app then fetches the accumulated data (trend_up_votes = 632; trend_down_votes = 236), analyzes it, and displays the resulting trend (if up_votes > down_votes { trend = up }).
What would you advise me to do to refresh the trends regularly? I thought about reinitializing the votes every 6 hours, for instance, but then the first voter would decide the trend by himself.
Would letting the votes accumulate always provide the current trend? Thank you!
I have an x-axis that displays the days that my data occurs on. The data is dynamic and sometimes I have data for only 1 day, 2 days, n days, etc.
Here is my code for displaying the days on the x-axis:
chart.x = d3.time.scale()
.range([0, chart.w]);
chart.xAxis = d3.svg.axis()
.scale(chart.x)
.orient("bottom")
.ticks(d3.time.day) // --- TODO : this is not showing the current day, for some reason...
.tickFormat(d3.time.format("%b %-d %p"));
If my data is spread over 2 days (e.g. Tuesday, Wednesday), this will only display a tick for the second day (Wednesday), i.e. when the day "changes" from one to the other.
I want to also display a tick for the first day (Tuesday).
Even if there is only data on 1 day, I still want to display a tick for it.
Thank you guys,
To extend the domain so that the scale starts and ends at a tick mark you use the .nice() method, as @meetamit suggested -- but "nicing" only works if you call that method after you set the domain, so that's why you might not have noticed any change. The API doesn't really make that clear, although since the method alters the domain I suppose it makes sense that changing the domain later would override the effect of a previous nice() call.
Also, be sure to use the time-scale version of the method: .nice(d3.time.day) to get a domain rounded off to the nearest day as opposed to just the nearest hour.
Here's a fiddle:
http://fiddle.jshell.net/4rGQq/
The key code is simply:
xScale.domain(d3.extent(d))
//d3.extent() returns max and min of array, which become the basic domain
.nice(d3.time.day);
//nice() extends the domain to nearest start/end of a day
Compare what happens if you comment out the .nice() call after setting the domain, even with the other .nice() call during initialization of the scale. Also compare what happens if you don't specify the day-interval as a parameter to the nice method.
Can you show how chart.x is set up? Hard to tell without seeing it, but you may be able to fix it by calling chart.x.nice() (see documentation).
Otherwise, seems like you'll need to manually check the extents of its domain, and adjust them in the case of single day.
Clarification
Your code shows how you call range() but not how you call domain(), which is the important one.
It seems to me that if you do
var domain = chart.x.domain();
console.log(domain[0] == domain[1]);
you'll see true getting logged whenever the data is for only one day. If so, it means you're dealing with a single point in time rather than a time range. In that case, you'll need to adjust the domain to be a longer range.
Really hard to know without even seeing an image of what you're working on.
.ticks() should be used to set the number of ticks you'd like to have on your axis, not the kind of data that should be in them. So try setting it like .ticks(3) and it should give you a couple of ticks.
From the wiki:
.ticks([count])
Returns approximately count representative values from the scale's input domain. If count is not specified, it defaults to 10. The returned tick values are uniformly spaced, have human-readable values (such as multiples of powers of 10), and are guaranteed to be within the extent of the input domain. Ticks are often used to display reference lines, or tick marks, in conjunction with the visualized data. The specified count is only a hint; the scale may return more or fewer values depending on the input domain.
I have a Python (3.2) request that goes to MongoDB, and the request itself runs fast enough. When I then perform an if statement check to see if any records were found, it takes 50 times as long:
Line #   Hits       Time      Per Hit   % Time  Line Contents
==============================================================
    58  27623     6475988      234.4      1.7   itemInDB = db.mainData.find({"x":item[x]}).limit(1)
    59
    60                                          #existing item in db
    61  27623   293419802    10622.3     77.6   if itemInDB.count():
What on earth is the cause of that if statement taking so long?! I presume there must be a better way to check whether a record was found, but Google has come up empty.
Thanks for the help.
Perhaps a Better Way
If you're only interested in returning one value, you might want to use find_one instead of find. It will stop looking for values after one has been found, as opposed to find, which has to run through the collection:
itemInDB = db.mainData.find_one({"x":item[x]})
if itemInDB:
print("Item found")
else:
print("Item not found")
For Your Example
According to the PyMongo docs, when querying the count of a cursor, you can pass in a parameter (True or False) to take into account any skip or limit calls previously made on the cursor. The default for that parameter is False (namely, not taking those calls into account), so the count here scans all matching documents despite the limit(1). That may be what's hurting the performance of your count query.
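A minimal illustration under the cursor API of that era (note that Cursor.count was removed in later PyMongo releases):

itemInDB = db.mainData.find({"x": item[x]}).limit(1)
if itemInDB.count(with_limit_and_skip=True):  # respect the limit(1) when counting
    print("Item found")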
Gauging Query Performance
If you want to see how your query will be carried out by mongo, you can call explain on your cursor:
db.coll.find({"x":4}).explain()
The explain function is also implemented in PyMongo.
Turns out it was due to the find() call and not the if statement. I created an index on "x" (as I should have anyway), changed the find to find_one, and removed the .count() from the if statement. Overall 75% faster.
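For reference, a sketch of those changes in PyMongo (collection and item names taken from the question):

db.mainData.create_index("x")  # one-time; lets the lookup use an index instead of a scan

itemInDB = db.mainData.find_one({"x": item[x]})
if itemInDB:  # no .count() needed; find_one returns the document or None
    print("Item found")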