How to calculate the standard deviation based on 5-minute OHLC bars?

The following is a partial set of 5-minute OHLC bar records. How can I obtain the std of the 5 previous records (open, high, low, close) at each timestamp?

You can use the higher-order function moving.
Please refer to https://www.dolphindb.com/help/Functionalprogramming/TemplateFunctions/moving.html for more information.
select *, moving(std, fixedLengthArrayVector(close, high, low, open), 5) as std from t
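If you are working outside DolphinDB, a rough Python/pandas sketch of the same idea is shown below. It assumes the goal is a single std over all 20 prices (5 bars x 4 columns) in each window, which is my reading of the question; the sample data and column names are made up for illustration.

import numpy as np
import pandas as pd

# Illustrative 5-minute bars; the values are made up.
t = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-04 09:30", periods=8, freq="5min"),
    "open":  [10.0, 10.2, 10.1, 10.3, 10.4, 10.2, 10.5, 10.6],
    "high":  [10.3, 10.4, 10.3, 10.5, 10.6, 10.4, 10.7, 10.8],
    "low":   [ 9.9, 10.0, 10.0, 10.1, 10.2, 10.1, 10.3, 10.4],
    "close": [10.2, 10.1, 10.3, 10.4, 10.2, 10.5, 10.6, 10.7],
})

prices = t[["open", "high", "low", "close"]].to_numpy()
window = 5
stds = [np.nan] * len(t)
for i in range(window - 1, len(t)):
    # sample std over the 20 prices (5 bars x 4 columns) ending at row i
    stds[i] = prices[i - window + 1 : i + 1].std(ddof=1)
t["std"] = stds
print(t)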

Related

Algorithm to find areas of support in a candlestick chart

I am in the process of designing an algorithm that will calculate regions in a candlestick chart where strong areas of support exist. An "area of support" in this case is defined as an area in the chart where the price of a stock rises by a large amount in a short period of time. (Please see the diagram below, the blue dots represent these strong areas of support)
The data I am working with is a list of over 6000 TOHLC (timestamp, open price, high price, low price, close price) values. For example, the first entry in this list of data is:
[1555286400, 83.7, 84.63, 83.7, 84.27]
The way I have structured the algorithm to work is as follows:
1.) The list of 6000+ TOHLC values is split into sub-lists of 30 TOHLC values (30 is a number that I arbitrarily chose). The lowest low price (LLP) is then obtained from each of these sub-lists. The purpose behind using this method is to find areas in the chart where prices dip.
2.) The next step is to determine how high the price rose from each of these lows. For this, I take the next 30 candlestick values from the low and determine what the highest high price (HHP) is. Then, if HHP / LLP >= 1.03, the low price is accepted, otherwise it is discarded. Again, 1.03 is a value that I arbitrarily chose, by analysing the stock chart manually and determining how much the price rose on average from these lows.
The blue dots in the chart above represent the areas of support accepted by the algorithm. It appears to be working well, in terms of what I am trying to achieve.
So the question I have is: does anyone have any improvements they can suggest for this algorithm, or point out any faults in it?
Thanks!
I may have understood this wrong; however, from your explanation it seems like you are doing your calculation on separate 30-element sub-lists and then combining the results.
So, what if the LLP is the 30th element of sub-list N and the HHP is the 1st element of sub-list N+1? If you have taken that into account, then it's fine.
If you haven't taken that into account, I would suggest a moving-window approach to reading the data: start from the 0th element of the 6000+ TOHLC list with a window size of 30 and slide it one element at a time. This way, you won't miss any values.
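A minimal sketch of that moving-window variant, assuming each row is [timestamp, open, high, low, close] as in the question (the function name is hypothetical; the window size and 1.03 threshold are just the values discussed above):

def find_support_areas(tohlc, window=30, rise_threshold=1.03):
    # Slide a 30-bar window one bar at a time instead of cutting the data
    # into disjoint 30-bar chunks, so a low near a chunk boundary is not missed.
    supports = []
    for i in range(len(tohlc) - 2 * window + 1):
        chunk = tohlc[i : i + window]
        # index of the lowest low (element 3) within this window
        low_idx = min(range(window), key=lambda j: chunk[j][3])
        llp_row = chunk[low_idx]
        llp = llp_row[3]
        # look at the next `window` bars after the low and take the highest high (element 2)
        following = tohlc[i + low_idx + 1 : i + low_idx + 1 + window]
        if not following or llp <= 0:
            continue
        hhp = max(row[2] for row in following)
        if hhp / llp >= rise_threshold:
            # keep the timestamp, the low and the rise ratio (useful as a "dip rate" if you classify them later)
            supports.append((llp_row[0], llp, hhp / llp))
    # overlapping windows rediscover the same low, so duplicates are collapsed
    return sorted(set(supports))

Because the window moves one bar at a time, the same dip is typically found by many overlapping windows, which is why the duplicates are merged at the end.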
Some of the selected blue dots have a deeper dip than others. Why is that? I would separate those into another category. If you store them in an object, store the dip rate as well.
Floating-point numbers are not recommended in finance. If possible, I'd use a different approach, and perhaps a different classifier, working solely with integers (e.g. prices in cents). It may not bother you or your project right now, but it will start to produce incorrect results as the numbers add up in the future.

How to create timeline chart with average using Kibana?

I am ingesting data into elasticsearch using flume, and I want to create a time-series graph in kibana to show the events collected over time. BUT I also want to show the average per time unit so the user knows whether the current flow is around the average or not.
To create a timeline I am using a line graph with #timestamp as the X-axis and count as the Y-axis.
The question is: how do I create the average line, and how do I make this average dynamic, e.g. so that as we zoom in, the average changes from average per day to average per hour?
While creating a visualization you can choose the type of y-axis metric. The default is "count". You can click on the icon to choose other type of metrics you want. It will have various options like average, sum, percentile etc.
As for the time range of the average calculation: in the X-axis metrics, under buckets, when you choose Date Histogram the default interval is auto. This means that the time range of the average will change automatically depending on the overall time range selected.
You can change it to a fixed interval such as per second, minute, hour, day, etc.
It's a bit odd: you would expect count to appear alongside the fields as something you can average. In reality you have to do it another way:
For the Y axis, instead of selecting Count, select "Average Bucket",
then set up the bucket aggregation you would like, e.g. a Date Histogram with a one-second interval.
Below this there is another box for the metric, i.e. the thing you're averaging; set this to Count.

How to create value over time line chart in Kibana 4?

I'm facing the following problem. In Kibana 4 I've created a line chart based on my input from elasticsearch, but I can only display the average, min, or max instead of the actual value of the field over time, e.g. sent bytes.
Most answers to this question on stackoverflow are about Kibana 3 (How to create value over time chart with Kibana 3?) and seem to involve a Histogram on the X axis, yet I can't seem to find one which I can apply to Kibana 4. I was unable to find the histogram panel, and once I click on the discover tab it is stuck on the "Searching..." loading state.
If I have the following fields in my _source:
{"timestamp":"2015-06-02T10:16:44.0855","time":587,"threadName":"Thread Group 1-957","byte":1372,"status":"false","latence":306,"registerCall":"404"}
and I would like to have the number of bytes on the Y-axis and my timestamp on the X-axis.
Any help in the right direction will be appreciated :)
To create a value over time line chart in Kibana, follow these steps:
Go to the Visualize tab and select a line chart.
For the X-axis, select X-Axis, choose Date Histogram as the aggregation, and then select your timestamp field as the date field.
Next, for the Y-axis, select Sum as the aggregation and then bytes as the field.
For the X axis, what Alcanzar said is good, but as you notice, the Y axis is problematic.
Sum (suggested by "Limit") works, but since it's aggregated, it shows the total used in each aggregated bucket, and that may be meaningless depending on what you are trying to show. Your question isn't clear on what you want, so I'm just guessing here. One hour of requests, each of which ran for one minute and sent 1 megabyte, is indeed 60 megabyte-minutes if you are trying to show total capacity used over that hour (maybe you are paying a bill based on usage per time). On the other hand, if you are trying to show peak usage at each point in time, it would be wrong.
You said you already looked at Max and Min and they don't meet your needs. I don't suppose Standard Deviation would be any better?
I have the same concern. The best I've been able to do so far is
to display Min and Max simultaneously on the Y axis. When they diverge, I know I'm zoomed out too far, so I zoom in until they align.
This is how I know I'm seeing individual events.
In any case, I share your frustration. I too would like to be able to show time series as easily as I can in, say, Excel.

Smooth average of sales data

How can I calculate the average of a set of data while smoothing over any points that are outside the "norm"? It's been a while since I had to do any real math, but I'm sure I learned this somewhere...
Let's say I have 12 days of sales data on one item: 2,2,2,50,10,15,9,6,2,0,2,1
I would like to calculate the average sales per day without allowing the 4th day (50) to screw up the average too much. Log, Percentile, something like that I think...
It sounds to me that you're looking for a moving average.
You can also filter by thresholding at some multiple of the standard deviation. This would filter out results that were much farther than expected from the mean (average).
Standard deviation is simply sqrt(sum((your_values - average_value)^2) / number_of_values).
edit: You can also look at weighting each value by its deviation from the mean. So values that are very large can be weighted as 1 / exp(deviation) and therefore contribute much less the farther they are from the mean.
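A small sketch of the thresholding idea, using the example data from the question (the 2-sigma cutoff and the function name are arbitrary illustrative choices, not a recommendation):

import math

def smoothed_average(values, k=2.0):
    # mean and (population) standard deviation of the raw data
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # drop anything more than k standard deviations from the mean, then re-average
    kept = [v for v in values if abs(v - mean) <= k * std]
    return sum(kept) / len(kept) if kept else mean

sales = [2, 2, 2, 50, 10, 15, 9, 6, 2, 0, 2, 1]
print(smoothed_average(sales))  # the 50 is filtered out, giving roughly 4.6 instead of 8.4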
You'll want to use something like the IQR (interquartile range). Basically you break the data into quartiles and then calculate the median from the first and third quartiles. That gives you a measure of the central tendency of the data that isn't thrown off by the outliers.
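One common reading of the IQR approach, sketched in Python (the 1.5x IQR fence is the usual Tukey rule; the answer above may have intended something slightly different, such as the midpoint of Q1 and Q3):

import statistics

def iqr_filtered_mean(values, k=1.5):
    # first and third quartiles and the interquartile range
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # keep only points inside the Tukey fences, then average the rest
    kept = [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]
    return sum(kept) / len(kept)

print(iqr_filtered_mean([2, 2, 2, 50, 10, 15, 9, 6, 2, 0, 2, 1]))  # the 50 falls outside the fence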

How does RetailMeNot calculate its success rate trends?

I am developing a rails application where I need a "success rate" system similar to RetailMeNot's. I noticed that they use the jQuery Sparkline library (http://omnipotent.net/jquery.sparkline/) to generate a success rate trend for each coupon.
For example, in their source code:
<em>84%</em> Success<br/><span class="trend">14,18,18,22,19,16,15,28,21,17</span>
<em>20%</em> Success<br/><span class="trend">-1,1,-1,-1,-2,-2,1,-1,1,-1</span>
Can someone explain to me the best way to develop a similar trending system for success rate?
A trend is just a number calculated at regular intervals. In this case it looks like the site is just binning the data they get from the "Did this coupon work for you?" question, and then plotting those values in the chart. In other words, they take the number of (successes - failures) in some time interval (e.g. 12 hours) and plot that number for each interval.
As time passes, they probably rebin to keep the number of bars on the x axis acceptable. For example, if they only want to show 8 bars on the plot, then after 4 hours they'll have to widen the bins.
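A rough sketch of that binning idea (the 12-hour bin width, the vote layout and the function name are assumptions for illustration, not RetailMeNot's actual scheme):

from collections import Counter

def success_trend(votes, bin_seconds=12 * 3600, bars=10):
    # votes is a list of (unix_timestamp, worked) pairs collected from
    # "Did this coupon work for you?" feedback
    bins = Counter()
    for ts, worked in votes:
        bins[ts // bin_seconds] += 1 if worked else -1
    if not bins:
        return []
    latest = max(bins)
    # last `bars` intervals of (successes - failures), oldest first,
    # ready to drop into a sparkline span like the ones quoted above
    return [bins.get(b, 0) for b in range(latest - bars + 1, latest + 1)]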
