How can one analyze the greatest percentage gain (burst) of numbers in sequence in an array? - ruby

There are algorithms for detecting the maximum subarray within an array (both contiguous and non-contiguous). Most of them assume the array contains both negative and positive numbers, though. How is it done with positive numbers only?
I have an array of values of a stock over a consecutive range of time (let's say the array contains values for all consecutive months).
[15.42, 16.42, 17.36, 16.22, 14.72, 13.95, 14.73, 13.76, 12.88, 13.51, 12.67, 11.11, 10.04, 10.38, 10.14, 7.72, 7.46, 9.41, 11.39, 9.7, 12.67, 18.42, 18.44, 18.03, 17.48, 19.6, 19.57, 18.48, 17.36, 18.03, 18.1, 19.07, 21.02, 20.77, 19.92, 18.71, 20.29, 22.36, 22.38, 22.39, 22.94, 23.5, 21.66, 22.06, 21.07, 19.86, 19.49, 18.79, 18.16, 17.24, 17.74, 18.41, 17.56, 17.24, 16.04, 16.05, 15.4, 15.77, 15.68, 16.29, 15.23, 14.51, 14.05, 13.28, 13.49, 13.12, 14.33, 13.67, 13.13, 12.45, 12.48, 11.58, 11.52, 11.2, 10.46, 12.24, 11.62, 11.43, 10.96, 10.63, 10.19, 10.03, 9.7, 9.64, 9.16, 8.96, 8.49, 8.16, 8.0, 7.86, 8.08, 8.02, 7.67, 8.07, 8.37, 8.35, 8.82, 8.58, 8.47, 8.42, 7.92, 7.77, 7.79, 7.6, 7.18, 7.44, 7.74, 7.47, 7.63, 7.21, 7.06, 6.9, 6.84, 6.96, 6.93, 6.49, 6.38, 6.69, 6.49, 6.76]
I need an algorithm to determine for each element the single time period where it had the biggest percentage gain. This could be a time period of 1 month, some span of several months, or the entire array (e.g., 120 months), depending on the stock. I then want to output the burst, in terms of percentage gain, as well as the return (change in price over the original price; so the peak price vs the starting price in the period).
I've combed through the max-subarray-type algorithms, but realized that this problem is a bit different: the array has no negative numbers, so those algorithms just report the entire array as the period and the sum of all elements as the gain.
The algorithms I mentioned are located here and here, with the latter being based on the Master Theorem. Hope this helps.
I'm coding in Ruby but pseudocode would be welcome, too.

I think you went the wrong way ...
I'm not familiar with Ruby, but let's build the algorithm in pseudocode using your own words:
I've got an array that contains the values of a stock over a range of time (let's say, for this example, each element is the value of the stock in a month; the array contains values for all consecutive months).
We'll name this array StockValues; its length is given by length(StockValues), and we assume it is 1-based (the first item is retrieved with StockValues[1]).
I need an algorithm to analyze the array, and determine for each element the single time period where it had the biggest percentage gain in price.
You want to know, for a given index i, at which index j with j>i the gain in percent is maximal, i.e. where gain=100*StockValues[j]/StockValues[i]-100 is largest.
I then want to output the burst, in terms of percentage gain, as well as the return (change in price over the original price; so the peak price vs the starting price in the period).
You want to retrieve the two values burst=gain=100*StockValues[j]/StockValues[i]-100 and return=StockValues[j]-StockValues[i].
The first step is to loop through the array and, for each element, run a second loop to find where the gain is maximal. Whenever we find a new maximum, we save the values you want in another array named Result (let us assume this array is initialized with invalid values, like burst=-1, which means that no gain over any period can be found).
for i=1 to length(StockValues)-1 do
  max_gain=0
  for j=i+1 to length(StockValues) do
    gain=100*StockValues[j]/StockValues[i]-100
    if gain>max_gain then
      max_gain=gain
      Result[i].burst=gain
      Result[i].return=StockValues[j]-StockValues[i]
      Result[i].start=i
      Result[i].end=j
      Result[i].period_length=j-i+1
      Result[i].start_price=StockValues[i]
      Result[i].end_price=StockValues[j]
    end if
  end for
end for
Note that this algorithm gives the shortest period: if you replace gain>max_gain with gain>=max_gain, you'll get the longest period when more than one period has the same gain. Only positive gains are recorded (zero gains too, if you use >=); if there is no gain at all, Result keeps the invalid value. Only periods longer than 1 are listed. If periods of 1 are accepted, the worst possible gain would be 0%, and you would have to modify the loops so that i goes to length(StockValues) and j starts at i.
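A minimal Ruby sketch of the pseudocode above, assuming 0-based arrays (so every index is shifted down by one). The Burst struct is a hypothetical stand-in for Result: price_change holds what the question calls the return, finish stands in for end (a Ruby keyword), and nil replaces the burst=-1 "invalid" marker.

```ruby
# One entry per starting month; period_length counts months including both ends.
Burst = Struct.new(:burst, :price_change, :start, :finish, :period_length,
                   :start_price, :end_price)

# O(n^2) translation of the pseudocode: for each start index i, scan every
# later index j and keep the largest percentage gain. The strict ">" keeps
# the earliest j on ties, i.e. the shortest period.
def bursts(values)
  results = Array.new(values.size) # nil means "no positive gain found"
  (0...values.size - 1).each do |i|
    max_gain = 0.0
    (i + 1...values.size).each do |j|
      gain = 100.0 * values[j] / values[i] - 100
      next unless gain > max_gain

      max_gain = gain
      results[i] = Burst.new(gain, values[j] - values[i], i, j, j - i + 1,
                             values[i], values[j])
    end
  end
  results
end
```

For the question's 120-month array the inner loop runs about 7,000 times in total, so the quadratic cost should not matter in practice.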

This doesn't really sound like several days of work :p unless I'm missing something.
# returns array of percentage gain per period
def percentage_gain(array)
initial = array[0]
after = 0
percentage_gain = []
1.upto(array.size-1).each do |i|
after = array[i]
percentage_gain << (after - initial)/initial*100
initial = after
end
percentage_gain
end
# returns array of amount gain $ per period
def amount_gain(array)
initial = array[0]
after = 0
amount_gain = []
1.upto(array.size-1).each do |i|
after = array[i]
amount_gain << (after - initial)
initial = after
end
amount_gain
end
# returns the maximum amount gain found in the array
def max_amount_gain(array)
amount_gain(array).max
end
# returns the maximum percentage gain found in the array
def max_percentage_gain(array)
percentage_gain(array).max
end
# returns the maximum potential gain you could've made by trading perfectly every period:
# it adds up the amount gained in every period where the price rose
# and skips the periods where the stock lost value.
def max_potential_amount_gain(array)
initial = array[0]
after = 0
max_potential_gain = 0
1.upto(array.size-1).each do |i|
after = array[i]
if after - initial > 0
max_potential_gain += after - initial
end
initial = after
end
max_potential_gain
end
array = [15.42, 16.42, 17.36, 16.22, 14.72, 13.95, 14.73, 13.76, 12.88, 13.51, 12.67, 11.11, 10.04, 10.38, 10.14, 7.72, 7.46, 9.41, 11.39, 9.7, 12.67, 18.42, 18.44, 18.03, 17.48, 19.6, 19.57, 18.48, 17.36, 18.03, 18.1, 19.07, 21.02, 20.77, 19.92, 18.71, 20.29, 22.36, 22.38, 22.39, 22.94, 23.5, 21.66, 22.06, 21.07, 19.86, 19.49, 18.79, 18.16, 17.24, 17.74, 18.41, 17.56, 17.24, 16.04, 16.05, 15.4, 15.77, 15.68, 16.29, 15.23, 14.51, 14.05, 13.28, 13.49, 13.12, 14.33, 13.67, 13.13, 12.45, 12.48, 11.58, 11.52, 11.2, 10.46, 12.24, 11.62, 11.43, 10.96, 10.63, 10.19, 10.03, 9.7, 9.64, 9.16, 8.96, 8.49, 8.16, 8.0, 7.86, 8.08, 8.02, 7.67, 8.07, 8.37, 8.35, 8.82, 8.58, 8.47, 8.42, 7.92, 7.77, 7.79, 7.6, 7.18, 7.44, 7.74, 7.47, 7.63, 7.21, 7.06, 6.9, 6.84, 6.96, 6.93, 6.49, 6.38, 6.69, 6.49, 6.76]
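The functions above only compare consecutive months. For the question's actual goal (the best gain over any span starting at each month), note that for a fixed start i the best end point is simply the highest later price. A single right-to-left pass that tracks the running maximum (a suffix maximum) therefore gives the same results as the O(n^2) double loop in O(n). A sketch, with hypothetical hash keys:

```ruby
# For each start index i, find the later index with the highest price.
# Scanning right to left, max_idx always points at the highest price seen
# strictly to the right of i, so each start is resolved in O(1).
def best_bursts(values)
  best = Array.new(values.size) # nil means "no later price is higher"
  max_idx = nil
  (values.size - 1).downto(0) do |i|
    if max_idx && values[max_idx] > values[i]
      best[i] = {
        start: i,
        finish: max_idx,
        burst: 100.0 * values[max_idx] / values[i] - 100, # percentage gain
        price_change: values[max_idx] - values[i]         # the "return"
      }
    end
    # ">=" moves max_idx left on ties, so the shortest period wins.
    max_idx = i if max_idx.nil? || values[i] >= values[max_idx]
  end
  best
end
```

Calling best_bursts(array) on the question's data returns one hash (or nil) per month.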

Related

Generate increasing random number rails

I've a random number generator code:
5.times.map { [*0..9].sample }.join.to_i
It gives me random numbers like 63832, 42337, 34998. As you can see, they are completely random, but how can I get them in increasing order? Not 63832, 42337, 34998, but 34998, 42337, 63832 (this is just an example; ideally I would get something like 00[number] => 0025, where 25 is the random number that was generated).
Hope my explanation is understandable :)
If you have the current / last random number, you can generate a larger one by simply adding a random number to it, e.g.:
def generate(base = 0)
base + rand(1_000..10_000)
end
number = generate #=> 9635
number = generate(number) #=> 17761
number = generate(number) #=> 22082
number = generate(number) #=> 31061
Each number is 1,000 to 10,000 larger than its predecessor.
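The same idea can be wrapped in an endless Enumerator, so callers just take as many increasing values as they need. A sketch; increasing_numbers and its step:/rng: keywords are hypothetical names, and the default step mirrors the 1,000..10,000 range of generate:

```ruby
# Endless Enumerator of strictly increasing random numbers: each value is the
# previous one plus a random step drawn from `step`.
def increasing_numbers(step: 1_000..10_000, rng: Random.new)
  Enumerator.new do |yielder|
    current = 0
    loop do
      current += rng.rand(step)
      yielder << current
    end
  end
end
```

For example, increasing_numbers.take(4) returns four ascending numbers. If you also want zero-padded strings like 0025, format('%04d', number) pads a number to four digits.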
An alternative approach, if you want to generate all random numbers within a known range:
[*1..10000].sample(5).sort
# => [602, 5608, 7912, 8384, 8714]
However, this only works if you want to fetch all random numbers upfront, rather than continuously being able to generate new ones which are larger.
It's also not a good approach if your upper limit is very big; e.g. this will freeze your system until you cancel it:
[*1..10000000000].sample(5).sort
...But in that case, since the numbers are so huge, you can surely get away with the tiny risk of having a collision:
5.times.map{ rand(1..10000000000) }.sort
# => [460188573, 555213355, 3576967759, 3994239233, 9570165205]
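If collisions must be ruled out entirely, one option is to keep drawing into a Set until it holds enough distinct values. This never materializes the range, and with a huge range retries are rare. A sketch; sorted_unique_sample and its rng: keyword are hypothetical names:

```ruby
require 'set'

# Draw `count` distinct integers from `range` without building an array of
# the whole range: a duplicate draw is simply absorbed by the Set.
def sorted_unique_sample(range, count, rng: Random.new)
  raise ArgumentError, 'range has fewer values than count' if range.size < count

  picked = Set.new
  picked << rng.rand(range) while picked.size < count
  picked.to_a.sort
end
```

For example, sorted_unique_sample(1..10_000_000_000, 5) returns five sorted, distinct values.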

Poor h2o GBM Classification Performance in a balanced binomial response

In a fairly balanced binomial classification problem, I am observing an unusually high error rate from h2o.gbm when predicting class 0, even on the training set itself. The data comes from a competition that is already over, so my interest is only in understanding what is going wrong.
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
               0        1     Error  Rate
0         147857   234035  0.612830  =234035/381892
1          44782   271661  0.141517  =44782/316443
Totals    192639   505696  0.399260  =278817/698335
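To make the matrix's arithmetic explicit, the sketch below recomputes each Error column from the counts: a class's error is its misclassified count divided by its actual-class row total. It also shows that roughly 72% of all rows end up predicted as class 1. The helper names are hypothetical:

```ruby
# Counts copied from the confusion matrix above: outer keys are the actual
# class, inner keys the predicted class.
MATRIX = {
  0 => { 0 => 147_857, 1 => 234_035 },
  1 => { 0 => 44_782,  1 => 271_661 }
}.freeze

# Per-class error: misclassified rows divided by the actual-class row total.
def class_error(matrix, actual)
  row = matrix[actual]
  total = row.values.sum
  (total - row[actual]).fdiv(total)
end

# Overall error: all off-diagonal counts divided by the grand total.
def overall_error(matrix)
  total = matrix.values.sum { |row| row.values.sum }
  wrong = matrix.sum { |actual, row| row.values.sum - row[actual] }
  wrong.fdiv(total)
end

# Fraction of all rows that the model predicted as class `cls`.
def predicted_share(matrix, cls)
  total = matrix.values.sum { |row| row.values.sum }
  matrix.values.sum { |row| row[cls] }.fdiv(total)
end
```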
Any expert suggestions on how to treat the data and reduce the error are welcome.
I tried the following approaches, but the error did not decrease:
Approach 1: Selecting top 5 important variables via h2o.varimp(gbm)
Approach 2: Converting negative normalized variables to 0 and positive ones to 1.
#Data Definition
# Variable Definition
#Independent Variables
# ID Unique ID for each observation
# Timestamp Unique value representing one day
# Stock_ID Unique ID representing one stock
# Volume Normalized values of volume traded of given stock ID on that timestamp
# Three_Day_Moving_Average Normalized values of three days moving average of Closing price for given stock ID (Including Current day)
# Five_Day_Moving_Average Normalized values of five days moving average of Closing price for given stock ID (Including Current day)
# Ten_Day_Moving_Average Normalized values of ten days moving average of Closing price for given stock ID (Including Current day)
# Twenty_Day_Moving_Average Normalized values of twenty days moving average of Closing price for given stock ID (Including Current day)
# True_Range Normalized values of true range for given stock ID
# Average_True_Range Normalized values of average true range for given stock ID
# Positive_Directional_Movement Normalized values of positive directional movement for given stock ID
# Negative_Directional_Movement Normalized values of negative directional movement for given stock ID
#Dependent Response Variable
# Outcome Binary outcome variable representing whether price for one particular stock at the tomorrow’s market close is higher(1) or lower(0) compared to the price at today’s market close
temp <- tempfile()
download.file('https://github.com/meethariprasad/trikaal/raw/master/Competetions/AnalyticsVidhya/Stock_Closure/test_6lvBXoI.zip',temp)
test <- read.csv(unz(temp, "test.csv"))
unlink(temp)
temp <- tempfile()
download.file('https://github.com/meethariprasad/trikaal/raw/master/Competetions/AnalyticsVidhya/Stock_Closure/train_xup5Mf8.zip',temp)
#Please wait for 60 Mb file to load.
train <- read.csv(unz(temp, "train.csv"))
unlink(temp)
summary(train)
#We don't want the ID
train<-train[,2:ncol(train)]
# Preserving Test ID if needed
ID<-test$ID
#Remove ID from test
test<-test[,2:ncol(test)]
#Create Empty Response Outcome
test$Outcome<-NA
#Original
combi.imp<-rbind(train,test)
rm(train,test)
summary(combi.imp)
#Creating Factor Variable
combi.imp$Outcome<-as.factor(combi.imp$Outcome)
combi.imp$Stock_ID<-as.factor(combi.imp$Stock_ID)
combi.imp$timestamp<-as.factor(combi.imp$timestamp)
summary(combi.imp)
#Brute Force NA treatment by taking only complete cases without NA.
train.complete<-combi.imp[1:702739,]
train.complete<-train.complete[complete.cases(train.complete),]
test.complete<-combi.imp[702740:804685,]
library(h2o)
y<-c("Outcome")
features=names(train.complete)[!names(train.complete) %in% c("Outcome")]
h2o.shutdown(prompt=F)
#Adjust memory size based on your system.
h2o.init(nthreads = -1,max_mem_size = "5g")
train.hex<-as.h2o(train.complete)
test.hex<-as.h2o(test.complete[,features])
#Models
gbmF_model_1 = h2o.gbm( x=features,
y = y,
training_frame =train.hex,
seed=1234
)
h2o.performance(gbmF_model_1)
You've only trained a single GBM with the default parameters, so it doesn't look like you've put enough effort into tuning your model. I'd recommend a random grid search on GBM using the h2o.grid() function. Here is an H2O R code example you can follow.

How to populate an array with incrementally increasing values Ruby

I'm attempting to solve http://projecteuler.net/problem=1.
I want to create a method that takes an integer and returns an array containing all the integers from 1 up to and including that integer.
Below is what I have so far. The code doesn't work.
def make_array(num)
numbers = Array.new num
count = 1
numbers.each do |number|
numbers << number = count
count = count + 1
end
return numbers
end
make_array(10)
(1..num).to_a is all you need to do in Ruby.
1..num will create a Range object with start at 1 and end at whatever value num is. Range objects have to_a method to blow them up into real Arrays by enumerating each element within the range.
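A short sketch of that one-liner wrapped in the method the question asked for, plus the difference between two-dot and three-dot ranges:

```ruby
# Inclusive range: 1, 2, ..., num, expanded into a real Array.
def make_array(num)
  (1..num).to_a
end

inclusive = (1..5).to_a  # two dots include the end value
exclusive = (1...5).to_a # three dots exclude it
```

Here inclusive is [1, 2, 3, 4, 5] and exclusive is [1, 2, 3, 4].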
For most purposes, you won't actually need the Array - Range will work fine. That includes iteration (which is what I assume you want, given the problem you're working on).
That said, knowing how to create such an Array "by hand" is a valuable learning experience, so you might want to keep working on it a bit. Hint: you want to start with an empty array ([]) instead of Array.new num, then iterate something num.times, and add numbers into the Array. If you already start with an Array of size num, and then push num elements into it, you'll end up with twice num elements. If, as in your case, you're adding elements while you're iterating over the array, the loop never exits, because for each element you process, you add another one. It's like chasing a metal ball with the repulsing side of a magnet.
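Following that hint, a hand-built version might look like this (make_array_by_hand is a hypothetical name, chosen so it doesn't clash with the question's method):

```ruby
# Start from an empty array and push exactly num values. Integer#times
# yields 0...num, so add 1 to get 1..num.
def make_array_by_hand(num)
  numbers = []
  num.times { |i| numbers << i + 1 }
  numbers
end
```

Because the loop runs over num.times rather than over the array being filled, it terminates after exactly num pushes.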
To answer the Euler Question:
(1 ... 1000).to_a.select{|x| x%3==0 || x%5==0}.reduce(:+) # => 233168
Sometimes a one-liner is more readable than more detailed code, I think.
Assuming you are learning Ruby through the Project Euler examples, I'll explain what the line does:
(1 ... 1000).to_a
will create an array with the numbers 1 to 999, since the Euler question asks for numbers below 1000. Using three dots in a Range creates it without the end value itself.
.select{|x| x%3==0 || x%5==0}
chooses only the elements that are divisible by 3 or 5, i.e. the multiples of 3 or 5. The other values are discarded; the result of this operation is a new Array containing only the multiples of 3 or 5.
.reduce(:+)
Finally, this operation sums up all the numbers in the array, reducing it to a single number: the sum you need for the solution.
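The same pipeline as a reusable method (sum_of_multiples is a hypothetical name). Since a Range is already Enumerable, the to_a call can be dropped, and reduce(0, :+) returns 0 instead of nil when nothing matches. The Project Euler statement gives the check value for 10: the multiples 3, 5, 6 and 9 sum to 23.

```ruby
# Sum of all numbers below `limit` that are multiples of 3 or 5.
def sum_of_multiples(limit)
  (1...limit).select { |x| x % 3 == 0 || x % 5 == 0 }.reduce(0, :+)
end
```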
What I want to illustrate: many methods you would write by hand every day are already built into Ruby, since it is a language by programmers for programmers. Be pragmatic ;)

using probability for rounding decimals

What might be a simple Ruby way to round numbers using probability, i.e., based on how close the value is to one boundary or the other (floor or ceiling)?
For example, given a current price value of 28.33, I need to add 0.014.
This is equivalent to starting with 28.34 and adding 0.004, but the final value must be rounded to two decimal places (which can be provided as a parameter, or fixed for now).
The final value should therefore be:
28.34 with 60% chance, since it is that much closer, OR
28.35 with 40% random chance
The reason this approach occurred to me is that the application is stateless and independent across runs, but it still needs to approximate the net effect of accumulating the less significant digits that normally get rounded into oblivion (e.g. micropenny values that do have an impact over time). For example, reducing a stop-loss by some variable increment every day (a subtraction like -0.014 instead of the addition above).
It would be useful to extend this method to the Float class directly.
How about the following, where lower and upper are the two candidate values (e.g. 28.34 and 28.35) and current is the unrounded value (e.g. 28.344):
rand(lower..upper) < current ? upper.round(2) : lower.round(2)
EDIT:
The above only works on Ruby 1.9.3 or later (earlier versions don't support passing a range to rand).
Otherwise:
random_number = rand * (upper-lower) + lower
random_number < current ? upper.round(2) : lower.round(2)
I wound up using this method:
class Float
def roundProb(delta, prec=2)
ivalue=self
chance = rand # range 0..1, nominally averaged at 0.5
# puts lower=((ivalue + delta)*10**prec -0.5).round/10.0**prec # aka floor
# puts upper=((ivalue + delta)*10**prec +0.5).round/10.0**prec # ceiling
ovalue=((ivalue + delta)*10**prec +chance-0.5).round/10.0**prec # proportional probability
return ovalue
rescue
puts $@, $!
end
end
28.33.roundProb(0.0533)
=> 28.39
Maybe not the most elegant approach, but it seems to work for the general case of any precision (default 2). It even works on Ruby 1.8.7, which I'm stuck with in one case and whose round() lacks a precision parameter.
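The same idea is often called stochastic rounding: round up with probability equal to the fractional remainder, so the expected result equals the unrounded value. A sketch with an injectable random source so it can be tested deterministically; stochastic_round and its rng: keyword are hypothetical names, and it is written as a plain method rather than a Float extension:

```ruby
# Round x to `prec` decimals, rounding up with probability equal to the
# distance above the lower boundary (so 28.344 -> 28.35 with ~40% chance).
def stochastic_round(x, prec = 2, rng: Random.new)
  factor = 10.0**prec
  scaled = x * factor
  lower = scaled.floor
  frac = scaled - lower          # in [0, 1): how far x sits above lower
  bump = rng.rand < frac ? 1 : 0 # rng.rand is uniform on [0, 1)
  (lower + bump) / factor
end
```

Binary floating point cannot represent most decimals exactly, so frac may be off by a tiny amount; that only shifts the probabilities negligibly.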

Fastest way to get maximum value from an exclusive Range in ruby

Ok, so say you have a really big Range in ruby. I want to find a way to get the max value in the Range.
The Range is exclusive (defined with three dots), meaning that it does not include the end object in its results. It could be made up of Integer, String, Time, or really any object that responds to #<=> and #succ (which are the only requirements for the start/end objects of a Range).
Here's an example of an exclusive range:
past = Time.local(2010, 1, 1, 0, 0, 0)
now = Time.now
range = past...now
range.include?(now) # => false
Now I know I could just do something like this to get the max value:
range.max # => returns 1 second before "now" using Enumerable#max
But this will take a non-trivial amount of time to execute. I also know that I could subtract 1 second from whatever the end object is. However, the object may be something other than Time, and it may not even support #-. I would prefer to find an efficient general solution, but I am willing to combine special-case code with a fallback to a general solution (more on that later).
As mentioned above, using Range#last won't work either, because it's an exclusive range and does not include the last value in its results.
The fastest approach I could think of was this:
max = nil
range.each { |value| max = value }
# max now contains nil if the range is empty, or the max value
This is similar to what Enumerable#max does (which Range inherits), except that it exploits the fact that each value is going to be greater than the previous one, so we can skip using #<=> to compare each value with the previous one (the way Range#max does), saving a tiny bit of time.
The other approach I was thinking about was to have special case code for common ruby types like Integer, String, Time, Date, DateTime, and then use the above code as a fallback. It'd be a bit ugly, but probably much more efficient when those object types are encountered because I could use subtraction from Range#last to get the max value without any iterating.
Can anyone think of a more efficient/faster approach than this?
The simplest solution that I can think of, which will work for inclusive as well as exclusive ranges:
range.max
Some other possible solutions:
range.entries.last
range.entries[-1]
These solutions are all O(n), and will be very slow for large ranges. The problem in principle is that range values in Ruby are enumerated using the succ method iteratively on all values, starting at the beginning. The elements do not have to implement a method to return the previous value (i.e. pred).
The fastest method would be to find the predecessor of the last item (an O(1) solution):
range.exclude_end? ? range.last.pred : range.last
This works only for ranges that have elements which implement pred. Later versions of Ruby implement pred for integers. You have to add the method yourself if it does not exist (essentially equivalent to special case code you suggested, but slightly simpler to implement).
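Combining the O(1) pred shortcut with the question's enumeration fallback might look like the sketch below. It assumes the element type implements either #pred (e.g. Integer) or #succ (e.g. String); range_max is a hypothetical name, and endless or beginless ranges are not handled.

```ruby
# Largest element of a Range: O(1) via #pred when the end object supports it,
# otherwise O(n) by walking the range with #succ (via Range#each).
def range_max(range)
  first = range.begin
  last = range.end
  cmp = first <=> last
  return nil if cmp.nil? # incomparable endpoints

  if range.exclude_end?
    return nil if cmp >= 0                      # e.g. 5...5 or 9...1 is empty
    return last.pred if last.respond_to?(:pred) # O(1) path

    max = nil
    range.each { |value| max = value }          # O(n) fallback
    max
  else
    cmp > 0 ? nil : last                        # inclusive: the end is the max
  end
end
```

With this, range_max(1...1_000_000) returns 999999 immediately, while range_max('a'...'e') falls back to enumeration and returns 'd'.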
Some quick benchmarking shows that this last method is the fastest by many orders of magnitude for large ranges (in this case range = 1...1000000), because it is O(1):
                                           user     system      total         real
r.entries.last                        11.760000   0.880000  12.640000 ( 12.963178)
r.entries[-1]                         11.650000   0.800000  12.450000 ( 12.627440)
last = nil; r.each { |v| last = v }   20.750000   0.020000  20.770000 ( 20.910416)
r.max                                 17.590000   0.010000  17.600000 ( 17.633006)
r.exclude_end? ? r.last.pred : r.last  0.000000   0.000000   0.000000 (  0.000062)
Benchmark code is here.
In the comments it was suggested to use range.last - (range.exclude_end? ? 1 : 0). This works for dates without additional methods, but will never work for non-numeric ranges: String#- does not exist and makes no sense with integer arguments. String#pred, however, can be implemented.
I'm not sure about the speed (and initial tests don't seem incredibly fast), but the following might do what you need:
past = Time.local(2010, 1, 1, 0, 0, 0)
now = Time.now
range = past...now
range.to_a[-1]
Very basic testing (counting in my head) showed that it took about 4 seconds while the method you provided took about 5-6. Hope this helps.
Edit 1: Removed second solution as it was totally wrong.
I can't think of any way to achieve this that doesn't involve enumerating the range, unless, as already mentioned, you have other information about how the range will be constructed and can therefore infer the desired value without enumeration. Of all the suggestions, I'd go with #max, since it seems to be the most expressive.
require 'benchmark'
N = 20
Benchmark.bm(30) do |r|
  past, now = Time.local(2010, 2, 1, 0, 0, 0), Time.now
  range = past...now
  r.report("range.max") do
    N.times { last_in_range = range.max }
  end
  r.report("explicit enumeration") do
    N.times { range.each { |value| last_in_range = value } }
  end
  r.report("range.entries.last") do
    N.times { last_in_range = range.entries.last }
  end
  r.report("range.to_a[-1]") do
    N.times { last_in_range = range.to_a[-1] }
  end
end
                                    user     system      total         real
range.max                      49.406000   1.515000  50.921000 ( 50.985000)
explicit enumeration           52.250000   1.719000  53.969000 ( 54.156000)
range.entries.last             53.422000   4.844000  58.266000 ( 58.390000)
range.to_a[-1]                 49.187000   5.234000  54.421000 ( 54.500000)
I notice that the 3rd and 4th options have significantly higher system time. I expect that's related to the explicit creation of an array, which seems like a good reason to avoid them, even if they're not obviously more expensive in elapsed time.
