I am trying to understand the epsilon-greedy method in DQN. I am learning from the code available at https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js
Following is the update rule for epsilon, which changes with age:
this.epsilon = Math.min(1.0, Math.max(this.epsilon_min, 1.0-(this.age - this.learning_steps_burnin)/(this.learning_steps_total - this.learning_steps_burnin)));
Does this mean the epsilon value starts at the minimum (chosen by the user), increases with age up to the burn-in steps, and eventually becomes 1? Or does epsilon start around 1 and then decay to epsilon_min?
Either way, learning almost stops after this process. So do we need to choose learning_steps_burnin and learning_steps_total carefully? Any thoughts on what values should be chosen?
Since epsilon denotes the amount of randomness in your policy (the action is greedy with probability 1-epsilon and random with probability epsilon), you want to start with a fairly randomized policy and later slowly move towards a deterministic policy. Therefore, you usually start with a large epsilon (like 0.9, or 1.0 in your code) and decay it to a small value (like 0.1). The most common and simple approaches are linear decay and exponential decay. Usually, you have an idea of how many learning steps you will perform (what your code calls learning_steps_total) and tune the decay parameters (your learning_steps_burnin) such that in this interval epsilon goes from 0.9 to 0.1.
Your code is an example of linear decay.
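For intuition, a linear schedule equivalent to that line might be sketched in Python like this (age, burnin, total and eps_min here just mirror the this.age, learning_steps_burnin, learning_steps_total and epsilon_min fields; this is a sketch, not the library's API):

def linear_epsilon(age, burnin, total, eps_min=0.1):
    # Anneal linearly from 1.0 (during burn-in) down to eps_min.
    frac = (age - burnin) / (total - burnin)
    return min(1.0, max(eps_min, 1.0 - frac))

# e.g. linear_epsilon(0, 3000, 100000) == 1.0 (still fully exploring)
#      linear_epsilon(100000, 3000, 100000) == 0.1 (mostly greedy)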
An example of exponential decay, as runnable Python, is:

n = 50000  # number of learning steps (example value)
epsilon = 0.9
decay = 0.9999
min_epsilon = 0.1
for i in range(n):
    epsilon = max(min_epsilon, epsilon * decay)
Personally, I recommend an epsilon decay such that after about 50-75% of the training you reach the minimum value of epsilon (I suggest somewhere between 0.05 and 0.0025), from which point on you only have the improvement of the policy itself.
I wrote a small script to experiment with the various parameters; it reports after what fraction of the training the decay stops (i.e. reaches the indicated minimum value):
import matplotlib.pyplot as plt
import numpy as np

eps_start = 1.0
eps_min = 0.05
eps_decay = 0.9994
epochs = 10000

stop = epochs  # fallback in case the minimum is never reached
df = np.zeros(epochs)
for i in range(epochs):
    if i == 0:
        df[i] = eps_start
    else:
        df[i] = df[i - 1] * eps_decay
        if df[i] <= eps_min:
            stop = i
            break

print("With these parameters you will stop epsilon decay after {}% of training".format(stop / epochs * 100))
plt.plot(df[:stop + 1])
plt.show()
I have a question related to the Fast Fourier transform. I want to calculate the phase and take an FFT to plot the power spectral density. However, when I calculate the frequency f, there are some errors. This is my program code:
n = 1:32768;
T = 0.2*10^-9; % Sampling period
Fs = 1/T; % Sampling frequency
Fn = Fs/2; % Nyquist frequency
omega = 2*pi*200*10^6; % Carrier frequency
L = 32768; % Length of signal
t = (0:L-1)*T; % Time vector
x_signal(n) = cos(omega*T*n + 0.1*randn(size(n))); % Additive phase noise (random)
y_signal(n) = sin(omega*T*n + 0.1*randn(size(n))); % Additive phase noise (random)
theta(n) = atan(y_signal(n)/x_signal(n));
f = (theta(n)-theta(n-1))/(2*pi)
Y = fft(f,t);
PSD = Y.*conj(Y); % Power Spectral Density
%Fv = linspace(0, 1, fix(L/2)+1)*Fn; % Frequency Vector
As posted, you would get the error
error: subscript indices must be either positive integers less than 2^31 or logicals
which refers to the operation theta(n-1) when n=1, which results in an index of 0 (out of bounds, since MATLAB uses 1-based indexing). To avoid that, you could use a subset of the indices in n:
f = (theta(n(2:end))-theta(n(1:end-1)))/(2*pi);
That said, if you are doing this to try to obtain an instantaneous measure of the frequency, then you will have a few more issues to deal with. The most trivial one is that you should also divide by T. Less obvious is the fact that, as given, theta is a scalar due to the use of the / operator (see MATLAB's mrdivide) rather than the ./ operator, which performs element-wise division. So a better expression would be:
theta(n) = atan(y_signal(n)./x_signal(n));
Now, the next problem you might notice is that you are actually losing some phase information, since the result of atan is in [-pi/2, pi/2] instead of the full [-pi, pi] range. To avoid this you should instead use atan2:
theta(n) = atan2(y_signal(n), x_signal(n));
Even with this, you are likely to notice that the estimated frequency regularly has spikes whenever the phase jumps between near -pi and near pi. This can be avoided by computing the phase difference modulo 2*pi:
f = mod(theta(n(2:end))-theta(n(1:end-1)),2*pi)/(2*pi*T);
A final thing to note: when calling fft, you should not pass in a time variable (the input is implicitly assumed to be sampled at regular time intervals). You may, however, specify the desired length of the FFT. You would thus compute Y as follows:
Y = fft(f, L);
And you could then plot the resulting PSD using:
Fv = linspace(0, 1, fix(L/2)+1)*Fn; % Frequency Vector
plot(Fv, abs(PSD(1:L/2+1)));
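In case a working reference helps, here is roughly the same corrected pipeline sketched in Python/NumPy (a sketch mirroring the MATLAB variables above, not a drop-in replacement):

import numpy as np

L = 32768
T = 0.2e-9                     # sampling period
Fs = 1.0 / T                   # sampling frequency
Fn = Fs / 2.0                  # Nyquist frequency
omega = 2.0 * np.pi * 200e6    # carrier frequency
n = np.arange(1, L + 1)

x_signal = np.cos(omega * T * n + 0.1 * np.random.randn(L))
y_signal = np.sin(omega * T * n + 0.1 * np.random.randn(L))

theta = np.arctan2(y_signal, x_signal)                       # full [-pi, pi] phase
f = np.mod(np.diff(theta), 2.0 * np.pi) / (2.0 * np.pi * T)  # instantaneous frequency

Y = np.fft.fft(f, L)
PSD = Y * np.conj(Y)
Fv = np.linspace(0.0, 1.0, L // 2 + 1) * Fn                  # frequency vector
# plot with e.g. matplotlib: plt.plot(Fv, np.abs(PSD[:L // 2 + 1]))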
I am using a Newton-Raphson + successive substitution algorithm to perform a flash calculation (chemical process simulation).
The algorithm converges well when the input has low precision, like 0.1, but when the precision is increased to 0.11111 or 0.99999, the algorithm does not converge.
When I use the quasi-Newton method with the BFGS update, the same problem occurs. How can we decrease the sensitivity of the code to numerical precision?
Here is a simple example using MATLAB to solve the Rachford-Rice equation. When comp_overall = [0.9, 1-0.9], it converges well. However, when the precision increases to something like [0.99999, 1-0.99999], it does not converge.
K = [0.053154011443159 34.234731216532658];
comp_overall = [0.99999 1-0.99999];
phi = 0.5; % initial value
epsilon = 1.0;
iter1 = 1;
while (epsilon >= 1.e-05)
    rc = 0.0;
    drc = 0.0;
    for i = 1:2
        % Rachford-Rice equation
        rc = comp_overall(i)*(K(i)-1.0)/(1.0+phi*(K(i)-1.0)) + rc;
        % Derivative
        drc = comp_overall(i)*(K(i)-1.0)^2/(1.0+phi*(K(i)-1.0))^2 + drc;
    end
    % Newton-Raphson step (damped)
    phi1 = phi + 0.01*(rc/drc);
    epsilon = abs((phi1-phi)/phi);
    % Convergence
    phi = phi1;
    iter1 = iter1 + 1;
end
The Newton–Raphson method relies on the function being differentiable between any two consecutive approximations. Depending on the choice of the initial value, this may not be the case for z₁ = 0.99999. Consider the shape of the Rachford-Rice function near the root.
The root of this function is φ₀ ≈ –0.0300781429 and the nearest point of discontinuity is –1/(K₂-1) ≈ –0.0300890052. They are close enough for the Newton–Raphson method to overshoot, to jump over that discontinuity.
For example:
φ₁ = –0.025
f(φ₁) ≈ -0.9229770571
f'(φ₁) ≈ 1.2416569960
φ₂ = φ₁ + 0.01 * f(φ₁) / f'(φ₁) ≈ -0.0324334302
φ₂ lies to the left of the discontinuity, so the following steps will move away from, not towards, the root.
φ₃ = -0.0358986759 < φ₂
What can be done about it:
When the algorithm fails to converge, repeat it with smaller steps. For example, start with the coefficient 0.01 (as it is now) and reduce it by a factor of 10 after every failure.
Detect overshoots. On each iteration check if there is a discontinuity point (–1/(Kᵢ-1)) between the current approximation and the previous one. When it happens, discard the current approximation, decrease the coefficient and continue.
Limit the scope of the search. Are solutions outside of [0, 1] physically meaningful? If not, you can stop once the approximated value falls out of that range.
Use a different method. The function is monotonic on any interval between two consecutive discontinuity points, so you can perform a binary search on each such interval. It will be both faster and more robust than the Newton–Raphson method; see the sketch below.
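For illustration, here is a minimal Python sketch of that last option (the names rachford_rice and solve_bisection are my own, and it assumes you have already bracketed one continuity interval [lo, hi]):

import numpy as np

def rachford_rice(phi, z, K):
    # f(phi) = sum_i z_i * (K_i - 1) / (1 + phi * (K_i - 1))
    return np.sum(z * (K - 1.0) / (1.0 + phi * (K - 1.0)))

def solve_bisection(z, K, lo, hi, tol=1e-12):
    # f is strictly monotonic between consecutive discontinuities,
    # so a sign change over [lo, hi] brackets exactly one root.
    flo = rachford_rice(lo, z, K)
    fhi = rachford_rice(hi, z, K)
    if flo * fhi > 0:
        return None  # no root in this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rachford_rice(mid, z, K) * flo > 0:
            lo = mid   # same sign as at lo: root is in the upper half
        else:
            hi = mid
    return 0.5 * (lo + hi)

K = np.array([0.053154011443159, 34.234731216532658])
z = np.array([0.99999, 1 - 0.99999])
# search the continuity interval between the two poles -1/(K_i - 1)
print(solve_bisection(z, K, -1.0/(K[1] - 1.0) + 1e-9, -1.0/(K[0] - 1.0) - 1e-9))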
I want to implement an iterative algorithm which calculates a weighted average. The specific weighting law does not matter, but it should be close to 1 for the newest values and close to 0 for the oldest.
The algorithm should be iterative, i.e. it should not remember all previous values. It should know only the newest value and some aggregate information about the past, like previous values of the average, sums, counts, etc.
Is it possible?
For example, the algorithm could look like this:
void iterate(double value) {
    sum *= 0.99;
    sum += value;
    count++;
    avg = sum / count;
}
It will give an exponentially decreasing weight, which may not be good. Is it possible to have a step-decreasing weight or something like that?
EDIT 1
The requirements for the weighting law are as follows:
1) The weight decreases into the past
2) It has some characteristic duration, such that values older than this duration matter much less than newer ones
3) I should be able to set this duration
EDIT 2
I need the following. Suppose v_i are the values, where v_1 is the first. Also suppose w_i are the weights, but w_0 is THE LAST (i.e. it applies to the newest value).
So, after the first value arrives I have the first average
a_1 = v_1 * w_0
After the second value v_2 arrives, I should have the average
a_2 = v_1 * w_1 + v_2 * w_0
With the next value I should have
a_3 = v_1 * w_2 + v_2 * w_1 + v_3 * w_0
Note that the weight profile moves with me as I move along the value sequence.
I.e. each value does not keep its own weight for all time. My goal is to have its weight decrease as it recedes into the past.
First a bit of background. If we were keeping a normal average, it would go like this:
average(a) = a
average(a,b) = (average(a)+b)/2
average(a,b,c) = (average(a,b)*2 + c)/3
average(a,b,c,d) = (average(a,b,c)*3 + d)/4
As you can see here, this is an "online" algorithm: we only need to keep track of two pieces of data, 1) the count of numbers in the average, and 2) the average itself. Then we can undivide the average by the count, add in the new number, and divide by the new count.
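As a sketch in Python, that "undivide, add, redivide" bookkeeping looks like this (the closure is just one way to hold the two pieces of state):

def makeOnlineMean():
    count = 0
    mean = 0.0
    def step(x):
        nonlocal count, mean
        # undivide by the old count, add the new number, divide by the new count
        mean = (mean * count + x) / (count + 1)
        count += 1
        return mean
    return step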
Weighted averages are a bit different. It depends on what kind of weighted average. For example if you defined:
weightedAverage(a,wa, b,wb, c,wc, ..., z,wz) = a*wa + b*wb + c*wc + ... + z*wz
or
weightedAverage(elements, weights) = elements·weights
...then you don't need to do anything besides add the new element*weight! If however you defined the weighted average akin to an expected-value from probability:
weightedAverage(elements,weights) = elements·weights / sum(weights)
...then you'd need to keep track of the total weights. Instead of undividing by the total number of elements, you undivide by the total weight, add in the new element*weight, then divide by the new total weight.
Alternatively you don't need to undivide, as demonstrated below: you can merely keep track of the temporary dot product and weight total in a closure or an object, and divide it as you yield (this can help a lot with avoiding numerical inaccuracy from compounded rounding errors).
In python this would be:
def makeAverager():
    dotProduct = 0
    totalWeight = 0
    def averager(newValue, weight):
        nonlocal dotProduct, totalWeight
        dotProduct += newValue * weight
        totalWeight += weight
        return dotProduct / totalWeight
    return averager
Demo:
>>> averager = makeAverager()
>>> [averager(value,w) for value,w in [(100,0.2), (50,0.5), (100,0.1)]]
[100.0, 64.28571428571429, 68.75]
>>> averager(10,1.1)
34.73684210526316
>>> averager(10,1.1)
25.666666666666668
>>> averager(30,2.0)
27.4
> But my task is to have average recalculated each time new value arrives having old values reweighted. –OP
Your task is almost always impossible, even with exceptionally simple weighting schemes.
You are asking to, with O(1) memory, yield averages with a changing weighting scheme. For example, {values·weights1, (values+[newValue2])·weights2, (values+[newValue2,newValue3])·weights3, ...} as new values are being passed in, for some nearly arbitrarily changing weights sequence. This is impossible due to injectivity. Once you merge the numbers in together, you lose a massive amount of information. For example, even if you had the weight vector, you could not recover the original value vector, or vice versa. There are only two cases I can think of where you could get away with this:
Constant weights such as [2,2,2,...2]: this is equivalent to an on-line averaging algorithm, which you don't want because the old values are not being "reweighted".
The relative weights of previous values do not change. For example you could do weights of [8,4,2,1], and add in a new element with arbitrary weight like ...+[1], but you must increase all the previous by the same multiplicative factor, like [16,8,4,2]+[1]. Thus at each step, you are adding a new arbitrary weight, and a new arbitrary rescaling of the past, so you have 2 degrees of freedom (only 1 if you need to keep your dot-product normalized). The weight-vectors you'd get would look like:
[w0]
[w0*(s1), w1]
[w0*(s1*s2), w1*(s2), w2]
[w0*(s1*s2*s3), w1*(s2*s3), w2*(s3), w3]
...
Thus any weighting scheme you can make look like that will work (unless you need to keep the thing normalized by the sum of weights, in which case you must then divide the new average by the new sum, which you can calculate by keeping only O(1) memory). Merely multiply the previous average by the new s (which will implicitly distribute over the dot-product into the weights), and tack on the new +w*newValue.
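A minimal Python sketch of that rescaling scheme (the names and the (s, w) parameterization are mine: s multiplies all past weights, w is the new value's weight):

def makeRescalingAverager():
    dot = 0.0          # running values-times-weights dot product
    weight_sum = 0.0   # running sum of weights, for normalization
    def step(new_value, s, w):
        nonlocal dot, weight_sum
        dot = dot * s + w * new_value
        weight_sum = weight_sum * s + w
        return dot / weight_sum
    return step

For example, calling step(value, 2.0, 1.0) on each new value reproduces the [..., 8, 4, 2, 1] profile above.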
I think you are looking for something like this:
double avg = 0;
double total_weight = 0;
int count = 0;

void iterate(double value) {
    count++;
    // floating-point division; the weight declines linearly and reaches 0 at count 1000
    double weight = max(0.0, 1 - (count / 1000.0));
    avg = (avg * total_weight + weight * value) / (total_weight + weight);
    total_weight += weight;
}
Here I'm assuming you want the weights to sum to 1. As long as you can generate a relative weight without it changing in the future, you can end up with a solution which mimics this behavior.
That is, suppose you defined your weights as a sequence {s_0, s_1, s_2, ..., s_n, ...} and the input as a sequence {i_0, i_1, i_2, ..., i_n}.
Consider the form: sum(s_0*i_0 + s_1*i_1 + s_2*i_2 + ... + s_n*i_n) / sum(s_0 + s_1 + s_2 + ... + s_n). Note that it is trivially possible to compute this incrementally with a couple of aggregation counters:
int counter = 0;
double numerator = 0;
double denominator = 0;

void addValue(double val)
{
    double weight = calculateWeightFromCounter(counter);
    counter++;
    numerator += weight * val;
    denominator += weight;
}

double getAverage()
{
    if (denominator == 0.0) return 0.0;
    return numerator / denominator;
}
Of course, calculateWeightFromCounter() in this case shouldn't generate weights that sum to one -- the trick here is that we average by dividing by the sum of the weights so that in the end, the weights virtually seem to sum to one.
The real trick is how you do calculateWeightFromCounter(). You could simply return the counter itself, for example; however, note that the last weighted value would not necessarily be near the sum of the counters, so you may not end up with the exact properties you want. (It's hard to say since, as mentioned, you've left a fairly open problem.)
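If it helps, the same idea in Python with a pluggable weight function (a sketch; the lambda below is just one arbitrary choice for calculateWeightFromCounter):

def make_weighted_averager(weight_from_counter):
    counter = 0
    numerator = 0.0
    denominator = 0.0
    def add(value):
        nonlocal counter, numerator, denominator
        w = weight_from_counter(counter)
        numerator += w * value
        denominator += w
        counter += 1
        return numerator / denominator if denominator else 0.0
    return add

# newer values dominate because later counters get larger weights
avg = make_weighted_averager(lambda i: i + 1)
print(avg(10.0), avg(20.0), avg(30.0))  # 10.0, 16.666..., 23.333...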
This is too long to post in a comment, but it may be useful to know.
Suppose you have:
w_0*v_n + ... + w_n*v_0 (we'll call this w[0..n]*v[n..0] for short)
Then the next step is:
w_0*v_(n+1) + ... + w_(n+1)*v_0 (and this is w[0..n+1]*v[n+1..0] for short)
This means we need a way to calculate w[1..n+1]*v[n..0] from w[0..n]*v[n..0].
It's certainly possible that v[n..0] is 0, ..., 0, z, 0, ..., 0 where z is at some location x.
If we don't have any 'extra' storage, then f(z*w(x))=z*w(x + 1) where w(x) is the weight for location x.
Rearranging the equation, w(x + 1) = f(z*w(x))/z. Well, w(x + 1) better be constant for a constant x, so f(z*w(x))/z better be constant. Hence, f must let z propagate -- that is, f(z*w(x)) = z*f(w(x)).
But here again we have an issue. Note that if z (which could be any number) can propagate through f, then w(x) certainly can too, so f(z*w(x)) = w(x)*f(z). Combining the two results, z*f(w(x)) = w(x)*f(z), and thus f(w(x)) = w(x)*f(z)/z.
But for a constant x, w(x) is constant, and thus f(w(x)) better be constant, too. Since w(x) is constant, f(z)/z better be constant so that w(x)*f(z)/z is constant. Call that constant c; then f(w(x)) = c*w(x).
So, f(x)=c*x where c is a constant when x is a weight value.
So w(x+1) = c*w(x).
That is, each weight is a multiple of the previous. Thus, the weights take the form w(x)=m*b^x.
Note that this assumes the only information f has is the last aggregated value. Note that at some point you will be reduced to this case unless you're willing to store a non-constant amount of data representing your input. You cannot represent an infinite length vector of real numbers with a real number, but you can approximate them somehow in a constant, finite amount of storage. But this would merely be an approximation.
Although I haven't rigorously proven it, it is my conclusion that what you want is impossible to do with a high degree of precision, but you may be able to use log(n) space (which may as well be O(1) for many practical applications) to generate a quality approximation. You may be able to use even less.
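As a quick sanity check of the positive direction (that geometric weights w(x) = m*b^x do admit an O(1) update), here is a sketch in Python:

m, b = 1.0, 0.5
values = [3.0, 1.0, 4.0, 1.0, 5.0]

def direct(vals):
    # recompute from scratch: the newest value gets weight m*b**0
    return sum(m * b**i * v for i, v in enumerate(reversed(vals)))

agg = 0.0
for k, v in enumerate(values):
    agg = b * agg + m * v   # O(1) update: rescale the past, add the new value
    assert abs(agg - direct(values[:k + 1])) < 1e-12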
I tried to actually code something (in Java). As has been said, your goal is not achievable: you can only compute the average from some number of last remembered values. If you don't need to be exact, you can approximate the older values. I did it by remembering the last 5 values exactly, and older values only summed in groups of 5, remembering the last 5 such sums. Then the storage is 2n slots for covering the last n + n*n values (here n = 5). This is a very rough approximation.
You can modify the lastValues and lastAggregatedSums array sizes as you want. See this ASCII-art picture trying to display a graph of the last values, showing that the earlier columns (older data) are remembered only as aggregated sums (not individually), while the most recent 5 values are remembered individually.
values:
#####
##### ##### #
##### ##### ##### # #
##### ##### ##### ##### ## ##
##### ##### ##### ##### ##### #####
time: --->
Challenge 1: My example doesn't use weights, but I think it shouldn't be a problem for you to add appropriate weights for the lastAggregatedSums. The only catch is that if you want lower weights for older values it gets harder, because the array rotates, so it is not straightforward to know which weight belongs to which array member. Maybe you can modify the algorithm to always "shift" values in the array instead of rotating? Then adding weights shouldn't be a problem.
Challenge 2: The arrays are initialized with 0 values, and those values count towards the average from the beginning, even before we have received enough values. If you are running the algorithm for a long time, you probably won't mind that it is warming up for some time at the beginning. If you do, you can post a modification ;-)
public class AverageCounter {
    private float[] lastValues = new float[5];
    private float[] lastAggregatedSums = new float[5];
    private int valIdx = 0;
    private int aggValIdx = 0;
    private float avg;

    public void add(float value) {
        lastValues[valIdx++] = value;
        if (valIdx == lastValues.length) {
            // sum the last values and save the sum into the aggregated array
            float sum = 0;
            for (float v : lastValues) { sum += v; }
            lastAggregatedSums[aggValIdx++] = sum;
            if (aggValIdx >= lastAggregatedSums.length) {
                // rotate the aggregated values index
                aggValIdx = 0;
            }
            valIdx = 0;
        }
        float sum = 0;
        for (float v : lastValues) { sum += v; }
        for (float v : lastAggregatedSums) { sum += v; }
        avg = sum / (lastValues.length + lastAggregatedSums.length * lastValues.length);
    }

    public float getAvg() {
        return avg;
    }
}
You can combine (as a weighted sum) exponential means with different effective window sizes (N) in order to get the desired weights.
Use more exponential means to define your weight profile in more detail.
(More exponential means also means storing and calculating more values, so there is a trade-off; a sketch follows.)
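A small Python sketch of that idea (the parameterization is mine: each alpha is a smoothing factor with effective window N roughly 2/alpha - 1, and mix blends the means; mix should sum to 1):

def make_mixed_ema(alphas, mix):
    means = [0.0] * len(alphas)
    def step(value):
        # update each exponential mean, then blend them
        for i, a in enumerate(alphas):
            means[i] += a * (value - means[i])
        return sum(c * m for c, m in zip(mix, means))
    return step

# e.g. a fast and a slow mean, blended 70/30:
ema = make_mixed_ema([0.5, 0.05], [0.7, 0.3])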
A memoryless solution is to calculate the new average from a weighted combination of the previous average and the new value:
average = (1 - P) * average + P * value
where P is an empirical constant, 0 <= P <= 1
expanding gives:
average = sum i (weight[i] * value[i])
where value[0] is the newest value, and
weight[i] = P * (1 - P) ^ i
When P is low, historical values are given higher weighting.
The closer P gets to 1, the more quickly it converges to newer values.
When P = 1, it's a regular assignment and ignores previous values.
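A quick check of that expansion in Python (a sketch): feed a single impulse through the update and confirm its weight i steps later is P * (1 - P)^i:

P = 0.3
average = 0.0
for i, value in enumerate([1.0, 0.0, 0.0, 0.0, 0.0]):
    average = (1 - P) * average + P * value
    assert abs(average - P * (1 - P) ** i) < 1e-12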
If you want to maximise the contribution of value[N], maximize
weight[N] = P * (1 - P) ^ N
where 0 <= P <= 1
I discovered weight[N] is maximized when
P = 1 / (N + 1)