Reduced chi-square too low (close to 0) after weighted fit - convolution integral - Python lmfit - curve-fitting

I'm fitting the following data where t: time (s), G: counts per second, f: impulse function (mm/s):
t G f
0 4.58 0
900 11.73 (11/900)
1800 18.23 (8.25/900)
2700 19.33 (3/900)
3600 19.04 (0.5/900)
4500 17.21 0
5400 12.98 0
6300 11.59 0
7200 9.26 0
8100 7.66 0
9000 6.59 0
9900 5.68 0
10800 5.1 0
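Using the following convolution integral of the impulse function f with exponential kernels (written out here from the model in the code below):
G(t) = a \int_0^t f(\tau) e^{-\lambda_1 (t - \tau)} d\tau + b \int_0^t f(\tau) e^{-\lambda_2 (t - \tau)} d\tau + c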
Where: lambda_1 = 0.000431062 and lambda_2 = 0.000580525.
The code used to perform that fitting is:
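import numpy as np
from lmfit import Parameters, minimize, report_fit

# Decay constants given above
lambda_1 = 0.000431062
lambda_2 = 0.000580525

# Extract data into numpy arrays (as_matrix() was removed in pandas 1.0)
t = df['t'].to_numpy()
g = df['G'].to_numpy()
f = df['f'].to_numpy()

# Add parameters
params = Parameters()
params.add('a', value=1)
params.add('b', value=0.7)
params.add('c', value=1)

# Define functions
def exp(x, k):
    return np.exp(-x * k)

def residuals(params, x, y):
    A = params['a'].value
    B = params['b'].value
    C = params['c'].value
    dt = x[2] - x[1]
    # Discrete convolution of f with each exponential kernel, truncated to the data length
    model = (A * np.convolve(exp(x, lambda_1), f)[:len(x)] * dt
             + B * np.convolve(exp(x, lambda_2), f)[:len(x)] * dt
             + C)
    weights = 1 / np.sqrt(y)
    return (model - y) * weights

# Perform fit using leastsq
result = minimize(residuals, params, args=(t, g))
final = g + result.residual
report_fit(result)  # report_fit prints the report itself and returns None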
It works; however, I obtain a very low reduced chi-square (around 0) when I multiply the residual to be minimized by the weight 1/np.sqrt(g) (weighted fit). If I do not take the weight into account (non-weighted fit), I obtain a reduced chi-square of 0.254. I would like to obtain a reduced chi-square around 1.

A reduced chi-square far below 1 would imply that your estimate of the uncertainty in the data is far too large. If I read your example correctly, you are using the square-root of G as the uncertainty in G. Using the square root is a standard approach for estimating uncertainties in values dominated by counting statistics.
But... your G is a floating point number that you describe as counts per second. I might assume counts per second over 900 seconds.
If that is right (and we assume for simplicity no significant uncertainty in that time duration), then the uncertainties should be 30x smaller than you have them. That is, you are using
g_values = [4.58 , 11.73, 18.23]
g_uncertainties = sqrt(g_values) = [2.1401, 3.4249, 4.2697]
but the uncertainties in the counts would be sqrt(g_values*900), and so the uncertainties in counts per second would be sqrt(g_values*900)/900 = sqrt(g_values)/30.
More formally, the uncertainties in a value representing "counts per time" would add the uncertainties in counts and the uncertainties in time in quadrature. But again, the uncertainties in your time are probably very small (or, at least your time data implies that it is below 1 second).
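As a minimal sketch of that rescaling, assuming each G value is a rate averaged over a 900 s counting interval (this interval is an assumption, and the names `COUNT_TIME` and `rate_uncertainty` are illustrative), the uncertainties and the corresponding fit weights could be computed like this:

```python
import math

COUNT_TIME = 900.0  # assumed counting interval in seconds (not stated in the question)

def rate_uncertainty(rate, count_time=COUNT_TIME):
    """Poisson uncertainty of a count rate measured over count_time seconds."""
    counts = rate * count_time             # total counts collected
    return math.sqrt(counts) / count_time  # sqrt(N), scaled back to a rate

g_values = [4.58, 11.73, 18.23]
g_uncertainties = [rate_uncertainty(g) for g in g_values]
weights = [1.0 / s for s in g_uncertainties]  # would replace 1/np.sqrt(y) in the fit
```

Under this assumption the weight is 30/sqrt(G) instead of 1/sqrt(G), so every weighted residual grows by a factor of 30 and chi-square by a factor of 900 relative to the original weighting.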

Related

How to get the number e (2.718) using a random number sensor?

Is it possible to calculate the number e (2.718) using random numbers?
I'm assuming that when you say "using random numbers" you mean "using some sort of random sampling scheme." If you want the exact answer to an infinite number of decimals, the answer is "no, not unless you have an infinite amount of time." However, we can generate random sequences whose expected value is e, and we can assess the sampling error using basic statistics. By increasing the sample size, we can decrease the sampling error to any precision you want as long as you specify your desired confidence level.
It turns out that if you sum a bunch of random uniform(0,1)'s until the sum exceeds 1, the quantity of uniforms required has an expected value of e. We can turn that into a sampling problem by writing a method/function to return the count, and taking the average of the values obtained by calling that method multiple times.
You didn't specify any particular language, so here it is in Ruby (which is practically like pseudocode):
require 'quickstats' # install from rubygems w/ 'gem install quickstats'

def trial # generate results of one trial
  count = 0
  sum = 0.0
  while sum < 1.0
    count += 1
    sum += rand # Ruby's rand produces U(0,1) values by default
  end
  return count # added "return" keyword for non-rubyists' readability
end

stats = QuickStats.new
10_000_000.times { stats.new_obs trial } # more precision? bump up sample size
puts "Average = #{stats.avg}"
half_width = 1.96 * stats.std_err
puts "CI half-width = #{half_width}"
deviation = (stats.avg - Math::E).abs
puts " |E - avg| = #{deviation} (should be ≤ half-width 95% of the time)"
This runs in under 4 seconds on my laptop and produces outputs such as:
Average = 2.7179918000002234
CI half-width = 0.0005421324752620413
|E - avg| = 0.0002900284588216451 (should be ≤ half-width 95% of the time)
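For readers who prefer Python, the same sampling scheme can be sketched with only the standard library. This is an illustrative transcription, not the answer's original code; the function names, the fixed seed, and the smaller sample size are assumptions made to keep it quick and reproducible:

```python
import math
import random

def trial(rng):
    """Count how many U(0,1) draws are needed for the running sum to reach 1."""
    count = 0
    total = 0.0
    while total < 1.0:
        count += 1
        total += rng.random()
    return count

def estimate_e_by_sums(num_trials, seed=0):
    """Return (mean count, 95% CI half-width) over num_trials trials."""
    rng = random.Random(seed)
    counts = [trial(rng) for _ in range(num_trials)]
    mean = sum(counts) / num_trials
    var = sum((c - mean) ** 2 for c in counts) / (num_trials - 1)  # sample variance
    half_width = 1.96 * math.sqrt(var / num_trials)
    return mean, half_width

mean, half_width = estimate_e_by_sums(200_000)
print(f"Average = {mean}, CI half-width = {half_width}")
```

The half-width shrinks with the square root of the number of trials, so each extra decimal digit of precision costs roughly 100 times more samples.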
Here’s another option. Consider the following probability question: you have a biased coin that comes up heads with probability 1/n. You then flip the coin n times. What is the probability that you never flip heads? Well, that’s the probability that you flip tails n times, which is (1 - 1/n)^n, which as n tends towards infinity starts to rapidly approach 1/e. You could therefore estimate e by picking some modest value of n, simulating n tosses of a coin that comes up heads with probability 1/n, and seeing whether you never flip heads. The proportion of trials that don’t yield heads will approach 1/e, and from there you can estimate e.
For example, here's Python code to flip a coin with heads probability 1/n a total of n times (done by sampling a uniformly random number between 0 and 1) and see if all of them are tails:
from random import random

def one_trial(n):
    for i in range(n):
        if random() < 1 / n:
            return False
    return True
We can then run a large number of trials and see which fraction of them are all tails. That fraction will be approximately 1/e, so we just take the reciprocal:
def estimate_e(n, num_trials):
    successes = 0
    for i in range(num_trials):
        if one_trial(n):
            successes += 1
    return num_trials / successes
Doing this with n = 2^10 and num_trials = 2^20 gave me the estimate
e ≈ 2.7198016257969466,
which isn't too bad.

Q learning - epsilon greedy update

I am trying to understand the epsilon - greedy method in DQN. I am learning from the code available in https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js
Following is the update rule for epsilon which changes with age as below:
this.epsilon = Math.min(1.0, Math.max(this.epsilon_min, 1.0-(this.age - this.learning_steps_burnin)/(this.learning_steps_total - this.learning_steps_burnin)));
Does this mean the epsilon value starts at epsilon_min (chosen by the user) and then increases with age through the burn-in steps, eventually reaching 1? Or does epsilon start around 1 and then decay to epsilon_min?
Either way, the learning almost stops after this process. So, do we need to choose learning_steps_burnin and learning_steps_total carefully? Any thoughts on what values should be chosen?
Since epsilon denotes the amount of randomness in your policy (action is greedy with probability 1-epsilon and random with probability epsilon), you want to start with a fairly randomized policy and later slowly move towards a deterministic policy. Therefore, you usually start with a large epsilon (like 0.9, or 1.0 in your code) and decay it to a small value (like 0.1). The most common and simplest approaches are linear decay and exponential decay. Usually, you have an idea of how many learning steps you will perform (what in your code is called learning_steps_total) and tune the decay schedule (in your code, learning_steps_burnin sets how long epsilon stays at 1.0 before the decay begins) such that over this interval epsilon goes from 0.9 to 0.1.
Your code is an example of linear decay.
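To see the linear schedule concretely, here is a small Python transcription of the update rule quoted in the question. The function name and the numeric values for burn-in, total steps, and minimum epsilon are illustrative assumptions, not defaults taken from deepqlearn.js:

```python
def linear_epsilon(age, burnin, total, epsilon_min):
    """Python transcription of the epsilon update quoted in the question."""
    frac = (age - burnin) / (total - burnin)  # fraction of the decay phase completed
    return min(1.0, max(epsilon_min, 1.0 - frac))

# Illustrative values (assumptions, not taken from the question)
burnin, total, eps_min = 3000, 100000, 0.05
for age in (0, 3000, 51500, 100000, 200000):
    print(age, linear_epsilon(age, burnin, total, eps_min))
```

With these values epsilon is clamped at 1.0 throughout the burn-in, falls linearly afterwards, and is clamped at epsilon_min once age reaches the total step count.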
An example of exponential decay is
epsilon = 0.9
decay = 0.9999
min_epsilon = 0.1
n = 50000  # number of learning steps (illustrative value)
for i in range(n):
    epsilon = max(min_epsilon, epsilon * decay)
Personally, I recommend an epsilon decay such that you reach the minimum value of epsilon (I'd advise somewhere between 0.05 and 0.0025) after about 50 to 75% of the training; from then on, only the policy itself improves.
I wrote a small script that takes the various parameters and reports the step at which the decay stops (i.e., when the indicated minimum value is reached):
import matplotlib.pyplot as plt
import numpy as np

eps_start = 1.0
eps_min = 0.05
eps_decay = 0.9994
epochs = 10000
stop = epochs  # fallback in case eps_min is never reached

df = np.zeros(epochs)
for i in range(epochs):
    if i == 0:
        df[i] = eps_start
    else:
        df[i] = df[i-1] * eps_decay
        if df[i] <= eps_min:
            print(i)
            stop = i
            break

print("With this parameter you will stop epsilon decay after {}% of training".format(stop/epochs*100))
plt.plot(df)
plt.show()

how do you compute the constant, c, for the asymptotic runtimes of heapSort?

I am trying to understand how to compute the constant, c, when given the data. Before showing the data, I will inform you that I have already graphed the data with a linear trend on Excel. I am still quite baffled as to what I should use to calculate c.
Key question: How do you find some c that makes O(g(n)) true?
Expecting that you do not need to find T(n). The graphs you create should be sufficient.
Data for HeapSort:
1 0
5 0
10 0
50 0
100 0
500 0
1000 0
5000 0
10,000 0.01
50,000 0.04
100,000 0.1
500,000 0.484
1,000,000 1.346
5,000,000 6.596667
10,000,000 14.854
Generally, this sort of problem is solved by fitting the data to an expected function (such as t = cn + b, or t = cnlogn + b) using a least-squares method. Assuming that the "c" you are requesting is the constant factor in front of the main term of your runtime, you will get c with that method.
The value of c will of course be dependent on the particular code that is running and the particular machine on which it is running.
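A minimal least-squares sketch for the question's data, using only the standard library. It assumes the model t = c*n*log2(n) + b; the choice of base-2 logarithm and the helper name `fit_line` are assumptions (a different log base only rescales c):

```python
import math

# (n, measured time) pairs from the question; time units as reported there
data = [
    (1, 0), (5, 0), (10, 0), (50, 0), (100, 0), (500, 0), (1000, 0),
    (5000, 0), (10_000, 0.01), (50_000, 0.04), (100_000, 0.1),
    (500_000, 0.484), (1_000_000, 1.346), (5_000_000, 6.596667),
    (10_000_000, 14.854),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = c*x + b; returns (c, b)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c = sxy / sxx
    return c, mean_y - c * mean_x

# Transform n -> n*log2(n) so the model t = c*n*log2(n) + b becomes linear in x
xs = [n * math.log2(n) for n, _ in data]
ts = [t for _, t in data]
c, b = fit_line(xs, ts)
print(f"c = {c:.3e}, b = {b:.3e}")
```

Because the timer resolution makes the small-n measurements read as 0, the fit is dominated by the largest inputs; repeating the small cases many times and averaging would give those points more meaning.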

Analytical way of speeding up exp(A*x) in MATLAB

I need to calculate f(x)=exp(A*x) repeatedly for a tiny, variable column vector x and a huge, constant matrix A (many rows, few columns). In other words, the x are few, but the A*x are many. My problem dimensions are such that A*x takes about as much runtime as the exp() part.
Apart from Taylor expansion and pre-calculating a range of values exp(y) (assuming known the range y of values of A*x), which I haven't managed to speed up considerably (while maintaining accuracy) with respect to what MATLAB is doing on its own, I am thinking about analytically restating the problem in order to be able to precalculate some values.
For example, I find that exp(A*x)_i = exp(\sum_j A_{ij} x_j) = \prod_j exp(A_{ij} x_j) = \prod_j exp(A_{ij})^{x_j}
This would allow me to precalculate exp(A) once, but the required exponentiation in the loop is as costly as the original exp() function call, and the multiplications (\prod) have to be carried out in addition.
Is there any other idea that I could follow, or solutions within MATLAB that I may have missed?
Edit: some more details
A is 26873856 by 81 in size (yes, it's that huge), so x is 81 by 1.
nnz(A) / numel(A) is 0.0012, nnz(A*x) / numel(A*x) is 0.0075. I already use a sparse matrix to represent A; however, exp() of a sparse matrix is no longer sparse. So in fact, I store x non-sparse and calculate exp(full(A*x)), which turned out to be as fast/slow as full(exp(A*x)). (I think A*x is non-sparse anyway, since x is non-sparse.) exp(full(A*sparse(x))) is a way to have a sparse A*x, but is slower. Even slower variants are exp(A*sparse(x)) (with doubled memory impact for a non-sparse matrix of type sparse) and full(exp(A*sparse(x))) (which again yields a non-sparse result).
sx = sparse(x);
tic, for i = 1 : 10, exp(full(A*x)); end, toc
tic, for i = 1 : 10, full(exp(A*x)); end, toc
tic, for i = 1 : 10, exp(full(A*sx)); end, toc
tic, for i = 1 : 10, exp(A*sx); end, toc
tic, for i = 1 : 10, full(exp(A*sx)); end, toc
Elapsed time is 1.485935 seconds.
Elapsed time is 1.511304 seconds.
Elapsed time is 2.060104 seconds.
Elapsed time is 3.194711 seconds.
Elapsed time is 4.534749 seconds.
Yes, I do calculate element-wise exp; I updated the above equation to reflect that.
One more edit: I tried to be smart, with little success:
tic, for i = 1 : 10, B = exp(A*x); end, toc
tic, for i = 1 : 10, C = 1 + full(spfun(@(x) exp(x) - 1, A * sx)); end, toc
tic, for i = 1 : 10, D = 1 + full(spfun(@(x) exp(x) - 1, A * x)); end, toc
tic, for i = 1 : 10, E = 1 + full(spfun(@(x) exp(x) - 1, sparse(A * x))); end, toc
tic, for i = 1 : 10, F = 1 + spfun(@(x) exp(x) - 1, A * sx); end, toc
tic, for i = 1 : 10, G = 1 + spfun(@(x) exp(x) - 1, A * x); end, toc
tic, for i = 1 : 10, H = 1 + spfun(@(x) exp(x) - 1, sparse(A * x)); end, toc
Elapsed time is 1.490776 seconds.
Elapsed time is 2.031305 seconds.
Elapsed time is 2.743365 seconds.
Elapsed time is 2.818630 seconds.
Elapsed time is 2.176082 seconds.
Elapsed time is 2.779800 seconds.
Elapsed time is 2.900107 seconds.
Computers don't really do exponents. You would think they do, but what they do is high-accuracy polynomial approximations.
References:
http://www.math.vanderbilt.edu/~esaff/texts/13.pdf
http://deepblue.lib.umich.edu/bitstream/handle/2027.42/33109/0000495.pdf
http://www.cs.yale.edu/homes/sachdeva/pubs/fast-algos-via-approx-theory.pdf
The last reference looked quite nice. Perhaps it should have been first.
Since you are working on images, you likely have discrete number of intensity levels (255 typically). This can allow reduced sampling, or lookups, depending on the nature of "A". One way to check this is to do something like the following for a sufficiently representative group of values of "x":
y = A*x;
cdfplot(y(:))
If you were able to pre-segment your images into "more interesting" and "not as interesting" - like if you were looking at an x-ray being able to trim out all the "outside the human body" locations and clamp them to zero to pre-sparsify your data, that could reduce your number of unique values. You might consider the previous for each unique "mode" inside the data.
My approaches would include:
- look at alternate formulations of exp(x) that are lower accuracy but higher speed
- consider table lookups if you have few enough levels of "x"
- consider a combination of interpolation and table lookups if you have "slightly too many" levels to do a table lookup
- consider a single lookup (or alternate formulation) based on segmented mode. If you know it is a bone and are looking for a vein, then maybe it should get less high-cost data processing applied.
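The "interpolation plus table lookup" idea can be sketched as follows. This is a plain-Python illustration of the technique only: the class name, the range [-5, 5], and the table size are assumptions, and whether such a table beats the built-in exp depends entirely on the environment (in MATLAB it would need to be vectorised to stand a chance):

```python
import math

class ExpTable:
    """Approximate exp(y) on [lo, hi] via a precomputed table plus linear interpolation."""

    def __init__(self, lo, hi, size):
        self.lo = lo
        self.step = (hi - lo) / (size - 1)
        self.values = [math.exp(lo + i * self.step) for i in range(size)]

    def __call__(self, y):
        pos = (y - self.lo) / self.step
        i = min(max(int(pos), 0), len(self.values) - 2)  # clamp to a valid interval
        frac = pos - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])

# Illustrative range and resolution (assumptions; choose them from the observed A*x values)
table = ExpTable(lo=-5.0, hi=5.0, size=4097)
approx, exact = table(1.2345), math.exp(1.2345)
print(approx, exact, abs(approx - exact) / exact)
```

For linear interpolation the relative error is bounded by roughly step^2/8, so the table size can be chosen directly from the accuracy that is acceptable.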
Now I have to ask myself why you would be living in so many iterations of exp(A*x)*x, and I think you might be switching back and forth between the frequency/wavenumber domain and the time/space domain. You also might be dealing with probabilities using exp(x) as a basis, and doing some Bayesian fun. I don't know that exp(x) is a good conjugate prior, so I'm going to go with the Fourier material.
Other options:
- consider use of fft, fft2, or fftn given your matrices - they are fast and might do part of what you are looking for.
I am sure there is a Fourier-domain variation on the following:
https://mathoverflow.net/questions/34173/fast-matrix-multiplication
http://www-cc.cs.uni-saarland.de/media/oldmaterial/bc.pdf
http://arxiv.org/PS_cache/math/pdf/0511/0511460v1.pdf
You might be able to mix the lookup with a compute using the Woodbury matrix identity. I would have to think about that some to be sure though. (link) At one point I knew that everything that mattered (CFD, FEA, FFT) was all about the matrix inversion, but I have since forgotten the particular details.
Now, if you are living in MATLAB then you might consider using "coder", which converts MATLAB code to C code. No matter how much fun an interpreter may be, a good C compiler can be a lot faster. The mnemonic (hopefully not too ambitious) that I use is shown here: link starting around 13:49. It is really simple, but it shows the difference between a canonical interpreted language (Python) and a compiled version of the same (Cython/C).
I'm sure that if I had some more specifics, and was requested to, then I could engage more aggressively in a more specifically relevant answer.
You might not have a good way to do it on conventional hardware, but you might consider something like a GPGPU. CUDA and its peers have massively parallel operations that allow substantial speedup for the cost of a few video cards. You can have thousands of "cores" (overglorified pipelines) doing the work of a few ALUs, and if the job is properly parallelizable (as this looks like) then it can get done a LOT faster.
EDIT:
I was thinking about Eureqa. One option that I would consider if I had some "big iron" for development but not production would be to use their Eureqa product to come up with a fast enough, accurate enough approximation.
If you performed a 'quick' singular value decomposition of your "A" matrix, you would find that the dominant performance is governed by 81 eigenvectors. I would look at the eigenvalues and see if there were only a few of those 81 eigenvectors providing the majority of the information. If that was the case, then you can clamp the others to zero, and construct a simple transformation.
Now, if it were me, I would want to get "A" out of the exponent. I'm wondering if you can look at the 81x81 eigenvector matrix and "x" and think a little about linear algebra, and what space you are projecting your vectors into. Is there any way that you can make a function that looks like the following:
f(x) = B2 * exp( B1 * x )
such that the
B1 * x
is much smaller rank than your current
Ax.

Algorithm to smooth numbers with variable input time

I have an app that accepts integers at a variable rate every .25 to 2 seconds.
I'd like to output the data in a smoothed format for 3, 5 or 7 seconds depending on user input.
If the data always came in at the same rate, let's say every .25 seconds, then this would be easy. The variable rate is what confuses me.
Data might come in like this:
Time - Data
0.25 - 100
0.50 - 102
1.00 - 110
1.25 - 108
2.25 - 107
2.50 - 102
etc.
I'd like to display a 3 second rolling average every .25 seconds on my display.
The simplest form of doing this is to put each item into an array with a time stamp.
array.push([0.25, 100])
array.push([0.50, 102])
array.push([1.00, 110])
array.push([1.25, 108])
etc.
Then every .25 seconds I would read through the array, back to front, until I got to a time that was less than now() - rollingAverageTime. I would sum that and display it. I would then .Shift() the beginning of the array.
That seems not very efficient though. I was wondering if someone had a better way to do this.
Why don't you save the timestamp of the starting value and then accumulate the values and the number of samples until you get a timestamp that is >= startingTime + rollingAverageTime and then divide the accumulator by the number of samples taken?
EDIT:
If you want to preserve the number of samples, you can do this way:
Take the accumulator, and for each input value add it to the accumulator and store the value and its timestamp in a shift register. At every cycle, compare the latest sample's timestamp with the oldest timestamp in the shift register plus the smoothing time; if it is equal or greater, subtract the oldest saved value from the accumulator, delete that entry from the shift register, and output the accumulator divided by the smoothing time. If you iterate, you obtain a rolling average with (I think) the least amount of computation for each cycle:
- a sum (to increment the accumulator)
- a sum and a subtraction (to compare the timestamp)
- a subtraction (from the accumulator)
- a division (to calculate the average; done in a smart way, it can be a shift right)
For a total of about 4 algebraic sums and a division (or shift).
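A minimal sketch of the accumulator idea, using a `collections.deque` as the shift register. The class name is hypothetical, and this version divides by the number of samples currently in the window (as in the first paragraph of this answer) rather than by the smoothing time, which is a design choice of the sketch:

```python
from collections import deque

class RollingMean:
    """Rolling mean over the last `window` seconds using a running sum."""

    def __init__(self, window):
        self.window = window
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0        # accumulator: sum of the values currently stored

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))
        self.total += value
        # Evict samples that have fallen out of the window
        while self.samples and self.samples[0][0] <= timestamp - self.window:
            _, old_value = self.samples.popleft()
            self.total -= old_value
        return self.total / len(self.samples)

rm = RollingMean(window=3.0)
for ts, val in [(0.25, 100), (0.50, 102), (1.00, 110), (1.25, 108), (2.25, 107), (2.50, 102)]:
    latest = rm.add(ts, val)
print(latest)
```

Each new sample costs one addition plus one subtraction per evicted sample, so the work per update does not grow with the window length.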
EDIT:
To take the time since the last sample into account as a weighting factor, you can scale each value by the ratio of that time to the averaging time (value * dt / T); the accumulator then already holds a weighted average, without having to be divided.
I added this part because it doesn't add computational load, so you can implement it quite easily if you want to.
The answer from clabacchio has the basics right, but perhaps you need a bit more sophisticated answer.
Calculating the average:
0.25 - 100
0.50 - 102
1.00 - 110
In the above subset of the data what is the answer you want? You could use the mean of these numbers or you could do it in a weighted fashion. You could convert the data into:
0.50 - 0.25 = 0.25 ---- (100+102)/2 = 101
1.00 - 0.50 = 0.50 ---- (102+110)/2 = 106
Then you can take the weighted average of these values, weight being the time difference, and value being the average value.
The final answer = (0.25*101 + 0.5*106)/(0.25+0.5) = 78.25/0.75 ≈ 104.33.
Now coming to "moving" averages:
You can either use previous k values or previous k seconds worth of data. In both cases you can keep two sums: weighted sum and sum of weights.
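The weighted calculation above can be sketched as a small function that keeps exactly those two sums. The function name is illustrative, and it assumes at least two samples with strictly increasing timestamps:

```python
def time_weighted_average(samples):
    """Average of per-interval mean values, each weighted by the interval's duration."""
    weighted_sum = 0.0  # sum of dt * (mean value over the interval)
    weight_sum = 0.0    # sum of dt
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        weighted_sum += dt * (v0 + v1) / 2
        weight_sum += dt
    return weighted_sum / weight_sum

result = time_weighted_average([(0.25, 100), (0.50, 102), (1.00, 110)])
print(result)
```

For a moving version, the same two sums can be kept as running totals: add each new interval's contribution when a sample arrives and subtract the contribution of intervals that fall out of the last k seconds.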
So... the worst case scenario is 4 readings per second over 7 seconds = 28 values in your array to process. That will be done in nanoseconds anyway, so not worth optimizing IMHO.
