Is this a bug in GSL's minimum finding routine? - ruby

I have run into a very strange problem using GSL recently. I'm trying to find the maximum of an ugly function by asking GSL to find the minimum of the function's negative image. This has been working fine with most of the functions I've been using it on up to now, but with certain functions, I get an error.
Specifically, the Ruby GSL gem throws an exception when I feed my function to the minimum finder, stating that the given endpoints do not enclose a minimum. However, the given endpoints DO enclose a minimum; furthermore, the result seems to depend on the initial estimate given.
When I ask GSL to find the minimum starting with a guess of 0.0, I get the error. When I ask it to find the minimum starting with a guess of 2.0, it finds a minimum. It seems strange that GSL would complain that there isn't a minimum in a given interval just depending on the initial guess.
Below is a script I've written to reproduce this error. I'm using GSL version 1.15 and the latest version of the Ruby/GSL gem wrapper, with Ruby version 1.9.3p392.
Will someone please try this script and see if they can reproduce these results? I would like to report this as a bug to the GSL maintainers, but I'd feel better doing that if I could get the error to occur on somebody else's computer too.
There are three values for the initial minimum guess provided in the script; starting with 0.0 causes GSL to throw the aforementioned error. Starting with 1.0 causes GSL to report a minimum, which happens to only be a local minimum. Starting with 2.0 causes GSL to report another minimum, which seems to be the global minimum I'm looking for.
Here is my test script:
require("gsl")
include GSL::Min
function = Function.alloc { |beta|
  e3 = Math.exp(3*beta); e4 = Math.exp(4*beta)
  e5 = Math.exp(5*beta); e6 = Math.exp(6*beta)
  num  = 6160558822864*e4 + 523830424923*e3 + 1415357447750*e5 + 7106224104*e6
  den  = 385034926429*e4 + 58203380547*e3 + 56614297910*e5 + 197395114*e6
  num2 = 1540139705716*e4 + 174610141641*e3 + 283071489550*e5 + 1184370684*e6
  -(num/den - num2**2/den**2)
}
def find_maximum(fn1)
  iter = 0; max_iter = 500
  minimum = 0.0  # reasonable initial guess; causes GSL to crash!
  #minimum = 1.0 # another initial guess, gets a local min
  #minimum = 2.0 # this guess gets what appears to be the global min
  a = -6.0
  b = 6.0        # pretty wide interval
  gmf = FMinimizer.alloc(FMinimizer::BRENT)
  gmf.set(fn1, minimum, a, b)
  # The line above fails (sometimes), complaining that the given interval
  # doesn't contain a minimum. Which it DOES.
  begin
    iter += 1
    status = gmf.iterate
    status = gmf.test_interval(0.001, 0.0)
    # puts("Converged:") if status == GSL::SUCCESS
    a = gmf.x_lower
    b = gmf.x_upper
    minimum = gmf.x_minimum
    # printf("%5d [%.7f, %.7f] %.7f %.7f\n", iter, a, b, minimum, b - a)
  end while status == GSL::CONTINUE and iter < max_iter
  minimum
end
puts find_maximum(function)
Please let me know what happens when you try this code, commenting out the different initial values for minimum. If you can see a reason why this is actually the intended behavior of GSL, I would appreciate that as well.
Thank you for your help!

I think this Mathematica notebook summarizes my answer perfectly.
Whenever you need to numerically minimize a particular function, you should first understand it qualitatively (make plots) to avoid failures caused by numerical error. The plot shows a very important fact: your function seems to require numerically difficult cancellations between very large numbers at beta ~ 6. Numerical issues can also arise at beta ~ -6.
So when you set the interval to [-6, 6], numerical error can prevent GSL from finding the minimum. A simple plot and a few evaluations of the potentially dangerous terms give a much better idea of the appropriate limits to pass to GSL's routines.
Conclusion: if the minimum is around zero (always make plots of the functions you want to minimize!), then you should set a small interval around zero to avoid numerical problems with functions that depend on precise cancellation between exponentials.
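To see the scale problem concretely, here is a quick Python sketch (the coefficients are copied from the function in the question) comparing two of the exponential terms at beta = 6. Terms that differ by this many orders of magnitude lose most of their significant digits when combined in double precision:

```python
import math

# Two of the numerator terms from the question's function, at beta = 6.
beta = 6.0
t6 = 7106224104 * math.exp(6 * beta)    # the exp(6*beta) term
t3 = 523830424923 * math.exp(3 * beta)  # the exp(3*beta) term
print("%.3e  %.3e  ratio %.1e" % (t6, t3, t6 / t3))
```

The terms differ by roughly six orders of magnitude, which is exactly the kind of spread that makes the cancellations near the interval edges unreliable.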
Edit: in a comment you asked me not to consider the factor Abs[beta]. In that case, GSL will find multiple local minima (so you must shrink the interval). I also can't see how you get the minimum around 1.93.

GSL's one-dimensional minimization routines report this error whenever the function value at your initial guess is greater than or equal to the function value at either end of the interval. So it does indeed depend on the initial guess whether you find one local minimum, another, or none at all. The error message is certainly misleading.
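To illustrate that condition (a plain Python sketch of my own, not the GSL API): before handing an interval to a Brent-style minimizer, you can check whether the guess actually brackets a minimum, i.e. whether f at the guess is strictly below f at both endpoints:

```python
def brackets_minimum(f, a, m, b):
    # A Brent-style minimizer accepts (a, m, b) only if the interior
    # guess m lies strictly below the function at both endpoints.
    return a < m < b and f(m) < f(a) and f(m) < f(b)

f = lambda x: (x - 2) ** 2  # toy function with its minimum at x = 2
print(brackets_minimum(f, -6.0, 0.0, 6.0))   # True:  f(0) = 4 is below f(-6) = 64 and f(6) = 16
print(brackets_minimum(f, -6.0, -5.0, 6.0))  # False: f(-5) = 49 is not below f(6) = 16
```

With a numerically noisy function, this check can pass or fail depending on where the guess happens to land, which matches the behavior described in the question.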


Precision in Program analysis

According to David Brumley's Control Flow Integrity & Software Fault Isolation (PPT slides), in the statements below, x is always 8, because the path to x = 7 is unrealizable even with path-sensitive analysis.
Why is that?
Is it because the analysis cannot determine the values of n, a, b, and c in advance, or is it because there is no solution that a computer can calculate?
if (a^n + b^n == c^n && n > 2 && a > 0 && b > 0 && c > 0)
    x = 7; /* unrealizable path */
else
    x = 8;
In general, the task of determining which paths in a program can be taken, and which cannot, is undecidable. It is quite possible that a particular expression, as in your example, can be proved to have a specific value. However, the words "in general" and "undecidable" mean that you cannot write an algorithm that would be able to compute the value every time.
At this point the analysis algorithm can be optimistic or pessimistic. The optimistic one could pick 8 and be fine — it considers possible that at run-time x would get this value. It could also pick 7 — "who knows, maybe, x would be 7". But if the analysis is required to be sound, and it cannot determine the value of the condition, it should assume that the first branch could be taken during one execution, and the second branch could be taken during another execution, so x could be either 7 or 8.
In other words, there is a trade-off between soundness and precision. Or, actually, between soundness, precision, and decidability. The latter property tells whether the analysis always terminates. Now, you have to pick what is needed:
Decidability — this is a common choice for compilers and code analyzers, because you would like to get an answer about your program in finite time. However, proof assistants could start some processes that could run up to the specified time limit, and if the limit is not set, forever: it's up to the user to stop it and to try something else.
Soundness — this is a common choice for compilers, because you would like to get the answer that matches the language specification. Code analyzers are more flexible. Many of them are unsound, but because of that they can find more potential issues in finite time, leaving the interpretation to the developer. I believe the example you mention talks about sound analysis.
Precision — this is a rare property. Compilers and code analyzer should be pessimistic, because otherwise some incorrect code could sneak in. But this might be parameterizable. E.g., if the compiler/analyzer supports constant propagation and folding, and all of the variables in the example are set to some known constants before the condition, it can figure out the exact value of x after it, and be completely precise.
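To make the constant-folding point concrete, here is a toy Python sketch (entirely my own illustration, not from the slides): if every variable in the condition is a known constant, the analysis can evaluate the condition and report a single precise value for x; otherwise a sound analysis must report both possible values.

```python
# Toy constant propagation for the example's condition. If any input is
# unknown, a sound analysis must assume either branch could execute.
def analyze(consts):
    a, b, c, n = (consts.get(k) for k in ('a', 'b', 'c', 'n'))
    if None in (a, b, c, n):
        return {7, 8}  # sound but imprecise: both branches possible
    if a**n + b**n == c**n and n > 2 and a > 0 and b > 0 and c > 0:
        return {7}
    return {8}

print(analyze({'a': 3, 'b': 4, 'c': 5, 'n': 3}))  # {8}: 27 + 64 != 125
print(analyze({'a': 3, 'b': 4}))                  # {7, 8}: unknowns force both branches
```

Proving that x is *always* 8 for *all* inputs is a different matter entirely; that amounts to Fermat's Last Theorem, which is exactly why the general problem is out of reach for an automated analysis.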

Minimum cost path / least cost path

I am currently using the function route_through_array from skimage.graph to get the least-cost path from one point to another in a cost map. The problem is that I have multiple start points and multiple end points, which leads to thousands of iterations. Currently I handle this with two for loops. The following code is just an illustration:
import numpy as np
from skimage.graph import route_through_array

img = np.random.rand(400, 400)
img = img.astype(dtype=int)
indices = []
costs = []
start = [[1,1], [2,2], [3,3], [4,5], [6,17]]
end = [[301,201], [300,300], [305,305], [304,328], [336,317]]
for i in range(len(start)):
    for j in range(len(end)):
        index, weight = route_through_array(img, start[i], end[j])
        indices.append(index)
        costs.append(weight)
From what I understand of the documentation, the function accepts many end points, but I do not know how to pass them all in one call. Any ideas?
This should be possible much more efficiently by directly interacting with the skimage.graph.MCP Cython class. The convenience wrapper route_through_array isn't general enough. Assuming I understand your question correctly, what you are looking for is basically the MCP.find_costs() method.
Your code would then look like this:
from skimage.graph import MCP
import numpy as np

img = np.random.rand(400, 400)
img = img.astype(dtype=int)
starts = [[1,1], [2,2], [3,3], [4,5], [6,17]]
ends = [[301,201], [300,300], [305,305], [304,328], [336,317]]
# Pass the full set of start and end points to `MCP.find_costs`
m = MCP(img)
cost_array, tracebacks_array = m.find_costs(starts, ends)
# Transpose `ends` so it can be used to index into the NumPy array
ends_idx = tuple(np.asarray(ends).T.tolist())
costs = cost_array[ends_idx]
# Compute the exact minimum-cost path to each endpoint
tracebacks = [m.traceback(end) for end in ends]
Note that the raw output cost_array is actually a fully dense array with the same shape as img, which has finite values only at the end points you asked for. The only possible issue with this approach is if the minimum paths from more than one start point converge on the same end point: the code above only gives you the full traceback for the cheaper of those convergent paths.
The traceback step still has a loop. It could probably be removed by using tracebacks_array together with m.offsets, which would also remove the ambiguity noted above. However, if you only want the minimum cost(s) and best path(s), this loop can be omitted entirely: simply find the minimum cost with argmin, and trace back that single endpoint (or the few endpoints, if several are tied for lowest).
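The argmin shortcut might look like this (a NumPy-only sketch with made-up costs, standing in for the cost_array and ends produced by MCP.find_costs above):

```python
import numpy as np

# Sketch: a dense cost array (same shape as the image) with finite
# values only at the requested end points, as find_costs would return.
cost_array = np.full((5, 5), np.inf)
ends = [[1, 1], [3, 4], [4, 2]]
cost_array[1, 1], cost_array[3, 4], cost_array[4, 2] = 7.0, 3.0, 9.0

ends_idx = tuple(np.asarray(ends).T.tolist())
costs = cost_array[ends_idx]        # costs at the end points only
best = ends[int(np.argmin(costs))]  # the endpoint with the lowest cost
print(best, costs.min())            # [3, 4] 3.0
```

With the real arrays you would then call m.traceback(best) once instead of looping over every endpoint.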

Trouble implementing Perceptron in Scala

I'm taking the CalTech online course Learning From Data, and I'm stumped with creating a Perceptron in Scala. I chose Scala because I'm learning it and wanted to challenge myself. I understand the theory, and I also understand others' solutions in Python and Ruby. But I can't figure out why my own Scala code doesn't work.
For a background in the Perceptron code: Learning_algorithm
I'm running Scala 2.11 on OSX 10.10.
Per the algorithm, I start off with weights (0.0, 0.0, 0.0), where weight[2] is a learned bias component. I've already generated a test set in the space [-1, 1],[-1,1] on the X-Y plane. I do this by a) picking two random points and drawing a line through them, then b) generating some other random points and calculating if they are on one side of the line or the other. As far as I can tell by plotting it in Python, this generates linearly separable data.
My next step is to take my initialized weights and check every point for misclassification, i.e. points that don't produce the correct +1 or -1 result. Here is the code, which simply computes the dot product of the weight vector and the point x:
def h(weight: List[Double], p: Point): Double =
  if (weight(0)*p.x + weight(1)*p.y + weight(2) > 0) 1 else -1
With the initial weights, every point is misclassified. I then update the weights like so:
def newH(weight: List[Double], p: Point, y: Double): List[Double] = {
  val newWt = scala.collection.mutable.ArrayBuffer[Double](0.0, 0.0, 0.0)
  newWt(0) = weight(0) + p.x*y
  newWt(1) = weight(1) + p.y*y
  newWt(2) = weight(2) + 1*y
  newWt.toList
}
Then I identify misclassified points again by checking the test set against the value returned by h() above, and continue iterating.
This follows the algorithm (or is supposed to, at least) that Prof Yaser shows here: Library
The problem is that the algorithm never converges. My weights -- the third component of which is the bias -- keep getting more negative or more positive. My weight vector after every adjustment resembles this:
Weights: List(16.43341624736786, 11627.122008800507, -34130.0)
Weights: List(15.533397436141968, 11626.464265227318, -34131.0)
Weights: List(14.726969361305237, 11626.837346673012, -34132.0)
Weights: List(14.224745154380798, 11627.646470665932, -34133.0)
Weights: List(14.075232982635498, 11628.026384592056, -34134.0)
I'm a Scala newbie so my code is probably atrocious. But am I missing something in Scala, e.g. reassignment, that could be causing my weight to be messed up? Or have I completely misunderstood how the Perceptron even operates? Is my weight update just wrong?
Thanks for any help you can give me on this!
Thanks, Till. I've discovered the two problems with my code and I'll share them, but to address your point: someone else asked about this on the class's forum, and it looks like what the Wiki formula does is simply change the learning rate. Alpha can be picked freely, and y - h(weight, p) gives you
-1 - 1 = -2
in the case that y = -1 and h() = 1, or
1 - (-1) = 2
in the case that y = 1 and h() = -1.
My/the class formula uses 1*p.x instead of alpha*2, which amounts to a different learning rate. Hope that makes sense.
My two problems were as follows:
The y value passed into the recalculation formula newH needs to be the target value of y, that is, the "correct" y recorded when the test points were generated. I was passing in the y produced by h(), which is the guessed-at function. This makes sense in hindsight, since we want to correct the weights using the target y, not the incorrect prediction.
I was comparing the target y with h() in Scala, but the target came from a map lookup via .get(). My map is a Map[Point, Double], where the Double value is the y generated during test-set creation. But .get() returns an Option[Double], not a Double at all. This is explained in "Scala Map#get and the return of Some()" and makes a lot of sense now. Since I was focusing on debugging and not code perfection, I used map.get(<some Point>).get for now, which let me accurately compare two Double values.
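For anyone hitting the same wall, here is a minimal Python sketch of the fix (my own illustration, not the class code): the update must use the stored target label, never the label predicted by h.

```python
import random

def h(w, p):
    # Prediction: sign of the dot product, with the bias folded in as w[2]
    return 1 if w[0]*p[0] + w[1]*p[1] + w[2] > 0 else -1

def train(points, targets, max_iter=1000):
    w = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        missed = [(p, y) for p, y in zip(points, targets) if h(w, p) != y]
        if not missed:
            break  # converged: everything is classified correctly
        p, y = random.choice(missed)
        # The key fix: update with the TARGET label y, not with h(w, p)
        w = [w[0] + y*p[0], w[1] + y*p[1], w[2] + y]
    return w

random.seed(0)
# Linearly separable toy data: label is the sign of x + y
pts = [(1, 1), (2, 0.5), (-1, -1), (-2, -0.5)]
ys = [1, 1, -1, -1]
w = train(pts, ys)
print(all(h(w, p) == y for p, y in zip(pts, ys)))  # True
```

Feeding h(w, p) back in as y makes every update a no-op direction or an overcorrection, which produces exactly the runaway weights shown in the question.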

metafor() non-positive sampling variance

I am trying to learn meta-regression using the metafor package. In running one of the mixed regression models, I received an error indicating:
"There are outcomes with non-positive sampling variances."
I am at a loss as to how to proceed with this error. I understand that certain model statistics (e.g., I^2 and QE) cannot be computed in the presence of non-positive sampling variances. However, I am not sure whether the results can be interpreted the same way they otherwise would be. I also tried using other estimators and/or the unweighted option; the error persists.
Any suggestions would be much appreciated.
First of all, to clarify: You are getting a warning, not an error.
Aside from that, I can't think of many situations where it is reasonable to assume that the sampling variance is really equal to 0 in a particular study. I would first question whether this really makes sense. This is why the rma() function is generating this warning message -- to make the user aware of this situation and question whether this really is intended/reasonable.
But suppose that we really want to go through with this, then you have to use an estimator for tau^2 that can handle this (e.g., method="REML" -- which is actually the default). If the estimate of tau^2 ends up equal to 0 as well, then the model cannot be fitted at all (due to division by zero -- and then you get an error). If you do end up with a positive estimate of tau^2, then the results should be okay (but things like the Q-test, I^2, or H^2 cannot be computed then).

How to find the maximum value of a function in Sympy?

These days I am trying to reproduce the shock spectrum of a single-degree-of-freedom system using SymPy. The problem reduces to finding the maximum value of a function. Following are two cases I cannot figure out how to do.
The first one is
tau,t,t_r,omega,p0=symbols('tau,t,t_r,omega,p0',positive=True)
h=expand(sin(omega*(t-tau)))
f=simplify(integrate(p0*tau/t_r*h,(tau,0,t_r))+integrate(p0*h,(tau,t_r,t)))
The final goal is to obtain maximum absolute value of f (The variable is t). The direct way is
df=diff(f,t)
sln=solve(simplify(df),t)
simplify(f.subs(t,sln[1]))
Here is the result. I tried many ways, but I cannot simplify it any further.
Therefore, I tried another way. Because I need the maximum absolute value, and abs(f) attains its maximum at the same location as the square of f, we can work with the square of f instead:
df=expand_trig(diff(expand(f)**2,t))
sln=solve(df,t)
simplify(f.subs(t,sln[2]))
It seems the answer is almost the same, just in another form.
The expected answer is a sinc function plus a constant, as follows:
Therefore, the question is how to get to that final form.
The second case may be a little harder. The question reduces to finding the maximum value of f = sin(pi*t/t_r) - T/2/t_r*sin(2*pi/T*t), in which t_r and T are two parameters. The maximum lands on a different peak as the ratio of t_r to T changes, and I have not found a way to solve this in SymPy. Any suggestions? The answer can be represented in the following figure.
The problem is the log(exp(I*omega*t_r/2)) term. SymPy is not reducing this to I*omega*t_r/2. SymPy doesn't simplify this because in general, log(exp(x)) != x, but rather log(exp(x)) = x + 2*pi*I*n for some integer n. But in this case, if you replace log(exp(I*omega*t_r/2)) with omega*t_r/2 or omega*t_r/2 + 2*pi*I*n, it will be the same, because it will just add a 2*pi*I*n inside the sin.
I couldn't figure out any functions that force this simplification, but the easiest way is to just do a substitution:
In [18]: print(simplify(f.subs(t,sln[1]).subs(log(exp(I*omega*t_r/2)), I*omega*t_r/2)))
p0*(omega*t_r - 2*sin(omega*t_r/2))/(omega**2*t_r)
That looks like the answer you are looking for, except for the absolute value (I'm not sure where that should come from).
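You can see the underlying behavior in isolation (a small SymPy sketch of my own): log(exp(x)) auto-simplifies only when the argument is known to be real, which is why the I*omega*t_r/2 term survives and has to be substituted away by hand.

```python
from sympy import symbols, log, exp, I

x = symbols('x', positive=True)
print(log(exp(x)))  # x: simplifies automatically, because x is real

y = symbols('y', positive=True)
expr = log(exp(I*y))  # stays unevaluated: I*y is not real
print(expr)
print(expr.subs(log(exp(I*y)), I*y))  # I*y, forced by direct substitution
```

The substitution is safe here for the reason given above: any 2*pi*I*n ambiguity disappears inside the periodic sin.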
