I am trying to do some numerics and am having a difficult time determining the appropriate way to solve a problem, so I am looking for some feedback.
So far I have done all my work in Mathematica; however, I believe the time has come when I need more control over my algorithms.
I can't post images yet, so here is a link to them.
Where H is the Heaviside step function, C(k) is just the Fourier transform of C(r), and m = 4. N in my case is 2000, so you can see that omega is merely the sum of a large number of exponentials. rho is just the density. Because m = 4, C(r) has four different a coefficients, and IRISM is ultimately a function of those four a coefficients.
I have these three equations working correctly, I think, within Mathematica, but now I am trying to minimize IRISM and find the 4 a values. The problem I am having is that, for obvious reasons, there is a discontinuity when the argument of the log within the integral reaches zero. I cannot seem to find a way to modify the Mathematica algorithms (they are black boxes, is that the right term?) so as to check the trial a values. I was using Nelder-Mead and Differential Evolution and attempting different constraints. However, I seemed to get either imaginary results, obviously from a negative log argument, or, when I constrained things well enough to avoid that, obviously only a local minimum, since my results did not match the "correct" results. I tried a few times with minimization algorithms that use gradients, but I did not have much luck.
I think my best way to move forward is to write a minimization routine from scratch, or modify existing code, so that I can check IRISM for the discontinuity ahead of integration. I have read up a bit on penalty functions, log-barrier methods, etc. (a sketch of the penalty idea I have in mind is below, after the raw input), but being somewhat new to programming I was hoping someone might be able to tell me what a good approach would be to start with. More than anything, I think there is almost too much information out there on optimization, and I am finding it difficult to know where to begin.
Edit: Here is the raw input. If I need to post it in a different way please let me know.
(* C(k): the Fourier transform of C(r) for m = 4, as a function of the four a coefficients *)
OverHat[c][a1_, a2_, a3_, a4_, k_] :=
 (a1*(4*Pi*(Sin[k] - k*Cos[k])))/k^3 +
  (a2*(4*Pi*(k*Sin[k] + 2*Cos[k] - 2)))/k^4 +
  (a3*(8*Pi*(2*k - 3*Sin[k] + k*Cos[k])))/k^5 +
  (a4*(-(24*Pi*(k^2 + k*Sin[k] + 4*Cos[k] - 4))))/k^6

(* one term of omega(k); omega(k) itself is the sum of N = 2000 of these *)
Subscript[OverHat[\[Omega]], \[Alpha]\[Gamma]][k_, \[Alpha]\[Gamma]_, n_] :=
 Exp[(-k^2)*\[Alpha]\[Gamma]*((n - \[Alpha]\[Gamma])/(6*n))]

OverHat[\[Omega]][k_] :=
 Sum[Subscript[OverHat[\[Omega]], \[Alpha]\[Gamma]][k, \[Alpha]\[Gamma], n],
   {\[Alpha]\[Gamma], 1, n}] /. n -> 2000

(* the objective: discontinuous where the Log argument reaches zero *)
IRISM[a1_, a2_, a3_, a4_, \[Rho]_, kmax_] :=
 \[Rho]^2*(1/15)*(20*a1 - 5*a2 + 2*a3 - a4)*Pi -
  (1/(8*Pi^3))*NIntegrate[(\[Rho]*OverHat[\[Omega]][k]*OverHat[c][a1, a2, a3, a4, k] +
       Log[1 - \[Rho]*OverHat[\[Omega]][k]*OverHat[c][a1, a2, a3, a4, k]])*4*Pi*k^2,
    {k, 0, kmax}, WorkingPrecision -> 80]

NMinimize[IRISM[a1, a2, a3, a4, 0.9, 30], {a1, a2, a3, a4},
 Method -> "DifferentialEvolution"]
Mathematica's FindMinimum aborts if it sees an imaginary number. This can happen even if your objective is real-valued inside the constraints, because the default Barrier method has poor accuracy control and can occasionally step out of bounds. The simplest way around it is to wrap your objective inside Re. You may get better answers if you post complete code.
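Applied to the NMinimize call in the question, that workaround amounts to something like:

NMinimize[Re[IRISM[a1, a2, a3, a4, 0.9, 30]], {a1, a2, a3, a4},
 Method -> "DifferentialEvolution"]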
Some general advice:
It's easier to try to simplify your objective for Mathematica than to re-implement optimization algorithms. The reason is that one algorithm failing often means it's a difficult problem, and other algorithms will fail as well.
I once had a problem where FindMinimum gave warnings and failed to converge to the correct minimum, which I could determine analytically. It happened with several different methods, and it made sense once I plotted the objective surface, shown below.
[plot of the objective surface near the minimum] (source: yaroslavvb.com)
In this case, you can see the problem is very badly conditioned at the minimum (it's almost a plateau), and the minimum is hard to localize.
When you have inequality constraints, the default method is the Barrier method, which is expensive and offers poor precision control. A particularly inefficient thing to do is to specify equality constraints as pairs of inequalities, i.e. instead of a == b, using a >= b and a <= b. This can be 3-10 times slower and is also numerically worse -- a and b might be only approximately equal in the result.
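To make that last point concrete, here is a toy comparison (the quadratic objective is made up purely for illustration); the first form, with a single equality, is the one to prefer:

FindMinimum[{(a - 1)^2 + (b - 2)^2, a == b}, {{a, 0}, {b, 0}}]
FindMinimum[{(a - 1)^2 + (b - 2)^2, a >= b, a <= b}, {{a, 0}, {b, 0}}]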
Ideally the goal is to get a problem which is convex, doesn't have any inequality constraints and is well conditioned.
An interesting feature regarding NSolve[] in Mathematica is that it seems to provide all the solutions it can find (and hopefully the list is exhaustive). For instance, as stated in the examples:
NSolve[{x^2 + y^3 == 1, 2 x + 3 y == 4}, {x, y}]
would return an array of 3 solutions.
From what I could try, it seems to scale quite well even for, say, 20 multivariate polynomial equations in 20 variables, as can be seen in this notebook.
Alternatively, I am quite fond of using Sympy, which also features a kind of nsolve function.
But there is a catch: this function requires a starting point "x0", and it will possibly find only one solution - and even then, only if you are lucky enough to have chosen a proper x0.
Some users have suggested a "multi-start method", where one chooses a grid of potential starting points and runs nsolve multiple times. But this doesn't seem to fit my problem: with a grid of d points per variable, the number of starting points scales exponentially as d^20 for my problems in 20 variables. That doesn't square with Mathematica, which seems to run in the blink of an eye.
What is Mathematica doing to achieve such fast solving? Is it due to the nature of the equations? (Maybe some Groebner basis computation behind the scenes?)
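I don't know what NSolve actually does internally, but to illustrate the Groebner-basis idea I am speculating about: a lexicographic Groebner basis of the same small system triangularizes it into a univariate cubic in y together with a linear relation giving x, which is exactly where the three solutions come from.

GroebnerBasis[{x^2 + y^3 - 1, 2*x + 3*y - 4}, {x, y}]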
Could it be done with Sympy?
Thank you for your help!
I'm trying to compute the following definite integral in Mathematica:
Integrate[Sqrt[3]/Sqrt[1 + Sqrt[1 + 12*u^2 - 24*\[Mu]]],
{u, -Sqrt[1 + 8*\[Mu]]/2, Sqrt[1 + 8*\[Mu]]/2}]
Only for some specific cases of Mu does Mathematica seem to be able to compute it. The "funny" things are that:
if I keep the integral indefinite, it returns a solution (quite ugly, but at least it's something);
if I give precise values for the boundaries (e.g., u = ±1/2), after a long time it just returns the definite integral without any result;
if I additionally specify a precise value of Mu (so it knows both Mu and the boundaries of integration), in one lucky case it is able to directly give me a result for the definite integral; however, this does not match the value I would obtain by using the fundamental theorem of calculus (i.e., substituting the values of Mu and u into the ugly formula and taking the difference).
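For what it's worth, a quick numerical cross-check is possible (here with the sample value Mu = 0, chosen only because the limits then become u = ±1/2 and the square roots stay real); NIntegrate gives an independent number to compare against both the symbolic definite result and the antiderivative difference:

With[{\[Mu] = 0},
 NIntegrate[Sqrt[3]/Sqrt[1 + Sqrt[1 + 12*u^2 - 24*\[Mu]]],
  {u, -Sqrt[1 + 8*\[Mu]]/2, Sqrt[1 + 8*\[Mu]]/2}]]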
Does any of you have an idea about what the problem could be? I would also like to point out that the square roots are always well defined for the values I consider.
Thank you.
[Related to https://codegolf.stackexchange.com/questions/12664/implement-superoptimizer-for-addition from Sep 27, 2013]
I am interested in how to write superoptimizers, in particular to find small logical formulae for sums of bits. This was previously set as a challenge on codegolf, but it seems a lot harder than one might imagine.
I would like to write code that finds the smallest possible propositional logic formula to check whether the sum of y binary 0/1 variables equals some value x. Let us call the variables x1, x2, x3, x4, etc. In the simplest approach, the logical formula should be equivalent to the sum constraint; that is, the logical formula should be true if and only if the sum equals x.
Here is a naive way to do that. Say y = 15 and x = 5. Pick all 3003 different ways of choosing 5 of the variables, and for each one make a new clause consisting of the AND of those variables ANDed with the negations of the remaining variables. You end up with 3003 clauses, each of length exactly 15, for a total cost of 45045.
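To make the construction concrete, here is a small sketch (Mathematica is used only because it is compact, and the helper name naiveExactlyX is made up): one conjunction per way of choosing which x of the variables are true, Or-ed together. For y = 15 and x = 5 this produces Binomial[15, 5] = 3003 clauses of length 15.

naiveExactlyX[vars_List, x_Integer] :=
 Or @@ (Function[on,
     And @@ Join[on, Not /@ Complement[vars, on]]] /@ Subsets[vars, {x}])

naiveExactlyX[Array[y, 4], 2]   (* a small case that is easy to inspect by eye *)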
However, if you are allowed to introduce new variables into your solution, then you can potentially reduce this a lot by eliminating common subformulae. In that case your logical formula consists of the y binary variables, x, and some new variables. The whole formula should be satisfiable if and only if the sum of the y variables equals x. The only allowed operators are and, or, and not.
It turns out there is a clever method for solving this problem when x = 1, at least in theory. However, I am looking for a computationally intensive method to search for small solutions.
How can you make a superoptimizer for this problem?
Examples. Take as an example two variables where we want a logical formula that is True exactly when they sum to 1. One possible answer is:
(((not y0) and (y1)) or ((y0) and (not y1)))
To introduce a new variable such as z0 to represent y0 and not y1, we can add a new clause, (y0 and not y1) or not z0, and replace y0 and not y1 by z0 throughout the rest of the formula. Of course this is pointless in this example, as it makes the expression longer.
Write your desired sum in binary. First look at the least significant bit, y0. Clearly,
x1 xor x2 xor ... xor xn = y0 -- that's your first formula. The final formula will be a conjunction of formulae, one for each bit of the desired sum.
Now, do you know how an adder is implemented? http://en.wikipedia.org/wiki/Adder_(electronics). Take inspiration from it: group your input into pairs/triples of bits, calculate the carry bits, and use them to build formulae for y1...yk. If you need further hints, let me know.
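For instance, here is a quick sketch (in Mathematica syntax, purely for illustration) of the full adder that serves as the building block: the sum bit is the three-way xor, the carry is the majority function, and the brute-force table at the end just confirms that the two output bits encode the arithmetic sum of the three input bits.

fullAdderSum[a_, b_, cin_] := Xor[a, b, cin]
fullAdderCarry[a_, b_, cin_] := (a && b) || (a && cin) || (b && cin)

(* check: carry*2 + sum equals the number of input bits that are True *)
Flatten[Table[
  2*Boole[fullAdderCarry[a, b, c]] + Boole[fullAdderSum[a, b, c]] ==
    Boole[a] + Boole[b] + Boole[c],
  {a, {True, False}}, {b, {True, False}}, {c, {True, False}}]]   (* all True *)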
If I understand what you're asking, you'll want to look into the general topics of logic minimization and/or Boolean function simplification. The references are mostly about general methods for eliminating redundancy in Boolean formulas that are disjunctions ("or"s) of terms that are conjunctions ("and"s).
By hand, the standard method is called a Karnaugh map. The equivalent algorithm, expressed in a way that's more amenable to computer implementation, is Quine-McCluskey (also called the method of prime implicants). The minimization problem is NP-hard, and QM solves it exactly.
Therefore I think QM is what you want for the "super-optimizer" you're trying to build.
But the combination of NP-hardness and exact solution means that QM is impractical for anything beyond fairly small problems.
The QM algorithm lays out the conjunctive terms (called minterms in this context) in a table and searches for pairs of terms that differ in a single bit. Such pairs can be combined, with the differing bit marked "don't care" in further combinations. This is repeated with 2-bit, 4-bit, etc. subsets of bits. The exponential behavior arises because choices are involved when combining larger bit sets: choosing one combination rules out another. So it is essentially a search problem.
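If you just want to see what exact two-level minimization of this kind produces without building the tables yourself, Mathematica's built-in BooleanMinimize (which returns a minimal disjunctive normal form) is a convenient reference point; for example, it collapses a small redundant DNF to its two essential prime implicants:

BooleanMinimize[(a && b && ! c) || (a && b && c) || (a && ! b && c)]
(* equivalent to (a && b) || (a && c) *)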
There is an enormous literature on heuristics that trim the search space while still finding "good" solutions that aren't necessarily optimal. A famous one is Espresso. However, since algorithm improvements translate directly into dollars in semiconductor manufacturing, it's entirely possible that the best ones are proprietary and closely held.
I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a ((1+d) x 1) vector that represents the local weighting for a particular point. Then pi (the probability of a positive output) for that point is given by the sigmoid function based on the B for that point. To get B, run the Newton-Raphson algorithm iteratively some number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson update itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a ((1+d) x N) matrix, which doesn't match up with B, which is ((1+d) x 1). The only way that would work, then, is if e were an (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's constant, and it just so happens that that ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, i.e. a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w to form the diagonal of the weight matrix W.
So, my two main questions here are: what is e on page 67, and is it the (N x 1) vector I need? And how does pi' in equation 4-5 end up being a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
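To make the dimensions concrete, here is a minimal sketch of one standard instance-weighted Newton-Raphson (IRLS) step for logistic regression (written in Mathematica only for brevity, and not claiming to be Dr. Deng's exact notation): e = y - pi is an N-vector of errors, and pi*(1 - pi) times the instance weights supplies the diagonal of W.

(* one weighted Newton-Raphson step; X is an N x (1+d) design matrix, y and w are
   length-N lists of 0/1 labels and instance weights, beta is the current (1+d)-vector *)
irlsStep[X_, y_, w_, beta_] :=
 Module[{pi, e, W},
  pi = 1/(1 + Exp[-X.beta]);          (* N-vector of predicted probabilities *)
  e = y - pi;                         (* the "e": an N-vector of errors, not a scalar *)
  W = DiagonalMatrix[w*pi*(1 - pi)];  (* instance weight times the sigmoid derivative *)
  beta + Inverse[Transpose[X].W.X].(Transpose[X].(w*e))]

Either way, the piece you identified as ((1+d) x N) ends up multiplying an N-vector, which fits the reading of e as an error vector rather than a scalar.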
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.
Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1), where a, b, c, and d are all positive reals and c + d > a + b + 1. There are many special cases that have a closed-form formula, but as far as I know there is no such formula in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
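To illustrate how slow the convergence is, here is a minimal sketch with made-up parameters a = 3/2, b = 2, c = 3, d = 2 (so that c + d > a + b + 1 holds): it sums the series term by term using the ratio of successive terms and compares the result against Mathematica's built-in HypergeometricPFQ. Even ten thousand terms leave an error of roughly one percent.

With[{a = 3/2, b = 2, c = 3, d = 2, nmax = 10000},
 Module[{term = 1., total = 1.},
  (* successive-term ratio (a+n)(b+n)(1+n)/((c+n)(d+n)(n+1)) simplifies since (1)_n = n! *)
  Do[term *= (a + n)*(b + n)/((c + n)*(d + n)); total += term, {n, 0, nmax - 1}];
  {total, N[HypergeometricPFQ[{a, b, 1}, {c, d}, 1]]}]]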
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Pade approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5 where convergence is rapid to get an initial value and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, due to z = 1 being a singularity (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
Is it correct that you want to sum a series where you know the ratio of successive terms, and that ratio is a rational function of the index?
I think Gosper's algorithm and the rest of the tools for proving (and finding) hypergeometric identities do exactly this, right? (See Wilf and Zeilberger's A = B book online.)