Berndt–Hall–Hall–Hausman algorithm does not work

I am studying economics and trying to use the BHHH algorithm to maximize a log-likelihood function. However, the algorithm converges immediately because the absolute value of the approximate Hessian is too large compared with the magnitude of the derivatives. This can happen when the starting point is far from the optimum, but I have the feeling it will also happen when the magnitude of the log-likelihood is large and so is the derivative. Am I right? In this case, which algorithm would be a better option? Thank you.
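For reference, a minimal sketch of one BHHH step (the names, the hypothetical `score(beta)` callback returning per-observation derivatives of the log-likelihood, and the unit step size are all illustrative assumptions, not any particular library's API):

```python
import numpy as np

def bhhh_step(beta, score, step=1.0):
    """One BHHH update: approximate the (negative) Hessian by the outer
    product of the per-observation scores and move along the implied direction."""
    s = score(beta)              # hypothetical callback: (n_obs, n_params) array of d log L_i / d beta
    grad = s.sum(axis=0)         # total gradient of the log-likelihood
    H_approx = s.T @ s           # BHHH outer-product approximation
    direction = np.linalg.solve(H_approx, grad)
    return beta + step * direction

# Note: H_approx is a sum of squared scores, so it scales quadratically with the
# score magnitude while grad scales only linearly. When the scores are large,
# the solved direction is therefore tiny and a naive convergence test fires
# immediately, which matches the behaviour described in the question.
```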

Related

Big Theta of Factorial Multiplied by a Coefficient

For a function with a running time of (cn)!, where c is a coefficient >= 0 and c != n, would the tight bound of the running time be Θ(n!) or Θ((cn)!)? Right now, I believe it would be Θ((cn)!), since the two would differ by a factor that itself grows with n (given that cn != n).
Thanks!
Edit: A more specific example to clarify what I'm asking:
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
You can use Stirling's approximation to show that if c > 1 then (cn)! is asymptotically larger than c^n * n!, which is not O(n!) since the quotient diverges. As a more elementary approach, consider the case c = 2: (2n)! = (2n)(2n-1)...(n+1) * n! > n! * n!, and (n! * n!)/n! = n! diverges, so (2n)! is NOT O(n!).
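A quick numeric sanity check of that elementary argument (not a proof, just showing that the quotient (2n)!/n! already grows at least as fast as n!):

```python
from math import factorial

# Compare (2n)!/n! with n!; the argument above says the former is at least as large.
for n in (1, 2, 5, 10):
    print(n, factorial(2 * n) // factorial(n), factorial(n))
# e.g. n=10: (2n)!/n! = 670442572800  vs  n! = 3628800
```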
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
I think there are two answers to your question.
The shorter one is from the purely theoretical point of view: of those three, only n! lies in the class Θ(n!). The second, (5n/16)!, lies in O(n!) (note big-O instead of big-Theta), and (7n)! grows faster than n!, so it lies in Θ((7n)!).
There is also a longer but more practical answer. To get to it, we first need to understand what the big deal is with this whole big-O and big-Theta business in the first place.
The thing is that for many practical tasks there are many algorithms, and not all of them are equally or even similarly efficient. So the practical question is: can we somehow capture this difference in performance in a way that is easy to understand and compare? This is the problem that big-O and big-Theta are trying to solve. The idea behind this method is that if we look at some algorithm with a complicated exact formula for its running time, there is only one term that grows faster than all the others and thus dominates the time as the problem gets bigger. So let's compress that big formula down to the dominant term. Then we can compare those terms, and if they are different, we can easily say which is the better algorithm (7*n^2 is clearly better than 2*n^3, as the check below illustrates).
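As a tiny concrete check of that last comparison (assuming the two formulas 7*n^2 and 2*n^3 are exact operation counts):

```python
# The lower-order constant stops mattering almost immediately: 2*n^3 overtakes
# 7*n^2 as soon as n > 3.5, and the gap keeps widening.
for n in (1, 3, 4, 10, 100):
    print(n, 7 * n ** 2, 2 * n ** 3)
```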
Another point is that the term "operation" is usually not that well defined at the level at which people usually think about algorithms. Which "operation" maps to a single CPU instruction and which to a few depends on many factors, such as the particular hardware. Also, the instructions themselves can take different amounts of time to execute. Moreover, sometimes an algorithm's running time is dominated by memory accesses rather than CPU instructions, and those components are not easily additive. The moral of this story is that if two algorithms differ only in a scalar coefficient, you can't really compare them purely theoretically; you need to compare implementations in some particular environment. This is why algorithmic complexity measures typically boil down to something like O(n^k) where k is a constant.
There is one more consideration: practicality. If the algorithm is polynomial, there is a huge practical difference between the cases a=3 and a=4 in O(n^a). But if it is something like O(2^(n^a)), then it doesn't matter much what exactly a is, as long as a >= 1. This is because 2^n already grows fast enough to make the algorithm impractical for almost any realistic n, irrespective of a. So in practical terms it is often a good enough approximation to put all such algorithms into a single "exponential algorithms" bucket and say they are all impractical, even though there are huge differences between them. This is where some mathematically unconventional notations like 2^O(n) come from.
From this last practical perspective, the difference between Θ(n!) and Θ((7n)!) is also very small: both are totally impractical, because both lie beyond even the exponential bucket of 2^O(n) (see Stirling's formula, which shows that n! grows a bit faster than (n/e)^n). So it makes sense to put all such algorithms into another bucket of "factorial complexity" and mark them as impractical as well.
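A rough numeric illustration of that "beyond the exponential bucket" claim: log2(n!)/n keeps growing with n, whereas for anything in 2^O(n) it would stay bounded by a constant.

```python
from math import lgamma, log

for n in (10, 100, 1000, 10000):
    log2_factorial = lgamma(n + 1) / log(2)   # log2(n!) without computing n! itself
    print(n, round(log2_factorial / n, 2))    # grows roughly like log2(n/e)
```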

Comparison of A* heuristics for solving an N-puzzle

I am trying to solve the N-puzzle using the A* algorithm with 3 different heuristic functions. I want to know how to compare the heuristics in terms of time complexity. The heuristics I am using are: Manhattan distance, Manhattan distance + linear conflict, and N-max swap. I am interested specifically in the 8-puzzle and the 15-puzzle.
Finding the shortest solution to the N-puzzle is, in general, NP-hard, so no matter what heuristic you use, it's unlikely you'll be able to find any difference in complexity between them, since you won't be able to prove the tightness of any bound.
If you restrict yourself to the 8-puzzle or 15-puzzle, an A* algorithm with any admissible heuristic will run in O(1) time since there are a finite (albeit large) number of board positions.
As @Harold said in his comment, the way to compare the time complexity of heuristic functions is typically through experimental tests. In your case, generate a set of n random problems for the 8-puzzle and the 15-puzzle and solve them using the different heuristic functions (a minimal sketch of such an experiment appears after this answer). Things to be aware of are:
The comparison will always depend on several factors, like hardware specs, programming language, your skill in implementing the algorithm, ...
Generally speaking, a more informed heuristic will expand fewer nodes than a less informed one, and will probably be faster.
And finally, in order to compare the three heuristics on each problem set, I would suggest a graphic with average running times (repeating each problem, say, 5 times) where:
The problems are in the x-axis sorted by difficulty.
The running times are in the y-axis for each heuristic function (perhaps in logarithmic scale if the difference between the alternatives cannot be easily seen).
and a similar graphic with the number of explored states.
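A minimal sketch of such an experiment for the 8-puzzle with the Manhattan-distance heuristic (the other two heuristics would plug into the same `astar` call; the start state in the usage comment is just an easy hand-picked instance, and you would generate random solvable boards instead):

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 is the blank; boards are row-major 3x3 tuples

def manhattan(state):
    """Sum of Manhattan distances of each tile from its goal position."""
    dist = 0
    for idx, tile in enumerate(state):
        if tile:
            goal_idx = tile - 1
            dist += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return dist

def neighbours(state):
    """All states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            swap = nr * 3 + nc
            board = list(state)
            board[blank], board[swap] = board[swap], board[blank]
            yield tuple(board)

def astar(start, heuristic):
    """Return (solution length, number of expanded nodes) for one heuristic."""
    frontier = [(heuristic(start), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if state == GOAL:
            return g, expanded
        if g > best_g[state]:
            continue                     # stale queue entry
        expanded += 1
        for nxt in neighbours(state):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt))
    return None, expanded

# Usage sketch: run each heuristic on the same instances and record both values.
# print(astar((1, 2, 3, 4, 5, 6, 0, 7, 8), manhattan))   # -> (2, expanded_count)
```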

Where is Strassen's matrix multiplication useful?

Strassen's algorithm for matrix multiplication gives only a marginal improvement over the conventional O(N^3) algorithm. It has higher constant factors and is much harder to implement. Given these shortcomings, is Strassen's algorithm actually useful, and is it implemented in any library for matrix multiplication? Moreover, how is matrix multiplication implemented in libraries?
Generally, Strassen's method is not preferred for practical applications, for the following reasons:
The constants used in Strassen's method are high, and for a typical application the naive method works better.
For sparse matrices, there are better methods designed especially for them.
The submatrices in the recursion take extra space.
Because of the limited precision of computer arithmetic on non-integer values, larger errors accumulate in Strassen's algorithm than in the naive method.
So the idea behind Strassen's algorithm is that it's faster (asymptotically speaking). This could make a big difference if you are dealing either with huge matrices or with a very large number of matrix multiplications. However, just because it's faster asymptotically doesn't make it the most efficient algorithm in practice. There are all sorts of implementation considerations, such as caching and architecture-specific quirks. There is also parallelism to consider.
I think your best bet would be to look at the common libraries and see what they are doing. Have a look at BLAS for example. And I think that Matlab uses MAGMA.
If your contention is that you don't think O(n^2.8) is that much faster than O(n^3), a quick calculation shows that n doesn't need to be very large before the difference becomes significant.
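For example, with constant factors ignored, the asymptotic speedup of n^2.8 over n^3 is n^0.2, which is already a noticeable factor at moderate matrix sizes:

```python
# Ratio n^3 / n^2.8 = n^0.2 for a few matrix sizes
for n in (100, 10_000, 1_000_000):
    print(n, round(n ** 0.2, 1))   # -> 2.5, 6.3, 15.8
```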
It's very important to stop at the right moment.
With 1,000 x 1,000 matrices, you can multiply them by doing seven 500 x 500 products plus a few additions. That's probably useful. With 500 x 500, maybe. With 10 x 10 matrices, most likely not. You'd just have to run some experiments first to find out at what point to stop.
But Strassen's algorithm only saves a factor of 2 (at best) when the number of rows grows by a factor of 32: the number of coefficients grows by 1,024, and the total time grows by a factor of 16,807 instead of 32,768. In practice, that's a "constant factor". I'd say you gain more by transposing the second matrix first so you can multiply rows by rows, then looking carefully at cache sizes, vectorising as much as possible, and distributing the work over multiple cores that don't step on each other's feet.
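A minimal sketch of that "recurse only while it pays off" idea, assuming square NumPy matrices; the CUTOFF value is an illustrative placeholder you would tune experimentally for your own hardware:

```python
import numpy as np

CUTOFF = 128   # assumed threshold; below this, plain multiplication wins in practice

def strassen(A, B):
    """Multiply square matrices, recursing with Strassen's seven products
    only while the blocks are large enough for the recursion to pay off."""
    n = A.shape[0]
    if n <= CUTOFF or n % 2:
        return A @ B                       # fall back to ordinary (BLAS) multiplication
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven half-size products instead of the eight used by naive blocking
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```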
Marginal improvement: True, but growing as the matrix sizes grow.
Higher constant factors: Practical implementations of Strassen's algorithm use conventional n^3 for blocks below a particular size, so this doesn't really matter.
Harder to implement: whatever.
As for what's used in practice: first, you have to understand that multiplying two ginormous dense matrices is unusual. Much more often, one or both of them is sparse, symmetric, upper triangular, or has some other special structure, which means there are quite a few specialized tools that are essential to the efficient large-matrix-multiplication toolbox. With that said, for giant dense matrices, Strassen's is The Solution.

How to translate algorithm complexity to time necessary for computation

If I know complexity of an algorithm, can I predict how long it will compute in real life?
A bit more context:
I have been trying to solve a university assignment in which I have to find the best possible result in a game from a given position. I have written an algorithm and it works, but it is very slow. The complexity is O(5^n). For 24 elements it takes a few minutes to compute. I'm not sure if it's because my implementation is wrong, or if this algorithm is simply very slow. Is there a way for me to approximate how much time any algorithm should take?
In the worst case, you can rely on extrapolation. Given measured times for N = 1, 2, 3, 4 elements (the more the better) and the O-notation estimate of the algorithm's complexity, you can estimate the time for any finite N. The catch is that the precision of this estimate gets lower and lower as N increases.
What can you do about that? Look for error-estimation algorithms for such approaches. In practice this usually gives a good enough result.
Also, please don't forget about model adequacy checks. Given results for N = 1..10 and the O-notation complexity, you should check how well your results fit your O-model (that is, whether you can pick constants for the O-notation formula that match your measurements). If you cannot, you either need more data points to get a wider picture, or ... well, your complexity estimate may simply be wrong :-).
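A minimal sketch of that extrapolation for the O(5^n) case from the question; `solve_game` is a hypothetical stand-in for the asker's solver, and the chosen sizes are placeholders:

```python
import time

def fit_constant(solve_game, sizes):
    """Estimate c in the model t(n) ~= c * 5**n from runs on small inputs."""
    per_run = []
    for n in sizes:
        start = time.perf_counter()
        solve_game(n)                      # hypothetical solver being measured
        per_run.append((time.perf_counter() - start) / 5 ** n)
    return sum(per_run) / len(per_run)     # average the implied constants

def predict_runtime(c, n):
    """Extrapolate the fitted model to a larger n (precision degrades quickly)."""
    return c * 5 ** n

# Usage sketch:
# c = fit_constant(solve_game, sizes=[8, 9, 10, 11])
# print(predict_runtime(c, 24))            # rough estimate in seconds
```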
Useful links:
Brief review on algorithm complexity.
Time complexity catalogue
A really good place to start - look for examples based on code as input.
You cannot predict running time based on time complexity alone. There are many factors involved: hardware speed, programming language, implementation details, etc. The only thing you can predict using the complexity is expected time increase when the size of the input increases.
For example, I have personally seen O(N^2) algorithms take longer than O(N^3) ones, especially for small values of N, as in your case. And, by the way, 5^24 is a huge number (about 5.9e16). I wouldn't be surprised if that took a few hours on a supercomputer, let alone on a mid-range personal PC, which is what most of us are using.

Find intersection(s) of any two functions - solving simultaneous equations

Imagine having any two functions. You need to find the intersections of those functions. You definitely don't want to try all x values to check for f(x) == g(x).
Normally, in math, you would create simultaneous equations derived from f(x) == g(x). But I see no way to implement equations in any programming language.
So once more, what am I looking for:
Conceptual algorithm to solve equations.
The same for simultaneous and quadratic equations.
I believe there should be some workaround using derivatives, but I've only recently learned the concept of a derivative at school and I have no idea how to use it in this case.
That is a much harder problem than you would imagine. A good place to start for learning about these things is the Newton-Raphson method, which gives numerical approximations to equations of the form h(x) = 0. (When you set h(x) = g(x) - f(x), this provides solutions for the problem you are asking about.)
Exact, algebraic solving of equations (as implemented in Mathematica, for example) is even more difficult; you basically have to recreate everything you would do in your head when solving an equation on a piece of paper.
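A minimal Newton-Raphson sketch for h(x) = g(x) - f(x) with a numerical derivative, so it works for black-box f and g; the tolerance, the step used for the derivative, and the starting point are illustrative choices:

```python
import math

def newton_intersection(f, g, x0, tol=1e-10, max_iter=100, dx=1e-6):
    """Find x with f(x) == g(x) near x0 by Newton-Raphson on h(x) = g(x) - f(x)."""
    h = lambda x: g(x) - f(x)
    x = x0
    for _ in range(max_iter):
        dh = (h(x + dx) - h(x - dx)) / (2 * dx)   # central-difference derivative
        if dh == 0:
            raise ZeroDivisionError("zero derivative; try a different starting point")
        x_next = x - h(x) / dh
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge; try another starting point")

# Example: where does x**2 meet cos(x)?  Roughly x = 0.8241.
print(newton_intersection(lambda x: x * x, math.cos, 1.0))
```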
Obviously this problem is not solvable in the general case, because you can construct a "function" that is arbitrarily complex. For example, if you have a "function" with 5 trillion terms, including various transcendental and complex transformations, the computer could take years just to compute a single value, much less intersect it with another similar function.
So, first of all, you need to define what you mean by a "function". If you mean a polynomial of degree at most 4, then the problem becomes much more straightforward. In such cases you combine the terms of the two polynomials and find the roots of the resulting equation, which give the intersections.
If the polynomial has degree 5 or higher (a quintic or greater), then there is no general symbolic solution. In this case the terms are combined and you find the roots by iterative approximation. See root-finding algorithms.
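For the polynomial case, a short sketch using NumPy's root finder on the difference of the two coefficient vectors (the two polynomials here are made-up examples):

```python
import numpy as np

f = np.array([1.0, 0.0, -2.0, 1.0])   # f(x) = x^3 - 2x + 1   (coefficients, highest power first)
g = np.array([0.0, 1.0, 0.0, -1.0])   # g(x) = x^2 - 1
xs = np.roots(f - g)                  # roots of f(x) - g(x) = x^3 - x^2 - 2x + 2
print(xs)                             # real roots (1, +/-sqrt(2)) are the intersection x-values
```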
If the function involves transcendentals such as sin/cos/log/e^x, etc., you can potentially find the intersection by representing the functions as a series or a continued fraction. You then subtract one representation from the other and set the result to zero; solving the resulting equation yields an approximation of the root(s).
