Quadratic Approximation for Log-Likelihood Ratio Processes

I'm trying to understand why a quadratic equation can approximate the log likelihood ratio, and how this approximation is derived (the equation itself was given as an image, not reproduced here). Is it obtained from a Taylor series, from the normal distribution, or from something else?
This was brought up in the book 'Essential Medical Statistics', Chapter 28, where the main goal was to derive a supported range (similar to the 95% CI) for the likelihood ratio.
The book says that the log of the likelihood ratio (LR) is used instead of the likelihood itself because log(LR) can be approximated by a quadratic equation, which makes the calculation easier. It also says that this quadratic is chosen so that it meets the log(LR) curve and has the same curvature as log(LR) at the MLE.
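My current guess (this is my own reconstruction of the missing equation, not a quote from the book) is that it comes from a second-order Taylor expansion of the log likelihood about the MLE $\hat\theta$. Writing $S^2 = -1\big/\frac{d^2\log L}{d\theta^2}\big|_{\hat\theta}$ for the squared standard error,

$$\log L(\theta) \approx \log L(\hat\theta) + (\theta-\hat\theta)\,\frac{d\log L}{d\theta}\Big|_{\hat\theta} + \tfrac{1}{2}(\theta-\hat\theta)^2\,\frac{d^2\log L}{d\theta^2}\Big|_{\hat\theta},$$

and since the first derivative vanishes at the MLE,

$$\log\mathrm{LR}(\theta) = \log L(\theta) - \log L(\hat\theta) \approx -\tfrac{1}{2}\left(\frac{\theta-\hat\theta}{S}\right)^2,$$

which would explain why the quadratic matches both the value and the curvature of log(LR) at the MLE. Is that the right derivation?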

Related

What is the difference between Stochastic Gradient Descent and LightGBM?

Although I have individually researched these concepts, I am confused about whether one or the other should be chosen for a solution, or whether both can be used simultaneously to improve results. Any guidance you can provide will be much appreciated.
My understanding is that in gradient descent the cost function (and hence its gradient) is computed over the entire training set, whereas stochastic gradient descent approximates the true gradient of the cost using much less than the entire training set.
The question of which to use, and when, comes down to whether there is sufficient computing power and time to calculate the full gradient exactly. If there is, then calculate it exactly.
If the training set is too large, stochastic gradient descent is worth a try. Use both for testing the quality of the approximation.
In general, I would not use both, for the same reason I would never average an exact value and its approximation. (Ex: 1 = 1, but 1 is also approximately 0.99, so (1 + 0.99)/2 = 0.995.)
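As a minimal sketch of the distinction (a toy least-squares example of my own, not from the original answer), the only difference between the two updates is how much data goes into the gradient estimate:

```python
import numpy as np

# Toy least-squares problem (illustrative names, not from the original post)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

def full_gradient(w):
    # gradient of the mean squared error over the ENTIRE training set
    return 2 * X.T @ (X @ w - y) / len(y)

def stochastic_gradient(w, batch_size=32):
    # the same gradient, estimated from a small random mini-batch
    idx = rng.integers(0, len(y), size=batch_size)
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / batch_size

w, lr = np.zeros(5), 0.01
for _ in range(500):
    w -= lr * stochastic_gradient(w)   # swap in full_gradient(w) for batch GD
print(w)  # ends up close to true_w either way
```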

2nd order symplectic exponentially fitted integrator

I have to solve the equations of motion of a charged particle under the effect of an electromagnetic field. Since speed matters more to me than precision, I could not use adaptive step-size algorithms (like Runge-Kutta Cash-Karp), because they would take too much time. I was looking for an algorithm which is both symplectic (like the Boris integrator) and exponentially fitted (so it can handle the equation of motion even when it is stiff). I found a method, but it is for second-order differential equations:
https://www.math.purdue.edu/~xiaj/work/SEFRKN.pdf
Later I found a paper which would describe a fourth order symplectic exponentially-fitted Runge-Kutta:
http://users.ugent.be/~gvdbergh/files/publatex/annals1.pdf
Since I have to deal with speed I was looking for a lower order algorithm. Does a 2nd order symplectic exponentially fitted ODE algorithm exist?
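For context, here is a minimal sketch of the Boris scheme mentioned above (non-relativistic, uniform fields, unit charge and mass chosen purely for illustration); it shows the half-kick / rotation / half-kick structure, not the exponentially fitted method I am looking for:

```python
import numpy as np

def boris_push(x, v, E, B, q, m, dt):
    """One Boris step: half electric kick, magnetic rotation,
    half electric kick, then a position drift."""
    qm = q / m
    # half acceleration by E
    v_minus = v + 0.5 * dt * qm * E
    # rotation around B
    t = 0.5 * dt * qm * B
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    # second half acceleration by E
    v_new = v_plus + 0.5 * dt * qm * E
    # drift
    x_new = x + dt * v_new
    return x_new, v_new

# toy example: uniform B along z, no E field -> bounded gyration, speed preserved
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
E = np.zeros(3)
B = np.array([0.0, 0.0, 1.0])
for _ in range(1000):
    x, v = boris_push(x, v, E, B, q=1.0, m=1.0, dt=0.01)
print(x, np.linalg.norm(v))  # |v| stays ~1
```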

What is the ε (epsilon) parameter in Locality Sensitive Hashing (LSH)?

I've read the original paper about Locality Sensitive Hashing.
The complexity is given as a function of the parameter ε, but I don't understand what it is.
Can you explain its meaning please?
ε is the approximation parameter.
LSH (like FLANN and kd-GeRaF) is designed for high-dimensional data. In that setting exact k-NN doesn't work well; in fact it is almost as slow as brute force, because of the curse of dimensionality.
For that reason, we focus on solving the approximate k-NN problem. Check Definition 1 from our paper, which basically says that it is OK to return an approximate neighbor lying at a distance up to (1 + ε) times that of the exact neighbor.
The image below illustrates what it means to find the exact versus the approximate NN. In the traditional problem of NNS (Nearest Neighbor Search), we are asked to find the exact NN. In the modern problem, approximate NNS, we are asked to find some neighbor inside the (1+ε) radius, so either the exact or an approximate NN is a valid answer!
So, with high probability, LSH will return an NN inside that (1+ε) radius. For ε = 0, we actually solve the exact NN problem.
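To make the (1 + ε) guarantee concrete, here is a small sketch (my own illustration, not code from LSH, FLANN, or kd-GeRaF) that checks whether a returned candidate satisfies the approximate-NN definition:

```python
import numpy as np

def is_acceptable_neighbor(query, candidate, data, eps):
    """A candidate is acceptable if its distance to the query is at most
    (1 + eps) times the distance of the true nearest neighbour."""
    exact_dist = np.min(np.linalg.norm(data - query, axis=1))
    cand_dist = np.linalg.norm(candidate - query)
    return cand_dist <= (1.0 + eps) * exact_dist

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 50))        # high-dimensional point set
query = rng.normal(size=50)
candidate = data[rng.integers(len(data))]  # pretend this is what LSH returned
print(is_acceptable_neighbor(query, candidate, data, eps=0.5))
```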

Strassen's Algorithm proof

I have been reading about the Strassen Algorithm for matrix multiplication.
As mentioned in Introduction to Algorithms by Cormen, the algorithm is not intuitive. However, I am curious to know whether there exists any rigorous mathematical proof of the algorithm and what actually went into its design.
I tried searching on Google and Stack Overflow, but all the links either compare Strassen's approach to the standard matrix multiplication approach or elaborate on the procedure presented by the algorithm.
You should go to the source material. In this case, the original paper by Strassen:
Strassen, Volker. "Gaussian Elimination is not Optimal." Numerische Mathematik 13, pp. 354–356, 1969.
http://link.springer.com/article/10.1007%2FBF02165411?LI=true
Even though I haven't read it myself, I would assume that there is a rigorous discussion and proof of the complexity of the algorithm.
It looks like Professor Strassen is still active (http://en.wikipedia.org/wiki/Volker_Strassen) and has a home page (http://www.math.uni-konstanz.de/~strassen/). If, after learning as much as you can about the algorithm, you are still interested in learning more, I don't think a carefully worded email to the professor would be out of the question.
Unfortunately, there does not seem to be a free version of the paper available online despite the fact that the work was completed at a public university (UC Berkeley) using federal funds (NSF grant), but that is a completely separate issue we shouldn't discuss here.
If you are a student, you will likely have access via your school, or at least your school could get you a copy without cost to you. Good luck.
The proof that Strassen's algorithm should exist is a simple dimension count (combined with a proof that the naive dimension count gives the correct answer). Consider the vector space of all bilinear maps $C^n\times C^n \rightarrow C^n$; this is a vector space of dimension $n^3$ (in the case of matrix multiplication we have $n=m^2$, e.g. $n=4$ for the $2\times 2$ case). The set of bilinear maps of rank one, i.e., those computable in an algorithm using just one scalar multiplication, has dimension $3(n-1)+1$, and the set of bilinear maps of rank at most $r$ has dimension the minimum of $r[3(n-1)]+r$ and $n^3$ for most values of $n,r$ (and one can check that this is correct when $r=7$, $n=4$). Thus any bilinear map $C^4\times C^4\rightarrow C^4$ has, with probability one, rank at most $7$, and may always be approximated to arbitrary precision by a bilinear map of rank at most $7$.
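To make the "rank at most $7$" statement concrete, here is a quick numerical check of the standard seven Strassen products for the $2\times 2$ case (a minimal sketch; the formulas are the well-known ones, reproduced from memory rather than from the paper):

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with Strassen's 7 products
    (an explicit rank-7 bilinear map)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)  # agrees with ordinary multiplication
```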

maximum likelihood and support vector complexity

Can anyone give some references showing how to determine the computational complexity of maximum likelihood and support vector machine classifiers?
I have been searching the web but can't seem to find a good document that details the equations modelling the computational complexity of those classifier algorithms.
Thanks
Support vector machines, and a number of maximum likelihood fits, are convex minimization problems. Therefore they could in theory be solved in polynomial time using the ellipsoid method (http://en.wikipedia.org/wiki/Ellipsoid_method).
I suspect that you can get much better estimates if you consider particular methods. http://www.cse.ust.hk/~jamesk/papers/jmlr05.pdf says that standard SVM fitting on m instances costs O(m^3) time and O(m^2) space. http://research.microsoft.com/en-us/um/people/minka/papers/logreg/minka-logreg.pdf gives costs per iteration for logistic regression but does not give a theoretical basis for estimating the number of iterations. In practice I would hope that this reaches quadratic convergence most of the time and is not too bad.
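If a rough empirical check is enough, a sketch like the following (my own illustration, assuming scikit-learn's SVC is available; it is not mentioned in the papers above) shows how kernel-SVM training time grows with the number of instances m:

```python
import time
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is installed

# Rough empirical look at how kernel-SVM fitting time scales with m
# (the cited bound is O(m^3) time, O(m^2) space).
rng = np.random.default_rng(0)
for m in (500, 1000, 2000, 4000):
    X = rng.normal(size=(m, 20))
    y = (X[:, 0] + 0.1 * rng.normal(size=m) > 0).astype(int)
    start = time.perf_counter()
    SVC(kernel="rbf", C=1.0).fit(X, y)
    print(f"m={m:5d}  fit time {time.perf_counter() - start:.2f}s")
```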
