Discrepancy between diagram and equations of GRU? - gated-recurrent-unit

While I was reading Colah's blog, in the diagram we can clearly see that z_t goes into ~h_t and not r_t, but the equations say otherwise. Isn't this supposed to be z_t * h_{t-1} and not r_t * h_{t-1}?
Please correct me if I'm wrong.

I see this is somewhat old, but if you still haven't figured it out and care, or for any other person who ends up here: the figure and the equations are consistent.
Note that the operator (x) in the diagram (the pink circle with an X in it) is the Hadamard product, an element-wise multiplication between two tensors of the same size. In the equations this operator is written as * (it is also often typeset as a circle with a dot at its center). ~h_t is the output of the tanh, and the tanh receives a linear combination of the input at time t, x_t, and the result of the Hadamard product between r_t and h_{t-1}. Note that r_t has already been computed at that point, by passing a linear combination of x_t and h_{t-1} through a sigmoid. I hope the rest is clear.
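As a concrete illustration, here is a minimal NumPy sketch of one GRU step following those equations (splitting the weights into separate W and U matrices and dropping the biases are my own simplifications, not Colah's exact notation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)               # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev))   # candidate state: r_t (not z_t) gates h_{t-1}
    return (1 - z_t) * h_prev + z_t * h_tilde            # z_t only blends the old state with the candidate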

Related

Obtaining the functional form of a curve

The following is the plot of a curve f(r), where r is the radial coordinate, plotted for different values of a parameter:
However, I don't know the functional form of the curve and I am interested to find the same. Are there any numerical methods which can be used to find the functional form of f(r) in terms of the radial coordinate and the parameter?
I found a solution to the problem based on ja72's suggestion to use the Eureqa software, which churns through the data to create accurate predictive models using an evolutionary search algorithm.
In the question, the different curves correspond to different values of the parameter. So, initially I obtained the best-fit equation for several values of the parameter and found a model equation that is suitable for my purpose.
Then I repeated the process for a large number of parameter values, calculated the values of the model's four fitted functions at each of them, and individually fitted these four functions against the parameter.
N.B.: Eureqa gave several other better-fitting formulas than those mentioned here, but the formulas I chose are sufficiently accurate for my purpose and have minimal complexity.
A blind curve fit without an underlying model is a dangerous thing.
You need an understanding of the physical model behind the data to create a successful fit. The reason is that if r is a distance and the best-fit curve uses, for example, r^0.4072, then a dimensioned quantity raised to a decimal power has no meaning, and it hides the underlying assumptions: most likely there is some other length scale l not included in the model, and only the dimensionless quantity (r/l) would make sense to raise to a decimal power.
From a function analysis standpoint
These curves do not look like the result of any standard math function. Admittedly, I am not that familiar with Bessel functions, gamma functions, and Legendre polynomials, but none of the standard functions you would find on a scientific calculator jumps out here.
If r is assumed to be dimensionless, then try to match the asymptotic behavior as r -> 0 and as r -> ∞. That would be the baseline curve. To me it does not look hyperbolic, but rather close to 1/ln(1+r).
So change variables: set g(r) = 1/ln(1+r), plot f(r) against g(r), and see what that looks like. Then try another round of curve fitting on the new curves, and so on.
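A minimal Python sketch of that change of variables (the data here are stand-ins; substitute your own sampled r and f arrays):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Stand-in data purely for illustration; replace with the measured curve.
r = np.linspace(0.1, 10.0, 200)
f = 1.0 / np.log(1.0 + r) + 0.05 * r

g = 1.0 / np.log(1.0 + r)            # proposed baseline variable g(r) = 1/ln(1+r)
plt.plot(g, f, '.')                  # roughly linear => the guess captures the shape
plt.xlabel('g(r) = 1/ln(1+r)')
plt.ylabel('f(r)')
plt.show()

# Then fit a simple model in the new variable, e.g. f ~ p0 + p1*g,
# inspect the residuals, and iterate.
model = lambda gg, p0, p1: p0 + p1 * gg
(p0, p1), _ = curve_fit(model, g, f)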
Nobody can answer this question
Nobody else could effectively answer this question but you, because a) you have the data, and b) you need to make assumptions about what region is important or not, and what is acceptable deviation.

Explaining camera matrix from fundamental matrix?

This is a follow-up to another stack overflow question, here:
3D Correspondences from fundamental matrix
Just like in that question, I am trying to get a camera matrix from a fundamental matrix, the ultimate goal being 3d reconstruction from 2d points. The answer given there is good, and correct. I just don't understand it. It says, quote, "If you have access to Hartley and Zisserman's textbook, you can check section 9.5.3 where you will find what you need." He also provides a link to source code.
Now, here's what section 9.5.3 of the book, among other things, says:
Result 9.12. A non-zero matrix F is the fundamental matrix corresponding to a pair of camera matrices P and P′ if and only if P′ᵀ F P is skew-symmetric.
That, to me, is gibberish. (I looked up skew-symmetric - it means the inverse is its negative. I have no idea how that is relevant to anything.)
Now, here is the source code given (source):
[U,S,V] = svd(F);            % singular value decomposition F = U*S*V'
e = U(:,3);                  % left null vector of F (third column of U): the epipole in the second image
P = [-vgg_contreps(e)*F e];  % second camera, built from F and the epipole (H&Z's P' = [[e']_x F | e']); the first camera is [I | 0]
This is also a mystery.
So what I want to know is, how does one explain the other? Getting that code from that statement seems like black magic. How would I, or anyone, figure out that "A non-zero matrix F is the fundamental matrix corresponding to a pair of camera matrices P and P′ if and only if P′ᵀ F P is skew-symmetric" means what the code is telling you to do, which is basically
'Take the singular value decomposition. Take the first matrix. Take the third column of that. Perform some weird re-arrangement of its values. That's your answer.' How would I have come up with this code on my own?
Can someone explain to me the section 9.5.3 and this code in plain English?
Aha, that "PTFP" is actually something I have also wondered about and could not find the answer in literature. However, this is what I figured out:
The 4x4 skew-symmetric matrix you are mentioning is not just any matrix. It is actually the dual Plücker Matrix of the baseline (see also https://en.wikipedia.org/wiki/Pl%C3%BCcker_matrix). In other words, it only gives you the line on which the camera centers are located, which is not useful for reconstruction tasks as such.
The condition you mention is identical to the more popularized fact that the fundamental matrix for views 1 & 0 is the negative transpose of the fundamental matrix for views 0 & 1 (I use MATLAB/Octave syntax in what follows).
Consider first that the fundamental matrix maps a point x0 in one image to a line l1 in the other:
l1=F*x0
Next, consider that the transpose of the projection matrix back-projects a line l1 in the image to a plane E in space:
E=P1'*l1
(I find this beautifully simple, yet it is understated in most geometry / computer vision classes.)
Now, I will use a geometric argument: two lines are corresponding epipolar lines iff they correspond to the same epipolar plane, i.e. the back-projection of either line gives the same epipolar plane. Algebraically:
E=P0'*l0
E=P1'*l1
thus (the important equation)
P0'*l0=P1'*l1
Now we are almost there. Let's assume we have a 3D point X and its two projections
x0=P0*X
x1=P1*X
and the epipolar lines
l1=F*x0
l0=-F'*x1
Substituting these into the important equation, we have, for all X,
P0'*-F'*P1*X=P1'*F*P0*X
and finally
P0'*-F'*P1=P1'*F*P0
As you can see, the left-hand side is the negative transpose of the right-hand side, so this matrix is a skew-symmetric 4x4 matrix.
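You can check this numerically in a few lines (a NumPy sketch; the random finite cameras stand in for P0 and P1, and F is built from them with the standard formula F = [e1]_x * P1 * pinv(P0)):

import numpy as np

def cross_mat(v):
    # cross-product (skew-symmetric) matrix [v]_x of a 3-vector
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

rng = np.random.default_rng(0)
P0 = rng.standard_normal((3, 4))
P1 = rng.standard_normal((3, 4))

C0 = np.linalg.svd(P0)[2][-1]                 # centre of camera 0: right null vector of P0
e1 = P1 @ C0                                  # epipole in view 1
F = cross_mat(e1) @ P1 @ np.linalg.pinv(P0)   # fundamental matrix of the pair (P0, P1)

A = P1.T @ F @ P0
print(np.allclose(A, -A.T))                   # True: P1'*F*P0 equals its negative transpose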
I also published these thoughts in Section II B (towards the end of the paragraph) in the following paper. It should also explain why this matrix is a representation of the baseline.
Aichert, André, et al. "Epipolar consistency in transmission imaging." IEEE Transactions on Medical Imaging 34.11 (2015): 2205-2219.
https://www.ncbi.nlm.nih.gov/pubmed/25915956
Final note to @john ktejik: skew-symmetry means that a matrix is identical to its negative transpose (not that its inverse is its negative).
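For completeness, here is a NumPy transcription of the MATLAB snippet from the question (a sketch; I am assuming vgg_contreps(e) is, up to sign, the cross-product matrix [e]_x, and the sign does not matter because either choice reproduces F up to scale):

import numpy as np

def skew(v):
    # cross-product matrix [v]_x of a 3-vector
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def cameras_from_F(F):
    U, S, Vt = np.linalg.svd(F)
    e = U[:, 2]                                     # left null vector of F: the epipole in the second image
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera, the canonical [I | 0]
    P1 = np.hstack([skew(e) @ F, e.reshape(3, 1)])  # second camera, H&Z's P' = [[e']_x F | e']
    return P0, P1

Together with triangulation of point correspondences, this pair gives a projective reconstruction; upgrading it to a metric one requires calibration information.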

Locally weighted logistic regression

I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a (1+d x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function based on the B for that point. To get B, use the Newton-Raphson algorithm recursively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson algorithm itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a (1+d x N) matrix, which doesn't match up with B, which is (1+d x 1). The only way that would work, then, is if e were a (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's Constant, and it just so happens that that ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest, to me, that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, or a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w and form the diagonal of the weight algorithm W.
So, my two main questions here are: what is e on page 67, and is it the (N x 1) vector I need? And how does pi' in equation 4-5 end up being a single number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
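For what it's worth, here is a minimal Python/NumPy sketch of that idea, i.e. a locally weighted Newton-Raphson (IRLS) update for logistic regression; the Gaussian kernel, the regularisation term, and all names are my own illustrative choices, not Deng's notation:

import numpy as np

def locally_weighted_logistic(X, y, x_query, tau=1.0, n_iter=10, reg=1e-6):
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])                              # intercept -> (1+d) columns
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))  # locality weights
    beta = np.zeros(d + 1)                                            # the (1+d x 1) vector B
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))                          # sigmoid, i.e. pi
        W = np.diag(w * p * (1 - p))                                  # locality weight times pi*(1-pi)
        grad = Xb.T @ (w * (y - p))                                   # (y - p) plays the role of the error "e"
        H = Xb.T @ W @ Xb + reg * np.eye(d + 1)                       # weighted Hessian, ridge-regularised
        beta += np.linalg.solve(H, grad)                              # Newton-Raphson step
    return beta

# probability at the query point:
# p_query = 1.0 / (1.0 + np.exp(-(np.r_[1.0, x_query] @ beta)))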
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.

Prescribing strange boundary conditions

Does anyone know how to prescribe boundary conditions like u[t,0,y]==u[t,1,1-y] in Mathematica using NDSolve? It always complains that the arguments of the dependent variable should literally match the independent variables.
Thanks in advance.
This symmetry condition can probably be recast in the form Derivative[0,1][u][x,1/2]==0. Of course, more information on the problem would be helpful.
Edit in response to rcollyer:
The algebraic identity f(x)=f(1-x) for all x in (0,1) implies a geometric symmetry: the graph of f will be symmetric about the line x=1/2. Now draw the graph of such a function; if it is differentiable, you will find that f'(1/2)=0.
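Explicitly: differentiating f(x) = f(1-x) gives f'(x) = -f'(1-x), and setting x = 1/2 yields f'(1/2) = -f'(1/2), hence f'(1/2) = 0.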
Now, I don't know for sure that the OP's problem can be recast this way; it rather depends on the specifics of the problem. This situation frequently arises when dealing with PDEs on the disk, where u is a function of the polar coordinates r and theta. If the disk represents a clamped drum, then perhaps you've got u(1,t)=0. But what about u(0,t)? If the function is symmetric and smooth, then u_r(0,t)=0 is a reasonable condition.

Efficient evaluation of hypergeometric functions

Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1) where a, b, c, and d are all positive reals and c+d > a+b+1. There are many special cases that have a closed-form formula, but as far as I know there are no such formulas in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Pade approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
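For anyone who wants to repeat that experiment, here is a small Python sketch of the partial sums and Aitken's delta-squared transform (the parameter values are placeholders satisfying c + d > a + b + 1):

a, b, c, d = 0.3, 1.2, 2.1, 1.7

# Partial sums of 3F2(a, b, 1; c, d; 1); the term ratio is (a+n)(b+n)/((c+n)(d+n)),
# since the (1+n) from the third upper parameter cancels against n!.
s, term, partial = 0.0, 1.0, []
for n in range(200):
    s += term
    partial.append(s)
    term *= (a + n) * (b + n) / ((c + n) * (d + n))

def aitken(seq):
    # Aitken delta-squared transform of a sequence of partial sums
    return [seq[n] - (seq[n + 1] - seq[n]) ** 2 / (seq[n + 2] - 2 * seq[n + 1] + seq[n])
            for n in range(len(seq) - 2)]

print(partial[-1], aitken(partial)[-1])   # raw tail vs. accelerated tail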
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5, where convergence is rapid, to get an initial value, and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, because z = 1 is a singular point of the hypergeometric equation (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
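Neither approach was spelled out with code in the thread, but as a practical cross-check: if Python is available, mpmath evaluates 3F2 at z = 1 directly and to arbitrary precision, which at least provides reference values for testing an implementation (the parameters below are placeholders):

import mpmath as mp

mp.mp.dps = 30                             # working precision: 30 decimal digits
a, b, c, d = mp.mpf('0.3'), mp.mpf('1.2'), mp.mpf('2.1'), mp.mpf('1.7')
print(mp.hyp3f2(a, b, 1, c, d, 1))         # 3F2(a, b, 1; c, d; 1)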
Is it correct that you want to sum a series where you know the ratio of successive terms and it is a rational function?
I think Gosper's algorithm and the rest of the tools for proving (and finding) hypergeometric identities do exactly this, right? (See Wilf and Zeilberger's A=B book online.)
