I'm following this paper to implement an Attentive Pooling Network to build a question answering system. In chapter 2.1, it describes the CNN layer:
where q_emb is a question where each token (word) has been embedded using word2vec. q_emb has shape (d, M). d is the dimension of the word embedding and M the length of the question. In a similar way, a_emb is the embedding of the answer with shape (d, L).
My question is: how is the convolution done, and how is it possible that W_1 and b_1 are the same for both operations? In my opinion at least b_1 should have a different dimension in each case (and it should be a matrix, not a vector).
At the moment I've implemented this operation in PyTorch:
### Input is a tensor of shape (batch_size, 1, M or L, d*k)
conv2 = nn.Conv2d(1, c, (d*k, 1))
I find that the authors of the paper are trusting the readers to assume/figure out a lot of things here. From what I read, here is what I could gather:
W1 should be a 1 X dk matrix because that is the only shape that would make sense in order to get Q as c X M matrix.
Assuming this, b1 need not be a matrix. From the above, you could get a c X 1 X M matrix, which could easily be reshaped to a c X M matrix, and b1 could be a c X 1 vector that is broadcast and added to the rest of the matrix.
Since c, d and k are hyperparameters, you could easily have the same W1 and b1 for both Q and A.
This is what I think so far; I will re-read and edit in case anything's amiss.
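To make the shapes concrete, here is a minimal plain-Python sketch of the reading above: c filters, each a row of length d*k, plus one bias per filter, applied to every window of k consecutive token embeddings. It is not taken from the paper. The name conv_layer, the zero padding at the sequence ends, the odd window size k, and storing the embeddings token by token (the transpose of the (d, M) layout) are all assumptions made for illustration.

```python
import random


def conv_layer(emb, W1, b1, k):
    """Apply one convolution layer to a sequence of token embeddings.

    emb: list of tokens, each a list of d floats (length M or L).
    W1:  c rows, each of length d*k (one row per filter).
    b1:  c biases, one per filter, reused at every position.
    Returns a c x len(emb) matrix as a list of rows.
    """
    d = len(emb[0])
    pad = k // 2  # assumption: odd k, so each window is centred on its token
    zero = [0.0] * d
    padded = [zero] * pad + emb + [zero] * pad  # assumption: zero padding
    out = [[0.0] * len(emb) for _ in W1]
    for m in range(len(emb)):
        # Window vector: k consecutive embeddings concatenated (length d*k).
        z = [v for tok in padded[m:m + k] for v in tok]
        for j, (w_row, b) in enumerate(zip(W1, b1)):
            out[j][m] = sum(w * x for w, x in zip(w_row, z)) + b
    return out


d, k, c = 4, 3, 5  # hyperparameters (illustrative values)
M, L = 6, 9        # question and answer lengths
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(d * k)] for _ in range(c)]
b1 = [rng.uniform(-1, 1) for _ in range(c)]
q_emb = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(M)]
a_emb = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]

Q = conv_layer(q_emb, W1, b1, k)  # c x M
A = conv_layer(a_emb, W1, b1, k)  # c x L, same W1 and b1
print(len(Q), len(Q[0]), len(A), len(A[0]))  # 5 6 5 9
```

Neither W1 nor b1 depends on M or L, which is why the same parameters can serve both the question and the answer. In PyTorch, a single nn.Conv1d(d, c, k, padding=k // 2) applied to inputs of shape (batch, d, M) and (batch, d, L) should play the same role, although its weights are laid out as (c, d, k) rather than (c, d*k).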
I would like to solve problems combining boolean and integer logic in linear arithmetic with a SAT/SMT solver. At first glance, Z3 seems promising.
First of all, is it at all possible to solve the following problem? This answer makes it seem like it works.
int x,y,z
boolean A,B,C
( (3x + y - 2z >= 10) OR (A AND (NOT B OR C)) OR ((A == C) AND (x + y >= 5)) )
If so, how does Z3 solve this kind of problem in theory and is there any documentation about it?
I could think of two ways to solve this problem. One would be to convert the Boolean operations into a linear integer expression. Another solution I read about is to use the Nelson-Oppen Combination Method described in [Kro 08].
I found corresponding documentation in chapter 3.2.2, Solving Arithmetical Fragments, where Table 1 lists the algorithms implemented for each logic.
Yes, SMT solvers are quite good at solving problems of this sort. Your problem can be expressed using z3's Python interface like this:
from z3 import *
x, y, z = Ints('x y z')
A, B, C = Bools('A B C')
solve(Or(3*x + y - 2*z >= 10,
         And(A, Or(Not(B), C)),
         And(A == C, x + y >= 5)))
This prints:
[A = True, z = 3, y = 0, B = True, C = True, x = 5]
giving you a (not necessarily "the") model that satisfies your constraints.
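A model like this can be checked without the solver by plugging the values back into the original formula. The following plain-Python sketch does that for the model printed above; the function name formula is only illustrative.

```python
def formula(x, y, z, A, B, C):
    """The constraint from the question, written with plain Python operators."""
    return ((3 * x + y - 2 * z >= 10)
            or (A and ((not B) or C))
            or ((A == C) and (x + y >= 5)))


# The model reported by z3 above.
model = dict(A=True, z=3, y=0, B=True, C=True, x=5)
print(formula(**model))  # True
```

Here the arithmetic disjunct is false (3*5 + 0 - 2*3 = 9, which is less than 10), and the model satisfies the formula through the second disjunct, A AND (NOT B OR C).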
SMT solvers can deal with integers, machine words (i.e., bit-vectors), reals, along with many other data types, and there are efficient procedures for combinations of linear-integer-arithmetic, booleans, uninterpreted-functions, bit-vectors amongst many others.
See http://smtlib.cs.uiowa.edu for many resources on SMT solving, including references to other work. Any given solver (e.g., z3, yices, cvc) is a collection of various algorithms, heuristics and tactics. It's hard to compare them directly, as each shines in its own way for certain sublogics, but for the base set of linear-integer arithmetic, booleans, and bit-vectors, they should all perform fairly well. Looks like you already found some good references, so you can do further reading as necessary; though for most end users it's neither necessary nor that important to know how an SMT solver works internally.
I'm puzzled by what I think is a mistake in a partial derivative I'm having Mathematica do for me.
Specifically, this is what I have:
Derivative I'd like to take
I'm trying to take the partial derivative of the following w.r.t. the variable θ (apologies for the formatting):
f=(1/4)(-4e((1+θ)/2)ψ+eN((1+θ)/2)ψ+eN((1+θ)/2-θd)ψ)-s
But the solution Mathematica produces seems very different from the one I get when I take the derivative myself. While Mathematica says the partial derivative of f w.r.t. θ is:
(1/4)eψ(N-2)
By hand, I get and am quite confident the correct answer is instead:
(1/4)eψ(N(1-d)-2)
That is, Mathematica is producing something that drops the variable d when it is differentiating. I've explored different functions that take a derivative in Mathematica, and the possibility that maybe some of the variables I'm using (such as d) might be protected or otherwise special, but I can't say that I know why the answer's so off. This is the first time in the notebook that d appears, so it is not set to 0. For context, I'm trying to confirm that the derivative of the function is positive for values of the variables in certain ranges, and we have d>0 and d<(1/2). Doing this all by hand works but I'm trying to confirm with Mathematica as I will be dealing with more complicated functions and need to make sure I'm having Mathematica produce the right derivatives.
You didn't add spaces in eN and θd, so Mathematica treats each of them as a single two-character symbol rather than a product.
Adding spaces between them gives your expected result:
f[θ,e,N,ψ,d,s] = (1/4) (-4 e ((1+θ)/2) ψ + e N ((1+θ)/2) ψ + e N ((1+θ)/2 - θ d) ψ) - s;
D[f[θ, e, N, ψ, d, s], θ] // FullSimplify
(* 1/4 e (-2 + N - d N) ψ *)
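For reference, the same result can be obtained by hand, term by term. The derivation below assumes the intended reading of the expression, with e N and θ d as products:

```latex
\begin{aligned}
f &= \tfrac{1}{4}\left(-4e\,\tfrac{1+\theta}{2}\,\psi
      + eN\,\tfrac{1+\theta}{2}\,\psi
      + eN\left(\tfrac{1+\theta}{2} - \theta d\right)\psi\right) - s \\
\frac{\partial f}{\partial\theta}
  &= \tfrac{1}{4}\left(-2e\psi + \tfrac{1}{2}eN\psi
      + eN\left(\tfrac{1}{2} - d\right)\psi\right) \\
  &= \tfrac{1}{4}\,e\psi\,\bigl(N - dN - 2\bigr)
   = \tfrac{1}{4}\,e\psi\,\bigl(N(1-d) - 2\bigr)
\end{aligned}
```

If θd is read as a single symbol instead, it is a constant with respect to θ and the -d term drops out, which is consistent with the output that was missing d.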
I'm doing matrix math in Go using mat64. I have a matrix equation I want to solve, something like: (a * b + c) / (d - e) where a, b, c, d, and e are all matrices with real numbers as elements.
mat64 implements matrix math functions as methods. So, if you wanted to multiply matrix a by b, you'd do something like:
// Multiply a by b:
new := mat64.NewDense(x, y, nil)
new.Mul(a, b)
However, this method becomes unwieldy when you're looking at more complex equations with a whole bunch of steps such as my example above.
So, is there any way to invoke these routines (or methods in Go in general) without using receivers, forcing me to create a boatload of temporary matrices in order to solve a more complex equation, or am I stuck doing this the ugly way?
I have a question regarding an algorithm:
We have a fixed point in 2D space, let's call it S(x,y), and the lengths of two links (L1 and L2). The two links are connected at a common joint called E(x,y). The end point of L2 is another point in the space, which we call F(x,y).
So L1 has the two end points S and E, whereas L2 has E and F.
Given a point P(x,y) in space, how can we find the coordinates of the F(x,y) that is closest to P? I want to find the angles θ1 and θ2 that take the links L1 and L2 to that point.
See this link to get the graphical representation of my problem
See this pic http://postimage.org/image/qlekcv1qz/, where you will be able to see the real problem I have right now.
So I have formulated this as an optimization problem, where the objective function is:
* arg min |P-F|
with constraints on θ1 and θ2, where θ1 ∈ [0, π] and θ2 ∈ [0, π/2].
So we have,
* xE = xS + L1 * cos(θ1) and yE = yS + L1 * sin(θ1)
* xF = xE + L2 * cos(θ1 + θ2) and yF = yE + L2 * sin(θ1 + θ2)
Here the lengths are L1 = 105 and L2 = 113.7, and point S is the origin, i.e. xS = 0 and yS = 0.
Can you give a hint on how to code up my function, or any optimization routine that gives me the values of θ1 and θ2 such that the distance between point F and point P is minimized?
So if I understand correctly, your description is equivalent to having two rigid rods of length L1 and L2, with one end of L1 fixed at S, the other end connected to L2 by a flexible joint (at some undefined point E), and you want to get the other end of L2 (point F) as close to some point P as possible. If this is the case then:
If |L1-L2| < |P-S| < |L1+L2| then F = P
If |L1-L2| > |P-S| then F = S + (P-S)*|L1-L2|/|P-S|
If |P-S| > |L1+L2| then F = S + (P-S)*|L1+L2|/|P-S|
Is that what you want?
See image
http://postimage.org/image/l1ktt0qtb/
If point P is closer to point S than the distance |L1-L2| (assuming they are unequal), then point F cannot 'reach' point P, even with the angle at E bent to 180 degrees. Then the closest you can get is somewhere on the circle with radius |L1-L2| and centre S. In this case the best F is given by the vector with direction (P-S) and magnitude |L1-L2|, my case 2 above and Figure A below. Note that if L1=L2 this will never be the case.
If point P is further from point S than the distance |L1+L2|, then point F cannot 'reach' point P, even with the angle at E straightened to 0 degrees. Then the closest you can get is somewhere on the circle with radius |L1+L2| and centre S. In this case the best F is given by the vector with direction (P-S) and magnitude |L1+L2|, my case 3 above and Figure B below.
If point P is between the two limiting circles, then there will be two solutions (one as shown in Figure 3 below, and the other with L1 and L2 reflected in the mirror line formed by the vector P-S). In this case the 'best' F is equal to point P.
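The three cases can be written as a short function. This is a minimal Python sketch, not a full solution to the constrained problem: it ignores the limits on θ1 and θ2 from the question, and the function name and the handling of the degenerate case P = S are assumptions.

```python
import math


def closest_reachable_point(S, P, L1, L2):
    """Return the point F the two-link arm can reach that is closest to P.

    S and P are (x, y) tuples; joint-angle limits are NOT taken into account.
    """
    r_min, r_max = abs(L1 - L2), L1 + L2
    dx, dy = P[0] - S[0], P[1] - S[1]
    dist = math.hypot(dx, dy)
    if r_min <= dist <= r_max:
        return P  # P lies between the limiting circles: F = P
    if dist == 0.0:
        # P coincides with S, so every point on the inner circle is
        # equally close; pick one arbitrarily (assumption).
        return (S[0] + r_min, S[1])
    # Otherwise project P onto the nearer limiting circle along P - S.
    r = r_min if dist < r_min else r_max
    return (S[0] + dx * r / dist, S[1] + dy * r / dist)


S = (0.0, 0.0)
print(closest_reachable_point(S, (100.0, 100.0), 105, 113.7))  # reachable
print(closest_reachable_point(S, (300.0, 0.0), 105, 113.7))    # too far
print(closest_reachable_point(S, (0.0, 1.0), 105, 113.7))      # too close
```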
If you want to know the angles Theta1 and Theta 2 then that is a different question (I see you have added that now).
Use the cosine rule for triangles with no right angle.
The rule is
C = acos[(a^2 + b^2 - c^2)/(2ab)]
where a triangle has sides of length a, b, and c, and C is the angle between sides a and b. You are trying to produce a triangle with sides l1, l2, and d=|S-P|, which will be possible as long as no two of the lengths are shorter (in sum) than the third one.
By substituting l1, l2, and d for a, b, and c appropriately you will be able to solve for each of the internal angles, A, B, and C. Then you can use these angles A, B, C plus the angle between the vector P-S and horizontal (call that D perhaps?) to calculate your theta1 and theta2.
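As a rough illustration of that recipe, the Python sketch below computes θ1 and θ2 for a reachable target F, using the angle conventions from the question (θ1 measured from the horizontal, θ2 measured relative to L1). It returns only one of the two mirror-image solutions, the one with θ2 ≥ 0, and it does not enforce the ranges [0, π] and [0, π/2]. The function names are hypothetical.

```python
import math


def joint_angles(S, F, L1, L2):
    """Return (theta1, theta2) placing the end of link 2 at F.

    Picks the solution with theta2 >= 0; angle limits are not enforced.
    """
    dx, dy = F[0] - S[0], F[1] - S[1]
    d = math.hypot(dx, dy)
    if not (abs(L1 - L2) <= d <= L1 + L2):
        raise ValueError("F is not reachable with these link lengths")
    if d == 0.0:
        return 0.0, math.pi  # only possible when L1 == L2; theta1 is arbitrary
    # Cosine rule: interior angle at E (between the links) and at S.
    cos_E = (L1**2 + L2**2 - d**2) / (2 * L1 * L2)
    cos_S = (L1**2 + d**2 - L2**2) / (2 * L1 * d)
    angle_E = math.acos(max(-1.0, min(1.0, cos_E)))
    angle_S = math.acos(max(-1.0, min(1.0, cos_S)))
    D = math.atan2(dy, dx)  # angle of the vector F - S from the horizontal
    return D - angle_S, math.pi - angle_E


def forward(S, theta1, theta2, L1, L2):
    """Forward kinematics from the question; used here to check the result."""
    xE = S[0] + L1 * math.cos(theta1)
    yE = S[1] + L1 * math.sin(theta1)
    return (xE + L2 * math.cos(theta1 + theta2),
            yE + L2 * math.sin(theta1 + theta2))


theta1, theta2 = joint_angles((0.0, 0.0), (0.0, 160.0), 105, 113.7)
print(theta1, theta2, forward((0.0, 0.0), theta1, theta2, 105, 113.7))
```

The mirrored solution is (D + angle_S, -(π - angle_E)). If the angle limits matter, one option is to compute both solutions and keep whichever falls inside the allowed ranges.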
I hope this hasn't been asked before, if so I apologize.
EDIT: For clarity, the following notation will be used: boldface uppercase for matrices, boldface lowercase for vectors, and italics for scalars.
Suppose x0 is a vector, A and B are matrix functions, and f is a vector function.
I'm looking for the best way to do the following iteration scheme in Mathematica:
A0 = A(x0), B0=B(x0), f0 = f(x0)
x1 = Inverse(A0)(B0.x0 + f0)
A1 = A(x1), B1=B(x1), f1 = f(x1)
x2 = Inverse(A1)(B1.x1 + f1)
...
I know that a for-loop can do the trick, but I'm not quite familiar with Mathematica, and I'm concerned that this is not the most efficient way to do it. This is a justified concern, as I would like to define a function u(N) := xN and use it in further calculations.
I guess my questions are:
What's the most efficient way to program the scheme?
Is RecurrenceTable a way to go?
EDIT
It was a bit more complicated than I thought. I'm providing more details in order to obtain a more thorough response.
Before doing the recurrence, I'm having problems understanding how to program the functions A, B and f.
Matrices A and B are functions of the time step dt = 1/T and the space step dx = 1/M, where T and M are the number of points in the {0 < x < 1, 0 < t} region. This is also true for the vector function f.
The dependence of A, B and f on x is rather tricky:
A and B are upper and lower triangular matrices (like a tridiagonal matrix; I suppose we can call them multidiagonal), with defined constant values on their diagonals.
Given a point 0 < xs < 1, I need to determine its representative xn in the mesh (the closest), and then substitute the nth row of A and B with the function v(x) (transposed, of course), and the nth row of f with the function w(x).
Summarizing, A = A(dt, dx, xs, x). The same is true for B and f.
Then I need to do the loop mentioned above, to define u(x) = step[T].
Hope I've explained myself.
I'm not sure if it's the best method, but I'd just use plain old memoization. You can represent an individual step as
xstep[x_] := Inverse[A[x]].(B[x].x + f[x])
and then
u[0] = x0
u[n_] := u[n] = xstep[u[n-1]]
If you know how many values you need in advance, and it's advantageous to precompute them all for some reason (e.g. you want to open a file, use its contents to calculate xN, and then free the memory), you could use NestList. Instead of the previous two lines, you'd do
xlist = NestList[xstep, x0, 10];
u[n_] := xlist[[n + 1]] (* xlist[[1]] is x0, so u[0] gives x0 *)
This will break if n > 10, of course (obviously, change 10 to suit your actual requirements).
Of course, it may be worth looking at your specific functions to see if you can make some algebraic simplifications.
I would probably write a function that accepts A0, B0, x0, and f0, and then returns A1, B1, x1, and f1 - say
step[A0_?MatrixQ, B0_?MatrixQ, x0_?VectorQ, f0_?VectorQ] := Module[...]
I would then Nest that function. It's hard to be more precise without more precise information.
Also, if your procedure is numerical, then you certainly don't want to compute Inverse[A0], as this is not a numerically stable operation. Rather, you should write
A0.x1 == B0.x0+f0
and then use a numerically stable solver to find x1. Of course, Mathematica's LinearSolve provides such an algorithm.
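For readers more comfortable outside Mathematica, the two suggestions above (memoize the steps, and solve the linear system instead of forming the inverse) can be combined as in the Python sketch below. The functions A, B and f are hypothetical 2x2 placeholders, not the matrices from the question, and the small Gaussian-elimination routine only stands in for a library solver such as LinearSolve.

```python
from functools import lru_cache


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def matvec(B, x):
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


# Hypothetical placeholders for the problem-specific functions.
def A(x):
    return [[2.0 + x[0] ** 2, 0.0], [0.0, 2.0 + x[1] ** 2]]


def B(x):
    return [[1.0, 0.0], [0.0, 1.0]]


def f(x):
    return [1.0, 1.0]


X0 = (0.0, 0.0)


@lru_cache(maxsize=None)
def u(n):
    """x_n, where A(x_{n-1}) x_n = B(x_{n-1}) x_{n-1} + f(x_{n-1})."""
    if n == 0:
        return X0
    x = u(n - 1)  # cached, so each step is computed only once
    rhs = [bx + fx for bx, fx in zip(matvec(B(x), x), f(x))]
    return tuple(solve(A(x), rhs))


print(u(1), u(5))
```

Because u is recursive, a very large n on a cold cache can hit Python's recursion limit; calling u for increasing n, or using a plain loop, avoids that.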