I have one tensor A of dimension [a,b,c,d], and another B of dimension [b,b,d,e], and C, a list of [a] integers from 0 to b. I need to produce the tensor D of dimension [a,b,c,e] given by
D[i,j,k,l] = sum for m=0..d of A[i,C[i],k,m] * B[C[i],j,m,l]
b is small enough (3 or 5, usually?) that I don't mind doing this in b separate operations -- but I can't afford the waste by going to something that takes b^2 memory or time, when this operation clearly should be linear in b. This seems like it will be some combination of pointwise multiplies (with broadcasting?) and tensor contractions (a matrix multiply across the common m dimension), but I can't pin it down.
If someone can really convince me that this isn't possible in O(b) flops with tensorflow's provided operations, then okay, but then I'd want an O(b^2) for sure.
Update: It's looking like the appropriately modified A tensors can be built individually using tf.gather_nd; if this can then be paired up with B somehow, maybe? Unfortunately my experiments with this so far led to finding a bug in tf.gather_nd itself, which has slowed things down.
I figured out how to accomplish this, reasonably efficiently. First build a modified version of B with tf.gather, with the appropriate parts in the first index:
B2 = tf.gather(B, C)
Then pull out just the relevant parts of the A tensor using tf.gather_nd. We're going to pull out a bunch of pairs of indices of the form [0,C[0]], [1,C[1]], [2,C[2]]... and so on, so first we need to build the index tensor.
a = tf.shape(A)[0]
A2_indices = tf.stack([tf.range(a), C], axis=1)
A2 = tf.gather_nd(A, A2_indices)
producing A2 with shape [a,c,d]. Now we need to multiply A2 and B2 appropriately. It's a tensor contraction in the m indices (2 and 3, respectively) but a pointwise multiplication in the i index (0 in both). This means that, sadly, the operation we want is neither a pure tensor contraction nor a pure pointwise multiplication! One option would be computing the full tensor product, contracting only over m, and then taking the diagonal over the two i indices -- but this would waste a lot of computation building the rest of a matrix that we don't need.
Instead, we can think of this as a batched matrix multiplication: this used to be called tf.batch_matmul but now it's just tf.matmul. This has the caveat, though, that besides the 2 matrix dimensions in each input tensor, all the remaining dimensions have to behave pointwise, as batch dimensions. B2 fails this criterion, because it has the additional j index. But we can "wrap that in" with the l output dimension, and then remove it later. This means first calling tf.transpose to put j and l next to each other, then tf.reshape to merge them into one combined dimension of size b*e, then doing tf.matmul, then another tf.reshape and tf.transpose to return to the original form. So
a, b, d, e = B2.get_shape().as_list()
B2_trans = tf.transpose(B2, perm=[0,2,1,3])
B2_jl = tf.reshape(B2_trans, [a,d,b*e])
product_jl = tf.matmul(A2, B2_jl)
product_trans = tf.reshape(product_jl, [a,c,b,e])
result = tf.transpose(product_trans, perm=[0,2,1,3])
Which finishes it up! Of course in practice it may well be that B is only needed in this one instance, in which case it may be that B can start out already in the "compressed" state, saving a transpose (and a cheap reshape); or if A2 is going to be flattened or transposed anyway then it could also save a transpose. But overall everything is pretty minimal complexity. :)
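The whole recipe can be sanity-checked against the defining formula. Here is a NumPy sketch of the same steps (an assumption of this check, not part of the TensorFlow code: NumPy fancy indexing stands in for tf.gather/tf.gather_nd, and @ for the batched tf.matmul):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, d, e = 4, 3, 5, 6, 2
A = rng.standard_normal((a, b, c, d))
B = rng.standard_normal((b, b, d, e))
C = rng.integers(0, b, size=a)

# The recipe from the answer, step by step:
B2 = B[C]                                  # tf.gather(B, C): (a, b, d, e)
A2 = A[np.arange(a), C]                    # tf.gather_nd with pairs [i, C[i]]: (a, c, d)
B2_trans = B2.transpose(0, 2, 1, 3)        # put j next to l: (a, d, b, e)
B2_jl = B2_trans.reshape(a, d, b * e)      # merge j and l into one axis
product_jl = A2 @ B2_jl                    # batched matmul: (a, c, b*e)
D = product_jl.reshape(a, c, b, e).transpose(0, 2, 1, 3)   # back to (a, b, c, e)

# Direct evaluation of D[i,j,k,l] = sum_m A[i,C[i],k,m] * B[C[i],j,m,l]
D_ref = np.einsum('ikm,ijml->ijkl', A2, B2)
assert D.shape == (a, b, c, e) and np.allclose(D, D_ref)
```

Note the O(b) character of the recipe: the batched matmul touches each j slice once, with no b-by-b intermediate.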
Given that -
A = fn (B, C, D)
Where fn could be any function, which may involve simple or complex calculations.
My need is to calculate the possible values of A, B, C and D at run time, based on their current values (if available).
Let's take an example to understand it better. Suppose -
A = B + C * D
Now, if B=2, C=3 and D=5 then A = 17
if B=1 to 2, C=1 to 5 and D=5, then A = 6 to 27
if C=10 to 20, B=100, D=1 to 10, then A = 110 to 300
Similarly based on possible values of B, C and D we can calculate possible values of A.
Now I need to do the same for B, C and D as well, i.e. if I know the values of A, C and D then I should be able to tell the possible values of B (keeping in mind that there is no way to directly know some B = fn2(A, C, D), and that fn may involve more than just mathematical calculations).
One way I know is to pre-calculate the data in a database for all possible values and then filter it based on the available values (assuming storage is not a problem).
What are other possible ways to achieve this with minimum response time?
Basically, what you want to do is find the maximum and minimum of fn, that is, solve a constrained optimization problem: first you look for a minimum of fn given the constraints (ranges for B, C and D), then you minimize -fn on the same domain.
Luckily, you have only 3 variables, so this should not be a problem. But the speed of an algorithm depends on how much information about the function you have. Ideally, you should be able to calculate the Hessian, though knowing just the gradient would suffice. Finally, if you don't know the gradient, you can still approximate it using finite differences.
If you don't know the optimized function in advance, but know its symbolic representation (formula) in terms of basic operations (like +, -, etc.) and elementary functions (like exp, log, etc.), you can do symbolic differentiation to obtain a formula for the gradient (and the Hessian).
I'm not a specialist when it comes to optimization, but I think projected methods (like projected gradient descent or the projected Newton method) will work. Also, the interior point method may be useful, but I'm not familiar with it.
Assumptions have been made:
Your function is continuous.
Moreover, your function is "sane". There are instances of functions with weird geometry that are quite hard to optimize.
Your function is of real arguments. If that's not the case then, most likely, for a "sane" function, the optimum will be at one of the integer-valued neighbors of the real-valued optimum. Though, that's not guaranteed.
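For the toy fn = B + C*D from the question, even a brute-force grid scan (no gradients or Hessians needed) reproduces the stated ranges. A minimal Python sketch; the function name, the dict-of-ranges interface and the grid resolution are all arbitrary choices of this illustration, and a coarse grid can miss interior extrema of a badly behaved fn:

```python
from itertools import product

def possible_range(fn, ranges, steps=50):
    """Approximate (min, max) of fn over a box of per-variable ranges
    by evaluating it on a coarse grid (the grid includes the endpoints)."""
    names = list(ranges)
    grids = [[lo + (hi - lo) * i / steps for i in range(steps + 1)]
             for lo, hi in (ranges[n] for n in names)]
    vals = [fn(**dict(zip(names, pt))) for pt in product(*grids)]
    return min(vals), max(vals)

fn = lambda B, C, D: B + C * D
print(possible_range(fn, {'B': (1, 2), 'C': (1, 5), 'D': (5, 5)}))  # (6.0, 27.0)
```

For expressions monotone in each variable, like B + C*D on positive ranges, the extrema sit at the corners of the box, so steps=1 would already be exact; the finer grid is only a hedge against non-monotone fn.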
Assume that we are working with a language which stores arrays in column-major order. Assume also that we have a function which uses 2-D array as an argument, and returns it.
I'm wondering whether you can claim that it is (or isn't) in general beneficial to transpose this array when calling the function, in order to work with column-wise operations instead of row-wise operations, or does the transposing negate the benefits of the column-wise operations?
As an example, in R I have an object of class ts named y which has dimension n x p, i.e. I have p time series of length n.
I need to make some computations with y in Fortran, where I have two loops with following kind of structure:
do i = 1, n
do j= 1, p
!just an example, some row-wise operations on `y`
x(i,j) = a*y(i,j)
D = ddot(m,y(i,1:p),1,b,1)
! ...
end do
end do
As Fortran (like R) uses column-major storage, it would be better to make the computations with a p x n array instead. So instead of
out<-.Fortran("something",y=array(y,dim(y)),x=array(0,dim(y)))
ynew<-out$y
x<-out$x
I could use
out<-.Fortran("something2",y=t(array(y,dim(y))),x=array(0,dim(y)[2:1]))
ynew<-t(out$y)
x<-t(out$x)
where Fortran subroutine something2 would be something like
do i = 1, n
do j= 1, p
!just an example, some column-wise operations on `y`
x(j,i) = a*y(j,i)
D = ddot(m,y(1:p,i),1,b,1)
! ...
end do
end do
Does the choice of approach always depend on the dimensions n and p or is it possible to say one approach is better in terms of computation speed and/or memory requirements? In my application n is usually much larger than p, which is 1 to 10 in most cases.
More of a comment, but I wanted to put in a bit of code: under old-school F77 you would essentially be forced to use the second approach, as
y(1:p,i)
is simply a pointer to y(1,i), with the following p values contiguous in memory.
the first construct
y(i,1:p)
is a list of values interspaced in memory, so it seems to require making a copy of the data to pass to the subroutine. I say "it seems" because I haven't the foggiest idea how a modern optimizing compiler deals with these things. I tend to think at best it's a wash; at worst this could really hurt. Imagine an array so large you need to page swap to access the whole vector.
In the end the only way to answer this is to test it yourself
----------edit
did a little testing and confirmed my hunch: passing rows y(i,1:p) does cost you vs passing columns y(1:p,i). I used a subroutine that does practically nothing to see the difference. My guess is that with any real subroutine the hit is negligible.
Btw (and maybe this helps understand what goes on) passing every other value in a column
y(1:p:2,i) takes longer (orders of magnitude) than passing the whole column, while passing every other value in a row cuts the time in half vs. passing a whole row.
(using gfortran 12..)
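The stride effect behind these timings is easy to see from NumPy too, which can allocate column-major arrays; a small sketch (nothing here is Fortran-specific beyond the memory order):

```python
import numpy as np

# A column-major ("Fortran-ordered") 2-D array, stored as R and Fortran store it.
y = np.asfortranarray(np.arange(12.0).reshape(4, 3))

col = y[:, 1]   # one column: unit stride (8 bytes for float64), contiguous in memory
row = y[1, :]   # one row: stride of 4 elements (32 bytes), not contiguous

assert col.flags['F_CONTIGUOUS'] and col.strides == (8,)
assert not row.flags['C_CONTIGUOUS'] and row.strides == (32,)
```

A contiguous column slice can be handed to compiled code as a bare pointer; the strided row slice is what forces the copy (or slow strided access) discussed above.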
I need to represent the graph like this:
Graph = graph([Object1,Object2,Object3,Object4],
[arc(Object1,Object2,connected),
arc(Object2,Object4,connected),
arc(Object3,Object4,connected),
arc(Object1,Object3,connected),
arc(Object2,Object3,parallel),
arc(Object1,Object4,parallel),
arc(Object2,Object3,similar_size),
arc(Object1,Object4,similar_size)])
I have no restriction for code, however I'd stick to this representation as it fits all the other structures I've already coded.
What I mean is an undirected graph in which the vertices are some objects and the edges represent undirected relations between them. To give you more background, in this particular example I'm trying to represent a rectangle, so the objects are its four edges (segments). Those segments are represented in the same way with use of vertices, and so on. The point is to build a hierarchy of graphs which would represent constraints between objects on the same level.
The problem lies in the representation of edges. The most obvious way to represent an arc (a,b) would be to put both (a,b) and (b,a) in the program. This however floods my program with redundant data exponentially. For example, with vertices a,b,c,d I can build segments (a,b),(a,c),(a,d),(b,c),(b,d),(c,d). But I also get (b,a),(c,a), and so on. At this point it's not a problem. But later I build a rectangle. It can be built from segments (a,b),(b,c),(c,d),(a,d), and I'd like to get the answer - there's one rectangle. You can calculate, however, how many combinations of this one rectangle I get. It also takes too much time to compute, and obviously I don't want to stop at the rectangle level.
I thought about sorting the elements. I can sort the vertices within a segment. But if I want to sort the segments within a rectangle, the constraints are no longer valid - the graph becomes directed. For example, taking into consideration the first two relations, let's say we have arcs (a,b) and (a,c). If arcs are not sorted the program answers as I want it to: arc(b,a,connected),arc(a,c,connected) with match Object1=b, Object2=a, Object4=c. If I sort the elements it's no longer valid, as I cannot have both arc(b,a,connected) and arc(a,b,connected) tried out - only the second one. I'd stick with the sorting but I have no idea how to solve this last issue.
Hopefully I've stated all of this quite clearly. I'd prefer to stay as close as possible to the representation and ideas I already have, but completely new ones are also very welcome. I don't expect an exact answer; rather, pointing me in the right direction or suggesting something specific to read would help, as I'm quite new to Prolog and maybe this problem is not as uncommon as I think.
I've been trying to solve this since yesterday and couldn't come up with any easy answer. I looked at some discrete math and common undirected graph representations like adjacency lists. Let me know if anything is unclear - I'll try to provide more details.
Interesting question, although a bit broad, since it is not stated what you actually want to do with the arcs, rectangles, etc.; a representation may be efficient (in time/space/elegance) only for certain uses. In any case, here are some ideas:
Sorting
the obvious issue is the one you mentioned; you can solve it by introducing a clause that succeeds if the sorted pair exists:
arc(X,Y):-
arc_data(X,Y)
; arc_data(Y,X).
note that you should not do something like:
arc(a,b).
arc(b,c).
arc(X,Y):-
arc(Y,X).
since this will result in an infinite loop if the arc does not exist.
you could however only check if the first arg is larger than the second:
arc(a,b).
arc(b,c).
arc(X,Y):-
compare(>,X,Y),
arc(Y,X).
This approach will not resolve the multiple solutions that may arise due to having an arc represented in two ways.
The easy fix would be to only check for one solution where only one solution is expected using once/1:
3 ?- arc(X,Y).
X = a,
Y = b ;
X = b,
Y = a.
4 ?- once(arc(X,Y)).
X = a,
Y = b.
Of course you cannot do this when there could be multiple solutions.
Another approach would be to enforce further abstraction: at the moment, when you have two points (a, b), you can create the arc (arc(a,b) or arc(b,a)) after checking that those points are connected. Instead, you should create the arc through a predicate (which could also check whether the points are connected). The benefit is that you no longer get involved in the representation of the arc directly and can thus enforce sorting (yes, it's basically object orientation):
cv_arc(X,Y,Arc):-
( arc(X,Y),
Arc = arc(X,Y))
; ( arc(Y,X),
Arc = arc(Y,X)).
(assuming as a database arc(a,b)):
6 ?- cv_arc(a,b,A).
A = arc(a, b).
7 ?- cv_arc(b,a,A).
A = arc(a, b).
8 ?- cv_arc(b,c,A).
false.
Of course you would need to follow a similar principle for the rest of the objects; I assume that you are doing something like this to find a rectangle:
rectangle(A,B,C,D):-
arc(A,B),
arc(B,C),
arc(C,D),
arc(D,A).
besides the duplicates due to the arcs (which are resolved), this would recognise ABCD, DABC, etc. as different rectangles:
28 ?- rectangle(A,B,C,D).
A = a,
B = b,
C = c,
D = d ;
A = b,
B = c,
C = d,
D = a ;
A = c,
B = d,
C = a,
D = b ;
A = d,
B = a,
C = b,
D = c.
We will do the same again:
rectangle(rectangle(A,B,C,D)):-
cv_arc(A,B,AB),
cv_arc(B,C,BC),
compare(<,AB,BC),
cv_arc(C,D,CD),
compare(<,BC,CD),
cv_arc(D,A,DA),
compare(<,CD,DA).
and running with arc(a,b). arc(b,c). arc(c,d). arc(a,d).:
27 ?- rectangle(R).
R = rectangle(a, b, c, d) ;
false.
Note that we did not re-order the rectangle if the arcs were in the wrong order; we simply failed it. This way we avoided duplicate solutions (if we ordered them and accepted the result as a valid rectangle, we would get the same rectangle four times), but the time spent finding the rectangle increases.
We reduced the overhead by stopping the search at the first arc that is out of order instead of creating the whole rectangle. The overhead would also be reduced if the arcs are already ordered (since the first match would be ordered). On the other hand, compared with the complexity of searching for all rectangles this way, the overhead is not that significant. Also, it only applies if we want just the first rectangle; should we want more solutions, or to ensure that there are no other solutions, Prolog will search the whole tree, whether it reports the solutions or not.
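The counting argument here (one rectangle, many orderings) can be prototyped outside Prolog. Below is a Python sketch of the canonical-form idea: an undirected arc is stored once, and a rectangle keeps only the lexicographically smallest of its rotations and reflections. The helper names and the frozenset encoding are assumptions of this sketch, not part of the Prolog code above:

```python
from itertools import permutations

# undirected edges stored once, as frozensets (so (a,b) == (b,a))
arcs = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]}

def is_rectangle(cycle):
    a, b, c, d = cycle
    return all(frozenset(e) in arcs for e in [(a, b), (b, c), (c, d), (d, a)])

def canonical(cycle):
    # canonical form: lexicographically smallest among all rotations
    # of both directions of the cycle
    cands = []
    for seq in (tuple(cycle), tuple(reversed(cycle))):
        cands.extend(seq[i:] + seq[:i] for i in range(len(seq)))
    return min(cands)

valid = [p for p in permutations("abcd") if is_rectangle(p)]
distinct = {canonical(p) for p in valid}
# 8 orderings (4 rotations x 2 directions) collapse to 1 canonical rectangle
```

This is the same trade-off as the compare/3 guards above: you pay a little per candidate to normalise it, in exchange for one answer per geometric object.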
Given an integer range R = [a, b] (where a >= 0 and b <= 100), a bias integer n in R, and a deviation d, what formula can I use to skew a random number generator towards n?
So for example if I had the numbers 1 through 10 inclusively and I don't specify a bias number, then I should in theory have equal chances of randomly drawing one of them.
But if I do give a specific bias number (say, 3), then the number generator should be drawing a 3 more frequently than the other numbers.
And if I specify a deviation of say 2 in addition to the bias number, then the number generator should be drawing 1 through 5 more frequently than 6 through 10.
What algorithm can I use to achieve this?
I'm using Ruby if it makes it any easier/harder.
i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:
generate a normal value with given mean and sd
round to nearest integer
if outside the given range (a normal can generate values over the entire range from -infinity to +infinity), discard and repeat
if you need to generate a normal from a uniform the simplest transform is "box-muller".
there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").
also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.
if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.
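The generate/round/reject recipe above is only a few lines; here is a hedged Python sketch (random.gauss standing in for whatever normal/gaussian generator your language provides, and the function name is made up):

```python
import random
from collections import Counter

def biased_randint(a, b, bias, deviation):
    """Draw from a normal centred on `bias`, round to the nearest integer,
    and reject (resample) anything outside [a, b]."""
    while True:
        x = round(random.gauss(bias, deviation))
        if a <= x <= b:
            return x

random.seed(0)
counts = Counter(biased_randint(1, 10, 3, 2) for _ in range(10000))
# the histogram peaks at 3 and falls off on both sides
```

Note that round() (rather than truncation) keeps the bins symmetric around each integer, which is exactly the rounding caveat mentioned above.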
For the first part, "But if I do give a specific bias number (say, 3), then the number generator should be drawing a 3 more frequently than the other numbers.", a very easy solution:
import random

def randBias(a, b, biasedNum=None, bias=0):
    x = random.randint(a, b + bias)
    if x <= b:
        return x
    else:
        return biasedNum
For the second part, I would say it depends on the task. In a case where you need to generate a billion random numbers from the same distribution, I would calculate the probabilities of the numbers explicitly and use a weighted random number generator (see Random weighted choice).
If you want a unimodal distribution (where the bias is concentrated at one particular value of your range, for example, as you state, 3), then the answer provided by andrew cooke is good, mostly because it allows you to fine-tune the deviation very accurately.
If however you wish to have several biases, for instance a trimodal distribution with the numbers a, (a+b)/2 and b more frequent than the others, then you would do well to implement weighted random selection.
A simple algorithm for this was given in a recent question on StackOverflow; its complexity is linear. Using such an algorithm, you would simply maintain a list, initially containing {a, a+1, a+2, ..., b-1, b} (so of size b-a+1), and when you want to add a bias towards X, you would add several copies of X to the list, depending on how much you want to bias it. Then you pick a random item from the list.
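In Python the list trick collapses to the stdlib's weighted choice; a minimal sketch (the weight of 6 on the bias value is an arbitrary illustration):

```python
import random
from collections import Counter

values = list(range(1, 11))        # the range [a, b] = [1, 10]
weights = [1] * len(values)
weights[values.index(3)] += 5      # as if five extra copies of 3 were in the list

random.seed(1)
counts = Counter(random.choices(values, weights=weights, k=15000))
# 3 is now about six times as likely as any other single value
```

Using explicit weights instead of a padded list keeps the memory at O(b-a+1) no matter how strong the bias.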
If you want something more efficient, the most efficient method is called the "Alias method", which was implemented very clearly in Python by Denis Bzowy; once your array has been preprocessed, it runs in constant time (but that means you can't update the biases anymore once you've done the preprocessing, or you would have to reprocess the table).
The downside of both techniques is that, unlike with the Gaussian distribution, biasing towards X will not also bias somewhat towards X-1 and X+1. To simulate this effect you would have to do something such as
def addBias(x, L):
    L = concatList(L, [x, x, x, x, x])
    L = concatList(L, [x + 2])
    L = concatList(L, [x + 1, x + 1])
    L = concatList(L, [x - 1, x - 1, x - 1])
    L = concatList(L, [x - 2])
    return L
Given an n-by-m matrix A, with it being guaranteed that n > m = rank(A), and given an n-by-1 column v, what is the fastest way to check whether [A v] has rank strictly bigger than A?
For my application, A is sparse, n is about 2^12, and m is anywhere in 1:n-1.
Computing rank(full([A v])) takes about a second on my machine, and I need to do it tens of thousands of times, so I would be very happy to discover a quicker way.
There is no need to do repeated solves IF you can afford to do ONE computation of the null space. Just one call to null will suffice. Given a new vector V, if the dot product of V with the nullspace basis vectors is non-zero, then V will increase the rank of the matrix. For example, suppose we have the matrix M, which of course has a rank of 2.
M = [1 1;2 2;3 1;4 2];
nullM = null(M')';
Will a new column vector [1;1;1;1] increase the rank if we append it to M?
nullM*[1;1;1;1]
ans =
-0.0321573705742971
-0.602164651199413
Yes, since it has a non-zero projection on at least one of the basis vectors in nullM.
How about this vector:
nullM*[0;0;1;1]
ans =
1.11022302462516e-16
2.22044604925031e-16
In this case, both numbers are essentially zero, so the vector in question would not have increased the rank of M.
The point is, only a simple matrix-vector multiplication is necessary once the null space basis has been generated. If your matrix is so large (and so nearly of full rank) that a call to null will fail, then you will need to do more work. However, n = 4096 is not excessively large, as long as the matrix does not have too many columns.
One alternative, if null is too expensive, is a call to svds, asking for the singular vectors whose singular values are essentially zero. These will form the nullspace basis that we need.
I would use sprank for sparse matrices. Check it out; it might be faster than any other method.
Edit: As pointed out correctly by @IanHincks, sprank computes the structural rank, which is not the rank. I am leaving the answer here, just in case someone else needs it in the future.
Maybe you can try to solve the system A*x = v; if it has a solution, that means the rank does not increase.
x = A\v;
norm(A*x - v) % if this is small then the rank does not increase
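In exact arithmetic this residual test is equivalent to the null-space test from the earlier answer: v fails to increase the rank exactly when the least-squares residual of A*x = v is zero. A pure-Python sketch on that answer's 4-by-2 example (hypothetical helper names; normal equations are only safe for small, well-conditioned matrices, and they rely on A having full column rank, which the question guarantees):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def increases_rank(A, v, tol=1e-9):
    """True iff appending column v to full-column-rank A increases its rank,
    i.e. iff the least-squares residual of A x = v is nonzero."""
    n, m = len(A), len(A[0])
    AtA = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(m)] for p in range(m)]
    Atv = [sum(A[i][p] * v[i] for i in range(n)) for p in range(m)]
    x = solve(AtA, Atv)                      # normal equations (A'A) x = A'v
    resid = sum((sum(A[i][j] * x[j] for j in range(m)) - v[i]) ** 2 for i in range(n))
    return resid > tol

M = [[1, 1], [2, 2], [3, 1], [4, 2]]
print(increases_rank(M, [1, 1, 1, 1]))   # True: same verdict as the nullM test
print(increases_rank(M, [0, 0, 1, 1]))   # False: v is already in the column space
```

For the real sparse problem, a sparse least-squares solve (or the one-off null-space computation above) is the way to go; this sketch only shows the equivalence of the two tests.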