How to vectorize these two nested loops?

Vectorize the loop without using any for loop to get v:
v = zeros(10, 1);
for i = 1:10
    for j = 1:10
        v(i) = v(i) + A(i, j) * x(j);
    end
end
A is a 10x10 matrix and x is a 10x1 vector.
I have been trying but was not able to figure out the right answer:
v = A * x;
v = Ax;
v = x' * A;
v = sum (A * x);

Proceed step by step, starting with the inner loop. The inner loop computes a dot product between A(i,:) and x(:). In Octave notation it can be expressed by a simple multiplication: v(i) = A(i,:)*x(:). So we are left with only one loop:
v = zeros(10, 1);
for i = 1:10
    v(i) = A(i,:)*x(:);
end
Each iteration computes the ith element of v as the dot product of the ith row of A with x. This is exactly the classical matrix-vector multiplication:
v(:) = A(:,:)*x(:)
And since there are no more explicit indices, all the : can be omitted (but as mentioned by @ChrisLuengo in the comments, it can be wise to keep it for x, since x(:) is always a column vector, even if x has been defined as a "row vector", i.e. as a 1x10 matrix):
v = A*x
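The two intermediate steps can also be checked outside Octave. The following is a minimal sketch in plain Python, where nested lists stand in for the Octave arrays and the function names are made up for illustration; it only confirms that the double loop and the one-dot-product-per-row form give the same result.

```python
def matvec_loops(A, x):
    """Double loop, mirroring the original nested for loops."""
    v = [0] * len(A)
    for i in range(len(A)):
        for j in range(len(x)):
            v[i] += A[i][j] * x[j]
    return v


def matvec_rows(A, x):
    """One dot product per row, mirroring v(i) = A(i,:)*x(:)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, -1]
print(matvec_loops(A, x))
print(matvec_rows(A, x))
```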

Related

Triple loop into tensor matlab

I am aware of similar questions, but couldn't find a useful one for my problem
Is there a way to speed up my code by transforming this triple loop into matrix/tensor operations?
% preallocate
X_tensor = zeros(N, N, N);
% loop
for i = 1:N
    for k = 1:N
        for j = 1:N
            X_tensor(i,k,j) = X(i,j) * B(i,k) * Btilde(k,j) / B(i,j);
        end
    end
end
EDIT
I am sorry, I forgot one important piece of information:
X, B and Btilde are all NxN matrices.
You can vectorize the operation by permuting the matrices and using element-wise multiplication/division.
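i, k, and j index dimensions 1, 2, and 3 of X_tensor. The first factor, X(i,j), has i on dimension 1 and j on dimension 3, so we treat X as an N×N×1 array (the third dimension is a singleton) and permute its dimensions with [1,3,2], which gives an N×1×N array. B(i,k) in the second factor is already in the required [1,2,3] order. Btilde(k,j) needs k on dimension 2 and j on dimension 3, hence the order [3,1,2], and B(i,j) in the denominator is handled exactly like X. The element-wise operators then expand the singleton dimensions automatically (implicit expansion, available in MATLAB R2016b and later).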
So the loops can be replaced by:
X_tensor2 = permute(X,[1,3,2]) .* B .* permute(Btilde,[3,1,2]) ./ permute(B,[1,3,2])
Here's a test program to verify correctness:
N = 5;
P = primes(400);
X = reshape(P(1:25),N,N);
B = reshape(P(26:50),N,N);
Btilde = reshape(P(51:75),N,N);
% preallocate
X_tensor = zeros(N, N, N);
X_tensor2 = zeros(N, N, N);
% loop
for i = 1:N
    for k = 1:N
        for j = 1:N
            X_tensor(i,k,j) = X(i,j) * B(i,k) * Btilde(k,j) / B(i,j);
        end
    end
end
X_tensor2 = permute(X,[1,3,2]) .* B .* permute(Btilde,[3,1,2]) ./ permute(B,[1,3,2]);
isequal(X_tensor, X_tensor2)

Find the value of f(T) for big value T

I am trying to solve the problem described below.
I am given the values of f(0) and k, which are integers.
I need to find the value of f(T), where T <= 10^10.
The recursive function is:
f(n) = 2*f(n-1)        if 4*f(n-1) <= k
       k - 2*f(n-1)    if 4*f(n-1) > k
My effort:
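#include <iostream>
using namespace std;
int main() {
    // long long: T can be as large as 10^10, which overflows a 32-bit long
    long long k, f0, i;
    cin >> k >> f0;
    long long operation;
    cin >> operation;
    long long answer = f0;
    for (i = 1; i <= operation; i++) {
        answer = (4 * answer <= k) ? (2 * answer) : (k - (2 * answer));
    }
    cout << answer;
    return 0;
}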
My code gives me the right answer. But the loop runs 10^10 times in the worst case, which gives me Time Limit Exceeded. I need a more efficient solution for this problem. Please help me; I don't know the correct algorithm.
If 2f(0) < k then you can compute this function in O(log n) time (using exponentiation by squaring modulo k).
r = f(0) * 2^n mod k
return 2 * r >= k ? k - r : r
You can prove this by induction. The induction hypothesis is that 0 <= f(n) < k/2, and that the above code fragment computes f(n).
Here's a Python program which checks random test cases, comparing a naive implementation (f) with an optimized one (g).
def f(n, k, z):
    # naive implementation: apply the recurrence n times
    r = z
    for _ in range(n):
        if 4*r <= k:
            r = 2 * r
        else:
            r = k - 2 * r
    return r
def g(n, k, z):
    # optimized implementation: modular exponentiation
    r = (z * pow(2, n, k)) % k
    if 2 * r >= k:
        r = k - r
    return r
import random
errs = 0
while errs < 20:  # stops only after 20 mismatches, so it keeps running while g agrees with f
    k = random.randrange(100, 10000000)
    n = random.randrange(100000)
    z = random.randrange(k//2)
    a1 = f(n, k, z)
    a2 = g(n, k, z)
    if a1 != a2:
        print(n, k, z, a1, a2)
        errs += 1
    print('.', end=' ')
Can you use a mathematical solution before programming and computing?
Actually,
f(n) = f0*2^(n-1)        if f(n-1)*4 <= k
       k - f0*2^(n-1)    if f(n-1)*4 > k
thus, your code would be written like this:
condition = f0*pow(2, operation-2)
answer = condition*4 <= k ? condition*2 : k - condition*2
For a simple loop, your answer looks pretty tight; one could optimise a little bit using answer<<2 instead of 4*answer, and answer<<1 for 2*answer, but quite possibly your compiler is already doing that. If you're blowing the time with this, it might be necessary to reduce the loop itself somehow.
I can't figure out the mathematical pattern that @Shannon was going for, but I'm thinking we could exploit the fact that this function will sooner or later cycle. If the cycle is short enough, then we could shorten the loop by just getting the answer at the equivalent point in the cycle.
So let's get some cycle detection equipment in the form of Brent's algorithm, and see if we can cut the loop to reasonable levels.
def brent(f, x0):
    # main phase: search successive powers of two
    power = lam = 1
    tortoise = x0
    hare = f(x0)  # f(x0) is the element/node next to x0.
    while tortoise != hare:
        if power == lam:  # time to start a new power of two?
            tortoise = hare
            power *= 2
            lam = 0
        hare = f(hare)
        lam += 1
    # Find the position of the first repetition of length λ
    mu = 0
    tortoise = hare = x0
    for i in range(lam):
        # range(lam) produces the values 0, 1, ... , lam-1
        hare = f(hare)
    # The distance between the hare and tortoise is now λ.
    # Next, the hare and tortoise move at same speed until they agree
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1
    return lam, mu
f0 = 2
k = 198779
t = 10000000000
def f(x):
    if 4 * x <= k:
        return 2 * x
    else:
        return k - 2 * x
lam, mu = brent(f, f0)
t2 = t
if t >= mu + lam:                # if T is past the cycle's first loop,
    t2 = (t - mu) % lam + mu     # find the equivalent place in the first loop
x = f0
for i in range(t2):
    x = f(x)
print("Cycle start: %d; length: %d" % (mu, lam))
print("Equivalent result at index: %d" % t2)
print("Loop iterations skipped: %d" % (t - t2))
print("Result: %d" % x)
As opposed to the other proposed answers, this approach actually could use a memo array to speed up the process, since the start of the function is actually calculated multiple times (in particular, inside brent), or it may be irrelevant, depending on how big the cycle happens to be.
The algorithm you proposed already runs in O(n).
To come up with a more efficient algorithm, there are not many directions to go in. Some typical options:
1. Decrease the coefficient of the linear term (but I doubt it would make a difference in this case)
2. Change to O(log n) (typically using some sort of divide and conquer technique)
3. Change to O(1)
In this case, we can do the last one.
The recursive function is a piecewise function:
f(n) = 2*f(n-1)        if 4*f(n-1) <= k
       k - 2*f(n-1)    if 4*f(n-1) > k
Let's tackle it case by case.
Case 1: 4*f(n-1) <= k ----(1) (assuming the starting index is zero)
This is obviously a geometric series:
a_n = 2*a_(n-1)
Therefore, we have the formula
a_n = 2^n * f(0) ----(*)
Case 2: 4*f(n-1) > k ----(2), where we have
a_n = -2*a_(n-1) + k
Assume a_j is the first element in the sequence that satisfies condition (2).
Substituting a_(n-1) into the formula repeatedly, you obtain
a_n = k - 2k + 4k - 8k ... + (-2)^(n-j) * a_j
k - 2k + 4k - 8k ... is another geometric series, with n-j terms:
S_n = k*(1 - (-2)^(n-j))/(1 - (-2)) ---geometric series sum formula with starting value k and ratio -2
Therefore, we have a formula for a_n in case 2:
a_n = k*(1 - (-2)^(n-j))/(1 - (-2)) + (-2)^(n-j) * a_j ----(**)
All that is left is to find the a_j that just fails condition (1) and satisfies (2).
This can again be obtained in constant time using the formula we have for case 1:
find n such that 4*a_n = 4*2^n*f(0) exceeds k
solve 4*2^n*f(0) = k for n; if n is not an integer, take the ceiling of n
In my first attempt to solve this question, I wrongly assumed that the sequence is monotonically increasing, but in fact the sequence might jump between case 1 and case 2. Therefore, there might not be a constant-time algorithm to solve the problem.
However, we can use the result above to skip part of the iterative updates.
The overall algorithm will look something like:
start with T, k, and f(0)
compute the n that makes the condition switch, using either (*) or (**)
replace f(0) with f(n) and T with T - n
repeat
terminate when T - n = 0 (the last iteration might overshoot, causing T - n < 0; in that case you need to go back a little)
Create a map that stores your results. Before computing f(n), check the map to see whether the solution already exists.
If it exists, use that solution.
Otherwise compute it and store it for future use.
For C++:
Definition:
map<long,long> result;
Insertion:
result[key] = value;
Accessing:
value = result[key];
Checking:
map<long,long>::iterator it = result.find(key);
if (it == result.end())
{
    // key was not found; compute the solution and insert it into result
}
else
{
    return it->second;
}
Use the above technique for a better solution.

How to find a unique set of closest pairs of points?

A and B are sets of m and n points respectively, where m<=n. I want to find a set C of m distinct points from B such that the sum of distances over all [A(i), C(i)] pairs is minimal.
To solve this without uniqueness constraint I can just find closest points from B to each point in A:
m = 5; n = 8; dim = 2;
A = rand(m, dim);
B = rand(n, dim);
D = pdist2(A, B);
[~, I] = min(D, [], 2);
C2 = B(I, :);
Here C2 may contain repeated elements of B. The first solution is a brute-force search:
minSumD = inf;
allCombs = nchoosek(1:n, m);
for i = 1:size(allCombs, 1)
    allPerms = perms(allCombs(i, :));
    for j = 1:size(allPerms, 1)
        ind = sub2ind([m n], 1:m, allPerms(j, :));
        sumD = sum(D(ind));
        if sumD < minSumD
            minSumD = sumD;
            I = allPerms(j, :);
        end
    end
end
C = B(I, :);
I think C2 (the set of closest points to each A(i)) is much like C except for its repeated points. So how can I decrease the computation time?
Use a variant of the Hungarian algorithm, which computes a minimum/maximum weight perfect matching. Create n-m dummy points for the unused B points to match with (or, if you're willing to put in more effort, adapt the Hungarian algorithm machinery to non-square matrices).
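To make the suggestion concrete, here is a sketch in Python of one common formulation of the Hungarian algorithm (the potentials-based O(n²m) variant), which accepts a rectangular cost matrix directly, so no dummy points are needed. It is not necessarily the variant the answer had in mind; the function name min_cost_assignment and the small matrix D are made up for illustration, with D playing the role of the pdist2 output (rows are points of A, columns are points of B).

```python
def min_cost_assignment(cost):
    """Assign each row to a distinct column at minimum total cost.

    cost is a list of n rows with m >= n columns. Returns a list col where
    col[i] is the column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)      # p[j] = row matched to column j (0 means free)
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i           # virtual column 0 holds the row being inserted
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:           # flip the matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            col[p[j] - 1] = j - 1
    return col


# Hypothetical 3x4 distance matrix: 3 points in A, 4 candidate points in B.
D = [[4, 1, 3, 9],
     [2, 0, 5, 8],
     [3, 2, 2, 7]]
col = min_cost_assignment(D)
print(col, sum(D[i][c] for i, c in enumerate(col)))
```

In terms of the question, col plays the role of the index vector I, so C would be the rows of B selected by col.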

Recursive division algorithm for two n bit numbers

In the below division algorithm, I am not able to understand why multiplying q and r by two works and also why r is incremented if x is odd.
Please give a theoretical justification of this recursive division algorithm.
Thanks in advance.
function divide(x, y)
    if x = 0:
        return (q, r) = (0, 0)
    (q, r) = divide(floor(x/2), y)
    q = 2q, r = 2r
    if x is odd:
        r = r + 1
    if r ≥ y:
        r = r − y, q = q + 1
    return (q, r)
Let's assume you want to divide x by y, i.e. represent x = Q * y + R
Let's assume that x is even. You recursively divide x / 2 by y and get your desired representation for a smaller case: x / 2 = q * y + r.
By multiplying it by two, you would get: x = 2q * y + 2r. Looking at the representation you wanted to get for x in the first place, you see that you have found it! Let Q = 2q and R = 2r and you found the desired Q and R.
If x is odd, you again first get the desired representation for a smaller case: (x - 1) / 2 = q * y + r, multiply it by two: x - 1 = 2q * y + 2r, and send 1 to the right: x = 2q * y + 2r + 1. Again, you have found Q and R you wanted: Q = 2q, R = 2r + 1.
The final part of the algorithm is just normalization so that r < y. After doubling, r can become greater than or equal to y. But the recursive call returns r < y, so 2r + 1 < 2y and a single subtraction of y is enough.
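The argument above can be traced in a direct Python transcription of the pseudocode. This is only a sketch; the comments state the invariant that holds after each step, and it assumes x >= 0 and y >= 1.

```python
def divide(x, y):
    """Return (q, r) with x == q*y + r and 0 <= r < y, for x >= 0 and y >= 1."""
    if x == 0:
        return 0, 0
    q, r = divide(x // 2, y)   # now x // 2 == q*y + r with r < y
    q, r = 2 * q, 2 * r        # now 2*(x // 2) == q*y + r
    if x % 2 == 1:
        r += 1                 # now x == q*y + r
    if r >= y:                 # r < 2*y here, so one subtraction suffices
        r -= y
        q += 1
    return q, r


print(divide(23, 5))
```

The recursion depth equals the number of bits of x, which is why the algorithm suits n-bit numbers.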
Algorithm PuzzleSolve(k, S, U):
Input: An integer k, sequence S, and set U
Output: An enumeration of all k-length extensions to S using elements in U without repetitions
for each e in U do
    Add e to the end of S
    Remove e from U    /e is now being used/
    if k == 1 then
        Test whether S is a configuration that solves the puzzle
        if S solves the puzzle then
            return "Solution found: " S
    else
        PuzzleSolve(k-1, S, U)    /a recursive call/
    Remove e from the end of S
    Add e back to U    /e is now considered as unused/
This algorithm enumerates every possible size-k ordered subset of U, and tests each subset for being a possible solution to our puzzle. For summation puzzles, U = {0,1,2,3,4,5,6,7,8,9} and each position in the sequence corresponds to a given letter. For example, the first position could stand for b, the second for o, the third for y, and so on.
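A minimal Python sketch of PuzzleSolve follows. Two things are assumptions made for illustration and are not part of the pseudocode: the puzzle test is passed in as a predicate called solves, and all solutions are collected in a list called found instead of returning at the first one. The toy puzzle is hypothetical as well.

```python
def puzzle_solve(k, S, U, solves, found):
    """Extend sequence S by k distinct elements of U; collect solutions in found."""
    for e in sorted(U):        # iterate over a snapshot, since U is mutated below
        S.append(e)
        U.remove(e)            # e is now being used
        if k == 1:
            if solves(S):
                found.append(tuple(S))
        else:
            puzzle_solve(k - 1, S, U, solves, found)   # a recursive call
        S.pop()                # remove e from the end of S
        U.add(e)               # e is now considered unused again
    return found


# Hypothetical toy puzzle: ordered triples (a, b, c) of distinct values with a + b == c.
solutions = puzzle_solve(3, [], {1, 2, 3, 4}, lambda s: s[0] + s[1] == s[2], [])
print(solutions)
```

Because every choice is undone after the recursive call returns, S and U are back in their initial state when the enumeration finishes.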

Number of steps necessary in order to complete an algorithm

So guys, I've already asked a question about how to develop an algorithm here.
The reviewed code looks like this: (note that I've put the elements in the vector L all equal in order to maximize the iterations of the program)
L = [2 2 2 2 2 2 2 2 2];
N = 3;
sumToN = [0 0];
Ret = [0 0];
k = 0;
for i=1:numel(L)-1;
    for j=i+1:numel(L);
        if L(i)+L(j) == N
            sumToN = [L(i) L(j)];
            display(sumToN);
            return
        end
        k=k+1
    end
end
display(sumToN);
The k variable is used to keep count of the iterations. The function that counts the number of steps of the algorithm is (1/2)(x-1)x, with x being equal to the number of elements in the vector L. The problem is that the exercise asks me to ensure that the algorithm completes in at most c*numel(L) steps for some positive constant c that does not depend on L. Moreover, I need to explain why this implementation completes in at most c*length steps.
How can I do it?
There is a contradiction in your statements: you say that your algorithm completes in x * (x - 1) / 2 steps (x = numel(L)), and you want to prove that it completes in c * x steps (where c is a constant). This is not possible!
Assume there is a c1 such that x * (x - 1) / 2 <= c1 * x. Dividing by x gives x <= 2 * c1 + 1, so for any x > 2 * c1 + 1 (for example x = 2 * c1 + 2) the inequality no longer holds. Hence there is no c such that x * (x - 1) / 2 <= c * x for all x.
Here is an algorithm that works in O(x) with a sorted array (from your previous question):
i = 1;
j = length(L);
while i < j
    if L(i) + L(j) == N
        sumToN = [L(i) L(j)];
        break
    elseif L(i) + L(j) < N
        i = i + 1;
    elseif L(i) + L(j) > N
        j = j - 1;
    end
end
Basically, you start with the first (smallest) value and the last (largest) value, and you move towards the middle of the array L as long as your two indexes do not cross.
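The same two-index scan, sketched in Python with 0-based indices (the function name is made up for illustration). Each iteration either returns or moves one of the two indices, so the loop body runs at most len(L) - 1 times, which is the linear bound the exercise asks for.

```python
def pair_with_sum_sorted(L, N):
    """Return (a, b) from the sorted list L with a + b == N, or None."""
    i, j = 0, len(L) - 1
    while i < j:
        s = L[i] + L[j]
        if s == N:
            return L[i], L[j]
        if s < N:
            i += 1     # sum too small: move to a larger left value
        else:
            j -= 1     # sum too large: move to a smaller right value
    return None


print(pair_with_sum_sorted([1, 2, 4, 7, 11], 9))
print(pair_with_sum_sorted([2, 2, 2, 2], 3))
```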
Another way, I think, to get by with a single for loop and meet the condition you're talking about would be to proceed like this:
for each value L(i) of L, check whether the value N - L(i) is also in your list L (use a command that finds a value in the list and returns its position)
If the value exists, put that pair of positions in your new table.
You should be able to get the same result with a single for.
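One caveat about the single-loop idea: a linear search inside the loop still costs up to numel(L) steps per lookup. The sketch below, in Python, swaps in a hash-based dictionary for the lookup, which is an assumption beyond what the answer describes; with it each lookup takes constant time on average, so the whole scan is linear and L does not need to be sorted. The function name is made up for illustration.

```python
def pair_with_sum(L, N):
    """Return an index pair (i, j), i < j, with L[i] + L[j] == N, or None."""
    seen = {}                       # value -> index where it was first seen
    for j, value in enumerate(L):
        i = seen.get(N - value)     # has the complement appeared earlier?
        if i is not None:
            return i, j
        seen.setdefault(value, j)
    return None


print(pair_with_sum([5, 1, 4, 2], 6))
print(pair_with_sum([2, 2, 2], 3))
```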
