Since I couldn't find anything about this subject, I decided to ask this question here. I am a complete beginner, and this question might be ridiculous.
Suppose we have an N×N matrix A and an N×1 column vector B. We also have a function f(i, j) that returns the element of the matrix A at row i and column j.
If we want to do some matrix operation, say the matrix product of A and B, we can use the following (below, C is the result of the matrix product):
Using the function f(i, j):
N = 100000

def f(i, j):
    return i + j

for i in range(N):
    for j in range(N):
        s = 0
        for k in range(N):
            s += f(i, k) * B[k]
        C[i] = s
Using the matrix A (N×N) (suppose that A is already defined and contains the same elements returned by the function f):
N = 100000
for i in range(N):
    for j in range(N):
        s = 0
        for k in range(N):
            s += A[i, k] * B[k]
        C[i] = s
In my opinion, the advantage of the function is that it avoids storing all the values of the matrix, thus saving memory.
My questions are:
In this case, which is the more efficient way to do the matrix multiplication (using the function or the matrix itself)?
Is there any performance difference between the two approaches?
EDIT: My question is not specific to Python or any other particular language.
This honestly has no right answer, since it depends on what you're willing to sacrifice and on the language being used.
Regardless, the main difference is that the function method takes more time, and the matrix method takes more space (obviously?).
Trading time to save memory is generally not a good idea, since we have an abundance of memory and a lot less time.
I ran these in Python with
N=10 and got Function 0.015623331069946289, Matrix 0.0
N=100 and got Function 1.0839078426361084, Matrix 0.8769278526306152
~Currently running N=1000~
Anything larger and I'll have to switch to NumPy.
Here's the code I used to time it if anyone wants to try it out.
import time

n = 1000

def f(i, j):
    return i + j

A = [[i + j for j in range(n)] for i in range(n)]  # A[i][j] == f(i, j)
B = [i for i in range(n)]
C = [0 for _ in range(n)]

# Time the function-based version.
start1 = time.time()
for i in range(n):
    for j in range(n):  # redundant for a matrix-vector product, but kept to match the question
        s = 0
        for k in range(n):
            s += f(i, k) * B[k]
        C[i] = s
end1 = time.time()

# Time the stored-matrix version.
start2 = time.time()
for i in range(n):
    for j in range(n):
        s = 0
        for k in range(n):
            s += A[i][k] * B[k]
        C[i] = s
end2 = time.time()

print("Function-", end1 - start1, ", Matrix-", end2 - start2)
Of course this approach assumes, as stated in your question, that the matrix is already set up, since that takes a significant amount of time to do too.
EDIT: Ran for N=1000, got Function 620.2477366924286, Matrix 478.4342918395996
As you can see, the larger N is, the better the matrix method does relative to the function method.
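For reference, here's a sketch of how the same product could be timed with NumPy (my own addition, not part of the benchmark above; the initialization mirrors f(i, j) = i + j, and results will vary by machine):

import time
import numpy as np

n = 1000
A = np.add.outer(np.arange(n), np.arange(n))  # A[i, j] = i + j
B = np.arange(n)

start = time.time()
C = A @ B  # matrix-vector product, executed in optimized native code
end = time.time()
print("NumPy-", end - start)

This should be orders of magnitude faster than either pure-Python loop, since the inner loops move into compiled code.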
You should not care about performance questions until you actually have performance issues. In 99.99% of cases, either approach will work for you. Code should be readable first, and then performant. See: About premature optimization.
In your concrete sample, the code with the function should be slower (simply because of the additional function call), or may have equal performance (if the compiler inlines it). BTW, see #1: you should not care, and should write readable code first.
If you really need performant code, there are a number of libraries for that (e.g. NumPy), and they will usually be faster. Some approaches may even delegate the calculation to the GPU.
Also see matrix multiplication performance
Related
Ok, another ridiculous daily code problem with an absurd answer.
Q:
Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability
A:
For the base case where i = 0, let's say the random element is the first one. Then we know it works because:
For i = 0, we would’ve picked uniformly from [0, 0].
For i > 0, before the loop iteration, any element K in [0, i - 1] had a 1 / i chance
of being chosen as the random element. We want K to have a 1 / (i + 1) chance
of being chosen after the iteration. This is the case, since the chance of having
been chosen already but not getting swapped with the ith element is
(1 / i) * (1 - 1 / (i + 1)), which is (1 / i) * (i / (i + 1)), or 1 / (i + 1).
The code:
import random

def pick(big_stream):
    random_element = None
    for i, e in enumerate(big_stream):
        if i == 0:
            random_element = e
        elif random.randint(1, i + 1) == 1:
            random_element = e
    return random_element
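A quick empirical check of the uniformity claim (my own addition): sample repeatedly from a small stream and confirm the counts come out roughly equal.

from collections import Counter

# Each of 0..4 should appear roughly 20,000 times.
counts = Counter(pick(range(5)) for _ in range(100_000))
print(counts)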
So the probability is 1 / (k + 1), but while you're still in the loop isn't it just 1 / k? The elif's 1 / (i + 1) seems artificial, and it wouldn't affect the O(n) time anyway; at most it would make it O(n + 1), which is the same as O(n).
What is really meant by this question? The algorithm seems really superficial; are there any suggestions that can actually beat O(n)? The programming language looks Perl-esque, but it isn't Perl; what language would be close to it? Is there a more optimal (more specific) language for this?
The programming language (once the indentation is fixed) is most definitely Python.
What is really meant by this question? The algorithm seems really superficial; are there any suggestions that can actually beat O(n)?
This question is mostly about space, not runtime. The algorithm they provide runs in constant space, which was the point. The naive answer would use an enormous amount of space.
There is a simpler (not necessarily faster) algorithm, however, which can also be used to sample k elements from n elements in O(n log k) time and O(k) space. It works like this:
Assign a uniformly random real value in [0, 1] to each element from the stream as you receive it. Using a max-heap keyed on the random values, keep track of the k smallest random values and their associated elements. Once the stream is fully processed, return the elements that remain in the heap.
Which for k = 1 simply becomes:
Assign a uniformly random real value in [0, 1] to each element and return the element with the smallest random value.
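Here's a minimal sketch of the heap-based algorithm for general k (my own code, written from the description above; sample_k is a made-up name). Python's heapq is a min-heap, so the keys are negated to simulate a max-heap of the k smallest keys:

import heapq
import random

def sample_k(stream, k):
    heap = []  # holds (-key, index, element); the root carries the largest kept key
    for i, e in enumerate(stream):
        key = random.random()
        if len(heap) < k:
            heapq.heappush(heap, (-key, i, e))
        elif key < -heap[0][0]:
            # new key is smaller than the largest kept key: replace it
            heapq.heapreplace(heap, (-key, i, e))
    return [e for _, _, e in heap]

The index is included as a tiebreaker so the stream elements themselves never need to be comparable.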
This is an implementation of the algorithm described by #orlp for k=1:
import random

def pick(big_stream):
    choice = None
    minp = 1.0
    for e in big_stream:
        p = random.random()
        if p >= minp:
            continue
        minp = p
        choice = e
    return choice
I am beginning to study Computational Logic, and as an exercise, I want to prove the correctness of merge sort algorithm.
Currently, I'm having difficulty proving that the output of this algorithm will always correspond to a permutation of the given input.
I’d be very glad if someone can assist me with this.
Thank you very much 😄
The core of this proof will need to show that the "merge" procedure inserts each element once and only once into the result. Since the merge procedure works using a loop, you need to use a loop invariant to show this.
Loop invariants can usually be discovered by asking, "what do I know halfway through the loop?"
to merge arrays A and B:
    let n = length of A, m = length of B
    let R = new array of length (n + m)
    let i = 0, j = 0
    while i < n or j < m:
        if i < n and (j == m or A[i] <= B[j]):
            R[i+j] = A[i]
            i = i + 1
        else:
            R[i+j] = B[j]
            j = j + 1
    return R
In this loop, we always know that the first i+j elements of R are some permutation of the first i elements of A and the first j elements of B. That's the loop invariant, so you need to show that:
This is true before the loop starts (when i = j = 0).
If this is true before an iteration of the loop, then it remains true after that iteration, i.e. the invariant is preserved.
If this is true when the loop terminates (when i = n and j = m), then the array R has the required property.
In general, the hard parts of a proof like this are discovering the loop invariant, and showing that the invariant is preserved by each iteration of the loop.
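To make this concrete, here is a runnable Python version of the merge pseudocode above, with the invariant checked by an assertion (my own sketch; comparing sorted copies is just a convenient way to test "is a permutation of"):

def merge(A, B):
    n, m = len(A), len(B)
    R = [None] * (n + m)
    i = j = 0
    while i < n or j < m:
        # invariant: R[:i+j] is a permutation of A[:i] + B[:j]
        assert sorted(R[:i + j]) == sorted(A[:i] + B[:j])
        if i < n and (j == m or A[i] <= B[j]):
            R[i + j] = A[i]
            i += 1
        else:
            R[i + j] = B[j]
            j += 1
    return R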
What are the preconditions of merge sort? What are the postconditions? Do you have any loop invariants?
These are the three questions you have to ask yourself before you can start writing your proof.
Then: what are your base cases? Presumably you know how merge sort works if you are working on a proof, so what happens when an array of length 1 is passed to the mergesort function? What is the postcondition there?
Here's a decent primer from Berkeley on how to prove the correctness of a function. It might take some discrete math (induction) to write the proof.
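To make the base case concrete, here is a minimal merge sort skeleton (my own sketch, assuming a merge function like the one in the previous answer):

def merge_sort(xs):
    if len(xs) <= 1:
        # base case: a length-0 or length-1 array is already sorted and is
        # trivially a permutation of itself, so the postcondition holds
        return xs
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))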
How can a sparse matrix - matrix product be calculated? I know the 'classic' / mathematical way of doing it, but it seems pretty inefficient. Can it be improved?
I thought about storing the first matrix in CSR form and the second one in CSC form; since the row and column vectors are sorted, I wouldn't have to search for a specific row / column I need, but I guess that doesn't help much.
With the disclaimers that (i) you really don't want to implement your own sparse matrix package and (ii) if you need to anyway, you should read Tim Davis's book on sparse linear algebra, here's how to do a sparse matrix multiply.
The usual naive dense multiply looks like this.
C = 0
for i {
    for j {
        for k {
            C(i, j) = C(i, j) + (A(i, k) * B(k, j))
        }
    }
}
Since addition commutes, we can permute the loop indices any way we like. Let's put j outermost and i innermost.
C = 0
for j {
    for k {
        for i {
            C(i, j) = C(i, j) + (A(i, k) * B(k, j))
        }
    }
}
Store all matrices in CSC form. Since j is outermost, we're working column-at-a-time on B and C (but not A). The middle loop is over k, which is rows of B, and, conveniently enough, we don't need to visit the entries of B that are zero. That makes the outer two loops go over the nonzero entries of B in the natural order. The inner loop increments the jth column of C by the kth column of A times B(k, j). To make this easy, we store the current column of C densely, together with the set of indexes where this column is nonzero, as a list/dense Boolean array. We avoid writing all of C or the Boolean array via the usual implicit initialization tricks.
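Here is a sketch of that algorithm in Python (my own illustration; the (indptr, rowind, values, nrows) tuple representation of CSC is an assumption for this sketch, not a standard API):

def csc_matmul(A, B):
    A_ptr, A_row, A_val, A_nrows = A
    B_ptr, B_row, B_val, _ = B
    n_cols = len(B_ptr) - 1

    C_ptr, C_row, C_val = [0], [], []
    work = [0.0] * A_nrows     # dense accumulator for the current column of C
    mark = [False] * A_nrows   # Boolean array: which entries of 'work' are set
    pattern = []               # indexes where the current column is nonzero

    for j in range(n_cols):
        # C(:, j) += A(:, k) * B(k, j) for each nonzero B(k, j)
        for p in range(B_ptr[j], B_ptr[j + 1]):
            k, bkj = B_row[p], B_val[p]
            for q in range(A_ptr[k], A_ptr[k + 1]):
                i = A_row[q]
                if not mark[i]:
                    mark[i] = True
                    pattern.append(i)
                work[i] += A_val[q] * bkj
        # gather the finished column of C and reset the workspace
        for i in sorted(pattern):
            C_row.append(i)
            C_val.append(work[i])
            work[i] = 0.0
            mark[i] = False
        pattern.clear()
        C_ptr.append(len(C_row))

    return C_ptr, C_row, C_val, A_nrows

The sorted() call keeps row indexes ordered within each column; a production implementation avoids even that with more bookkeeping.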
Does the following algorithm to find all possible ways of making changes for a particular sum really use memoization?
func count( n, m )
    for i from 0 to n
        for j from 0 to m
            if i equals 0
                table[i, j] = 1
            else if j equals 0
                table[i, j] = 0
            else if S_j greater than i
                table[i, j] = table[i, j - 1]
            else
                table[i, j] = table[i - S_j, j] + table[i, j - 1]
    return table[n, m]
Each time the function count is called, it starts filling the table from scratch. Even if the table's already been initialized for certain values, the next time count is called, it won't use these values, but will start again from i = 0 and j = 0.
This is not memoization. This is an example of dynamic programming code.
In order to analyze your code, we first need to distinguish between memoization and dynamic programming.
Memoization is a top-down approach, whereas dynamic programming is a bottom-up approach.
Consider the problem of finding the factorial of a number n.
If you are finding n! by using the facts that
n! = n * (n-1)! and 0! = 1,
this is an example of a top-down approach.
The value of n is kept in memory until the values of 0! through (n-1)! are returned. The disadvantage is that you waste a lot of stack memory. The advantage is that you don't have to recalculate subproblems that have already been solved: the solutions to subproblems are stored in memory.
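As a quick illustration of the two styles (my own sketch, not from the question):

from functools import lru_cache

# Top-down: recurse from n, caching each subproblem on the way back up.
@lru_cache(maxsize=None)
def fact_top_down(n):
    return 1 if n == 0 else n * fact_top_down(n - 1)

# Bottom-up: build from 0! upward; no recursion, no cache needed.
def fact_bottom_up(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result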
But your problem doesn't use a top-down approach, hence no memoization.
Every entry in the table is obtained directly from previously calculated subproblem solutions, so it uses a bottom-up approach. Hence you have a piece of code which uses dynamic programming.
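For contrast, here is how the same recurrence could be written top-down with memoization (my own sketch; the denominations in S are hypothetical, and S[j-1] plays the role of S_j above):

from functools import lru_cache

S = [1, 2, 5]  # hypothetical coin denominations

@lru_cache(maxsize=None)
def count(n, m):
    if n == 0:
        return 1          # one way to make 0: use no coins
    if m == 0:
        return 0          # no denominations left
    if S[m - 1] > n:
        return count(n, m - 1)
    return count(n - S[m - 1], m) + count(n, m - 1)

print(count(10, len(S)))  # ways to make 10 from {1, 2, 5}

Here repeated calls reuse cached subproblem solutions instead of refilling the table from scratch, which is exactly the behavior the question notes is missing.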
Given n integers, arranged in a circle, show an efficient algorithm that can find one peak. A peak is a number that is not less than the two numbers next to it.
One way is to go through all the integers and check each one to see whether it is a peak. That yields O(n) time. It seems like there should be some way to divide and conquer to be more efficient though.
EDIT
Well, Keith Randall proved me wrong. :)
Here's Keith's solution implemented in Python:
def findPeak(aBase):
    N = len(aBase)
    def a(i): return aBase[i % N]
    i = 0
    j = N // 3
    k = (2 * N) // 3
    if a(j) >= a(i) and a(j) >= a(k):
        lo, candidate, hi = i, j, k
    elif a(k) >= a(j) and a(k) >= a(i):
        lo, candidate, hi = j, k, i + N
    else:
        lo, candidate, hi = k, i + N, j + N
    # Loop invariants:
    #   a(lo) <= a(candidate)
    #   a(hi) <= a(candidate)
    while lo < candidate - 1 or candidate < hi - 1:
        checkRight = True
        if lo < candidate - 1:
            mid = (lo + candidate) // 2
            if a(mid) >= a(candidate):
                hi = candidate
                candidate = mid
                checkRight = False
            else:
                lo = mid
        if checkRight and candidate < hi - 1:
            mid = (candidate + hi) // 2
            if a(mid) >= a(candidate):
                lo = candidate
                candidate = mid
            else:
                hi = mid
    return candidate % N
Here's a recursive O(log n) algorithm.
Suppose we have an array of numbers, and we know that the middle number of that segment is no smaller than the endpoints:
A[i] <= A[m] >= A[j]
for i, j indexes into the array, and m = (i + j) / 2. Examine the elements midway between the endpoints and the midpoint, i.e. those at indexes x = (3*i + j) / 4 and y = (i + 3*j) / 4. If A[x] >= A[m], then recurse on the interval [i, m]. If A[y] >= A[m], then recurse on the interval [m, j]. Otherwise, recurse on the interval [x, y].
In every case, we maintain the invariant on the interval above. Eventually we get to an interval of size 2 which means we've found a peak (which will be A[m]).
To convert the circle to an array, take 3 equidistant samples and orient yourself so that the largest (or one tied for the largest) is in the middle of the interval and the other two points are the endpoints. The running time is O(log n) because each interval is half the size of the previous one.
I've glossed over the problem of how to round when computing the indexes, but I think you could work that out successfully.
When you say "arranged in a circle", you mean like in a circular linked list or something? From the way you describe the data set, it sounds like these integers are completely unordered, and there's no way to look at N integers and come to any kind of conclusion about any of the others. If that's the case, then the brute-force solution is the only possible one.
Edit:
Well, if you're not concerned with worst-case time, there are slightly more efficient ways to do it. The naive approach would be to look at N[i], N[i-1], and N[i+1] to see if N[i] is a peak, then repeat, but you can do a little better.
While not done
    If N[i] < N[i+1]
        i++
    Else
        If N[i] > N[i-1]
            Done
        Else
            i += 2
(Well, not quite that, because you have to deal with the case where N[i]=N[i+1]. But something very similar.)
That will at least keep you from comparing N[i] to N[i+1], adding 1 to i, and then redundantly comparing N[i] to N[i-1]. It's a distinctly marginal gain, though. You're still marching through the numbers, but there's no way around that; jumping blindly is unhelpful, and there's no way to look ahead without taking just as long as doing the actual work would be.
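A runnable version of that scan (my own sketch), with the N[i] = N[i+1] case handled by using "not less than" for the peak test, plus a safety bound and fallback so termination is guaranteed:

def find_peak_scan(nums):
    n = len(nums)
    i = 0
    for _ in range(2 * n):  # safety bound; a circle always contains a peak
        if nums[i] < nums[(i + 1) % n]:
            i = (i + 1) % n            # climb toward the larger neighbor
        elif nums[i] >= nums[(i - 1) % n]:
            return i                   # not less than either neighbor: a peak
        else:
            i = (i + 2) % n            # skip ahead past the known-smaller element
    return max(range(n), key=nums.__getitem__)  # fallback: the global max is a peak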