Regarding time complexity and big-O notation

Suppose n1, n2 > k.
Does
O(k(n1+n2-k)) = O(k*max(n1,n2)) ?
Also, does
O(n1+n2) = O(max(n1,n2)) ?
Thanks

Is the claim O(k(n1+n2-k)) = O(k*max(n1,n2)) true?
We know that k < min{n1,n2}, thus:
k(n1+n2-k) = k(max{n1,n2} + min{n1,n2} - k) > k*max{n1,n2}
So it is pretty trivial to show that O(k*max(n1,n2)) is a subset of O(k(n1+n2-k)).
We also need to show the other direction, which is also easy because 2k*max{n1,n2} is in O(k*max(n1,n2)), and
k(n1+n2-k) <= k(max{n1,n2} + max{n1,n2} - k) < k(max{n1,n2} + max{n1,n2}) = 2k*max{n1,n2}
So, the claim is correct.
Does O(n1+n2) = O(max(n1,n2)) ?
This is correct, since max{n1,n2} <= n1+n2 <= 2*max{n1,n2}, and constant factors do not matter in big-O analysis.


Is this a correct recurrence relationship I've found for the coin change challenge?

I'm trying to solve the "coin change problem" and I think I've come up with a recursive solution, but I want to verify it.
As an example, let's suppose we have pennies, nickels and dimes and are trying to make change for 22 cents.
C = { penny = 1, nickel = 5, dime = 10 }
K = 22
Then the number of ways to make change is
f(C,K) = f({1,5,10},22)
=
(# of ways to make change with 0 dimes)
+ (# of ways to make change with 1 dime)
+ (# of ways to make change with 2 dimes)
= f(C\{dime},22-0*10) + f(C\{dime},22-1*10) + f(C\{dime},22-2*10)
= f({1,5},22) + f({1,5},12) + f({1,5},2)
and
f({1,5},22)
= f({1,5}\{nickel},22-0*5) + f({1,5}\{nickel},22-1*5) + f({1,5}\{nickel},22-2*5) + f({1,5}\{nickel},22-3*5) + f({1,5}\{nickel},22-4*5)
= f({1},22) + f({1},17) + f({1},12) + f({1},7) + f({1},2)
= 5
and so forth.
In other words, my algorithm is like this:
let f(C,K) be the number of ways to make change for K cents with coins C
and it has the following implementation:
if (C is empty or K = 0)
    return 0
sum = 0
m = C.PopLargest()
A = {0, 1, ..., K / m}
for (i in A)
    sum += f(C, K - i*m)
return sum
Is there any flaw in that?
It would be linear time, I think.
Rethink your base cases:
1. What if K < 0? Then no solution exists, i.e. the number of ways = 0.
2. When K = 0, there is 1 way to make change, which is to take zero coins.
3. When the coin array is empty, the number of ways = 0.
The rest of the logic is correct. But your perception that the algorithm is linear is absolutely wrong.
Let's compute the complexity:
Popping the largest element is O(C.length). However, this step can be
optimised by sorting the whole array in the beginning.
Your for loop runs O(K/C.max) times in every call, and in every iteration it calls the function recursively.
So if you write the recurrence for it, it should be something like:
T(N) = O(N) + K*T(N-1)
And this is going to be exponential in terms of N (the size of the array).
In case you are looking for an improvement, I would suggest you use dynamic programming.
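For reference, here is a minimal memoised sketch of that idea in Python (my own illustration, not part of the original answer; it is written as "skip the largest remaining coin or use one more of it" rather than the question's loop over counts) using the corrected base cases above:

from functools import lru_cache

def count_ways(coins, K):
    # Number of ways to make change for K cents with the given coin values.
    coins = tuple(sorted(coins))

    @lru_cache(maxsize=None)
    def f(idx, k):
        if k == 0:
            return 1          # exactly reached the amount: one way (take nothing more)
        if k < 0 or idx < 0:  # overshot the amount, or no coin types left
            return 0
        # either skip coin type idx entirely, or use one more of it
        return f(idx - 1, k) + f(idx, k - coins[idx])

    return f(len(coins) - 1, K)

print(count_ways([1, 5, 10], 22))  # 9  (= 5 + 3 + 1 from the decomposition above)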

Homework: Implementing Karp-Rabin; For the hash values modulo q, explain why it is a bad idea to use q as a power of 2?

I have a two-fold homework problem: implement Karp-Rabin and run it on a test file, and then the second part:
For the hash values modulo q, explain why it is a bad idea to use q as a power of 2. Can you construct a terrible example e.g. for q=64
and n=15?
This is my implementation of the algorithm:
def karp_rabin(text, pattern):
    # setup
    alphabet = 'ACGT'
    d = len(alphabet)
    n = len(pattern)
    d_n = d**n
    q = 2**32-1
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []
    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))
    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
            positions.append(i)
    return ' '.join(map(str, positions))
...The second part of the question is referring to this part of the code/algo:
pattern_hash = kr_hash(pattern)
for i in range(0, len(text) - n + 1):
    text_hash = update_hash() if i else kr_hash(text[i:n])
    # the modulo q used to check if the hashes are congruent
    if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
        positions.append(i)
I don't understand why it would be a bad idea to use q as a power of 2. I've tried running the algorithm on the test file provided (which is the genome of E. coli) and there's no discernible difference.
I tried looking at the formula for how the hash is derived (I'm not good at math), trying to find some common factors that would be really bad for powers of two, but found nothing. I feel like if q is a power of 2 it should cause a lot of clashes for the hashes, so you'd need to compare strings a lot more, but I didn't find anything along those lines either.
I'd really appreciate help on this since I'm stumped. If someone wants to point out what I can do better in the first part (code efficiency, readability, correctness etc.) I'd also be thrilled to hear your input on that.
There is a problem if q divides some power of d, because then only a few characters contribute to the hash. For example, in your code d=4; if you take q=64, only the last three characters determine the hash (d**3 = 64).
I don't really see a problem if q is a power of 2 but gcd(d,q) = 1.
Your implementation looks a bit strange because instead of
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
you could also use
if pattern_hash == text_hash and pattern == text[i:i+n]:
which would be better because you get fewer collisions.
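To make the first paragraph concrete, here is a small check of my own (not part of the answer above): with the question's hash function and d = 4, every character other than the last three contributes a multiple of 64, so the hash mod q = 64 depends only on the last three characters.

d, q = 4, 64
m = {c: i for i, c in enumerate('ACGT')}

def kr_hash(s):
    n = len(s)
    return sum(d**(n - i - 1) * m[s[i]] for i in range(n))

print(kr_hash('AAAAAGTC') % q == kr_hash('TTTTTGTC') % q)  # True: same last three characters
print(kr_hash('AAAAAGTC') % q == kr_hash('AAAAAGTA') % q)  # False: last three characters differ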
The Thue–Morse sequence has among its properties that its polynomial hash quickly becomes zero when a power of 2 is the hash modulus, for any polynomial base d. So if you try to search for a short Thue–Morse sequence in a longer one, you will get a great many hash collisions.
For example, your code, slightly adapted:
def karp_rabin(text, pattern):
    # setup
    alphabet = '01'
    d = 15
    n = len(pattern)
    d_n = d**n
    q = 32
    m = {char:i for i,char in enumerate(alphabet)}
    positions = []
    def kr_hash(s):
        return sum(d**(n-i-1) * m[s[i]] for i in range(n))
    def update_hash():
        return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
    pattern_hash = kr_hash(pattern)
    for i in range(0, len(text) - n + 1):
        text_hash = update_hash() if i else kr_hash(text[i:n])
        if pattern_hash % q == text_hash % q : #and pattern == text[i:i+n]:
            positions.append(i)
    return ' '.join(map(str, positions))
print(karp_rabin('0110100110010110100101100110100110010110011010010110100110010110', '0110100110010110'))
outputs a lot of positions, although only three of them are proper matches.
Note that I have dropped the and pattern == text[i:i+n] check. Obviously, if you restore it, the result will be correct, but it is also obvious that the algorithm will do much more work checking this additional condition than it would for other values of q. In fact, because there are so many collisions, the whole idea of the algorithm stops working: you could almost as effectively write a simple algorithm that checks every position for a match.
Also note that your implementation is quite strange. The whole idea of polynomial hashing is to take the modulo operation each time you compute the hash. Otherwise your pattern_hash and text_hash are very big numbers. In other languages this might mean arithmetic overflow, but in Python this invokes big-integer arithmetic, which is slow and once again defeats the purpose of the algorithm.
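As a rough sketch of that last point (my own illustration, not the answerer's code; it assumes the same 'ACGT' alphabet as the question), the rolling update can reduce mod q after every operation, so no intermediate value ever grows beyond q:

def rolling_hashes(text, n, d, q):
    # Yield (position, hash mod q) for every length-n window of text.
    m = {c: i for i, c in enumerate('ACGT')}
    d_top = pow(d, n - 1, q)    # weight of the character that leaves the window
    h = 0
    for i, ch in enumerate(text):
        h = (h * d + m[ch]) % q                         # shift in the new character
        if i >= n - 1:
            yield i - n + 1, h
            h = (h - m[text[i - n + 1]] * d_top) % q    # drop the leftmost character

The pattern hash would be computed mod the same q and compared against each yielded window hash, followed by an exact string comparison to rule out collisions.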

If a>=b then O(a+b)=O(a)?

I'm trying to understand the idea of O(n) better, so I wonder about this:
If we know that a >= b, does O(a+b) = O(a)?
I know that O(a)+O(a) = O(2a) = O(a), but I'm wondering if it's true for something that is smaller than a, I mean - whether O(a+b) = O(a).
I think that it's true because a+b = O(2a), but I'd like to know if I'm wrong...
(P.S. will it be true if a and b are constants?)
Thank you!
You're totally correct in simplifying O(a+b) = O(a) in this case.
It's so because
a >= b (given), so
a+b <= a+a = 2a, and O(2a) = O(a) // as clearly mentioned by you.
Example:
Let's assume
a = n; b = log(n).
Then you can see
O(a+b) = O(n + log(n)) = O(n) = O(a).

Portfolio optimization: sorting out efficient solutions in C#

I have the following problem to solve:
Let H be a set of portfolios. For each portfolio i in H let (ri,vi) be the (return,risk) values for this solution.
For each i in H, if there exists j in H (j different from i) such that rj >= ri and vj <= vi, then delete i from H, because i is dominated by j (j has at least as good a return for no more risk).
At the end H will be the set of undominated efficient solutions.
I tried to solve the above problem using linq:
H.RemoveAll(x => H.Any(y => x.CalculateReturn() <= y.CalculateReturn() && x.CalculateRisk() >= y.CalculateRisk() && x != y));
But I wonder if there exists a more efficient way, because if H.Count() is of the order of tens of thousands, then it takes a lot of time to remove the dominated portfolios.
Thanks in advance for any help!
Christos
First off, you should be caching the risk/reward values. I can't tell from your code sample whether you are, but if you aren't, you need to transform the list first.
Once you've done that, it makes sense to order the list by risk. As you move up in risk, all you then have to check is that the reward is strictly greater than the best reward you've seen so far. If it's not, you can remove the entry. That should dramatically improve performance.
Unfortunately, I'm not feeling clever enough to think of a way to do this with pure LINQ at the moment, but this code segment should work:
(Disclaimer: I haven't compiled/tested)
var orderedH = (
    from h in H
    let reward = h.CalculatedReward()
    let risk = h.CalculatedRisk()
    orderby risk ascending
    select new {
        Original = h,
        Risk = risk,
        Reward = reward
    }).ToList();

var maxReward = Double.NegativeInfinity;
for (int i = 0; i < orderedH.Count; i++)
{
    if (orderedH[i].Reward <= maxReward) {
        orderedH.RemoveAt(i--);
    }
    else {
        maxReward = orderedH[i].Reward;
    }
}
var filteredPortfolio = orderedH.Select(h => h.Original);

Speeding up a nested for loop

I've been working on speeding up the following function, but with no results:
function beta = beta_c(k,c,gamma)
    beta = zeros(size(k));
    E = @(x) (1.453*x.^4)./((1 + x.^2).^(17/6));
    for ii = 1:size(k,1)
        for jj = 1:size(k,2)
            E_int = integral(E,k(ii,jj),10000);
            beta(ii,jj) = c*gamma/(k(ii,jj)*sqrt(E_int));
        end
    end
end
Up to now, I solved it this way:
function beta = beta_calc(k,c,gamma)
    k_1d = reshape(k,[1,numel(k)]);
    E_1d = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
    E_int = zeros(1,numel(k_1d));
    parfor ii = 1:numel(k_1d)
        E_int(ii) = quad(E_1d,k_1d(ii),10000);
    end
    beta_1d = c*gamma./(k_1d.*sqrt(E_int));
    beta = reshape(beta_1d,[size(k,1),size(k,2)]);
end
It seems to me that this didn't really improve performance. What do you think about it?
Would you mind shedding some light on this?
I thank you in advance.
EDIT
I am going to introduce some theoretical background to my question.
Generally, beta is to be calculated as in the code above: beta = c*gamma./(k.*sqrt(E_int)), where E_int is the integral of E from k up to the (large) upper limit.
Therefore, in the reduced case of a one-dimensional k array, E_int may be calculated as
E = 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int = 1.5 - cumtrapz(k,E);
or, alternatively as
E_int(1) = 1.5;
for jj = 2:numel(k)
    E = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
    E_int(jj) = E_int(jj - 1) - integral(E,k(jj-1),k(jj));
end
Nonetheless, k is currently a matrix k(size1,size2).
Here's another approach: parallelize, because it's easy using spmd or parfor. Instead of integral, consider quad.
I like this question.
The problem: the function integral takes only scalars as integration limits. Hence, it is difficult to vectorize the computation of E_int.
A clue: there seems to be lot of redundancy in integrating the same function over and over from k(ii,jj) to infinity...
Proposed solution: how about sorting the values of k from smallest to largest and integrating only over the gaps, E_sort_int(si) = integral( E, sortedK(si), sortedK(si+1) );, with sortedK( numel(k) + 1 ) = 10000;? Then the full value of E_int for each sorted k is the cumulative sum of E_sort_int taken from the largest k downwards (giving the integral from sortedK(si) to 10000); you only need to "undo" the sorting and reshape it back to the size of k.
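Not MATLAB, but here is a rough NumPy/SciPy sketch of that sorting-and-accumulating idea (my own illustration, assuming the same integrand and the 10000 upper limit from the question's code):

import numpy as np
from scipy.integrate import quad

def beta_sorted(k, c, gamma, upper=10000.0):
    # Integrate E only over the gaps between consecutive sorted k values,
    # then accumulate from the largest k downwards to get each tail integral.
    E = lambda x: 1.453 * x**4 / (1 + x**2)**(17/6)
    flat = k.ravel().astype(float)
    order = np.argsort(flat)
    edges = np.append(flat[order], upper)
    segs = np.array([quad(E, edges[i], edges[i + 1])[0] for i in range(flat.size)])
    tails = np.cumsum(segs[::-1])[::-1]      # integral from each sorted k up to upper
    E_int = np.empty_like(flat)
    E_int[order] = tails                     # undo the sorting
    return (c * gamma / (flat * np.sqrt(E_int))).reshape(k.shape)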
