I have tried to sort this out from smallest to largest in terms of efficiency but got it wrong.
Any help please?
x^e, e^x, (x+sinx)x^20, ln²x, xlnx, x + sinx, 13 + 1/x, lgx, 1/x
Could you define efficiency? Do you want to know how to sort these regarding their asymptotic complexity?
When x grows very big:
1/x is near 0
13 + 1/x is near 13
lgx grows very slowly: lg(10000000000) is just a bit more than 33
ln²x is also slow: ln²(10000000000) ≈ 530
x + sinx is approximately x when x is very big
xlnx grows slightly faster than x
x^e is x^2.71828...
(x+sinx)x^20 = x^21 + x^20*sinx, so it's almost equal to x^21
e^x is a very fast growing function: already e^20 is about 4.85*10^8
If so, 1/x < 13 + 1/x < lgx < ln²x < x + sinx < xlnx < x^e < (x+sinx)x^20 < e^x
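If you want to sanity-check this ordering numerically, here is a small Python sketch (mine, just for illustration). It compares ln f(x) at a large x, since e^x itself would overflow a float; the printed values come out strictly increasing, matching the ordering above:

import math

x = 1e6
log_values = [
    ("1/x",            -math.log(x)),
    ("13 + 1/x",       math.log(13 + 1 / x)),
    ("lg x",           math.log(math.log2(x))),
    ("ln^2 x",         math.log(math.log(x) ** 2)),
    ("x + sin x",      math.log(x + math.sin(x))),
    ("x ln x",         math.log(x * math.log(x))),
    ("x^e",            math.e * math.log(x)),
    ("(x+sinx)x^20",   math.log(x + math.sin(x)) + 20 * math.log(x)),
    ("e^x",            x),
]
for name, lv in log_values:
    print(f"{name:>14}: ln f(x) = {lv:12.2f}")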
We are covering the class P in my class, and this one part is tripping me up regarding whether the primality problem is in P.
Our program:
"{prime(x): i =2; while i < x { if n mod i == 0 { return 0 } i++ } return 1 }"
Complexity for the program:
If x is n digits long, then x is in the rough vicinity of 10^n. (Assuming no leading 0s, 10^(n-1) ≤ x < 10^n.) The division algorithm that you learned in elementary school divides an m-digit number by an n-digit number in time O(mn). Putting that all together, we find that our algorithm for testing whether an integer is prime takes time O(n^2 10^n).
My questions:
Where in the world does the professor get that x is roughly 10^n? For example, if x is 17, how does that turn into roughly 10^2 = 100 operations?
Furthermore, where is the n^2 coming from in the final big-O notation?
This trial division algorithm has to try x - 2 divisors (i.e., Θ(10^n) of them) when x is prime. The vast majority of these divisors have n or n-1 digits, so each division takes Θ(n^2) time on average, since m = Θ(n).
"{prime(x): i =2; while i < x { if n mod i == 0 { return 0 } i++ } return 1 }"
I assume that by n you mean x, so your code should look like:
prime(x) {
    if (x == 2) return 1;   // 2 is prime; the loop below handles it too
    for (int i = 2; i < x; i++)
        if (x mod i == 0) return 0;
    return 1;
}
This algorithm is the naive solution, which costs O(x).
It tests all natural numbers from 2 up to x - 1. Supposing modulo is a constant-time operation, you will do x - 2 divisions (in the worst case), hence O(x - 2) = O(x).
Obviously there are better solutions, like using a sieve to precompute primes so that you don't have to trial-divide x by every other natural number.
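For instance, a minimal Sieve of Eratosthenes sketch in Python (illustrative, not part of the original answer):

def sieve(limit):
    # is_prime[v] will be True exactly when v is prime, for 0 <= v <= limit
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return is_prime

print(sieve(30)[17])  # True: 17 is prime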
Where in the world does the professor get that x is 10^n
They don't.
What they actually said is:
If x is n digits long, then x is in the rough vicinity of 10^n
In your case it's off by a factor of 100/17≈5.9. But that's a constant factor (and not a big one). In the worst case, it's off by factor 10. And in complexity classes we ignore such constant factors, so it doesn't matter for their analysis.
Well, the primality problem is in P; see the AKS test for details.
However, the naive algorithm (the one in the question) is not polynomial. For a given x we have:
Time complexity t = O(x):
{prime(x):
    i = 2;
    while i < x {        # we loop x - 2 times, t = O(x)
        if x mod i == 0 {
            return 0
        }
        i++
    }
    return 1
}
Size of the problem s = O(log(x)) - we have to provide all digits to write x:
x = 123456789 # size is not 123456789, but 27 bits (or 9 digits)
So time complexity t from size of the problem s is O(2^s) if size s is in bits or O(10^s) if size s is in decimal digits. That is definitely not polynomial.
Let's have a look at your example: if x is n digits long (n ≈ log10(x)), t will be ~10^n:
x               : size (digits) : time
----------------------------------------------------
17              : 1.23          : 15                # your example
~17_000_000_000 : 10.23         : ~17_000_000_000   # size only ~10x larger, time ~10^9x larger
Can you see the exponential time ~ f(size) now?
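To make this concrete, here is a small Python snippet (my own sketch) that counts the loop iterations of the naive test on primes of increasing digit count; each extra digit multiplies the running time by roughly 10:

def naive_prime_steps(x):
    # returns the number of loop iterations of the naive trial-division test
    steps = 0
    for i in range(2, x):
        steps += 1
        if x % i == 0:
            break
    return steps

for p in (17, 101, 1009, 10007, 100003):   # primes with 2 to 6 digits
    print(f"x = {p:>6} ({len(str(p))} digits): {naive_prime_steps(p)} iterations")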
I have read several answers here on SO and on the web about choosing a good hash table length and that it should be a prime to reduce collisions and to uniformly distribute the keys across the hash table.
Though there are a lot of answers, I couldn't find a satisfying proof, and I didn't understand the explanations I found.
So if we have a key k and a hash table of length n and we do k % n = i to find the index i of a bucket in a hash table, we say that n should be a prime in order to minimize the number of collisions and to better distribute the keys across the hash table.
But why? Here is my attempt to prove it. It is going to be quite long and a bit pedantic, but please bear with me and try to read till the end.
I will start by making the following assumptions:
Every key k of the set of keys K is an integer, which is either even (k = 2x) or odd (k = 2x + 1).
For every n we may choose, n also can either be even (n = 2y) or odd (n = 2y + 1).
If we add an even number to another even number, we get an even number (2x + 2y = 2(x + y)). Likewise, if we add an odd number to another odd number, we still get an even number
((2x + 1) + (2y + 1) = 2x + 1 + 2y + 1 = 2x + 2y + 2 = 2(x + y + 1)).
If we add an odd number to an even number (same as adding an even number to an odd one), we always get an odd number ((2x + 1) + 2y = 2x + 1 + 2y = 2(x + y) + 1).
First of all, let's try to think about using an n which is not a prime, so maybe we will find out that these numbers are not good enough to be used as the length of a hash table (assuming that the keys share some patterns, e.g. like being all even or all odd).
Let's assume that n is even, i.e. n = 2y. In this case we have 2 scenarios: our keys k of K can be even (1.1.) or odd (1.2.).
1.1. n = 2y is even, keys are even k = 2x
For k = 2x and n = 2y, we have: k % n = 2x % 2y = i.
In this case we can say that if both the key k and hash table length n are even,
then i is also going to always be even.
Why? Because if we take the quotient by the integer division k // n = 2x // 2y = q, we get a quotient q such that:
k = 2x = (n * q) + i = (2y * q) + i = 2yq + i
Since 2yq (2y * q) is an even number, in order to satisfy 2x = 2yq + i the remainder i is always going to be even because 2x is even (even + even = even). If i were odd, we would get an odd number (even + odd = odd), but again 2x is even.
This leads to the following issue if we choose n to be even: if all our ks are even, then they will always end up in a bucket at an even index, increasing the number of collisions and clustering because only half n / 2 of the length of our hash table (only the even indices) will be occupied.
Therefore it's not a good idea to use an even number for n if all our ks or the majority of our ks are going to be even.
1.2. n = 2y is even, keys are odd k = 2x + 1
For k = 2x + 1 and n = 2y, we have: k % n = (2x + 1) % 2y = i.
Likewise, in this case if all of our ks (or the majority of them) are going to be odd, we end up in this situation:
k = 2x + 1 = (n * q) + i = (2y * q) + i = 2yq + i
Since 2yq is even, in order to get an odd k = 2x + 1, i is always going to be odd (even + odd = odd).
Again, choosing an even n as the hash table length is a bad idea even if all or the majority of our ks are odd, because we will end up with only the odd indices (buckets) being occupied.
So let's try with an n which is not an even number, i.e. an odd n = 2y + 1.
Let's assume that n is odd, i.e. n = 2y + 1. We still have even (2.1.) and odd (2.2.) keys (k of K).
2.1. n = 2y + 1 is odd, keys are even k = 2x
Here we have:
k = 2x = (n * q) + i = ((2y + 1) * q) + i = (2yq + q) + i = 2yq + q + i
We know that 2yq is even, so in order to get k = 2x which is even as well we need q + i to also be even.
When can q + i be even? Only in these 2 cases:
q -> even, i -> even, even + even = even
q -> odd, i -> odd, odd + odd = even
If either q or i is even while the other one is odd, we will get an odd q + i, and consequently an odd 2yq + (q + i), but we have k = 2x which is even, so either both q and i are even or they're both odd.
In this case we can see that for an odd n = 2y + 1, i can either be even or odd, which is good because it means that now we will use both even and odd bucket indices of our hash table and not only the even or only the odd ones.
By the way, it turns out that all primes p : p > 2 are odd numbers, so at least for now we can say that choosing a prime could be a good idea because a prime greater than 2 is always odd.
2.2. n = 2y + 1 is odd, keys are odd k = 2x + 1
Similarly here:
k = 2x + 1 = (n * q) + i = ((2y + 1) * q) + i = 2yq + q + i = 2yq + (q + i)
In order to get an odd k = 2x + 1 we need (q + i) to be odd (2yq is even), and this happens only in these 2 cases:
q -> even, i -> odd, even + odd = odd
q -> odd, i -> even, odd + even = odd
Again, we see that an odd number is a better choice for n, as this way both even and odd bucket indices i have a chance to be occupied.
Now, I got stuck here. Is there a connection between this proof and prime numbers and how can I continue this proof to conclude that a prime number p would be an even better choice than a generic odd number with a similar reasoning?
EDIT:
So I tried to reason about it a bit further. This is what I came up with:
3. Using a generic odd n sharing a common factor f with k
We can say that for any factor f which is shared across k (k = f * x = fx) and n (n = f * y = fy), we end up with an i = k % n also sharing that common factor f. Why?
Again, if we try to compute k:
k = fx = (n * q) + i = (fy * q) + i = fyq + i
Then:
k = fx = fyq + i
can be satisfied only if i also shares f as one of its factors (since i = k - fyq = f(x - yq)), e.g. i = f * g = fg:
k = fx = fyq + fg = f(yq + g)
Leading to yq + g = x.
This means that if both k and n share a common factor, then the result of the modulo i will also have that common factor and therefore i will always be a multiple of that common factor, e.g. for k of K = {12, 15, 33, 96, 165, 336} and n = 9 (an odd number, not a prime):
k | k % n
---------------------------
12 | 12 % 9 = 3
15 | 15 % 9 = 6
33 | 33 % 9 = 6
96 | 96 % 9 = 6
165 | 165 % 9 = 3
336 | 336 % 9 = 3
Both k and n always share a common factor (3 in this case).
This leads to i = k % n also being a multiple of 3 and therefore, again in such scenarios the hash table's bucket indices being used will only be those that are multiples of the common factor 3.
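A quick Python check of the table above (my own snippet, just to illustrate):

keys = [12, 15, 33, 96, 165, 336]
print([k % 9 for k in keys])   # [3, 6, 6, 6, 3, 3] -- all multiples of 3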
So while an odd number for n is definitely better than an even one (as explained at 2.1. and 2.2), we still may have unwanted patterns in numbers when k and n both share a common factor f.
So, if we make n a prime (n = p), we will certainly avoid that n shares that common factor f with k (provided that f != p), because a prime p can only have two factors: 1 and itself. So...
4. Using a prime for n
If n is a prime (n = p), we end up with:
k = fx = (q * p) + i = qp + i
Then:
k = fx = qp + i
Implies that the quotient q resulting from the integer division k // n can either share the common factor f or not, i.e. either:
q = fz
Or:
q = z (where z is not a multiple of f)
In the first case (q = fz) we have:
k = fx = (q * p) + i = (fz * p) + i = fzp + i
So i ends up sharing the common factor f as well, e.g. i = fg:
k = fx = (q * p) + i = (fz * p) + i = fzp + i = fzp + fg = f(zp + g)
Such that zp + g = x.
And in the second case (q = z), we have:
k = fx = (q * p) + i = (z * p) + i = zp + i
i.e. in this second case, i won't have f as one of its factors, since zp doesn't have f among its factors either (f does not divide z, and f != p, which is prime).
So when using a prime for n, the benefit is that the result for i = k % n can either share a common factor f with k or not share it at all, e.g. for k of K = {56, 64, 72, 80, 88, 96} and n = p = 17:
k | k % n
---------------------------
56 | 56 % 17 = 5
64 | 64 % 17 = 13
72 | 72 % 17 = 4 ---> Common factor f = 4 of k and i
80 | 80 % 17 = 12 ---> Common factor f = 4 of k and i
88 | 88 % 17 = 3
96 | 96 % 17 = 11
In this case, all ks share a common factor f = 4, but only i = 72 % 17 = 4 and i = 80 % 17 = 12 both have k and i sharing that common factor f:
72 % 17 = 4 -> (18 * 4) % 17 = (4 * 1)
80 % 17 = 12 -> (20 * 4) % 17 = (4 * 3)
Also, if we take the previous example, for k of K = {12, 15, 33, 96, 165, 336} and we use the prime 17 for n instead of 9, we get:
k | k % n
---------------------------
12 | 12 % 17 = 12
15 | 15 % 17 = 15
33 | 33 % 17 = 16
96 | 96 % 17 = 11
165 | 165 % 17 = 12
336 | 336 % 17 = 13
Even here, we see that the common factor (f = 3 in this case) is shared between k and i only in these 3 cases:
12 % 17 = 12 -> (4 * 3) % 17 = (4 * 3)
15 % 17 = 15 -> (5 * 3) % 17 = (5 * 3)
165 % 17 = 12 -> (55 * 3) % 17 = (4 * 3)
This way, using a prime, the probability of a collision decreases, and we can distribute the data across the hash table better.
Now, what happens if k itself is a prime, or at least a multiple of a prime? I think that in this case the distribution across the hash table would be even better, because there won't be any common factors between k and n when both are prime (or when k is a multiple of a prime), provided that k is not a multiple of the prime n.
This is my conclusion why a prime is better suited for the length of a hash table.
I would appreciate your feedback and thoughts on my way of understanding this topic.
Thank you.
When it comes to chaining hash tables, you pretty much have the answer, although it can be written in fewer words:
Data often has patterns. Memory addresses, for example, often have zero in the lower bits.
Many hash functions, especially the very popular polynomial hash functions, are built using only addition, subtraction, and multiplication. All of these operations have the property that the lowest n bits of the result depend only on the lowest n bits of the operands, so these hash functions have this property too.
If your table size is zero in the lowest n bits, and the data is all the same in the lowest n bits, and your hash function has the property mentioned above... then your hash table will only use one out of every 2^n of its slots.
Sometimes we fix this problem by choosing an odd hash table size. Prime sizes are better, because each small factor of the table size causes similar problems with different arithmetic progressions in hash values.
Sometimes, though, we fix this problem by adding an additional hash step to the hash table itself -- an additional hash step that mixes all the bits of the hash together and prevents this kind of problem. Java's HashMap uses tables of size 2^N, but does this extra mixing step to help cover all the slots.
That's for chaining hash tables.
For hash tables that use open addressing to resolve collisions, the choice of a prime table size is usually required to ensure that the probing scheme will eventually check all (or at least half of) the slots. This is required to guarantee that the table will work at least until it is (half) full.
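As an illustration of the clustering effect described above (my own sketch, not from either answer): identity-hash some keys whose low three bits are zero, like 8-byte-aligned addresses, into a power-of-two table and into a prime-sized table:

keys = [8 * i for i in range(1, 200)]   # low 3 bits are always zero
for size in (64, 61):                   # 64 = 2^6, 61 is prime
    buckets = {k % size for k in keys}
    print(f"table size {size}: {len(buckets)} of {size} slots used")

This prints 8 of 64 slots used for the power-of-two table, but 61 of 61 for the prime one.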
I'm not quite sure if it should come here or on mathematics stack exchange, posting here to find more practical cases.
Is there any formula / algorithm for the sum of second powers of a linear series?
Meaning: a(1)^2 + a(2)^2 + a(3)^2 + ... + a(n)^2
where a(n) is a linear (arithmetic) series.
Let a_k = a_0 + d*(k-1). Then:
sum(a_k^2) = sum((a_0 + d*(k-1))^2) = sum(a_0^2 + d^2*(k-1)^2 + 2*a_0*d*(k-1)) = n*a_0^2 + d^2*sum((k-1)^2) + 2*a_0*d*sum(k-1)
(The sums go from 1 to n.)
We know that sum(k) = n*(n+1)/2 and sum(k^2) = n*(n+1)*(2n+1)/6, so sum(k-1) = (n-1)*n/2 and sum((k-1)^2) = (n-1)*n*(2n-1)/6.
Therefore the above becomes:
sum(a_k^2) = n*a_0^2 + d^2*(n-1)*n*(2n-1)/6 + a_0*d*(n-1)*n
which can be simplified a little more, and can be calculated in constant time.
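A small Python check (my own sketch) of the closed form against a brute-force sum:

def sum_squares_direct(a0, d, n):
    # O(n): square and add each term of the arithmetic series
    return sum((a0 + d * (k - 1)) ** 2 for k in range(1, n + 1))

def sum_squares_closed(a0, d, n):
    # O(1): the closed form derived above
    return (n * a0 * a0
            + d * d * (n - 1) * n * (2 * n - 1) // 6
            + a0 * d * (n - 1) * n)

for a0, d, n in [(3, 5, 10), (1, 1, 100), (7, 2, 1)]:
    assert sum_squares_direct(a0, d, n) == sum_squares_closed(a0, d, n)
print("closed form matches")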
I am reading an algorithms textbook and I am stumped by this question:
Suppose we want to compute the value x^y, where x and y are positive
integers with m and n bits, respectively. One way to solve the problem is to perform y - 1 multiplications by x. Can you give a more efficient algorithm that uses only O(n) multiplication steps?
Would this be a divide and conquer algorithm? The y - 1 multiplications by x would run in Θ(y), right? I don't know where to start with this question.
I understand this better in an iterative way:
You can compute x^z for all powers of two z = 2^0, 2^1, 2^2, ..., 2^(n-1),
simply by applying x^(2^(i+1)) = x^(2^i) * x^(2^i) for i from 0 to n-2.
Now you can use these n values to compute x^y:
result = 1
for i = 0 to n-1:
    if the i'th bit in y is on:
        result *= x^(2^i)
return result
All of this is done in O(n) multiplications.
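A runnable Python sketch of this idea (squaring on the fly instead of storing all n powers first):

def fast_pow(x, y):
    # O(n) multiplications, where n is the number of bits of y
    result = 1
    power = x                # invariant: power == x^(2^i) at step i
    while y > 0:
        if y & 1:            # the i'th bit of y is on
            result *= power
        power *= power       # x^(2^(i+1)) = x^(2^i) * x^(2^i)
        y >>= 1
    return result

assert fast_pow(3, 13) == 3 ** 13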
Apply a simple recursion for divide and conquer.
Here I am posting something more like pseudocode:

x^y :=
    if y == 1: return x                  (base case)
    if y % 2 == 0: return (x^2)^(y/2)
    else: return x * (x^2)^((y-1)/2)
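A runnable Python version of this recursion (my sketch; I also added a y == 0 base case to make it total):

def power(x, y):
    if y == 0:
        return 1
    if y == 1:
        return x
    if y % 2 == 0:
        return power(x * x, y // 2)        # x^y = (x^2)^(y/2)
    return x * power(x * x, (y - 1) // 2)  # x^y = x * (x^2)^((y-1)/2)

assert power(2, 10) == 1024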
The y-1 multiplications solution is based on the identity x^y = x * x^(y-1). By repeated application of the identity, you know that you will decrease y down to 1 in y-1 steps.
A better idea is to decrease y more "energetically". Assuming an even y, we have x^y = x^(2*(y/2)) = (x^2)^(y/2). Assuming an odd y, we have x^y = x^(2*(y/2)+1) = x * (x^2)^(y/2), where y/2 denotes integer division.
You see that you can halve y, provided you continue the power computation with x^2 instead of x.
Recursively:
Power(x, y) =
    1                         if y = 0
    x                         if y = 1
    Power(x * x, y / 2)       if y even
    x * Power(x * x, y / 2)   if y odd (y / 2 is integer division)
Another way to view it is to read y as a sum of weighted bits. y = b0 + 2.b1 + 4.b2 + 8.b3...
The properties of exponentiation imply:
x^y = x^b0 . x^(2.b1) . x^(4.b2) . x^(8.b3)...
    = x^b0 . (x^2)^b1 . (x^4)^b2 . (x^8)^b3...
You can obtain the desired powers of x by squaring, and the binary decomposition of y tells you which powers to multiply.
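For example, with y = 13 = 1101 in binary (b0 = 1, b1 = 0, b2 = 1, b3 = 1):
x^13 = x^1 . (x^4) . (x^8)
where x^2, x^4 and x^8 come from three successive squarings of x.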
Given a positive integer X, how can one partition it into N parts, each between A and B where A <= B are also positive integers? That is, write
X = X_1 + X_2 + ... + X_N
where A <= X_i <= B and the order of the X_i's doesn't matter?
If you want to know the number of ways to do this, then you can use generating functions.
Essentially, you are interested in integer partitions. An integer partition of X is a way to write X as a sum of positive integers. Let p(n) be the number of integer partitions of n. For example, if n=5 then p(n)=7 corresponding to the partitions:
5
4,1
3,2
3,1,1
2,2,1
2,1,1,1
1,1,1,1,1
The generating function for p(n) is
sum_{n >= 0} p(n) z^n = Prod_{i >= 1} ( 1 / (1 - z^i) )
What does this do for you? By expanding the right hand side and taking the coefficient of z^n you can recover p(n). Don't worry that the product is infinite since you'll only ever be taking finitely many terms to compute p(n). In fact, if that's all you want, then just truncate the product and stop at i=n.
Why does this work? Remember that
1 / (1 - z^i) = 1 + z^i + z^{2i} + z^{3i} + ...
So the coefficient of z^n is the number of ways to write
n = 1*a_1 + 2*a_2 + 3*a_3 +...
where now I'm thinking of a_i as the number of times i appears in the partition of n.
How does this generalize? Easily, as it turns out. From the description above, if you only want the parts of the partition to be in a given set A, then instead of taking the product over all i >= 1, take the product over only i in A. Let p_A(n) be the number of integer partitions of n whose parts come from the set A. Then
sum_{n >= 0} p_A(n) z^n = Prod_{i in A} ( 1 / (1 - z^i) )
Again, taking the coefficient of z^n in this expansion solves your problem. But we can go further and track the number of parts of the partition. To do this, add in another place holder q to keep track of how many parts we're using. Let p_A(n,k) be the number of integer partitions of n into k parts where the parts come from the set A. Then
sum_{n >= 0} sum_{k >= 0} p_A(n,k) q^k z^n = Prod_{i in A} ( 1 / (1 - q*z^i) )
so taking the coefficient of q^k z^n gives the number of integer partitions of n into k parts where the parts come from the set A.
How can you code this? The generating function approach actually gives you an algorithm for generating all of the solutions to the problem as well as a way to uniformly sample from the set of solutions. Once n and k are chosen, the product on the right is finite.
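As a sketch of how this can be coded (a straightforward polynomial-multiplication DP in Python; count_partitions is just an illustrative name):

def count_partitions(n, k, A):
    # count[j][m] = number of partitions of j into m parts drawn from A
    count = [[0] * (k + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in A:                    # one factor 1/(1 - q*z^i) per part size
        for j in range(i, n + 1):  # ascending j allows repeated use of part i
            for m in range(1, k + 1):
                count[j][m] += count[j - i][m - 1]
    return count[n][k]

# p(5) = 7, summing over the number of parts k
print(sum(count_partitions(5, k, range(1, 6)) for k in range(6)))  # 7
# partitions of 10 into exactly 3 parts, each between 2 and 4
print(count_partitions(10, 3, range(2, 5)))  # 2: (4,4,2) and (4,3,3)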
Here is a Python solution to this problem. It is quite unoptimised, but I have tried to keep it as simple as I can to demonstrate an iterative method of solving this problem.
The results of this method will commonly be a list of max values and min values with maybe 1 or 2 other values in between. Because of this, there is a slight optimisation in there (using abs) which prevents the iterator from constantly trying to find min values counting down from max and vice versa.
There are recursive ways of doing this that look far more elegant, but this will get the job done and hopefully give you an insight into a better solution.
SCRIPT:
# iterative approach in case the number of partitions is particularly large
def splitter(value, partitions, min_range, max_range, part_values):
    # lower bound used to determine if the solution is within reach
    lower_bound = 0
    # upper bound used to determine if the solution is within reach
    upper_bound = 0
    # upper_range used as upper limit for the iterator
    upper_range = 0
    # lower_range used as lower limit for the iterator
    lower_range = 0
    # interval will be + or -
    interval = 0
    while value > 0:
        partitions -= 1
        lower_bound = min_range * partitions
        upper_bound = max_range * partitions
        # if the value is more likely near the upper bound, start from there
        if abs(lower_bound - value) < abs(upper_bound - value):
            upper_range = max_range
            lower_range = min_range - 1
            interval = -1
        # if the value is more likely near the lower bound, start from there
        else:
            upper_range = min_range
            lower_range = max_range + 1
            interval = 1
        for i in range(upper_range, lower_range, interval):
            # make sure what we are doing won't break the solution
            if lower_bound <= value - i and upper_bound >= value - i:
                part_values.append(i)
                value -= i
                break
    return part_values

def partitioner(value, partitions, min_range, max_range):
    if min_range * partitions <= value and max_range * partitions >= value:
        return splitter(value, partitions, min_range, max_range, [])
    else:
        print("this is impossible to solve")

def main():
    print(partitioner(9800, 1000, 2, 100))

if __name__ == "__main__":
    main()
The basic idea behind this script is that the value needs to fall between min*parts and max*parts at each step of the solution; if we always achieve this goal, we will eventually end up at min <= value <= max for parts == 1. So if we keep taking away from the value while keeping it within this range, we will always find the result if one is possible.
For this code's example, it will basically always take away either max or min depending on which bound the value is closer to, until some non-min or non-max value is left over as the remainder.
A simple realization you can make is that the average of the X_i must be between A and B, so we can simply divide X by N and then do some small adjustments to distribute the remainder evenly to get a valid partition.
Here's one way to do it:
X_i = ceil (X / N) if i <= X mod N,
floor (X / N) otherwise.
This gives a valid solution if A <= floor (X / N) and ceil (X / N) <= B. Otherwise, there is no solution. See proofs below.
sum(X_i) == X
Proof:
Use the division algorithm to write X = q*N + r with 0 <= r < N.
If r == 0, then ceil (X / N) == floor (X / N) == q so the algorithm sets all X_i = q. Their sum is q*N == X.
If r > 0, then floor (X / N) == q and ceil (X / N) == q+1. The algorithm sets X_i = q+1 for 1 <= i <= r (i.e. r copies), and X_i = q for the remaining N - r pieces. The sum is therefore (q+1)*r + (N-r)*q == q*r + r + N*q - r*q == q*N + r == X.
If floor (X / N) < A or ceil (X / N) > B, then there is no solution.
Proof:
If floor (X / N) < A then, since A is an integer, X / N < A, i.e. X < A*N. So even using only the smallest allowed pieces (N pieces of size A), their sum A*N would be larger than X.
Similarly, if ceil (X / N) > B then, since B is an integer, X / N > B, i.e. X > B*N. So even using only the largest allowed pieces (N pieces of size B), their sum B*N would be smaller than X.
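A minimal Python sketch of this construction (my own illustration of the formulas above):

def partition(X, N, A, B):
    # split X into N parts, each in [A, B]; return None if impossible
    q, r = divmod(X, N)                 # q = floor(X/N), r = X mod N
    ceil_q = q + 1 if r else q          # ceil(X/N)
    if q < A or ceil_q > B:
        return None                     # no solution, as proved above
    return [q + 1] * r + [q] * (N - r)  # r big parts, N - r small parts

parts = partition(9800, 1000, 2, 100)
print(sum(parts), min(parts), max(parts))  # 9800 9 10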