Complexity of prime factor algorithm - algorithm

I just studied how to find the prime factors of a number with this algorithm that basically works this way:
void printPrimeFactors(N) {
while N is even
print 2 as prime factor
N = N/2
// At this point N is odd
for all the ODDS i from 3 to sqrt(N)
while N is divisible by i
print i as prime factor
N = N / i
if N hasn't been divided by anything (i.e. it's still N)
N is prime
}
everything is clear but I'm not sure how to calculate the complexity in big-O of the program above.
Being the division the most expensive operation (I suppose), I'd say that as a worst-case there could be a maximum of log(N) divisions, but I'm not totally sure of this.

You can proceed as like this. First of all, we are only interested in the behavior of the application when N is really big. In that case, we can simplify a lot: if two parts have different asymptotic performance, we need only take the one that performs worst.
The first while can loop at most m times, where m is the smallest integer so that 2m >= N. Therefore it will loop, at worst, log2N times -- this means that it performs as O(log N). Note that the type of log is irrelevant when N is large enough.
The for loop runs O(sqrt N) times. At scale, this is way more massive than log N, so we can drop the log.
Finally, we need to evaluate the while inside the loop. Since this while loop is executed only for divisors, it will have a big O equal to their number. With a little thought we can see that, at worst, the while will loop log3N times because 3 is the smallest possible divisor.
Thus, the while loop gets executed only O(log N) times, but the outer for gets executed O(sqrt N) times (and often the while loop doesn't run because the current number won't divide).
In conclusion, the part that will take the longest is the for loop and this will make the algorithm go as O(sqrtN).

Related

Why is O(n) better than O( nlog(n) )?

I just came around this weird discovery, in normal maths, n*logn would be lesser than n, because log n is usually less than 1.
So why is O(nlog(n)) greater than O(n)? (ie why is nlogn considered to take more time than n)
Does Big-O follow a different system?
It turned out, I misunderstood Logn to be lesser than 1.
As I asked few of my seniors i got to know this today itself, that if the value of n is large, (which it usually is, when we are considering Big O ie worst case), logn can be greater than 1.
So yeah,
O(1) < O(logn) < O(n) < O(nlogn) holds true.
(I thought this to be a dumb question, and was about to delete it as well, but then realised, no question is dumb question and there might be others who get this confusion so I left it here.)
...because log n is always less than 1.
This is a faulty premise. In fact, logb n > 1 for all n > b. For example, log2 32 = 5.
Colloquially, you can think of log n as the number of digits in n. If n is an 8-digit number then log n ≈ 8. Logarithms are usually bigger than 1 for most values of n, because most numbers have multiple digits.
Plot both the graph( on desmos (https://www.desmos.com/calculator) or any other web) and look yourself the result on large values of n ( y=f(n)). I am saying that you should look for large value because for small value of n the program will not have time issue. For convenience I had attached a graph below you can try for other base of log.
The red represent time = n and blue represent time = nlog(n).
Here is a graph of the popular time complexities
n*log(n) is clearly greater than n for n>2 (log base 2)
An easy way to remember might be, taking two examples
Imagine the binary search algorithm with is Log N time complexity: O(log(N))
If, for each step of binary search, you had to iterate the array of N elements
The time complexity of that task would be O(N*log(N))
Which is more work than iterating the array once: O(N)
In computers, it's log base 2 and not base 10. So log(2) is 1 and log(n), where n>2, is a positive number which is greater than 1.
Only in the case of log (1), we have the value less than 1, otherwise, it's greater than 1.
Log(n) can be greater than 1 if n is greater than b. But this doesn't answer your question that why is O(n*logn) is greater than O(n).
Usually the base is less than 4. So for higher values n, n*log(n) becomes greater than n. And that is why O(nlogn) > O(n).
This graph may help. log (n) rises faster than n and is greater than 1 for n greater than logarithm's base. https://stackoverflow.com/a/7830804/11617347
No matter how two functions behave on small value of n, they are compared against each other when n is large enough. Theoretically, there is an N such that for each given n > N, then nlogn >= n. If you choose N=10, nlogn is always greater than n.
The assertion is not always accurate. When n is small, (n^2) requires more time than (log n), but when n is large, (log n) is more effective. The growth rate of (n^2) is less than (n) and (log n) for small values, so we can say that (n^2) is more efficient because it takes less time than (log n), but as n increases, (n^2) increases dramatically, whereas (log n) has a growth rate that is less than (n^2) and (n), so (log n) is more efficient.
For higher values of log n it becomes greater than 1. as we consider all possible values of n we can say that for most of the time log n is greater than 1. Hence we can say O(nlogn) > O(n) (Assuming higher values)
Remember "big O" is not about values, it is about the shape of the function I can have an O(n2) function that runs faster, even for values of n over a million than a O(1) function...

What is the complexity of this while loop?

Let m be the size of Array A and n be the size of Array B. What is the complexity of the following while loop?
while (i<n && j<m){ if (some condition) i++ else j++}
Example for an array: A=[1,2,3,4] B=[1,2,3,4] the while loop executes at most 5+4 times O(m+n).
Example for an array: A=[1,2,3,4,7,8,9,10] B=[1,2,3,4] the while loop executes at most 4 times O(n).
I am not able to figure out how to represent the complexity of the while loop.
One common approach is to describe the worst-case time complexity. In your example, the worst-case time complexity is O(m + n), because no matter what some condition is during a given loop iteration, the total number of loop iterations is at most m + n.
If it's important to emphasize that the time complexity has a lesser upper bound in some cases, then you'll need to figure out what those cases are, and find a way to express them. (For example, if a given algorithm takes an array of size n and has worst-case O(n2) time, it might also be possible to describe it as "O(mn) time, where m is the number of distinct values in the array" — only if that's true, of course — where we've introduced an extra variable m to let us capture the impact on the performance of having more vs. fewer duplicate values.)

Time complexity measured as a function of the number of bits in the input?

I came across this problem that I am not sure how to solve:
Suppose A(.) is a subroutine that takes as input a number in binary, and takes linear time (that is, O(n), where n is the length (in bits) of the number).
Consider the following piece of code, which starts with an n-bit number x.
while x>1:
call A(x)
x=x-1
Assume that the subtraction takes O(n) time on an n-bit number.
(a) How many times does the inner loop iterate (as a function of n)? Leave your answer in big-O form.
(b) What is the overall running time (as a function of n), in big-O form?
My guess is that (a) is O(n^2) and (b) is O(n^3). Is this correct? The way I'm thinking about it is that the loop has to compute two steps each time it cycles through and it will cycle through x time each time subtracting 1 from n bits until x reaches 0. And for part b since A(.) takes time O(n) we multiply that with the time it takes to execute the loop and we then have the over all running time. Is my analysis correct?
Something that might help here is to write x = 2n, since if x has n bits its value is O(2n). Therefore, the loop will run O(2n) times.
Each iteration of the loop does O(n) work, giving an upper bound on the work of O(n · 2n). This bound ends up being tight. Notice that for the first x/2 iterations of the loop, the value of x will still need n bits. Therefore, as a lower bound on the work done, we get x/2 = 2n-1 iterations doing n work each, giving a total of Ω(n · 2n) work. Thus the work done is Θ(n · 2n).
Hope this helps!

Determining time complexity of an algorithm

Below is some pseudocode I wrote that, given an array A and an integer value k, returns true if there are two different integers in A that sum to k, and returns false otherwise. I am trying to determine the time complexity of this algorithm.
I'm guessing that the complexity of this algorithm in the worst case is O(n^2). This is because the first for loop runs n times, and the for loop within this loop also runs n times. The if statement makes one comparison and returns a value if true, which are both constant time operations. The final return statement is also a constant time operation.
Am I correct in my guess? I'm new to algorithms and complexity, so please correct me if I went wrong anywhere!
Algorithm ArraySum(A, n, k)
for (i=0, i<n, i++)
for (j=i+1, j<n, j++)
if (A[i]+A[j]=k)
return true
return false
Azodious's reasoning is incorrect. The inner loop does not simply run n-1 times. Thus, you should not use (outer iterations)*(inner iterations) to compute the complexity.
The important thing to observe is, that the inner loop's runtime changes with each iteration of the outer loop.
It is correct, that the first time the loop runs, it will do n-1 iterations. But after that, the amount of iterations always decreases by one:
n - 1
n - 2
n - 3
…
2
1
We can use Gauss' trick (second formula) to sum this series to get n(n-1)/2 = (n² - n)/2. This is how many times the comparison runs in total in the worst case.
From this, we can see that the bound can not get any tighter than O(n²). As you can see, there is no need for guessing.
Note that you cannot provide a meaningful lower bound, because the algorithm may complete after any step. This implies the algorithm's best case is O(1).
Yes. In the worst case, your algorithm is O(n2).
Your algorithm is O(n2) because every instance of inputs needs time complexity O(n2).
Your algorithm is Ω(1) because there exist one instance of inputs only needs time complexity Ω(1).
Following appears in chapter 3, Growth of Function, of Introduction to Algorithms co-authored by Cormen, Leiserson, Rivest, and Stein.
When we say that the running time (no modifier) of an algorithm is Ω(g(n)), we mean that no mater what particular input of size n is chosen for each value of n, the running time on that input is at least a constant time g(n), for sufficiently large n.
Given an input in which the summation of first two elements is equal to k, this algorithm would take only one addition and one comparison before returning true.
Therefore, this input costs constant time complexity and make the running time of this algorithm Ω(1).
No matter what the input is, this algorithm would take at most n(n-1)/2 additions and n(n-1)/2 comparisons before returning value.
Therefore, the running time of this algorithm is O(n2)
In conclusion, we can say that the running time of this algorithm falls between Ω(1) and O(n2).
We could also say that worst-case running of this algorithm is Θ(n2).
You are right but let me explain a bit:
This is because the first for loop runs n times, and the for loop within this loop also runs n times.
Actually, the second loop will run for (n-i-1) times, but in terms of complexity it'll be taken as n only. (updated based on phant0m's comment)
So, in worst case scenerio, it'll run for n * (n-i-1) * 1 * 1 times. which is O(n^2).
in best case scenerio, it's run for 1 * 1 * 1 * 1 times, which is O(1) i.e. constant.

Why is Sieve of Eratosthenes more efficient than the simple "dumb" algorithm?

If you need to generate primes from 1 to N, the "dumb" way to do it would be to iterate through all the numbers from 2 to N and check if the numbers are divisable by any prime number found so far which is less than the square root of the number in question.
As I see it, sieve of Eratosthenes does the same, except other way round - when it finds a prime N, it marks off all the numbers that are multiples of N.
But whether you mark off X when you find N, or you check if X is divisable by N, the fundamental complexity, the big-O stays the same. You still do one constant-time operation per a number-prime pair. In fact, the dumb algorithm breaks off as soon as it finds a prime, but sieve of Eratosthenes marks each number several times - once for every prime it is divisable by. That's a minimum of twice as many operations for every number except primes.
Am I misunderstanding something here?
In the trial division algorithm, the most work that may be needed to determine whether a number n is prime, is testing divisibility by the primes up to about sqrt(n).
That worst case is met when n is a prime or the product of two primes of nearly the same size (including squares of primes). If n has more than two prime factors, or two prime factors of very different size, at least one of them is much smaller than sqrt(n), so even the accumulated work needed for all these numbers (which form the vast majority of all numbers up to N, for sufficiently large N) is relatively insignificant, I shall ignore that and work with the fiction that composite numbers are determined without doing any work (the products of two approximately equal primes are few in number, so although individually they cost as much as a prime of similar size, altogether that's a negligible amount of work).
So, how much work does the testing of the primes up to N take?
By the prime number theorem, the number of primes <= n is (for n sufficiently large), about n/log n (it's n/log n + lower order terms). Conversely, that means the k-th prime is (for k not too small) about k*log k (+ lower order terms).
Hence, testing the k-th prime requires trial division by pi(sqrt(p_k)), approximately 2*sqrt(k/log k), primes. Summing that for k <= pi(N) ~ N/log N yields roughly 4/3*N^(3/2)/(log N)^2 divisions in total. So by ignoring the composites, we found that finding the primes up to N by trial division (using only primes), is Omega(N^1.5 / (log N)^2). Closer analysis of the composites reveals that it's Theta(N^1.5 / (log N)^2). Using a wheel reduces the constant factors, but doesn't change the complexity.
In the sieve, on the other hand, each composite is crossed off as a multiple of at least one prime. Depending on whether you start crossing off at 2*p or at p*p, a composite is crossed off as many times as it has distinct prime factors or distinct prime factors <= sqrt(n). Since any number has at most one prime factor exceeding sqrt(n), the difference isn't so large, it has no influence on complexity, but there are a lot of numbers with only two prime factors (or three with one larger than sqrt(n)), thus it makes a noticeable difference in running time. Anyhow, a number n > 0 has only few distinct prime factors, a trivial estimate shows that the number of distinct prime factors is bounded by lg n (base-2 logarithm), so an upper bound for the number of crossings-off the sieve does is N*lg N.
By counting not how often each composite gets crossed off, but how many multiples of each prime are crossed off, as IVlad already did, one easily finds that the number of crossings-off is in fact Theta(N*log log N). Again, using a wheel doesn't change the complexity but reduces the constant factors. However, here it has a larger influence than for the trial division, so at least skipping the evens should be done (apart from reducing the work, it also reduces storage size, so improves cache locality).
So, even disregarding that division is more expensive than addition and multiplication, we see that the number of operations the sieve requires is much smaller than the number of operations required by trial division (if the limit is not too small).
Summarising:
Trial division does futile work by dividing primes, the sieve does futile work by repeatedly crossing off composites. There are relatively few primes, but many composites, so one might be tempted to think trial division wastes less work.
But: Composites have only few distinct prime factors, while there are many primes below sqrt(p).
In the naive method, you do O(sqrt(num)) operations for each number num you check for primality. Ths is O(n*sqrt(n)) total.
In the sieve method, for each unmarked number from 1 to n you do n / 2 operations when marking multiples of 2, n / 3 when marking those of 3, n / 5 when marking those of 5 etc. This is n*(1/2 + 1/3 + 1/5 + 1/7 + ...), which is O(n log log n). See here for that result.
So the asymptotic complexity is not the same, like you said. Even a naive sieve will beat the naive prime-generation method pretty fast. Optimized versions of the sieve can get much faster, but the big-oh remains unchanged.
The two are not equivalent like you say. For each number, you will check divisibility by the same primes 2, 3, 5, 7, ... in the naive prime-generation algorithm. As you progress, you check divisibility by the same series of numbers (and you keep checking against more and more as you approach your n). For the sieve, you keep checking less and less as you approach n. First you check in increments of 2, then of 3, then 5 and so on. This will hit n and stop much faster.
Because with the sieve method, you stop marking mutiples of the running primes when the running prime reaches the square root of N.
Say, you want to find all primes less than a million.
First you set an array
for i = 2 to 1000000
primetest[i] = true
Then you iterate
for j=2 to 1000 <--- 1000 is the square root of 10000000
if primetest[j] <--- if j is prime
---mark all multiples of j (except j itself) as "not a prime"
for k = j^2 to 1000000 step j
primetest[k] = false
You don't have to check j after 1000, because j*j will be more than a million.
And you start from j*j (you don't have to mark multiples of j less than j^2 because they are already marked as multiples of previously found, smaller primes)
So, in the end you have done the loop 1000 times and the if part only for those j's that are primes.
Second reason is that with the sieve, you only do multiplication, not division. If you do it cleverly, you only do addition, not even multiplication.
And division has larger complexity than addition. The usual way to do division has O(n^2) complexity, while addition has O(n).
Explained in this paper: http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
I think it's quite readable even without Haskell knowledge.
the first difference is that division is much more expensive than addition. Even if each number is 'marked' several times, it's trivial when compared with the huge number of divisions needed for the 'dumb' algorithm.
A "naive" Sieve of Eratosthenes will mark non-prime numbers multiple times.
But, if you have your numbers on a linked list and remove numbers taht are multiples (you will still need to walk the remainder of the list), the work left to do after finding a prime is always smaller than it was before finding the prime.
http://en.wikipedia.org/wiki/Prime_number#Number_of_prime_numbers_below_a_given_number
the "dumb" algorithm does i/log(i) ~= N/log(N) work for each prime number
the real algorithm does N/i ~= 1 work for each prime number
Multiply by roughly N/log(N) prime numbers.
It was a time when I was trying to find an efficient way of finding sum of primes less than x:
There I decided to use N by N square table and started checking if numbers with a unit digits in [1,3,7,9]
But Eratosthenes Method of prime made it a little easier: How
Let you want to know if N is prime or Not
You started finding factorization. So you will realize when N is factorized
when you divide N with the highest factor quotient will be less.
So, You take a number: int(sqrt(N)) = K(say) divides N you get somewhat same and close number to K
Now let's say you divide N with u<K, but if "U" is not prime and one of the prime factors of U is V(prime) then will obviously be less than U (V<U) and V will also divide N
then
why not divide and check if N is prime or not by DIVIDING 'N' WITH ONLY PRIMES LESS THAN K=int(sqrt(N))
Number of times for which loop Keeps executing = π(√n)
This is how the brilliant idea of Eratosthenes starts taking pictures and will start giving you intuition behind this all.
Btw using the Sieve of Eratosthenes one can find sum of primes less than a multiple of 10.
because for a given column you just check need to check their unit digits[1,3,7,9] and for how many times a particular unit digit is repeating.
Being new to Stack Overflow Community! Would like to know suggestions on the same if anything is wrong.

Resources