Efficient Sorting? - sorting

I've been looking an answer for this question for sometime now... "What is the most efficient way to sort a million 32-bit integers?"
I feel that Quick sort is the most efficient in sorting.. with an average Runtime of O(n*log n). (with the worst case O(n²))..
But some search results says that Radix sort/Merge sort are efficient for sorting million integers.
Any pointers?

Mergesort is guarenteed to be O(n lg n), but has a higher memory footprint than quick sort.
Quicksort generally runs faster than mergesort, but under ~some~ circumstances it can degrade to quadratic running time.
Radix sort is O(n*r) where r is the length of the numbers.
To determine if radix is better than your chosen lg-n method, do this:
n * r < n * lg (n)
divide by n on both sides
r < lg(n)
We know r is 32 bits
32 < lg(n)
for both sides, take 2^x
2^32 < 2^(lg(n)
2^32 < n
So if n is less than 2^32 (4 billion) then use the lg-n algorithm.
Personally, I'd use quick sort, shuffling it if I have to in order to prevent it from degrading.

if you have enough space maybe you can try bucket sort (http://en.wikipedia.org/wiki/Bucket_sort). It's more efficient but requires additional memory to store datas.

Merge sort is O(n log n) in its worst case, so its going to be better than quick sort most of the time. Radix sorts, iirc, are only really useful when each thing being sorted is the same length. Its timed in O(K * N) which is (item length) * (number of items). I don't think I've ever needed to use a Radix sort.

Radix is better if for large numbers, especially when you know the range of the numbers.
Calculation fix:
Radix is O(kN) where k is the number of digits in the largest number. (Actually it's about d*k*N where d is the digit base, number of buckets that will be used ... Alphabet = 26, decimal = 10 , binary = 2)
Maxint = 4,294,967,296
32 bits: k = 32 / log(d)
Base 10 Radix:
d*k*n = 10*10*n < nlogn .... 100 < logn ... n > 2^100
Base 2 Radix:
d*k*n = 2*32*n < nlogn .... 64 < logn ... n > 2^64
So for 32bit numbers if you have more than 2^64 numbers n*k*N is better than nlogn
But, if you know that the range will be up to 1024 , and not MAXINT for example:
MaxNumber = 1024
Base 10 Radix:
d*k*n = 10*4*n < nlogn .... 40 < logn ... n > 2^40
Base 2 Radix:
d*k*n = 2*10*n < nlogn .... 20 < logn ... n > 2^20
So for numbers up to 1024 if you have more than 2^20 numbers n*k*N is better than nlogn
Because big O notation discards
multiplicative constants on the
running time, and ignores efficiency
for low input sizes, it does not
always reveal the fastest algorithm in
practice or for practically-sized data
sets, but the approach is still very
effective for comparing the
scalability of various algorithms as
input sizes become large.

Related

Assume we have n integers that range from 0 to n^5 -1 in value. What will be the algorithmic complexity of sorting these numbers by using Radix sort?

I don't know if it is:
O(n)
or
O(5log10(n) * (n+10)) = O(nlog10n)
or
O(n+k)
I could be wrong, but I need to be able to compute it to figure it out. I'm real confused here. Please explain the answer and show any calculations too. Thank you.
If the radix base is based on n, such as base = n (5 passes) or ceil(sqrt(n)) (10 passes), time complexity is O(n), since the constants like 5 or 10 are ignored.
If the radix base is independent of n, such as some power of 2, for example 2^8 = 256, then the number of passes = ceil(log256(n^5)), and time complexity is O(n log(n)).
The question doesn't specify what is to be used for the radix base.

Is complexity O(log(n)) equivalent to O(sqrt(n))?

My professor just taught us that any operation that halves the length of the input has an O(log(n)) complexity as a thumb rule. Why is it not O(sqrt(n)), aren't both of them equivalent?
They are not equivalent: sqrt(N) will increase a lot more quickly than log2(N). There is no constant C so that you would have sqrt(N) < C.log(N) for all values of N greater than some minimum value.
An easy way to grasp this, is that log2(N) will be a value close to the number of (binary) digits of N, while sqrt(N) will be a number that has itself half the number of digits that N has. Or, to state that with an equality:
        log2(N) = 2log2(sqrt(N))
So you need to take the logarithm(!) of sqrt(N) to bring it down to the same order of complexity as log2(N).
For example, for a binary number with 11 digits, 0b10000000000 (=210), the square root is 0b100000, but the logarithm is only 10.
Assuming natural logarithms (otherwise just multiply by a constant), we have
lim {n->inf} log n / sqrt(n) = (inf / inf)
= lim {n->inf} 1/n / 1/(2*sqrt(n)) (by L'Hospital)
= lim {n->inf} 2*sqrt(n)/n
= lim {n->inf} 2/sqrt(n)
= 0 < inf
Refer to https://en.wikipedia.org/wiki/Big_O_notation for alternative defination of O(.) and thereby from above we can say log n = O(sqrt(n)),
Also compare the growth of the functions below, log n is always upper bounded by sqrt(n) for all n > 0.
Just compare the two functions:
sqrt(n) ---------- log(n)
n^(1/2) ---------- log(n)
Plug in Log
log( n^(1/2) ) --- log( log(n) )
(1/2) log(n) ----- log( log(n) )
It is clear that: const . log(n) > log(log(n))
No, It's not equivalent.
#trincot gave one excellent explanation with example in his answer. I'm adding one more point. Your professor taught you that
any operation that halves the length of the input has an O(log(n)) complexity
It's also true that,
any operation that reduces the length of the input by 2/3rd, has a O(log3(n)) complexity
any operation that reduces the length of the input by 3/4th, has a O(log4(n)) complexity
any operation that reduces the length of the input by 4/5th, has a O(log5(n)) complexity
So on ...
It's even true for all reduction of lengths of the input by (B-1)/Bth. It then has a complexity of O(logB(n))
N:B: O(logB(n)) means B based logarithm of n
One way to approach the problem can be to compare the rate of growth of O()
and O( )
As n increases we see that (2) is less than (1). When n = 10,000 eq--1 equals 0.005 while eq--2 equals 0.0001
Hence is better as n increases.
No, they are not equivalent; you can even prove that
O(n**k) > O(log(n, base))
for any k > 0 and base > 1 (k = 1/2 in case of sqrt).
When talking on O(f(n)) we want to investigate the behaviour for large n,
limits is good means for that. Suppose that both big O are equivalent:
O(n**k) = O(log(n, base))
which means there's a some finite constant C such that
O(n**k) <= C * O(log(n, base))
starting from some large enough n; put it in other terms (log(n, base) is not 0 starting from large n, both functions are continuously differentiable):
lim(n**k/log(n, base)) = C
n->+inf
To find out the limit's value we can use L'Hospital's Rule, i.e. take derivatives for numerator and denominator and divide them:
lim(n**k/log(n)) =
lim([k*n**(k-1)]/[ln(base)/n]) =
ln(base) * k * lim(n**k) = +infinity
so we can conclude that there's no constant C such that O(n**k) < C*log(n, base) or in other words
O(n**k) > O(log(n, base))
No, it isn't.
When we are dealing with time complexity, we think of input as a very large number. So let's take n = 2^18. Now for sqrt(n) number of operation will be 2^9 and for log(n) it will be equal to 18 (we consider log with base 2 here). Clearly 2^9 much much greater than 18.
So, we can say that O(log n) is smaller than O(sqrt n).
To prove that sqrt(n) grows faster than lgn(base2) you can take the limit of the 2nd over the 1st and proves it approaches 0 as n approaches infinity.
lim(n—>inf) of (lgn/sqrt(n))
Applying L’Hopitals Rule:
= lim(n—>inf) of (2/(sqrt(n)*ln2))
Since sqrt(n) and ln2 will increase infinitely as n increases, and 2 is a constant, this proves
lim(n—>inf) of (2/(sqrt(n)*ln2)) = 0

What can be the time complexity of below algorithm?

What is the time complexity of Binary to Decimal Conversion ?
I think If there are K bits in a binary number, then TC would be O(K), but, as we always tend to go for N bits in a binary number, then TC would be O(N).
How i got this ,as i think like how many digits a decimal number would include to represent N bit binary numbers.
Then, it is 10^k - 1 = 2^N - 1 => K = N Log2 base10 => which gives TC = O(N).
Can someone clarify this ?
Also, Is there is any chance to reduce this Time complexity ?
10^k - 1 = 2^N - 1
Starting point.
K = N Log2 base10
-1 was present both left and right, so we added 1. we have:
10^k = 2^N now.
To get K from 10^k, we put both left and right into log of base 10. Since logarithm is the power you need to put the base to get the parameter, log(10^k) of base 10 will yield K. On the other hand, N Log2 base10 is the value we reach because of the logarithmization.
Since K, the number of bits is the complexity, the complexity is directly related to N Log2 base 10, which is N multiplied by a positive scalar. If N is assumed to be infinite, the scalar will not mean much of a difference analytically, therefore, we can simplify N Log2 base 10 to O(N)
A decimal digit can represent log(10)/log(2) binary digits, approximately 3.22. But you want binary to decimal, so assume you need 4 bits per decimal digit. In any event, your output size is O(N) where N is your input size.
Now, that tells you very little about the time complexity of the unstated algorithm.
For small N, we can just do a table look up for O(1), but what about when N is large, e.g. 10^6?
A naive algorithm will process one bit of input at a time, doing a multiply/add on the output decimal. That's O(N) on the input, with each step costing O(N) because that's the cost to shift/add on the output. So O(N^2) is your time complexity.

Big O, what is the complexity of summing a series of n numbers?

I always thought the complexity of:
1 + 2 + 3 + ... + n is O(n), and summing two n by n matrices would be O(n^2).
But today I read from a textbook, "by the formula for the sum of the first n integers, this is n(n+1)/2" and then thus: (1/2)n^2 + (1/2)n, and thus O(n^2).
What am I missing here?
The big O notation can be used to determine the growth rate of any function.
In this case, it seems the book is not talking about the time complexity of computing the value, but about the value itself. And n(n+1)/2 is O(n^2).
You are confusing complexity of runtime and the size (complexity) of the result.
The running time of summing, one after the other, the first n consecutive numbers is indeed O(n).1
But the complexity of the result, that is the size of “sum from 1 to n” = n(n – 1) / 2 is O(n ^ 2).
1 But for arbitrarily large numbers this is simplistic since adding large numbers takes longer than adding small numbers. For a precise runtime analysis, you indeed have to consider the size of the result. However, this isn’t usually relevant in programming, nor even in purely theoretical computer science. In both domains, summing numbers is usually considered an O(1) operation unless explicitly required otherwise by the domain (i.e. when implementing an operation for a bignum library).
n(n+1)/2 is the quick way to sum a consecutive sequence of N integers (starting from 1). I think you're confusing an algorithm with big-oh notation!
If you thought of it as a function, then the big-oh complexity of this function is O(1):
public int sum_of_first_n_integers(int n) {
return (n * (n+1))/2;
}
The naive implementation would have big-oh complexity of O(n).
public int sum_of_first_n_integers(int n) {
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += n;
}
return sum;
}
Even just looking at each cell of a single n-by-n matrix is O(n^2), since the matrix has n^2 cells.
There really isn't a complexity of a problem, but rather a complexity of an algorithm.
In your case, if you choose to iterate through all the numbers, the the complexity is, indeed, O(n).
But that's not the most efficient algorithm. A more efficient one is to apply the formula - n*(n+1)/2, which is constant, and thus the complexity is O(1).
So my guess is that this is actually a reference to Cracking the Coding Interview, which has this paragraph on a StringBuffer implementation:
On each concatenation, a new copy of the string is created, and the
two strings are copied over, character by character. The first
iteration requires us to copy x characters. The second iteration
requires copying 2x characters. The third iteration requires 3x, and
so on. The total time therefore is O(x + 2x + ... + nx). This reduces
to O(xn²). (Why isn't it O(xnⁿ)? Because 1 + 2 + ... n equals n(n+1)/2
or, O(n²).)
For whatever reason I found this a little confusing on my first read-through, too. The important bit to see is that n is multiplying n, or in other words that n² is happening, and that dominates. This is why ultimately O(xn²) is just O(n²) -- the x is sort of a red herring.
You have a formula that doesn't depend on the number of numbers being added, so it's a constant-time algorithm, or O(1).
If you add each number one at a time, then it's indeed O(n). The formula is a shortcut; it's a different, more efficient algorithm. The shortcut works when the numbers being added are all 1..n. If you have a non-contiguous sequence of numbers, then the shortcut formula doesn't work and you'll have to go back to the one-by-one algorithm.
None of this applies to the matrix of numbers, though. To add two matrices, it's still O(n^2) because you're adding n^2 distinct pairs of numbers to get a matrix of n^2 results.
There's a difference between summing N arbitrary integers and summing N that are all in a row. For 1+2+3+4+...+N, you can take advantage of the fact that they can be divided into pairs with a common sum, e.g. 1+N = 2+(N-1) = 3+(N-2) = ... = N + 1. So that's N+1, N/2 times. (If there's an odd number, one of them will be unpaired, but with a little effort you can see that the same formula holds in that case.)
That is not O(N^2), though. It's just a formula that uses N^2, actually O(1). O(N^2) would mean (roughly) that the number of steps to calculate it grows like N^2, for large N. In this case, the number of steps is the same regardless of N.
Adding the first n numbers:
Consider the algorithm:
Series_Add(n)
return n*(n+1)/2
this algorithm indeed runs in O(|n|^2), where |n| is the length (the bits) of n and not the magnitude, simply because multiplication of 2 numbers, one of k bits and the other of l bits runs in O(k*l) time.
Careful
Considering this algorithm:
Series_Add_pseudo(n):
sum=0
for i= 1 to n:
sum += i
return sum
which is the naive approach, you can assume that this algorithm runs in linear time or generally in polynomial time. This is not the case.
The input representation(length) of n is O(logn) bits (any n-ary coding except unary), and the algorithm (although it is running linearly in the magnitude) it runs exponentially (2^logn) in the length of the input.
This is actually the pseudo-polynomial algorithm case. It appears to be polynomial but it is not.
You could even try it in python (or any programming language), for a medium length number like 200 bits.
Applying the first algorithm the result comes in a split second, and applying the second, you have to wait a century...
1+2+3+...+n is always less than n+n+n...+n n times. you can rewrite this n+n+..+n as n*n.
f(n) = O(g(n)) if there exists a positive integer n0 and a positive
constant c, such that f(n) ≤ c * g(n) ∀ n ≥ n0
since Big-Oh represents the upper bound of the function, where the function f(n) is the sum of natural numbers up to n.
now, talking about time complexity, for small numbers, the addition should be of a constant amount of work. but the size of n could be humongous; you can't deny that probability.
adding integers can take linear amount of time when n is really large.. So you can say that addition is O(n) operation and you're adding n items. so that alone would make it O(n^2). of course, it will not always take n^2 time, but it's the worst-case when n is really large. (upper bound, remember?)
Now, let's say you directly try to achieve it using n(n+1)/2. Just one multiplication and one division, this should be a constant operation, no?
No.
using a natural size metric of number of digits, the time complexity of multiplying two n-digit numbers using long multiplication is Θ(n^2). When implemented in software, long multiplication algorithms must deal with overflow during additions, which can be expensive. Wikipedia
That again leaves us to O(n^2).
It's equivalent to BigO(n^2), because it is equivalent to (n^2 + n) / 2 and in BigO you ignore constants, so even though the squared n is divided by 2, you still have exponential growth at the rate of square.
Think about O(n) and O(n/2) ? We similarly don't distinguish the two, O(n/2) is just O(n) for a smaller n, but the growth rate is still linear.
What that means is that as n increase, if you were to plot the number of operations on a graph, you would see a n^2 curve appear.
You can see that already:
when n = 2 you get 3
when n = 3 you get 6
when n = 4 you get 10
when n = 5 you get 15
when n = 6 you get 21
And if you plot it like I did here:
You see that the curve is similar to that of n^2, you will have a smaller number at each y, but the curve is similar to it. Thus we say that the magnitude is the same, because it will grow in time complexity similarly to n^2 as n grows bigger.
answer of sum of series of n natural can be found using two ways. first way is by adding all the numbers in loop. in this case algorithm is linear and code will be like this
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += n;
}
return sum;
it is analogous to 1+2+3+4+......+n. in this case complexity of algorithm is calculated as number of times addition operation is performed which is O(n).
second way of finding answer of sum of series of n natural number is direst formula n*(n+1)/2. this formula use multiplication instead of repetitive addition. multiplication operation has not linear time complexity. there are various algorithm available for multiplication which has time complexity ranging from O(N^1.45) to O (N^2). therefore in case of multiplication time complexity depends on the processor's architecture. but for the analysis purpose time complexity of multiplication is considered as O(N^2). therefore when one use second way to find the sum then time complexity will be O(N^2).
here multiplication operation is not same as the addition operation. if anybody has knowledge of computer organisation subject then he can easily understand the internal working of multiplication and addition operation. multiplication circuit is more complex than the adder circuit and require much higher time than the adder circuit to compute the result. so time complexity of sum of series can't be constant.

Determining complexity of an integer factorization algorithm

I'm starting to study computational complexity, BigOh notation and the likes, and I was tasked to do an integer factorization algorithm and determine its complexity. I've written the algorithm and it is working, but I'm having trouble calculating the complexity. The pseudo code is as follows:
DEF fact (INT n)
BEGIN
INT i
FOR (i -> 2 TO i <= n / i STEP 1)
DO
WHILE ((n MOD i) = 0)
DO
PRINT("%int X", i)
n -> n / i
DONE
DONE
IF (n > 1)
THEN
PRINT("%int", n)
END
What I attempted to do, I think, is extremely wrong:
f(x) = n-1 + n-1 + 1 + 1 = 2n
so
f(n) = O(n)
Which I think it's wrong because factorization algorithms are supposed to be computationally hard, they can't even be polynomial. So what do you suggest to help me? Maybe I'm just too tired at this time of the night and I'm screwing this all up :(
Thank you in advance.
This phenomenon is called pseudopolynomiality: a complexity that seems to be polynomial, but really isn't. If you ask whether a certain complexity (here, n) is polynomial or not, you must look at how the complexity relates to the size of the input. In most cases, such as sorting (which e.g. merge sort can solve in O(n lg n)), n describes the size of the input (the number of elements). In this case, however, n does not describe the size of the input; it is the input value. What, then, is the size of n? A natural choice would be the number of bits in n, which is approximately lg n. So let w = lg n be the size of n. Now we see that O(n) = O(2^(lg n)) = O(2^w) - in other words, exponential in the input size w.
(Note that O(n) = O(2^(lg n)) = O(2^w) is always true; the question is whether the input size is described by n or by w = lg n. Also, if n describes the number of elements in a list, one should strictly speaking count the bits of every single element in the list in order to get the total input size; however, one usually assumes that in lists, all numbers are bounded in size (to e.g. 32 bits)).
Use the fact that your algorithm is recursive. If f(x) is the number of operations take to factor, if n is the first factor that is found, then f(x)=(n-1)+f(x/n). The worst case for any factoring algorithm is a prime number, for which the complexity of your algorithm is O(n).
Factoring algorithms are 'hard' mainly because they are used on obscenely large numbers.
In big-O notation, n is the size of input, not the input itself (as in your case). The size of the input is lg(n) bits. So basically your algorithm is exponential.

Resources