Is This Big O Evaluation Correct? - algorithm

I'm having trouble understanding something. I'm not even sure if it's correct.
In Cracking the Code Interview, there is a section that asks you to determine the Big O for a number of functions. For the most part, they're predictable.
However, one of them throws me for a loop.
Apparently, this evaluates to O(ab):
void printUnorderedPairs(int[] arrayA, int[] arrayB) {
for (int i = 0; i < arrayA.length; i++){
for (int j = 0; j < arrayB.length; j++){
for (int k = 0; k < 100000; k++){
System.out.println(arrayA[i] + "," + arrayB[j]);
}
}
}
}
With the rational that
"100,000 units of work is still constant, so the runtime is O(ab).
I'm trying to see why this could make sense, but I just can't yet; naturally, I expected O(abc).
Yes, 100,000 is a constant and arrayA and arrayB are arrays, but we're taking the length of the arrays. At the time of running these for loops, won't array[x].length be a constant (assuming the size of the arrays don't change during their execution)?
So, is the book right? If so, I would really appreciate insight and intuition so I don't fall into the same trap in the future.
Thanks!

Time complexity is generally expressed as the number of required elementary operations on an input of size n, where elementary operations are assumed to take a constant amount of time on a given computer and change only by a constant factor when run on a different computer.
O(ab) is the complexity in the above case as arrayA and arrayB are of variable length and are fully dependent on the calling function , and 100000 is constant, which won't change by any external factors.
Complexity is the measure of Unknown

The arrays A and B have an unspecified length, and all you can do is to give an indication of the complexity that is a function of these two lengths. Nothing else is variable in the given code.

What the authors meant by constant is a value that is a fix, regardless of the input size, unlike the length of input arrays that might change. For instance, the printUnorderedPairs might be called with different arrays as parameter, and those arrays might have different sizes among them.

The point of Big-O is to examine how the calculation grows as the inputs grow. It's clear that it would double if A doubled, and likewise if B doubled. So linear in those two.
What might be confusing you is that you could easily replace the 100k with C, yet another linear input, but it happens it doesn't have the 100k as a variable, it's a constant.
A similar thing in Big-O problems is where you step through an array a fixed number of times. That doesn't change the Big-O. For example if you step through an array to find the max, that's O(n). Stepping through it twice to find the min and the max is... also O(n). And in fact it's the same as stepping through it once to find the min and max in a single sweep.

Related

Finding all prime numbers from 1 to N using GCD (An alternate approach to sieve-of-eratosthenes)

To find all prime numbers from 1 to N.
I know we usually approach this problem using Sieve of Eratosthenes, I had an alternate approach in mind using gcd that I wanted your views on.
My approach->
Keep a maintaining a variable if all prime numbers are processed till any iteration. If gcd of this var, number i ==1. That means the nos. are co-prime so i must be prime.
For ex: gcd(210,11) == 1, so 11 is prime.
{210=2*3*5*7}
Pseudocode:
Init num_list={contains numbers 2 to N} [since 0 and 1 arent prime nos.]
curr_gcd = 2, gcd_val=1
For i=3;i<=N;i++
gcd_val=__gcd(curr_gcd,i)
if gcd_val == 1 //(prime)
curr_gcd = curr_gcd * i
else //(composite so remove from list)
numList.remove(i)
Alternatively, we can also have a list and push the prime numbers into that list.
SC = O(N)
TC = O(N log(N)) [TC to calculate gcd using euclid's method => O(log(max(a,b)))]
Does this seem right or I am calculating the TC incorrectly here. Please post your views on this.
TIA!
Looks like the time complexity of my approach is closer to O(log^2(n)) as pointed out by many in the comments.
Also, the curr_gcd var would become quite large as N is increased and would definitely overflow int and long size limits.
Thanks to everyone who responded!
Maybe your method is theoretically right,but evidently, it's not excellent.
It's efficiency is worse than SoE, the range of data that it needs is too large. So maybe it seems elegant to look but hard to use.
In my views, "To find all prime numbers from 1 to N" is already a well-known problem and that means it's solution is well considered.
At first, maybe we use brute-force to deal with it like this.
int primes[N],cnt;//store all prime numbers
bool st[N];//st[i]:whether i is rejected
void get_primes(int n){
for(int i=2;i<=n;i++){
if(st[i]) continue;
primes[cnt++]=i;
for(int j=i+i;j<=n;j+=i){
st[j]=true;
}
}
}
it's a O(n^2) time algorithm.Too slow to endure.
Go ahead. We have SoE, which use O(nlognlogn) time.
But we have a better algorithm called "liner sieve", which only use O(n) time, just as it's name. I implement it with C language like this.
int primes[N],cnt;
bool st[N];
void get_primes(int n){
for(int i=2;i<=n;i++){
if(!st[i]) primes[cnt++]=i;
for(int j=0;primes[j]*i<=n;j++){
st[primes[j]*i]=true;
if(i%primes[j]==0) break;
}
}
}
this O(n) algorithm is used by me to slove this kind of algorithm problems that appear in major IT companies and many kinds of OJ.

Reverse an array-run time

The following code reverses an array.What is its runtime ?
My heart says it is O(n/2), but my friend says O(n). which is correct? please answer with reason. thank you so much.
void reverse(int[] array) {
for (inti = 0; i < array.length / 2; i++) {
int other = array.length - i - 1;
int temp = array[i];
array[i] = array[other];
array[other] = temp;
}
}
Big-O complexity captures how the run-time scales with n as n gets arbitrarily large. It isn't a direct measure of performance. f(n) = 1000n and f(n) = n/128 + 10^100 are both O(n) because they both scale linearly with n even though the first scales much more quickly than the second, and the second is actually prohibitively slow for all n because of the large constant cost. Nonetheless, they have the same complexity class. For these sorts of reasons, if you want to differentiate actual performance between algorithms or define the performance of any particular algorithm (rather than how performance scales with n) asymptotic complexity is not the best tool. If you want to measure performance, you can count the exact number of operations performed by the algorithm, or better yet, provide a representative set of inputs and just measure the execution time on those inputs.
As for the particular problem, yes, the for loop runs n/2 times, but you also do some constant number of operations, c, in each of those loops (subtractions, array accesses, variable assignments, conditional check on i). Maybe c=10, it's not really important to count precisely to determine the complexity class, just to know that it's constant. The run-time is then f(n)=c*n/2, which is O(n): the fact that you only do n/2 for-loops doesn't change the complexity class.

Space complexity of an algorithm

Example1: Given an input of array A with n elements.
See the algo below:
Algo(A, I, n)
{
int i, j = 100;
for (i = 1 to j)
A[i] = 0;
}
Space complexity = Extra space required by variable i + variable 'j'
In this case my space complexity is: O(1) => constant
Example2: Array of size n given as input
A(A,I,n)
{
int i;
create B[n]; //create a new array B with same number of elements
for(i = 1 to n)
B[i] = A[i]
}
Space complexity in this case: Extra space taken by i + new Array B
=> 1 + n => O(n)
Even if I used 5 variables here space complexity will still be O(n).
If as per computer science my space complexity is always constant for first and O(n) for second even if I was using 10 variables in the above algo, why is it always advised to make programs using less number of variables?
I do understand that in practical scenarios it makes the code more readable and easier to debug etc.
But looking for an answer in terms of space complexity only here.
Big O complexity is not the be-all end-all consideration in analysis of performance. It is all about the constants that you are dropping when you look at asymptotic (big O) complexity. Two algorithms can have the same big-O complexity and yet one can be thousands of times more expensive than the other.
E.g. if one approach to solving some problem always takes 10s flat, and another approach takes 3000s flat, regardless of input size, they both have O(1) time complexity. Of course, that doesn't really mean both are equally good; using the latter approach if there is no real benefit is simply a massive waste of time.
This is not to say performance is the only, or even the primary consideration when someone advises you to be economical with your use of local variables. Other considerations like readability, or avoidance of subtle bugs are also factors.
For this code snippet
Algo(A, I, n)
{
int i, j = 100;
for (i = 1 to j)
A[i] = 0;
}
Space Complexity is: O(1) for the array and constant space for the two variables i and j
It is always advised to use less variables because ,each variable occupies constant space ,if you have 'k' variables.k variables will use k*constant space ,if lets consider each variable is of type int so int occupies 2 bytes so k*2bytes,lets take k as 10 so it 20bytes here
It is as similar as using int A[10] =>20 bytes space complexity
I hope you understand

Order of magnitude using Big-O notation [duplicate]

This question already has answers here:
Big O, how do you calculate/approximate it?
(24 answers)
Closed 7 years ago.
This is likely ground that has been covered but I have yet to find an explanation that I am able to understand. It is likely that I will soon feel embarrassed.
For instance, I am trying to find the order of magnitude using Big-O notation of the following:
count = 0;
for (i = 1; i <= N; i++)
count++;
Where do I begin to find what defines the magnitude? I'm relatively bad at mathematics and, even though I've tried a few resources, have yet to find something that can explain the way a piece of code is translated to an algebraic equation. Frankly, I can't even surmise a guess as to what the Big-O efficiency is regarding this loop.
These notations (big O, big omega, theta) simply say how does the algorithm will be "difficult" (or complex) asymptotically when things will get bigger and bigger.
For big O, having two functions: f(x) and g(x) where f(x) = O(g(x)) then you can say that you are able to find one x from which g(x) will be always bigger than f(x). That is why the definition contains "asymptotically" because these two functions may have any run at the beginning (for example f(x) > g(x) for few first x) but from the single point, g(x) will get always superior (g(x) >= f(x)). So you are interested in behavior in a long run (not for small numbers only). Sometimes big-O notation is named upper bound because it describes the worst possible scenario (it will never be asymptotically more difficult that this function).
That is the "mathematical" part. When it comes to practice you usually ask: How many times the algorithm will have to process something? How many operations will be done?
For your simple loop, it is easy because as your N will grow, the complexity of algorithm will grow linearly (as simple linear function), so the complexity is O(N). For N=10 you will have to do 10 operations, for N=100 => 100 operations, for N=1000 => 1000 operations... So the growth is truly linear.
I'll provide few another examples:
for (int i = 0; i < N; i++) {
if (i == randomNumber()) {
// do something...
}
}
Here it seems that the complexity will be lower because I added the condition to the loop, so we have possible chance the number of "doing something" operations will be lower. But we don't know how many times the condition will pass, it may happen it passes every time, so using big-O (the worst case) we again need to say that the complexity is O(N).
Another example:
for (int i = 0; i < N; i++) {
for (int i = 0; i < N; i++) {
// do something
}
}
Here as N will be bigger and bigger, the # of operations will grow more rapidly. Having N=10 means that you will have to do 10x10 operations, having N=100 => 100x100 operations, having N=1000 => 1000x1000 operations. You can see the growth is no longer linear it is N x N, so we have O(N x N).
For the last example I will use idea of full binary tree. Hope you know what binary tree is. So if you have simple reference to the root and you want to traverse it to the left-most leaf (from top to bottom), how many operations will you have to do if the tree has N nodes? The algorithm would be something similar to:
Node actual = root;
while(actual.left != null) {
actual = actual.left
}
// in actual we have left-most leaf
How many operations (how long loop will execute) will you have to do? Well that depends on the depth of the tree, right? And how is defined depth of full binary tree? It is something like log(N) - with base of logarithm = 2. So here, the complexity will be O(log(N)) - generally we don't care about the base of logarithm, what we care about is the function (linear, quadratic, logaritmic...)
Your example is the order
O(N)
Where N=number of elements, and a comparable computation is performed on each, thus
for (int i=0; i < N; i++) {
// some process performed N times
}
The big-O notation is probably easier than you think; in all daily code you will find examples of O(N) in loops, list iterations, searches, and any other process that does work once per individual of a set. It is the abstraction that is first unfamiliar, O(N) meaning "some unit of work", repeated N times. This "something" can be a an incrementing counter, as in your example, or it can be lengthy and resource intensive computation. Most of the time in algorithm design the 'big-O', or complexity, is more important than the unit of work, this is especially relevant as N becomes large. The description 'limiting' or 'asymptotic' is mathematically significant, it means that an algorithm of lesser complexity will always beat one that is greater no matter how significant the unit of work, given that N is large enough, or "as N grows"
Another example, to understand the general idea
for (int i=0; i < N; i++) {
for (int j=0; j < N; j++) {
// process here NxN times
}
}
Here the complexity is
O(N2)
For example, if N=10, then the second "algorithm" will take 10 times longer than the first, because 10x10 = 100 (= ten times larger). If you consider what will happen when N equals, say a million, or billion, you should be able to work out it will also take this much longer. So if you can find a way to do something in O(N) that a super-computer does in O(N2), you should be able to beat it with your old x386, pocket watch, or other old tool

Optimizing construction of a trie over all substrings

I am solving a trie related problem. There is a set of strings S. I have to create a trie over all substrings for each string in S. I am using the following routine:
String strings[] = { ... }; // array containing all strings
for(int i = 0; i < strings.length; i++) {
String w = strings[i];
for (int j = 0; j < w.length(); j++) {
for (int k = j + 1; k <= w.length(); k++) {
trie.insert(w.substring(j, k));
}
}
}
I am using the trie implementation provided here. However, I am wondering if there are certain optimizations which can be done in order to reduce the complexity of creating trie over all substrings?
Why do I need this? Because I am trying to solve this problem.
If we have N words, each with maximum length L, your algorithm will take O(N*L^3) (supposing that adding to trie is linear with length of adding word). However, the size of the resulting trie (number of nodes) is at most O(N*L^2), so it seems you are wasting time and you could do better.
And indeed you can, but you have to pull a few tricks from you sleeve. Also, you will no longer need the trie.
.substring() in constant time
In Java 7, each String had a backing char[] array as well as starting position and length. This allowed the .substring() method to run in constant time, since String is immutable class. New String object with same backing char[] array was created, only with different start position and length.
You will need to extend this a bit, to support adding at the end of the string, by increasing the length. Always create a new string object, but leave the backing array same.
Recompute hash in constant time after appending single character
Again, let me use Java's hashCode() function for String:
int hash = 0;
for (int i = 0; i < data.length; i++) {
hash = 31 * hash + data[i];
} // data is the backing array
Now, how will the hash change after adding a single character at the end of the word? Easy, just add it's value (ASCII code) multiplied by 31^length. You can keep powers of 31 in some separate table, other primes can be used as well.
Store all substring in single HashMap
With using tricks 1 and 2, you can generate all substrings in time O(N*L^2), which is the total number of substrings. Just always start with string of length one and add one character at a time. Put all your strings into a single HashMap, to reduce duplicities.
(You can skip 2 and 3 and discard duplicities when/after sorting, perhaps it will be even faster.)
Sort your substrings and you are good to go.
Well, when I got to point 4, I realized my plan wouldn't work, because in sorting you need to compare strings, and that can take O(L) time. I came up with several attempts to solve it, among them bucket sorting, but none would be faster than original O(N*L^3)
I will just this answer here in case it inspires someone.
In case you don't know Aho-Corasic algorithm, take look into that, it could have some use for your problem.
What you need may be suffix automaton. It costs only O(n) time and can recognize all substrings.
Suffix array can also solve this problems.
These two algorithms can solve most string problems, and they are really hard to learn. After you learn those you will solve it.
You may consider the following optimization:
Maintain list of processed substrings. While inserting a substring, check if the processed set contains that particular substring and if yes, skip inserting that substring in the trie.
However, the worst case complexity for insertion of all substrings in trie will be of the order of n^2 where n is the size of strings array. From the problem page, this works out to be of the order of 10^8 insertion operations in trie. Therefore, even if each insertion takes 10 operations on an average, you will have 10^9 operations in total which sets you up to exceed the time limit.
The problem page refers to LCP array as a related topic for the problem. You should consider change in approach.
First, notice that it is enough to add only suffixes to the trie, and nodes for every substring will be added along the way.
Second, you have to compress the trie, otherwise it will not fit into memory limit imposed by HackerRank. Also this will make your solution faster.
I just submitted my solution implementing these suggestions, and it was accepted. (the max execution time was 0.08 seconds.)
But you can make your solution even faster by implementing a linear time algorithm to construct the suffix tree. You can read about linear time suffix tree construction algorithms here and here. There is also an explanation of the Ukkonen's algorithm on StackOverflow here.

Resources