Algorithm for better Big O complexity

Could someone please help me with this question?
How can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find whether there are any duplicates. What is the Big O complexity?
By limiting the input data, we mean the array size, e.g. n = 100 (the array contains 100 integers). Also, the array is unsorted by default, but sorting could be implemented in the algorithm.
The worst-case complexity I got is O(N^2), from about N(N-1)/2 comparisons, for an unsorted array of size N.
I got that by using nested loops: the outer loop runs for N-1 iterations, visiting each value in the array, and the inner loop compares that value with the remaining ones to check whether a duplicate exists. The process repeats until the outer loop terminates.

You have the solution right in front of you. As you state, if the array is unsorted, finding duplicates is O(N^2). But if the array is sorted, you can do it in O(N). So sort the array first, which can be done in O(N log N).
An algorithm that first sorts and then finds the duplicates can thus be done in O(N log N + N), which is O(N log N).
As Amir points out, you can also use a hash table. Since inserting and searching in a hash table take O(1) time on average, you can do this in a single loop, yielding O(N) expected time complexity.
However, your question is about "limited input data". So if you limit your input data to sorted arrays, you can reduce the complexity to O(N). To be precise, if you know that the array is sorted (this is your limitation), then a single loop that compares every element to its successor will find the duplicates.
If the array is not sorted, you need an outer loop over all elements but the last, and an inner loop over all the remaining elements. If the array is sorted, you don't need the inner loop, just compare to the next element. That is the "reduction" of the algorithm, resulting in a "reduction" of O(N^2) to O(N).
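A minimal Python sketch of the three approaches discussed above, assuming the input is a list of hashable, mutually comparable values (the function names are hypothetical): the nested-loop check from the question, sort-then-scan, and a hash set.

```python
def has_duplicates_nested(a):
    """O(N^2): compare every element with every element after it."""
    for i in range(len(a) - 1):
        for j in range(i + 1, len(a)):
            if a[i] == a[j]:
                return True
    return False


def has_duplicates_sorted(a):
    """O(N log N): sort a copy, then compare each element with its successor."""
    s = sorted(a)
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            return True
    return False


def has_duplicates_hashed(a):
    """O(N) expected: a set gives average O(1) membership tests and inserts."""
    seen = set()
    for x in a:
        if x in seen:
            return True
        seen.add(x)
    return False


print(has_duplicates_sorted([3, 1, 4, 1, 5]))  # True
print(has_duplicates_hashed([3, 1, 4, 5]))     # False
```

The sorted version needs only comparisons between values; the hash version needs hashable values and trades extra memory for the lower expected time.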

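One way is to sort and then remove duplicates, but you need additional memory:
sort(arr); // O(n log n)
arr2[0] = arr[0];
int j = 1;
for (int i = 1; i < n; i++) { // O(n)
    if (arr[i - 1] == arr[i]) continue; // skip a repeat of the previous value
    arr2[j++] = arr[i];
}
// arr2[0..j-1] holds each value once; duplicates existed if j < n
You can also use a hash table.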
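The same sort-and-copy idea, sketched in Python under the assumption that the input is a list of comparable values (`distinct_sorted` is a hypothetical name): sort, then copy each value into a second list only when it differs from its predecessor. The second list is the additional memory mentioned above, and the input contained duplicates exactly when the result is shorter than the input.

```python
def distinct_sorted(arr):
    """Return the distinct values of arr in ascending order (O(n log n))."""
    s = sorted(arr)              # O(n log n); sorted() returns a new list
    out = s[:1]                  # first element, or [] if arr is empty
    for i in range(1, len(s)):   # O(n) single pass
        if s[i] != s[i - 1]:     # skip repeats of the previous value
            out.append(s[i])
    return out


values = [5, 3, 5, 1, 3]
unique = distinct_sorted(values)
print(unique)                     # [1, 3, 5]
print(len(unique) < len(values))  # True: there were duplicates
```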


Time complexity on multiple variables & functions

I have written an algorithm to read in a text file, extract its contents into two arrays, and then sort them. The program works, but I am confused about calculating the time complexity. I just need someone to clarify this.
Say I have two functions, a main and a helper.
Helper function:
insertion(int array[], int length)
Main function:
int main()
    while (...) // this while loop reads the input text file and pushes integers into vectors
    for (...)   // this for loop validates array B only
    insertion(arrayA, lengthA)
    insertion(arrayB, lengthB)
The program:
1. reads in the text file;
2. pushes line 1 to array A and line 2 to array B;
3. runs a 'for loop' with an outer 'if' to validate the integers in array B;
4. performs insertion sort on array A and array B.
From what I learnt, I have to let the number of data items be 'n' before calculating the Big O or the number of operations. Here, though, there are two input sizes: one for array A and one for array B.
So let array A have n elements and array B have m elements.
However, I am unsure whether the helper function should be analysed in terms of 'n' or 'm'. Likewise, I am unsure whether the while loop should be analysed in terms of 'n' or 'm'.
I tried my best to explain my difficulty in understanding this time complexity, along with a simplified form of my program (the actual program has tons of loops...). Hopefully someone can understand what I mean and provide some clarification; otherwise I will edit the question further to make it clearer. Thanks!
Edit: I am required to calculate the number of operations before finding the Big-O for my algorithm.
I understand that after you read the file, you will have arrays A and B.
If m and n are close, you can say that m = n. Otherwise, choose the larger one and call it n.
Then you read n values twice: n + n = 2n. In Big O you can drop the constant, so at this point you have O(n) time.
If the validation makes only one pass through array B, you have 3n time in total, but 3 is still a constant, so the time complexity is still O(n).
But the worst case for insertion sort is O(n^2). You do it twice: n^2 + n^2 = 2n^2, and 2 is a constant, so the insertion sort part takes O(n^2).
Finally, you have O(n) + O(n^2). In Big O notation only the most expensive part is significant, so O(n^2) is your complexity.
For example, if you use insertion sort n times, then you'd have O(n(n^2)) time, which is O(n^3).
A computer does on the order of 10^9 operations per second, so small n doesn't count for much.
If you're not sure whether n and m are close, say that 0 < n < 10^9 and 0 < m < 10^3. Then the time complexity of reading the inputs is O(n + m), and insertion sort is O(n^2) + O(m^2). But here m << n (m is much less than n), so you can reasonably ignore m (m is almost optional here IF YOU'RE not being strict!). IF you need to be strict, do not ignore these small cases at first.
If 0 < n < 10^9 and 0 < m < 10^9, then you shouldn't say m = n or ignore either one, because n could be one and m one million.
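To make the operation count concrete, here is a small Python sketch that counts key comparisons; it is a stand-in for the question's `insertion` helper, not the asker's code. On a reverse-sorted input of length n it makes exactly n(n-1)/2 comparisons, so sorting A and B costs n(n-1)/2 + m(m-1)/2 in the worst case, which is O(n^2 + m^2).

```python
def insertion_sort_count(a):
    """Insertion sort on a copy of a; returns (sorted list, number of key comparisons)."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:       # key belongs right after a[j]
                break
            a[j + 1] = a[j]       # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a, comparisons


n, m = 100, 10
_, cost_a = insertion_sort_count(range(n, 0, -1))   # worst case for A
_, cost_b = insertion_sort_count(range(m, 0, -1))   # worst case for B
print(cost_a, cost_b, cost_a + cost_b)              # 4950 45 4995
```

On already-sorted input the same function makes only n - 1 comparisons, which is why the n^2 term describes the worst case rather than every run.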

Count number of identical pairs

An identical pair in an array is a pair of indices p, q such that
0 <= p < q < N and array[p] == array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value,
keeping track of the original indices.
Then, for every index p in the sorted array, count all q < N such that
sortedArray[p].index < sortedArray[q].index and
sortedArray[p].item == sortedArray[q].item.
Is this the correct approach? I think the complexity would be
O(N log N) for sorting by value +
O(N^2) for counting the pairs in the sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought I had was to binary-search the sorted array, for every p, for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
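Here is my code for the second part:
int inversion = 0; // number of identical pairs found
for (int i = 0; i < N; i++) {
    int j = i + 1;
    // equal items are adjacent after sorting; the index check assumes a stable sort
    while (j < N && sortedArray[j].index > sortedArray[i].index &&
           sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;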
Edit: I think I mistook the complexity of the second part for O(N^2).
Since no iteration of the while loop rescans the elements at indices 0 to i, scanning the sorted array to count the pairs takes linear time. The total complexity is therefore
O(N log N) for sorting plus O(N) for the linear scan of the sorted array.
You are partially correct. Sorting the array via Merge Sort or Heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
Insertion sort is quadratic because, for each element, it may have to compare it against all the previous elements in the array to find where that element belongs.
On the other hand, in your case you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation you start scanning (pairwise) from the beginning, and because the array is ordered, once an element has been scanned and your pointers move on, there is never any reason to rescan previous elements, because otherwise you would not have moved on in the first place.
Hence, you scan the whole array only once: O(n)
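One refinement to the single-pass argument: the scan is O(n) only if each run of equal values is counted in one step. A run of k equal values contains k(k-1)/2 pairs, so enumerating the pairs one at a time (as the inner while loop in the question does) costs time proportional to the number of pairs, which is quadratic when every value is equal. A minimal Python sketch of the run-length version, assuming comparable values (the function name is hypothetical):

```python
def count_identical_pairs_sorted(arr):
    """Count index pairs p < q with arr[p] == arr[q] in O(n log n)."""
    s = sorted(arr)                # O(n log n)
    total = 0
    run = 1                        # length of the current run of equal values
    for i in range(1, len(s) + 1):
        if i < len(s) and s[i] == s[i - 1]:
            run += 1
        else:
            total += run * (run - 1) // 2   # pairs inside the finished run
            run = 1
    return total


print(count_identical_pairs_sorted([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # 3
```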
If you can allocate more memory you can get some gains.
You can reach O(n) expected time by using a hash table that maps each value in the array to a counter of how often you have already seen that value.
If the allowed values are integers in a limited range, you can use an array directly instead of a hash table, with value i stored at index i itself. In that case the complexity is O(n + m), where m is the number of allowed values (because you must first set all entries of the array to 0 and then look through all of them to count the pairs).
Both methods give you the number of occurrences of each value in your array. Let nv_i be the number of times value i appears in the array. Then the number of pairs of value i is nv_i * (nv_i - 1) / 2.
You can pair:
the 1st i with nv_i - 1 others,
the 2nd i with nv_i - 2 others,
...
the last i with 0 others.
And (nv_i - 1) + (nv_i - 2) + ... + 0 = nv_i * (nv_i - 1) / 2.
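Both counting methods can be sketched in Python as follows. `collections.Counter` plays the role of the hash table (O(n) expected); the array version assumes, as the answer does, that the values are integers in a known range, here 0..m-1 with m an assumed parameter, and runs in O(n + m). Each value seen nv_i times contributes nv_i(nv_i - 1)/2 pairs.

```python
from collections import Counter


def count_pairs_hashed(arr):
    """O(n) expected: count occurrences with a hash table, then sum nv*(nv-1)/2."""
    return sum(nv * (nv - 1) // 2 for nv in Counter(arr).values())


def count_pairs_bounded(arr, m):
    """O(n + m): assumes every value is an integer with 0 <= value < m."""
    counts = [0] * m                   # O(m) to initialise
    for x in arr:                      # O(n) to count
        counts[x] += 1
    return sum(nv * (nv - 1) // 2 for nv in counts)   # O(m) to sum


data = [0, 1, 3, 3, 6, 7, 7, 9, 10, 10]
print(count_pairs_hashed(data))        # 3
print(count_pairs_bounded(data, 11))   # 3
```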
I've been thinking about this... I think that if you "embed" the == check into your sorting algorithm, the complexity is still O(n lg n).

Why is counting sort not used for large inputs?

Counting sort is a sorting algorithm with a time complexity of O(n + K); it assumes that each input element is an integer in the range 0 to K.
Why can't we linear-search for the maximum value in an unsorted array, set K equal to it, and then apply counting sort?
In the case where your inputs are arrays with maximum - minimum = O(n log n) (i.e. the range of values is reasonably restricted), this actually makes sense. If this is not the case, a standard comparison-based sort algorithm or even an integer sorting algorithm like radix sort is asymptotically better.
To give you an example, the following algorithm generates a family of inputs on which counting sort has runtime complexity Θ(n^2):
def generate_input(n):
    array = []
    for i in range(1, n + 1):
        array.append(i * i)  # n values whose maximum is n^2, so K = n^2
    return array
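To test the question's idea directly, here is a Python sketch, assuming non-negative integer inputs, that finds K with one linear scan and then counting-sorts. The `counts` list has K + 1 entries, so the work is Θ(n + K); on inputs whose maximum is n^2, such as the squares 1, 4, ..., n^2, that is Θ(n^2) even though only n values are sorted.

```python
def counting_sort(a):
    """Counting sort for non-negative integers; K is found by a linear scan."""
    if not a:
        return []
    k = max(a)                      # O(n) linear search for the maximum
    counts = [0] * (k + 1)          # O(K) memory and initialisation
    for x in a:                     # O(n)
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):   # O(K) scan of every possible value
        out.extend([value] * c)
    return out


n = 1000
squares = [i * i for i in range(n, 0, -1)]   # n values, but K = n**2 = 1,000,000
print(counting_sort([3, 1, 2, 3, 0]))        # [0, 1, 2, 3, 3]
print(counting_sort(squares)[:3])            # [1, 4, 9]
```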
The title of your question is "Why is counting sort not used for large inputs?"
What do we do in counting sort? We take another array (say b[]) and initialize all its elements to zero. Then, for each element of the given array, we increment b at the index equal to that element. Then we run a loop from the lower limit to the upper limit of the given array's values and check whether each element of b is 0. If it is not zero, that index is an element of the given array.
Now, if the difference between these two (upper limit and lower limit) is very large (like 10^9 or more), a single loop is enough to kill our PC. :)
According to the definition of Big O notation, if we say f(n) ∈ O(g(n)), it means that there are constants C > 0 and N such that f(n) <= C*g(n) for all n >= N. Nothing is said about the value of C, nor about how large N is.
In any algorithm analysis, the cost of each operation of the machine model (compare, move, add, etc.) must be considered. These costs are what determine how big (or small) C and N must be for the inequality to hold. Ignoring these costs is a naive assumption I myself used to make during my algorithm analysis course.
The statement "counting sort is O(n + K)" actually means that its running time is at most C*(n + K) for all n >= N, where C and N are constants; when K <= n, that bound is linear in n. Thus other algorithms may perform better on smaller inputs, because the bound is only guaranteed once these conditions hold.

How to sort an array according to another array?

Suppose A = {1,2,3,4} and p = {36,3,97,19}; sort A using p as the sort keys. You should get {2,4,1,3}.
It is an example in the book Introduction to Algorithms, which says it can be done in O(n log n).
Can anyone give me some idea of how it can be done? My thought is that you need to keep track of where each element of p ends up: for example, if p[1] ends up at p[3], then A[1] ends up at A[3]. Can merge sort or another O(n log n) sort be used to get this done?
I'm new to algorithms and find them a little intimidating :( Thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and actually, if you're only counting comparisons you don't have to do any additional comparisons, just data swaps in two arrays instead of one), so that doesn't change the Big-O complexity of the overall sort because O( 2n log n ) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
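In Python, the index array can be sketched with the built-in `sorted` (Timsort, O(n log n), used here in place of the merge sort the question mentions), with each index's entry in p as its sort key; building the sorted A is then the linear lookup pass described above.

```python
A = [1, 2, 3, 4]
p = [36, 3, 97, 19]

# Sort the indices 0..n-1 by their key in p: O(n log n).
order = sorted(range(len(p)), key=lambda i: p[i])
print(order)            # [1, 3, 0, 2]

# Rebuild A by looking up each index in sorted order: O(n).
sorted_A = [A[i] for i in order]
print(sorted_A)         # [2, 4, 1, 3]
```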
Of course, you can also skip the index array entirely and simply treat array A the same way, sorting it alongside p.
I don't see this as difficult. Since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of the elements in array A according to the elements of p. You won't need any comparisons beyond the ones already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays and you are done.
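A minimal merge-sort sketch of that idea in Python (the function name is hypothetical): the only comparisons are between keys from p, and every move of a key is mirrored by the same move of the corresponding element of A, so the comparison count is exactly that of sorting p alone.

```python
def merge_sort_by_key(keys, values):
    """Sort keys ascending, moving values in lockstep; only keys are compared."""
    if len(keys) <= 1:
        return list(keys), list(values)
    mid = len(keys) // 2
    lk, lv = merge_sort_by_key(keys[:mid], values[:mid])
    rk, rv = merge_sort_by_key(keys[mid:], values[mid:])
    out_k, out_v = [], []
    i = j = 0
    while i < len(lk) and j < len(rk):
        if lk[i] <= rk[j]:               # the only comparison, made on keys
            out_k.append(lk[i])
            out_v.append(lv[i])          # mirror the move in the values
            i += 1
        else:
            out_k.append(rk[j])
            out_v.append(rv[j])
            j += 1
    out_k.extend(lk[i:])
    out_v.extend(lv[i:])
    out_k.extend(rk[j:])
    out_v.extend(rv[j:])
    return out_k, out_v


print(merge_sort_by_key([36, 3, 97, 19], [1, 2, 3, 4]))  # ([3, 19, 36, 97], [2, 4, 1, 3])
```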

Very hard sorting algorithm problem - O(n) time - Time complexity

Since the problem is long, I cannot describe it in the title.
Imagine that we have 2 unsorted integer arrays. Both arrays have length n and contain integers between 0 and n^765 (n to the power 765 at most).
I want to compare both arrays and find out whether they contain any common integer value within O(n) time complexity.
No duplicates are possible within the same array.
Any help and idea is appreciated.
What you want is impossible. Each element will be stored in up to log(n^765) bits, which is O(log n). So simply reading the contents of both arrays will take O(n*logn).
If you have a constant upper bound on the value of each element, you can solve this in O(n) average time by storing the elements of one array in a hash table and then checking whether the elements of the other array are contained in it.
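A Python sketch of this hash-table approach, assuming (as the answer does) that each value has a constant size, so hashing and looking up one value costs O(1) on average; `share_a_value` is a hypothetical name.

```python
def share_a_value(xs, ys):
    """Return True if xs and ys have a common element; O(n) expected time."""
    seen = set(xs)                     # build the hash set: O(n) expected
    return any(y in seen for y in ys)  # n lookups, each O(1) on average


print(share_a_value([10, 500, 7], [3, 7]))   # True
print(share_a_value([10, 500], [3, 4]))      # False
```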
The solution you may be looking for is to use radix sort to sort your data, after which you can easily check for duplicate elements. You would look at your numbers in base n, and do 765 passes over your data. Each pass would use a bucket sort or counting sort to sort by a single digit (in base n). This process would take O(n) time in the worst case (assuming a constant upper bound on element size). Note that I doubt anyone would ever choose this over a hash table in practice.
By assuming multiplication and division is O(1):
Think about numbers, you can write them as:
Number(i) = A0 * n^765 + A1 * n^764 + .... + A764 * n + A765.
To encode a number in this format, compute (Number / n^i) % n for each i. If you precompute n^1, n^2, n^3, ..., this can be done in O(n * 765) => O(n) for all numbers. Precomputing n^i takes O(i) time, and since i is at most 765, that is O(1) for all items.
Now you can write Number(i) as an array, Number(i) = (A0, A1, ..., A765), and now you can radix sort the items:
first sort by all the A765, then by A764, and so on. All of the Ai are in the range 0..n-1, so to sort by each Ai you can use a stable counting sort (counting sort is O(n)), so your radix sort is O(n * 765), which is O(n).
After radix sorting you have two sorted arrays, and you can find one common item in O(n), or use a merge step (as in merge sort) to find all common items (not just one).
In general, if the input values are O(n^C) for a fixed constant C, they can be sorted in O(n). But because the overhead of this kind of sorting is large, quicksort and similar algorithms are usually preferred. A simple version of this question can be found in the book Introduction to Algorithms, which asks how to sort numbers in the range 0..n^2 in O(n).
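A Python sketch of this radix-sort approach, under stated assumptions: the base is n (the array length, forced to at least 2), values lie in 0..n^exponent, so exponent + 1 base-n digits suffice, and Python's arbitrary-precision integer arithmetic is treated as O(1), as the answer assumes. Each pass is a stable counting sort on one digit, least significant first; the final loop is the merge-style scan that stops at the first common value. Function names are hypothetical.

```python
def radix_sort_base(a, base, digits):
    """LSD radix sort of non-negative ints < base**digits, one counting-sort pass per digit."""
    for d in range(digits):
        div = base ** d
        counts = [0] * base
        for x in a:
            counts[(x // div) % base] += 1
        pos = [0] * base                 # starting output slot for each digit value
        total = 0
        for b in range(base):
            pos[b] = total
            total += counts[b]
        out = [0] * len(a)
        for x in a:                      # left-to-right placement keeps the pass stable
            b = (x // div) % base
            out[pos[b]] = x
            pos[b] += 1
        a = out
    return a


def have_common_value(xs, ys, exponent):
    """True if xs and ys share a value; all values assumed in 0..n**exponent."""
    n = max(len(xs), len(ys), 2)
    digits = exponent + 1                # n**exponent itself needs exponent + 1 digits
    a = radix_sort_base(xs, n, digits)
    b = radix_sort_base(ys, n, digits)
    i = j = 0
    while i < len(a) and j < len(b):     # merge-style scan of the two sorted arrays
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


print(have_common_value([0, 8, 27, 5], [64, 27, 1], exponent=3))  # True
```

With exponent = 765 this is the 766-pass version described above; the small exponent here only keeps the example quick.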
Edit: to clarify how you can find common items in two sorted lists:
You have two sorted lists. Think of how merge sort merges two sorted lists into one: you start at the heads of list 1 and list 2 and repeatedly advance the head of whichever list has the smaller value. If there is a common item, the algorithm stops when the two heads are equal (before reaching the ends of the lists); otherwise it stops when it reaches the end of one of the lists.
It's as easy as this:
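public int FindSimilarityInSortedLists(List<int> list1, List<int> list2)
{
    int i = 0;
    int j = 0;
    while (i < list1.Count && j < list2.Count)
    {
        if (list1[i] == list2[j])
            return list1[i];
        if (list1[i] < list2[j])
            i++; // the smaller head cannot match anything further on in the other list
        else
            j++;
    }
    return -1; // not found (values are non-negative, so -1 is a safe sentinel)
}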
If memory were unlimited, you could simply create a hash table with the integers as keys and the number of times each is found as values. Then, to do your "fast" lookup, you simply query for an integer, check whether it is contained in the hash table, and if found, check whether its value is 1 or 2. That would take O(n) to load and O(1) on average to query.
I do not think you can do it in O(n).
You have to check, for each of the n values, whether it is in the other array. That means at least n comparison operations even if the other array had just 1 element. But since the other array also has n elements, you can only do it in O(n*n).
