Finding the minimum unique number in an array - algorithm

The minimum unique number in an array is defined as
min{v|v occurs only once in the array}
For example, the minimum unique number of {1, 4, 1, 2, 3} is 2.
Is there any way better than sorting?

I believe this is an O(N) solution in both time and space:
HashSet seenOnce;      // sufficiently large that access is O(1)
HashSet seenMultiple;  // sufficiently large that access is O(1)
for each item in input            // O(N)
    if item in seenMultiple
        continue
    if item in seenOnce
        remove item from seenOnce
        add item to seenMultiple
    else
        add item to seenOnce
smallest = SENTINEL
for each item in seenOnce         // worst case, O(N)
    if item < smallest
        smallest = item
If you have a limited range of integral values, you can replace the HashSets with BitArrays indexed by the value.
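A minimal Python sketch of the two-set pseudocode above. Python's built-in `set` stands in for the HashSets (O(1) average-case membership), and `min(..., default=None)` replaces the explicit SENTINEL loop; the function name `min_unique` is hypothetical.

```python
def min_unique(values):
    """Smallest value that occurs exactly once, or None if there is none."""
    seen_once = set()       # values seen exactly one time so far
    seen_multiple = set()   # values seen two or more times
    for item in values:     # O(N) expected: set operations are O(1) on average
        if item in seen_multiple:
            continue
        if item in seen_once:
            seen_once.remove(item)
            seen_multiple.add(item)
        else:
            seen_once.add(item)
    # The survivors are exactly the unique values; take the smallest one.
    return min(seen_once, default=None)

print(min_unique([1, 4, 1, 2, 3]))  # prints 2
```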

You don't need to do a full sort. Run the bubble sort inner loop until a distinct minimum value appears at one end. This takes O(k * n) time, where k is the number of non-distinct minimum values (the repeated values smaller than the answer). However, the worst-case complexity is O(n*n), so this is only efficient when the expected value of k is much smaller than n.
I think this would be the minimum possible time complexity unless you can adapt an O(n log n) sorting algorithm to the task above.
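A sketch of the partial bubble sort idea, assuming each inner-loop pass bubbles the smallest remaining value toward the front; the function name `min_unique_bubble` and the exact stopping test are illustrative, not taken from the answer. After pass t the first t + 1 positions are final, so the element just behind the newest fixed position can be checked for uniqueness and the sort abandoned early.

```python
def min_unique_bubble(values):
    """Smallest value occurring exactly once, found with partial bubble sort passes."""
    a = list(values)            # work on a copy
    n = len(a)
    if n == 0:
        return None
    for t in range(n):
        # One inner-loop pass: bubble the smallest remaining value down to index t.
        for j in range(n - 1, t, -1):
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
        # a[0..t] now holds the t + 1 smallest values in final sorted order, so
        # a[t - 1] has both neighbours fixed and can be tested for uniqueness.
        if t >= 1:
            k = t - 1
            if (k == 0 or a[k - 1] != a[k]) and a[k] != a[t]:
                return a[k]     # stop early: no need to finish the sort
    # Only the largest value is left to test.
    return a[-1] if n == 1 or a[-1] != a[-2] else None
```

Each pass costs O(n), and the loop stops roughly k + 2 passes in, which is where the O(k * n) estimate comes from.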

A Python version using a dictionary.
Time complexity O(n) and space complexity O(n):
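from collections import defaultdict
for _ in range(int(input())):  # number of test cases
    d = defaultdict(int)  # value -> number of occurrences
    for x in map(int, input().split()):  # assumed input: one line of integers per test case
        d[x] += 1
    m = 9999999  # sentinel meaning "no unique value found yet"
    for i in d:
        if d[i] == 1 and i < m:
            m = i
    print(m if m != 9999999 else -1)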
Please tell me if there is a better approach.


Count number of identical pairs

An identical pair in an array is a pair of indices p, q such that
0 <= p < q < N and array[p] = array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value,
keeping track of the original indices.
Then, for every index p in the sorted array, count all q < N such that
sortedarray[p].index < sortedarray[q].index and
sortedarray[p].item = sortedarray[q].item
Is this the correct approach? I think the complexity would be
O(N log N) for sorting by value +
O(N^2) for counting the pairs in the new sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought: for every p, binary search the sorted array for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
Here is my code for the second part:
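int inversion = 0;
for (int i = 0; i < N; i++) {
    int j = i + 1;
    // walk forward over the run of items equal to sortedArray[i]
    while (j < N && sortedArray[j].index > sortedArray[i].index &&
           sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;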
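Edit: I think I mistook the complexity of the second part for O(N^2).
The while loop never goes back to rescan elements at indices 0..i, so scanning the sorted array to count the inversions takes linear time. The total complexity is therefore
O(N log N) for sorting and O(N) for the linear counting scan of the sorted array. (Strictly, the loop above does O(N + P) work, where P is the number of pairs, because it visits every pair once; counting the length of each run of equal values instead keeps the scan at O(N) even when many values repeat.)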
You are partially correct. Sorting the array via Merge Sort or Heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
Insertion sort is quadratic because, for each element, it has to check the array to find where that element must go (and this includes the previous elements in the array!).
On the other hand, in your case you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation, you start scanning (pairwise) from the beginning, and because the array is ordered, you know that once an element has been scanned and your pointers move on, there is never any reason to rescan previous elements, because otherwise you would not have moved on in the first place.
Hence, you scan the whole array only once: O(n)
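A sketch of the sort-then-scan idea in Python (the function name is hypothetical). Because the number of identical pairs depends only on how often each value occurs, this version does not track the original indices; it keeps the length of the current run of equal values, so the scan stays O(n) even when a value repeats many times.

```python
def count_identical_pairs_sorted(values):
    """Count pairs p < q with values[p] == values[q]: sort, then one linear scan."""
    a = sorted(values)           # O(n log n)
    pairs = 0
    run = 1                      # length of the current run of equal values
    for k in range(1, len(a)):   # single O(n) pass; earlier elements are never revisited
        if a[k] == a[k - 1]:
            pairs += run         # a[k] pairs with every earlier element of its run
            run += 1
        else:
            run = 1
    return pairs

print(count_identical_pairs_sorted([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # prints 3
```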
If you can allocate more memory you can get some gains.
You can reach O(n) by using a hash table which maps any values in the array to a counter indicating how often you already saw this value.
If the allowed values are integers in a limited range, you can use an array directly instead of a hash table, with value i stored at index i. In that case the complexity would be O(n+m), where m is the number of allowed values (because you must first set all entries in the array to 0 and then look through all the array entries to count pairs).
Both methods give you the number of occurrences of each value in your array. Let nv_i be the number of appearances of the value i in the array. Then the number of pairs of value i is (nv_i)*(nv_i-1)/2.
You can pair:
1st i with nv_i-1 others
2nd i with nv_i-2 others
...
last i with 0
And (nv_i-1)+(nv_i-2)+...+0 = (nv_i)*(nv_i-1)/2
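A Python sketch of both counting variants, with `collections.Counter` as the hash table and a plain list for the bounded-range case; the function names and the parameter `m` (values assumed to lie in `range(m)`) are illustrative. Each count nv_i contributes nv_i * (nv_i - 1) / 2 pairs.

```python
from collections import Counter

def count_identical_pairs(values):
    """Identical pairs in O(n) expected time using a hash map of counts."""
    counts = Counter(values)                  # value -> nv_i
    return sum(nv * (nv - 1) // 2 for nv in counts.values())

def count_identical_pairs_bounded(values, m):
    """Same count in O(n + m), assuming every value is an integer in range(m)."""
    counts = [0] * m                          # index i holds nv_i
    for v in values:
        counts[v] += 1
    return sum(c * (c - 1) // 2 for c in counts)

print(count_identical_pairs([0, 1, 3, 3, 6, 7, 7, 9, 10, 10]))  # prints 3
```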
I've been thinking about this... I think that if you "embed" the == condition into your sorting algorithm, the complexity is still O(n lg n).

Why is this the cost?

The Quicksort algorithm is:
QUICKSORT(A, p, r)
    if p < r then
        q <- PARTITION(A, p, r)
        QUICKSORT(A, p, q-1)
        QUICKSORT(A, q+1, r)
According to my notes, the cost of Quicksort(A,1,n) is T(n) = T(q) + T(n-q) + cost of partition.
Why is the cost like that and not : T(n)=T(q-1)+T(n-q)+cost of partition?
And also why is the cost of the worst case T(n)=T(n-1)+Θ(n) ?
I'm more confident about the answer to your second question.
In the worst case, the pivot can always turn out to be the lowest number (or the highest number) in the array. In that case, the divided arrays shall be of length n-1 and 0 respectively. Hence the recurrence relation shall be:
T(n) = T(n-1) + T(0) + work done for partition
     = T(n-1) + Θ(1) + Θ(n)
     = T(n-1) + Θ(n)
For example, the worst case occurs if the array is already sorted in ascending order and you always choose the first element as the pivot.
Initial Array: {1, 2, 3, 4, 5}
Pivot Element: 1.
Partitioned arrays: {} and {2,3,4,5}
Next pivot element: 2
Partitioned arrays: {} {3,4,5}
Here you can see that at each partition, the size of the problem decreases by just 1 rather than by half.
Hence T(n) = T(n-1) + Work done for partitioning( O(n) )
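A small Python sketch that counts comparisons for quicksort with the first element as pivot, to make the recurrence concrete. It is not an in-place partition: it builds the two sides with list comprehensions for clarity and counts one comparison per non-pivot element, as a single-pass partition would. On already sorted input each step removes only the pivot, so the count is (n-1) + (n-2) + ... + 0 = n(n-1)/2.

```python
def quicksort_comparisons(a):
    """First-element-pivot quicksort; returns the number of comparisons charged."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    # charge len(rest) = n - 1 comparisons for this partition step
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

print(quicksort_comparisons([1, 2, 3, 4, 5]))  # prints 10 (= 4 + 3 + 2 + 1)
```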
Only the highest-order terms are considered when performing time complexity analysis, because only they remain relevant as the input gets larger. For example: O(0.0001n^3 + 0.002n^2 + 0.1n + 1000000) = O(n^3). By the same reasoning, T(q-1) and T(q) can be treated as interchangeable in the analysis, since the -1 is irrelevant for large values of q.
I am not sure if your note is entirely accurate. user1990169 has kindly answered why the general Quicksort has the worst case time complexity of O(n^2), but it's actually possible to spend O(n) time to determine the median in an unsorted array of n elements, meaning we can always pick the median value (the best value) for the pivot in each iteration. The time complexity of T(n)=T(n-1)+Θ(n) may result from an array where all elements have the same value, in which case, depending on implementation, all elements other than the pivot may get put into the LEFT partition or the RIGHT partition. However, even this can be avoided with some clever implementation. Thus the complexity analysis of T(n)=T(n-1)+Θ(n) may be from a specific implementation of Quicksort, rather than an optimal one.
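The answer does not say which "clever implementation" it means; one well-known option for the all-equal case is three-way partitioning (Dijkstra's "Dutch national flag" scheme), sketched below in Python as a swapped-in technique. Elements equal to the pivot are gathered in the middle and never recursed on, so an array of identical values is handled in one O(n) pass. It still uses the first element as pivot, so already sorted input remains quadratic; that case needs a better pivot choice such as the median selection mentioned above.

```python
def quicksort_3way(a, lo=0, hi=None):
    """In-place quicksort with three-way (Dutch national flag) partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
    lt, i, gt = lo, lo + 1, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # a[lt..gt] all equal the pivot and are already in their final place.
    quicksort_3way(a, lo, lt - 1)
    quicksort_3way(a, gt + 1, hi)

data = [3, 1, 2, 3, 3, 0]
quicksort_3way(data)
print(data)  # prints [0, 1, 2, 3, 3, 3]
```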

Construct an array from an existing array

Given an array of integers A[0...N-1], where N is the length of array A, construct an array B such that B[i] = min(A[i], A[i+1], ..., A[i+K-1]), where K will be given. Array B will have N-K+1 elements.
We can solve the problem using a min-heap: construct a min-heap of the first k elements in O(k). For every subsequent element, delete the element that leaves the window, insert the new element, and heapify.
Hence: worst-case time O((n-k+1)*k) + O(k); space O(k).
Can we do it better?
We can do better if in the algorithm from OP we change expensive "heapify" procedure to much cheaper "upheap" or "downheap". This gives O(n * log(k)) time complexity.
Or, if we iterate through input array and put each element to the min-queue of size 'k', we can do it in O(n) time. Min-queue is a queue that can perform find-min in O(1) time. It may be implemented as a pair of min-stacks. See this answer for details.
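A Python sketch of the min-queue built from two min-stacks, as the answer describes; the class and function names are hypothetical. Each stack entry stores the value together with the minimum of everything below it, so the queue minimum is the smaller of the two stack tops. Every element is moved from the input stack to the output stack at most once, which makes each operation O(1) amortized and the whole pass O(n).

```python
class MinQueue:
    """FIFO queue with amortized O(1) push/pop and O(1) get_min, built from two min-stacks."""

    def __init__(self):
        self._in = []    # entries are (value, min of this stack up to and including it)
        self._out = []

    @staticmethod
    def _push(stack, value):
        current_min = value if not stack else min(value, stack[-1][1])
        stack.append((value, current_min))

    def push(self, value):
        self._push(self._in, value)

    def pop(self):
        if not self._out:
            # Refill: reversing the input stack puts the oldest element on top.
            while self._in:
                self._push(self._out, self._in.pop()[0])
        return self._out.pop()[0]

    def get_min(self):
        # Raises ValueError if the queue is empty.
        return min(s[-1][1] for s in (self._in, self._out) if s)


def sliding_window_min(a, k):
    """B[i] = min(A[i], ..., A[i+k-1]) for i = 0 .. len(a) - k."""
    q, b = MinQueue(), []
    for i, x in enumerate(a):
        q.push(x)
        if i >= k:
            q.pop()              # drop A[i-k], which has left the window
        if i >= k - 1:
            b.append(q.get_min())
    return b

print(sliding_window_min([4, 2, 12, 3, 8, 1], 3))  # prints [2, 2, 3, 1]
```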

How to sort an array according to another array?

Suppose A={1,2,3,4}, p={36,3,97,19}, sort A using p as sort keys. You can get {2,4,1,3}.
It is an example in the book Introduction to Algorithms, which says it can be done in n log n.
Can anyone give me some idea of how it can be done? My thought is that you need to keep track of where each element of p ends up: if p[1] ends up at p[3], then A[1] ends up at A[3]. Can anyone use merge sort or another n log n sorting algorithm to get this done?
I'm new to algorithms and find this a little intimidating :( Thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and actually, if you're only counting comparisons you don't have to do any additional comparisons, just data swaps in two arrays instead of one), so that doesn't change the Big-O complexity of the overall sort because O( 2n log n ) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
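A Python sketch of the index-array approach (the function name `sort_by_keys` is hypothetical). Rather than literally mirroring swaps, it sorts the index array using p as the key, which produces the same permuted index array, and then gathers A in that order in O(n).

```python
def sort_by_keys(A, p):
    """Return A reordered so that the corresponding keys in p are ascending."""
    i = list(range(len(p)))          # index array {0, 1, ..., n-1}
    i.sort(key=lambda idx: p[idx])   # O(n log n): order the indices by their keys
    return [A[idx] for idx in i]     # O(n): look up A at each sorted index

A = [1, 2, 3, 4]
p = [36, 3, 97, 19]
print(sort_by_keys(A, p))  # prints [2, 4, 1, 3]
```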
Of course, you can also skip the index array entirely and simply treat the array A in the same way, sorting it alongside p.
I don't see why this is difficult. Since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of elements in array A according to the elements in p. You won't need any comparisons beyond the ones already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays and you are done.
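A Python sketch of "move it in both arrays" applied to merge sort (the function name is hypothetical). The only comparisons are on the keys; every time a key is placed into the merged output, the value at the same position is placed alongside it, so the extra work is data movement only and the cost stays O(n log n).

```python
def merge_sort_tandem(keys, values):
    """Merge sort on keys; every move made in keys is mirrored in values.

    Returns new (sorted_keys, reordered_values) lists; the inputs are not modified.
    """
    if len(keys) <= 1:
        return list(keys), list(values)
    mid = len(keys) // 2
    left_k, left_v = merge_sort_tandem(keys[:mid], values[:mid])
    right_k, right_v = merge_sort_tandem(keys[mid:], values[mid:])
    out_k, out_v = [], []
    a = b = 0
    while a < len(left_k) and b < len(right_k):
        if left_k[a] <= right_k[b]:   # the only comparison, made on keys only
            out_k.append(left_k[a])
            out_v.append(left_v[a])   # same move applied to the values array
            a += 1
        else:
            out_k.append(right_k[b])
            out_v.append(right_v[b])
            b += 1
    # At most one of these tails is non-empty.
    out_k += left_k[a:] + right_k[b:]
    out_v += left_v[a:] + right_v[b:]
    return out_k, out_v

print(merge_sort_tandem([36, 3, 97, 19], [1, 2, 3, 4]))  # prints ([3, 19, 36, 97], [2, 4, 1, 3])
```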

Very hard sorting algorithm problem - O(n) time - Time complexity

Since the problem is long, I cannot describe it in the title.
Imagine that we have 2 unsorted integer arrays. Both arrays have length n and contain integers between 0 and n^765 (n to the power 765 at most).
I want to compare both arrays and find out whether they contain any common integer value, within O(n) time complexity.
No duplicates are possible within the same array.
Any help and ideas are appreciated.
What you want is impossible. Each element will be stored in up to log(n^765) bits, which is O(log n). So simply reading the contents of both arrays will take O(n*logn).
If you have a constant upper bound on the value of each element, You can solve this in O(n) average time by storing the elements of one array in a hash table, and then checking if the elements of the other array are contained in it.
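A minimal Python sketch of the hash-table check (the function name is hypothetical); Python's `set` gives expected O(1) membership tests, so the whole check is O(n) expected time under the answer's assumption that each element fits in a constant number of machine words.

```python
def have_common_value(a, b):
    """True if arrays a and b share at least one value; O(n) expected time."""
    seen = set(a)                      # hash table holding one array's elements
    return any(x in seen for x in b)   # expected O(1) membership test per element

print(have_common_value([1, 5, 9], [2, 9]))  # prints True
```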
The solution you may be looking for is to use radix sort to sort your data, after which you can easily check for duplicate elements. You would look at your numbers in base n, and do 765 passes over your data. Each pass would use a bucket sort or counting sort to sort by a single digit (in base n). This process would take O(n) time in the worst case (assuming a constant upper bound on element size). Note that I doubt anyone would ever choose this over a hash table in practice.
Assuming multiplication and division are O(1):
Think about the numbers; you can write them as:
Number(i) = A0 * n^765 + A1 * n^764 + .... + A764 * n + A765.
To convert a number into this format, you just compute Number / n^i and Number % n^i. If you precompute n^1, n^2, n^3, ..., this can be done in O(n * 765) => O(n) for all numbers. Precomputing n^i takes O(i), and since i is at most 765, that is O(1) overall.
Now you can write Number(i) as an array: Number(i) = (A0, A1, ..., A765), and now you can radix sort the items:
first sort all items by A765, then by A764, ..., and finally by A0. Every Ai is in the range 0..n-1, so you can sort by each Ai with counting sort (counting sort is O(n)); your radix sort is therefore O(n * 765), which is O(n).
After radix sort you have two sorted arrays, and you can find one common item in O(n), or use a merge algorithm (as in merge sort) to find all common items (not just one).
To generalize: if the input values are bounded by O(n^C), where C is a constant, they can be sorted in O(n). But because the overhead of this kind of sorting is large, in practice prefer quicksort and similar algorithms. A simple version of this question can be found in the book Introduction to Algorithms, which asks how to sort numbers in the range (0..n^2) in O(n).
Edit: to clarify how you can find common items in two sorted lists:
You have two sorted lists. How do you merge two sorted lists into one, as in merge sort? You start from the heads of list 1 and list 2 and repeatedly advance the pointer of whichever list has the smaller head. If there is a common item, the algorithm stops when the two heads are equal (before reaching the end of the lists); otherwise it stops when it reaches the end of one of the lists.
It's as easy as this:
public int FindSimilarityInSortedLists(List<int> list1, List<int> list2)
{
    int i = 0;
    int j = 0;
    while (i < list1.Count && j < list2.Count)
    {
        if (list1[i] == list2[j])
            return list1[i];
        // advance the list whose head is smaller
        if (list1[i] < list2[j])
            i++;
        else
            j++;
    }
    return -1; // not found
}
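A Python sketch of the radix sort approach followed by the merge-style scan. Assumptions: values lie in 0..n^c, so c + 1 base-n digits cover them; each digit pass is a stable bucket pass over n buckets; and, as in the answer, integer division and modulo are treated as O(1), which Python's arbitrary-precision ints do not strictly satisfy for huge values. The function names and the parameter `c` are illustrative.

```python
def radix_sort_base(values, base, digits):
    """LSD radix sort of non-negative ints < base**digits: one stable bucket pass per digit."""
    for d in range(digits):
        divisor = base ** d
        buckets = [[] for _ in range(base)]
        for v in values:
            buckets[(v // divisor) % base].append(v)    # stable: keeps earlier order
        values = [v for bucket in buckets for v in bucket]
    return values

def share_value(a, b, c):
    """True if a and b share a value; assumes all values lie in 0..n**c."""
    n = max(len(a), len(b), 2)            # base n, as in the answer (at least 2)
    sa = radix_sort_base(a, n, c + 1)     # n**c needs c + 1 base-n digits
    sb = radix_sort_base(b, n, c + 1)
    i = j = 0
    while i < len(sa) and j < len(sb):    # merge-style scan of two sorted lists
        if sa[i] == sb[j]:
            return True
        if sa[i] < sb[j]:
            i += 1
        else:
            j += 1
    return False

print(share_value([3, 8, 1], [5, 8, 0], 2))  # prints True
```

Each pass costs O(n + base) = O(n) and the number of passes is the constant c + 1, which is where the O(n) bound comes from; as the answer notes, the constant factor makes this unattractive in practice.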
If memory were unlimited, you could simply create a hash table with the integers as keys and the number of times each is found as values. Then, to do your "fast" lookup, you simply query for an integer, check whether it is contained in the hash table, and if found, check whether the value is 1 or 2. That would take O(n) to load and O(1) per query.
I do not think you can do it in O(n).
You have to check, for each of the n values, whether it is in the other array. That means at least n comparison operations even if the other array has just 1 element. But since the other array has n elements as well, you can only do it in O(n*n).
