Find numbers above a given frequency in an unsorted array - pseudocode

I have a question:
I have an unsorted array with n numbers, and I need to find the numbers that appear in more than 10% of the array's entries.
Could you please write pseudocode for this, together with its time complexity?
An example:
array A = {12,11,1,3,1,4,4,7,8,9,10}
The answer is 1 and 4 (each appears twice, which is more than 10% of the 11 entries).

You can use a hash table (hash map) to solve this.
Iterate over your array:
if the hash table does not contain the current element (number),
then add it with a counter set to 1;
otherwise increment its counter by 1.
Then iterate over your hash table and keep every entry whose counter exceeds 10% of the size of the array.
The time complexity is the cost of iterating the array (n) plus the cost of iterating the hash table (worst case n): about 2n operations, which is O(n).
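A minimal Java sketch of this hash-map approach (the class, method, and variable names are invented for illustration):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequentNumbers {
    // Returns every value that appears in more than 10% of the array's entries.
    static List<Integer> moreThanTenPercent(int[] a) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : a) {
            counts.merge(x, 1, Integer::sum); // add with counter 1, or increment
        }
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() * 10 > a.length) { // counter > n/10
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {12, 11, 1, 3, 1, 4, 4, 7, 8, 9, 10};
        System.out.println(moreThanTenPercent(a)); // prints [1, 4] (order may vary)
    }
}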
Another solution would be to sort the array, then iterate over it, counting each run of equal elements and keeping every element whose count exceeds 10% of n.
The time complexity is the cost of the sort (n log n) plus the cost of iterating the array (n): n + n log n, which is O(n log n).
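And a matching sketch of the sort-then-scan variant (again illustrative, not from the original post):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FrequentNumbersSorted {
    // Sort a copy, then scan runs of equal values and keep runs longer than n/10.
    static List<Integer> moreThanTenPercent(int[] a) {
        int[] s = a.clone();
        Arrays.sort(s); // O(n log n)
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length) {
            int j = i;
            while (j < s.length && s[j] == s[i]) j++; // end of one run of equal values
            if ((j - i) * 10 > s.length) result.add(s[i]); // run length > n/10
            i = j;
        }
        return result;
    }
}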

Linear sorting using additional data structure (O(1) to find median of set, O(1) for adding element)

Suppose we have arbitrary elements (comparable in O(1)) in an array, and a magic data structure (DS) to which we can add an element in O(1) and in which we can find the median of the stored elements in O(1). We cannot remove elements from the DS, and there are no equal elements in the array. Also, we can create as many such DS instances as we need.
The question: is there a way to sort the array in O(n) using this DS?
Yes, if this data structure exists then it can be used to sort in O(n) time.
Scan the array to find the minimum and maximum elements. Call this min and max.
Insert all of the array elements into the data structure, in any order.
Insert n - 1 copies of min - 1. The median is now the smallest element from the original array.
Repeat n - 1 times:
Insert two copies of max + 1.
Read off the median, which will now be the next element from the original array in ascending order.
This procedure takes O(n) time, because
Finding the min and max is O(n),
Inserting n elements is n * O(1) = O(n),
Inserting n - 1 elements is (n - 1) * O(1) = O(n),
Inserting two elements and reading the median is O(1), so doing this n - 1 times is O(n).
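A Java sketch of this procedure. The MagicDS interface below is invented for illustration (the problem only postulates O(1) add and O(1) median); the sketch also assumes the DS tolerates the repeated sentinel values and that min - 1 and max + 1 do not overflow.

// Hypothetical interface, invented here for illustration.
interface MagicDS {
    void add(int x); // O(1) insertion
    int median();    // O(1) median of everything added so far
}

class MagicSort {
    // Sorts a non-empty array in O(n), given a fresh MagicDS instance.
    static int[] sortWithMagicDS(int[] a, MagicDS ds) {
        int n = a.length;
        int min = a[0], max = a[0];
        for (int x : a) { min = Math.min(min, x); max = Math.max(max, x); } // O(n)
        for (int x : a) ds.add(x);                       // n inserts: O(n)
        for (int k = 0; k < n - 1; k++) ds.add(min - 1); // n-1 low sentinels: O(n)
        // The DS now holds 2n-1 values; its median (the n-th smallest) is the
        // smallest original element, since the n-1 sentinels all sit below it.
        int[] sorted = new int[n];
        sorted[0] = ds.median();
        for (int k = 1; k < n; k++) {
            ds.add(max + 1); // two high sentinels shift the median up one position
            ds.add(max + 1);
            sorted[k] = ds.median();
        }
        return sorted;
    }
}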

Lower bound of merging k sorted arrays of size n

As the title suggests, I am wondering what the proof of the lower bound for merging k sorted arrays of size n is. I know that the bound is Ω(kn log k), but how is this achieved? I tried comparing it to sorting an array of p elements using a decision tree, but I don't see how to carry out this proof.
This is fairly easy to prove; think about it in a merge-sort way. Merge-sorting an array of size K*N takes O(KN*log(K*N)).
But we don't have to recurse down to leaves of size 1: as soon as a subarray has size N, we know it is sorted. For simplicity, assume K is a power of 2.
How many times do we have to divide by 2 to reach leaves of size N?
log(K) times!
So there are log(K) levels, and merging everything at each level costs KN. Hence, the time complexity is O(NK*log(K)).
Proof: let's assume it is not a lower bound and we could achieve better. Then for any unknown array of size N*K we could split it in 2 repeatedly until we reach subarrays of size N, and merge-sort each of those arrays of size N in N log(N) time, for K*N*log(N) time over all the subarrays.
After having the K arrays of size N sorted, we have to merge them into a bigger array of size N*K, paying less than O(NK*log(K)), as we assumed this is not the lower bound.
In the end we would have sorted an unknown array of size N*K with complexity lower than N*K*log(N*K), which is not possible in the comparison model.
Hence, you can't achieve better than O(NK*log(K)) while merging the K sorted arrays of size N.
A possible implementation:
Create a heap data structure that stores pairs (element, arrayIndex), ordered by element. Then:
Add the first element of each array, together with its array index, to this heap.
On each step, remove the top (smallest) pair p from the heap, append p.element to the result, and insert into the heap the pair (next, p.arrayIndex), where next is the next element of the array with index p.arrayIndex (if that array is not yet exhausted).
For tracking the 'next' element you need an array of k indices/pointers/iterators, each pointing to the next unconsumed element of the corresponding array.
There will be at most k elements in the heap at any time, so the heap's insert/remove operations cost O(log(k)). Every element is inserted into and removed from the heap exactly once, and the total number of elements is n*k. The overall complexity is O(n*k*log(k)).
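A self-contained Java sketch of this heap-based k-way merge (names are illustrative):

import java.util.PriorityQueue;

class KWayMerge {
    // Merge k sorted int arrays into one sorted array with a min-heap of size k.
    static int[] mergeKSorted(int[][] arrays) {
        // Heap entries are {element, arrayIndex, positionInArray}, ordered by element.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((p, q) -> Integer.compare(p[0], q[0]));
        int total = 0;
        for (int i = 0; i < arrays.length; i++) {
            total += arrays[i].length;
            if (arrays[i].length > 0) heap.add(new int[] {arrays[i][0], i, 0});
        }
        int[] result = new int[total];
        for (int out = 0; out < total; out++) {
            int[] top = heap.poll();            // O(log k)
            result[out] = top[0];
            int src = top[1], pos = top[2] + 1; // advance within the source array
            if (pos < arrays[src].length) {
                heap.add(new int[] {arrays[src][pos], src, pos});
            }
        }
        return result;
    }
}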
Create a min-heap of size k which stores the next item from each of the k arrays. Each node also records which array it came from. Build your sorted array by adding the minimum from the heap to final_sorted_array, then adding to the heap the next element from the array that value came from.
Removing the min element of the heap is O(log k). You have a total of NK elements, so you do this NK times. Final result: O(NK log k).

Duplicate identifying algorithm

The obvious approach uses two for loops: the first with i running from 1 to length-1 of the input array, and the second with k running from i to length, comparing array[i] == array[k] each time; it returns true if an identical pair is found and false if none is found.
Only an array is used as the single parameter for this algorithm.
Can this still be optimized?
There are a few ways to tackle this, depending on the memory restrictions.
With constant extra memory, you can sort the two arrays in O(n log n), then step through them together, advancing whichever pointer currently points at the smaller number; this scan is O(n). That sums up to O(n log n) in total.
If you can use O(n) memory, you can create a hash set from the first array (O(n)), then traverse the second and stop as soon as the set contains the current number. This is n * O(1), which is O(n).
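A minimal Java sketch of the hash-set version (names are illustrative):

import java.util.HashSet;
import java.util.Set;

class CommonElement {
    // True if the two arrays share at least one value; expected O(n) overall.
    static boolean haveCommonElement(int[] first, int[] second) {
        Set<Integer> seen = new HashSet<>();
        for (int x : first) seen.add(x);       // build the set: expected O(n)
        for (int y : second) {
            if (seen.contains(y)) return true; // expected O(1) per lookup
        }
        return false;
    }
}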

Count number of identical pairs

An identical pair in an array is a pair of indices p, q such that
0 <= p < q < N and array[p] = array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value,
keeping track of the original indices.
Then for every index p in the sorted array, count all q < N such that
sortedarray[p].index < sortedarray[q].index and
sortedarray[p] = sortedarray[q].
Is this the correct approach? I think the complexity would be
O(N log N) for sorting based on value, plus
O(N^2) for counting the entries of the new sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought was, for every p, to binary-search the sorted array for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
Here is my code for the second part:
for (int i = 0; i < N; i++) {
    int j = i + 1;
    // walk forward while the value matches and the original index is larger
    while (j < N && sortedArray[j].index > sortedArray[i].index
                 && sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;
#Edit: I think I mistook the complexity of the second part for O(N^2).
Since no iteration of the while loop rescans elements at indices 0..i, a linear pass over the sorted array suffices to count the pairs. The total complexity is therefore
O(N log N) for sorting, plus O(N) for the linear counting scan over the sorted array.
You are partially correct. Sorting the array via merge sort or heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through it to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n), not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: insertion sort.
An animated example can be found here.
As you can see in the animation, the reason this algorithm is quadratic is that, for each element, it has to check the rest of the array (including earlier elements!) to determine where that element has to go.
In your case, on the other hand, you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation, you start scanning pairwise from the beginning, and (because the array is ordered) you know that once an element has been scanned and your pointers move on, there is never a reason to rescan earlier elements later, otherwise you would not have moved on in the first place.
Hence, you scan the whole array only once: O(n).
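For concreteness, a small Java sketch of that single pass over a sorted array (the helper name is invented); each run of c equal values contributes c*(c-1)/2 pairs:

class PairCount {
    // Single pass over a sorted array, counting runs of equal values.
    static long countIdenticalPairs(int[] sorted) {
        long pairs = 0;
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) j++; // end of run
            long run = j - i;
            pairs += run * (run - 1) / 2; // C(run, 2) pairs inside this run
            i = j;
        }
        return pairs;
    }
}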
If you can allocate more memory, you can make some gains.
You can reach O(n) by using a hash table that maps each value in the array to a counter recording how often you have seen that value.
If the allowed values are integers in a limited range, you can use a plain array instead of a hash table, storing the counter for value i at index i. In that case the complexity is O(n+m), where m is the number of allowed values (you must first zero all m counters, then look through all the array entries to count pairs).
Both methods give you the number of occurrences of each value in your array. Call nv_i the number of appearances of value i. Then the number of pairs with value i is nv_i * (nv_i - 1) / 2.
You can pair:
the 1st i with nv_i - 1 others,
the 2nd i with nv_i - 2 others,
...
the last i with 0 others.
And (nv_i - 1) + (nv_i - 2) + ... + 0 = nv_i * (nv_i - 1) / 2.
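A Java sketch of the hash-table variant of this counting (illustrative, not from the post):

import java.util.HashMap;
import java.util.Map;

class PairCountHashed {
    // Count occurrences with a hash table, then sum nv_i * (nv_i - 1) / 2.
    static long countIdenticalPairs(int[] a) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int x : a) counts.merge(x, 1L, Long::sum);
        long pairs = 0;
        for (long c : counts.values()) pairs += c * (c - 1) / 2;
        return pairs;
    }
}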
I've been thinking about this... I think that if you "embed" the == condition into your sorting algorithm, the complexity is still O(n lg n).

Very hard sorting algorithm problem - O(n) time - time complexity

Since the problem statement is long, I cannot describe it in the title.
Imagine that we have 2 unsorted integer arrays. Both arrays have length n, and they contain integers between 0 and n^765 (n to the power 765 at most).
I want to compare both arrays and find out whether they contain any common integer value, with O(n) time complexity.
No duplicates are possible within the same array.
Any help and ideas are appreciated.
What you want is impossible in general. Each element may occupy up to log(n^765) = 765*log(n) bits, which is O(log n), so simply reading the contents of both arrays already takes O(n*log n) time.
If you have a constant upper bound on the value of each element, you can solve this in O(n) average time by storing the elements of one array in a hash table, and then checking whether the elements of the other array are contained in it.
Edit:
The solution you may be looking for is to use radix sort to sort your data, after which you can easily check for duplicate elements. You would view your numbers in base n and make 765 passes over your data, each pass using a bucket sort or counting sort to sort by a single digit (in base n). This process takes O(n) time in the worst case, since the number of passes is a constant. Note that I doubt anyone would ever choose this over a hash table in practice.
Assuming multiplication and division are O(1):
Think about the numbers written in base n:
Number(i) = A_0 * n^765 + A_1 * n^764 + ... + A_764 * n + A_765.
To encode a number in this format you just compute Number / n^i and Number % n^i; if you precompute n^1, n^2, n^3, ..., this can be done in O(765 * n) = O(n) for all numbers. The precomputation of the powers n^i takes O(i) time, and since i is at most 765 that is O(1).
Now you can write each Number(i) as a tuple Number(i) = (A_0, A_1, ..., A_765), and you can radix sort the items:
first sort on A_765, then on A_764, and so on. All the A_i are in the range 0..n, so each digit can be sorted with counting sort (which is O(n)); the whole radix sort is therefore O(765 * n), which is O(n).
After the radix sort you have two sorted arrays, and you can find one common item in O(n), or use a merge pass (as in merge sort) to find every common item, not just one.
As a generalization: if the input values are bounded by O(n^C) for a fixed C, they can be sorted in O(n). But because the constant-factor overhead of this kind of sorting is large, quicksort and similar algorithms are usually preferred in practice. A simple instance of this question can be found in the Introduction to Algorithms book, which asks how to sort numbers in the range 0..n^2 in O(n).
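A hedged Java sketch of an LSD radix sort in base n with one counting-sort pass per digit. It assumes n >= 2 and that the values fit in a long; for genuinely n^765-sized values you would extract base-n digits from a big-integer representation instead. All names are illustrative.

class RadixBaseN {
    // LSD radix sort in base n (n = array length), non-negative values.
    static void radixSortBaseN(long[] a) {
        int n = a.length;
        long max = 0;
        for (long x : a) max = Math.max(max, x);
        long[] buffer = new long[n];
        // One stable counting-sort pass per base-n digit, least significant first.
        for (long divisor = 1; max / divisor > 0; divisor *= n) {
            int[] count = new int[n];
            for (long x : a) count[(int) ((x / divisor) % n)]++;
            for (int d = 1; d < n; d++) count[d] += count[d - 1]; // prefix sums
            for (int i = n - 1; i >= 0; i--) { // walk backwards for stability
                int digit = (int) ((a[i] / divisor) % n);
                buffer[--count[digit]] = a[i];
            }
            System.arraycopy(buffer, 0, a, 0, n);
        }
    }
}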
Edit: to clarify how you can find common items in two sorted lists:
You have two sorted lists. Recall how merge sort merges two sorted lists into one: you advance the pointer into list1 while head(list1) < head(list2), then advance the pointer into list2 in the same way, and so on. If there is a common item, the algorithm stops as soon as it reaches it (before reaching the end of the lists); otherwise it stops at the end of the two lists.
It's as easy as below:
public int FindSimilarityInSortedLists(List<int> list1, List<int> list2)
{
    int i = 0;
    int j = 0;
    while (i < list1.Count && j < list2.Count)
    {
        if (list1[i] == list2[j])
            return list1[i];  // found a common element
        if (list1[i] < list2[j])
            i++;              // advance the pointer behind the smaller value
        else
            j++;
    }
    return -1; // not found
}
If memory were unlimited, you could simply create a hash table with the integers as keys and, as values, the number of times each one is found. Then for your "fast" lookup you simply query for an integer, discover whether it is contained in the hash table, and if it is found check whether the value is 1 or 2. That takes O(n) to load and O(1) per query.
I do not think you can do it in O(n).
You would have to check, for each of the n values, whether it is in the other array. That means at least n comparison operations even if the other array had just 1 element; but as the other array has n elements as well, you can only do it in O(n*n).
