Construct an array from an existing array - algorithm

Given an array of integers A[1...N], where N is the length of A, and an integer K, construct an array B such that B[i] = min(A[i], A[i+1], ..., A[i+K-1]). Array B will have N-K+1 elements.
We can solve the problem using a min-heap: construct a min-heap of the first K elements in O(K). For every next element, delete the element leaving the window, insert the new element, and heapify.
Hence: worst-case time - O((N-K+1)*K) + O(K), space - O(K).
Can we do it better?

We can do better if, in the algorithm from the OP, we replace the expensive "heapify" procedure with the much cheaper "upheap" or "downheap" operation. This gives O(n * log(k)) time complexity.
Or, if we iterate through the input array and push each element into a min-queue of size k, we can do it in O(n) time. A min-queue is a queue that can perform find-min in O(1) time; it may be implemented as a pair of min-stacks, as in the sketch below.
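A minimal Python sketch of the min-queue idea, assuming the pair-of-min-stacks construction (the names MinQueue and sliding_min are illustrative, not from the original answer):

from typing import List

class MinQueue:
    # Queue with O(1) find-min, built from two min-stacks.
    # Each stack entry stores (value, minimum of that stack so far).
    def __init__(self):
        self._in = []    # receives pushes
        self._out = []   # serves pops

    def push(self, x):
        m = x if not self._in else min(x, self._in[-1][1])
        self._in.append((x, m))

    def pop(self):
        if not self._out:
            while self._in:  # move across, recomputing running minima
                x, _ = self._in.pop()
                m = x if not self._out else min(x, self._out[-1][1])
                self._out.append((x, m))
        return self._out.pop()[0]

    def min(self):
        if self._in and self._out:
            return min(self._in[-1][1], self._out[-1][1])
        return (self._in or self._out)[-1][1]

def sliding_min(a: List[int], k: int) -> List[int]:
    q, result = MinQueue(), []
    for i, x in enumerate(a):
        q.push(x)
        if i >= k:
            q.pop()                  # drop the element leaving the window
        if i >= k - 1:
            result.append(q.min())   # window [i-k+1, i] is complete
    return result

Each element passes through each stack at most once, so the whole scan is amortized O(n); for example, sliding_min([1, 4, 1, 2, 3], 2) gives [1, 1, 1, 2].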

Related

Linear sorting using additional data structure (O(1) to find median of set, O(1) for adding element)

Suppose we have arbitrary elements (we can compare them in O(1)) in an array, and a magic DS in which we can add an element in O(1) and find the median of the elements in the DS in O(1). We can't remove elements from the DS, and there are no equal elements in the array. Also, we can create as many such DS as we need.
The question: is there a way to sort the array in O(n) using this DS?
Yes, if this data structure exists then it can be used to sort in O(n) time.
Scan the array to find the minimum and maximum elements. Call this min and max.
Insert all of the array elements into the data structure, in any order.
Insert n - 1 copies of min - 1. The median is now the smallest element from the original array: there are 2n - 1 elements in total, and the n - 1 inserted copies sit below every original element, so the n-th smallest is the smallest original element.
Repeat n - 1 times:
Insert two copies of max + 1.
Read off the median, which will now be the next element from the original array in ascending order.
This procedure takes O(n) time, because
Finding the min and max is O(n),
Inserting n elements is n * O(1) = O(n),
Inserting n - 1 elements is (n - 1) * O(1) = O(n),
Inserting two elements and reading the median is O(1), so doing this n - 1 times is O(n).
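To make the steps concrete, here is a Python sketch. No real MagicDS exists, so it is simulated below with a sorted list; the simulated operations are not O(1), the code only demonstrates the driving logic of the procedure above:

import bisect

class MagicDS:
    # Stand-in for the hypothetical structure (O(1) add, O(1) median).
    # Simulated with a sorted list, so these costs are NOT really O(1).
    def __init__(self):
        self._items = []

    def add(self, x):
        bisect.insort(self._items, x)

    def median(self):
        return self._items[(len(self._items) - 1) // 2]  # lower median

def magic_sort(a):
    lo, hi = min(a), max(a)          # find min and max: O(n)
    ds = MagicDS()
    for x in a:                      # insert all n elements
        ds.add(x)
    for _ in range(len(a) - 1):      # insert n - 1 copies of min - 1
        ds.add(lo - 1)
    result = [ds.median()]           # smallest original element
    for _ in range(len(a) - 1):      # n - 1 rounds
        ds.add(hi + 1)
        ds.add(hi + 1)
        result.append(ds.median())   # next original element in order
    return result

For example, magic_sort([3, 1, 2]) returns [1, 2, 3].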

Lower bound of merging k sorted arrays of size n

As the title suggests, I am wondering what the proof for the lower bound of merging k sorted arrays of size n is. I know that the bound is O(kn*log(k)), but how was it achieved? I tried comparing to sorting an array of p elements using a decision tree, but I don't see how to carry out this proof.
This is fairly easy to prove if you think about it in a merge-sort way. Merge-sorting an array of size K*N takes O(K*N*log(K*N)).
But we don't have to go all the way down to leaves of size 1, because once a sub-array has size N we know it is already sorted. For simplicity, assume K is a power of 2.
How many times do we have to divide by 2 to reach leaves of size N?
log(K) times!
Merging at each of those log(K) levels costs O(K*N), hence the time complexity is O(N*K*log(K)).
Proof: let's assume it is not a lower bound and we could do better. Then for any unknown array of size N*K we could split it in 2 until we reach sub-arrays of size N, merge-sort each of the K arrays of size N in N*log(N) time, for K*N*log(N) time across all of them.
After the K arrays of size N are sorted, we merge them into a bigger array of size N*K, paying less than O(N*K*log(K)), since we assumed it is not a lower bound.
In the end we have sorted an unknown array of size N*K in less than N*K*log(N*K) = N*K*(log(N) + log(K)) time, which is not possible in the comparison model.
Hence, you can't do better than O(N*K*log(K)) when merging K sorted arrays of size N.
Possible implementation.
Let's create a heap data structure that stores pairs (element, arrayIndex) ordered by element. Then:
Add the first element of each array, with the corresponding array index, to this heap.
On each step, remove the top (smallest) pair p from the heap, append p.element to the result, and insert into the heap the pair (next, p.arrayIndex), where next is the next element of the array with index p.arrayIndex (if that array is not exhausted).
For tracking the 'next' element, you need an array of k indices/pointers/iterators, each pointing to the next element of the corresponding array.
There will be at most k elements in the heap at any time, thus the insert/remove operations of the heap will have O(log(k)) complexity. Every element will be inserted and removed once from the heap. The number of elements is n*k. Overall complexity is O(n*k*log(k)).
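A minimal Python sketch of this merge, using the standard heapq module (merge_k_sorted is an illustrative name):

import heapq

def merge_k_sorted(arrays):
    # Each heap entry is (element, array index, position in that array),
    # so the heap never holds more than k entries.
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)                      # O(k)
    result = []
    while heap:
        value, i, j = heapq.heappop(heap)    # O(log k)
        result.append(value)
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return result

For example, merge_k_sorted([[1, 4], [2, 3], [0, 5]]) returns [0, 1, 2, 3, 4, 5].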
Create a min-heap of size k which stores the next item from each of the k arrays. Each node also records which array it came from. Build your sorted array by moving the min of the heap to final_sorted_array, then adding to the heap the next element from the array that value came from.
Removing the min element of the heap is O(log k). You have a total of NK elements, so you do this NK times. Final result: O(NK log k).

Count number of identical pairs

An identical pair in an array is a pair of indices p, q such that
0 <= p < q < N and array[p] = array[q], where N is the length of the array.
Given an unsorted array, find the number of identical pairs in the array.
My solution was to sort the array by value,
keeping track of the original indices.
Then for every position p in the sorted array, count all q < N such that
sortedarray[p].index < sortedarray[q].index and
sortedarray[p] = sortedarray[q].
Is this the correct approach? I think the complexity would be
O(N log N) for sorting based on value, plus
O(N^2) for counting the entries of the sorted array that satisfy the condition.
This means I am still looking at O(N^2). Is there a better way?
Another thought was, for every p, to binary-search the sorted array for all q that satisfy the condition. Would that not reduce the complexity of the second part to O(N log N)?
Here is my code for the second part:

for (int i = 0; i < N; i++) {
    int j = i + 1;
    while (j < N && sortedArray[j].index > sortedArray[i].index
            && sortedArray[j].item == sortedArray[i].item) {
        inversion++;
        j++;
    }
}
return inversion;
#Edit: I think I mistook the complexity of the second part for O(N^2).
Since no iteration of the while loop rescans elements from indices 0..i, only linear time is needed to scan the sorted array and count the pairs. The total complexity is therefore
O(N log N) for sorting and O(N) for the linear counting scan of the sorted array.
You are partially correct. Sorting the array via merge sort or heapsort will take O(n lg n). But once the array is sorted, you can make a single pass through it to find all identical pairs. This single pass is an O(n) operation. So the total complexity is:
O(n lg n + n) = O(n lg n)
As Tim points out in his response, the complexity of finding the pairs within a sorted array is O(n) and not O(n^2).
To convince yourself of this, think about a typical O(n^2) algorithm: Insertion Sort.
An animated example can be found here.
As you can see in the gif, the reason this algorithm is quadratic is that, for each element, it has to check where in the array that element should go (and this includes previous elements in the array!).
In your case, on the other hand, you have an ordered array, e.g. [0,1,3,3,6,7,7,9,10,10].
In this situation you scan the array pairwise from the beginning, and (because the array is ordered) you know that once an element has been scanned and your pointers have moved on, there can never be a reason to rescan previous elements, because otherwise you would not have moved on in the first place.
Hence, you scan the whole array only once: O(n).
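A possible Python rendering of that single sorted pass (count_pairs_sorted is an illustrative name): within each run of equal values, each element pairs with every earlier element of the run.

def count_pairs_sorted(arr):
    a = sorted(arr)              # O(n log n)
    total = run = 0
    for i in range(1, len(a)):   # single O(n) scan
        if a[i] == a[i - 1]:
            run += 1             # a[i] pairs with each of the `run`
            total += run         # earlier equal elements
        else:
            run = 0
    return total

For the ordered example above, count_pairs_sorted([0,1,3,3,6,7,7,9,10,10]) returns 3.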
If you can allocate more memory you can get some gains.
You can reach O(n) by using a hash table which maps any values in the array to a counter indicating how often you already saw this value.
If the number of allowed values is integral and in a limited range you can directly use an array instead of a hash table. The index of value i being i itself. In that case the complexity would be O(n+m) where m is the number of allowed values (because you must first set to 0 all entries in the array and then look through all the array entries to count pairs).
Both methods give you, for each value in your array, the number of times it appears. Let nv_i be the number of appearances of value i in the array. Then the number of identical pairs with value i is: nv_i * (nv_i - 1) / 2.
You can pair:
the 1st occurrence of i with nv_i - 1 others,
the 2nd with nv_i - 2 others,
...
the last with 0.
And (nv_i-1)+(nv_i-2)+...+0 = (nv_i)*(nv_i-1)/2
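A short Python version of the hash-table method, using collections.Counter:

from collections import Counter

def count_identical_pairs(arr):
    counts = Counter(arr)        # value -> nv_i, O(n) on average
    return sum(nv * (nv - 1) // 2 for nv in counts.values())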
I've been thinking about this: I think that if you "embed" the == condition into your sorting algorithm, the complexity is still O(n lg n).

Finding the minimum unique number in an array

The minimum unique number in an array is defined as
min{v|v occurs only once in the array}
For example, the minimum unique number of {1, 4, 1, 2, 3} is 2.
Is there any way better than sorting?
I believe this is an O(N) solution in both time and space:

HashSet seenOnce      // sufficiently large that access is O(1)
HashSet seenMultiple  // sufficiently large that access is O(1)

for each item in input                // O(N)
    if item in seenMultiple
        continue
    if item in seenOnce
        remove item from seenOnce
        add item to seenMultiple
    else
        add item to seenOnce

smallest = SENTINEL
for each item in seenOnce             // worst case, O(N)
    if item < smallest
        smallest = item
If you have a limited range of integral values, you can replace the HashSets with BitArrays indexed by the value.
You don't need to do a full sort. Run the bubble-sort inner loop until a distinct minimum value settles at one end. In the best case this has time complexity O(k * n), where k is the number of non-distinct minimum values; the worst-case complexity, however, is O(n*n). So this can be efficient when the expected value of k is much smaller than n.
I think this would be the minimum possible time complexity, unless you can adapt an O(n * log n) sorting algorithm to the above task.
Python version using dictionary.
Time complexity O(n) and space complexity O(n):
from collections import defaultdict

d = defaultdict(int)
for _ in range(int(input())):    # first input line: number of elements
    ele = int(input())
    d[ele] += 1

m = float('inf')                 # sentinel: no unique value found yet
for i in d:
    if d[i] == 1 and i < m:
        m = i
print(m if m != float('inf') else -1)
Please tell me if there is a better approach.

Check if there exists a[i] = 2*a[j] in an unsorted array a

Given an unsorted sequence a[1,...,n] of integers, give an O(n log n) algorithm that checks whether there are two indices i and j such that a[i] = 2*a[j]. The algorithm should return (i, j) = (2, 0) on input 4,12,8,10 (since a[2] = 8 = 2*a[0]) and false on input 4,3,1,11.
I think we have to sort the array anyway, which is O(n log n). I'm not sure what to do after that.
Note: this can be done in O(n)(1) on average, using a hash table.
set <- new hash set
for each x in array:
    set.add(2*x)
for each x in array:
    if set.contains(x):
        return true
return false
Proof:
=>
If there are two elements a[i] and a[j] such that a[i] = 2 * a[j], then during the first pass we inserted 2*a[j] into the set when we read a[j]. During the second pass, we find that a[i] == 2*a[j] is in the set, and return true.
<=
If the algorithm returned true, then during the second pass it found an a[i] that was already in the set. Values were only inserted during the first pass, so there must be a second element a[j] such that a[i] == 2 * a[j]; we inserted that value when reading a[j].
Note:
In order to return the indices of the elements, one can simply use a hash map instead of a set, and for each i store 2*a[i] as the key and i as the value.
Example:
Input = [4,12,8,10]
First, for each x, insert the key 2x with its index into the hash table. You will get:
hashTable = {(8,0), (24,1), (16,2), (20,3)}
Now, on the second pass, you check for each element whether it is in the table:
arr[0]: 4 is not in the table
arr[1]: 12 is not in the table
arr[2]: 8 is in the table - return the current index [2] and the value stored for 8 in the map, which is 0.
So the final output is 2,0 - as expected.
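A compact Python version of this hash-map variant (find_double_pair is an illustrative name); the extra index check guards against matching an element with itself when its value is 0:

def find_double_pair(arr):
    # Return (i, j) with arr[i] == 2 * arr[j], or None.
    # Average O(n), assuming an O(1) hash function.
    doubles = {2 * x: j for j, x in enumerate(arr)}   # key 2*a[j] -> j
    for i, x in enumerate(arr):
        j = doubles.get(x)
        if j is not None and j != i:   # j == i is only possible when x == 0
            return i, j
    return None

find_double_pair([4, 12, 8, 10]) returns (2, 0), and find_double_pair([4, 3, 1, 11]) returns None.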
(1) Complexity notice:
Here, O(n) assumes an O(1) hash function, which is not always true. If we do assume an O(1) hash function, we can just as well assume that sorting with radix sort is O(n), and with O(n) post-processing [similar to the one suggested by @SteveJessop in his answer] we can also achieve O(n) with a sorting-based algorithm.
Sort the array (O(n log n), or O(n) if you're willing to stretch a point about arrays of fixed-size integers)
Initialise two pointers ("fast" and "slow") at the start of the array (O(1))
Repeatedly:
increment "fast" until you find an even value >= twice the value at "slow"
if the value at "fast" is exactly twice the value at "slow", return true
increment "slow" until you find a value >= half the value at fast
if the value at "slow" is exactly half the value at "fast", return true
if one of the attempts to increment goes past the end, return false
Since each of fast and slow can be incremented at most n times total before reaching the end of the array, the "repeatedly" part is O(n).
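A rough Python sketch of this approach, assuming non-negative integers (for negative values the doubling relation reverses direction and the pointer logic would need adjusting); it folds the answer's two symmetric increment steps into a single loop over "slow":

def has_double_two_pointers(arr):
    a = sorted(arr)                   # O(n log n)
    n = len(a)
    fast = 0
    for slow in range(n):
        if fast <= slow:              # keep fast strictly ahead of slow
            fast = slow + 1
        while fast < n and a[fast] < 2 * a[slow]:
            fast += 1                 # advance until a[fast] >= 2 * a[slow]
        if fast < n and a[fast] == 2 * a[slow]:
            return True
    return False

Both pointers only ever move forward, so the scan after sorting is O(n).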
You're right that the first step is sorting the array.
Once the array is sorted, you can find out whether a given element is in the array in O(log n) time. So if, for each of the n elements, you check for the inclusion of its double in O(log n) time, you end up with a runtime of O(n log n).
Does that help you?
Create an array of pairs A={(a[0], 0), (a[1], 1), ..., (a[n-1], n-1)}
Sort A,
For every (a[i], i) in A, do a binary search to see whether there is a pair (a[i] * 2, j). We can do this because A is sorted.
Step 1 is O(n), and steps 2 and 3 are O(n * log n).
Also, you can do step 3 in O(n) (there is no need for binary search): if the corresponding element for A[i] is at A[j], then the corresponding element for A[i+1] cannot be in A[0..j-1]. So we can keep two pointers and find the answer in O(n). But either way the whole algorithm will be O(n log n), because we still sort.
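A possible Python sketch of this sort-plus-binary-search variant, using the standard bisect module (find_pair_binary_search is an illustrative name):

import bisect

def find_pair_binary_search(arr):
    # Return indices (p, q) with arr[p] == 2 * arr[q], or None. O(n log n).
    pairs = sorted((v, i) for i, v in enumerate(arr))
    values = [v for v, _ in pairs]
    for v, i in pairs:
        k = bisect.bisect_left(values, 2 * v)        # leftmost candidate
        while k < len(values) and values[k] == 2 * v:
            if pairs[k][1] != i:                     # avoid self-match at v == 0
                return pairs[k][1], i
            k += 1
    return None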
Sorting the array is a good option - O(nlogn), assuming you don't have some fancy bucket sort option.
Once it's sorted, you need only pass through the array twice - I believe this is O(n)
Create a 'doubles' list, which starts empty.
Then, for each element of the array:
check the element against the first element of the 'doubles' list
if it is the same, you win
if the element is higher, ditch the first element of the 'doubles' list and check again
add its double to the end of the 'doubles' list
Keep going until you find a double or reach the end of your first list (see the sketch below).
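One way to render this in Python, assuming non-negative integers and using a deque for the 'doubles' list (has_double_queue is an illustrative name):

from collections import deque

def has_double_queue(arr):
    a = sorted(arr)
    doubles = deque()
    for x in a:
        while doubles and doubles[0] < x:   # ditch doubles now too small
            doubles.popleft()
        if doubles and doubles[0] == x:     # x doubles an earlier element
            return True
        doubles.append(2 * x)               # remember x's double
    return False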
You can also use a balanced tree; it uses extra space, but it does not modify the array.
Starting at i = 0 and incrementing i, insert elements, checking whether twice or half the current element is already in the tree.
One advantage is that it works in O(M log M) time, where M = min[max{i,j}]. You could potentially adapt your sorting-based algorithm to aim for O(M log M), but it could get complicated.
Btw, if you are using comparisons only, there is an Omega(n log n) lower bound, obtained by reducing the element distinctness problem to this one:
duplicate the input array, and use the algorithm for this problem twice. So unless you bring hashing-type techniques into the picture, you cannot get a better-than-Theta(n log n) algorithm!
