Implementing an algorithm that requires minimal memory

I am trying to implement an algorithm to find N-th largest element that requires minimal memory.
Example:
List of integers: 1;9;5;7;2;5. N: 2
After duplicates are removed, the list becomes 1;9;5;7;2.
So the answer is 7, because 7 is the 2nd largest element in the modified list.
In the algorithm below I use bubble sort to sort my list and then remove duplicates without using a temp variable. Does that make my program memory-efficient? Any ideas or suggestions?
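with Ada.Text_IO;

type Integer_Array is array (Natural range <>) of Integer;

procedure Find_Nth_Largest_Number (A : in out Integer_Array; N : Positive) is
   B : Natural := A'First;  -- index of the last unique element kept so far
begin
   -- sort the array in ascending order
   for I in A'First .. A'Last - 1 loop
      for J in I + 1 .. A'Last loop
         if A (I) > A (J) then
            -- swap without a temporary variable (can overflow for large values)
            A (I) := A (I) + A (J);
            A (J) := A (I) - A (J);
            A (I) := A (I) - A (J);
         end if;
      end loop;
   end loop;
   -- remove duplicates by compacting the unique values into A (A'First .. B)
   for K in A'First + 1 .. A'Last loop
      if A (B) /= A (K) then
         B := B + 1;
         A (B) := A (K);
      end if;
   end loop;
   -- the N-th largest unique value sits N - 1 places below the largest one
   Ada.Text_IO.Put_Line (Integer'Image (A (B - (N - 1))));
end Find_Nth_Largest_Number;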
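For reference, the expected result in the example can be reproduced with a short Python sketch. It is deliberately not memory-minimal (it builds a set and a sorted copy), so it only serves as a baseline to check other implementations against; the function name `nth_largest_distinct` is hypothetical.

```python
def nth_largest_distinct(values, n):
    """Return the n-th largest value after removing duplicates, or None if there are fewer than n distinct values."""
    distinct = sorted(set(values), reverse=True)  # extra memory proportional to the number of distinct values
    return distinct[n - 1] if 1 <= n <= len(distinct) else None


print(nth_largest_distinct([1, 9, 5, 7, 2, 5], 2))  # prints 7
```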

To find the N-th largest element you don't need to sort the main list at all. This approach performs well if N is much smaller than M, the size of the list; if N is a large fraction of the list size, you will be better off just sorting the list.
You just need a sorted list holding your N largest values; then pick the smallest from that list (this is all untested, so it will probably need a couple of tweaks):
import java.util.Arrays;

int[] found = new int[n];
Arrays.fill(found, Integer.MIN_VALUE);
for (int value : list) {
    // Skip values that are too small or already present (duplicates must be ignored)
    if (value > found[0] && Arrays.binarySearch(found, value) < 0) {
        // Find the last position holding a value smaller than the new one
        int insert = 0;
        while (insert + 1 < found.length && found[insert + 1] < value) {
            insert++;
        }
        // Shuffle everything BELOW that position down one place (dropping found[0]),
        // then store the new value there
        for (int j = 0; j < insert; j++) {
            found[j] = found[j + 1];
        }
        found[insert] = value;
    }
}
At the end you have the top N values from your list, sorted in ascending order: the first entry is the N-th largest value, the last entry the top value.
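The same top-N buffer can be sketched in Python (an illustration, not the answerer's code). Here the standard `bisect` module finds the insertion point instead of a manual scan, and a value already in the buffer is skipped so that duplicates are ignored as the question requires. The buffer briefly holds N + 1 values before the smallest is dropped; each insertion costs O(N) for shifting, as in the array version. The function name is hypothetical, and N is assumed to be at least 1.

```python
import bisect


def nth_largest_distinct_topn(values, n):
    """Return the n-th largest distinct value (n >= 1) using a sorted buffer of the top values, or None."""
    found = []                                   # ascending; found[0] is the smallest kept value
    for v in values:
        if len(found) == n and v <= found[0]:
            continue                             # too small to enter the top n
        pos = bisect.bisect_left(found, v)
        if pos < len(found) and found[pos] == v:
            continue                             # duplicate: already in the buffer
        found.insert(pos, v)
        if len(found) > n:
            found.pop(0)                         # drop the smallest to keep exactly n values
    return found[0] if len(found) == n else None
```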

If you need the N-th largest element, then you don't need to sort the complete array. You should apply selection sort, but only for the required N steps.
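A minimal Python sketch of the partial selection sort, assuming the list may be reordered in place. Each step moves the largest remaining value to the front; a step whose value equals the previous one is a duplicate and is not counted, so with duplicates a few more than N steps may be needed. The function name is hypothetical.

```python
def nth_largest_distinct_selection(a, n):
    """Partial selection sort on list a (reordered in place); return the n-th largest distinct value or None."""
    previous, count = None, 0
    for pos in range(len(a)):
        # selection step: find the largest value in the unsorted tail a[pos:]
        best = pos
        for k in range(pos + 1, len(a)):
            if a[k] > a[best]:
                best = k
        a[pos], a[best] = a[best], a[pos]
        if a[pos] != previous:          # equal to the previous step's value means a duplicate
            count += 1
            if count == n:
                return a[pos]           # stop early: the rest of the list stays unsorted
            previous = a[pos]
    return None
```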

Instead of using bubble sort, use quicksort kind of partial sorting.
Pick a key and, using it as a pivot, move the elements around (move all the elements >= pivot to the left of the array).
Count how many unique elements are greater than or equal to the pivot.
If that number is less than N, then the answer is in the right part of the array. Otherwise it is in the left part of the array (left or right relative to the pivot).
Repeat iteratively on the smaller part with an appropriately adjusted N.
As with quickselect, the complexity is O(n) on average (consistently bad pivots give O(n^2)), and you will need constant extra memory.

HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.
It sorts in O(n log n) time, which I think is the optimal time complexity for this problem because of the need to ignore duplicates. I may be wrong.
Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.
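The heap-based idea can be sketched in Python, assuming the input is a mutable list that may be reordered. The max-heap is built in place with a hand-written `sift_down` helper (hypothetical; Python's `heapq` only provides a min-heap), so apart from a few scalar variables no extra memory is used. Values are popped largest first and counted only when they differ from the previous pop, which is how duplicates are skipped.

```python
def sift_down(a, root, end):
    """Move a[root] down until the subtree rooted there is a max-heap within a[0:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1                          # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def nth_largest_distinct_heap(a, n):
    """Return the n-th largest distinct value of list a (reordering a in place), or None."""
    end = len(a)
    for start in range(end // 2 - 1, -1, -1):  # heapify in O(len(a))
        sift_down(a, start, end)
    previous, seen = None, 0
    while end > 0:
        top = a[0]
        end -= 1
        a[0], a[end] = a[end], a[0]             # pop the maximum to the back of the heap region
        sift_down(a, 0, end)
        if top != previous:                     # repeats of the previous pop are duplicates
            seen += 1
            if seen == n:
                return top
            previous = top
    return None
```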
If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.

Related

Find majority element when the values are unknown

Suppose I have an array of elements.
I cannot read the values of the elements. I can only compare any two elements from the array to know whether they are the same or not, but even then I don't get to know their actual values.
Suppose this array has a majority of elements of the same value. I need to find and return any of the majority elements. How would I do it?
We have to be able to do it in Θ(n log n) time.
Keep track of two indices, i & j. Initialize i=0, j=1. Repeatedly compare arr[i] to arr[j].
if arr[i] == arr[j], increment j.
if arr[i] != arr[j]
eliminate both from the array
increment i to the next index that hasn't been eliminated.
increment j to the next index >i that hasn't been eliminated.
The elimination operation will eliminate at least one non-majority element each time it eliminates a majority element, so majority is preserved. When you've gone through the array, all elements not eliminated will be in the majority, and you're guaranteed at least one.
This is O(n) time, but also O(n) space to keep track of eliminations.
Given:
an implicit array a of length n, which is known to have a majority element
an oracle function f, such that f(i, j) = a[i] == a[j]
Asked:
Return an index i, such that a[i] is a majority element of a.
Main observation:
If
m is a majority element of a, and
for some even k < n each element of a[0, k) occurs at most k / 2 times
then m is a majority element of a[k, n).
We can use that observation by assuming that the first element is the majority element. We move through the array until we reach a point where that element occurred exactly half the time. Then we discard the prefix and continue again from that point on. This is exactly what the Boyer-Moore algorithm does, as pointed out by rici in the comments.
In code:
func majorityIndex(n int, f func(i, j int) bool) int {
	result := 0 // index where the majority element is
	count := 0  // the number of times we've seen that element in the current prefix
	for i := 0; i < n; i++ {
		// we've seen the current majority candidate exactly half of the time:
		// discard the current prefix and start over
		if count == 0 {
			result = i
		}
		// keep track of how many times we've seen the current majority candidate in the prefix
		if f(result, i) {
			count++
		} else {
			count--
		}
	}
	return result
}
For completeness: this algorithm uses two variables and a single loop, so it runs in O(n) time and O(1) space.
Assuming you can determine whether elements are <, >, or ==, you can go through the list and build a tree. The tree's nodes act as buckets: each holds an item and a count of how many times you've seen it. When the comparison with a node gives ==, just increment that node's count. At the end, go through the tree and find the node with the highest count.
Assuming you build a balanced tree, this should be O(n log n); red-black trees can keep the tree balanced. Otherwise, you could build the tree by inserting the elements in random order, which gives O(n log n) on average.
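A Python sketch of the tree-of-counts idea, assuming (as this answer does, unlike the original question) that elements support < and > comparisons. It uses a plain unbalanced binary search tree and inserts the elements in random order, the cheaper alternative mentioned above, rather than a red-black tree; instead of a final traversal it tracks the node with the highest count during insertion. `Node`, `insert` and `most_frequent_by_tree` are illustrative names, and the input is assumed to be non-empty.

```python
import random


class Node:
    def __init__(self, value):
        self.value = value
        self.count = 1          # how many times this value has been seen
        self.left = None
        self.right = None


def insert(root, v):
    """Insert v into the BST rooted at root (not None); return the node holding v."""
    node = root
    while True:
        if v == node.value:
            node.count += 1
            return node
        if v < node.value:
            if node.left is None:
                node.left = Node(v)
                return node.left
            node = node.left
        else:
            if node.right is None:
                node.right = Node(v)
                return node.right
            node = node.right


def most_frequent_by_tree(values):
    order = list(values)
    random.shuffle(order)       # random insertion order keeps the expected depth logarithmic
    root = Node(order[0])
    best = root
    for v in order[1:]:
        node = insert(root, v)
        if node.count > best.count:
            best = node
    return best.value
```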

Comparing between two arrays

How can I compare two arrays containing sorted integers (for example, to find whether they share an element) using a binary search algorithm?
As in every case: it depends.
Assuming that the arrays are ordered or hashed the time complexity is at most O(n+m).
You did not mention any language, so it's pseudo code.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{ while (Enumerator1 is not at the end and Enumerator2 is not at the end)
{ if (Enumerator1.current > Enumerator2.current)
Enumerator2.fetchNext()
else if (Enumerator2.current > Enumerator1.current)
Enumerator1.fetchNext()
else
return true
}
return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
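A runnable Python version of the pseudocode above, assuming both inputs are sorted in ascending order; list indices stand in for the enumerators.

```python
def sorted_sequence_overlap(a, b):
    """Return True if the ascending-sorted sequences a and b share at least one element."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] > b[j]:
            j += 1          # b[j] is too small to match anything left in a
        elif b[j] > a[i]:
            i += 1          # a[i] is too small to match anything left in b
        else:
            return True
    return False
```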
However, this is not always the fastest way.
If the arrays differ significantly in size, it could be more efficient to use binary search in the larger array for each element of the shorter array.
This can be even more improved because when you start with the median element of the small array you need not do a full search for any further element. Any element before the median element must be in the range before the location where the median element was not found and any element after the median element must be in the upper range of the large array. This can be applied recursively until all elements have been located. Once you get a hit, you can abort.
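The recursive median search can be sketched in Python, assuming both lists are sorted ascending. `bisect_left` with `lo`/`hi` bounds does the binary search only within the part of the large array that is still possible; the function name and its extra range parameters are illustrative.

```python
from bisect import bisect_left


def overlap_by_median_search(small, large, s_lo=0, s_hi=None, l_lo=0, l_hi=None):
    """True if ascending lists small and large share a value; searches small[s_lo:s_hi] in large[l_lo:l_hi]."""
    if s_hi is None:
        s_hi = len(small)
    if l_hi is None:
        l_hi = len(large)
    if s_lo >= s_hi or l_lo >= l_hi:
        return False                    # nothing left to look for, or nowhere left to look
    mid = (s_lo + s_hi) // 2
    pos = bisect_left(large, small[mid], l_lo, l_hi)
    if pos < l_hi and large[pos] == small[mid]:
        return True                     # hit: abort
    # smaller elements of small can only occur before pos, larger ones from pos on
    return (overlap_by_median_search(small, large, s_lo, mid, l_lo, pos)
            or overlap_by_median_search(small, large, mid + 1, s_hi, pos, l_hi))
```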
The disadvantage of this method is that it takes more time in worst case, i.e. O(n log m), and it requires random access to the arrays which might impact cache efficiency.
On the other side, multiplying with a small number (log m) could be better than adding a large number (m). In contrast to the above algorithm typically only a few elements of the large array have to be accessed.
The break-even point is approximately where log m becomes less than m/n, where n is the size of the smaller array.
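To make the break-even rule concrete, count comparisons roughly: the merge-style scan costs about n + m, the binary-search approach about n log2 m. With illustrative sizes n = 1000 and m = 2^20:

```latex
\begin{aligned}
n + m &= 1000 + 1\,048\,576 = 1\,049\,576\\
n \log_2 m &= 1000 \cdot 20 = 20\,000
\end{aligned}
```

Binary search wins whenever n log2 m < n + m, i.e. log2 m < 1 + m/n, which for m much larger than n is essentially the condition log m < m/n.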
You think that's it? - no
In case the random access to the larger array causes a higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse, i.e. look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up much more elements.
Answer:
No, there are no more lookups. As soon as the boundaries within which you expect a range of elements of the large array collapse, you can stop searching for those elements, since you won't find any more hits.
In fact the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against different elements of the small array in the first step. This takes only one slow access for a bunch of comparisons while the other way around you need to access the same element several times with some other elements accesses in between.
So there are fewer slow accesses at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)
If you know that they are sorted, then you can keep a pointer to the beginning of each array and advance through both arrays, moving one of the pointers up (or down) after each comparison. That would make it O(n). Not sure you could bisect anything, as you don't know where the common number would be.
Still better than the brute-force O(n^2).
If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.
This can be done in two ways.
a) Binary Search b) Linear Search
For Binary Search: for each element in array A, look for the element in B with binary search; the complexity is then O(n log n).
For Linear Search: it is O(m + n), where m and n are the sizes of the arrays. In your case m = n.
Linear search :
Have two indices i, j that point to the arrays A, B
Compare A[i], B[j]
If A[i] < B[j] then increment i, because any match if exists can be found only in later indices in A.
If A[i] > B[j] then increment j, because any match if exists can be found only in later indices in B.
If A[i] == B[j] you found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
    for ( int i = 0, j = 0; i < A.length && j < B.length; ) {
        if ( A[i] < B[j] ) {
            i++;
        } else if ( A[i] > B[j] ) {
            j++;
        } else {
            return A[i];  // A[i] == B[j]: common element found
        }
    }
    return -1; // no common element (assuming all integers are positive)
}
If both arrays are sorted in descending order, just reverse the comparisons, i.e. if A[i] < B[j] increment j, else increment i.
If you have one descending (B) one ascending (A) then i for A starts from beginning of array and j for B starts from end of array and move them accordingly as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
    if ( A[i] < B[j] ) {
        i++;
    } else if ( A[i] > B[j] ) {
        j--;
    } else {
        return A[i];  // A[i] == B[j]: common element found
    }
}

Algorithm: finding closest differences between two elements of array

You have an array and a target. Find two elements of the array whose difference is closest to the target.
(i.e., find i, j such that array[i] - array[j] is closest to target)
Attempt:
I use an order_map (C++ hash table) to store each element of the array. This would be O(n).
Then I output the ordered elements to a new array (sorted in increasing order).
Next, I use two pointers.
int mini = INT_MAX, a, b;
for (int i=0, j = ordered_array.size() -1 ; i <j;) {
int tmp = ordered_array[i] - ordered_array[j];
if (abs(tmp - target) < mini) {
mini = abs(tmp - target);
a = i;
b = j;
}
if (tmp == target) return {i,j};
else if (tmp > target) j --;
else i ++;
}
return {a,b};
Question:
Does my algorithm still run in O(n)?
If the array is already sorted, there is an O(n) algorithm: let two pointers sweep up through the array. Whenever the difference between the pointed-at elements is smaller than the target, increase the upper one; whenever it is larger than the target, increase the lower one. While doing so, keep track of the best result found so far.
When you reach the end of the array the overall best result has been found.
It is easy to see that this is really O(n). Both pointers will only move upwards, in each step you move exactly one pointer and the array has n elements. So after at most 2n steps this will halt.
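The sweep can be sketched in Python, assuming the array is already sorted ascending, has at least two elements, and the target is non-negative (so only differences a[j] - a[i] with i < j matter). Every iteration either moves one pointer up or stops, which gives the at-most-2n-steps bound. The function name is hypothetical.

```python
def closest_difference(a, target):
    """Return (i, j) with i < j such that a[j] - a[i] is closest to target (a sorted ascending, target >= 0)."""
    best, best_err = None, None
    lo, hi = 0, 1
    while hi < len(a):
        if lo == hi:            # pointers met: move the upper one so that lo < hi
            hi += 1
            continue
        diff = a[hi] - a[lo]
        err = abs(diff - target)
        if best is None or err < best_err:
            best, best_err = (lo, hi), err
        if diff < target:
            hi += 1             # difference too small: widen the gap
        elif diff > target:
            lo += 1             # difference too large: narrow the gap
        else:
            break               # exact match cannot be improved
    return best
```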
As already mentioned in some comments, if you need to sort the array first, you need the O(n log n) effort for sorting (unless you can use some radix sort or counting sort).
Your searching phase is linear. The two-pointer approach is equivalent to this:
Make a copy of sorted array
Add `target` to every entry (shift values up or down) (left picture)
Invert shifted array indexing (right picture)
Walk through both arrays in parallel, checking for absolute difference
All stages are linear (and inverting is implicit in your code)
P.S. Is a C++ hash map ordered? I doubt it. Creating a sorted array is in general O(n log n) (except for special methods like counting or radix sort).

Avoiding duplications when comparing array elements

The question is something like: Go through an array and find pairs of elements that add up to a certain sum k.
for (auto i : array) {
    for (auto j : array) {
        if (i + j == k) {
            // do something with the pair (i, j)
        }
    }
}
Say we had array = [1,2,5] and k=3; when i=1 and j=2, we would execute Do something. But when i=2 and j=1, we would execute Do something again, even though we have already found this pair, so we would be repeating the answer.
Essentially, how can one go through an array and avoid comparing the same 2 elements multiple times?
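If order doesn't matter, then decide which half of all pairs you want: the one where i >= j, or the one where i <= j.
for (std::size_t i = 0; i < array.size(); ++i)
    for (std::size_t j = i; j < array.size(); ++j)  // start at i + 1 to skip pairing an element with itself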
Like rici commented, this is a very slow (O(N^2)) algorithm compared to sorting (a copy of) the vector. After sorting, you could iterate i forwards from the start and j backwards from the end, comparing i+j <= k to decide whether to advance i or j next.
If I understand your question correctly, the following procedure could be a way to solve it.
Sort your array in increasing order (preferably with merge sort, because it takes O(n log n) time).
Now take two indices, call them low and high, where low is the index of the left-most (smallest) number of the sorted array and high is the index of the right-most (largest) number of the same sorted array.
Now do this:
while(low < high) {
if(arr[low] + arr[high] == k) {
print "unique pair found"
high = high - 1;
low = low + 1;
}
else if(arr[low] + arr[high] > k){
high = high - 1;
}
else {
low = low + 1;
}
}
The scan itself is O(n), so including the sort the total time complexity is O(n log n). It should give you what you are looking for (any doubts are most welcome).
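A Python sketch of the sort-plus-two-indices procedure. One addition goes beyond the pseudocode above: after a match, runs of equal values are skipped, so each pair of values is reported once even when the input contains duplicates (for example [1, 1, 2, 2] with k = 3 yields a single (1, 2)). The function name is hypothetical.

```python
def unique_pairs_with_sum(values, k):
    """Return the distinct value pairs (x, y), x <= y, taken from values with x + y == k."""
    a = sorted(values)                  # O(n log n)
    low, high = 0, len(a) - 1
    pairs = []
    while low < high:
        s = a[low] + a[high]
        if s == k:
            x, y = a[low], a[high]
            pairs.append((x, y))
            # skip repeated values so the same pair of values is not reported again
            while low < high and a[low] == x:
                low += 1
            while low < high and a[high] == y:
                high -= 1
        elif s > k:
            high -= 1
        else:
            low += 1
    return pairs
```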
You could put the results of i + j in a vector, then sort it and then remove duplicates, and finally go over the vector and "do something".

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated the segment tree of the array and then, for each possible subarray, queried the segment tree, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
For example, arr[] = { 1, 2, 3 };  // the array need not be sorted
sub-array   min   max
{1}         1     1
{2}         2     2
{3}         3     3
{1,2}       1     2
{2,3}       2     3
{1,2,3}     1     3
I don't think it is possible to store all those values in O(n). But it is pretty easy to create, in O(n), a structure that makes it possible to answer, in O(1), the query "how many subarrays are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could employ a simple O(n) algorithm that counts how many consecutive elements immediately to the left and to the right of A[i] are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
This 5 has 3 smaller elements immediately to its left and 2 to its right. From this we know there are 4*3 = 12 subarrays for which that very 5 is the maximum: 4*3 because a subarray can extend 0..3 elements to the left and 0..2 elements to the right.
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that: just use a stack. Traverse the array normally (from left to right) and push every element's index onto the stack, but before pushing it, pop all the indexes whose values are less than the current value (the code below also pops equal values in one of the two passes, so that ties are not counted twice). The index left on top of the stack, if any, is then the nearest larger element to the left.
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof of concept that shows this algorithm in action. I also implemented the naïve version so we can cross-check the result of the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    # For each index i (visited in the order of arange), record the nearest
    # previously visited index that survives the pop condition op; fallback if none.
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]
        stack.append(i)
    return bound

def optimized_version(A):
    # left: nearest index with a strictly larger value; right: nearest with a larger-or-equal value
    T = list(zip(make_bounds(A, -1, range(len(A)), lambda x, y: x <= y),
                 make_bounds(A, len(A), reversed(range(len(A))), lambda x, y: x < y)))
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)
I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
On the other hand, when initialising the subarrays, instead of making them normal arrays you could build heaps: min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear-time operation, and retrieving the maximum of a max heap (or the minimum of a min heap) is a constant-time operation, since that element is at the root, the first position of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criteria (in which case the problem will reduce to finding max element in a 1 dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating over your super-array and storing a reference to the largest element, or by building a heap as nbro said. Problem 2 has a similar solution. However, a linear scan through n arrays of length m takes O(nm) time, which is not linear. So you will have to maintain your class invariants such that the maximum/minimum is known after every operation, maybe with the help of a data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums $Y_i = \sum_{k=0}^{i-1} X_k$ (so $Y_0 = 0$); from 1,4,2,3 this gives 0, 1, 1+4=5, 1+4+2=7, 1+4+2+3=10. You can create this from left to right in linear time, and the value of any contiguous subarray is one partial sum minus another, e.g. 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
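The partial-sum scan can be sketched in Python (illustrative function names, assuming a non-empty input). The running partial sum is kept in a single variable rather than a full array, and the minimum uses the negation trick min(a, b) = -max(-a, -b). Both functions return (sum, start, end), where the subarray is xs[start:end].

```python
def max_subarray_sum(xs):
    """Largest sum of a non-empty contiguous subarray of xs, as (sum, start, end) for xs[start:end]."""
    prefix = 0                      # Y_i: sum of the first i elements
    min_prefix, min_index = 0, 0    # smallest earlier partial sum, starting with Y_0 = 0
    best = None
    for i, x in enumerate(xs, start=1):
        prefix += x
        # best subarray ending here: Y_i minus the smallest earlier partial sum
        if best is None or prefix - min_prefix > best[0]:
            best = (prefix - min_prefix, min_index, i)
        if prefix < min_prefix:
            min_prefix, min_index = prefix, i
    return best


def min_subarray_sum(xs):
    """Smallest subarray sum: negate every element, maximise, negate the result back."""
    total, start, end = max_subarray_sum([-x for x in xs])
    return -total, start, end
```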
I think you are asking for the maximum subarray sum.
Below is code that does that in O(n) time.
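#include <algorithm>
#include <cstddef>
#include <vector>

// Assumes a is non-empty.
int maxSumSubArr(const std::vector<int>& a)
{
    int maxsum = *std::max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum;  // all elements negative: the best is the largest single element
    int sum = 0;
    for (std::size_t i = 0; i < a.size(); i++)
    {
        sum += a[i];
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0;  // a negative running sum never helps: start over
    }
    return maxsum;
}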
Note: this code is not tested; please add a comment if you find any issues.

Resources