Algorithm: finding the closest difference between two elements of an array

You have an array and a target. Find two elements of the array whose difference is closest to the target
(i.e., find i, j such that array[i] - array[j] is as close as possible to target).
Attempt:
I use an order_map (a C++ hash table) to store each element of the array. This would be O(n).
Then I output the ordered elements to a new array (sorted in increasing order).
Next, I use two pointers.
int mini = INT_MAX, a = 0, b = 0;
for (int i = 0, j = ordered_array.size() - 1; i < j;) {
    int tmp = ordered_array[i] - ordered_array[j];
    if (abs(tmp - target) < mini) {
        mini = abs(tmp - target);
        a = i;
        b = j;
    }
    if (tmp == target) return {i, j};
    else if (tmp > target) j--;
    else i++;
}
return {a, b};
Question:
Does my algorithm still run in O(n)?

If the array is already sorted, there is an O(n) algorithm: let two pointers sweep up through the array; whenever the difference between the pointed-at elements is smaller than the target, increase the upper one, and whenever it is larger than the target, increase the lower one. While doing so, keep track of the best result found so far.
When you reach the end of the array the overall best result has been found.
It is easy to see that this is really O(n): both pointers only move upwards, each step moves exactly one pointer, and the array has n elements, so this halts after at most 2n steps.
As already mentioned in some comments, if you need to sort the array first, you need the O(n log n) effort for sorting (unless you can use some radix sort or counting sort).
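As a minimal sketch of that sweep in C++ (assuming the array is sorted ascending, has at least two elements, and the target is non-negative):

#include <climits>
#include <cstdlib>
#include <utility>
#include <vector>

// Two-pointer sweep: each step moves one pointer up, so at most 2n steps.
// Assumes a.size() >= 2; returns the index pair whose difference is closest
// to target.
std::pair<int, int> closestDifference(const std::vector<int>& a, int target) {
    int best = INT_MAX, bestLo = 0, bestHi = 1;
    for (int lo = 0, hi = 1; hi < (int)a.size(); ) {
        int diff = a[hi] - a[lo];          // >= 0, since a is sorted ascending
        if (std::abs(diff - target) < best) {
            best = std::abs(diff - target);
            bestLo = lo;
            bestHi = hi;
        }
        if (diff < target) ++hi;           // difference too small: raise it
        else ++lo;                         // difference too large: lower it
        if (lo == hi) ++hi;                // keep the pair distinct
    }
    return {bestLo, bestHi};
}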

Your searching phase is linear. The two-pointers approach is equivalent to this:
Make a copy of the sorted array
Add `target` to every entry (shifting the values up or down)
Invert the indexing of the shifted array
Walk through both arrays in parallel, checking the absolute difference
All stages are linear (and the inversion is implicit in your code).
P.S. Is the C++ hash map ordered? I doubt it. Sorted array creation is in general O(n log n) (except for special methods like counting sort or radix sort).

Related

Find majority element when the values are unknown

Suppose I have an array of elements.
I cannot read the values of the elements. I can only compare any two elements from the array to know whether they are the same or not, but even then I don't get to know their actual values.
Suppose this array has a majority of elements of the same value. I need to find and return any of the majority elements. How would I do it?
We have to be able to do it in Θ(n log n).
Keep track of two indices, i and j. Initialize i = 0, j = 1. Repeatedly compare arr[i] to arr[j]:
if arr[i] == arr[j], increment j;
if arr[i] != arr[j]:
    eliminate both from the array,
    increment i to the next index that hasn't been eliminated,
    increment j to the next index > i that hasn't been eliminated.
Each elimination removes two unequal elements, so at most one of them can have the majority value; the majority is therefore preserved. When you've gone through the array, all elements not eliminated will have the majority value, and you're guaranteed at least one.
This is O(n) time, but also O(n) space to keep track of eliminations.
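For illustration, here is a minimal C++ sketch of an equivalent stack-based formulation of this elimination scheme, assuming an equality oracle f(i, j) that tells whether elements i and j are equal (the oracle name follows the next answer):

#include <functional>
#include <vector>

// Pair-elimination via a stack: unequal pairs cancel, so any index left on
// the stack holds the majority value (a majority is assumed to exist).
int majorityIndex(int n, const std::function<bool(int, int)>& f) {
    std::vector<int> survivors;  // indices of equal, not-yet-eliminated elements
    for (int j = 0; j < n; ++j) {
        if (survivors.empty() || f(survivors.back(), j))
            survivors.push_back(j);   // same value as the survivors: keep it
        else
            survivors.pop_back();     // different value: eliminate both
    }
    return survivors.back();          // non-empty because a majority exists
}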
Given:
an implicit array a of length n, which is known to have a majority element
an oracle function f, such that f(i, j) = a[i] == a[j]
Asked:
Return an index i, such that a[i] is a majority element of a.
Main observation:
If
m is a majority element of a, and
for some even k < n each element of a[0, k) occurs at most k / 2 times
then m is a majority element of a[k, n).
We can use that observation by assuming that the first element is the majority element. We move through the array until we reach a point where that element occurred exactly half the time. Then we discard the prefix and continue again from that point on. This is exactly what the Boyer-Moore algorithm does, as pointed out by Rici in the comments.
In code:
result = 0 // index where the majority element is
count = 0  // the number of times we've seen that element in the current prefix
for i = 0; i < n; i++ {
    // we've seen the current majority candidate exactly half of the time:
    // discard the current prefix and start over
    if (count == 0) {
        result = i
    }
    // keep track of how many times we've seen the current majority candidate in the prefix
    if (f(result, i)) {
        count++
    } else {
        count--
    }
}
return result
For completeness: this algorithm uses two variables and a single loop, so it runs in O(n) time and O(1) space.
Assuming you can determine whether elements are <, >, or ==, you can go through the list and build a tree. The tree's nodes act like buckets: each holds an element and a count of how many times you've seen it. When you reach a node where the comparison gives ==, just increment the count. At the end, go through the tree and find the node with the highest count.
Assuming you build a balanced tree, this is O(n log n). Red-black trees can help with keeping the tree balanced. Alternatively, you could build the tree by inserting randomly selected elements, which gives O(n log n) on average.
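As a hypothetical sketch, assuming a three-way comparison oracle less(i, j) = a[i] < a[j] (a stronger oracle than the equality-only one in the question), the bucket tree can be built with a standard balanced map keyed by index:

#include <functional>
#include <map>

// Bucket tree via std::map with an index comparator. Equal elements compare
// equivalent under cmp, so they collapse into one bucket. n insertions of
// O(log n) each plus a linear scan: O(n log n) total.
int majorityByTree(int n, const std::function<bool(int, int)>& less) {
    auto cmp = [&](int i, int j) { return less(i, j); };
    std::map<int, int, decltype(cmp)> counts(cmp);  // representative index -> count
    for (int i = 0; i < n; ++i)
        ++counts[i];
    int best = -1, bestCount = 0;
    for (const auto& [idx, c] : counts)
        if (c > bestCount) { bestCount = c; best = idx; }
    return best;  // index of (an occurrence of) the most frequent value
}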

Comparing between two arrays

How can I compare two arrays of sorted integers using a binary algorithm?
As in every case: it depends.
Assuming that the arrays are ordered or hashed the time complexity is at most O(n+m).
You did not mention any language, so it's pseudo code.
function SortedSequenceOverlap(Enumerator1, Enumerator2)
{
    while (Enumerator1 is not at the end and Enumerator2 is not at the end)
    {
        if (Enumerator1.current > Enumerator2.current)
            Enumerator2.fetchNext()
        else if (Enumerator2.current > Enumerator1.current)
            Enumerator1.fetchNext()
        else
            return true
    }
    return false
}
If the sort order is descending you need to use a reverse enumerator for this array.
However, this is not always the fastest way.
If the arrays have significantly different sizes, it can be more efficient to binary search the large array for each element of the shorter one.
This can be improved even further: if you start with the median element of the small array, you need not do a full-range search for any further element. Any element before the median must lie in the part of the large array before the position where the search for the median ended, and any element after the median must lie in the part after it. This can be applied recursively until all elements have been located. Once you get a hit, you can abort.
The disadvantage of this method is that it takes more time in the worst case, i.e. O(n log m), and it requires random access to the arrays, which might hurt cache efficiency.
On the other hand, multiplying by a small number (log m) can beat adding a large number (m). In contrast to the above algorithm, typically only a few elements of the large array have to be accessed.
The break-even point is approximately where log m is less than m/n, where n is the size of the smaller array.
You think that's it? No.
In case the random access to the larger array causes a higher latency, e.g. because of reduced cache efficiency, it could be even better to do the reverse, i.e. look for the elements of the large array in the small array, starting with the median of the large array.
Why should this be faster? You have to look up many more elements.
Answer:
No, there are no extra lookups. As soon as the boundaries between which you expect a range of elements of the large array collapse, you can stop searching for those elements, since you won't find any hits anymore.
In fact the number of comparisons is exactly the same.
The difference is that a single element of the large array is compared against several elements of the small array in a row. This costs only one slow access for a whole bunch of comparisons, while the other way around you access the same element several times, with accesses to other elements in between.
So there are fewer slow accesses, at the expense of more fast ones.
(I implemented search as you type this way about 30 years ago where access to the large index required disk I/O.)
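A minimal sketch of that recursive median scheme, assuming a plain std::lower_bound stands in for the binary search and that, like the pseudocode above, it only reports whether an overlap exists:

#include <algorithm>
#include <vector>

// Search the median of the short array within the still-possible range of
// the long array, then recurse on both halves with narrowed ranges.
// Worst case O(n log m); aborts on the first common element.
bool overlap(const std::vector<int>& small, int sLo, int sHi,
             const std::vector<int>& large, int lLo, int lHi) {
    if (sLo >= sHi || lLo >= lHi) return false;
    int mid = (sLo + sHi) / 2;
    auto it = std::lower_bound(large.begin() + lLo, large.begin() + lHi, small[mid]);
    if (it != large.begin() + lHi && *it == small[mid]) return true;  // hit: abort
    int pos = int(it - large.begin());
    return overlap(small, sLo, mid, large, lLo, pos)        // elements below the median
        || overlap(small, mid + 1, sHi, large, pos, lHi);   // elements above it
}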
If you know that both arrays are sorted, you can keep a pointer to the beginning of each array and advance one of the pointers after each comparison. That makes it O(n). You can't really bisect anything, as you don't know where the common number would be.
Still better than the brute-force O(n²).
If you know the second array is sorted you can use binary search to look through the second array for elements from the first array.
This can be done in two ways.
a) Binary Search b) Linear Search
For binary search: for each element in array A, look for the element in B with binary search. In that case the complexity is O(n log m).
For linear search it is O(m + n), where m and n are the sizes of the arrays. In your case m = n.
Linear search:
Have two indices i, j that point into the arrays A, B.
Compare A[i] and B[j].
If A[i] < B[j], increment i, because any match, if it exists, can only be at a later index of A.
If A[i] > B[j], increment j, because any match, if it exists, can only be at a later index of B.
If A[i] == B[j], you have found the answer.
Code:
private int findCommonElement(int[] A, int[] B) {
    for (int i = 0, j = 0; i < A.length && j < B.length; ) {
        if (A[i] < B[j]) {
            i++;
        } else if (A[i] > B[j]) {
            j++;
        } else {
            return A[i]; // common element found
        }
    }
    return -1; // no common element; assuming all integers are positive
}
Now if you have both arrays descending, just reverse the comparison signs, i.e. if A[i] < B[j] increment j, else increment i.
If you have one descending (B) and one ascending (A), then i for A starts at the beginning of the array and j for B starts at the end, and they move as shown below:
for (int i = 0, j = B.length - 1; i < A.length && j >= 0; ) {
    if (A[i] < B[j]) {
        i++;
    } else if (A[i] > B[j]) {
        j--;
    } else {
        return A[i]; // common element found
    }
}
return -1;

How to get the dot product of two sparse vectors in O(m+n), where m and n are the number of elements in both vectors

I have two sparse vectors X and Y and want to get the dot product in O(m+n), where m and n are the numbers of non-zero elements in X and Y. The only way I can think of is picking each element of vector X and traversing vector Y to find whether there is an element with the same index, but that would take O(m * n). I am implementing each vector as a linked list in which each node holds one element.
You can do it if your vectors are stored as linked lists of tuples, with each tuple containing the index and the value of a non-zero element, sorted by index.
You iterate through both vectors, always advancing in the vector where you are at the lower index. If the indexes are the same, you multiply the elements and add to the result.
Repeat until one list reaches the end.
Since you take one step per non-zero element in each list, the complexity is O(m+n) as required.
Footnote: The data structure doesn't have to be a linked list, but it must provide O(1) access to the next non-zero element and its index.
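A minimal sketch of that walk in C++, assuming each vector is a linked list of (index, value) pairs sorted by index:

#include <forward_list>
#include <utility>

// Merge-style walk over two sorted sparse vectors: O(m + n).
double sparseDot(const std::forward_list<std::pair<int, double>>& x,
                 const std::forward_list<std::pair<int, double>>& y) {
    auto i = x.begin();
    auto j = y.begin();
    double dot = 0.0;
    while (i != x.end() && j != y.end()) {
        if (i->first < j->first) ++i;        // advance the lower index
        else if (j->first < i->first) ++j;
        else {                               // matching indices: multiply
            dot += i->second * j->second;
            ++i; ++j;
        }
    }
    return dot;
}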
Sorted lists
Given that your non-zero elements are sorted by coordinate index in both vectors, this is achieved by the merge algorithm, a standard algorithm in computer science which merges two sorted sequences into one sorted sequence and works in O(M + N).
There are two ways to do it. The first is to check for equal elements inside the merge, and it is indeed the better way.
The second way is to merge first, then check for equal indices (they must then be consecutive):
std::pair<int, double> vecA[n], vecB[m], vecBoth[n + m];  // (index, value) pairs
std::merge(vecA, vecA + n, vecB, vecB + m, vecBoth);      // merges by index
double dotP = 0.0;
for (int i = 0; i + 1 < n + m; i++)
    if (vecBoth[i].first == vecBoth[i + 1].first)
        dotP += vecBoth[i].second * vecBoth[i + 1].second;
Complexity of std::merge is O(M + N).
Example above assumes that the data is stored in arrays (which is the best choice for sparse vectors and matrices). If you want to use linked lists, you can also perform merge in O(M + N) time, see this question.
Unsorted lists
Even if your lists are unsorted, you can still perform the dot product in O(M + N) time. The idea is to put all the elements of A into a hash table first, then iterate through the elements of B and see whether the hash contains an element with the same index.
If the indices are very large (e.g. more than a million), then you should probably use a nontrivial hash function. However, if your indices are rather small, you can avoid hashing altogether: simply use an array larger than the dimension of your vectors. To clear this array fast, you can use the trick with "generations":
// global data! must be thread-local in case of concurrent access
double elemsTable[1 << 20];
int whenUsed[1 << 20] = {0};
int usedGeneration = 0;
double CalcDotProduct(std::pair<int, double>* vecA, int n,
                      std::pair<int, double>* vecB, int m) {
    usedGeneration++;  // clear the used array in O(1)
    for (int i = 0; i < n; i++) {
        elemsTable[vecA[i].first] = vecA[i].second;
        whenUsed[vecA[i].first] = usedGeneration;
    }
    double dotP = 0.0;
    for (int i = 0; i < m; i++)
        if (whenUsed[vecB[i].first] == usedGeneration)
            dotP += elemsTable[vecB[i].first] * vecB[i].second;
    return dotP;
}
Note that you might need to clear whenUsed once per billion dot products.
Use a map to store each vector.
Each map entry has the index as key and the vector's value at that index as value; insert only the non-zero values.
Iterate over one map and, for each entry, check whether the key is present in the other map. If yes, add the product of the two values to the result; otherwise ignore the current key.
Time complexity (n = vector size):
O(n) for map construction (with a hash map)
O(n) for the iteration
Space complexity: O(n) for the maps
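A sketch of that map approach, using a hash map so construction and lookup stay expected O(1) per entry:

#include <unordered_map>
#include <utility>
#include <vector>

// Hash one vector's non-zero entries, then probe with the other's:
// O(m + n) expected time.
double dotViaMap(const std::vector<std::pair<int, double>>& x,
                 const std::vector<std::pair<int, double>>& y) {
    std::unordered_map<int, double> xs(x.begin(), x.end());
    double dot = 0.0;
    for (const auto& [idx, val] : y) {
        auto it = xs.find(idx);
        if (it != xs.end()) dot += it->second * val;  // index present in both
    }
    return dot;
}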

implementing an algorithm that requires minimal memory

I am trying to implement an algorithm that finds the N-th largest element while using minimal memory.
Example:
List of integers: 1;9;5;7;2;5. N: 2
After duplicates are removed, the list becomes 1;9;5;7;2.
So the answer is 7, because 7 is the 2nd largest element in the modified list.
In the algorithm below I am using bubble sort to sort my list and then removing duplicates, without using a temp variable. Does that make my program memory efficient? Any ideas or suggestions?
type Integer_Array is array (Natural range <>) of Integer;

procedure Find_Nth_Largest (A : in out Integer_Array) is
   B : Integer := 0;
   C : Integer;
begin
   -- sort the array ascending (swapping without a temp variable)
   for I in A'Range loop
      for J in I + 1 .. A'Last loop
         if A(I) > A(J) then
            A(I) := A(I) + A(J);
            A(J) := A(I) - A(J);
            A(I) := A(I) - A(J);
         end if;
      end loop;
   end loop;
   -- remove duplicates in place; A(0 .. B) ends up holding the unique values
   for K in 1 .. A'Last loop
      if A(B) /= A(K) then
         B := B + 1;
         A(B) := A(K);
      end if;
   end loop;
   C := ...;  -- ENTER N to find the N-th largest number
   Print (A(B - (C - 1)));
end Find_Nth_Largest;
To find the N-th largest element you don't need to sort the main list at all. Note that this algorithm performs well when N is small compared to the list size; if N is a large fraction of the list size, you are better off just sorting the list.
You just need a sorted list holding your N largest, then pick the smallest from that list (this is all untested, so it will probably need a couple of tweaks):
int[] found = new int[n];
for (int i = 0; i < found.length; i++) {
    found[i] = Integer.MIN_VALUE;
}
for (int i : list) {
    if (i > found[0]) {
        // find the first slot holding a value >= i (or run off the end)
        int insert = 0;
        while (insert < found.length && found[insert] < i) {
            insert++;
        }
        // insert just below that slot
        insert--;
        // place the value and shuffle everything BELOW it down,
        // dropping the old smallest
        for (int j = insert; j >= 0; j--) {
            int temp = found[j];
            found[j] = i;
            i = temp;
        }
    }
}
At the end you have the top N values from your list, sorted in order: the first entry is the N-th largest value, the last entry the largest.
If you need the N-th largest element, you don't need to sort the complete array. You can apply selection sort, but run it only for the required N steps.
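A minimal sketch of that partial selection, with duplicate-skipping added here as an assumption (the question wants distinct values); at most n selection passes, O(1) extra space:

#include <algorithm>
#include <vector>

// Partial selection sort: each pass swaps the largest remaining element to
// the front, so the prefix is sorted descending and duplicates are adjacent.
int nthLargestBySelection(std::vector<int> a, int n) {
    int seen = 0;
    for (int front = 0; front < (int)a.size(); ++front) {
        auto it = std::max_element(a.begin() + front, a.end());
        std::iter_swap(a.begin() + front, it);
        if (front == 0 || a[front] != a[front - 1])  // new distinct value
            if (++seen == n) return a[front];
    }
    return -1;  // fewer than n distinct values; error value is an assumption
}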
Instead of using bubble sort, use a quicksort-style partial sort:
Pick a key and, using it as a pivot, rearrange the elements (move all elements >= pivot to the left part of the array).
Count how many unique elements are greater than or equal to the pivot.
If that number is less than N, the answer is in the right part of the array; otherwise it is in the left part.
Repeat with the smaller subarray and an appropriately adjusted N.
The complexity is O(n) on average, and you need only constant extra memory.
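For reference, the standard library's introselect does this partition-and-recurse for you; a sketch, assuming duplicates were removed beforehand (the unique-count step in the answer above folds that in):

#include <algorithm>
#include <functional>
#include <vector>

// std::nth_element partially sorts so that position n-1 holds the element a
// full descending sort would put there: O(n) on average, O(1) extra space.
int nthLargestByPartition(std::vector<int> a, int n) {
    std::nth_element(a.begin(), a.begin() + (n - 1), a.end(),
                     std::greater<int>());
    return a[n - 1];
}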
HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.
It sorts in O(n log n) time, which I think is the optimal time complexity for this problem because of the need to ignore duplicates. I may be wrong.
Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.
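A sketch of that heapify-and-pop approach (popped values come out in descending order, so duplicates are adjacent and easy to skip):

#include <algorithm>
#include <vector>

// Build a max-heap in place (O(n)), then pop values in descending order,
// counting only distinct ones, until the N-th is reached.
int nthLargestByHeap(std::vector<int> a, int n) {
    std::make_heap(a.begin(), a.end());
    auto end = a.end();
    int last = 0, seen = 0;
    while (end != a.begin()) {
        std::pop_heap(a.begin(), end);    // largest of the heap moves to end-1
        --end;
        if (seen == 0 || *end != last) {  // skip duplicates of the previous pop
            last = *end;
            if (++seen == n) return last;
        }
    }
    return last;  // fewer than n distinct values; caller should check
}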
If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.

Efficient list intersection algorithm

Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm to find the set intersection of those lists?
I don't believe I have access to hashing algorithms.
You could put all elements of the first list into a hash set. Then, iterate the second one and, for each of its elements, check the hash to see if it exists in the first list. If so, output it as an element of the intersection.
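A minimal sketch of that approach (erasing on a hit also keeps duplicates in the second list from being emitted twice):

#include <unordered_set>
#include <vector>

// Hash-set intersection: expected O(n + m) time, O(n) extra space.
std::vector<int> intersect(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> inA(a.begin(), a.end());
    std::vector<int> result;
    for (int v : b)
        if (inA.erase(v))        // present in a: emit once, then forget it
            result.push_back(v);
    return result;
}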
You might want to take a look at Bloom filters. They are bit vectors that give a probabilistic answer whether an element is a member of a set. Set intersection can be implemented with a simple bitwise AND operation. If you have a large number of null intersections, the Bloom filter can help you eliminate those quickly. You'll still have to resort to one of the other algorithms mentioned here to compute the actual intersection, however.
http://en.wikipedia.org/wiki/Bloom_filter
without hashing, I suppose you have two options:
The naive way is to compare each element of one list to every element of the other. O(n^2)
Another way would be to sort the lists first, then iterate over them: O(n lg n) * 2 + 2 * O(n)
From the eviews features list it seems that it supports complex merges and joins (if this is 'join' as in DB terminology, it will compute an intersection). Now dig through your documentation :-)
Additionally, eviews has its own user forum - why not ask there?
With set 1, build a binary search tree in O(n log n), then iterate over set 2 and search the BST for each element: m × O(log n). In total: O(n log n) + O(m log n) = O((n + m) log n).
In C++, the following can be tried using an STL map:
vector<int> set_intersection(const vector<int>& s1, const vector<int>& s2) {
    vector<int> ret;
    map<int, bool> store;
    for (size_t i = 0; i < s1.size(); i++) {
        store[s1[i]] = true;
    }
    for (size_t i = 0; i < s2.size(); i++) {
        if (store.count(s2[i])) ret.push_back(s2[i]);
    }
    return ret;
}
Here is another possible solution I came up with: it takes O(n log n) time and no extra storage. You can check it out here: https://gist.github.com/4455373
Here is how it works: assuming that the sets contain no repetitions, merge all the sets into one and sort it. Then loop through the merged set, and on each iteration look at the subset between the current index i and i+n, where n is the number of sets in the universe. What we look for as we loop is a run of n equal elements.
If all the elements of that subset are equal, the element at i is repeated n times, which equals the total number of sets. Since there are no repetitions in any set, each set must contain that value, so we add it to the intersection. We then skip the index past the end of that run, because none of those positions can start another qualifying run.
First, sort both lists using quicksort: O(n log n). Then compare the lists, visiting the lowest values first, and collect the common values. For example, in Lua:
function findIntersection(l1, l2)
    local i, j = 1, 1
    local intersect = {}
    while i <= #l1 and j <= #l2 do
        if l1[i] == l2[j] then
            table.insert(intersect, l1[i])
            i, j = i + 1, j + 1
        elseif l1[i] > l2[j] then
            -- keep l1 as the list with the smaller current value
            l1, l2 = l2, l1
            i, j = j, i
        else
            i = i + 1
        end
    end
    return intersect
end
The comparison phase is O(n + m), where n and m are the sizes of the lists.
EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations
Using skip pointers and SSE instructions can improve list intersection efficiency.
Why not implement your own simple hash table or hash set? It's worth it to avoid an n log n intersection if your lists are as large as you say.
Since you know a bit about your data beforehand, you should be able to choose a good hash function.
I second the "sets" idea. In JavaScript, you could use the first list to populate an object, using the list elements as names. Then you use the list elements from the second list and see if those properties exist.
If there is built-in support for sets (as you call them in the title), there is usually an intersection method.
Anyway, as someone said, you can do it easily (I will not post code, someone already did so) if you have the lists sorted. If you can't use recursion, that's no problem: there are recursion-free quicksort implementations.
In PHP, something like
function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
    $counts = Array(); $result = Array();
    foreach ($X AS $x) {
        // assumes each inner array has no duplicate values
        foreach ($x AS $y) { $counts[$y] = ($counts[$y] ?? 0) + 1; }
    }
    foreach ($counts AS $x => $count) {
        if ($count == count($X)) { $result[] = $x; }
    }
    return $result;
}
From the definition of Big-Oh notation:
T(N) = O(f(N)) if there are positive constants c and n₀ such that
T(N) ≤ c·f(N) when N ≥ n₀.
In practice this means that if the two lists are relatively small, say fewer than 100 elements each, two nested for loops work just fine: loop over the first list and look for a similar object in the second.
In my case it works just fine because I won't have more than 10 - 20 max elements in my lists.
However, a good solution is to sort the first list, O(n log n), sort the second, also O(n log n), and merge them, another O(n); roughly speaking O(n log n) overall, given that the two lists are the same size.
Time: O(n), Space: O(1) solution for identifying the point of intersection.
The two pointers detect the point of intersection by each switching to the other list's head once it reaches its own end.
public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
    ListNode pA = headA;
    ListNode pB = headB;
    while (pA != pB) {
        pA = pA == null ? headB : pA.next;
        pB = pB == null ? headA : pB.next;
    }
    return pA;
}
Thanks.
Edit
My interpretation of intersection is finding the point of intersection.
For example:
For the given lists A and B, A and B will "meet/intersect" at node c1, and the algorithm above returns c1. As the OP stated there is no access to hash maps or the like, I believe the OP wants the algorithm to have O(1) space complexity.
I got this idea from Leetcode some time ago, if interested: Intersection of Two Linked Lists.
