Binary search in 2 sorted integer arrays - algorithm

There is a big array which consists of 2 small integer arrays written one at the end of another. Both small arrays are sorted by ascending. We have to find an element in big array as fast, as possible. My idea was to find the end of the left array by binsearch in big array and then implement 2 binsearches on small arrays. The problem is that I don't know how to find that end. If you have an idea, how to find element without finding borders of smaller arrays, you're welcome!
Information about arrays: both small arrays have integer elements, both are sorted by ascending, they both can have length from 0 to any positive integer number, but there can be only one copy of an element.
Here are some examples of big arrays:
1 2 3 4 5 6 7 (all the elements of the second array are bigger, than the maximum of the first array)
100 1 (both arrays have only one element)
1 3 5 2 4 6 or 2 4 6 1 3 5 (most common situations)

This problem is impossible to solve in guaranteed time complexity faster than O(n) and not possible to solve at all for certain arrays. Binary search runs in O(log n) for a sorted array, but the big array is not guaranteed to be sorted and will in the worst-case require one or more comparisions per element, which is O(n). The best guaranteed time complexity is O(n) with the trivial algorithm: compare every item with its neighbour until you find the "turning point" with A[i] > A[i+1]. However, if you use a breadth-first search, you may get lucky and find the "turning point" early.
Proof that the problem is unsolvable for some arrays: let the array M = [A B] be our big array. To find the point where the arrays meet we're looking for an index i where M[i] > M[i+1]. Now let A=[1 2 3] and B=[4 5]. There is no index in the array M for which the condition holds true, thus the problem is unsolvable for some arrays.
Informal proof for the former: let M=[A B] and A=[1..x] and B=[(x+1)..y] be two sorted arrays. Then swap the positions of element x and y in M. We have no way of finding the index of x without (in the worst case) checking every index, thus the problem is O(n).
Binary search relies on being able to eliminate half the solution space with each comparision, but in this case we cannot eliminate anything from the array and so we cannot do better than a linear search.
(From a practical standpoint, you should never do this in a program. The two arrays should be separate. If this isn't possible, append the length of either array to the bigger array.)
Edit: changed my answer after question was updated. It's possible to do it faster than linear time for some arrays, but not all possible arrays. Here's my idea for an algorithm using breadth-first search:
Start with the interval [0..n-1] where n is the length of the big array.
Make a list of intervals and put the starting interval in it.
For each interval in the list:
if the interval is only two elements and the first element is greater than the last
we found the turning point, return it
else if the interval is two elements or less
remove it from the list
else if the first element of the interval is greater than the last
turning point is in this interval
clear the list
split this interval in two equal parts and add them to the list
else
split this interval in two equal parts and replace this interval in the list with the two parts
I think a breadth-first approach will increase the odds of finding an interval where A[first] > A[last] early. Note that this approach will not work if the turning point is between two intervals, but it's something to get you started. I would test this myself, but unfortunately I don't have the time now.

Related

Bubble sort variant - three adjacent number swapping

This problem appeared in code jam 2018 qualification round which has ended.
https://codejam.withgoogle.com/2018/challenges/ (Problem 2)
Problem description:
The basic operation of the standard bubble sort algorithm is to examine a pair of adjacent numbers and reverse that pair if the left number is larger than the right number. But our algorithm examines a group of three adjacent numbers, and if the leftmost number is larger than the rightmost number, it reverses that entire group. Because our algorithm is a "triplet bubble sort", we have named it Trouble Sort for short.
We were looking forward to presenting Trouble Sort at the Special
Interest Group in Sorting conference in Hawaii, but one of our interns
has just pointed out a problem: it is possible that Trouble Sort does
not correctly sort the list! Consider the list 8 9 7, for example.
We need your help with some further research. Given a list of N
integers, determine whether Trouble Sort will successfully sort the
list into non-decreasing order. If it will not, find the index
(counting starting from 0) of the first sorting error after the
algorithm has finished: that is, the first value that is larger than
the value that comes directly after it when the algorithm is done.
So a naive approach will be to apply trouble sort on the given list, apply normal sort on the list, and find the index of the first non-matching element. However, this would time out for very large N.
Here is what I figured:
The algorithm will compare 0th index with 2nd, 2nd with 4th and so on.
Similarly 1st with 3rd, 3rd with 5th and so on.
All the elements at odd index will be sorted with respect to odd index. Same for even indexed element.
So the issue would lie between two consecutive odd/even indexed element.
I can't think of a way to figure it out without doing an O(n^2) approach.
Is my approach any viable, or there is something easier?
Your observation is spot on. The algorithm presented in the problem statement will only compare( and swap ) the consecutive odd and even elements among themselves.
If you take that observation one step further, you can state that Trouble Sort is an algorithm that correctly sorts odd- and even-indexed elements of an array within themselves. (i.e. as if odd-indexed elements and even-indexed elements of an array A are two separate arrays B and C)
In other words, Trouble Sort does sort B and C correctly. The issue here is whether those arrays B and C of odd and even-indexed elements can be merged properly. You should check if sorting odd- and even-indexed elements among themselves is enough to make the entire array sorted.
This step is really similar to the merging step of MergeSort. The only difference is that, due to the indexing being a limiting factor on your operation, you know at all times from which array you will pick the top element. For a 1-indexed array A, during the merging step of B and C, at each step, you should pick the smallest previously unpicked element from B, and then C.
So, basically, if you sort B and C, which takes, O(NlogN) using an algorithm such as mergesort or heapsort, and then merge them in the manner described in the previous paragraph, which takes O(N), you end up with the same version of the array A after it has been processed by the Trouble Sort algorithm.
The difference is the time complexity. While Trouble Sort takes O(N^2) time, the operations described above takes O(NlogN) time. Once you end up with this array, then you can check in O(N) time if, for each consecutive indices i, j, A[i] < A[j] holds. The overall complexity of the algorithm would still be O(NlogN).
Below is a code sample in Python to demonstrate sort of a pseudocode of the algorithm I described above. There are a couple of minor differences in implementation due to Python arrays being 0-indexed. You may observe the execution of this code here.
def does_trouble_sort_work(A):
B, C = A[0::2], A[1::2]
B_sorted = sorted(B)
C_sorted = sorted(C)
j = k = 0
for i in xrange(len(A)):
if i % 2 == 0:
A[i] = B_sorted[j]
j += 1
else:
A[i] = C_sorted[k]
k += 1
trouble_sort_works = True
for i in xrange(1, len(A)):
if A[i-1] > A[i]:
trouble_sort_works = False
break
return trouble_sort_works

Find element of an array that appears only once in O(logn) time

Given an array A with all elements appearing twice except one element which appears only once. How do we find the element which appears only once in O(logn) time? Let's discuss two cases.
Array is always sorted and elements are in sequential order. Let's assume A = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6], we want to find 3 in log n time because it appears only once.
When the array is not sorted and the elements are not in sequential order.
I can only come up with a solution of using the XOR operator on the binary representation of the integers as explained Here, and at the end, the binary string will represent the element which appears only once because duplicates will cancel out. But it takes O(n) time. How can we do better than that?
using Haroon S' comment this is the solution which I think is correct, given the constraints for time.
class Solution:
def singleNonDuplicate(self, nums: List[int]) -> int:
low = 0
high = len(nums)-1
while(low<high):
mid = (low+high)//2
if(mid%2==0):
mid+=1
if(nums[mid]==nums[mid+1]):
# answer in second half
high = mid-1
elif(nums[mid]==nums[mid-1]):
# answer in first half
low = mid+1
return nums[low]
If the elements are sorted (i.e., the first case you mentioned) then I believe a strategy not unlike binary search could work in O(logN) time.
Starting from the left endpoint in a sorted array, until we encounter the unique element, all the index pairs (2i, 2i + 1) we encounter along the way will have the same value. (i.e., due to the array being sorted) However, as we go towards the right endpoint of the array, as soon as we consider an array that includes the unique element, that structure of "same values within (2i, 2i+1) index pairs" will be invalid.
Using that information, a search algorithm similar to binary search can find out in which half of the array the unique element is. Basically, you can deduce that, "in the left half of the array, if the values in the rightmost index pair (2i, 2i+1) are the same, then the unique value is in the right half". (i.e., with the exception of the last index on the left half-array being even; but you can overcome that case with various O(1) time operations)
The overall complexity then becomes O(logN), due to the halving of the array size at each step.
For the demonstration of the index notion I mentioned above, see your own example. In the left of the unique element(i.e. 3) all index pairs (2i, 2i+1) have the same values. And all subarrays starting from index 0 and ending with an index that is to the right of the unique element, all index pairs (2i, 2i+1) have a correspond to cells that contain different values.
Unless the array is sorted, though, since you'd have to investigate each and every element, I believe any algorithm you may come up with would take at least O(n) time. This is what I think will happen in the second case you mention in your question.
In the general case this is impossible, as to make sure an element doesn't repeat you need to check every other element.
From your example, it seems the array might be a sorted sequence of integers with no "gaps" (or some other clearly defined sequence, like all even numbers, etc). In this case it is possible with a modified binary search.
You have the array [1,1,2,2,3,4,4,5,5,6,6].
You check the middle element and the element following it and see 3 and 4. Now you know there are only 5 elements from the set {1, 2, 3}, while there are 6 elements from the set {4, 5, 6}. Which means, the missing elements is in {1, 2, 3}.
Then you recurse on [1,1,2,2,3]. You see 2,2. Now you know there are 2 "1" elements and 1 "3" element, so 3 is the answer.
The reason you check 2 elements in each step is that if you see just "3", you don't know whether you hit the first 3 in "3,3" or the second one. But if you read 2 elements you always find a "boundary" between 2 different elements.
The condition for this to be viable is that, given the value of an element, you need to be able to calculate in O(1) how many different elements come before this element. In your case this is trivial, but it is also possible for any arithmetic series, geometric series (with fixed size numbers)...
This is not a O(log n) solution. I have no idea how to solve it in logarithmic time without the constraints that the array is sorted and we have a known difference between consecutive numbers so we can recognise when we are to the left or right of the singleton. The other solutions already deal with that special case and I couldn’t do better there either.
I have a suggestion that might solve the general case in O(n), rather than O(n log n) when you first sort the array. It’s not as fast as the xor solution, but it will also work for non-integers. The elements must have an order, so it is not completely general, but it will work anywhere you can sort the elements.
The idea is the same as the k’th order element algorithm based on Quicksort. You partition and recurse on one half of the array. The time recurrence is T(n) = T(n/2) + O(n) = O(n).
Given array x and indices i,j, representing sub-array x[i:j], partition with quicksort’s partitioning method. You want a variant that partitions x[i:j] into three segments, x[i:k] x[k:l], x[l:j] where all elements in the first part are smaller than the pivot (whatever it is) all elements in x[k:l] are equal to the pivot, and all elements in the last segment are greater than the pivot.
(you might be able to use a version that only partitions in two, or explicitly count the number of pivots, but with this version is easier to work with here)
Now, if the middle segment has length one, you have your singleton. It is the pivot.
If not, the length of the segment that has the singleton is odd while the other is even. So recurse on the segment with the odd length.
It doesn’t give you worst case linear time, for the same reason that Quicksort isn’t worst case log-linear, but you get an expected linear time algorithm and likely a fast one at that.
Not, of course, as fast as those solutions based on binary search, but here the elements do not need to be sorted and we can handle elements with arbitrary gaps between them. We are also not restricted to data where we can easily manipulate their bit-patterns. So it is more general. If you can compare the elements, this approach will find the singleton in O(n).
This solution will find the element in the array that appeared only once but there should not be more than one element of that type and the array should be sorted. This is Binary Search and will return the element in O(log n) time.
var singleNonDuplicate = function(nums) {
let s=0,e= nums.length-1
while(s < e){
let mid = Math.trunc(s+(e-s)/2)
if((mid%2 == 0&& nums[mid] ==nums[mid+1])||(mid%2==1 && nums[mid] == nums[mid-1]) ){
s= mid+1
}
else{
e = mid
}
}
return nums[s] // can return nums[e] also
};
I don't believe there is a O(log n) solution for that. The reason is that in order to find which element is appearing only once, you at least need to iterate over the elements of that array once.

Algorithms for dividing an array into n parts

In a recent campus Facebook interview i have asked to divide an array into 3 equal parts such that the sum in each array is roughly equal to sum/3.My Approach1. Sort The Array2. Fill the array[k] (k=0) uptil (array[k]<=sum/3)3. After that increment k and repeat the above step for array[k]Is there any better algorithm for this or it is NP Hard Problem
This is a variant of the partition problem (see http://en.wikipedia.org/wiki/Partition_problem for details). In fact a solution to this can solve that one (take an array, pad with 0s, and then solve this problem) so this problem is NP hard.
There is a dynamic programming approach that is pseudo-polynomial. For each i from 0 to the size of the array, you keep track of all possible combinations of current sizes for the sub arrays, and their current sums. As long as there are a limited number possible sums of subsets of the array, this runs acceptably fast.
The solution that I would have suggested is to just go for "good enough" closeness. First let's consider the simpler problem with all values positive. Then sort by value descending. Take that array in threes. Build up the three subsets by always adding the largest of the triple to the one with the smallest sum, the smallest to the one with the largest, and the middle to the middle. You will end up dividing the array evenly, and the difference will be no more than the value of the third smallest element.
For the general case you can divide into positive and negative, use the above approach on each, and then brute force all combinations of a group of positives, a group of negatives, and the few leftover values in the middle that did not divide evenly.
Here are details on a dynamic programming solution if you are interested. The running time and memory usage is O(n*(sum)^2) where n is the size of your array and sum is the sum of absolute values of your array values. For each array index j from 1 to n, store all the possible values you can get for your 3 subset sums when you split the array from index 1 to j into 3 subsets. Also for each possibility, store one possible way to split the array to get the 3 sums. Then to extend this information for 1 to (j+1) given the information from 1 to j, simply take each possible combination of 3 sums for splitting 1 to j and form the 3 combinations of 3 sums you get when you choose to add the (j+1)th array element to any one of the 3 subsets. Finally, when you are done and reach j = n, go through the set of all combinations of 3 subset sums you can get when you split array positions 1 to n into 3 sets, and choose the one whose maximum deviation from sum/3 is minimized. At first this may seem like O(n*(sum)^3) complexity, but for each j and each combination of the first 2 subset sums, the 3rd subset sum is uniquely determined. (because you are not allowed to omit any elements of the array). Thus the complexity really is O(n*(sum)^2).

Find the minimum two non edge, non adjacent entries in an array

I had the following question
Find the smallest two nonadjacent values in an array, such that non of these elements is on the array edge (no A[0] and no A[n-1])
The runtime of the algorithm should be O(n)
I first thought about sorting the array, but sorting would cost O(nlogn)
Ignoring this fact for a second, if we sort the array, we can not just take the first two values, since they might violate the conditions mentioned above? and then what? take the next element and try, if not take the next, I can't see an easy solution there
Another solution is to generate all allowed pairs and find the pair with the minimum sum. But finding all pairs cost O(n^2)
Any ideas?
In linear time, find the ith smallest entry (excluding the first and last) for i from 1 to 4. The best possibility is a pair of these. If 1 and 2 are nonadjacent, then that's the best. Otherwise, if 1 and 3 are nonadjacent, then that's the best. Otherwise, 2 and 3 are bordering 1 (hence not each other), and the possibilities are 1 and 4, or 2 and 3.
You could go with your sort first, then, unless I am missing something, take elements 0 and 2. Those would be the smallest non-adjacent values.
As long as the array is 3 elements or greater you should be assured that the element values in position 0 and 2 are the smallest (and if you need them to be, non-consecutive) as well as non-adjacent in the array.
If your array is sorted, you would only have to keep comparing alternate elements (like indices (0,2), (1,3), (2,5)) and so on and then find the pair with the smallest difference. But without sorting, you are right in saying that the run time complexity would then become O(n^2) as you would have to compare every element with every other element in the array.

Measuring how "out-of-order" an array is

Given an array of values, I want to find the total "score", where the score of each element is the number of elements with a smaller value that occur before it in the array.
e.g.
values: 4 1 3 2 5
scores: 0 0 1 1 4
total score: 6
An O(n^2) algorithm is trivial, but I suspect it may be possible to do it in O(nlgn), by sorting the array. Does anyone have any ideas how to do that, or if it's not possible?
Looks like what you are doing is essentially counting the number of pairs of elements that are in the incorrect relative order (i.e. number of inversions). This can be done in O(n*log(n)) by using the same idea as merge sort. As you merge, you just count the number of elements that are in the left list but should have been on the right list (and vice versa).
If the range of your numbers is small enough, the fastest algorithm I can think of is one that uses Fenwick Trees. Essentially just iterate through the list and query the Fenwick Tree for how many elements are before it, then insert the number into the tree. This will answer your question in O(nlogm), where n is the size of your list and m is your largest integer.
If you don't have a reasonable range on your integers (or you want to conserve space) MAK's solution is pretty damn elegant, so use that :)

Resources