int binarySearch(int arr[], int left, int right, int x)
{
    while (left <= right)
    {
        int mid = (left + right) / 2;
        if (arr[mid] == x)
        {
            return mid;
        }
        else if (arr[mid] > x)
        {
            right = mid - 1;
        }
        else
        {
            left = mid + 1;
        }
    }
    return -1;
}
When I went through this myself I got 5n+4 = O(n), but somehow it is supposed to be O(log n), and I don't understand why that's the case.
int mean(int a[], size_t n)
{
    int sum = 0;                // 1 step
    for (int i = 0; i < n; i++) // 1 step * (n+1)
        sum += a[i];            // 1 step * n
    return sum;                 // 1 step
}
I understand that the above code reduces to 2n+3, but this is a very basic example and doesn't take much thought to understand. Will someone please walk me through the binary search version, as all the sources I have encountered don't make much sense to me?
Here is a link to one of the many other resources that I have used, but an explanation of how each statement is separated into steps is what I would prefer, if possible:
how to calculate binary search complexity
In binary search you always reduce the problem size by half. Let's take an example: the element we are searching for is 19 and the array has 8 elements, sorted: [1,4,7,8,11,16,19,22]. The following is the sequence of steps that a binary search will perform:
Get the middle element's index, i.e. cut the problem size in half.
Check whether the element at that index is greater than, less than, or equal to the element you are searching for.
a. If equal, you are done; return the index.
b. If the element you are searching for is greater, keep looking in the right half of the array.
c. If it is less, look in the left half of the array.
You repeat steps 1 and 2 until you are left with one element or you have found the element.
In our example the problem will look as follows:
Iteration 1: [1,4,7,8,11,16,19,22]
Iteration 2: [16,19,22]
Iteration 3: [19]
Each iteration compares 19 against the middle element and discards the half that cannot contain it.
Order of complexity: O(log2(n))
i.e.
log2(8) = 3, which means we needed 3 steps to find our desired element. Even if the element were not there (i.e. in the worst case), the time complexity of this algorithm remains O(log2(n)).
It's important to note that the base of the log in binary search is 2, because we reduce the problem size by half each time. If some other algorithm reduced the problem size to one third, it would be log base 3, but asymptotically we call it a logarithmic algorithm irrespective of its base.
Note: Binary search can only be done on sorted data.
Suppose I have an array of 10 elements. Binary search splits the array into two halves, in this case 5 (call them the left 5 elements) and 5 (the right 5 elements).
Suppose the element you are trying to find is greater than the middle element, in this case x > array[5]; then you just ignore the first 5 elements and go to the last five.
Now you have an array of five elements (from index 5 to 9). Again you split the array into two halves; if x > array[mid] you ignore the whole left half, and if it is smaller you ignore the whole right half.
In mathematical notation you get a series like this: n, n/2, n/2^2, ..., n/2^m.
Now if you try to solve this: the search stops when the last term n/2^m reaches a single element, so setting n/2^m = 1 and solving for m gives m = log2(n).
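To see the step counting concretely, here is a minimal Java sketch (the class and method names are mine, not from the question) that counts how many times the loop body runs; searching a sorted array of size n for a value larger than everything forces the worst case and runs the loop about log2(n) + 1 times:

public class BinarySearchSteps {
    // Same binary search as in the question, but returns how many times
    // the loop body (a constant amount of work) executes.
    static int countSteps(int[] arr, int x) {
        int left = 0, right = arr.length - 1, steps = 0;
        while (left <= right) {
            steps++;
            int mid = (left + right) / 2;
            if (arr[mid] == x) return steps;
            else if (arr[mid] > x) right = mid - 1;
            else left = mid + 1;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 8; n <= (1 << 20); n *= 16) {
            int[] arr = new int[n];
            for (int i = 0; i < n; i++) arr[i] = i;
            // A value larger than every element forces the worst case.
            System.out.println("n = " + n + ": " + countSteps(arr, Integer.MAX_VALUE) + " steps");
        }
    }
}

Doubling the array size adds only one more step, which is exactly the O(log n) behaviour.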
Related
A sequence A = [a1, a2, ..., an] is a valley sequence if there is an index i with 1 < i < n such that:
a1 > a2 > ... > ai
and
ai < ai+1 < ... < an.
It is given that a valley sequence must contain at least three elements.
What I'm really confused about is: how do we find an algorithm that finds the element ai, as described above, in O(log n) time?
Will it be similar to an O(log n) binary search?
And if we do have a binary search algorithm which finds an element of an array in O(log n) time, can we improve the runtime to O(log log n)?
To get an O(log n) algorithm, we have to reduce the problem size by half in constant time.
In this problem specifically, we can select a mid-point, and check if its slope is increasing, decreasing or a bottom.
If the slope is increasing, the part after the mid-point could be ignored
else if the slope is decreasing, the part before the mid-point could be ignored
else the mid-point should be the bottom, hence we find our target.
Java code example:
Input: [99, 97, 89, 1, 2, 4, 6], output: 1
public int findBottomValley(int[] valleySequence) {
    int start = 0, end = valleySequence.length - 1, mid;
    while (start + 1 < end) {
        mid = start + (end - start) / 2;
        if (checkSlope(mid, valleySequence) < 0) {
            // slope decreasing: the bottom is at mid or to its right
            start = mid;
        } else if (checkSlope(mid, valleySequence) > 0) {
            // slope increasing: the bottom is to the left of mid
            end = mid;
        } else {
            // mid is the bottom: we found our target
            return valleySequence[mid];
        }
    }
    // left over with two points
    if (valleySequence[start] < valleySequence[end]) {
        return valleySequence[start];
    }
    return valleySequence[end];
}
The helper function checkSlope(index, list) checks the slope at index of the list, looking at the three points index - 1, index, and index + 1. If the slope is decreasing it returns a negative number; if the slope is increasing it returns a positive number; if the numbers at index - 1 and index + 1 are both larger than the number at index, it returns 0.
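The answer describes checkSlope but doesn't show it; a minimal sketch matching that description (this exact body is an assumption, not the original author's code) could be:

// Returns a negative number if the slope at `index` is decreasing,
// a positive number if it is increasing, and 0 if `index` is the bottom.
// Assumes 0 < index < list.length - 1, which the search loop guarantees,
// and strictly monotone slopes, per the notes below.
private int checkSlope(int index, int[] list) {
    if (list[index - 1] > list[index] && list[index] > list[index + 1]) {
        return -1; // still going downhill: bottom is at index or to the right
    } else if (list[index - 1] < list[index] && list[index] < list[index + 1]) {
        return 1;  // going uphill: bottom is strictly to the left
    }
    return 0;      // both neighbours are larger: index is the valley bottom
}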
Note: the algorithm makes assumptions that:
the list has at least three items
the slope at adjacent elements cannot be flat; the reason is that if there are adjacent numbers that are equal, we are unable to decide which side the bottom is on. It could appear to the left of such a flat slope or to the right, so we would have to fall back to a linear search.
Since random access into an array is already constant, O(1), having an O(log n) access time would not help the algorithm.
There is a solution that works a lot like binary search. Set a = 2 and b = n - 1. At each step, we only need to consider candidates with index k such that a <= k <= b. Compute m = (a + b) / 2 (integer division, so round down) and consider the array elements at indices m - 1, m and m + 1. If these elements are decreasing, set a = m and keep searching. If they are increasing, set b = m and keep searching. If they form a valley, return m as the answer. If b - a < 2, there is no valley.
Since we halve the search space each time, the complexity is logarithmic. Yes, we access three elements and perform two comparisons at each stage, but a quick calculation shows that only affects the constant factors.
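Stated as a recurrence: each stage does a constant amount of work c and leaves half the candidates, so T(n) = T(n/2) + c. Unrolling gives T(n) = T(n/2^k) + k*c, which bottoms out when n/2^k = 1, i.e. after k = log2(n) stages, so T(n) = O(log n).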
Note that this answer depends on these sequences being strictly decreasing and then increasing. If consecutive elements can repeat, the best solution is linear in the worst case.
Just saw the second part. In general, no: a way to find specific elements in logarithmic, or even constant, time does not help here. The problem is that we have no useful idea what to look for. If the spacing of the elements' values is greater than their spacing in the array, which isn't hard to arrange, I can't see how you'd pick a value to search for.
How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I generated a segment tree over the array, and then for each possible sub-array I queried the segment tree, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
For example, arr[] = { 1, 2, 3 }; // the array need not be sorted
sub-array   min   max
{1}          1     1
{2}          2     2
{3}          3     3
{1,2}        1     2
{2,3}        2     3
{1,2,3}      1     3
I don't think it is possible to store all those values in O(n) space. But it is pretty easy to create, in O(n), a structure that makes it possible to answer, in O(1), the query "for how many subarrays is A[i] the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could use a simple O(n) pass that counts how many contiguous elements to the left and to the right of A[i] are less than it. For example:
A = [... 10 1 1 1 5 1 1 10 ...]
That 5 has 3 elements to its left and 2 to its right that are less than it. From this we know there are 4*3 = 12 subarrays for which that very 5 is the maximum: 4 choices of how far to extend left (0 to 3 elements) times 3 choices of how far to extend right (0 to 2).
Optimized version:
This naïve version of the check takes O(n) operations per element, so O(n^2) overall. Wouldn't it be nice if we could compute all these lengths in a single O(n) pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array from left to right and push each index onto the stack. But before pushing it, pop every index whose value is less than the current value. The index then left on top of the stack is the position of the nearest larger element. (In the code below, one pass pops on <= and the other on <, so duplicated values are not double-counted.)
To find the same values at the right, just traverse the array backwards.
Here's a sample Python proof-of-concept that shows this algorithm in action. I implemented the naïve version as well, so we can cross-check the result of the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        # Pop indexes whose values are dominated by the current one.
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]  # index of the nearest larger element
        stack.append(i)
    return bound

def optimized_version(A):
    T = list(zip(make_bounds(A, -1, range(len(A)), lambda x, y: x <= y),
                 make_bounds(A, len(A), reversed(range(len(A))), lambda x, y: x < y)))
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(int)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)
I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
You could, on the other hand, when initialising the subarrays, instead of making them normal arrays, build heaps: specifically min-heaps when you want to find the minimum and max-heaps when you want to find the maximum.
Building a heap is a linear-time operation, and retrieving the maximum or minimum (from a max-heap or min-heap respectively) is a constant-time operation, since that element sits at the root of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
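For illustration, a small Java sketch (class and variable names are mine) showing both extremes available in O(1) via peek once the heaps are built:

import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SubarrayExtremes {
    public static void main(String[] args) {
        List<Integer> sub = List.of(1, 2, 3); // one sub-array's elements

        // java.util.PriorityQueue is a binary heap; min-heap by default.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(sub);
        // Reverse the comparator to get a max-heap.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.addAll(sub);

        System.out.println("min = " + minHeap.peek()); // O(1) -> 1
        System.out.println("max = " + maxHeap.peek()); // O(1) -> 3
    }
}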
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following
The subarray of maximum/minimum length or some other criterion (in which case the problem reduces to finding the max element of a 1-dimensional array)
The maximum elements of all your sub-arrays either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating your super-array and storing a reference to the largest element, or by building a heap, as nbro said. Problem 2 has a similar solution; however, a linear scan through n arrays of length m is not going to be linear overall. So you will have to keep your class invariants such that the maximum/minimum is known after every operation, perhaps with the help of a data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums, where Y_0 = 0 and Y_i = X_1 + X_2 + ... + X_i; so from 1,4,2,3 create 0, 1, 1+4=5, 1+4+2=7, 1+4+2+3=10. You can build this from left to right in linear time, and the sum of any contiguous subarray is one partial sum subtracted from another, e.g. 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract that minimum from the current value and keep track of the highest difference produced this way. This gives you the value of the contiguous sub-array with the largest sum, and you can keep index information too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just flip the sign of all the numbers and do exactly the same thing again, since min(a, b) = -max(-a, -b).
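Here is a short Java sketch of that partial-sum scan (names are mine); on the 1,4,2,3 example above it returns 10:

public class MaxSubarraySum {
    // Largest sum over all contiguous, non-empty sub-arrays,
    // via the partial-sum scan described above.
    static int maxSubarraySum(int[] x) {
        int partial = 0;              // running partial sum, starts at the "initial zero"
        int minSeen = 0;              // smallest partial sum seen so far
        int best = Integer.MIN_VALUE; // best (partial - earlier minimum) so far
        for (int v : x) {
            partial += v;
            // Update best before minSeen so the sub-array stays non-empty.
            best = Math.max(best, partial - minSeen);
            minSeen = Math.min(minSeen, partial);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxSubarraySum(new int[] {1, 4, 2, 3})); // 10
    }
}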
I think the question you are asking is to find the maximum sum of a subarray.
Below is code that can do that in O(n) time.
int maxSumSubArr(vector<int> a)
{
    // If every element is negative, the answer is the single largest element.
    int maxsum = *max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum;
    int sum = 0;
    for (size_t i = 0; i < a.size(); i++)
    {
        sum += a[i];
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0;
    }
    return maxsum;
}
Note: this code is not tested; please add a comment if you find any issues.
For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search, because its worst case is O(log n). According to my notes, the amortized cost is the upper bound of an algorithm divided by the number of items, so wouldn't that simply be the worst-case time complexity divided by the number of elements, i.e. O(log n)/(2^n - 1)?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;                 // start
    int e = sorted.length - 1; // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;             // start
    int e = sorted.length; // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements for which short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where, or even whether, the item is found. Not that that's important.
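Working that arithmetic out with the cost model from above: the 2^n - 1 successful searches of the original version cost a total of sum over k = 1..n of 2^(k-1) * (2k - 1) = (2n - 3) * 2^n + 3 comparisons, an average of roughly 2n - 3 per search. The version here always runs the loop to the bottom, doing essentially n comparisons plus one final equality test, about n + 1 in total, which is where the factor of two comes from.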
Given a 2D array (a matrix) with n rows and k columns, with its rows sorted and columns unspecified, what would be the most efficient algorithm to sort it?
For example:
Input (n = 3 ; k = 4):
1 5 8 10
3 4 5 6
2 3 3 9
Output:
1 2 3 3 3 4 5 5 6 8 9 10
This is a purely algorithmic question, so the specific .sort() methods of particular languages don't help me, as I'm actually interested in the runtime complexity.
What I had in mind would be an algorithm as follows:
- Build a Binary Search tree with n nodes. Each node contains:
ID - of the row it refers to;
Value - The number at the "head" of the row.
- Sort the tree where the sorting keys are the values.
- Find the lowest valued node.
- Do while the tree has a root:
- Take the lowest value and append it to the new array.
- Update the node's value to the next one in the same row.
- If the updated value is null, delete that node.
- else re-sort the tree.
- If the node moved, the first node it swapped with will be the next lowest
- else, if it didn't move, it's the new lowest.
If I'm not mistaken the runtime complexity is O(n*k*log n), since I'm re-sorting the tree n*k times, which takes O(log n) each time, and the process of finding the next lowest is O(1).
If my complexity calculation is wrong, please let me know.
Is there any way more efficient than this?
You basically have n sorted lists, each of size k. You need a generalization of merge-sort, which is the k-way merge.
The idea is to keep a min-heap, that contains the smallest element from each list.
Now, iteratively pop the min of the heap. Let this number be x, and say it was taken from row i. Append x to the result list, and add to the min-heap the next element in row i (if such an element exists).
Repeat until all elements are exhausted.
Complexity is O(n*k*log n), which is pretty efficient considering you are sorting n*k elements and need to traverse all of them. The constant factors for using a binary heap are pretty good.
Note that this is often referred to as external sort (or, to be exact, a close variant of external sort's second phase).
This is very similar to the algorithm you are suggesting, but it will probably run much faster (with better constants) due to the use of a heap rather than the less efficient tree.
Also note that if you use a 'regular' binary tree, you get complexity of O(n^2 * k), since there is no guarantee on the height of the tree. You need a self-balancing binary search tree to get O(n*k*log n) run time.
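For reference, a compact Java sketch of this k-way merge using java.util.PriorityQueue (the class name and the {value, row, column} entry layout are my choices):

import java.util.PriorityQueue;

public class KWayMerge {
    // Merges n sorted rows of length k into one sorted array in O(n*k*log n).
    static int[] mergeSortedRows(int[][] matrix) {
        int n = matrix.length, k = matrix[0].length;
        // Heap entries are {value, row, column}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int row = 0; row < n; row++) {
            heap.add(new int[] {matrix[row][0], row, 0});
        }
        int[] result = new int[n * k];
        int out = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();       // smallest remaining element
            result[out++] = top[0];
            int row = top[1], col = top[2] + 1;
            if (col < k) {                 // push the next element of that row
                heap.add(new int[] {matrix[row][col], row, col});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 5, 8, 10}, {3, 4, 5, 6}, {2, 3, 3, 9}};
        System.out.println(java.util.Arrays.toString(mergeSortedRows(m)));
        // [1, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 10]
    }
}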
This can be done using a sorted merge, which takes O(rows*cols) time, i.e. proportional to the total number of elements (for a fixed, small number of rows), with O(rows) space complexity.
The Java code for this problem is below (consider rows = 3 and cols = 4):
int[] index = new int[3]; // next unconsumed column in each row
for (int i = 0; i < 3; i++)
{
    index[i] = 0;
}
int count = 0;
int a;
int b;
int c;
while (count < (ROWS * COLS))
{
    int smallest;
    // An exhausted row contributes MAX_VALUE so it is never picked.
    if (index[0] >= COLS)
        a = Integer.MAX_VALUE;
    else
        a = matrix[0][index[0]];
    if (index[1] >= COLS)
        b = Integer.MAX_VALUE;
    else
        b = matrix[1][index[1]];
    if (index[2] >= COLS)
        c = Integer.MAX_VALUE;
    else
        c = matrix[2][index[2]];
    if (a <= b && a <= c) {
        // a is smallest
        smallest = a;
        index[0] = index[0] + 1;
    } else if (b <= c && b <= a) {
        // b is smallest
        smallest = b;
        index[1] = index[1] + 1;
    } else {
        // c is smallest
        smallest = c;
        index[2] = index[2] + 1;
    }
    System.out.print(smallest + ", ");
    count++;
}
I have an array of sorted integers, and I'd like to get the two consecutive indices of elements that bound a particular value I pass in. To illustrate, because it's hard to describe in words, let's say I have an array (regular zero-indexed):
1 3 4 5 7 9
I want to get the two indices that bound, say, the value 6. In this case, the array has values 5 and 7 in consecutive positions, which bound the value I'm looking for (5 <= 6 <= 7), and so I'd return the index of 5 and the index of 7 (3 and 4, respectively).
I have this currently implemented in a very brute-force fashion, involving a lot of sorts and searches in the array. In addition, I feel like I'm missing a lot of corner cases (especially with values that are larger/smaller than the largest/smallest value in the array).
Is there an elegant way of doing this? What corner cases should I look out for, and how can I deal with and or check for them? Thanks!
You can solve the problem using binary search in O(lg(n)) without considering so many boundary cases. The idea is to use binary search to find the lowest element greater than or equal to the value to bound (let's call that value x).
pair<int, int> FindInterval(const vector<int>& v, int x) {
    int low = 0, high = (int)v.size();
    while (low < high) {
        const int mid = (low + high) / 2;
        if (v[mid] < x) low = mid + 1;
        else high = mid;
    }
    // If x equals the very first element, there is no earlier element to act
    // as a lower bound, but the bound (x <= x <= v[1]) still works, so shift
    // the window one position to the right.
    if (low == 0 && low < (int)v.size() && v[low] == x) ++low;
    return make_pair(low - 1, low);
}
Notice that the answer might be (-1, 0), indicating that there is no lower bound for the interval, or (n - 1, n), indicating that there is no upper bound (where n is the size of v). Also, there can be two possible answers if x is in v, and multiple answers if x appears several times in v, because the boundaries include the extremes.
Finally, you can substitute the binary search with the std::lower_bound function:
pair<int, int> FindInterval(const vector<int>& v, int x) {
    // This does the same as the previous hand-coded binary search.
    const int low = (int)(lower_bound(v.begin(), v.end(), x) - v.begin());
    // The rest of the code is the same...
}
Basically:
Sort the array (once);
Do a bisection search to find the closest element;
Compare that element to the input value;
If it's lower, you have the lower bound;
If it's higher, you have the upper bound;
If it's the same, the bounds sit immediately on either side of that element.
Now if you can have repeat values in the array the last step is a little more complicated. You may need to skip over several values.
Ultimately this is little more than a bisection search on a sorted array, so it is O(log n) for a sorted array and O(n log n) for an unsorted one.
Binary search for the value that you want (in this case, 6).
If it's found, grab the previous and next values based on the resulting index.
If not, your final search value will be either less than or greater than your target value. If it's bigger, your bounding values will be at that index and the one previous. Otherwise, they will be at that index and the next one.
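In Java, for instance, Arrays.binarySearch already distinguishes these two cases: it returns the index when the key is found and -(insertionPoint) - 1 when it is not (with no guarantee which index you get when duplicates are present). A small sketch using the array from the question:

import java.util.Arrays;

public class BoundingIndices {
    // Prints the indices of the elements bounding x in a sorted array,
    // using -1 and a.length to mark "no lower/upper bound".
    static void printBounds(int[] a, int x) {
        int i = Arrays.binarySearch(a, x);
        int lower, upper;
        if (i >= 0) {            // exact hit: x itself bounds on both sides
            lower = i;
            upper = i;
        } else {                 // miss: decode the insertion point
            int insert = -(i + 1);
            lower = insert - 1;  // -1 if x is below every element
            upper = insert;      // a.length if x is above every element
        }
        System.out.println("bounds of " + x + ": " + lower + ", " + upper);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 5, 7, 9};
        printBounds(a, 6);  // bounds of 6: 3, 4
        printBounds(a, 0);  // bounds of 0: -1, 0
        printBounds(a, 10); // bounds of 10: 5, 6
    }
}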
One way to make this faster would be to use binary search. This will reduce your current time complexity of O(n) to O(log n).