Binary Search Explanation

Binary Search Explanation - algorithm

Link: https://leetcode.com/problems/first-bad-version/discuss/71386/An-clear-way-to-use-binary-search
I am doing a question wherein, given a string like this "FFTTTT", I have to find either the rightmost F or the leftmost T.
The following is the code:
To find the leftmost T
public int firstBadVersionLeft(int n) {
int i = 1;
int j = n;
while (i < j) {
int mid = i + (j - i) / 2;
if (isBadVersion(mid)) {
j = mid;
} else {
i = mid + 1;
}
}
return i;
}
I have the following doubts:
I am unable to understand the intuition behind returning i. I mean, why didn't we return j. I did a trial run of the code in my mind, and it works out, but how do we know we have to return i.
Why didn't we do while (i<=j) and just while(i<j). I mean, how do we determine this?

You stop when i==j as far as I can tell. So you can return either of them since they have the same value.
The difference between the two conditions is i == j . In that case mid == i + 0/2 == i. So isBadVersion can return true or false. If it returns true then you do j= mid, but we already had i == j == mid, so you don't do anything and you have an infinite loop. If isBadVersion returns false then you will make i == j+1 and the loop will end. Since the boundaries are inclusive (they include the i'th and j'th element) this can only happen if there's no 'T' in the string.
So you do (i < j) to avoid that infinite loop case.
P.S. This code will return n if there's not 'T' in the string or the last character is a 'T'. Not sure if that's intended or not.

Related

How can I figure out loop invariant in my binary search implementation?

bool binsearch(int x) {
int i = 0, j = N;
while(i < j) {
int m = (i+j)/2;
if(arr[m] <= x) {
if(arr[m] == x)
return true;
i = m+1;
}
else {
j = m;
}
}
return false;
}
This is my implementation of binary search which returns true if x is in arr[0:N-1] or
returns false if x is not in arr[0:N-1].
And I'm wondering how can I figure out right loop invariant to prove this implementation is correct.
How can I solve this problem?
Thanks a lot :D

Think about the variables holding state within your loop. In your case, they are variables i and j. You start with the fact that all elements < i and less than the value you are searching for (x) and all elements > j and greater than the x. This is the invariant you are trying to maintain.

Sort a given array whose elements range from 1 to n , in which one element is missing and one is repeated

I have to sort this array in O(n) time and O(1) space.
I know how to sort an array in O(n) but that doesn't work with missing and repeated numbers. If I find the repeated and missing numbers first (It can be done in O(n)) and then sort , that seems costly.
static void sort(int[] arr)
{
for(int i=0;i<arr.length;i++)
{
if(i>=arr.length)
break;
if(arr[i]-1 == i)
continue;
else
{
while(arr[i]-1 != i)
{
int temp = arr[arr[i]-1];
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
}
}
}

First, you need to find missing and repeated numbers. You do this by solving following system of equations:
Left sums are computed simultaneously by making one pass over array. Right sums are even simpler -- you may use formulas for arithmetic progression to avoid looping. So, now you have system of two equations with two unknowns: missing number m and repeated number r. Solve it.
Next, you "sort" array by filling it with numbers 1 to n left to right, omitting m and duplicating r. Thus, overall algorithm requires only two passes over array.

void sort() {
for (int i = 1; i <= N; ++i) {
while (a[i] != a[a[i]]) {
std::swap(a[i], a[a[i]]);
}
}
for (int i = 1; i <= N; ++i) {
if (a[i] == i) continue;
for (int j = a[i] - 1; j >= i; --j) a[j] = j + 1;
for (int j = a[i] + 1; j <= i; ++j) a[j] = j - 1;
break;
}
}
Explanation:
Let's denote m the missing number and d the duplicated number
Please note in the while loop, the break condition is a[i] != a[a[i]] which covers both a[i] == i and a[i] is a duplicate.
After the first for, every non-duplicate number i is encountered 1-2 time and moved into the i-th position of the array at most 1 time.
The first-found number d is moved to d-th position, at most 1 time
The second d is moved around at most N-1 times and ends up in m-th position because every other i-th slot is occupied by number i
The second outer for locate the first i where a[i] != i. The only i satisfies that is i = m
The 2 inner fors handle 2 cases where m < d and m > d respectively
Full implementation at http://ideone.com/VDuLka

After
int temp = arr[arr[i]-1];
add a check for duplicate in the loop:
if((temp-1) == i){ // found duplicate
...
} else {
arr[arr[i]-1] = arr[i];
arr[i] = temp;
}
See if you can figure out the rest of the code.

Binary search bounds

I always have the hardest time with this and I have yet to see a definitive explanation for something that is supposedly so common and highly-used.
We already know the standard binary search. Given starting lower and upper bounds, find the middle point at (lower + higher)/2, and then compare it against your array, and then re-set the bounds accordingly, etc.
However what are the needed differences to adjust the search to find (for a list in ascending order):
Smallest value >= target
Smallest value > target
Largest value <= target
Largest value < target
It seems like each of these cases requires very small tweaks to the algorithm but I can never get them to work right. I try changing inequalities, return conditions, I change how the bounds are updated, but nothing seems consistent.
What are the definitive ways to handle these four cases?

I had exactly the same issue until I figured out loop invariants along with predicates are the best and most consistent way of approaching all binary problems.
Point 1: Think of predicates
In general for all these 4 cases (and also the normal binary search for equality), imagine them as a predicate. So what this means is that some of the values are meeting the predicate and some some failing. So consider for example this array with a target of 5:
[1, 2, 3, 4, 6, 7, 8]. Finding the first number greater than 5 is basically equivalent of finding the first one in this array: [0, 0, 0, 0, 1, 1, 1].
Point 2: Search boundaries inclusive
I like to have both ends always inclusive. But I can see some people like start to be inclusive and end exclusive (on len instead of len -1). I like to have all the elements inside of the array, so when referring to a[mid] I don't think whether that will give me an array out of bound. So my preference: Go inclusive!!!
Point 3: While loop condition <=
So we even want to process the subarray of size 1 in the while loop, and when the while loop finishes there should be no unprocessed element. I really like this logic. It's always solid as a rock. Initially all the elements are not inspected, basically they are unknown. Meaning that everything in the range of [st = 0, to end = len - 1] are not inspected. Then when the while loop finishes, the range of uninspected elements should be array of size 0!
Point 4: Loop invariants
Since we defined start = 0, end = len - 1, invariants will be like this:
Anything left of start is smaller than target.
Anything right of end is greater than or equal to the target.
Point 5: The answer
Once the loop finishes, basically based on the loop invariants anything to the left of start is smaller. So that means that start is the first element greater than or equal to the target.
Equivalently, anything to the right of end is greater than or equal to the target. So that means the answer is also equal to end + 1.
The code:
public int find(int a[], int target){
int start = 0;
int end = a.length - 1;
while (start <= end){
int mid = (start + end) / 2; // or for no overflow start + (end - start) / 2
if (a[mid] < target)
start = mid + 1;
else // a[mid] >= target
end = mid - 1;
}
return start; // or end + 1;
}
variations:
<
It's equivalent of finding the first 0. So basically only return changes.
return end; // or return start - 1;
>
change the if condition to <= and else will be >. No other change.
<=
Same as >, return end; // or return start - 1;
So in general with this model for all the 5 variations (<=, <, >, >=, normal binary search) only the condition in the if changes and the return statement. And figuring those small changes is super easy when you consider the invariants (point 4) and the answer (point 5).
Hope this clarifies for whoever reads this. If anything is unclear of feels like magic please ping me to explain. After understanding this method, everything for binary search should be as clear as day!
Extra point: It would be a good practice to also try including the start but excluding the end. So the array would be initially [0, len). If you can write the invariants, new condition for the while loop, the answer and then a clear code, it means you learnt the concept.

Binary search(at least the way I implement it) relies on a simple property - a predicate holds true for one end of the interval and does not hold true for the other end. I always consider my interval to be closed at one end and opened at the other. So let's take a look at this code snippet:
int beg = 0; // pred(beg) should hold true
int end = n;// length of an array or a value that is guranteed to be out of the interval that we are interested in
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (pred(a[mid])) {
beg = mid;
} else {
end = mid;
}
}
// answer is at a[beg]
This will work for any of the comparisons you define. Simply replace pred with <=target or >=target or <target or >target.
After the cycle exits, a[beg] will be the last element for which the given inequality holds.
So let's assume(like suggested in the comments) that we want to find the largest number for which a[i] <= target. Then if we use predicate a[i] <= target the code will look like:
int beg = 0; // pred(beg) should hold true
int end = n;// length of an array or a value that is guranteed to be out of the interval that we are interested in
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (a[mid] <= target) {
beg = mid;
} else {
end = mid;
}
}
And after the cycle exits, the index that you are searching for will be beg.
Also depending on the comparison you may have to start from the right end of the array. E.g. if you are searching for the largest value >= target, you will do something of the sort of:
beg = -1;
end = n - 1;
while (end - beg > 1) {
int mid = (end + beg) / 2;
if (a[mid] >= target) {
end = mid;
} else {
beg = mid;
}
}
And the value that you are searching for will be with index end. Note that in this case I consider the interval (beg, end] and thus I've slightly modified the starting interval.

The basic binary search is to search the position/value which equals with the target key. While it can be extended to find the minimal position/value which satisfy some condition, or find the maximal position/value which satisfy some condition.
Suppose the array is ascending order, if no satisfied position/value found, return -1.
Code sample:
// find the minimal position which satisfy some condition
private static int getMinPosition(int[] arr, int target) {
int l = 0, r = arr.length - 1;
int ans = -1;
while(l <= r) {
int m = (l + r) >> 1;
// feel free to replace the condition
// here it means find the minimal position that the element not smaller than target
if(arr[m] >= target) {
ans = m;
r = m - 1;
} else {
l = m + 1;
}
}
return ans;
}
// find the maximal position which satisfy some condition
private static int getMaxPosition(int[] arr, int target) {
int l = 0, r = arr.length - 1;
int ans = -1;
while(l <= r) {
int m = (l + r) >> 1;
// feel free to replace the condition
// here it means find the maximal position that the element less than target
if(arr[m] < target) {
ans = m;
l = m + 1;
} else {
r = m - 1;
}
}
return ans;
}
int[] a = {3, 5, 5, 7, 10, 15};
System.out.println(BinarySearchTool.getMinPosition(a, 5));
System.out.println(BinarySearchTool.getMinPosition(a, 6));
System.out.println(BinarySearchTool.getMaxPosition(a, 8));

What you need is a binary search that lets you participate in the process at the last step. The typical binary search would receive (array, element) and produce a value (normally the index or not found). But if you have a modified binary that accept a function to be invoked at the end of the search you can cover all cases.
For example, in Javascript to make it easy to test, the following binary search
function binarySearch(array, el, fn) {
function aux(left, right) {
if (left > right) {
return fn(array, null, left, right);
}
var middle = Math.floor((left + right) / 2);
var value = array[middle];
if (value > el) {
return aux(left, middle - 1);
} if (value < el) {
return aux(middle + 1, right);
} else {
return fn(array, middle, left, right);
}
}
return aux(0, array.length - 1);
}
would allow you to cover each case with a particular return function.
default
function(a, m) { return m; }
Smallest value >= target
function(a, m, l, r) { return m != null ? a[m] : r + 1 >= a.length ? null : a[r + 1]; }
Smallest value > target
function(a, m, l, r) { return (m || r) + 1 >= a.length ? null : a[(m || r) + 1]; }
Largest value <= target
function(a, m, l, r) { return m != null ? a[m] : l - 1 > 0 ? a[l - 1] : null; }
Largest value < target
function(a, m, l, r) { return (m || l) - 1 < 0 ? null : a[(m || l) - 1]; }

Old Top Coder riddle: Making a number by inserting +

I am thinking about this topcoder problem.
Given a string of digits, find the minimum number of additions required for the string to equal some target number. Each addition is the equivalent of inserting a plus sign somewhere into the string of digits. After all plus signs are inserted, evaluate the sum as usual.
For example, consider "303" and a target sum of 6. The best strategy is "3+03".
I would solve it with brute force as follows:
for each i in 0 to 9 // i -- number of plus signs to insert
for each combination c of i from 10
for each pos in c // we can just split the string w/o inserting plus signs
insert plus sign in position pos
evaluate the expression
if the expression value == given sum
return i
Does it make sense? Is it optimal from the performance point of view?
...
Well, now I see that a dynamic programming solution will be more efficient. However it is interesting if the presented solution makes sense anyway.

It's certainly not optimal. If, for example, you are given the string "1234567890" and the target is a three-digit number, you know that you have to split the string into at least four parts, so you need not check 0, 1, or 2 inserts. Also, the target limits the range of admissible insertion positions. Both points have small impact for short strings, but can make a huge difference for longer ones. However, I suspect there's a vastly better method, smells a bit of DP.

I haven't given it much thought yet, but if you scroll down you can see a link to the contest it was from, and from there you can see the solvers' solutions. Here's one in C#.
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections;
public class QuickSums {
public int minSums(string numbers, int sum) {
int[] arr = new int[numbers.Length];
for (int i = 0 ; i < arr.Length; i++)
arr[i] = 0;
int min = 15;
while (arr[arr.Length - 1] != 2)
{
arr[0]++;
for (int i = 0; i < arr.Length - 1; i++)
if (arr[i] == 2)
{
arr[i] = 0;
arr[i + 1]++;
}
String newString = "";
for (int i = 0; i < numbers.Length; i++)
{
newString+=numbers[i];
if (arr[i] == 1)
newString+="+";
}
String[] nums = newString.Split('+');
int sum1 = 0;
for (int i = 0; i < nums.Length; i++)
try
{
sum1 += Int32.Parse(nums[i]);
}
catch
{
}
if (sum == sum1 && nums.Length - 1 < min)
min = nums.Length - 1;
}
if (min == 15)
return -1;
return min;
}
}

Because input length is small (10) all possible ways (which can be found by a simple binary counter of length 10) is small (2^10 = 1024), so your algorithm is fast enough and returns valid result, and IMO there is no need to improve it.
In all until your solution works fine in time and memory and other given constrains, there is no need to do micro optimization. e.g this case as akappa offered can be solved with DP like DP in two-Partition problem, but when your algorithm is fast there is no need to do this and may be adding some big constant or making code unreadable.
I just offer parse digits of string one time (in array of length 10) to prevent from too many string parsing, and just use a*10^k + ... (Also you can calculate 10^k for k=0..9 in startup and save its value).

I think the problem is similar to Matrix Chain Multiplication problem where we have to put braces for least multiplication. Here braces represent '+'. So I think it could be solved by similar dp approach.. Will try to implement it.

dynamic programming :
public class QuickSums {
public static int req(int n, int[] digits, int sum) {
if (n == 0) {
if (sum == 0)
return 0;
else
return -1;
} else if (n == 1) {
if (sum == digits[0]) {
return 0;
} else {
return -1;
}
}
int deg = 1;
int red = 0;
int opt = 100000;
int split = -1;
for (int i=0; i<n;i++) {
red += digits[n-i-1] * deg;
int t = req(n-i-1,digits,sum - red);
if (t != -1 && t <= opt) {
opt = t;
split = i;
}
deg = deg*10;
}
if (opt == 100000)
return -1;
if (split == n-1)
return opt;
else
return opt + 1;
}
public static int solve (String digits,int sum) {
int [] dig = new int[digits.length()];
for (int i=0;i<digits.length();i++) {
dig[i] = digits.charAt(i) - 48;
}
return req(digits.length(), dig, sum);
}
public static void doit() {
String digits = "9230560001";
int sum = 71;
int result = solve(digits, sum);
System.out.println(result);
}

Seems to be too late .. but just read some comments and answers here which say no to dp approach . But it is a very straightforward dp similar to rod-cutting problem:
To get the essence:
int val[N][N];
int dp[N][T];
val[i][j]: numerical value of s[i..j] including both i and j
val[i][j] can be easily computed using dynamic programming approach in O(N^2) time
dp[i][j] : Minimum no of '+' symbols to be inserted in s[0..i] to get the required sum j
dp[i][j] = min( 1+dp[k][j-val[k+1][j]] ) over all k such that 0<=k<=i and val[k][j]>0
In simple terms , to compute dp[i][j] you assume the position k of last '+' symbol and then recur for s[0..k]

Array of size n, with one element n/2 times

Given an array of n integers, where one element appears more than n/2 times. We need to find that element in linear time and constant extra space.
YAAQ: Yet another arrays question.

I have a sneaking suspicion it's something along the lines of (in C#)
// We don't need an array
public int FindMostFrequentElement(IEnumerable<int> sequence)
{
// Initial value is irrelevant if sequence is non-empty,
// but keeps compiler happy.
int best = 0;
int count = 0;
foreach (int element in sequence)
{
if (count == 0)
{
best = element;
count = 1;
}
else
{
// Vote current choice up or down
count += (best == element) ? 1 : -1;
}
}
return best;
}
It sounds unlikely to work, but it does. (Proof as a postscript file, courtesy of Boyer/Moore.)

Find the median, it takes O(n) on an unsorted array. Since more than n/2 elements are equal to the same value, the median is equal to that value as well.

int findLeader(int n, int* x){
int leader = x[0], c = 1, i;
for(i=1; i<n; i++){
if(c == 0){
leader = x[i];
c = 1;
} else {
if(x[i] == leader) c++;
else c--;
}
}
if(c == 0) return NULL;
else {
c = 0;
for(i=0; i<n; i++){
if(x[i] == leader) c++;
}
if(c > n/2) return leader;
else return NULL;
}
}
I'm not the author of this code, but this will work for your problem. The first part looks for a potential leader, the second checks if it appears more than n/2 times in the array.

This is what I thought initially.
I made an attempt to keep the invariant "one element appears more than n/2 times", while reducing the problem set.
Lets start comparing a[i], a[i+1]. If they're equal we compare a[i+i], a[i+2]. If not, we remove both a[i], a[i+1] from the array. We repeat this until i>=(current size)/2. At this point we'll have 'THE' element occupying the first (current size)/2 positions.
This would maintain the invariant.
The only caveat is that we assume that the array is in a linked list [for it to give a O(n) complexity.]
What say folks?
-bhupi

Well you can do an inplace radix sort as described here[pdf] this takes no extra space and linear time. then you can make a single pass counting consecutive elements and terminating at count > n/2.

How about:
randomly select a small subset of K elements and look for duplicates (e.g. first 4, first 8, etc). If K == 4 then the probability of not getting at least 2 of the duplicates is 1/8. if K==8 then it goes to under 1%. If you find no duplicates repeat the process until you do. (assuming that the other elements are more randomly distributed, this would perform very poorly with, say, 49% of the array = "A", 51% of the array ="B").
e.g.:
findDuplicateCandidate:
select a fixed size subset.
return the most common element in that subset
if there is no element with more than 1 occurrence repeat.
if there is more than 1 element with more than 1 occurrence call findDuplicate and choose the element the 2 calls have in common
This is a constant order operation (if the data set isn't bad) so then do a linear scan of the array in order(N) to verify.

My first thought (not sufficient) would be to:
Sort the array in place
Return the middle element
But that would be O(n log n), as would any recursive solution.
If you can destructively modify the array (and various other conditions apply) you could do a pass replacing elements with their counts or something. Do you know anything else about the array, and are you allowed to modify it?
Edit Leaving my answer here for posterity, but I think Skeet's got it.

in php---pls check if it's correct
function arrLeader( $A ){
$len = count($A);
$B = array();
$val=-1;
$counts = array_count_values(array); //return array with elements as keys and occurrences of each element as values
for($i=0;$i<$len;$i++){
$val = $A[$i];
if(in_array($val,$B,true)){//to avoid looping again and again
}else{
if($counts[$val]>$len/2){
return $val;
}
array_push($B, $val);//to avoid looping again and again
}
}
return -1;
}

int n = A.Length;
int[] L = new int[n + 1];
L[0] = -1;
for (int i = 0; i < n; i++)
{
L[i + 1] = A[i];
}
int count = 0;
int pos = (n + 1) / 2;
int candidate = L[pos];
for (int i = 1; i <= n; i++)
{
if (L[i] == candidate && L[pos++] == candidate)
return candidate;
}
if (count > pos)
return candidate;
return (-1);

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Binary Search Explanation - algorithm

Related

How can I figure out loop invariant in my binary search implementation?

Sort a given array whose elements range from 1 to n , in which one element is missing and one is repeated

Binary search bounds

Old Top Coder riddle: Making a number by inserting +

Array of size n, with one element n/2 times

Categories

Resources