Finding bounded nearest neighbour in a 1-dimensional array - algorithm

Let's say we have some array of boolean values:
A = [0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1 0 0 0 ... 0]
The array is constructed by performing classification on a stream of data. Each element in the array corresponds to the output of a classification algorithm given a small "chunk" of the data. An answer may include restructuring the array to make parsing more efficient.
The array is pseudo random in the sense that groups of 1's and 0's tend to exist in bunches (but not necessarily always).
Given some index, i, what is the most efficient way to find the group of at least n zeros closest to A[i]? For the easy case, take n = 1.
EDIT: Groups should have AT LEAST n zeros. Again, for the easy case, that means at least 1 zero.
EDIT2: This search will be performed o(n) times, where n is size of the array. (Specifically, its n/c, where c is some fixed duration.

In this solution I organize the data so that you can use a binary search O(log n) to find the nearest group of at least a certain size.
I first create groups of zeros from the array, then I put each group of zeros into lists containing all groups of size s or larger , so that when you want to find the nearest group of s s or more then you just run a binary search in the list that has all groups with a size of s or greater.
The downside is in the pre-processing of putting the groups into the lists, with O(n * m) (I think, please check me) time and space efficiency where n is the number of groups of zeros, and m is the max size of the groups, though in reality the efficiency is probably better.
Here is the code:
public static class Group {
final public int x1;
final public int x2;
final public int size;
public Group(int x1, int x2) {
assert x1 <= x2;
this.x1 = x1;
this.x2 = x2;
this.size = x2 - x1 + 1;
}
public static final List<Group> getGroupsOfZeros(byte[] arr) {
List<Group> listOfGroups = new ArrayList<>();
for (int i = 0; i < arr.length; i++) {
if (arr[i] == 0) {
int x1 = i;
for (++i; i < arr.length; i++)
if (arr[i] != 0)
break;
int x2 = i - 1;
listOfGroups.add(new Group(x1, x2));
}
}
return Collections.unmodifiableList(listOfGroups);
}
public static final Group binarySearchNearest(int i, List<Group> list) {
{ // edge cases
Group firstGroup = list.get(0);
if (i <= firstGroup.x2)
return firstGroup;
Group lastGroup = list.get(list.size() - 1);
if (i >= lastGroup.x1)
return lastGroup;
}
int lo = 0;
int hi = list.size() - 1;
while (lo <= hi) {
int mid = (hi + lo) / 2;
Group currGroup = list.get(mid);
if (i < currGroup.x1) {
hi = mid - 1;
} else if (i > currGroup.x2) {
lo = mid + 1;
} else {
// x1 <= i <= x2
return currGroup;
}
}
// intentionally swapped because: lo == hi + 1
Group lowGroup = list.get(hi);
Group highGroup = list.get(lo);
return (i - lowGroup.x2) < (highGroup.x1 - i) ? lowGroup : highGroup;
}
}
NOTE: GroupsBySize can be improved, as described by #maraca to only contain a list of Groups per each distinct group size. I'll update tomorrow.
public static class GroupsBySize {
private List<List<Group>> listOfGroupsBySize = new ArrayList<>();
public GroupsBySize(List<Group> groups) {
for (Group group : groups) {
// ensure internal array can groups up to this size
while (listOfGroupsBySize.size() < group.size) {
listOfGroupsBySize.add(new ArrayList<Group>());
}
// add group to all lists up to its size
for (int i = 0; i < group.size; i++) {
listOfGroupsBySize.get(i).add(group);
}
}
}
public final Group getNearestGroupOfAtLeastSize(int index, int atLeastSize) {
if (atLeastSize < 1)
throw new IllegalArgumentException("group size must be greater than 0");
List<Group> groupsOfAtLeastSize = listOfGroupsBySize.get(atLeastSize - 1);
return Group.binarySearchNearest(index, groupsOfAtLeastSize);
}
}
public static void main(String[] args) {
byte[] byteArray = null;
List<Group> groups = Group.getGroupsOfZeros(byteArray);
GroupsBySize groupsBySize = new GroupsBySize(groups);
int index = 12;
int atLeastSize = 5;
Group g = groupsBySize.getNearestGroupOfAtLeastSize(index, atLeastSize);
System.out.println("nearest group is (" + g.x1 + ":" + g.x2 + ") of size " + g.size);
}

If you have n queries on an array of size n, then the naive approach would take O(n^2) time.
You can optimize this by incorporating the observation that the number of distinct group sizes is in the order of sqrt(n), because the most distinct group sizes we get if we have one group of size 1, one of size 2, one of size 3 and so on, we know that 1 + 2 + 3 + ... + n is n * (n + 1) / 2, so in the order of n^2, but the array has size n, so the number of distinct group sizes is in the order of sqrt(n).
create an integer array of size n to denote which group sizes are present how many times
create a list for the 0-groups, each element should contain the group size and starting index
scan the array, add the 0-groups to the list and update the present group sizes
create an array for the different group sizes, each entry should contain the group size and an array with the start indices of the groups
create an integer array or a map which tells you which group size is at which index by scanning the array of the present group sizes
go through the list of 0-groups and fill the start index arrays created at 4.
We end up with an array which takes O(n) space, takes O(n) time to create and contains all present group sizes in order, additionally each entry has an array with the start indices of the groups of that size.
To answer a query we can do a binary search on the start indices of all groups greater or equal than the given minimum group size. This takes O(log(n)*sqrt(n)) and we do it n times, so over all it would take O(n*log(n)*sqrt(n)) = O(n^1.5*log(n)) which is better than O(n^2).
I think you can get it down to O(n^1.5) by creating a structure which has all distinct group sizes but contains not only the groups of that size, but also the groups that are bigger than that size. This would be the time complexity to create the structure and answering all the n queries would be faster O(n*log(sqrt(n))*log(n)) I think, so it doesn't matter.
example:
[0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0, 1, 0, 0] -- 0 indexed array
hashmap = {1:[0], 2:[15, 18], 7:[5]}
search(i = 7, n = 2) {
binary search in {2:[15, 18], 7:[5]}
return min(15, 5)
}

what is the most efficient way to find the group of at least n zeros closest to A[i]
If we are not limited in preprocessing time and resources, the most efficient way would seem to be O(1) time and O(n * sqrt n) space, storing the answers to all possible queries. (To accomplish that, run the algorithm below with a list of all possible queries, that is each distinct zero-group size in the array paired with each index.)
If we are provided with all the n / c queries at once, we can produce the complete result set in O(n log n) time.
Traverse once from the left and once from the right. For each traversal, start with a balanced binary tree with our queries, sorted by zero-group-size (the n in the query), where each node has a sorted list of the query indexes (all is with this particular n).
At each iteration, when a zero-group is registered, update all queries with n equal and lower than this zero-group size, removing all equal and lower indexes from the node and keeping the records for them (since the index list is sorted, we just remove the head of the list while it's equal or lower than the current index), and storing the current index of the zero-group in the node as well (the "last seen" zero-group-index). If no is are left in the node, remove it.
After the traversal, assign each node's "last seen" zero-group-index to any remaining is in that node. Now we have all the answers for this traversal. (Any queries left in the tree have no answer.) In the opposite traversal, if any query comes up with a better (closer) answer, update it in the final record.

Related

Efficient algorithm to search a element in rectangular Young Tableau [duplicate]

I was recently given this interview question and I'm curious what a good solution to it would be.
Say I'm given a 2d array where all the
numbers in the array are in increasing
order from left to right and top to
bottom.
What is the best way to search and
determine if a target number is in the
array?
Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off.
Another solution I thought may work is to start somewhere in the middle. If the middle value is less than my target, then I can be sure it is in the left square portion of the matrix from the middle. I then move diagonally and check again, reducing the size of the square that the target could potentially be in until I have honed in on the target number.
Does anyone have any good ideas on solving this problem?
Example array:
Sorted left to right, top to bottom.
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Here's a simple approach:
Start at the bottom-left corner.
If the target is less than that value, it must be above us, so move up one.
Otherwise we know that the target can't be in that column, so move right one.
Goto 2.
For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)
Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M are small, you could use a binary search approach to do this in something approaching logarithmic time.
Here are some details, for those who are curious:
History
This simple algorithm is called a Saddleback Search. It's been around for a while, and it is optimal when N == M. Some references:
David Gries, The Science of Programming. Springer-Verlag, 1989.
Edsgar Dijkstra, The Saddleback Search. Note EWD-934, 1985.
However, when N < M, intuition suggests that binary search should be able to do better than O(N+M): For example, when N == 1, a pure binary search will run in logarithmic rather than linear time.
Worst-case bound
Richard Bird examined this intuition that binary search could improve the Saddleback algorithm in a 2006 paper:
Richard S. Bird, Improving Saddleback Search: A Lesson in Algorithm Design, in Mathematics of Program Construction, pp. 82--89, volume 4014, 2006.
Using a rather unusual conversational technique, Bird shows us that for N <= M, this problem has a lower bound of Ω(N * log(M/N)). This bound make sense, as it gives us linear performance when N == M and logarithmic performance when N == 1.
Algorithms for rectangular arrays
One approach that uses a row-by-row binary search looks like this:
Start with a rectangular array where N < M. Let's say N is rows and M is columns.
Do a binary search on the middle row for value. If we find it, we're done.
Otherwise we've found an adjacent pair of numbers s and g, where s < value < g.
The rectangle of numbers above and to the left of s is less than value, so we can eliminate it.
The rectangle below and to the right of g is greater than value, so we can eliminate it.
Go to step (2) for each of the two remaining rectangles.
In terms of worst-case complexity, this algorithm does log(M) work to eliminate half the possible solutions, and then recursively calls itself twice on two smaller problems. We do have to repeat a smaller version of that log(M) work for every row, but if the number of rows is small compared to the number of columns, then being able to eliminate all of those columns in logarithmic time starts to become worthwhile.
This gives the algorithm a complexity of T(N,M) = log(M) + 2 * T(M/2, N/2), which Bird shows to be O(N * log(M/N)).
Another approach posted by Craig Gidney describes an algorithm similar the approach above: it examines a row at a time using a step size of M/N. His analysis shows that this results in O(N * log(M/N)) performance as well.
Performance Comparison
Big-O analysis is all well and good, but how well do these approaches work in practice? The chart below examines four algorithms for increasingly "square" arrays:
(The "naive" algorithm simply searches every element of the array. The "recursive" algorithm is described above. The "hybrid" algorithm is an implementation of Gidney's algorithm. For each array size, performance was measured by timing each algorithm over fixed set of 1,000,000 randomly-generated arrays.)
Some notable points:
As expected, the "binary search" algorithms offer the best performance on rectangular arrays and the Saddleback algorithm works the best on square arrays.
The Saddleback algorithm performs worse than the "naive" algorithm for 1-d arrays, presumably because it does multiple comparisons on each item.
The performance hit that the "binary search" algorithms take on square arrays is presumably due to the overhead of running repeated binary searches.
Summary
Clever use of binary search can provide O(N * log(M/N) performance for both rectangular and square arrays. The O(N + M) "saddleback" algorithm is much simpler, but suffers from performance degradation as arrays become increasingly rectangular.
This problem takes Θ(b lg(t)) time, where b = min(w,h) and t=b/max(w,h). I discuss the solution in this blog post.
Lower bound
An adversary can force an algorithm to make Ω(b lg(t)) queries, by restricting itself to the main diagonal:
Legend: white cells are smaller items, gray cells are larger items, yellow cells are smaller-or-equal items and orange cells are larger-or-equal items. The adversary forces the solution to be whichever yellow or orange cell the algorithm queries last.
Notice that there are b independent sorted lists of size t, requiring Ω(b lg(t)) queries to completely eliminate.
Algorithm
(Assume without loss of generality that w >= h)
Compare the target item against the cell t to the left of the top right corner of the valid area
If the cell's item matches, return the current position.
If the cell's item is less than the target item, eliminate the remaining t cells in the row with a binary search. If a matching item is found while doing this, return with its position.
Otherwise the cell's item is more than the target item, eliminating t short columns.
If there's no valid area left, return failure
Goto step 2
Finding an item:
Determining an item doesn't exist:
Legend: white cells are smaller items, gray cells are larger items, and the green cell is an equal item.
Analysis
There are b*t short columns to eliminate. There are b long rows to eliminate. Eliminating a long row costs O(lg(t)) time. Eliminating t short columns costs O(1) time.
In the worst case we'll have to eliminate every column and every row, taking time O(lg(t)*b + b*t*1/t) = O(b lg(t)).
Note that I'm assuming lg clamps to a result above 1 (i.e. lg(x) = log_2(max(2,x))). That's why when w=h, meaning t=1, we get the expected bound of O(b lg(1)) = O(b) = O(w+h).
Code
public static Tuple<int, int> TryFindItemInSortedMatrix<T>(this IReadOnlyList<IReadOnlyList<T>> grid, T item, IComparer<T> comparer = null) {
if (grid == null) throw new ArgumentNullException("grid");
comparer = comparer ?? Comparer<T>.Default;
// check size
var width = grid.Count;
if (width == 0) return null;
var height = grid[0].Count;
if (height < width) {
var result = grid.LazyTranspose().TryFindItemInSortedMatrix(item, comparer);
if (result == null) return null;
return Tuple.Create(result.Item2, result.Item1);
}
// search
var minCol = 0;
var maxRow = height - 1;
var t = height / width;
while (minCol < width && maxRow >= 0) {
// query the item in the minimum column, t above the maximum row
var luckyRow = Math.Max(maxRow - t, 0);
var cmpItemVsLucky = comparer.Compare(item, grid[minCol][luckyRow]);
if (cmpItemVsLucky == 0) return Tuple.Create(minCol, luckyRow);
// did we eliminate t rows from the bottom?
if (cmpItemVsLucky < 0) {
maxRow = luckyRow - 1;
continue;
}
// we eliminated most of the current minimum column
// spend lg(t) time eliminating rest of column
var minRowInCol = luckyRow + 1;
var maxRowInCol = maxRow;
while (minRowInCol <= maxRowInCol) {
var mid = minRowInCol + (maxRowInCol - minRowInCol + 1) / 2;
var cmpItemVsMid = comparer.Compare(item, grid[minCol][mid]);
if (cmpItemVsMid == 0) return Tuple.Create(minCol, mid);
if (cmpItemVsMid > 0) {
minRowInCol = mid + 1;
} else {
maxRowInCol = mid - 1;
maxRow = mid - 1;
}
}
minCol += 1;
}
return null;
}
I would use the divide-and-conquer strategy for this problem, similar to what you suggested, but the details are a bit different.
This will be a recursive search on subranges of the matrix.
At each step, pick an element in the middle of the range. If the value found is what you are seeking, then you're done.
Otherwise, if the value found is less than the value that you are seeking, then you know that it is not in the quadrant above and to the left of your current position. So recursively search the two subranges: everything (exclusively) below the current position, and everything (exclusively) to the right that is at or above the current position.
Otherwise, (the value found is greater than the value that you are seeking) you know that it is not in the quadrant below and to the right of your current position. So recursively search the two subranges: everything (exclusively) to the left of the current position, and everything (exclusively) above the current position that is on the current column or a column to the right.
And ba-da-bing, you found it.
Note that each recursive call only deals with the current subrange only, not (for example) ALL rows above the current position. Just those in the current subrange.
Here's some pseudocode for you:
bool numberSearch(int[][] arr, int value, int minX, int maxX, int minY, int maxY)
if (minX == maxX and minY == maxY and arr[minX,minY] != value)
return false
if (arr[minX,minY] > value) return false; // Early exits if the value can't be in
if (arr[maxX,maxY] < value) return false; // this subrange at all.
int nextX = (minX + maxX) / 2
int nextY = (minY + maxY) / 2
if (arr[nextX,nextY] == value)
{
print nextX,nextY
return true
}
else if (arr[nextX,nextY] < value)
{
if (numberSearch(arr, value, minX, maxX, nextY + 1, maxY))
return true
return numberSearch(arr, value, nextX + 1, maxX, minY, nextY)
}
else
{
if (numberSearch(arr, value, minX, nextX - 1, minY, maxY))
return true
reutrn numberSearch(arr, value, nextX, maxX, minY, nextY)
}
The two main answers give so far seem to be the arguably O(log N) "ZigZag method" and the O(N+M) Binary Search method. I thought I'd do some testing comparing the two methods with some various setups. Here are the details:
The array is N x N square in every test, with N varying from 125 to 8000 (the largest my JVM heap could handle). For each array size, I picked a random place in the array to put a single 2. I then put a 3 everywhere possible (to the right and below of the 2) and then filled the rest of the array with 1. Some of the earlier commenters seemed to think this type of setup would yield worst case run time for both algorithms. For each array size, I picked 100 different random locations for the 2 (search target) and ran the test. I recorded avg run time and worst case run time for each algorithm. Because it was happening too fast to get good ms readings in Java, and because I don't trust Java's nanoTime(), I repeated each test 1000 times just to add a uniform bias factor to all the times. Here are the results:
ZigZag beat binary in every test for both avg and worst case times, however, they are all within an order of magnitude of each other more or less.
Here is the Java code:
public class SearchSortedArray2D {
static boolean findZigZag(int[][] a, int t) {
int i = 0;
int j = a.length - 1;
while (i <= a.length - 1 && j >= 0) {
if (a[i][j] == t) return true;
else if (a[i][j] < t) i++;
else j--;
}
return false;
}
static boolean findBinarySearch(int[][] a, int t) {
return findBinarySearch(a, t, 0, 0, a.length - 1, a.length - 1);
}
static boolean findBinarySearch(int[][] a, int t,
int r1, int c1, int r2, int c2) {
if (r1 > r2 || c1 > c2) return false;
if (r1 == r2 && c1 == c2 && a[r1][c1] != t) return false;
if (a[r1][c1] > t) return false;
if (a[r2][c2] < t) return false;
int rm = (r1 + r2) / 2;
int cm = (c1 + c2) / 2;
if (a[rm][cm] == t) return true;
else if (a[rm][cm] > t) {
boolean b1 = findBinarySearch(a, t, r1, c1, r2, cm - 1);
boolean b2 = findBinarySearch(a, t, r1, cm, rm - 1, c2);
return (b1 || b2);
} else {
boolean b1 = findBinarySearch(a, t, r1, cm + 1, rm, c2);
boolean b2 = findBinarySearch(a, t, rm + 1, c1, r2, c2);
return (b1 || b2);
}
}
static void randomizeArray(int[][] a, int N) {
int ri = (int) (Math.random() * N);
int rj = (int) (Math.random() * N);
a[ri][rj] = 2;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
if (i == ri && j == rj) continue;
else if (i > ri || j > rj) a[i][j] = 3;
else a[i][j] = 1;
}
}
}
public static void main(String[] args) {
int N = 8000;
int[][] a = new int[N][N];
int randoms = 100;
int repeats = 1000;
long start, end, duration;
long zigMin = Integer.MAX_VALUE, zigMax = Integer.MIN_VALUE;
long binMin = Integer.MAX_VALUE, binMax = Integer.MIN_VALUE;
long zigSum = 0, zigAvg;
long binSum = 0, binAvg;
for (int k = 0; k < randoms; k++) {
randomizeArray(a, N);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findZigZag(a, 2);
end = System.currentTimeMillis();
duration = end - start;
zigSum += duration;
zigMin = Math.min(zigMin, duration);
zigMax = Math.max(zigMax, duration);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findBinarySearch(a, 2);
end = System.currentTimeMillis();
duration = end - start;
binSum += duration;
binMin = Math.min(binMin, duration);
binMax = Math.max(binMax, duration);
}
zigAvg = zigSum / randoms;
binAvg = binSum / randoms;
System.out.println(findZigZag(a, 2) ?
"Found via zigzag method. " : "ERROR. ");
//System.out.println("min search time: " + zigMin + "ms");
System.out.println("max search time: " + zigMax + "ms");
System.out.println("avg search time: " + zigAvg + "ms");
System.out.println();
System.out.println(findBinarySearch(a, 2) ?
"Found via binary search method. " : "ERROR. ");
//System.out.println("min search time: " + binMin + "ms");
System.out.println("max search time: " + binMax + "ms");
System.out.println("avg search time: " + binAvg + "ms");
}
}
This is a short proof of the lower bound on the problem.
You cannot do it better than linear time (in terms of array dimensions, not the number of elements). In the array below, each of the elements marked as * can be either 5 or 6 (independently of other ones). So if your target value is 6 (or 5) the algorithm needs to examine all of them.
1 2 3 4 *
2 3 4 * 7
3 4 * 7 8
4 * 7 8 9
* 7 8 9 10
Of course this expands to bigger arrays as well. This means that this answer is optimal.
Update: As pointed out by Jeffrey L Whitledge, it is only optimal as the asymptotic lower bound on running time vs input data size (treated as a single variable). Running time treated as two-variable function on both array dimensions can be improved.
I think Here is the answer and it works for any kind of sorted matrix
bool findNum(int arr[][ARR_MAX],int xmin, int xmax, int ymin,int ymax,int key)
{
if (xmin > xmax || ymin > ymax || xmax < xmin || ymax < ymin) return false;
if ((xmin == xmax) && (ymin == ymax) && (arr[xmin][ymin] != key)) return false;
if (arr[xmin][ymin] > key || arr[xmax][ymax] < key) return false;
if (arr[xmin][ymin] == key || arr[xmax][ymax] == key) return true;
int xnew = (xmin + xmax)/2;
int ynew = (ymin + ymax)/2;
if (arr[xnew][ynew] == key) return true;
if (arr[xnew][ynew] < key)
{
if (findNum(arr,xnew+1,xmax,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ynew+1,ymax,key));
} else {
if (findNum(arr,xmin,xnew-1,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ymin,ynew-1,key));
}
}
Interesting question. Consider this idea - create one boundary where all the numbers are greater than your target and another where all the numbers are less than your target. If anything is left in between the two, that's your target.
If I'm looking for 3 in your example, I read across the first row until I hit 4, then look for the smallest adjacent number (including diagonals) greater than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I do the same for those numbers less than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I ask, is anything inside the two boundaries? If yes, it must be 3. If no, then there is no 3. Sort of indirect since I don't actually find the number, I just deduce that it must be there. This has the added bonus of counting ALL the 3's.
I tried this on some examples and it seems to work OK.
Binary search through the diagonal of the array is the best option.
We can find out whether the element is less than or equal to the elements in the diagonal.
I've been asking this question in interviews for the better part of a decade and I think there's only been one person who has been able to come up with an optimal algorithm.
My solution has always been:
Binary search the middle diagonal, which is the diagonal running down and right, containing the item at (rows.count/2, columns.count/2).
If the target number is found, return true.
Otherwise, two numbers (u and v) will have been found such that u is smaller than the target, v is larger than the target, and v is one right and one down from u.
Recursively search the sub-matrix to the right of u and top of v and the one to the bottom of u and left of v.
I believe this is a strict improvement over the algorithm given by Nate here, since searching the diagonal often allows a reduction of over half the search space (if the matrix is close to square), whereas searching a row or column always results in an elimination of exactly half.
Here's the code in (probably not terribly Swifty) Swift:
import Cocoa
class Solution {
func searchMatrix(_ matrix: [[Int]], _ target: Int) -> Bool {
if (matrix.isEmpty || matrix[0].isEmpty) {
return false
}
return _searchMatrix(matrix, 0..<matrix.count, 0..<matrix[0].count, target)
}
func _searchMatrix(_ matrix: [[Int]], _ rows: Range<Int>, _ columns: Range<Int>, _ target: Int) -> Bool {
if (rows.count == 0 || columns.count == 0) {
return false
}
if (rows.count == 1) {
return _binarySearch(matrix, rows.lowerBound, columns, target, true)
}
if (columns.count == 1) {
return _binarySearch(matrix, columns.lowerBound, rows, target, false)
}
var lowerInflection = (-1, -1)
var upperInflection = (Int.max, Int.max)
var currentRows = rows
var currentColumns = columns
while (currentRows.count > 0 && currentColumns.count > 0 && upperInflection.0 > lowerInflection.0+1) {
let rowMidpoint = (currentRows.upperBound + currentRows.lowerBound) / 2
let columnMidpoint = (currentColumns.upperBound + currentColumns.lowerBound) / 2
let value = matrix[rowMidpoint][columnMidpoint]
if (value == target) {
return true
}
if (value > target) {
upperInflection = (rowMidpoint, columnMidpoint)
currentRows = currentRows.lowerBound..<rowMidpoint
currentColumns = currentColumns.lowerBound..<columnMidpoint
} else {
lowerInflection = (rowMidpoint, columnMidpoint)
currentRows = rowMidpoint+1..<currentRows.upperBound
currentColumns = columnMidpoint+1..<currentColumns.upperBound
}
}
if (lowerInflection.0 == -1) {
lowerInflection = (upperInflection.0-1, upperInflection.1-1)
} else if (upperInflection.0 == Int.max) {
upperInflection = (lowerInflection.0+1, lowerInflection.1+1)
}
return _searchMatrix(matrix, rows.lowerBound..<lowerInflection.0+1, upperInflection.1..<columns.upperBound, target) || _searchMatrix(matrix, upperInflection.0..<rows.upperBound, columns.lowerBound..<lowerInflection.1+1, target)
}
func _binarySearch(_ matrix: [[Int]], _ rowOrColumn: Int, _ range: Range<Int>, _ target: Int, _ searchRow : Bool) -> Bool {
if (range.isEmpty) {
return false
}
let midpoint = (range.upperBound + range.lowerBound) / 2
let value = (searchRow ? matrix[rowOrColumn][midpoint] : matrix[midpoint][rowOrColumn])
if (value == target) {
return true
}
if (value > target) {
return _binarySearch(matrix, rowOrColumn, range.lowerBound..<midpoint, target, searchRow)
} else {
return _binarySearch(matrix, rowOrColumn, midpoint+1..<range.upperBound, target, searchRow)
}
}
}
A. Do a binary search on those lines where the target number might be on.
B. Make it a graph : Look for the number by taking always the smallest unvisited neighbour node and backtracking when a too big number is found
Binary search would be the best approach, imo. Starting at 1/2 x, 1/2 y will cut it in half. IE a 5x5 square would be something like x == 2 / y == 3 . I rounded one value down and one value up to better zone in on the direction of the targeted value.
For clarity the next iteration would give you something like x == 1 / y == 2 OR x == 3 / y == 5
Well, to begin with, let us assume we are using a square.
1 2 3
2 3 4
3 4 5
1. Searching a square
I would use a binary search on the diagonal. The goal is the locate the smaller number that is not strictly lower than the target number.
Say I am looking for 4 for example, then I would end up locating 5 at (2,2).
Then, I am assured that if 4 is in the table, it is at a position either (x,2) or (2,x) with x in [0,2]. Well, that's just 2 binary searches.
The complexity is not daunting: O(log(N)) (3 binary searches on ranges of length N)
2. Searching a rectangle, naive approach
Of course, it gets a bit more complicated when N and M differ (with a rectangle), consider this degenerate case:
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
And let's say I am looking for 9... The diagonal approach is still good, but the definition of diagonal changes. Here my diagonal is [1, (5 or 6), 17]. Let's say I picked up [1,5,17], then I know that if 9 is in the table it is either in the subpart:
5 6 7 8
6 7 8 9
10 11 12 13 14 15 16
This gives us 2 rectangles:
5 6 7 8 10 11 12 13 14 15 16
6 7 8 9
So we can recurse! probably beginning by the one with less elements (though in this case it kills us).
I should point that if one of the dimensions is less than 3, we cannot apply the diagonal methods and must use a binary search. Here it would mean:
Apply binary search on 10 11 12 13 14 15 16, not found
Apply binary search on 5 6 7 8, not found
Apply binary search on 6 7 8 9, not found
It's tricky because to get good performance you might want to differentiate between several cases, depending on the general shape....
3. Searching a rectangle, brutal approach
It would be much easier if we dealt with a square... so let's just square things up.
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
17 . . . . . . 17
. .
. .
. .
17 . . . . . . 17
We now have a square.
Of course, we will probably NOT actually create those rows, we could simply emulate them.
def get(x,y):
if x < N and y < M: return table[x][y]
else: return table[N-1][M-1] # the max
so it behaves like a square without occupying more memory (at the cost of speed, probably, depending on cache... oh well :p)
EDIT:
I misunderstood the question. As the comments point out this only works in the more restricted case.
In a language like C that stores data in row-major order, simply treat it as a 1D array of size n * m and use a binary search.
I have a recursive Divide & Conquer Solution.
Basic Idea for one step is: We know that the Left-Upper(LU) is smallest and the right-bottom(RB) is the largest no., so the given No(N) must: N>=LU and N<=RB
IF N==LU and N==RB::::Element Found and Abort returning the position/Index
If N>=LU and N<=RB = FALSE, No is not there and abort.
If N>=LU and N<=RB = TRUE, Divide the 2D array in 4 equal parts of 2D array each in logical manner..
And then apply the same algo step to all four sub-array.
My Algo is Correct I have implemented on my friends PC.
Complexity: each 4 comparisons can b used to deduce the total no of elements to one-fourth at its worst case..
So My complexity comes to be 1 + 4 x lg(n) + 4
But really expected this to be working on O(n)
I think something is wrong somewhere in my calculation of Complexity, please correct if so..
The optimal solution is to start at the top-left corner, that has minimal value. Move diagonally downwards to the right until you hit an element whose value >= value of the given element. If the element's value is equal to that of the given element, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the below steps until i-k >= 0
Search if a[i-k][j] is equal to the given element. if yes, return found as true.
Search if a[i][j-k] is equal to the given element. if yes, return found as true.
Increment k
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
public boolean searchSortedMatrix(int arr[][] , int key , int minX , int maxX , int minY , int maxY){
// base case for recursion
if(minX > maxX || minY > maxY)
return false ;
// early fails
// array not properly intialized
if(arr==null || arr.length==0)
return false ;
// arr[0][0]> key return false
if(arr[minX][minY]>key)
return false ;
// arr[maxX][maxY]<key return false
if(arr[maxX][maxY]<key)
return false ;
//int temp1 = minX ;
//int temp2 = minY ;
int midX = (minX+maxX)/2 ;
//if(temp1==midX){midX+=1 ;}
int midY = (minY+maxY)/2 ;
//if(temp2==midY){midY+=1 ;}
// arr[midX][midY] = key ? then value found
if(arr[midX][midY] == key)
return true ;
// alas ! i have to keep looking
// arr[midX][midY] < key ? search right quad and bottom matrix ;
if(arr[midX][midY] < key){
if( searchSortedMatrix(arr ,key , minX,maxX , midY+1 , maxY))
return true ;
// search bottom half of matrix
if( searchSortedMatrix(arr ,key , midX+1,maxX , minY , maxY))
return true ;
}
// arr[midX][midY] > key ? search left quad matrix ;
else {
return(searchSortedMatrix(arr , key , minX,midX-1,minY,midY-1));
}
return false ;
}
I suggest, store all characters in a 2D list. then find index of required element if it exists in list.
If not present print appropriate message else print row and column as:
row = (index/total_columns) and column = (index%total_columns -1)
This will incur only the binary search time in a list.
Please suggest any corrections. :)
If O(M log(N)) solution is ok for an MxN array -
template <size_t n>
struct MN * get(int a[][n], int k, int M, int N){
struct MN *result = new MN;
result->m = -1;
result->n = -1;
/* Do a binary search on each row since rows (and columns too) are sorted. */
for(int i = 0; i < M; i++){
int lo = 0; int hi = N - 1;
while(lo <= hi){
int mid = lo + (hi-lo)/2;
if(k < a[i][mid]) hi = mid - 1;
else if (k > a[i][mid]) lo = mid + 1;
else{
result->m = i;
result->n = mid;
return result;
}
}
}
return result;
}
Working C++ demo.
Please do let me know if this wouldn't work or if there is a bug it it.
class Solution {
public boolean searchMatrix(int[][] matrix, int target) {
if(matrix == null)
return false;
int i=0;
int j=0;
int m = matrix.length;
int n = matrix[0].length;
boolean found = false;
while(i<m && !found){
while(j<n && !found){
if(matrix[i][j] == target)
found = true;
if(matrix[i][j] < target)
j++;
else
break;
}
i++;
j=0;
}
return found;
}}
129 / 129 test cases passed.
Status: Accepted
Runtime: 39 ms
Memory Usage: 55 MB
Given a square matrix as follows:
[ a b c ]
[ d e f ]
[ i j k ]
We know that a < c, d < f, i < k. What we don't know is whether d < c or d > c, etc. We have guarantees only in 1-dimension.
Looking at the end elements (c,f,k), we can do a sort of filter: is N < c ? search() : next(). Thus, we have n iterations over the rows, with each row taking either O( log( n ) ) for binary search or O( 1 ) if filtered out.
Let me given an EXAMPLE where N = j,
1) Check row 1. j < c? (no, go next)
2) Check row 2. j < f? (yes, bin search gets nothing)
3) Check row 3. j < k? (yes, bin search finds it)
Try again with N = q,
1) Check row 1. q < c? (no, go next)
2) Check row 2. q < f? (no, go next)
3) Check row 3. q < k? (no, go next)
There is probably a better solution out there but this is easy to explain.. :)
As this is an interview question, it would seem to lead towards a discussion of Parallel programming and Map-reduce algorithms.
See http://code.google.com/intl/de/edu/parallel/mapreduce-tutorial.html

Return the number of elements of an array that is the most "expensive"

I recently stumbled upon an interesting problem, an I am wondering if my solution is optimal.
You are given an array of zeros and ones. The goal is to return the
amount zeros and the amount of ones in the most expensive sub-array.
The cost of an array is the amount of 1s divided by amount of 0s. In
case there are no zeros in the sub-array, the cost is zero.
At first I tried brute-forcing, but for an array of 10,000 elements it was far too slow and I ran out of memory.
My second idea was instead of creating those sub-arrays, to remember the start and the end of the sub-array. That way I saved a lot of memory, but the complexity was still O(n2).
My final solution that I came up is I think O(n). It goes like this:
Start at the beginning of the array, for each element, calculate the cost of the sub-arrays starting from 1, ending at the current index. So we would start with a sub-array consisting of the first element, then first and second etc. Since the only thing that we need to calculate the cost, is the amount of 1s and 0s in the sub-array, I could find the optimal end of the sub-array.
The second step was to start from the end of the sub-array from step one, and repeat the same to find the optimal beginning. That way I am sure that there is no better combination in the whole array.
Is this solution correct? If not, is there a counter-example that will show that this solution is incorrect?
Edit
For clarity:
Let's say our input array is 0101.
There are 10 subarrays:
0,1,0,1,01,10,01,010,101 and 0101.
The cost of the most expensive subarray would be 2 since 101 is the most expensive subarray. So the algorithm should return 1,2
Edit 2
There is one more thing that I forgot, if 2 sub-arrays have the same cost, the longer one is "more expensive".
Let me sketch a proof for my assumption:
(a = whole array, *=zero or more, +=one or more, {n}=exactly n)
Cases a=0* and a=1+ : c=0
Cases a=01+ and a=1+0 : conforms to 1*0{1,2}1*, a is optimum
For the normal case, a contains one or more 0s and 1s.
This means there is some optimum sub-array of non-zero cost.
(S) Assume s is an optimum sub-array of a.
It contains one or more zeros. (Otherwise its cost would be zero).
(T) Let t be the longest `1*0{1,2}+1*` sequence within s
(and among the equally long the one with with most 1s).
(Note: There is always one such, e.g. `10` or `01`.)
Let N be the number of 1s in t.
Now, we prove that always t = s.
By showing it is not possible to add adjacent parts of s to t if (S).
(E) Assume t shorter than s.
We cannot add 1s at either side, otherwise not (T).
For each 0 we add from s, we have to add at least N more 1s
later to get at least the same cost as our `1*0+1*`.
This means: We have to add at least one run of N 1s.
If we add some run of N+1, N+2 ... somewhere than not (T).
If we add consecutive zeros, we need to compensate
with longer runs of 1s, thus not (T).
This leaves us with the only option of adding single zeors and runs of N 1s each.
This would give (symmetry) `1{n}*0{1,2}1{m}01{n+m}...`
If m>0 then `1{m}01{n+m}` is longer than `1{n}0{1,2}1{m}`, thus not (T).
If m=0 then we get `1{n}001{n}`, thus not (T).
So assumption (E) must be wrong.
Conclusion: The optimum sub-array must conform to 1*0{1,2}1*.
Here is my O(n) impl in Java according to the assumption in my last comment (1*01* or 1*001*):
public class Q19596345 {
public static void main(String[] args) {
try {
String array = "0101001110111100111111001111110";
System.out.println("array=" + array);
SubArray current = new SubArray();
current.array = array;
SubArray best = (SubArray) current.clone();
for (int i = 0; i < array.length(); i++) {
current.accept(array.charAt(i));
SubArray candidate = (SubArray) current.clone();
candidate.trim();
if (candidate.cost() > best.cost()) {
best = candidate;
System.out.println("better: " + candidate);
}
}
System.out.println("best: " + best);
} catch (Exception ex) { ex.printStackTrace(System.err); }
}
static class SubArray implements Cloneable {
String array;
int start, leftOnes, zeros, rightOnes;
// optimize 1*0*1* by cutting
void trim() {
if (zeros > 1) {
if (leftOnes < rightOnes) {
start += leftOnes + (zeros - 1);
leftOnes = 0;
zeros = 1;
} else if (leftOnes > rightOnes) {
zeros = 1;
rightOnes = 0;
}
}
}
double cost() {
if (zeros == 0) return 0;
else return (leftOnes + rightOnes) / (double) zeros +
(leftOnes + zeros + rightOnes) * 0.00001;
}
void accept(char c) {
if (c == '1') {
if (zeros == 0) leftOnes++;
else rightOnes++;
} else {
if (rightOnes > 0) {
start += leftOnes + zeros;
leftOnes = rightOnes;
zeros = 0;
rightOnes = 0;
}
zeros++;
}
}
public Object clone() throws CloneNotSupportedException { return super.clone(); }
public String toString() { return String.format("%s at %d with cost %.3f with zeros,ones=%d,%d",
array.substring(start, start + leftOnes + zeros + rightOnes), start, cost(), zeros, leftOnes + rightOnes);
}
}
}
If we can show the max array is always 1+0+1+, 1+0, or 01+ (Regular expression notation then we can calculate the number of runs
So for the array (010011), we have (always starting with a run of 1s)
0,1,1,2,2
so the ratios are (0, 1, 0.3, 1.5, 1), which leads to an array of 10011 as the final result, ignoring the one runs
Cost of the left edge is 0
Cost of the right edge is 2
So in this case, the right edge is the correct answer -- 011
I haven't yet been able to come up with a counterexample, but the proof isn't obvious either. Hopefully we can crowd source one :)
The degenerate cases are simpler
All 1's and 0's are obvious, as they all have the same cost.
A string of just 1+,0+ or vice versa is all the 1's and a single 0.
How about this? As a C# programmer, I am thinking we can use something like Dictionary of <int,int,int>.
The first int would be use as key, second as subarray number and the third would be for the elements of sub-array.
For your example
key|Sub-array number|elements
1|1|0
2|2|1
3|3|0
4|4|1
5|5|0
6|5|1
7|6|1
8|6|0
9|7|0
10|7|1
11|8|0
12|8|1
13|8|0
14|9|1
15|9|0
16|9|1
17|10|0
18|10|1
19|10|0
20|10|1
Then you can run through the dictionary and store the highest in a variable.
var maxcost=0
var arrnumber=1;
var zeros=0;
var ones=0;
var cost=0;
for (var i=1;i++;i<=20+1)
{
if ( dictionary.arraynumber[i]!=dictionary.arraynumber[i-1])
{
zeros=0;
ones=0;
cost=0;
if (cost>maxcost)
{
maxcost=cost;
}
}
else
{
if (dictionary.values[i]==0)
{
zeros++;
}
else
{
ones++;
}
cost=ones/zeros;
}
}
This will be log(n^2), i hope and u just need 3n size of memory of the array?
I think we can modify the maximal subarray problem to fit to this question. Here's my attempt at it:
void FindMaxRatio(int[] array, out maxNumOnes, out maxNumZeros)
{
maxNumOnes = 0;
maxNumZeros = 0;
int numOnes = 0;
int numZeros = 0;
double maxSoFar = 0;
double maxEndingHere = 0;
for(int i = 0; i < array.Size; i++){
if(array[i] == 0) numZeros++;
if(array[i] == 1) numOnes++;
if(numZeros == 0) maxEndingHere = 0;
else maxEndingHere = numOnes/(double)numZeros;
if(maxEndingHere < 1 && maxEndingHere > 0) {
numZeros = 0;
numOnes = 0;
}
if(maxSoFar < maxEndingHere){
maxSoFar = maxEndingHere;
maxNumOnes = numOnes;
maxNumZeros = numZeros;
}
}
}
I think the key is if the ratio is less then 1, we can disregard that subsequence because
there will always be a subsequence 01 or 10 whose ratio is 1. This seemed to work for 010011.

Generate a number different from the numbers in the array

There is an array (greater than 1000 elements space) with 1000 large numbers (can be 64 bit numbers as well). The numbers in the array may not be necessarily sorted.
We have to generate a unique number at 1001th position that is different from the previous 1000.
Justify the approach used is the best.
My answer (don't know to what extent this was correct):
Sort the numbers, and start from the 0 position. The number that is at 1000th position + 1 is the required number.
Better suggestions for this?
Create an auxiliary array of 1001 elements. Set all these to 1 (or true or Y or whatever you choose). Run through the main array, if you find a number in the range 1..1000 then 0 out (or falsify some other how) the corresponding element in the auxiliary array. At the end the first element in the auxiliary array which is not 0 (or false) corresponds to a number which is not in the main array.
This is simple, and, I think, O(n) in time complexity, where n is the number of elements in the main array.
unsigned ii,slot;
unsigned array [ NNN ];
/* allocate a histogram */
#define XXX (NNN+1);
unsigned histogram [XXX];
memset(histogram, 0, sizeof histogram);
for (ii=0; ii < NNN; ii++) {
slot = array [ii ] % XXX;
histogram[slot] += 1;
}
for (slot=0; slot < NNN; slot++) {
if ( !histogram[slot]) break;
}
/* Now, slot + k * XXX will be a
** number that does not occur in the original array */
Note: this is similar to High performance Mark, but at least I typed in the code ...
If you sort your array, you have three possibilities for a unique number:
array[999]+1, if array[999] is not equal to INT_MAX
array[0]-1, if array[0] is not equal to INT_MIN
a number between array[i] and array[i+1], if array[i+1]-array[i]>1 (0<=i<=998). Notice that if the two previous tries have failed, then it is guaranteed that there is a number between two elements in your array.
Notice that this solution will also work for the 1002th, 1003th, and so on.
An attempt at a clumsy c# implementation
public class Test
{
public List<int> Sequence { get; set; }
public void GenerateFirstSequence()
{
Sequence = new List<int>();
for (var i = 0; i < 1000; i++)
{
var x = new Random().Next(0, int.MaxValue);
while (Sequence.Contains(x))
{
x = new Random().Next(0, int.MaxValue);
}
Sequence.Add(x);
}
}
public int GetNumberNotInSequence()
{
var number = Sequence.OrderBy(x => x).Max();
var mustRedefine = number == int.MaxValue && Sequence.Contains(number);
if (mustRedefine)
{
while (Sequence.Contains(number))
{
number = number - 1;
if (!Sequence.Contains(number))
return number;
}
}
return number + 1;
}
}
I have some thoughts on this problem:
You could create a hash table H, which contain 1000 elements. Suppose your array named A, and for each element, we have the reminder by 1000: m[i] = A[i] % 1000.
If there is a conflict between A[i] and A[j], that A[i] % 1000 = A[j] % 1000. That is to say, there must exist an index k, that no element's reminder by 1000 equals to k, then k is the number you are going to get.
If there is no conflict at all, just pick H[1] + 1000 as your result.
The complexity of this algorithm is O(l), in which l indicates the original list size, in the example, l = 1000

DP algorithm for bounded Knapsack?

The Wikipedia article about Knapsack problem contains lists three kinds of it:
1-0 (one item of a type)
Bounded (several items of a type)
Unbounded (unlimited number of items of a type)
The article contains DP approaches for 1. and 3. types of problem, but no solution for 2.
How can the dynamic programming algorithm for solving 2. be described?
Use the 0-1 variant, but allow repetition of an item in the solution up to the number of times specified in its bound. You would need to maintain a vector stating how many copies of each item you already included in the partial solution.
The other DP solutions mentioned are all suboptimal as they require you to directly simulate the problem, resulting in a O(number of items * maximum weight * total count of items) runtime complexity.
There are many ways to optimize this, and I'll mention a few of them here:
One solution is to apply a technique similar to Sqrt Decomposition and is described here: https://codeforces.com/blog/entry/59606. This algorithm runs in O(number of items * maximum weight * sqrt(maximum weight)).
However, Dorijan Lendvaj describes a much faster algorithm that runs in O(number of items * maximum weight * log(maximum weight)) here: https://codeforces.com/blog/entry/65202?#comment-492168
Another way to think of the above approach is the following:
For each type of item, let's define the following values:
w, the weight/cost of the current type of item
v, the value of the current type of item
n, the number of copies of the current type of item available to use
Phase 1
First, let us consider 2^k, the largest power of 2 less than or equal to n. We insert the following items (each inserted item is in the format (weight, value)): (w, v), (2 * w, 2 * v), (2^2 * w, 2^2 * v), ..., (2^(k-1) * w, 2^(k-1) * v). Note that the items inserted each represent 2^0, 2^1, ..., 2^(k-1) copies of the current type of item respectively.
Observe that this is the same as inserting 2^k - 1 copies of the current type of item. This is because we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Phase 2
Lastly, we just insert the items that correspond to the set bits of n - (2^k - 1). (For all whole numbers k', if the bit representing 2^k' is set, insert (2^k' * w, 2^k' * v)).
Now, we can simulate the taking of up to n items of the current type simply by taking a combination of the above inserted items.
I don't currently have an exact proof of this solution, but after playing around with it for a while it seems correct. If I can figure one out I may update this post later on.
Proof
First, a proposition: All we have to prove is that inserting the above items allows us to simulate the taking of any number of items of the current type up to n.
With that in mind, let's define some variables:
Let n be the number of items of the current type available
Let x be the number of items of the current type we want to take
Let k be the greatest integer such that 2^k <= n
If x < 2^k, we can easily take x items using the method described in phase 1 of the algorithm:
... we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Otherwise, we do the following:
Take n - (2^k - 1) items. This is done by taking all the items inserted in phase 2. Now only the items inserted in phase 1 are available for use.
Take x - (n - (2^k - 1)) items. Since this value is always less than 2^k, we can just use the method used for the first case.
Finally, how do we know that x - (n - (2^k - 1)) < 2^k?
If we simplify the left side, we get:
x - (n - (2^k - 1))
x - n + 2^k - 1
x - (n + 1) + 2^k
If the above value was >= 2^k, then x - (n + 1) >= 0 would be true, meaning that x > n. That would be impossible as that's not a valid value of x.
Finally, there is even an approach mentioned here that runs in O(number of items * maximum weight) time.
The algorithm is similar to the brute force method ic3b3rg proposed and just uses simple DP optimizations and sliding window deque to bring down the run time.
My code was tested on this problem (classical bounded knapsack problem): https://dmoj.ca/problem/knapsack
My code: https://pastebin.com/acezMrMY
I posted an article on Code Project which discusses a more efficient solution to the bounded knapsack algorithm.
From the article:
In the dynamic programming solution, each position of the m array is a
sub-problem of capacity j. In the 0/1 algorithm, for each sub-problem
we consider the value of adding one copy of each item to the knapsack.
In the following algorithm, for each sub-problem we consider the value
of adding the lesser of the quantity that will fit, or the quantity
available of each item.
I've also enhanced the code so that we can determine what's in the
optimized knapsack (as opposed to just the optimized value).
ItemCollection[] ic = new ItemCollection[capacity + 1];
for(int i=0;i<=capacity;i++) ic[i] = new ItemCollection();
for(int i=0;i<items.Count;i++)
for(int j=capacity;j>=0;j--)
if(j >= items[i].Weight) {
int quantity = Math.Min(items[i].Quantity, j / items[i].Weight);
for(int k=1;k<=quantity;k++) {
ItemCollection lighterCollection = ic[j - k * items[i].Weight];
int testValue = lighterCollection.TotalValue + k * items[i].Value;
if(testValue > ic[j].TotalValue) (ic[j] = lighterCollection.Copy()).AddItem(items[i],k);
}
}
private class Item {
public string Description;
public int Weight;
public int Value;
public int Quantity;
public Item(string description, int weight, int value, int quantity) {
Description = description;
Weight = weight;
Value = value;
Quantity = quantity;
}
}
private class ItemCollection {
public Dictionary<string,int> Contents = new Dictionary<string,int>();
public int TotalValue;
public int TotalWeight;
public void AddItem(Item item,int quantity) {
if(Contents.ContainsKey(item.Description)) Contents[item.Description] += quantity;
else Contents[item.Description] = quantity;
TotalValue += quantity * item.Value;
TotalWeight += quantity * item.Weight;
}
public ItemCollection Copy() {
var ic = new ItemCollection();
ic.Contents = new Dictionary<string,int>(this.Contents);
ic.TotalValue = this.TotalValue;
ic.TotalWeight = this.TotalWeight;
return ic;
}
}
The download in the Code Project article includes a test case.
First, store all your data in a single array (with repetition).
Then use the 1st method mentioned in the Wikipedia article(1-0).
For example, trying a bounded knapsack with { 2 (2 times), 4(3 times),...} is equivalent to solving a 1-0 knapsack with {2, 2, 4, 4, 4,...}.
I will suggest you to use Knapsack Fraction Greedy Method Algorithm. It's Complexity is O(n log n) and one of the best algorithm.
Below I have mentioned its code in c#..
private static void Knapsack()
{
Console.WriteLine("************Kanpsack***************");
Console.WriteLine("Enter no of items");
int _noOfItems = Convert.ToInt32(Console.ReadLine());
int[] itemArray = new int[_noOfItems];
int[] weightArray = new int[_noOfItems];
int[] priceArray = new int[_noOfItems];
int[] fractionArray=new int[_noOfItems];
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("[Item"+" "+(i+1)+"]");
Console.WriteLine("");
Console.WriteLine("Enter the Weight");
weightArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("Enter the Price");
priceArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("");
itemArray[i] = i+1 ;
}//for loop
int temp;
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " "+"PRICE");
Console.WriteLine(" ");
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("Item"+" "+(i+1)+" "+weightArray[i]+" "+priceArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
//Caluclating Fraction for the Item............
for(int i=0;i<_noOfItems;i++)
{
fractionArray[i] = (priceArray[i] / weightArray[i]);
}
Console.WriteLine("Testing.............");
//sorting the Item on the basis of fraction value..........
//Bubble Sort To Sort the Process Priority
for (int i = 0; i < _noOfItems; i++)
{
for (int j = i + 1; j < _noOfItems; j++)
{
if (fractionArray[j] > fractionArray[i])
{
//item Array
temp = itemArray[j];
itemArray[j] = itemArray[i];
itemArray[i] = temp;
//Weight Array
temp = weightArray[j];
weightArray[j] = weightArray[i];
weightArray[i] = temp;
//Price Array
temp = priceArray[j];
priceArray[j] = priceArray[i];
priceArray[i] = temp;
//Fraction Array
temp = fractionArray[j];
fractionArray[j] = fractionArray[i];
fractionArray[i] = temp;
}//if
}//Inner for
}//outer For
// Printing its value..............After Sorting..............
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " " + "PRICE" + " "+"Fraction");
Console.WriteLine(" ");
for (int i = 0; i < _noOfItems; i++)
{
Console.WriteLine("Item" + " " + (itemArray[i]) + " " + weightArray[i] + " " + priceArray[i] + " "+fractionArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
Console.WriteLine("");
Console.WriteLine("Enter the Capacity of Knapsack");
int _capacityKnapsack = Convert.ToInt32(Console.ReadLine());
// Creating the valuse for Solution
int k=0;
int fractionvalue = 0;
int[] _takingItemArray=new int[100];
int sum = 0,_totalPrice=0;
int l = 0;
int _capacity = _capacityKnapsack;
do
{
if(k>=_noOfItems)
{
k = 0;
}
if (_capacityKnapsack >= weightArray[k])
{
_takingItemArray[l] = weightArray[k];
_capacityKnapsack = _capacityKnapsack - weightArray[k];
_totalPrice += priceArray[k];
k++;
l++;
}
else
{
fractionvalue = fractionArray[k];
_takingItemArray[l] = _capacityKnapsack;
_totalPrice += _capacityKnapsack * fractionArray[k];
k++;
l++;
}
sum += _takingItemArray[l-1];
} while (sum != _capacity);
Console.WriteLine("");
Console.WriteLine("Value in Kg Are............");
Console.WriteLine("");
for (int i = 0; i < _takingItemArray.Length; i++)
{
if(_takingItemArray[i]!=0)
{
Console.WriteLine(_takingItemArray[i]);
Console.WriteLine("");
}
else
{
break;
}
enter code here
}//for loop
Console.WriteLine("Toatl Value is "+_totalPrice);
}//Method
We can use 0/1 knapsack algorithm with tracking # of items left for each item;
We could do the same on unbounded knapsack algorithm to solve bounded knapsack problem also.

How do I search for a number in a 2d array sorted left to right and top to bottom?

I was recently given this interview question and I'm curious what a good solution to it would be.
Say I'm given a 2d array where all the
numbers in the array are in increasing
order from left to right and top to
bottom.
What is the best way to search and
determine if a target number is in the
array?
Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off.
Another solution I thought may work is to start somewhere in the middle. If the middle value is less than my target, then I can be sure it is in the left square portion of the matrix from the middle. I then move diagonally and check again, reducing the size of the square that the target could potentially be in until I have honed in on the target number.
Does anyone have any good ideas on solving this problem?
Example array:
Sorted left to right, top to bottom.
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Here's a simple approach:
Start at the bottom-left corner.
If the target is less than that value, it must be above us, so move up one.
Otherwise we know that the target can't be in that column, so move right one.
Goto 2.
For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)
Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M are small, you could use a binary search approach to do this in something approaching logarithmic time.
Here are some details, for those who are curious:
History
This simple algorithm is called a Saddleback Search. It's been around for a while, and it is optimal when N == M. Some references:
David Gries, The Science of Programming. Springer-Verlag, 1989.
Edsgar Dijkstra, The Saddleback Search. Note EWD-934, 1985.
However, when N < M, intuition suggests that binary search should be able to do better than O(N+M): For example, when N == 1, a pure binary search will run in logarithmic rather than linear time.
Worst-case bound
Richard Bird examined this intuition that binary search could improve the Saddleback algorithm in a 2006 paper:
Richard S. Bird, Improving Saddleback Search: A Lesson in Algorithm Design, in Mathematics of Program Construction, pp. 82--89, volume 4014, 2006.
Using a rather unusual conversational technique, Bird shows us that for N <= M, this problem has a lower bound of Ω(N * log(M/N)). This bound make sense, as it gives us linear performance when N == M and logarithmic performance when N == 1.
Algorithms for rectangular arrays
One approach that uses a row-by-row binary search looks like this:
Start with a rectangular array where N < M. Let's say N is rows and M is columns.
Do a binary search on the middle row for value. If we find it, we're done.
Otherwise we've found an adjacent pair of numbers s and g, where s < value < g.
The rectangle of numbers above and to the left of s is less than value, so we can eliminate it.
The rectangle below and to the right of g is greater than value, so we can eliminate it.
Go to step (2) for each of the two remaining rectangles.
In terms of worst-case complexity, this algorithm does log(M) work to eliminate half the possible solutions, and then recursively calls itself twice on two smaller problems. We do have to repeat a smaller version of that log(M) work for every row, but if the number of rows is small compared to the number of columns, then being able to eliminate all of those columns in logarithmic time starts to become worthwhile.
This gives the algorithm a complexity of T(N,M) = log(M) + 2 * T(M/2, N/2), which Bird shows to be O(N * log(M/N)).
Another approach posted by Craig Gidney describes an algorithm similar the approach above: it examines a row at a time using a step size of M/N. His analysis shows that this results in O(N * log(M/N)) performance as well.
Performance Comparison
Big-O analysis is all well and good, but how well do these approaches work in practice? The chart below examines four algorithms for increasingly "square" arrays:
(The "naive" algorithm simply searches every element of the array. The "recursive" algorithm is described above. The "hybrid" algorithm is an implementation of Gidney's algorithm. For each array size, performance was measured by timing each algorithm over fixed set of 1,000,000 randomly-generated arrays.)
Some notable points:
As expected, the "binary search" algorithms offer the best performance on rectangular arrays and the Saddleback algorithm works the best on square arrays.
The Saddleback algorithm performs worse than the "naive" algorithm for 1-d arrays, presumably because it does multiple comparisons on each item.
The performance hit that the "binary search" algorithms take on square arrays is presumably due to the overhead of running repeated binary searches.
Summary
Clever use of binary search can provide O(N * log(M/N) performance for both rectangular and square arrays. The O(N + M) "saddleback" algorithm is much simpler, but suffers from performance degradation as arrays become increasingly rectangular.
This problem takes Θ(b lg(t)) time, where b = min(w,h) and t=b/max(w,h). I discuss the solution in this blog post.
Lower bound
An adversary can force an algorithm to make Ω(b lg(t)) queries, by restricting itself to the main diagonal:
Legend: white cells are smaller items, gray cells are larger items, yellow cells are smaller-or-equal items and orange cells are larger-or-equal items. The adversary forces the solution to be whichever yellow or orange cell the algorithm queries last.
Notice that there are b independent sorted lists of size t, requiring Ω(b lg(t)) queries to completely eliminate.
Algorithm
(Assume without loss of generality that w >= h)
Compare the target item against the cell t to the left of the top right corner of the valid area
If the cell's item matches, return the current position.
If the cell's item is less than the target item, eliminate the remaining t cells in the row with a binary search. If a matching item is found while doing this, return with its position.
Otherwise the cell's item is more than the target item, eliminating t short columns.
If there's no valid area left, return failure
Goto step 2
Finding an item:
Determining an item doesn't exist:
Legend: white cells are smaller items, gray cells are larger items, and the green cell is an equal item.
Analysis
There are b*t short columns to eliminate. There are b long rows to eliminate. Eliminating a long row costs O(lg(t)) time. Eliminating t short columns costs O(1) time.
In the worst case we'll have to eliminate every column and every row, taking time O(lg(t)*b + b*t*1/t) = O(b lg(t)).
Note that I'm assuming lg clamps to a result above 1 (i.e. lg(x) = log_2(max(2,x))). That's why when w=h, meaning t=1, we get the expected bound of O(b lg(1)) = O(b) = O(w+h).
Code
public static Tuple<int, int> TryFindItemInSortedMatrix<T>(this IReadOnlyList<IReadOnlyList<T>> grid, T item, IComparer<T> comparer = null) {
if (grid == null) throw new ArgumentNullException("grid");
comparer = comparer ?? Comparer<T>.Default;
// check size
var width = grid.Count;
if (width == 0) return null;
var height = grid[0].Count;
if (height < width) {
var result = grid.LazyTranspose().TryFindItemInSortedMatrix(item, comparer);
if (result == null) return null;
return Tuple.Create(result.Item2, result.Item1);
}
// search
var minCol = 0;
var maxRow = height - 1;
var t = height / width;
while (minCol < width && maxRow >= 0) {
// query the item in the minimum column, t above the maximum row
var luckyRow = Math.Max(maxRow - t, 0);
var cmpItemVsLucky = comparer.Compare(item, grid[minCol][luckyRow]);
if (cmpItemVsLucky == 0) return Tuple.Create(minCol, luckyRow);
// did we eliminate t rows from the bottom?
if (cmpItemVsLucky < 0) {
maxRow = luckyRow - 1;
continue;
}
// we eliminated most of the current minimum column
// spend lg(t) time eliminating rest of column
var minRowInCol = luckyRow + 1;
var maxRowInCol = maxRow;
while (minRowInCol <= maxRowInCol) {
var mid = minRowInCol + (maxRowInCol - minRowInCol + 1) / 2;
var cmpItemVsMid = comparer.Compare(item, grid[minCol][mid]);
if (cmpItemVsMid == 0) return Tuple.Create(minCol, mid);
if (cmpItemVsMid > 0) {
minRowInCol = mid + 1;
} else {
maxRowInCol = mid - 1;
maxRow = mid - 1;
}
}
minCol += 1;
}
return null;
}
I would use the divide-and-conquer strategy for this problem, similar to what you suggested, but the details are a bit different.
This will be a recursive search on subranges of the matrix.
At each step, pick an element in the middle of the range. If the value found is what you are seeking, then you're done.
Otherwise, if the value found is less than the value that you are seeking, then you know that it is not in the quadrant above and to the left of your current position. So recursively search the two subranges: everything (exclusively) below the current position, and everything (exclusively) to the right that is at or above the current position.
Otherwise, (the value found is greater than the value that you are seeking) you know that it is not in the quadrant below and to the right of your current position. So recursively search the two subranges: everything (exclusively) to the left of the current position, and everything (exclusively) above the current position that is on the current column or a column to the right.
And ba-da-bing, you found it.
Note that each recursive call only deals with the current subrange only, not (for example) ALL rows above the current position. Just those in the current subrange.
Here's some pseudocode for you:
bool numberSearch(int[][] arr, int value, int minX, int maxX, int minY, int maxY)
if (minX == maxX and minY == maxY and arr[minX,minY] != value)
return false
if (arr[minX,minY] > value) return false; // Early exits if the value can't be in
if (arr[maxX,maxY] < value) return false; // this subrange at all.
int nextX = (minX + maxX) / 2
int nextY = (minY + maxY) / 2
if (arr[nextX,nextY] == value)
{
print nextX,nextY
return true
}
else if (arr[nextX,nextY] < value)
{
if (numberSearch(arr, value, minX, maxX, nextY + 1, maxY))
return true
return numberSearch(arr, value, nextX + 1, maxX, minY, nextY)
}
else
{
if (numberSearch(arr, value, minX, nextX - 1, minY, maxY))
return true
reutrn numberSearch(arr, value, nextX, maxX, minY, nextY)
}
The two main answers give so far seem to be the arguably O(log N) "ZigZag method" and the O(N+M) Binary Search method. I thought I'd do some testing comparing the two methods with some various setups. Here are the details:
The array is N x N square in every test, with N varying from 125 to 8000 (the largest my JVM heap could handle). For each array size, I picked a random place in the array to put a single 2. I then put a 3 everywhere possible (to the right and below of the 2) and then filled the rest of the array with 1. Some of the earlier commenters seemed to think this type of setup would yield worst case run time for both algorithms. For each array size, I picked 100 different random locations for the 2 (search target) and ran the test. I recorded avg run time and worst case run time for each algorithm. Because it was happening too fast to get good ms readings in Java, and because I don't trust Java's nanoTime(), I repeated each test 1000 times just to add a uniform bias factor to all the times. Here are the results:
ZigZag beat binary in every test for both avg and worst case times, however, they are all within an order of magnitude of each other more or less.
Here is the Java code:
public class SearchSortedArray2D {
static boolean findZigZag(int[][] a, int t) {
int i = 0;
int j = a.length - 1;
while (i <= a.length - 1 && j >= 0) {
if (a[i][j] == t) return true;
else if (a[i][j] < t) i++;
else j--;
}
return false;
}
static boolean findBinarySearch(int[][] a, int t) {
return findBinarySearch(a, t, 0, 0, a.length - 1, a.length - 1);
}
static boolean findBinarySearch(int[][] a, int t,
int r1, int c1, int r2, int c2) {
if (r1 > r2 || c1 > c2) return false;
if (r1 == r2 && c1 == c2 && a[r1][c1] != t) return false;
if (a[r1][c1] > t) return false;
if (a[r2][c2] < t) return false;
int rm = (r1 + r2) / 2;
int cm = (c1 + c2) / 2;
if (a[rm][cm] == t) return true;
else if (a[rm][cm] > t) {
boolean b1 = findBinarySearch(a, t, r1, c1, r2, cm - 1);
boolean b2 = findBinarySearch(a, t, r1, cm, rm - 1, c2);
return (b1 || b2);
} else {
boolean b1 = findBinarySearch(a, t, r1, cm + 1, rm, c2);
boolean b2 = findBinarySearch(a, t, rm + 1, c1, r2, c2);
return (b1 || b2);
}
}
static void randomizeArray(int[][] a, int N) {
int ri = (int) (Math.random() * N);
int rj = (int) (Math.random() * N);
a[ri][rj] = 2;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
if (i == ri && j == rj) continue;
else if (i > ri || j > rj) a[i][j] = 3;
else a[i][j] = 1;
}
}
}
public static void main(String[] args) {
int N = 8000;
int[][] a = new int[N][N];
int randoms = 100;
int repeats = 1000;
long start, end, duration;
long zigMin = Integer.MAX_VALUE, zigMax = Integer.MIN_VALUE;
long binMin = Integer.MAX_VALUE, binMax = Integer.MIN_VALUE;
long zigSum = 0, zigAvg;
long binSum = 0, binAvg;
for (int k = 0; k < randoms; k++) {
randomizeArray(a, N);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findZigZag(a, 2);
end = System.currentTimeMillis();
duration = end - start;
zigSum += duration;
zigMin = Math.min(zigMin, duration);
zigMax = Math.max(zigMax, duration);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findBinarySearch(a, 2);
end = System.currentTimeMillis();
duration = end - start;
binSum += duration;
binMin = Math.min(binMin, duration);
binMax = Math.max(binMax, duration);
}
zigAvg = zigSum / randoms;
binAvg = binSum / randoms;
System.out.println(findZigZag(a, 2) ?
"Found via zigzag method. " : "ERROR. ");
//System.out.println("min search time: " + zigMin + "ms");
System.out.println("max search time: " + zigMax + "ms");
System.out.println("avg search time: " + zigAvg + "ms");
System.out.println();
System.out.println(findBinarySearch(a, 2) ?
"Found via binary search method. " : "ERROR. ");
//System.out.println("min search time: " + binMin + "ms");
System.out.println("max search time: " + binMax + "ms");
System.out.println("avg search time: " + binAvg + "ms");
}
}
This is a short proof of the lower bound on the problem.
You cannot do it better than linear time (in terms of array dimensions, not the number of elements). In the array below, each of the elements marked as * can be either 5 or 6 (independently of other ones). So if your target value is 6 (or 5) the algorithm needs to examine all of them.
1 2 3 4 *
2 3 4 * 7
3 4 * 7 8
4 * 7 8 9
* 7 8 9 10
Of course this expands to bigger arrays as well. This means that this answer is optimal.
Update: As pointed out by Jeffrey L Whitledge, it is only optimal as the asymptotic lower bound on running time vs input data size (treated as a single variable). Running time treated as two-variable function on both array dimensions can be improved.
I think Here is the answer and it works for any kind of sorted matrix
bool findNum(int arr[][ARR_MAX],int xmin, int xmax, int ymin,int ymax,int key)
{
if (xmin > xmax || ymin > ymax || xmax < xmin || ymax < ymin) return false;
if ((xmin == xmax) && (ymin == ymax) && (arr[xmin][ymin] != key)) return false;
if (arr[xmin][ymin] > key || arr[xmax][ymax] < key) return false;
if (arr[xmin][ymin] == key || arr[xmax][ymax] == key) return true;
int xnew = (xmin + xmax)/2;
int ynew = (ymin + ymax)/2;
if (arr[xnew][ynew] == key) return true;
if (arr[xnew][ynew] < key)
{
if (findNum(arr,xnew+1,xmax,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ynew+1,ymax,key));
} else {
if (findNum(arr,xmin,xnew-1,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ymin,ynew-1,key));
}
}
Interesting question. Consider this idea - create one boundary where all the numbers are greater than your target and another where all the numbers are less than your target. If anything is left in between the two, that's your target.
If I'm looking for 3 in your example, I read across the first row until I hit 4, then look for the smallest adjacent number (including diagonals) greater than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I do the same for those numbers less than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I ask, is anything inside the two boundaries? If yes, it must be 3. If no, then there is no 3. Sort of indirect since I don't actually find the number, I just deduce that it must be there. This has the added bonus of counting ALL the 3's.
I tried this on some examples and it seems to work OK.
Binary search through the diagonal of the array is the best option.
We can find out whether the element is less than or equal to the elements in the diagonal.
I've been asking this question in interviews for the better part of a decade and I think there's only been one person who has been able to come up with an optimal algorithm.
My solution has always been:
Binary search the middle diagonal, which is the diagonal running down and right, containing the item at (rows.count/2, columns.count/2).
If the target number is found, return true.
Otherwise, two numbers (u and v) will have been found such that u is smaller than the target, v is larger than the target, and v is one right and one down from u.
Recursively search the sub-matrix to the right of u and top of v and the one to the bottom of u and left of v.
I believe this is a strict improvement over the algorithm given by Nate here, since searching the diagonal often allows a reduction of over half the search space (if the matrix is close to square), whereas searching a row or column always results in an elimination of exactly half.
Here's the code in (probably not terribly Swifty) Swift:
import Cocoa
class Solution {
func searchMatrix(_ matrix: [[Int]], _ target: Int) -> Bool {
if (matrix.isEmpty || matrix[0].isEmpty) {
return false
}
return _searchMatrix(matrix, 0..<matrix.count, 0..<matrix[0].count, target)
}
func _searchMatrix(_ matrix: [[Int]], _ rows: Range<Int>, _ columns: Range<Int>, _ target: Int) -> Bool {
if (rows.count == 0 || columns.count == 0) {
return false
}
if (rows.count == 1) {
return _binarySearch(matrix, rows.lowerBound, columns, target, true)
}
if (columns.count == 1) {
return _binarySearch(matrix, columns.lowerBound, rows, target, false)
}
var lowerInflection = (-1, -1)
var upperInflection = (Int.max, Int.max)
var currentRows = rows
var currentColumns = columns
while (currentRows.count > 0 && currentColumns.count > 0 && upperInflection.0 > lowerInflection.0+1) {
let rowMidpoint = (currentRows.upperBound + currentRows.lowerBound) / 2
let columnMidpoint = (currentColumns.upperBound + currentColumns.lowerBound) / 2
let value = matrix[rowMidpoint][columnMidpoint]
if (value == target) {
return true
}
if (value > target) {
upperInflection = (rowMidpoint, columnMidpoint)
currentRows = currentRows.lowerBound..<rowMidpoint
currentColumns = currentColumns.lowerBound..<columnMidpoint
} else {
lowerInflection = (rowMidpoint, columnMidpoint)
currentRows = rowMidpoint+1..<currentRows.upperBound
currentColumns = columnMidpoint+1..<currentColumns.upperBound
}
}
if (lowerInflection.0 == -1) {
lowerInflection = (upperInflection.0-1, upperInflection.1-1)
} else if (upperInflection.0 == Int.max) {
upperInflection = (lowerInflection.0+1, lowerInflection.1+1)
}
return _searchMatrix(matrix, rows.lowerBound..<lowerInflection.0+1, upperInflection.1..<columns.upperBound, target) || _searchMatrix(matrix, upperInflection.0..<rows.upperBound, columns.lowerBound..<lowerInflection.1+1, target)
}
func _binarySearch(_ matrix: [[Int]], _ rowOrColumn: Int, _ range: Range<Int>, _ target: Int, _ searchRow : Bool) -> Bool {
if (range.isEmpty) {
return false
}
let midpoint = (range.upperBound + range.lowerBound) / 2
let value = (searchRow ? matrix[rowOrColumn][midpoint] : matrix[midpoint][rowOrColumn])
if (value == target) {
return true
}
if (value > target) {
return _binarySearch(matrix, rowOrColumn, range.lowerBound..<midpoint, target, searchRow)
} else {
return _binarySearch(matrix, rowOrColumn, midpoint+1..<range.upperBound, target, searchRow)
}
}
}
A. Do a binary search on those lines where the target number might be on.
B. Make it a graph : Look for the number by taking always the smallest unvisited neighbour node and backtracking when a too big number is found
Binary search would be the best approach, imo. Starting at 1/2 x, 1/2 y will cut it in half. IE a 5x5 square would be something like x == 2 / y == 3 . I rounded one value down and one value up to better zone in on the direction of the targeted value.
For clarity the next iteration would give you something like x == 1 / y == 2 OR x == 3 / y == 5
Well, to begin with, let us assume we are using a square.
1 2 3
2 3 4
3 4 5
1. Searching a square
I would use a binary search on the diagonal. The goal is the locate the smaller number that is not strictly lower than the target number.
Say I am looking for 4 for example, then I would end up locating 5 at (2,2).
Then, I am assured that if 4 is in the table, it is at a position either (x,2) or (2,x) with x in [0,2]. Well, that's just 2 binary searches.
The complexity is not daunting: O(log(N)) (3 binary searches on ranges of length N)
2. Searching a rectangle, naive approach
Of course, it gets a bit more complicated when N and M differ (with a rectangle), consider this degenerate case:
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
And let's say I am looking for 9... The diagonal approach is still good, but the definition of diagonal changes. Here my diagonal is [1, (5 or 6), 17]. Let's say I picked up [1,5,17], then I know that if 9 is in the table it is either in the subpart:
5 6 7 8
6 7 8 9
10 11 12 13 14 15 16
This gives us 2 rectangles:
5 6 7 8 10 11 12 13 14 15 16
6 7 8 9
So we can recurse! probably beginning by the one with less elements (though in this case it kills us).
I should point that if one of the dimensions is less than 3, we cannot apply the diagonal methods and must use a binary search. Here it would mean:
Apply binary search on 10 11 12 13 14 15 16, not found
Apply binary search on 5 6 7 8, not found
Apply binary search on 6 7 8 9, not found
It's tricky because to get good performance you might want to differentiate between several cases, depending on the general shape....
3. Searching a rectangle, brutal approach
It would be much easier if we dealt with a square... so let's just square things up.
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
17 . . . . . . 17
. .
. .
. .
17 . . . . . . 17
We now have a square.
Of course, we will probably NOT actually create those rows, we could simply emulate them.
def get(x,y):
if x < N and y < M: return table[x][y]
else: return table[N-1][M-1] # the max
so it behaves like a square without occupying more memory (at the cost of speed, probably, depending on cache... oh well :p)
EDIT:
I misunderstood the question. As the comments point out this only works in the more restricted case.
In a language like C that stores data in row-major order, simply treat it as a 1D array of size n * m and use a binary search.
I have a recursive Divide & Conquer Solution.
Basic Idea for one step is: We know that the Left-Upper(LU) is smallest and the right-bottom(RB) is the largest no., so the given No(N) must: N>=LU and N<=RB
IF N==LU and N==RB::::Element Found and Abort returning the position/Index
If N>=LU and N<=RB = FALSE, No is not there and abort.
If N>=LU and N<=RB = TRUE, Divide the 2D array in 4 equal parts of 2D array each in logical manner..
And then apply the same algo step to all four sub-array.
My Algo is Correct I have implemented on my friends PC.
Complexity: each 4 comparisons can b used to deduce the total no of elements to one-fourth at its worst case..
So My complexity comes to be 1 + 4 x lg(n) + 4
But really expected this to be working on O(n)
I think something is wrong somewhere in my calculation of Complexity, please correct if so..
The optimal solution is to start at the top-left corner, that has minimal value. Move diagonally downwards to the right until you hit an element whose value >= value of the given element. If the element's value is equal to that of the given element, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the below steps until i-k >= 0
Search if a[i-k][j] is equal to the given element. if yes, return found as true.
Search if a[i][j-k] is equal to the given element. if yes, return found as true.
Increment k
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
public boolean searchSortedMatrix(int arr[][] , int key , int minX , int maxX , int minY , int maxY){
// base case for recursion
if(minX > maxX || minY > maxY)
return false ;
// early fails
// array not properly intialized
if(arr==null || arr.length==0)
return false ;
// arr[0][0]> key return false
if(arr[minX][minY]>key)
return false ;
// arr[maxX][maxY]<key return false
if(arr[maxX][maxY]<key)
return false ;
//int temp1 = minX ;
//int temp2 = minY ;
int midX = (minX+maxX)/2 ;
//if(temp1==midX){midX+=1 ;}
int midY = (minY+maxY)/2 ;
//if(temp2==midY){midY+=1 ;}
// arr[midX][midY] = key ? then value found
if(arr[midX][midY] == key)
return true ;
// alas ! i have to keep looking
// arr[midX][midY] < key ? search right quad and bottom matrix ;
if(arr[midX][midY] < key){
if( searchSortedMatrix(arr ,key , minX,maxX , midY+1 , maxY))
return true ;
// search bottom half of matrix
if( searchSortedMatrix(arr ,key , midX+1,maxX , minY , maxY))
return true ;
}
// arr[midX][midY] > key ? search left quad matrix ;
else {
return(searchSortedMatrix(arr , key , minX,midX-1,minY,midY-1));
}
return false ;
}
I suggest, store all characters in a 2D list. then find index of required element if it exists in list.
If not present print appropriate message else print row and column as:
row = (index/total_columns) and column = (index%total_columns -1)
This will incur only the binary search time in a list.
Please suggest any corrections. :)
If O(M log(N)) solution is ok for an MxN array -
template <size_t n>
struct MN * get(int a[][n], int k, int M, int N){
struct MN *result = new MN;
result->m = -1;
result->n = -1;
/* Do a binary search on each row since rows (and columns too) are sorted. */
for(int i = 0; i < M; i++){
int lo = 0; int hi = N - 1;
while(lo <= hi){
int mid = lo + (hi-lo)/2;
if(k < a[i][mid]) hi = mid - 1;
else if (k > a[i][mid]) lo = mid + 1;
else{
result->m = i;
result->n = mid;
return result;
}
}
}
return result;
}
Working C++ demo.
Please do let me know if this wouldn't work or if there is a bug it it.
class Solution {
public boolean searchMatrix(int[][] matrix, int target) {
if(matrix == null)
return false;
int i=0;
int j=0;
int m = matrix.length;
int n = matrix[0].length;
boolean found = false;
while(i<m && !found){
while(j<n && !found){
if(matrix[i][j] == target)
found = true;
if(matrix[i][j] < target)
j++;
else
break;
}
i++;
j=0;
}
return found;
}}
129 / 129 test cases passed.
Status: Accepted
Runtime: 39 ms
Memory Usage: 55 MB
Given a square matrix as follows:
[ a b c ]
[ d e f ]
[ i j k ]
We know that a < c, d < f, i < k. What we don't know is whether d < c or d > c, etc. We have guarantees only in 1-dimension.
Looking at the end elements (c,f,k), we can do a sort of filter: is N < c ? search() : next(). Thus, we have n iterations over the rows, with each row taking either O( log( n ) ) for binary search or O( 1 ) if filtered out.
Let me given an EXAMPLE where N = j,
1) Check row 1. j < c? (no, go next)
2) Check row 2. j < f? (yes, bin search gets nothing)
3) Check row 3. j < k? (yes, bin search finds it)
Try again with N = q,
1) Check row 1. q < c? (no, go next)
2) Check row 2. q < f? (no, go next)
3) Check row 3. q < k? (no, go next)
There is probably a better solution out there but this is easy to explain.. :)
As this is an interview question, it would seem to lead towards a discussion of Parallel programming and Map-reduce algorithms.
See http://code.google.com/intl/de/edu/parallel/mapreduce-tutorial.html

Resources