An interview question from Google [duplicate]

Possible duplicate of: Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?
The following was asked in a Google interview:
You are given a 2D array storing integers, sorted vertically and horizontally.
Write a method that takes as input an integer and outputs a bool saying whether or not the integer is in the array.
What is the best way to do this? And what is its time complexity?

Start at the bottom-left corner of the matrix and traverse it according to these rules:
If the input number is greater than the current number: move right.
If the input number is less than the current number: move up.
If the input number is equal to the current number: return success.
If no such move is possible (you would leave the matrix): return failure.
Time complexity (thanks to Martinho Fernandes):
The time complexity is O(N+M). Every step moves either up or right, so in the worst case (the target is in the top-right corner, or not in the matrix at all) you move up at most N times and right at most M times.
Example
Input matrix:
--------------
| 1 | 4 | 6 |
--------------
| 2 | 5 | 9 |
--------------
| *3* | 8 | 10 |
--------------
Number to search: 4
Step 1:
Start at the cell where you have 3 (Bottom-Left).
3 < 4: Move Right
| 1 | 4 | 6 |
--------------
| 2 | 5 | 9 |
--------------
| 3 | *8* | 10 |
--------------
Step 2:
8 > 4: Move Up
| 1 | 4 | 6 |
--------------
| 2 | *5* | 9 |
--------------
| 3 | 8 | 10 |
--------------
Step 3:
5 > 4: Move Up
| 1 | *4* | 6 |
--------------
| 2 | 5 | 9 |
--------------
| 3 | 8 | 10 |
--------------
Step 4:
4 = 4: Return success.
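As a concrete illustration, here is a minimal sketch of this traversal in Python (my choice of language; the original answer gives no code):

def contains(matrix, target):
    if not matrix or not matrix[0]:
        return False
    row, col = len(matrix) - 1, 0              # start at the bottom-left corner
    while row >= 0 and col < len(matrix[0]):
        current = matrix[row][col]
        if current == target:
            return True                        # found it
        elif current < target:
            col += 1                           # target is larger: move right
        else:
            row -= 1                           # target is smaller: move up
    return False                               # walked off the matrix: not present

m = [[1, 4, 6],
     [2, 5, 9],
     [3, 8, 10]]
print(contains(m, 4))   # True, via 3 -> 8 -> 5 -> 4 as in the steps above
print(contains(m, 7))   # False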

I would start by asking for details about what it means to be "sorted vertically and horizontally".
If the matrix is sorted such that the last element of each row is less than the first element of the next row, you can run a binary search on the first column to find out which row the number must be in, and then run another binary search on that row. This algorithm takes O(log R + log C) time, where R and C are, respectively, the number of rows and columns. Using a property of the logarithm, one can write that as O(log(R*C)), which is the same as O(log N) if N is the number of elements in the array. This is almost the same as treating the array as 1D and running a binary search on it.
But the matrix could be sorted in a way that the last element of each row is not less than the first element of the next row:
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
3 4 5 6 7 8 9 10 11
In this case, you could run a sort of horizontal and vertical binary search simultaneously:
Test the middle number of the first column. If it's less than the target, consider the rows below it; if it's greater, consider those above.
Test the middle number of the first considered row. If it's less, consider the columns to the right of it; if it's greater, consider those to the left.
Lather, rinse, repeat until you find the target, or you're left with no more elements to consider.
This method is also logarithmic on the number of elements.

The first method that comes to mind is a vertical binary search, followed by a horizontal one once you find the row the number should be in. The complexity will be O(log(NM)), where N and M are the dimensions of the array.
Further explanation:
Consider just the first number of every row. When you perform a binary search over these first numbers for the specified number, the result is either the specified number itself, if you're lucky, or otherwise the position before or after where the specified number would go, depending on the binary search implementation. Once you have found the two consecutive first numbers that the specified number falls between, you know the number can only be in that row, and a second binary search will find the number if it is there.
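Here is a short sketch of the two binary searches in Python, assuming the stronger ordering discussed above (each row's first element is greater than the previous row's last element), since only then is the target guaranteed to be confined to a single row:

import bisect

def contains_sorted(matrix, target):
    first_col = [row[0] for row in matrix]
    # Last row whose first element is <= target: O(log R).
    r = bisect.bisect_right(first_col, target) - 1
    if r < 0:
        return False                          # target is smaller than every element
    row = matrix[r]
    c = bisect.bisect_left(row, target)       # binary search within the row: O(log C)
    return c < len(row) and row[c] == target

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(contains_sorted(m, 6))    # True: row found via first column, then 6 within it
print(contains_sorted(m, 10))   # False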

Related

Why does the pairing heap need those special two passes in delete_min?

I am reading about the pairing heap.
It is quite simple; the only tricky part is the delete_min operation.
The only non-trivial fundamental operation is the deletion of the
minimum element from the heap. The standard strategy first merges the
subheaps in pairs (this is the step that gave this datastructure its
name) from left to right and then merges the resulting list of heaps
from right to left:
I don't think I need to copy/paste the code here, as it is in the wiki link.
My questions are:
Why do they do this two-pass merging?
Why do they merge pairs first, instead of directly merging them all?
And why, after merging pairs, do they merge specifically from right to left?
With a pairing heap, adding an item to the heap is an O(1) operation because all it does is add the node either as the new root (if it's smaller than the current root), or as the first child of the current root. So if you created a pairing heap and added the numbers 0 through 9 to it, in order, you would end up with:
0
|
-----------------
| | | | | | | | |
9 8 7 6 5 4 3 2 1
If you then do a delete-min, you have to look at each child to determine the minimum item and build the new heap. If you use the naive left-to-right combining method, you end up with this tree:
1
|
---------------
| | | | | | | |
9 8 7 6 5 4 3 2
And the next time you do a delete-min you have to look at the 8 remaining children, etc. Using this technique, creating and then removing all items from the heap would be an O(n^2) operation.
The two-pass method of combining in pairs and then combining the pairs results in a much more efficient structure. Consider the first case. After deleting the minimum item, we're left with the nine children. They're combined in pairs from left to right to produce:
8 6 4 2 1
/ / / /
9 7 5 3
Then we combine the pairs from right to left. In steps:
8 6 4 1
/ / / /
9 7 5 2
/
3
8 6 1
/ / / \
9 7 2 4
/ /
3 5
8 1
/ |
9 ---------
6 4 2
/ / /
7 5 3
1
|
----------
8 6 4 2
/ / / /
9 7 5 3
Now, the next time we call delete-min, there are only four nodes to check, and the next time after that there will only be two. Using the two-pass combining method reduces the number of nodes at the child level by at least half. The arrangement I showed is the worst case. If the items were in ascending order, the first delete-min operation would result in a tree with only two child nodes below the root.
This is a particularly good example of the amortized complexity of the pairing heap. insert is O(1), but the first delete-min after a bunch of insert operations is O(n), where n is the number of items that were inserted since the last delete-min. The beauty of the two-pass combining rule is that it quickly reorganizes the heap to reduce that O(n) complexity.
With this combining rule, the amortized complexity of delete-min is O(log n). With the strict left-to-right rule, it's O(n).
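To make the two passes concrete, here is a minimal pairing-heap sketch in Python. It is an illustrative structure, not the Wikipedia code: a real implementation links siblings together rather than storing a Python list of children.

class Node:
    def __init__(self, key):
        self.key = key
        self.children = []           # subheaps, most recently added first

def meld(a, b):
    # Merge two heaps: the larger root becomes the first child of the smaller.
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a

def insert(heap, key):
    return meld(heap, Node(key))     # O(1): one comparison, one link

def delete_min(heap):
    kids = heap.children
    # Pass 1: merge the children in pairs, left to right.
    pairs = [meld(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
             for i in range(0, len(kids), 2)]
    # Pass 2: merge the resulting heaps from right to left.
    root = None
    for h in reversed(pairs):
        root = meld(root, h)
    return heap.key, root

# Inserting 0..9 in order builds the flat tree shown above; the first
# delete-min then pairs and folds the nine children exactly as illustrated.
h = None
for k in range(10):
    h = insert(h, k)
m, h = delete_min(h)
print(m, h.key, [c.key for c in h.children])   # 0 1 [8, 6, 4, 2]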

How to randomize across categories while holding the mean constant?

I am looking for some conceptual input, detached from any specific platform/software, on the following problem:
Let R be an Nx2 matrix with the first column denoting the object ID and the second column the category (e.g. from 1 to 10).
ID | Category
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | 3
7 | 3
8 | 3
9 | 3
. | .
. | .
Further, assume we have a matrix C which assigns a number to each category, e.g.:
Category | Number
1 | 0.5
2 | 0.2
3 | 0.9
. | .
. | .
So for each object in matrix R a number can be mapped according to matrix C (e.g. for ID=1 with category=1, the number according to matrix C is 0.5).
The goal now is to create an algorithm which randomizes the objects across a pre-specified category range while the overall average of the mapped numbers is held constant.
E.g. assume that the category range is defined as 2, meaning that each object from category 1 can either stay in category 1, randomly be shifted to category 2, or even move up to category 3. Similarly, an object from category 3 with a selected category range of 1 can either be moved down to category 2, stay at category 3, or move up to category 4. If an object is shifted to another category, it gets assigned a new number according to matrix C, which impacts the overall average across the mapped numbers.
However, all swaps have to be executed on a purely random basis, with the additional constraint that the average across the mapped numbers after the randomization equals the one from the beginning.
Any input would be greatly appreciated.
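The thread gives no answer, but purely as an illustrative sketch (all names and the acceptance scheme are my own assumptions), one way to honour the constraint is to propose moves in compensating pairs: a move for one object is accepted only if a second object can absorb the change, so the total of the mapped numbers, and hence their average, never changes.

import random

numbers = {1: 0.5, 2: 0.2, 3: 0.9, 4: 0.4}    # matrix C: category -> number
objects = [1, 1, 1, 2, 2, 3, 3, 3, 3]         # matrix R: category of each object

def randomize(orig, cat_range, attempts=1000, rng=random):
    cats = list(orig)
    lo, hi = min(numbers), max(numbers)        # valid category bounds
    for _ in range(attempts):
        i, j = rng.sample(range(len(cats)), 2) # two distinct objects
        # Propose a random category for object i within its allowed range.
        ni = rng.randint(max(lo, orig[i] - cat_range), min(hi, orig[i] + cat_range))
        delta = numbers[ni] - numbers[cats[i]]
        # Accept only if j can make a compensating move within its own range,
        # so the total (hence the average) of the mapped numbers is unchanged.
        for nj in range(max(lo, orig[j] - cat_range), min(hi, orig[j] + cat_range) + 1):
            if abs((numbers[nj] - numbers[cats[j]]) + delta) < 1e-12:
                cats[i], cats[j] = ni, nj
                break
    return cats

result = randomize(objects, cat_range=2)
before = sum(numbers[c] for c in objects)
after = sum(numbers[c] for c in result)
assert abs(before - after) < 1e-9              # the average is held constant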

Faster computation to get multiples of a number at different levels

Here is the scenario:
We have several items that are shipped to many stores. We want to be able to allocate a certain quantity of each item to a store based on need. Each of these stores is also associated to a specific warehouse.
The catch is that at the warehouse level, the total quantity of each item must be a multiple of a number (6 for example).
I have already calculated out the quantity needed by each store at store level, but they do not sum up to a multiple of 6 at the warehouse level.
My solution, using Excel, was this:
A SUMIFS formula keeps track of the sum of each item allocated at the warehouse level. Another MOD(6) formula calculates the amount remaining until a multiple of 6. Then my actual VBA code loops through and subtracts 1 (if MOD <= 3) or adds 1 (if MOD > 3) to the store-level units needed until MOD = 0 for all rows.
Now this works for me, but it is extremely slow even when I have just ~5000 rows.
I am looking for a faster solution, because every time I subtract/add to the units needed, the SUMIFS and MOD have to be recalculated.
EDIT: (trying to be clearer)
I have a template file that I paste my data into with the following setup:
+------+-------+-----------+----------+--------------+--------+
| Item | Store | Warehouse | StoreQty | WarehouseQty | Mod(6) |
+------+-------+-----------+----------+--------------+--------+
| 1 | 1 | 1 | 2 | 8 | 2 |
| 1 | 2 | 1 | 3 | 8 | 2 |
| 1 | 3 | 1 | 1 | 8 | 2 |
| 1 | 4 | 1 | 2 | 8 | 2 |
| 2 | 1 | 2 | 1 | 4 | 2 |
| 2 | 2 | 2 | 3 | 4 | 2 |
+------+-------+-----------+----------+--------------+--------+
Currently the WarehouseQty column is the SUMIFS formula summing up the StoreQty for each Item-Store combo that is associated with the Warehouse. So the Warehouse/WarehouseQty columns are effectively duplicated every time an Item-Store combo shows up. The WarehouseQty is the one that needs to be a multiple of 6.
Screen updating can be turned off to speed up lengthy computations like this:
Application.ScreenUpdating = False
The opposite assignment turns screen updating back on again.
Put the data into an array first, rather than working on the cells, then put the data back after you have manipulated it - this will be much faster.
An example which uses your criteria:
Option Explicit
Sub test()
Dim q() 'this is what will be used for the range
Dim i As Long
q = Range("C2:C41") 'put the data into the array - *ALWAYS* 2 dimensions, even if a single column
For i = LBound(q) To UBound(q) ' use this, in case it's a dynamic array - 1 to 40 would have worked here
Select Case q(i, 1) Mod 6 ' calculate remainder
Case 0 To 3
q(i, 1) = q(i, 1) - (q(i, 1) Mod 6) 'make a multiple of 6
Case 4 To 5
q(i, 1) = q(i, 1) - (q(i, 1) Mod 6) + 6 ' and go higher in the later numbers
End Select
Next i
Range("D2:D41") = q ' drop the data back
End Sub
My guess is that stopping the screen refresh will help quite a lot, and you may not need any further suggestions.
Another option would be to replace the one-unit-at-a-time adjustment with a few If statements that apply the whole correction at once, depending on the value of Mod(6).
You could also change how you sum up the quantity of a particular item across all stores: using a pivot table and reading the sum totals from there is a lot quicker than using SUMIFS in a macro.
Based on your modifications to the question:
You're correct that you get huge amounts of replication doing the calculation row by row, as well as from adjusting the quantity by a single unit at a time even though the Mod(6) formula tells you exactly how many units you need to add or remove.
Could you not create a new sheet with all your possible combinations of item ID and store? You could then use SUMIFS() once for each of these unique combinations, and in a final step round up/down at the warehouse level.
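This is not Excel, but to make the round-at-the-warehouse-level idea concrete, here is a compact illustrative sketch in Python (sample data taken from the question's table; it omits guards such as keeping quantities non-negative):

from collections import defaultdict

rows = [  # (item, store, warehouse, store_qty), from the sample table
    (1, 1, 1, 2), (1, 2, 1, 3), (1, 3, 1, 1), (1, 4, 1, 2),
    (2, 1, 2, 1), (2, 2, 2, 3),
]

# Warehouse totals per item (what the SUMIFS column computes).
totals = defaultdict(int)
for item, _, wh, qty in rows:
    totals[(item, wh)] += qty

# Whole correction per group in one step: down if Mod(6) <= 3, up otherwise.
delta = {k: (-(t % 6) if t % 6 <= 3 else 6 - t % 6) for k, t in totals.items()}

# Spread each group's correction one unit per row, cycling until done.
adjusted = list(rows)
while any(delta.values()):
    for idx, (item, store, wh, qty) in enumerate(adjusted):
        d = delta[(item, wh)]
        if d:
            step = 1 if d > 0 else -1
            adjusted[idx] = (item, store, wh, qty + step)
            delta[(item, wh)] -= step

print(adjusted)   # every (item, warehouse) total is now a multiple of 6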

Maximum values in matrix

So here is an interesting problem in C#. I'm looking for a better way of solving it:
Given a matrix M (not necessarily square) of matches, find the best matching elements. Element i matches element j with value M(i,j); note that M(i,j) != M(j,i).
Since #rows != #columns, find the best min(#rows, #columns) matching pairs (i,j).
Basically the problem is to pick the maximum from each row/column such that no row/column is picked twice.
Example:
1 2 3
+---------
a | 10 3 1
b | 12 99 2
c | 20 5 3
d | 5 7 4
The maximum value in this matrix is 99, so the best match is (b,2). For the next selection we can no longer use row b or column 2. It's like cutting them out:
1 2 3 or, if you prefer, 1 3
+--------- a smaller matrix: +------
a | 10 || 1 a | 10 1
b | ===++=== c | 20 3
c | 20 || 3 d | 5 4
d | 5 || 4
The max is now 20 and the match is (c, 1). The remaining matrix has only one column.
After another pick we'll get the match (d, 3) with value 4.
In the end "a" has no match.
My current implementation uses two arrays to store the already-matched rows/columns; for each match it goes through the entire matrix, picking the first maximum that belongs to a row/column that is not yet matched.
PS: in case of multiple entries having the same value, just pick any one of them.
PS2: The array is stored as int[,].
How would you approach this problem in a more optimal/beautiful way?
If you are trying to maximise the sum of the cells chosen, such that exactly one cell is picked from each row and from each column, then this is the assignment problem: http://en.wikipedia.org/wiki/Assignment_problem. If your matrix is not square, you can make it square by adding rows or columns, with values in the new cells chosen so that they won't be picked unless there is no other way to fill out the solution.
(If you are not maximising the sum, you need to say what function of the chosen values you are maximising - is (1,3) better than (2,2)? Otherwise you are into multi-objective optimization, http://en.wikipedia.org/wiki/Multi-objective_optimization, which is possible, but more complicated.)
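For what it's worth, if maximising the sum is indeed the goal, an off-the-shelf solver handles this directly. For example, SciPy's Hungarian-algorithm implementation accepts rectangular matrices, so no padding is needed (a sketch, assuming SciPy is available; the question itself is about C#):

import numpy as np
from scipy.optimize import linear_sum_assignment

m = np.array([[10, 3, 1],
              [12, 99, 2],
              [20, 5, 3],
              [5, 7, 4]])
row_ind, col_ind = linear_sum_assignment(m, maximize=True)
print(list(zip(row_ind, col_ind)))   # optimal (row, column) pairs
print(m[row_ind, col_ind].sum())     # maximal total value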
You could first sort all of the entries of the matrix in descending order, and then process the sorted list. Whenever you see an entry whose row and column have not already been picked, pick it: mark the corresponding row and column and continue down the list until either all rows or all columns have been picked.
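A sketch of this greedy approach in Python, using the example matrix from the question:

def greedy_matches(m):
    rows, cols = len(m), len(m[0])
    cells = sorted(((m[i][j], i, j) for i in range(rows) for j in range(cols)),
                   reverse=True)               # all entries, descending
    used_rows, used_cols, picks = set(), set(), []
    for value, i, j in cells:
        if i not in used_rows and j not in used_cols:
            picks.append((i, j, value))        # greedily take the largest free cell
            used_rows.add(i)
            used_cols.add(j)
            if len(picks) == min(rows, cols):  # no more pairs are possible
                break
    return picks

m = [[10, 3, 1],
     [12, 99, 2],
     [20, 5, 3],
     [5, 7, 4]]
print(greedy_matches(m))   # [(1, 1, 99), (2, 0, 20), (3, 2, 4)], as in the example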

Relative quality of sorted array

I have two sorting algorithms that produce different results (I sort items by relevancy). As a result, both ways give me the same items, but in a different order. I know that the first algorithm produces better results than the second. I want a relative value (from 0 to 1) that means "the first N values of array2 have 0.73 of the quality of the first N values of array1" (I compare the first elements, because those are what the user sees without taking any action).
The first idea that comes to mind is to use the sum of differences between positions in array1 and array2.
For example:
array1: 1 2 3 4 | 5 6 7 8 9
array2: 8 6 2 3 | 7 4 1 5 9 - positions in array1
array2*: 5 5 2 3 | (positions greater than 4 are replaced with 5, to keep the relative value in the range 0..1)
I want to compare first 4 elements:
S = 1 + 2 + 3 + 4 = 10 - the reference sum, which is also the maximum possible deviation
D = |1 - 5| + |2 - 5| + |3 - 2| + |4 - 3| = 9 - the absolute deviation
To calculate the relative quality I use the formula (S - D)/S = 0.1.
Are there any standard algorithms for this? What are the disadvantages of this one?
What you are looking for is probably DCG [Discounted Cumulative Gain] and nDCG [normalized DCG], which are used to measure ranking quality.
This assumes one list [let it be list2] is the baseline - the "absolute truth" - and list1 should be as close as possible to it.
The idea is that an out-of-order first element matters more than an out-of-order 10th element.
The solution is described in more detail, with an example, in my answer to this post [sorry for advertising my own answer; it just seems to fit well here]. The basic idea is to evaluate:
DCG(list1)/DCG(list2)
where the relevance of each element is derived from list2 itself, for example: rel_i = 1/log(1+i)
Notes:
Of course DCG can be calculated over only the first n elements, rather than the entire list.
This solution yields a result of 1 if list1 == list2.
This solution assumes that what matters is only where the elements appear, not their numerical values; it completely disregards the numerical values.
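A minimal sketch of this DCG ratio in Python, using the suggested rel_i = 1/log(1+i) derived from the baseline list (the helper names are mine):

import math

def dcg(order, relevance):
    # order: item IDs as ranked; relevance: item ID -> graded relevance.
    return sum(relevance[item] / math.log2(pos + 2)   # pos 0 divides by log2(2) = 1
               for pos, item in enumerate(order))

def relative_quality(list1, list2, n):
    # Baseline relevance from list2 itself: earlier items are worth more.
    rel = {item: 1.0 / math.log2(i + 2) for i, item in enumerate(list2)}
    return dcg(list1[:n], rel) / dcg(list2[:n], rel)

list2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # baseline ("absolute truth")
list1 = [8, 6, 2, 3, 7, 4, 1, 5, 9]       # ranking to evaluate
print(relative_quality(list1, list2, 4))  # below 1.0
print(relative_quality(list2, list2, 4))  # exactly 1.0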
