Fast min on span - algorithm

Given a list of arrays and lots of setup time I need to quickly find the smallest value in some sub-span of each array. In concept:
class SpanThing
{
int Data;
SpanThing(int[][] data) /// must be rectangulare
{
Data = data;
//// process, can take a while
}
int[] MinsBruteForce(int from, int to)
{
int[] result = new int[data.length];
foreach(int index, int[] dat; Data)
{
result[i] = int.max;
foreach(int v; dat[from .. to]);
result[i] = min(result[i], v);
}
return result;
}
int[] MinsSmart(int from, int to)
{
// same as MinsBruteForce but faster
}
}
My current thinking on how to do this would be to build a binary tree over the data where each node contains the min in the related span. That way finding the min in the span for one row would consist of finding the min of only the tree nodes that compose it. This set would be the same for each row so could be computed once.
Does any one see any issues with this idea or known of better ways?
To clarify, the tree I'm talking about would be set up sutch that the root node would contain the min value for the entire row and for each node, it's left child would have the min value for the left half of the parent's span and the same for the right.
0 5 6 2 7 9 4 1 7 2 8 4 2
------------------------------------------------
| 5 | 6| | 7 | 9 | | 1 | 7 | 2 | 8 | 4 | 2
0 | 5 | 2 | 7 | 4 | 1 | 2 | 2
0 | 2 | 1 | 2
0 | 1
0
This tree can be mapped to an array and defined in such a way that the segment bounds can be computed resulting in fast look-up.
The case I'm optimizing for is where I have a fixed input set and lots of up front time but then need to do many fast tests on a veriety of spans.

Your proposed solution appears to give the answer using constant space overhead, constant time setup, and logarithmic time for queries. If you are willing to pay quadratic space (i.e., compute all intervals in advance) you can get answers in constant time. Your logarithmic scheme is almost certain to be preferred.
It wouldn't surprise me if it were possible to do better, but I'd be shocked if there were a simple data structure that could do better---and in practice, logarithmic time is almost always plenty fast enough. Go for it.

Your described approach sounds like you're trying to do some sort of memoization or caching, but that's only going to help you if you're checking the same spans or nested spans repeatedly.
The general case for min([0..n]) is going to be O(n), which is what you've got already.
Your code seems to care more about the actual numbers in the array than their order in the array. If you're going to be checking these spans repeatedly, is it possible to just sort the data first, which could be a single O(n log n) operation followed by a bunch of O(1) ops? Most languages have some sort of built in sorting algorithm in their standard libraries.

It's not clear how we can represent the hierarchy of intervals efficiently using the tree approach you've described. There are many ways to divide an interval --- do we have to consider every possibility?
Would a simple approach like this suffice: Suppose data is an N x M array. I would create a M x M x N array where entry (i,j,k) gives the "min" of data(k,i:j). The array entries will be populated on demand:
int[] getMins(int from, int to)
{
assert to >= from;
if (mins[from][to] == null)
{
mins[from][to] = new int[N];
// populate all entries (from,:,:)
for (int k = 0; k < N; k++)
{
int t = array[k][from];
for (int j = from; j < M; j++)
{
if (array[k][j] < t)
t = array[k][j];
mins[from][j][k] = t;
}
}
}
return mins[from][to];
}

Related

How to optimise my solution to HackerRank's Largest Rectangle problem? [duplicate]

I have a histogram with integer heights and constant width 1. I want to maximize the rectangular area under a histogram.
e.g.:
_
| |
| |_
| |
| |_
| |
The answer for this would be 6, 3 * 2, using col1 and col2.
O(n^2) brute force is clear to me, I would like an O(n log n) algorithm. I'm trying to think dynamic programming along the lines of maximum increasing subsequence O(n log n) algo, but am not going forward. Should I use divide and conquer algorithm?
PS: People with enough reputation are requested to remove the divide-and-conquer tag if there is no such solution.
After mho's comments: I mean the area of largest rectangle that fits entirely. (Thanks j_random_hacker for clarifying :) ).
The above answers have given the best O(n) solution in code, however, their explanations are quite tough to comprehend. The O(n) algorithm using a stack seemed magic to me at first, but right now it makes every sense to me. OK, let me explain it.
First observation:
To find the maximal rectangle, if for every bar x, we know the first smaller bar on its each side, let's say l and r, we are certain that height[x] * (r - l - 1) is the best shot we can get by using height of bar x. In the figure below, 1 and 2 are the first smaller of 5.
OK, let's assume we can do this in O(1) time for each bar, then we can solve this problem in O(n)! by scanning each bar.
Then, the question comes: for every bar, can we really find the first smaller bar on its left and on its right in O(1) time? That seems impossible right? ... It is possible, by using a increasing stack.
Why using an increasing stack can keep track of the first smaller on its left and right?
Maybe by telling you that an increasing stack can do the job is not convincing at all, so I will walk you through this.
Firstly, to keep the stack increasing, we need one operation:
while x < stack.top():
stack.pop()
stack.push(x)
Then you can check that in the increasing stack (as depicted below), for stack[x], stack[x-1] is the first smaller on its left, then a new element that can pop stack[x] out is the first smaller on its right.
Still can't believe stack[x-1] is the first smaller on the left on stack[x]?
I will prove it by contradiction.
First of all, stack[x-1] < stack[x] is for sure. But let's assume stack[x-1] is not the first smaller on the left of stack[x].
So where is the first smaller fs?
If fs < stack[x-1]:
stack[x-1] will be popped out by fs,
else fs >= stack[x-1]:
fs shall be pushed into stack,
Either case will result fs lie between stack[x-1] and stack[x], which is contradicting to the fact that there is no item between stack[x-1] and stack[x].
Therefore stack[x-1] must be the first smaller.
Summary:
Increasing stack can keep track of the first smaller on left and right for each element. By using this property, the maximal rectangle in histogram can be solved by using a stack in O(n).
Congratulations! This is really a tough problem, I'm glad my prosaic explanation didn't stop you from finishing. Attached is my proved solution as your reward :)
def largestRectangleArea(A):
ans = 0
A = [-1] + A
A.append(-1)
n = len(A)
stack = [0] # store index
for i in range(n):
while A[i] < A[stack[-1]]:
h = A[stack.pop()]
area = h*(i-stack[-1]-1)
ans = max(ans, area)
stack.append(i)
return ans
There are three ways to solve this problem in addition to the brute force approach. I will write down all of them. The java codes have passed tests in an online judge site called leetcode: http://www.leetcode.com/onlinejudge#question_84. so I am confident codes are correct.
Solution 1: dynamic programming + n*n matrix as cache
time: O(n^2), space: O(n^2)
Basic idea: use the n*n matrix dp[i][j] to cache the minimal height between bar[i] and bar[j]. Start filling the matrix from rectangles of width 1.
public int solution1(int[] height) {
int n = height.length;
if(n == 0) return 0;
int[][] dp = new int[n][n];
int max = Integer.MIN_VALUE;
for(int width = 1; width <= n; width++){
for(int l = 0; l+width-1 < n; l++){
int r = l + width - 1;
if(width == 1){
dp[l][l] = height[l];
max = Math.max(max, dp[l][l]);
} else {
dp[l][r] = Math.min(dp[l][r-1], height[r]);
max = Math.max(max, dp[l][r] * width);
}
}
}
return max;
}
Solution 2: dynamic programming + 2 arrays as cache.
time: O(n^2), space: O(n)
Basic idea: this solution is like solution 1, but saves some space. The idea is that in solution 1 we build the matrix from row 1 to row n. But in each iteration, only the previous row contributes to the building of the current row. So we use two arrays as previous row and current row by turns.
public int Solution2(int[] height) {
int n = height.length;
if(n == 0) return 0;
int max = Integer.MIN_VALUE;
// dp[0] and dp[1] take turns to be the "previous" line.
int[][] dp = new int[2][n];
for(int width = 1; width <= n; width++){
for(int l = 0; l+width-1 < n; l++){
if(width == 1){
dp[width%2][l] = height[l];
} else {
dp[width%2][l] = Math.min(dp[1-width%2][l], height[l+width-1]);
}
max = Math.max(max, dp[width%2][l] * width);
}
}
return max;
}
Solution 3: use stack.
time: O(n), space:O(n)
This solution is tricky and I learnt how to do this from explanation without graphs and explanation with graphs. I suggest you read the two links before reading my explanation below. It's hard to explain without graphs so my explanations might be hard to follow.
Following are my explanations:
For each bar, we must be able to find the biggest rectangle containing this bar. So the biggest one of these n rectangles is what we want.
To get the biggest rectangle for a certain bar (let's say bar[i], the (i+1)th bar), we just need to find out the biggest interval
that contains this bar. What we know is that all the bars in this interval must be at least the same height with bar[i]. So if we figure out how many
consecutive same-height-or-higher bars are there on the immediate left of bar[i], and how many consecutive same-height-or-higher bars are there on the immediate right of the bar[i], we
will know the length of the interval, which is the width of the biggest rectangle for bar[i].
To count the number of consecutive same-height-or-higher bars on the immediate left of bar[i], we only need to find the closest bar on the left that is shorter
than the bar[i], because all the bars between this bar and bar[i] will be consecutive same-height-or-higher bars.
We use a stack to dynamicly keep track of all the left bars that are shorter than a certain bar. In other words, if we iterate from the first bar to bar[i], when we just arrive at the bar[i] and haven't updated the stack,
the stack should store all the bars that are no higher than bar[i-1], including bar[i-1] itself. We compare bar[i]'s height with every bar in the stack until we find one that is shorter than bar[i], which is the cloest shorter bar.
If the bar[i] is higher than all the bars in the stack, it means all bars on the left of bar[i] are higher than bar[i].
We can do the same thing on the right side of the i-th bar. Then we know for bar[i] how many bars are there in the interval.
public int solution3(int[] height) {
int n = height.length;
if(n == 0) return 0;
Stack<Integer> left = new Stack<Integer>();
Stack<Integer> right = new Stack<Integer>();
int[] width = new int[n];// widths of intervals.
Arrays.fill(width, 1);// all intervals should at least be 1 unit wide.
for(int i = 0; i < n; i++){
// count # of consecutive higher bars on the left of the (i+1)th bar
while(!left.isEmpty() && height[i] <= height[left.peek()]){
// while there are bars stored in the stack, we check the bar on the top of the stack.
left.pop();
}
if(left.isEmpty()){
// all elements on the left are larger than height[i].
width[i] += i;
} else {
// bar[left.peek()] is the closest shorter bar.
width[i] += i - left.peek() - 1;
}
left.push(i);
}
for (int i = n-1; i >=0; i--) {
while(!right.isEmpty() && height[i] <= height[right.peek()]){
right.pop();
}
if(right.isEmpty()){
// all elements to the right are larger than height[i]
width[i] += n - 1 - i;
} else {
width[i] += right.peek() - i - 1;
}
right.push(i);
}
int max = Integer.MIN_VALUE;
for(int i = 0; i < n; i++){
// find the maximum value of all rectangle areas.
max = Math.max(max, width[i] * height[i]);
}
return max;
}
Implementation in Python of the #IVlad's answer O(n) solution:
from collections import namedtuple
Info = namedtuple('Info', 'start height')
def max_rectangle_area(histogram):
"""Find the area of the largest rectangle that fits entirely under
the histogram.
"""
stack = []
top = lambda: stack[-1]
max_area = 0
pos = 0 # current position in the histogram
for pos, height in enumerate(histogram):
start = pos # position where rectangle starts
while True:
if not stack or height > top().height:
stack.append(Info(start, height)) # push
elif stack and height < top().height:
max_area = max(max_area, top().height*(pos-top().start))
start, _ = stack.pop()
continue
break # height == top().height goes here
pos += 1
for start, height in stack:
max_area = max(max_area, height*(pos-start))
return max_area
Example:
>>> f = max_rectangle_area
>>> f([5,3,1])
6
>>> f([1,3,5])
6
>>> f([3,1,5])
5
>>> f([4,8,3,2,0])
9
>>> f([4,8,3,1,1,0])
9
Linear search using a stack of incomplete subproblems
Copy-paste algorithm's description (in case the page goes down):
We process the elements in
left-to-right order and maintain a
stack of information about started but
yet unfinished subhistograms. Whenever
a new element arrives it is subjected
to the following rules. If the stack
is empty we open a new subproblem by
pushing the element onto the stack.
Otherwise we compare it to the element
on top of the stack. If the new one is
greater we again push it. If the new
one is equal we skip it. In all these
cases, we continue with the next new
element. If the new one is less, we
finish the topmost subproblem by
updating the maximum area w.r.t. the
element at the top of the stack. Then,
we discard the element at the top, and
repeat the procedure keeping the
current new element. This way, all
subproblems are finished until the
stack becomes empty, or its top
element is less than or equal to the
new element, leading to the actions
described above. If all elements have
been processed, and the stack is not
yet empty, we finish the remaining
subproblems by updating the maximum
area w.r.t. to the elements at the
top.
For the update w.r.t. an element, we
find the largest rectangle that
includes that element. Observe that an
update of the maximum area is carried
out for all elements except for those
skipped. If an element is skipped,
however, it has the same largest
rectangle as the element on top of the
stack at that time that will be
updated later. The height of the
largest rectangle is, of course, the
value of the element. At the time of
the update, we know how far the
largest rectangle extends to the right
of the element, because then, for the
first time, a new element with smaller
height arrived. The information, how
far the largest rectangle extends to
the left of the element, is available
if we store it on the stack, too.
We therefore revise the procedure
described above. If a new element is
pushed immediately, either because the
stack is empty or it is greater than
the top element of the stack, the
largest rectangle containing it
extends to the left no farther than
the current element. If it is pushed
after several elements have been
popped off the stack, because it is
less than these elements, the largest
rectangle containing it extends to the
left as far as that of the most
recently popped element.
Every element is pushed and popped at
most once and in every step of the
procedure at least one element is
pushed or popped. Since the amount of
work for the decisions and the update
is constant, the complexity of the
algorithm is O(n) by amortized
analysis.
The other answers here have done a great job presenting the O(n)-time, O(n)-space solution using two stacks. There's another perspective on this problem that independently provides an O(n)-time, O(n)-space solution to the problem, and might provide a little bit more insight as to why the stack-based solution works.
The key idea is to use a data structure called a Cartesian tree. A Cartesian tree is a binary tree structure (though not a binary search tree) that's built around an input array. Specifically, the root of the Cartesian tree is built above the minimum element of the array, and the left and right subtrees are recursively constructed from the subarrays to the left and right of the minimum value.
For example, here's a sample array and its Cartesian tree:
+----------------------- 23 ------+
| |
+------------- 26 --+ +-- 79
| | |
31 --+ 53 --+ 84
| |
41 --+ 58 -------+
| |
59 +-- 93
|
97
+----+----+----+----+----+----+----+----+----+----+----+
| 31 | 41 | 59 | 26 | 53 | 58 | 97 | 93 | 23 | 84 | 79 |
+----+----+----+----+----+----+----+----+----+----+----+
The reason that Cartesian trees are useful in this problem is that the question at hand has a really nice recursive structure to it. Begin by looking at the lowest rectangle in the histogram. There are three options for where the maximum rectangle could end up being placed:
It could pass right under the minimum value in the histogram. In that case, to make it as large as possible, we'd want to make it as wide as the entire array.
It could be entirely to the left of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the left of the minimum value.
It could be entirely to the right of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the right of the minimum value.
Notice that this recursive structure - find the minimum value, do something with the subarrays to the left and the right of that value - perfectly matches the recursive structure of a Cartesian tree. In fact, if we can create a Cartesian tree for the overall array when we get started, we can then solve this problem by recursively walking the Cartesian tree from the root downward. At each point, we recursively compute the optimal rectangle in the left and right subarrays, along with the rectangle you'd get by fitting right under the minimum value, and then return the best option we find.
In pseudocode, this looks like this:
function largestRectangleUnder(int low, int high, Node root) {
/* Base case: If the range is empty, the biggest rectangle we
* can fit is the empty rectangle.
*/
if (low == high) return 0;
/* Assume the Cartesian tree nodes are annotated with their
* positions in the original array.
*/
return max {
(high - low) * root.value, // Widest rectangle under the minimum
largestRectangleUnder(low, root.index, root.left),
largestRectnagleUnder(root.index + 1, high, root.right)
}
}
Once we have the Cartesian tree, this algorithm takes time O(n), since we visit each node exactly once and do O(1) work per node.
It turns out that there's a simple, linear-time algorithm for building Cartesian trees. The "natural" way you'd probably think to build one would be to scan across the array, find the minimum value, then recursively build a Cartesian tree from the left and right subarrays. The problem is that the process of finding the minimum value is really expensive, and this can take time Θ(n2).
The "fast" way to build a Cartesian tree is by scanning the array from the left to the right, adding in one element at a time. This algorithm is based on the following observations about Cartesian trees:
First, Cartesian trees obey the heap property: every element is less than or equal to its children. The reason for this is that the Cartesian tree root is the smallest value in the overall array, and its children are the smallest elements in their subarrays, etc.
Second, if you do an inorder traversal of a Cartesian tree, you get back the elements of the array in the order in which they appear. To see why this is, notice that if you do an inorder traversal of a Cartesian tree, you first visit everything to the left of the minimum value, then the minimum value, then everything to the right of the minimum value. Those visitations are recursively done the same way, so everything ends up being visited in order.
These two rules give us a lot of information about what happens if we start with a Cartesian tree of the first k elements of the array and want to form a Cartesian tree for the first k+1 elements. That new element will have to end up on the right spine of the Cartesian tree - the part of the tree formed by starting at the root and only taking steps to the right - because otherwise something would come after it in an inorder traversal. And, within that right spine, it has to be placed in a way that makes it bigger than everything above it, since we need to obey the heap property.
The way that you actually add a new node to the Cartesian tree is to start at the rightmost node in the tree and walk upwards until you either hit the root of the tree or find a node that has a smaller value. You then make the new value have as its left child the last node it walked up on top of.
Here's a trace of that algorithm on a small array:
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
2 becomes the root.
2 --+
|
4
4 is bigger than 2, we can't move upwards. Append to right.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
2 ------+
|
--- 3
|
4
3 is lesser than 4, climb over it. Can't climb further over 2, as it is smaller than 3. Climbed over subtree rooted at 4 goes to the left of new value 3 and 3 becomes rightmost node now.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
+---------- 1
|
2 ------+
|
--- 3
|
4
1 climbs over the root 2, the entire tree rooted at 2 is moved to left of 1, and 1 is now the new root - and also the rightmost value.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
Although this might not seem to run in linear time - wouldn't you potentially end up climbing all the way to the root of the tree over and over and over again? - you can show that this runs in linear time using a clever argument. If you climb up over a node in the right spine during an insertion, that node ends up getting moved off the right spine and therefore can't be rescanned in a future insertion. Therefore, every node is only ever scanned over at most once, so the total work done is linear.
And now the kicker - the standard way that you'd actually implement this approach is by maintaining a stack of the values that correspond to the nodes on the right spine. The act of "walking up" and over a node corresponds to popping a node off the stack. Therefore, the code for building a Cartesian tree looks something like this:
Stack s;
for (each array element x) {
pop s until it's empty or s.top > x
push x onto the stack.
do some sort of pointer rewiring based on what you just did.
}
The stack manipulations here might seem really familiar, and that's because these are the exact stack operations that you would do in the answers shown elsewhere here. In fact, you can think of what those approaches are doing as implicitly building the Cartesian tree and running the recursive algorithm shown above in the process of doing so.
The advantage, I think, of knowing about Cartesian trees is that it provides a really nice conceptual framework for seeing why this algorithm works correctly. If you know that what you're doing is running a recursive walk of a Cartesian tree, it's easier to see that you're guaranteed to find the largest rectangle. Plus, knowing that the Cartesian tree exists gives you a useful tool for solving other problems. Cartesian trees show up in the design of fast data structures for the range minimum query problem and are used to convert suffix arrays into suffix trees.
Here's some Java code that implements this idea, courtesy of #Azeem!
import java.util.Stack;
public class CartesianTreeMakerUtil {
private static class Node {
int val;
Node left;
Node right;
}
public static Node cartesianTreeFor(int[] nums) {
Node root = null;
Stack<Node> s = new Stack<>();
for(int curr : nums) {
Node lastJumpedOver = null;
while(!s.empty() && s.peek().val > curr) {
lastJumpedOver = s.pop();
}
Node currNode = this.new Node();
currNode.val = curr;
if(s.isEmpty()) {
root = currNode;
}
else {
s.peek().right = currNode;
}
currNode.left = lastJumpedOver;
s.push(currNode);
}
return root;
}
public static void printInOrder(Node root) {
if(root == null) return;
if(root.left != null ) {
printInOrder(root.left);
}
System.out.println(root.val);
if(root.right != null) {
printInOrder(root.right);
}
}
public static void main(String[] args) {
int[] nums = new int[args.length];
for (int i = 0; i < args.length; i++) {
nums[i] = Integer.parseInt(args[i]);
}
Node root = cartesianTreeFor(nums);
tester.printInOrder(root);
}
}
The easiest solution in O(N)
long long getMaxArea(long long hist[], long long n)
{
stack<long long> s;
long long max_area = 0;
long long tp;
long long area_with_top;
long long i = 0;
while (i < n)
{
if (s.empty() || hist[s.top()] <= hist[i])
s.push(i++);
else
{
tp = s.top(); // store the top index
s.pop(); // pop the top
area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
if (max_area < area_with_top)
{
max_area = area_with_top;
}
}
}
while (!s.empty())
{
tp = s.top();
s.pop();
area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
if (max_area < area_with_top)
max_area = area_with_top;
}
return max_area;
}
There is also another solution using Divide and Conquer. The algorithm for it is :
1) Divide the array into 2 parts with the smallest height as the breaking point
2) The maximum area is the maximum of :
a) Smallest height * size of the array
b) Maximum rectangle in left half array
c) Maximum rectangle in right half array
The time complexity comes to O(nlogn)
The stack solution is one of the most clever solutions I've seen till date. And it can be a little hard to understand why that works.
I've taken a jab at explaining the same in some detail here.
Summary points from the post:-
General way our brain thinks is :-
Create every situation and try to find the value of the contraint that is needed to solve the problem.
And we happily convert that to code as :- find the value of contraint(min) for each situation(pair(i,j))
The clever solutions tries to flip the problem.For each constraint/min value of tha area, what is the best possible left and right extremes ?
So if we traverse over each possible min in the array. What are the left and right extremes for each value ?
Little thought says, the first left most value less than the current min and similarly the first rightmost value that is lesser than the current min.
So now we need to see if we can find a clever way to find the first left and right values lesser than the current value.
To think: If we have traversed the array partially say till min_i, how can the solution to min_i+1 be built?
We need the first value less than min_i to its left.
Inverting the statement : we need to ignore all values to the left of min_i that are greater than min_i. We stop when we find the first value smaller than min_i (i) . The troughs in the curve hence become useless once we have crossed it. In histogram , (2 4 3) => if 3 is min_i, 4 being larger is not of interest.
Corrollary: in a range (i,j). j being the min value we are considering.. all values between j and its left value i are useless. Even for further calculations.
Any histogram on the right with a min value larger than j, will be binded at j. The values of interest on the left form a monotonically increasing sequence with j being the largest value. (Values of interest here being possible values that may be of interest for the later array)
Since, we are travelling from left to right, for each min value/ current value - we do not know whether the right side of the array will have an element smaller than it.
So we have to keep it in memory until we get to know this value is useless. (since a smaller value is found)
All this leads to a usage of our very own stack structure.
We keep on stack until we don't know its useless.
We remove from stack once we know the thing is crap.
So for each min value to find its left smaller value, we do the following:-
pop the elements larger to it (useless values)
The first element smaller than the value is the left extreme. The i to our min.
We can do the same thing from the right side of the array and we will get j to our min.
It's quite hard to explain this, but if this is making sense then I'd suggest read the complete article here since it has more insights and details.
I don't understand the other entries, but I think I know how to do it in O(n) as follows.
A) for each index find the largest rectangle inside the histogram ending at that index where the index column touches the top of the rectangle and remember where the rectangle starts. This can be done in O(n) using a stack based algorithm.
B) Similarly for each index find the largest rectangle starting at that index where the index column touches the top of the rectangle and remember where the rectangle ends. Also O(n) using the same method as (A) but scanning the histogram backwards.
C) For each index combine the results of (A) and (B) to determine the largest rectangle where the column at that index touches the top of the rectangle. O(n) like (A).
D) Since the largest rectangle must be touched by some column of the histogram the largest rectangle is the largest rectangle found in step (C).
The hard part is implementing (A) and (B), which I think is what JF Sebastian may have solved rather than the general problem stated.
I coded this one and felt little better in the sense:
import java.util.Stack;
class StackItem{
public int sup;
public int height;
public int sub;
public StackItem(int a, int b, int c){
sup = a;
height = b;
sub =c;
}
public int getArea(){
return (sup - sub)* height;
}
#Override
public String toString(){
return " from:"+sup+
" to:"+sub+
" height:"+height+
" Area ="+getArea();
}
}
public class MaxRectangleInHistogram {
Stack<StackItem> S;
StackItem curr;
StackItem maxRectangle;
public StackItem getMaxRectangleInHistogram(int A[], int n){
int i = 0;
S = new Stack();
S.push(new StackItem(0,0,-1));
maxRectangle = new StackItem(0,0,-1);
while(i<n){
curr = new StackItem(i,A[i],i);
if(curr.height > S.peek().height){
S.push(curr);
}else if(curr.height == S.peek().height){
S.peek().sup = i+1;
}else if(curr.height < S.peek().height){
while((S.size()>1) && (curr.height<=S.peek().height)){
curr.sub = S.peek().sub;
S.peek().sup = i;
decideMaxRectangle(S.peek());
S.pop();
}
S.push(curr);
}
i++;
}
while(S.size()>1){
S.peek().sup = i;
decideMaxRectangle(S.peek());
S.pop();
}
return maxRectangle;
}
private void decideMaxRectangle(StackItem s){
if(s.getArea() > maxRectangle.getArea() )
maxRectangle = s;
}
}
Just Note:
Time Complexity: T(n) < O(2n) ~ O(n)
Space Complexity S(n) < O(n)
I would like to thank #templatetypedef for his/her extremely detailed and intuitive answer. The Java code below is based on his suggestion to use Cartesian Trees and solves the problem in O(N) time and O(N) space. I suggest that you read #templatetypedef's answer above before reading the code below. The code is given in the format of the solution to the problem at leetcode: https://leetcode.com/problems/largest-rectangle-in-histogram/description/ and passes all 96 test cases.
class Solution {
private class Node {
int val;
Node left;
Node right;
int index;
}
public Node getCartesianTreeFromArray(int [] nums) {
Node root = null;
Stack<Node> s = new Stack<>();
for(int i = 0; i < nums.length; i++) {
int curr = nums[i];
Node lastJumpedOver = null;
while(!s.empty() && s.peek().val >= curr) {
lastJumpedOver = s.pop();
}
Node currNode = this.new Node();
currNode.val = curr;
currNode.index = i;
if(s.isEmpty()) {
root = currNode;
}
else {
s.peek().right = currNode;
}
currNode.left = lastJumpedOver;
s.push(currNode);
}
return root;
}
public int largestRectangleUnder(int low, int high, Node root, int [] nums) {
/* Base case: If the range is empty, the biggest rectangle we
* can fit is the empty rectangle.
*/
if(root == null) return 0;
if (low == high) {
if(0 <= low && low <= nums.length - 1) {
return nums[low];
}
return 0;
}
/* Assume the Cartesian tree nodes are annotated with their
* positions in the original array.
*/
int leftArea = -1 , rightArea= -1;
if(root.left != null) {
leftArea = largestRectangleUnder(low, root.index - 1 , root.left, nums);
}
if(root.right != null) {
rightArea = largestRectangleUnder(root.index + 1, high,root.right, nums);
}
return Math.max((high - low + 1) * root.val,
Math.max(leftArea, rightArea));
}
public int largestRectangleArea(int[] heights) {
if(heights == null || heights.length == 0 ) {
return 0;
}
if(heights.length == 1) {
return heights[0];
}
Node root = getCartesianTreeFromArray(heights);
return largestRectangleUnder(0, heights.length - 1, root, heights);
}
}
python-3
a=[3,4,7,4,6]
a.sort()
r=0
for i in range(len(a)):
if a[i]* (n-1) > r:
r = a[i]*(n-i)
print(r)
output:
16
I come across this question in one of interview. Was trying to solve this, resulting in observed following things -
Need to check consecutive left elements greater than current
element
Need to check consecutive right elements greater than
current element
Calculate area (number of left side max elements + number of right side max elements + 1) * current element
Check and replace existing maxArea if calculated area is greater than
maxArea
Following is the JS code implementing above pseudocode
function maxAreaCovered(arr) {
let maxArea = 0;
for (let index = 0; index < arr.length; index++) {
let l = index - 1;
let r = index + 1;
let maxEleCount = 0
while (l > -1) {
if (arr[l] >= arr[index]) {
maxEleCount++;
} else {
break;
}
l--;
}
while (r < arr.length) {
if (arr[r] >= arr[index]) {
maxEleCount++;
} else {
break;
}
r++;
}
let area = (maxEleCount + 1) * arr[index];
maxArea = Math.max(area, maxArea);
}
return maxArea
}
console.log(maxAreaCovered([6, 2, 5, 4, 5, 1, 6]));
You can use O(n) method which uses stack to calculate the maximum area under the histogram.
long long histogramArea(vector<int> &histo){
stack<int> s;
long long maxArea=0;
long long area= 0;
int i =0;
for (i = 0; i < histo.size();) {
if(s.empty() || histo[s.top()] <= histo[i]){
s.push(i++);
}
else{
int top = s.top(); s.pop();
area= histo[top]* (s.empty()?i:i-s.top()-1);
if(area >maxArea)
maxArea= area;
}
}
while(!s.empty()){
int top = s.top();s.pop();
area= histo[top]* (s.empty()?i:i-s.top()-1);
if(area >maxArea)
maxArea= area;
}
return maxArea;
}
For explanation you can read here http://www.geeksforgeeks.org/largest-rectangle-under-histogram/

Find combination that gives maximum profit

Say I have table that looks like this:
ID | Cost | Profit
1 | 0.55 | 1.24
2 | 0.23 | 3.11
3 | 0.19 | 2.21
4 | 0.53 | 1.49
... and so on.
How do I find the maximum profit if my budget is of certain value? ie: Find max profit with cost of 1.12. If from the table above, I should take item from ID 2 + 3 + 4 to maximize my profit.
I'm trying to google something from LP, Operation Research but I'm really Googling blindly, I don't know what are the keywords I should look for. Please help.
What you need is 0/1 Knapsack! I have assumed from the example you gave that you need to take the item as a whole and not a fraction of the item. Otherwise go for default (Fractional) Knapsack.
Get to know about the greedy version of solution & why it might fail, then the Fractional Knapsack Dynamic Programming style of solution for this problem and then 0/1 Knapsack! This is how you should learn to solve this problem.
Let me know if you need anything else on this.
Here is an O(2n + sort) algorithm that should produce good results in most cases
Calculate yield for each item (item.yield = item.profit / item.cost)
Sort items by yield (highest yield first)
Iterate over sorted items whilst remaining_budget > 0
Purchase number_of_items = floor(remaining_budget / item.cost) of item.id, subtract number_of_items * item.cost from remaining_cost
It won't always catch all edge cases if you are operating discretely, e.g. say you have budget = 100, and an item A with cost = 51, yield = 1.5 and another item B with cost = 50, yield = 1.4,
100 - 51 = 49 + 51 * 1.5 = 76.5 = 125,
One A results in 25.5 profit
100 - 2 * 50 = 0 + 2 * 50 * 1.4 = 70 = 140,
Two Bs result in 40 profit,
i.e. buying two individually worse ones is better due to using the entirety of the budget more efficiently
But if you're operating continuously it will begin to pick up performance gains as the budget stops being a restricting factor
Optimisations may include
Calculate yield at sort time, one less iteration over the entire data structure
Removing the sort step and instead "throwing away" worse choices as you iterate through, this can end up taking a lot longer to complete
Ending the iteration when remaining_budget < minumum_unit_cost, e.g. if there are a lot of high/mid-costs but no/very few low-costs, you can stop iterating earlier
Here i have a struct array option code you can modify for your 3d value array. Here I used the struct function to work this program.
struct item
{
double w,p,u;
};
you can use
//Fractional Knapsack
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
struct item
{
double w,p,u;
};
bool compare( item a, item b)
{
return a.u>b.u;
}
int main()
{
int n,w;
item arry[100];
cin>>n>>w;
for(int i=0; i<n; i++)
{
cin>>arry[i].w>>arry[i].p;
arry[i].u = arry[i].p/arry[i].w;
}
sort(&arry[0],&arry[n],compare);
int p=0;
for (int i=0; i<n; i++)
{
if(w>arry[i].w)
{
p=p+arry[i].p;
w=w-arry[i].w;
}
else{
p=p+w*arry[i].u;
w=0;
}
}
cout<<"Total Profit: "<<p<<endl;
}

Quick sort - Not a stable sort

Stable sort talks about equal keys NOT getting past each other, after sorting
Consider duplicate key 4 at array index 8 & 9, in the below sequence,
a = [5 20 19 18 17 8 4 5 4 4] where pivot = 0, i = 1, j = 9
Partition logic says,
i pointer moves left to right. Move i as long as a[i] value is ≤ to a[pivot]. swap(a[i], a[j])
j pointer moves right to left. Move j as long as a[j] value is ≥ to a[pivot]. swap(a[i], a[j])
After following this procedure two times,
a = [5 4 19 18 17 8 4 5 4 20] Swap done at i = 1 & j = 9.
a = [5 4 19 18 17 8 4 5 4 20] Stops at i = 2 & j = 8
a = [5 4 4 18 17 8 4 5 19 20] Swap done at i = 2 & j = 8
My understanding is, as duplicate key 4 lost their order after two swaps, Quick sort is not stable sort.
Question:
As per my understanding, Is this the reason for Quick sort not being stable? If yes, Do we have any alternative partition approach to maintain the order of key 4 in the above example?
There's nothing in the definition of Quicksort per se that makes it either stable or unstable. It can be either.
The most common implementation of Quicksort on arrays involves partitioning via swaps between a pair of pointers, one progressing from end to beginning and the other from beginning to end. This does produce an unstable Quicksort.
While this method of partitioning is certainly common, it's not a requirement for the algorithm to be a Quicksort. It's just a method that's simple and common when applying Quicksort to an array.
On the other hand, consider doing a quicksort on a singly linked list. In this case, you typically do the partitioning by creating two separate linked lists, one containing those smaller than the pivot value, the other containing those larger than the pivot value. Since you always traverse the list from beginning to end (not many other reasonable choices with a singly linked list). As long as you add each element to the end of the sub-list, the sub-lists you create contain equal keys in their original order. Thus, the result is a stable sort. On the other hand, if you don't care about stability, you can splice elements to the beginnings of the sub-lists (slightly easy to do with constant complexity). in this case, the sort will (again) be unstable.
The actual mechanics of partitioning a linked list are pretty trivial, as long as you don't get too fancy in choosing the partition.
node *list1 = dummy_node1;
node *add1 = list1;
node *list2 = dummy_node2;
node *add2 = list2;
T pivot = input->data; // easiest pivot value to choose
for (node *current = input; current != nullptr; current = current->next)
if (current->data < pivot) {
add1->next = current;
add1 = add1 -> next;
}
else {
add2->next = current;
add2 = add2->next;
}
add1->next = nullptr;
add2->next = nullptr;

Sieve optimization

A sequence is created from sequence of natural numbers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
removing every 2nd number in the 2nd step:
1 3 5 7 9 11 13 15 17 19 21 23
removing every 3rd number in the 3rd step (from previous sequence):
1 3 7 9 13 15 19 21
removing every 4th number in the 4th step (from previous sequence):
1 3 7 13 19
and so forth...
Now, we're able to say, that the 4th number of the sequence will be 13.
Definition and the right solution for this is here: http://oeis.org/A000960
My task is to find a 1000th member of the sequence.
I have written an algorithm for this, but I think it's quite slow (when I try it with 10.000th member it takes about 13 seconds). What it does is:
I have number which increases by 2 in every step, since we know
that there ain't no even numbers.
In counters array I store indexes for each step. If the number is
xth in xth step, i have to remove it, e.g. number 5 in 3rd step. And
I initiate a counter for the next step.
ArrayList<Long> list = new ArrayList<Long>(10000);
long[] counters = new long[1002];
long number = -1;
int active_counter = 3;
boolean removed;
counters[active_counter] = 1;
int total_numbers = 1;
while (total_numbers <= 1000) {
number += 2;
removed = false;
for (int i = 3; i <= active_counter; i++) {
if ((counters[i] % i) == 0) {
removed = true;
if (i == active_counter) {
active_counter++;
counters[active_counter] = i;
}
counters[i]++;
break;
}
counters[i]++;
}
if (!removed) {
list.add(number);
total_numbers++;
}
}
Your link to OEIS gives us some methods for fast calculation (FORMULA etc)
Implementation of the second one:
function Flavius(n: Integer): Integer;
var
m, i: Integer;
begin
m := n * n;
for i := n - 1 downto 1 do
m := (m - 1) - (m - 1) mod i;
Result := m;
end;
P.S. Algorithm is linear (O(n)), and result for n=10000 is 78537769
No this problem is not NP hard...
I have the intuition it is O(n^2), and the link proove it:
Let F(n) = number of terms <= n. Andersson, improving results of Brun,
shows that F(n) = 2 sqrt(n/Pi) + O(n^(1/6)). Hence a(n) grows like Pi n^2 / 4.
It think O(n^2) should not be give 15s for n = 10000. Yes there is something not correct :(
Edit :
I measured the number of access to counters (for n = 10000)to get a rough idea of the complexity and I have
F = 1305646150
F/n^2 = 13.05...
Your algorithm is between O(n^2) and O(n^2*(logn)) so you are doing things right.... :)
Wow, that is a really interesting problem.
Thank you so much for that.
I just lost an hour of my life to this. I think the problem will turn out to be NP-hard. And I am at a loss to generate an equation to calculate the ith term in the jth step.
Your "brute force" solution seems fine unless there is some clever math trick to generate the final solution in one step. But I do not think there is.
From a programming standpoint, you could try making your initial array a linked list and just un-linking the terms you want to drop. That would save you some time, since you wouldn't be rebuilding your list every step.
One approach could be to keep an array of the numbers you are using to sieve, rather than the numbers being sieved. Basically, if you are looking for the Nth value in the sequence, you create an array of N counters and then iterate through the natural numbers. For each number, you loop through your counters, incrementing them until one gets to its "maximum" value, at which point you set that counter to zero and stop incrementing the remaining counters. (This represents removing the current number at that counter's step.) If you get through all of the counters without removing the current number, then this is one of the numbers that is left over.
Some sample (Java) code that seems to match the sequence given by OEIS:
public class Test {
public static void main(String[] args) {
int N=10000;
int n=0;
long c=0;
int[] counters = new int[N];
outer: while(n<N) {
c++;
for(int i=0;i<N;i++){
counters[i]++;
if(counters[i]==i+2){
counters[i]=0;
continue outer;
}
}
// c is the n'th leftover
System.out.println(n + " " + c);
n++;
}
}
}
I believe this runs in O(N^3).

Removal of billboards from given ones

I came across this question
ADZEN is a very popular advertising firm in your city. In every road
you can see their advertising billboards. Recently they are facing a
serious challenge , MG Road the most used and beautiful road in your
city has been almost filled by the billboards and this is having a
negative effect on
the natural view.
On people's demand ADZEN has decided to remove some of the billboards
in such a way that there are no more than K billboards standing together
in any part of the road.
You may assume the MG Road to be a straight line with N billboards.Initially there is no gap between any two adjecent
billboards.
ADZEN's primary income comes from these billboards so the billboard removing process has to be done in such a way that the
billboards
remaining at end should give maximum possible profit among all possible final configurations.Total profit of a configuration is the
sum of the profit values of all billboards present in that
configuration.
Given N,K and the profit value of each of the N billboards, output the maximum profit that can be obtained from the remaining
billboards under the conditions given.
Input description
1st line contain two space seperated integers N and K. Then follow N lines describing the profit value of each billboard i.e ith
line contains the profit value of ith billboard.
Sample Input
6 2
1
2
3
1
6
10
Sample Output
21
Explanation
In given input there are 6 billboards and after the process no more than 2 should be together. So remove 1st and 4th
billboards giving a configuration _ 2 3 _ 6 10 having a profit of 21.
No other configuration has a profit more than 21.So the answer is 21.
Constraints
1 <= N <= 1,00,000(10^5)
1 <= K <= N
0 <= profit value of any billboard <= 2,000,000,000(2*10^9)
I think that we have to select minimum cost board in first k+1 boards and then repeat the same untill last,but this was not giving correct answer
for all cases.
i tried upto my knowledge,but unable to find solution.
if any one got idea please kindly share your thougths.
It's a typical DP problem. Lets say that P(n,k) is the maximum profit of having k billboards up to the position n on the road. Then you have following formula:
P(n,k) = max(P(n-1,k), P(n-1,k-1) + C(n))
P(i,0) = 0 for i = 0..n
Where c(n) is the profit from putting the nth billboard on the road. Using that formula to calculate P(n, k) bottom up you'll get the solution in O(nk) time.
I'll leave up to you to figure out why that formula holds.
edit
Dang, I misread the question.
It still is a DP problem, just the formula is different. Let's say that P(v,i) means the maximum profit at point v where last cluster of billboards has size i.
Then P(v,i) can be described using following formulas:
P(v,i) = P(v-1,i-1) + C(v) if i > 0
P(v,0) = max(P(v-1,i) for i = 0..min(k, v))
P(0,0) = 0
You need to find max(P(n,i) for i = 0..k)).
This problem is one of the challenges posted in www.interviewstreet.com ...
I'm happy to say I got this down recently, but not quite satisfied and wanted to see if there's a better method out there.
soulcheck's DP solution above is straightforward, but won't be able to solve this completely due to the fact that K can be as big as N, meaning the DP complexity will be O(NK) for both runtime and space.
Another solution is to do branch-and-bound, keeping track the best sum so far, and prune the recursion if at some level, that is, if currSumSoFar + SUM(a[currIndex..n)) <= bestSumSoFar ... then exit the function immediately, no point of processing further when the upper-bound won't beat best sum so far.
The branch-and-bound above got accepted by the tester for all but 2 test-cases.
Fortunately, I noticed that the 2 test-cases are using small K (in my case, K < 300), so the DP technique of O(NK) suffices.
soulcheck's (second) DP solution is correct in principle. There are two improvements you can make using these observations:
1) It is unnecessary to allocate the entire DP table. You only ever look at two rows at a time.
2) For each row (the v in P(v, i)), you are only interested in the i's which most increase the max value, which is one more than each i that held the max value in the previous row. Also, i = 1, otherwise you never consider blanks.
I coded it in c++ using DP in O(nlogk).
Idea is to maintain a multiset with next k values for a given position. This multiset will typically have k values in mid processing. Each time you move an element and push new one. Art is how to maintain this list to have the profit[i] + answer[i+2]. More details on set:
/*
* Observation 1: ith state depends on next k states i+2....i+2+k
* We maximize across this states added on them "accumulative" sum
*
* Let Say we have list of numbers of state i+1, that is list of {profit + state solution}, How to get states if ith solution
*
* Say we have following data k = 3
*
* Indices: 0 1 2 3 4
* Profits: 1 3 2 4 2
* Solution: ? ? 5 3 1
*
* Answer for [1] = max(3+3, 5+1, 9+0) = 9
*
* Indices: 0 1 2 3 4
* Profits: 1 3 2 4 2
* Solution: ? 9 5 3 1
*
* Let's find answer for [0], using set of [1].
*
* First, last entry should be removed. then we have (3+3, 5+1)
*
* Now we should add 1+5, but entries should be incremented with 1
* (1+5, 4+3, 6+1) -> then find max.
*
* Could we do it in other way but instead of processing list. Yes, we simply add 1 to all elements
*
* answer is same as: 1 + max(1-1+5, 3+3, 5+1)
*
*/
ll dp()
{
multiset<ll, greater<ll> > set;
mem[n-1] = profit[n-1];
ll sumSoFar = 0;
lpd(i, n-2, 0)
{
if(sz(set) == k)
set.erase(set.find(added[i+k]));
if(i+2 < n)
{
added[i] = mem[i+2] - sumSoFar;
set.insert(added[i]);
sumSoFar += profit[i];
}
if(n-i <= k)
mem[i] = profit[i] + mem[i+1];
else
mem[i] = max(mem[i+1], *set.begin()+sumSoFar);
}
return mem[0];
}
This looks like a linear programming problem. This problem would be linear, but for the requirement that no more than K adjacent billboards may remain.
See wikipedia for a general treatment: http://en.wikipedia.org/wiki/Linear_programming
Visit your university library to find a good textbook on the subject.
There are many, many libraries to assist with linear programming, so I suggest you do not attempt to code an algorithm from scratch. Here is a list relevant to Python: http://wiki.python.org/moin/NumericAndScientific/Libraries
Let P[i] (where i=1..n) be the maximum profit for billboards 1..i IF WE REMOVE billboard i. It is trivial to calculate the answer knowing all P[i]. The baseline algorithm for calculating P[i] is as follows:
for i=1,N
{
P[i]=-infinity;
for j = max(1,i-k-1)..i-1
{
P[i] = max( P[i], P[j] + C[j+1]+..+C[i-1] );
}
}
Now the idea that allows us to speed things up. Let's say we have two different valid configurations of billboards 1 through i only, let's call these configurations X1 and X2. If billboard i is removed in configuration X1 and profit(X1) >= profit(X2) then we should always prefer configuration X1 for billboards 1..i (by profit() I meant the profit from billboards 1..i only, regardless of configuration for i+1..n). This is as important as it is obvious.
We introduce a doubly-linked list of tuples {idx,d}: {{idx1,d1}, {idx2,d2}, ..., {idxN,dN}}.
p->idx is index of the last billboard removed. p->idx is increasing as we go through the list: p->idx < p->next->idx
p->d is the sum of elements (C[p->idx]+C[p->idx+1]+..+C[p->next->idx-1]) if p is not the last element in the list. Otherwise it is the sum of elements up to the current position minus one: (C[p->idx]+C[p->idx+1]+..+C[i-1]).
Here is the algorithm:
P[1] = 0;
list.AddToEnd( {idx=0, d=C[0]} );
// sum of elements starting from the index at top of the list
sum = C[0]; // C[list->begin()->idx]+C[list->begin()->idx+1]+...+C[i-1]
for i=2..N
{
if( i - list->begin()->idx > k + 1 ) // the head of the list is "too far"
{
sum = sum - list->begin()->d
list.RemoveNodeFromBeginning()
}
// At this point the list should containt at least the element
// added on the previous iteration. Calculating P[i].
P[i] = P[list.begin()->idx] + sum
// Updating list.end()->d and removing "unnecessary nodes"
// based on the criterion described above
list.end()->d = list.end()->d + C[i]
while(
(list is not empty) AND
(P[i] >= P[list.end()->idx] + list.end()->d - C[list.end()->idx]) )
{
if( list.size() > 1 )
{
list.end()->prev->d += list.end()->d
}
list.RemoveNodeFromEnd();
}
list.AddToEnd( {idx=i, d=C[i]} );
sum = sum + C[i]
}
//shivi..coding is adictive!!
#include<stdio.h>
long long int arr[100001];
long long int sum[100001];
long long int including[100001],excluding[100001];
long long int maxim(long long int a,long long int b)
{if(a>b) return a;return b;}
int main()
{
int N,K;
scanf("%d%d",&N,&K);
for(int i=0;i<N;++i)scanf("%lld",&arr[i]);
sum[0]=arr[0];
including[0]=sum[0];
excluding[0]=sum[0];
for(int i=1;i<K;++i)
{
sum[i]+=sum[i-1]+arr[i];
including[i]=sum[i];
excluding[i]=sum[i];
}
long long int maxi=0,temp=0;
for(int i=K;i<N;++i)
{
sum[i]+=sum[i-1]+arr[i];
for(int j=1;j<=K;++j)
{
temp=sum[i]-sum[i-j];
if(i-j-1>=0)
temp+=including[i-j-1];
if(temp>maxi)maxi=temp;
}
including[i]=maxi;
excluding[i]=including[i-1];
}
printf("%lld",maxim(including[N-1],excluding[N-1]));
}
//here is the code...passing all but 1 test case :) comment improvements...simple DP

Resources