How to represent a Pascal-esque (equilateral) triangle in memory - data-structures

I was working on Project Euler Problem 18 (I did solve the problem; I'm not cheating. "Proof" here) and found myself in need of a way to represent a data structure that looks like a Pascal triangle, but with different values. It looks very similar to a binary tree, but there's a very important distinction: a node's children are not exclusively its children. So the first three rows look like this:
75
/ \
95 64
/ \ / \
17 47 82
Note that 47 has two parents.
It's pretty easy to represent this as a linked structure, or even a two-dimensional array, but I'm hoping that there's a more elegant way. I love binary trees, mainly for how you can allocate a single chunk of memory, treat it as an array, and navigate between children and parent with a couple of arithmetic operations or integer division. Is there a way to do the same for this data structure?
My best solution involved using a two-dimensional array (where it's very easy to find children and parents). I dislike this implementation because (at least the way I did it) I called malloc for every row, even though I knew how big the structure would be ahead of time.
My question is very similar to this one, but I wasn't happy with the accepted answer. A comment alludes to the solution I seek, but no explanation is given.
Edit: To clarify, I'm looking for a way to index into a one-dimensional array in the same way that a binary tree stuffed sequentially into an array (starting at 1) gives the property that the children of a node at index i are at indexes 2 * i and 2 * i + 1. I'm also not very concerned about being able to find parents, so don't worry too much about the weird two-parent case.

Yes, it is possible to store a triangular data structure in a one-dimensional array (example in Java):
class Triangle<T> {
    private T[] triangle;

    public Triangle(T[] array, int rows) {
        if (array.length != triangleNumber(rows)) {
            throw new IllegalArgumentException("Array wrong size");
        }
        triangle = array;
    }

    public T get(int row, int col) {
        return triangle[index(row, col)];
    }

    public void set(int row, int col, T val) {
        triangle[index(row, col)] = val;
    }

    private int triangleNumber(int rows) {
        return rows * (rows + 1) / 2;
    }

    private int index(int row, int col) {
        if (row < 0 || col < 0 || col > row) {
            throw new IndexOutOfBoundsException("Trying to access outside of triangle");
        }
        return triangleNumber(row) + col;
    }
}
The array passed to the constructor is formed by concatenating the rows of the triangle one-by-one into the array: [t(0,0), t(1,0), t(1,1), t(2,0), t(2,1), t(2,2), ... , t(rows-1, rows-1)], where t(R, C) is the triangle cell at triangle row R and triangle column C.
For any cell (row, col):
left child would be at row+1, col
right child would be at row+1, col+1
left parent would be at row-1, col-1
right parent would be at row-1, col
Two parents and two children do not exist for all cells because they would lie outside the triangle. See the exception check in the index method.
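As a rough illustration of the same layout in Python (a minimal sketch, not the answer's code; the names `tri_index`, `children` and `max_path_sum` are made up for this example), the row-by-row flattening and the child rule above are enough to compute a maximum top-to-bottom path sum bottom-up, which is what Project Euler 18 asks for. The sample data is just the three rows shown in the question.

```python
def tri_index(row, col):
    """Flat index of cell (row, col), 0-based, rows stored one after another."""
    if row < 0 or col < 0 or col > row:
        raise IndexError("cell lies outside the triangle")
    return row * (row + 1) // 2 + col


def children(row, col):
    """Coordinates of the left and right child of (row, col)."""
    return (row + 1, col), (row + 1, col + 1)


def max_path_sum(flat, rows):
    """Best top-to-bottom sum, computed bottom-up on a copy of the flat list."""
    best = list(flat)
    # Start at the second-to-last row; the last row has no children.
    for row in range(rows - 2, -1, -1):
        for col in range(row + 1):
            (lr, lc), (rr, rc) = children(row, col)
            best[tri_index(row, col)] += max(best[tri_index(lr, lc)],
                                             best[tri_index(rr, rc)])
    return best[0]


triangle = [75, 95, 64, 17, 47, 82]   # first three rows from the question
print(max_path_sum(triangle, 3))      # 75 + 64 + 82
```

The whole triangle lives in one list, so a single allocation suffices regardless of the number of rows.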

Yes there is :
We start with your idea of a two-dimensional array, but with irregular row lengths. Each element is then identified by a two-dimensional index (r,c):
(1,1)
(2,1)(2,2)
(3,1)(3,2)(3,3)
(4,1)(4,2)(4,3)(4,4)
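Because the relationships are regular, the neighbours of a node can be computed directly:
for a node (r,c), its children are (r+1,c) and (r+1,c+1), and its parents are (r-1,c-1) and (r-1,c). A parent exists only if its column stays inside its row, that is, when c-1 >= 1 for the left parent and c <= r-1 for the right parent.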
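To get the single-array navigation the question asks for, the row can be recovered from the flat index itself. The sketch below is an illustration under its own assumptions: it uses 0-based flat indices rather than the 1-based (r,c) pairs above, and `row_of` and `child_indices` are hypothetical names. The cell at flat index i in row r has its children at i + r + 1 and i + r + 2.

```python
from math import isqrt


def row_of(i):
    """0-based row of flat index i: the largest r with r*(r+1)/2 <= i."""
    return (isqrt(8 * i + 1) - 1) // 2


def child_indices(i):
    """Flat indices of the left and right child of the cell at flat index i."""
    r = row_of(i)
    return i + r + 1, i + r + 2


# Rows stored back to back: [75], [95, 64], [17, 47, 82]
flat = [75, 95, 64, 17, 47, 82]
print([flat[j] for j in child_indices(1)])  # children of 95
print([flat[j] for j in child_indices(2)])  # children of 64
```

Both calls include 47, the child that 95 and 64 share.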


How to optimise my solution to HackerRank's Largest Rectangle problem? [duplicate]

I have a histogram with integer heights and constant width 1. I want to maximize the rectangular area under a histogram.
e.g.:
_
| |
| |_
| |
| |_
| |
The answer for this would be 6, 3 * 2, using col1 and col2.
O(n^2) brute force is clear to me, I would like an O(n log n) algorithm. I'm trying to think dynamic programming along the lines of maximum increasing subsequence O(n log n) algo, but am not going forward. Should I use divide and conquer algorithm?
PS: People with enough reputation are requested to remove the divide-and-conquer tag if there is no such solution.
After mho's comments: I mean the area of largest rectangle that fits entirely. (Thanks j_random_hacker for clarifying :) ).
The above answers have given the best O(n) solution in code, however, their explanations are quite tough to comprehend. The O(n) algorithm using a stack seemed magic to me at first, but right now it makes every sense to me. OK, let me explain it.
First observation:
To find the maximal rectangle, if for every bar x, we know the first smaller bar on its each side, let's say l and r, we are certain that height[x] * (r - l - 1) is the best shot we can get by using height of bar x. In the figure below, 1 and 2 are the first smaller of 5.
OK, let's assume we can do this in O(1) time for each bar; then we can solve the whole problem in O(n) by scanning each bar once.
Then the question comes: for every bar, can we really find the first smaller bar on its left and on its right in O(1) time? That seems impossible, right? ... It is possible (in amortized O(1) time per bar) by using an increasing stack.
Why can an increasing stack keep track of the first smaller bar on the left and on the right?
Simply telling you that an increasing stack can do the job may not be convincing, so I will walk you through it.
Firstly, to keep the stack increasing, we need one operation:
while x < stack.top():
    stack.pop()
stack.push(x)
Then you can check that in the increasing stack (as depicted below), for stack[x], stack[x-1] is the first smaller on its left, then a new element that can pop stack[x] out is the first smaller on its right.
Still can't believe stack[x-1] is the first smaller on the left on stack[x]?
I will prove it by contradiction.
First of all, stack[x-1] < stack[x] is for sure. But let's assume stack[x-1] is not the first smaller on the left of stack[x].
So where is the first smaller fs?
If fs < stack[x-1]:
    stack[x-1] would have been popped by fs,
else (fs >= stack[x-1]):
    fs would have been pushed onto the stack,
Either case would leave fs on the stack between stack[x-1] and stack[x], which contradicts the fact that there is no item between stack[x-1] and stack[x].
Therefore stack[x-1] must be the first smaller.
Summary:
Increasing stack can keep track of the first smaller on left and right for each element. By using this property, the maximal rectangle in histogram can be solved by using a stack in O(n).
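The property can be checked on a small input. The following Python sketch is an illustration, not this answer's own code (`nearest_smaller` and `largest_rectangle` are hypothetical names). One pass with an increasing stack of indices records, for every bar, the boundary on each side; the area formula height[x] * (r - l - 1) from the first observation is then applied to each bar. Note one assumption about ties: equal bars stay on the stack, so `left` holds the nearest bar that is not taller, while `right` holds the nearest strictly shorter bar. The maximum is still found, because the first bar of a run of equal heights gets the full width.

```python
def nearest_smaller(heights):
    """Return (left, right) boundary indices for every bar.

    left[i]  : index of the nearest bar to the left that is not taller
               than bar i, or -1 if there is none.
    right[i] : index of the nearest strictly shorter bar to the right,
               or len(heights) if there is none.
    """
    n = len(heights)
    left = [-1] * n
    right = [n] * n
    stack = []  # indices whose heights are non-decreasing from bottom to top
    for i, h in enumerate(heights):
        # Bar i is the first shorter bar to the right of everything it pops.
        while stack and heights[stack[-1]] > h:
            right[stack.pop()] = i
        # Whatever remains on top is the boundary on the left of bar i.
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    return left, right


def largest_rectangle(heights):
    left, right = nearest_smaller(heights)
    return max((h * (right[i] - left[i] - 1) for i, h in enumerate(heights)),
               default=0)


print(nearest_smaller([2, 1, 5, 6, 2, 3]))
print(largest_rectangle([2, 1, 5, 6, 2, 3]))
```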
Congratulations! This is really a tough problem, I'm glad my prosaic explanation didn't stop you from finishing. Attached is my proved solution as your reward :)
def largestRectangleArea(A):
    ans = 0
    A = [-1] + A
    A.append(-1)
    n = len(A)
    stack = [0]  # store index
    for i in range(n):
        while A[i] < A[stack[-1]]:
            h = A[stack.pop()]
            area = h*(i-stack[-1]-1)
            ans = max(ans, area)
        stack.append(i)
    return ans
There are three ways to solve this problem in addition to the brute force approach. I will write down all of them. The Java code has passed the tests on an online judge site called leetcode: http://www.leetcode.com/onlinejudge#question_84, so I am confident the code is correct.
Solution 1: dynamic programming + n*n matrix as cache
time: O(n^2), space: O(n^2)
Basic idea: use the n*n matrix dp[i][j] to cache the minimal height between bar[i] and bar[j]. Start filling the matrix from rectangles of width 1.
public int solution1(int[] height) {
int n = height.length;
if(n == 0) return 0;
int[][] dp = new int[n][n];
int max = Integer.MIN_VALUE;
for(int width = 1; width <= n; width++){
for(int l = 0; l+width-1 < n; l++){
int r = l + width - 1;
if(width == 1){
dp[l][l] = height[l];
max = Math.max(max, dp[l][l]);
} else {
dp[l][r] = Math.min(dp[l][r-1], height[r]);
max = Math.max(max, dp[l][r] * width);
}
}
}
return max;
}
Solution 2: dynamic programming + 2 arrays as cache.
time: O(n^2), space: O(n)
Basic idea: this solution is like solution 1, but saves some space. The idea is that in solution 1 we build the matrix from row 1 to row n. But in each iteration, only the previous row contributes to the building of the current row. So we use two arrays as previous row and current row by turns.
public int Solution2(int[] height) {
int n = height.length;
if(n == 0) return 0;
int max = Integer.MIN_VALUE;
// dp[0] and dp[1] take turns to be the "previous" line.
int[][] dp = new int[2][n];
for(int width = 1; width <= n; width++){
for(int l = 0; l+width-1 < n; l++){
if(width == 1){
dp[width%2][l] = height[l];
} else {
dp[width%2][l] = Math.min(dp[1-width%2][l], height[l+width-1]);
}
max = Math.max(max, dp[width%2][l] * width);
}
}
return max;
}
Solution 3: use stack.
time: O(n), space:O(n)
This solution is tricky and I learnt how to do this from explanation without graphs and explanation with graphs. I suggest you read the two links before reading my explanation below. It's hard to explain without graphs so my explanations might be hard to follow.
Following are my explanations:
For each bar, we must be able to find the biggest rectangle containing this bar. So the biggest one of these n rectangles is what we want.
To get the biggest rectangle for a certain bar (let's say bar[i], the (i+1)th bar), we just need to find out the biggest interval
that contains this bar. What we know is that all the bars in this interval must be at least the same height with bar[i]. So if we figure out how many
consecutive same-height-or-higher bars are there on the immediate left of bar[i], and how many consecutive same-height-or-higher bars are there on the immediate right of the bar[i], we
will know the length of the interval, which is the width of the biggest rectangle for bar[i].
To count the number of consecutive same-height-or-higher bars on the immediate left of bar[i], we only need to find the closest bar on the left that is shorter
than the bar[i], because all the bars between this bar and bar[i] will be consecutive same-height-or-higher bars.
We use a stack to dynamically keep track of the bars on the left that may still be the closest shorter bar. In other words, if we iterate from the first bar to bar[i], when we just arrive at bar[i] and haven't updated the stack,
the stack holds bars that are no higher than bar[i-1], including bar[i-1] itself. We compare bar[i]'s height with the bars in the stack, popping from the top, until we find one that is shorter than bar[i], which is the closest shorter bar.
If the stack becomes empty, no bar on the left is shorter than bar[i], which means all bars on the left of bar[i] are at least as high as bar[i].
We can do the same thing on the right side of the i-th bar. Then we know for bar[i] how many bars are there in the interval.
public int solution3(int[] height) {
int n = height.length;
if(n == 0) return 0;
Stack<Integer> left = new Stack<Integer>();
Stack<Integer> right = new Stack<Integer>();
int[] width = new int[n];// widths of intervals.
Arrays.fill(width, 1);// all intervals should at least be 1 unit wide.
for(int i = 0; i < n; i++){
// count # of consecutive higher bars on the left of the (i+1)th bar
while(!left.isEmpty() && height[i] <= height[left.peek()]){
// while there are bars stored in the stack, we check the bar on the top of the stack.
left.pop();
}
if(left.isEmpty()){
// all elements on the left are larger than height[i].
width[i] += i;
} else {
// bar[left.peek()] is the closest shorter bar.
width[i] += i - left.peek() - 1;
}
left.push(i);
}
for (int i = n-1; i >=0; i--) {
while(!right.isEmpty() && height[i] <= height[right.peek()]){
right.pop();
}
if(right.isEmpty()){
// all elements to the right are larger than height[i]
width[i] += n - 1 - i;
} else {
width[i] += right.peek() - i - 1;
}
right.push(i);
}
int max = Integer.MIN_VALUE;
for(int i = 0; i < n; i++){
// find the maximum value of all rectangle areas.
max = Math.max(max, width[i] * height[i]);
}
return max;
}
A Python implementation of the O(n) solution from @IVlad's answer:
from collections import namedtuple

Info = namedtuple('Info', 'start height')

def max_rectangle_area(histogram):
    """Find the area of the largest rectangle that fits entirely under
    the histogram.
    """
    stack = []
    top = lambda: stack[-1]
    max_area = 0
    pos = 0  # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos  # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height))  # push
            elif stack and height < top().height:
                max_area = max(max_area, top().height*(pos-top().start))
                start, _ = stack.pop()
                continue
            break  # height == top().height goes here
    pos += 1
    for start, height in stack:
        max_area = max(max_area, height*(pos-start))
    return max_area
Example:
>>> f = max_rectangle_area
>>> f([5,3,1])
6
>>> f([1,3,5])
6
>>> f([3,1,5])
5
>>> f([4,8,3,2,0])
9
>>> f([4,8,3,1,1,0])
9
Linear search using a stack of incomplete subproblems
Copy-paste algorithm's description (in case the page goes down):
We process the elements in left-to-right order and maintain a stack of information about started but yet unfinished subhistograms. Whenever a new element arrives it is subjected to the following rules. If the stack is empty we open a new subproblem by pushing the element onto the stack. Otherwise we compare it to the element on top of the stack. If the new one is greater we again push it. If the new one is equal we skip it. In all these cases, we continue with the next new element. If the new one is less, we finish the topmost subproblem by updating the maximum area w.r.t. the element at the top of the stack. Then, we discard the element at the top, and repeat the procedure keeping the current new element. This way, all subproblems are finished until the stack becomes empty, or its top element is less than or equal to the new element, leading to the actions described above. If all elements have been processed, and the stack is not yet empty, we finish the remaining subproblems by updating the maximum area w.r.t. to the elements at the top.
For the update w.r.t. an element, we find the largest rectangle that includes that element. Observe that an update of the maximum area is carried out for all elements except for those skipped. If an element is skipped, however, it has the same largest rectangle as the element on top of the stack at that time that will be updated later. The height of the largest rectangle is, of course, the value of the element. At the time of the update, we know how far the largest rectangle extends to the right of the element, because then, for the first time, a new element with smaller height arrived. The information, how far the largest rectangle extends to the left of the element, is available if we store it on the stack, too.
We therefore revise the procedure described above. If a new element is pushed immediately, either because the stack is empty or it is greater than the top element of the stack, the largest rectangle containing it extends to the left no farther than the current element. If it is pushed after several elements have been popped off the stack, because it is less than these elements, the largest rectangle containing it extends to the left as far as that of the most recently popped element.
Every element is pushed and popped at most once and in every step of the procedure at least one element is pushed or popped. Since the amount of work for the decisions and the update is constant, the complexity of the algorithm is O(n) by amortized analysis.
The other answers here have done a great job presenting the O(n)-time, O(n)-space solution using two stacks. There's another perspective on this problem that independently provides an O(n)-time, O(n)-space solution to the problem, and might provide a little bit more insight as to why the stack-based solution works.
The key idea is to use a data structure called a Cartesian tree. A Cartesian tree is a binary tree structure (though not a binary search tree) that's built around an input array. Specifically, the root of the Cartesian tree is built above the minimum element of the array, and the left and right subtrees are recursively constructed from the subarrays to the left and right of the minimum value.
For example, here's a sample array and its Cartesian tree:
                 +----------------------- 23 -------+
                 |                                  |
  +------------- 26 --+                        +--- 79
  |                   |                        |
  31 --+              53 --+                   84
       |                   |
       41 --+              58 -------+
            |                        |
            59                  +--- 93
                                |
                                97
+----+----+----+----+----+----+----+----+----+----+----+
| 31 | 41 | 59 | 26 | 53 | 58 | 97 | 93 | 23 | 84 | 79 |
+----+----+----+----+----+----+----+----+----+----+----+
The reason that Cartesian trees are useful in this problem is that the question at hand has a really nice recursive structure to it. Begin by looking at the lowest rectangle in the histogram. There are three options for where the maximum rectangle could end up being placed:
It could pass right under the minimum value in the histogram. In that case, to make it as large as possible, we'd want to make it as wide as the entire array.
It could be entirely to the left of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the left of the minimum value.
It could be entirely to the right of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the right of the minimum value.
Notice that this recursive structure - find the minimum value, do something with the subarrays to the left and the right of that value - perfectly matches the recursive structure of a Cartesian tree. In fact, if we can create a Cartesian tree for the overall array when we get started, we can then solve this problem by recursively walking the Cartesian tree from the root downward. At each point, we recursively compute the optimal rectangle in the left and right subarrays, along with the rectangle you'd get by fitting right under the minimum value, and then return the best option we find.
In pseudocode, this looks like this:
function largestRectangleUnder(int low, int high, Node root) {
    /* Base case: If the range is empty, the biggest rectangle we
     * can fit is the empty rectangle.
     */
    if (low == high) return 0;

    /* Assume the Cartesian tree nodes are annotated with their
     * positions in the original array.
     */
    return max {
        (high - low) * root.value, // Widest rectangle under the minimum
        largestRectangleUnder(low, root.index, root.left),
        largestRectangleUnder(root.index + 1, high, root.right)
    }
}
Once we have the Cartesian tree, this algorithm takes time O(n), since we visit each node exactly once and do O(1) work per node.
It turns out that there's a simple, linear-time algorithm for building Cartesian trees. The "natural" way you'd probably think to build one would be to scan across the array, find the minimum value, then recursively build a Cartesian tree from the left and right subarrays. The problem is that the process of finding the minimum value is really expensive, and this can take time Θ(n^2).
The "fast" way to build a Cartesian tree is by scanning the array from the left to the right, adding in one element at a time. This algorithm is based on the following observations about Cartesian trees:
First, Cartesian trees obey the heap property: every element is less than or equal to its children. The reason for this is that the Cartesian tree root is the smallest value in the overall array, and its children are the smallest elements in their subarrays, etc.
Second, if you do an inorder traversal of a Cartesian tree, you get back the elements of the array in the order in which they appear. To see why this is, notice that if you do an inorder traversal of a Cartesian tree, you first visit everything to the left of the minimum value, then the minimum value, then everything to the right of the minimum value. Those visitations are recursively done the same way, so everything ends up being visited in order.
These two rules give us a lot of information about what happens if we start with a Cartesian tree of the first k elements of the array and want to form a Cartesian tree for the first k+1 elements. That new element will have to end up on the right spine of the Cartesian tree - the part of the tree formed by starting at the root and only taking steps to the right - because otherwise something would come after it in an inorder traversal. And, within that right spine, it has to be placed in a way that makes it bigger than everything above it, since we need to obey the heap property.
The way that you actually add a new node to the Cartesian tree is to start at the rightmost node in the tree and walk upwards until you either hit the root of the tree or find a node that has a smaller value. You then make the new value have as its left child the last node it walked up on top of.
Here's a trace of that algorithm on a small array:
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
2 becomes the root.
  2 --+
      |
      4
4 is bigger than 2, so we can't move upwards. Append it to the right.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
  2 ------+
          |
      +-- 3
      |
      4
3 is less than 4, so it climbs over it. It can't climb further over 2, as 2 is smaller than 3. The climbed-over subtree rooted at 4 goes to the left of the new value 3, and 3 becomes the rightmost node.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
  +---------- 1
  |
  2 ------+
          |
      +-- 3
      |
      4
1 climbs over the root 2, the entire tree rooted at 2 is moved to the left of 1, and 1 is now the new root and also the rightmost value.
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
Although this might not seem to run in linear time - wouldn't you potentially end up climbing all the way to the root of the tree over and over and over again? - you can show that this runs in linear time using a clever argument. If you climb up over a node in the right spine during an insertion, that node ends up getting moved off the right spine and therefore can't be rescanned in a future insertion. Therefore, every node is only ever scanned over at most once, so the total work done is linear.
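The amortized argument can be made concrete by counting the climbs. The Python sketch below is an illustration only (`build_cartesian_tree` is a hypothetical name, and the tree is stored as index arrays rather than linked nodes). It performs the insertion procedure described above with a stack holding the right spine and reports how many nodes were climbed over in total, which can never exceed the number of elements because each node leaves the spine at most once.

```python
def build_cartesian_tree(values):
    """Build a min-rooted Cartesian tree over `values`.

    Returns (root, left, right, pops): `left[i]` and `right[i]` are child
    indices (-1 for no child) and `pops` is the total number of nodes
    climbed over during all insertions.
    """
    n = len(values)
    left = [-1] * n
    right = [-1] * n
    stack = []   # indices on the right spine, root at the bottom
    pops = 0
    for i, v in enumerate(values):
        last = -1
        # Climb over every spine node that is larger than the new value.
        while stack and values[stack[-1]] > v:
            last = stack.pop()
            pops += 1
        left[i] = last              # climbed-over subtree hangs to the left
        if stack:
            right[stack[-1]] = i    # new node becomes the rightmost node
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right, pops


root, left, right, pops = build_cartesian_tree([2, 4, 3, 1])
print(root, left, right, pops)
```

For the array traced above, the root is index 3 (value 1), its left child is index 0 (value 2), whose right child is index 2 (value 3), whose left child is index 1 (value 4); three nodes are climbed over across the four insertions.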
And now the kicker - the standard way that you'd actually implement this approach is by maintaining a stack of the values that correspond to the nodes on the right spine. The act of "walking up" and over a node corresponds to popping a node off the stack. Therefore, the code for building a Cartesian tree looks something like this:
Stack s;
for (each array element x) {
    pop s until it's empty or s.top <= x
    push x onto the stack.
    do some sort of pointer rewiring based on what you just did.
}
The stack manipulations here might seem really familiar, and that's because these are the exact stack operations that you would do in the answers shown elsewhere here. In fact, you can think of what those approaches are doing as implicitly building the Cartesian tree and running the recursive algorithm shown above in the process of doing so.
The advantage, I think, of knowing about Cartesian trees is that it provides a really nice conceptual framework for seeing why this algorithm works correctly. If you know that what you're doing is running a recursive walk of a Cartesian tree, it's easier to see that you're guaranteed to find the largest rectangle. Plus, knowing that the Cartesian tree exists gives you a useful tool for solving other problems. Cartesian trees show up in the design of fast data structures for the range minimum query problem and are used to convert suffix arrays into suffix trees.
Here's some Java code that implements this idea, courtesy of @Azeem!
import java.util.Stack;

public class CartesianTreeMakerUtil {
    private static class Node {
        int val;
        Node left;
        Node right;
    }

    public static Node cartesianTreeFor(int[] nums) {
        Node root = null;
        Stack<Node> s = new Stack<>();
        for (int curr : nums) {
            Node lastJumpedOver = null;
            // Walk up the right spine, popping every node larger than curr.
            while (!s.empty() && s.peek().val > curr) {
                lastJumpedOver = s.pop();
            }
            // Node is a static nested class, so no enclosing instance is needed.
            Node currNode = new Node();
            currNode.val = curr;
            if (s.isEmpty()) {
                root = currNode;
            } else {
                s.peek().right = currNode;
            }
            currNode.left = lastJumpedOver;
            s.push(currNode);
        }
        return root;
    }

    public static void printInOrder(Node root) {
        if (root == null) return;
        if (root.left != null) {
            printInOrder(root.left);
        }
        System.out.println(root.val);
        if (root.right != null) {
            printInOrder(root.right);
        }
    }

    public static void main(String[] args) {
        int[] nums = new int[args.length];
        for (int i = 0; i < args.length; i++) {
            nums[i] = Integer.parseInt(args[i]);
        }
        Node root = cartesianTreeFor(nums);
        printInOrder(root);
    }
}
The easiest solution in O(N)
long long getMaxArea(long long hist[], long long n)
{
stack<long long> s;
long long max_area = 0;
long long tp;
long long area_with_top;
long long i = 0;
while (i < n)
{
if (s.empty() || hist[s.top()] <= hist[i])
s.push(i++);
else
{
tp = s.top(); // store the top index
s.pop(); // pop the top
area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
if (max_area < area_with_top)
{
max_area = area_with_top;
}
}
}
while (!s.empty())
{
tp = s.top();
s.pop();
area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
if (max_area < area_with_top)
max_area = area_with_top;
}
return max_area;
}
There is also another solution using Divide and Conquer. The algorithm for it is :
1) Divide the array into 2 parts with the smallest height as the breaking point
2) The maximum area is the maximum of :
a) Smallest height * size of the array
b) Maximum rectangle in left half array
c) Maximum rectangle in right half array
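With a linear scan for the smallest height at each step, the time complexity comes to O(n log n) when the splits are reasonably balanced, but O(n^2) in the worst case (for example, on sorted input). Reaching O(n log n) in the worst case requires a faster range-minimum lookup, such as a segment tree.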
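The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not this answer's implementation: `largest_rectangle_dc` is a made-up name, ranges are half-open, and the minimum is found by a plain linear scan, so the sketch shows the shape of the recursion rather than any particular complexity bound.

```python
def largest_rectangle_dc(heights, lo=0, hi=None):
    """Largest rectangle in heights[lo:hi] by splitting at the minimum."""
    if hi is None:
        hi = len(heights)
    if lo >= hi:
        return 0  # empty range holds only the empty rectangle
    # 1) Break the range at the position of its smallest height.
    m = min(range(lo, hi), key=heights.__getitem__)
    # 2) Best of: full-width rectangle under the minimum, left part, right part.
    return max(heights[m] * (hi - lo),
               largest_rectangle_dc(heights, lo, m),
               largest_rectangle_dc(heights, m + 1, hi))


print(largest_rectangle_dc([6, 2, 5, 4, 5, 1, 6]))
```

Because the recursion depth follows the splits, very long sorted inputs can exceed Python's default recursion limit.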
The stack solution is one of the most clever solutions I've seen till date. And it can be a little hard to understand why that works.
I've taken a jab at explaining the same in some detail here.
Summary points from the post:
The general way our brain thinks is:
Enumerate every situation and try to find the value of the constraint that is needed to solve the problem.
And we happily convert that to code as: find the value of the constraint (min) for each situation (pair (i,j)).
The clever solution tries to flip the problem. For each constraint/min value of the area, what are the best possible left and right extremes?
So suppose we traverse each possible min in the array. What are the left and right extremes for each value?
A little thought says: the first value to the left that is less than the current min and, similarly, the first value to the right that is less than the current min.
So now we need to see if we can find a clever way to find the first left and right values lesser than the current value.
To think: If we have traversed the array partially say till min_i, how can the solution to min_i+1 be built?
We need the first value less than min_i to its left.
Inverting the statement: we need to ignore all values to the left of min_i that are greater than min_i. We stop when we find the first value smaller than min_i (i). The taller values in the curve hence become useless once we have crossed them. In the histogram (2 4 3), if 3 is min_i, then 4, being larger, is not of interest.
Corollary: in a range (i,j), with j being the min value we are considering, all values between j and its left value i are useless, even for further calculations.
Any histogram on the right with a min value larger than j will be bounded at j. The values of interest on the left form a monotonically increasing sequence with j being the largest value. (Values of interest here being possible values that may be of interest for the later array.)
Since, we are travelling from left to right, for each min value/ current value - we do not know whether the right side of the array will have an element smaller than it.
So we have to keep it in memory until we get to know this value is useless. (since a smaller value is found)
All this leads to a usage of our very own stack structure.
We keep a value on the stack as long as we don't know that it is useless.
We remove it from the stack once we know it is useless.
So for each min value to find its left smaller value, we do the following:-
pop the elements larger to it (useless values)
The first element smaller than the value is the left extreme. The i to our min.
We can do the same thing from the right side of the array and we will get j to our min.
It's quite hard to explain this, but if this is making sense then I'd suggest reading the complete article here, since it has more insights and details.
I don't understand the other entries, but I think I know how to do it in O(n) as follows.
A) for each index find the largest rectangle inside the histogram ending at that index where the index column touches the top of the rectangle and remember where the rectangle starts. This can be done in O(n) using a stack based algorithm.
B) Similarly for each index find the largest rectangle starting at that index where the index column touches the top of the rectangle and remember where the rectangle ends. Also O(n) using the same method as (A) but scanning the histogram backwards.
C) For each index combine the results of (A) and (B) to determine the largest rectangle where the column at that index touches the top of the rectangle. O(n) like (A).
D) Since the largest rectangle must be touched by some column of the histogram the largest rectangle is the largest rectangle found in step (C).
The hard part is implementing (A) and (B), which I think is what JF Sebastian may have solved rather than the general problem stated.
I coded this one and felt little better in the sense:
import java.util.Stack;
class StackItem{
public int sup;
public int height;
public int sub;
public StackItem(int a, int b, int c){
sup = a;
height = b;
sub =c;
}
public int getArea(){
return (sup - sub)* height;
}
@Override
public String toString(){
return " from:"+sup+
" to:"+sub+
" height:"+height+
" Area ="+getArea();
}
}
public class MaxRectangleInHistogram {
Stack<StackItem> S;
StackItem curr;
StackItem maxRectangle;
public StackItem getMaxRectangleInHistogram(int A[], int n){
int i = 0;
S = new Stack();
S.push(new StackItem(0,0,-1));
maxRectangle = new StackItem(0,0,-1);
while(i<n){
curr = new StackItem(i,A[i],i);
if(curr.height > S.peek().height){
S.push(curr);
}else if(curr.height == S.peek().height){
S.peek().sup = i+1;
}else if(curr.height < S.peek().height){
while((S.size()>1) && (curr.height<=S.peek().height)){
curr.sub = S.peek().sub;
S.peek().sup = i;
decideMaxRectangle(S.peek());
S.pop();
}
S.push(curr);
}
i++;
}
while(S.size()>1){
S.peek().sup = i;
decideMaxRectangle(S.peek());
S.pop();
}
return maxRectangle;
}
private void decideMaxRectangle(StackItem s){
if(s.getArea() > maxRectangle.getArea() )
maxRectangle = s;
}
}
Just Note:
Time complexity: O(n), since each bar is pushed and popped at most once (at most about 2n stack operations).
Space complexity: O(n)
I would like to thank @templatetypedef for his/her extremely detailed and intuitive answer. The Java code below is based on his suggestion to use Cartesian trees and solves the problem in O(N) time and O(N) space. I suggest that you read @templatetypedef's answer above before reading the code below. The code is given in the format of the solution to the problem at leetcode: https://leetcode.com/problems/largest-rectangle-in-histogram/description/ and passes all 96 test cases.
class Solution {
private class Node {
int val;
Node left;
Node right;
int index;
}
public Node getCartesianTreeFromArray(int [] nums) {
Node root = null;
Stack<Node> s = new Stack<>();
for(int i = 0; i < nums.length; i++) {
int curr = nums[i];
Node lastJumpedOver = null;
while(!s.empty() && s.peek().val >= curr) {
lastJumpedOver = s.pop();
}
Node currNode = this.new Node();
currNode.val = curr;
currNode.index = i;
if(s.isEmpty()) {
root = currNode;
}
else {
s.peek().right = currNode;
}
currNode.left = lastJumpedOver;
s.push(currNode);
}
return root;
}
public int largestRectangleUnder(int low, int high, Node root, int [] nums) {
/* Base case: If the range is empty, the biggest rectangle we
* can fit is the empty rectangle.
*/
if(root == null) return 0;
if (low == high) {
if(0 <= low && low <= nums.length - 1) {
return nums[low];
}
return 0;
}
/* Assume the Cartesian tree nodes are annotated with their
* positions in the original array.
*/
int leftArea = -1 , rightArea= -1;
if(root.left != null) {
leftArea = largestRectangleUnder(low, root.index - 1 , root.left, nums);
}
if(root.right != null) {
rightArea = largestRectangleUnder(root.index + 1, high,root.right, nums);
}
return Math.max((high - low + 1) * root.val,
Math.max(leftArea, rightArea));
}
public int largestRectangleArea(int[] heights) {
if(heights == null || heights.length == 0 ) {
return 0;
}
if(heights.length == 1) {
return heights[0];
}
Node root = getCartesianTreeFromArray(heights);
return largestRectangleUnder(0, heights.length - 1, root, heights);
}
}
python-3
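a = [3, 4, 7, 4, 6]
a.sort()
n = len(a)
r = 0
for i in range(n):
    # After sorting, the n - i bars from index i onward are all at least a[i] tall.
    # Caveat: sorting discards adjacency, so in general this is an upper bound on the area.
    if a[i] * (n - i) > r:
        r = a[i] * (n - i)
print(r)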
output:
16
I came across this question in an interview. While trying to solve it, I observed the following:
For each element, count the consecutive elements to its left that are greater than or equal to it
Count the consecutive elements to its right that are greater than or equal to it
Calculate the area as (left count + right count + 1) * current element
Replace maxArea if the calculated area is greater than maxArea
The following JS code implements the pseudocode above:
function maxAreaCovered(arr) {
let maxArea = 0;
for (let index = 0; index < arr.length; index++) {
let l = index - 1;
let r = index + 1;
let maxEleCount = 0
while (l > -1) {
if (arr[l] >= arr[index]) {
maxEleCount++;
} else {
break;
}
l--;
}
while (r < arr.length) {
if (arr[r] >= arr[index]) {
maxEleCount++;
} else {
break;
}
r++;
}
let area = (maxEleCount + 1) * arr[index];
maxArea = Math.max(area, maxArea);
}
return maxArea
}
console.log(maxAreaCovered([6, 2, 5, 4, 5, 1, 6]));
You can use an O(n) method that uses a stack to calculate the maximum area under the histogram.
long long histogramArea(vector<int> &histo){
stack<int> s;
long long maxArea=0;
long long area= 0;
int i =0;
for (i = 0; i < histo.size();) {
if(s.empty() || histo[s.top()] <= histo[i]){
s.push(i++);
}
else{
int top = s.top(); s.pop();
area= histo[top]* (s.empty()?i:i-s.top()-1);
if(area >maxArea)
maxArea= area;
}
}
while(!s.empty()){
int top = s.top();s.pop();
area= histo[top]* (s.empty()?i:i-s.top()-1);
if(area >maxArea)
maxArea= area;
}
return maxArea;
}
For an explanation, see http://www.geeksforgeeks.org/largest-rectangle-under-histogram/

Unique rows after column removal from matrix

A binary matrix with n rows and m columns is given; n and m can be up to 2000.
We need to determine whether it is possible to remove one column so that the rows of the remaining matrix are unique.
Example for n = 3 and m = 4:
1010
0001
1001
Answer is yes. We can remove the second column and the remaining rows (110, 001, 101) will be unique.
Example for n = 4 and m = 2:
00
01
10
11
Answer is no. Whichever column we remove, two identical rows remain (for example, 0 and 0).
I have an O(m*m*n) brute-force algorithm: I check the uniqueness of the rows after removing each column in turn.
Do you know a faster algorithm?
EDIT: Unfortunately, my solution only gets halfway to solving this, sorry.
Well, I am sure I can do it in O(m*n) time.
You can create a tree in n*m time by going through the rows one by one and updating this structure:
Node{
int accessed;
Node nextZero;
Node nextOne;
}
Once the tree is built, you only have to check the deepest level: whether any "zero" or "one" node there has been accessed two or more times.
There is a visual example of what it looks like after processing two numbers.
You go row by row, always starting from the root.
For example, to process the second row, "101", you start at the root. The first digit is "1", so you move to the nextOne node. The next digit is "0", so you move to nextZero. The last digit is "1"; that node does not exist yet, so you create it.
In the end, you are only interested in the "accessed" counts of the deepest nodes: if they all equal 1, the rows are all distinct; otherwise they are not.
pseudocode
Node{
int accessed;
Node nextZero;
Node nextOne;
}
bool isDistinct(){
    Node root = new Node();
    Node next;
    for (int i=0;i<arr.length;i++){
        Node actual = root;
        for (int j=0;j<arr[i].length;j++){
            if (arr[i][j] == 0){
                next = actual.nextZero;
                if (next == null){
                    next = new Node();
                    actual.nextZero = next;
                }
            } else {
                next = actual.nextOne;
                if (next == null){
                    next = new Node();
                    actual.nextOne = next;
                }
            }
            actual = next;
            actual.accessed++;
            // a deepest node reached twice means two identical rows
            if ((j == arr[i].length - 1) && (actual.accessed >= 2)){
                return false;
            }
        }
    }
    return true;
}
Sorry, this is really only "halfway" there; I did not read carefully what exactly I was supposed to do. But with some thought, maybe you can remove a node from the tree and rebalance it efficiently...
Each row represents a number written in binary.
We can calculate all these numbers in O(n*m).
This gives an array a of length n.
We can then build, in O(n), an array b where b[i] is the number of times the number i occurs in array a.
If b[i]>1 for some i, the answer is No.
Now we try removing columns one by one, which changes the numbers accordingly. For example, if we remove the kth column, we need an array c which has the same meaning as array b but without the kth column. To build it, we initialize c[i]=b[i] for i<2^k, and for every i>=2^k with b[i]=1 we increment c[i-2^k]. If c[i]>1 for some i, this column does not work and we continue with the next column. Otherwise the answer is Yes. This can be done in O(n*m).
Edit:
The complexity of the whole solution is then O(n*m).
Because the numbers will be big, you can represent array b as a sparse array using a dictionary, and use a big-number library for the numbers. The whole solution should be faster than brute force.
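As a rough illustration of the "rows as numbers" idea, here is a minimal Python sketch. It is not the incremental update described above: for simplicity it rebuilds a set of row values for every candidate column, and it relies on Python's built-in arbitrary-precision integers in place of a big-number library. The function name removable_column and the choice of passing rows as non-empty strings of 0s and 1s are assumptions made for this example.

```python
def removable_column(rows):
    """Return the index (counted from the left) of a column whose removal
    leaves all rows distinct, or None if no such column exists."""
    if not rows:
        return None
    m = len(rows[0])
    values = [int(row, 2) for row in rows]  # each row read as a binary number
    for col in range(m):
        k = m - 1 - col                      # bit position of this column
        low_mask = (1 << k) - 1
        seen = set()
        for x in values:
            # Drop bit k: keep the bits below it, shift the bits above it down.
            reduced = ((x >> (k + 1)) << k) | (x & low_mask)
            if reduced in seen:
                break                        # duplicate row, try the next column
            seen.add(reduced)
        else:
            return col                       # no duplicates for this column
    return None


print(removable_column(["1010", "0001", "1001"]))  # 1 (the second column)
print(removable_column(["00", "01", "10", "11"]))  # None
```

Because every shift and hash touches an m-bit number, this sketch does not reach the O(n*m) bound claimed above; it is only meant to show how a row changes when one bit is dropped.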

Trouble with a stack based algorithm

I'm working on this programming assignment. It tests our understanding of stacks and their applications. I find it extremely difficult to come up with an algorithm that can work efficiently and accurately. Some of their test cases have 200,000+ "trees"! While my algorithm can work for simpler test cases with less than 10 trees, it failed in the accuracy and efficiency departments when the number of "trees" is exceedingly large (from 100+ onwards).
I would appreciate it very much, if you guys can kindly give me a hint or point me to the right direction. Thank you.
Task Statement
Monkeys like to swing from tree to tree. They can swing from one tree
to another directly as long as there is no tree in between that is
taller than or have the same height as either one of the two trees.
For example, if there are 5 trees with heights 19m, 17m, 20m, 20m and
20m lining up in that order, then the monkey will be able to swing
from one tree to the other as shown below:
1. from first tree to second tree
2. from first tree to third tree
3. from second tree to third tree
4. from third tree to fourth tree
5. from fourth tree to fifth tree
Tarzan, the king of jungle who is able to communicate with the
monkeys, wants to test the monkeys to see if they know how to count
the total number of pairs of trees that they can swing directly from
one to the other. But he himself is not very good in counting. So he
turns to you, the best Java programmer in the country, to write a
program for getting the correct count for the trees in different parts
of the jungle.
Input
The first line contains N, the number of trees in the path. The next
line contains N integers a1 a2 a3 ... aN, where ai represents the
height of the i-th tree in the path, 0 < ai ≤ 2^31 and 2 ≤ N ≤ 500,000.
Note that short symbol N is used above for convenience. In your
program, you are expected to give it a descriptive name.
Output
The total number of pairs of trees which the monkeys can swing
directly from one to the other with the given list of tree heights.
Sample Input 1
4
3 4 1 2
Sample Output 1
4
Sample Input 2
5
19 17 20 20 20
Sample Output 2
5
Sample Input 3
4
12 21 21 12
Sample Output 3
3
Here's my code. It is a method that returns the number of pairs of trees a monkey can swing between. The parameter is the array of inputs.
My algorithm goes as follows:
we set numPairs to (array length - 1), since a monkey can always swing between two adjacent trees.
now we find the extra numPairs (extra trees to swing between).
push the first input into the empty stack
we enter a for loop:
for the next input until the end of array:
case1:
if the top of the stack is smaller than the current input and the size of the stack is equal to 1, then we replace the top with the input.
case2:
if the top of the stack is smaller than the current input and the size of the stack is bigger than 1, we pop the top, and enter a while loop to pop the previous elements which are smaller than the current top of the stack.
we then push the current input after we exit the while loop.
case3:
otherwise, if the above conditions are not satisfied, we simply push the current input into the stack.
we exit the for loop
return the numPairs
public int solve(int[] arr) {
int input, temp;
numPairs = arr.length-1;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty())
stack.push(input);
else if(!stack.isEmpty())
{
if(input>stack.peek() && stack.size() == 1)
{
stack.pop();
stack.push(input);
}
else if(input>stack.peek() && stack.size() > 1)
{
temp = stack.pop();
while(!stack.isEmpty() && temp < stack.peek())
{
numPairs++;
temp = stack.pop();
}
stack.push(input);
//numPairs++;
}
else
stack.push(input);
}
}
return numPairs;
}
Here's my solution, it's an iterative one.
class Result {
// declare the member field
Stack<Integer> stack;
int numPairs = 0;
// declare the constructor
public Result()
{
stack = new Stack<Integer>();
}
/*
* solve : to compute the result, return the result
* Pre-condition : parameter must be of array of integer type
* Post-condition : return the number of tree pairs that can be swung with
*/
public int solve(int[] arr) {
// implementation
int input;
for(int i=0; i<arr.length; i++)
{
input = arr[i];
if(stack.isEmpty()) //if stack is empty, just push the input
stack.push(input);
else if(!stack.isEmpty())
{
//do a while loop to pop all possible top stack element until
//the top element is bigger than the input
//or the stack is empty
while(!stack.isEmpty() && input > stack.peek())
{
stack.pop();
numPairs++;
}
//if the stack is empty after exiting the while loop
//push the current element onto the stack
if(stack.isEmpty())
stack.push(input);
//this condition applies for two cases:
//1. the while loop is never entered because the input is smaller than the top element by default
//2. the while loop is exited and the input is pushed onto the non-empty stack with numPairs being incremented
else if(!stack.isEmpty() && input < stack.peek())
{
stack.push(input);
numPairs++;
}
//this is the last condition:
//the input is never pushed if the input is identical to the top element
//instead we increment the numPairs
else if(input == stack.peek())
numPairs++;
}
}
return numPairs;
}
}
If I understand the problem correctly, there are two kinds of trees accessible to each other:
Trees that are next to each other (adjacent) are always accessible to each other
Trees that are not adjacent are only accessible if all the trees in between are shorter than both of the trees.
One might come up with several types of solutions for this:
The brute force solution: compare every tree to every other tree checking the conditions above. Running time: O(n^2)
Find near accessible neighbors solution: look for near neighbors that are accessible. Running time: close to O(n). Here's how this would work:
Build an array of tree sizes in order that they are given. Then walk this array in order and for every tree at index i:
Going to the right from i
If tree at i+1 is taller than tree at i, break out (no more accessible neighbors can be found)
Add 1 to the count of accessible trees if tree at i+1 is shorter than tree at i+2
Do the same for trees i+2, i+3.. etc. until you find a tree that is taller than tree at i.
This will get a count of non-adjacent accessible trees for every tree. Then just add N*2-2 to the count to account for all the adjacent trees, and you are done.
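For comparison, the stack idea used in the iterative Java solution earlier in this thread can be sketched in Python. This is only a sketch under the task's rule that every tree between two trees must be strictly shorter than both; the function name count_swing_pairs is made up for this example. The stack holds the heights that are still "visible" from the current tree, in strictly decreasing order.

```python
def count_swing_pairs(heights):
    """Count pairs of trees with only strictly shorter trees in between."""
    stack = []   # visible heights, strictly decreasing from bottom to top
    pairs = 0
    for h in heights:
        # Every shorter visible tree pairs with the current one and is then
        # hidden behind it for all later trees.
        while stack and stack[-1] < h:
            stack.pop()
            pairs += 1
        if stack:
            # The nearest tree that is at least as tall also pairs with it.
            pairs += 1
            if stack[-1] == h:
                # An equal tree blocks the older one, so one entry is enough.
                continue
        stack.append(h)
    return pairs


print(count_swing_pairs([3, 4, 1, 2]))           # 4
print(count_swing_pairs([19, 17, 20, 20, 20]))   # 5
```

Each height is pushed and popped at most once, so the loop is linear in the number of trees, which matters for inputs of several hundred thousand trees.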

Is there an efficient data structure for row and column swapping?

I have a matrix of numbers and I'd like to be able to:
Swap rows
Swap columns
If I were to use an array of pointers to rows, then I can easily switch between rows in O(1) but swapping a column is O(N) where N is the amount of rows.
I have a distinct feeling there isn't a win-win data structure that gives O(1) for both operations, though I'm not sure how to prove it. Or am I wrong?
Without having thought this entirely through:
I think your idea with the pointers to rows is the right start. Then, to be able to "swap" the column I'd just have another array with the size of number of columns and store in each field the index of the current physical position of the column.
m =
[0] -> 1 2 3
[1] -> 4 5 6
[2] -> 7 8 9
c[] {0,1,2}
Now to exchange column 1 and 2, you would just change c to {0,2,1}
When you then want to read row 1 you'd do
for (i=0; i < colcount; i++) {
print m[1][c[i]];
}
Just a random thought here (no experience of how well this really works, and it's a late night without coffee):
What I'm thinking is for the internals of the matrix to be a hashtable as opposed to an array.
Every cell within the array has three pieces of information:
The row in which the cell resides
The column in which the cell resides
The value of the cell
In my mind, this is readily represented by the tuple ((i, j), v), where (i, j) denotes the position of the cell (i-th row, j-th column), and v denotes the value of the cell.
This would be a somewhat normal representation of a matrix. But let's abstract the ideas here. Rather than i denoting the row as a position (i.e. 0 before 1 before 2 before 3 etc.), let's just consider i to be some sort of canonical identifier for its corresponding row. Let's do the same for j. (While in the most general case, i and j could then be unrestricted, let's assume a simple case where they will remain within the ranges [0..M-1] and [0..N-1] for an M x N matrix, but don't denote the actual coordinates of a cell).
Now, we need a way to keep track of the identifier for a row, and the current index associated with the row. This clearly requires a key/value data structure, but since the number of indices is fixed (matrices don't usually grow/shrink), and only deals with integral indices, we can implement this as a fixed, one-dimensional array. For a matrix of M rows, we can have (in C):
int RowMap[M];
For the m-th row, RowMap[m] gives the identifier of the row in the current matrix.
We'll use the same thing for columns:
int ColumnMap[N];
where ColumnMap[n] is the identifier of the n-th column.
Now to get back to the hashtable I mentioned at the beginning:
Since we have complete information (the size of the matrix), we should be able to generate a perfect hashing function (without collision). Here's one possibility (for modestly-sized arrays):
int Hash(int row, int column)
{
return row * N + column;
}
If this is the hash function for the hashtable, we should get zero collisions for most sizes of arrays. This allows us to read/write data from the hashtable in O(1) time.
The cool part is interfacing the index of each row/column with the identifiers in the hashtable:
// row and column are given in the usual way, in the range [0..M-1] and [0..N-1]
// These parameters are really just used as handles to the internal row and
// column indices
int MatrixLookup(int row, int column)
{
// Get the canonical identifiers of the row and column, and hash them.
int canonicalRow = RowMap[row];
int canonicalColumn = ColumnMap[column];
int hashCode = Hash(canonicalRow, canonicalColumn);
return HashTableLookup(hashCode);
}
Now, since the interface to the matrix only uses these handles, and not the internal identifiers, a swap operation of either rows or columns corresponds to a simple change in the RowMap or ColumnMap array:
// This function simply swaps the values at
// RowMap[row1] and RowMap[row2]
void MatrixSwapRow(int row1, int row2)
{
    int canonicalRow1 = RowMap[row1];
    int canonicalRow2 = RowMap[row2];
    RowMap[row1] = canonicalRow2;
    RowMap[row2] = canonicalRow1;
}
// This function simply swaps the values at
// ColumnMap[column1] and ColumnMap[column2]
void MatrixSwapColumn(int column1, int column2)
{
    int canonicalColumn1 = ColumnMap[column1];
    int canonicalColumn2 = ColumnMap[column2];
    ColumnMap[column1] = canonicalColumn2;
    ColumnMap[column2] = canonicalColumn1;
}
So that should be it - a matrix with O(1) access and mutation, as well as O(1) row swapping and O(1) column swapping. Of course, even an O(1) hash access will be slower than the O(1) of array-based access, and more memory will be used, but at least there is equality between rows/columns.
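The same indirection can be sketched in a few lines of Python. This is a simplified illustration, not the hashtable design itself: it keeps the cells in ordinary nested lists and only borrows the RowMap/ColumnMap idea. The class name SwappableMatrix and its method names are invented for this example.

```python
class SwappableMatrix:
    """Matrix whose rows and columns can be swapped in O(1) via index maps."""

    def __init__(self, rows):
        self._data = [list(r) for r in rows]       # physical storage, never moved
        self._row_map = list(range(len(self._data)))
        self._col_map = list(range(len(self._data[0]))) if self._data else []

    def get(self, row, col):
        # Translate the logical position into the physical one.
        return self._data[self._row_map[row]][self._col_map[col]]

    def set(self, row, col, value):
        self._data[self._row_map[row]][self._col_map[col]] = value

    def swap_rows(self, r1, r2):
        m = self._row_map
        m[r1], m[r2] = m[r2], m[r1]

    def swap_cols(self, c1, c2):
        m = self._col_map
        m[c1], m[c2] = m[c2], m[c1]

    def to_lists(self):
        # Materialise the logical view (O(rows * cols), for display only).
        return [[self.get(r, c) for c in range(len(self._col_map))]
                for r in range(len(self._row_map))]


m = SwappableMatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
m.swap_cols(1, 2)
print(m.to_lists()[1])   # [4, 6, 5]
```

Every read and write pays for two extra list lookups, which is the price of making both kinds of swap constant-time.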
I tried to be as agnostic as possible about exactly how you implement your matrix, so I wrote some C. If you'd prefer another language, I can change it (it would be best if you understood it), but I think it's pretty self-descriptive, though I can't ensure its correctness as far as C goes, since I'm actually a C++ guy trying to act like a C guy right now (and did I mention I don't have coffee?). Personally, I think writing it in a full OO language would do the entire design more justice, and also give the code some beauty, but like I said, this was a quickly whipped up implementation.

Algorithm to Render a Horizontal Binary-ish Tree in Text/ASCII form

It's a pretty normal binary tree, except for the fact that one of the nodes may be empty.
I'd like to find a way to output it in a horizontal way (that is, the root node is on the left and expands to the right).
I've had some experience expanding trees vertically (root node at the top, expanding downwards), but I'm not sure where to start, in this case.
Preferably, it would follow these couple of rules:
If a node has only one child, it can be skipped as redundant (an "end node", with no children, is always displayed)
All nodes of the same depth must be aligned vertically; all nodes must be to the right of all less-deep nodes and to the left of all deeper nodes.
Nodes have a string representation which includes their depth.
Each "end node" has its own unique line; that is, the number of lines is the number of end nodes in the tree, and when an end node is on a line, there may be nothing else on that line after that end node.
As a consequence of the last rule, the root node might be better off in either the top left or the bottom left corner; top left is preferred.
For example, this is a valid tree, with six end nodes (node is represented by a name, and its depth): EDIT: Please see bottom of question for an alternative, easier rendering
[a0]-----------[b3]------[c5]------[d8]
\ \ \----------[e9]
\ \----[f5]
\-[g1]--------[h4]------[i6]
\ \--------------------[j10]
\-[k3]
Which represents the vertical, explicit binary tree:
0 a
/ \
1 g *
/ \ \
2 * * *
/ \ \
3 k * b
/ / \
4 h * *
/ \ \ \
5 * * f c
/ \ / \
6 * i * *
/ / \
7 * * *
/ / \
8 * * d
/ /
9 * e
/
10 j
(branches folded for compactness; * representing redundant, one-child nodes; note that *'s are actual nodes, storing one child each, just with names omitted here for presentation sake)
(also, to clarify, I'd like to generate the first, horizontal tree; not this vertical tree)
I say language-agnostic because I'm just looking for an algorithm; I say ruby because I'm eventually going to have to implement it in ruby anyway.
Assume that each Node data structure stores only its id, a left node, and a right node.
A master Tree class keeps tracks of all nodes and has adequate algorithms to find:
A node's nth ancestor
A node's nth descendant
All end-node descendants of a node, and their count
The generation of a node
The lowest common ancestor of two given nodes
I already know:
The number of end nodes
Anyone have any ideas of where I could start? Should I go for the recursive approach? Iterative?
Some pseudocode would be pretty cool too, and much appreciated =)
progress
As per walkytalky's suggestion, I decided to see what it would look like to map each "relevant" or significant node to a grid, with the columns being the depth and the rows identifiable by their end nodes. Here is what happens (skipping column 7 because there are no significant nodes in depth 7):
depth: 0  1  2  3  4  5  6  8  9  10
       a        b     c     d
                               e
                      f
          g        h     i
                                  j
                k
It should be easy enough to generate this grid, with either breadth-first or depth-first searches. Perhaps most trivially by simply keeping a 2D array and placing every significant node found into it, inserting a row for every "second child".
Now, knowing these facts:
The last node in a row must be an end node
Children are always to the right, and on the same row or lower, of their parent node.
All non-end nodes must have exactly two children
Therefore, all non-end nodes have children that are the first to the right of their column, the first child being on the same row, the second child being n rows below them, where n is the number of nodes on the right side of it.
We can see that, given any valid grid, there is one unambiguous way to "connect the dots", so to speak; there is one unambiguous tree being represented.
Now, the "connecting the dots" is no longer a binary-tree-structure question...it's simply a decoration question. We just need to build an algorithm to properly place the right -'s and \'s where they can go, perhaps following only simple grid/lexicographical rules, instead of binary-tree-structure rules.
Basically, this means that the problem of rendering a tree is now the much simpler problem of rendering a grid, with fancy decorations.
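A small Python sketch of this grid-placement step might look as follows. It assumes a node's first child stays on its parent's row and the second child is pushed down by the number of end nodes under the first child, which reproduces the grid above; the names Node, unary_chain, count_leaves and place are hypothetical and not part of any Tree class mentioned in the question.

```python
class Node:
    def __init__(self, name=None, first=None, second=None):
        self.name = name
        self.first = first
        self.second = second


def unary_chain(length, node):
    """Wrap node in `length` unnamed one-child nodes (the '*' nodes)."""
    for _ in range(length):
        node = Node(first=node)
    return node


def count_leaves(node):
    children = [c for c in (node.first, node.second) if c is not None]
    if not children:
        return 1
    return sum(count_leaves(c) for c in children)


def place(node, row=0, depth=0, cells=None):
    """Map each significant node name to its (row, depth) grid cell."""
    if cells is None:
        cells = {}
    children = [c for c in (node.first, node.second) if c is not None]
    if len(children) != 1:            # one-child nodes are skipped as redundant
        cells[node.name] = (row, depth)
    offset = 0
    for child in children:
        place(child, row + offset, depth + 1, cells)
        offset += count_leaves(child)  # next child starts below these end nodes
    return cells


root = Node("a", Node("b"), unary_chain(1, Node("c")))
print(place(root))   # {'a': (0, 0), 'b': (0, 1), 'c': (1, 2)}
```

Recomputing count_leaves on every call keeps the sketch short; caching the counts per node would avoid the repeated walks on large trees.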
Can anyone suggest any way of formulating these rules? Or maybe a completely different method altogether?
edit
I have conceived of a much, much easier final rendering:
--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)
[a0 ]-------[b3 ]-------[c5 ]-------[d8 ]
| | \---------------[e9 ]
| \---------[f5 ]
\---[g1 ]-------[h4 ]-------[i6 ]
| \---------------------------[j10]
\---[k3 ]
--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)
It might be easier to try to create this one, instead of the one I had posted earlier. For one, it preserves a pretty grid shape, and you don't have to fiddle with diagonal lines. The rows are all mapped along clearly visible column lines. Unfortunately, it is nowhere near as pretty as the first.
If there are N end nodes, there must be N-1 internal nodes with 2 children. (There can be any number of internal nodes with 1 child, which we will have to count to get the depths but otherwise ignore.) Generating the tree is thus equivalent to positioning these nodes on a grid, where:
the number of rows in the grid is N
I think the number of columns is between 1+floor(log2(N)) and 2*N-1, depending on how much overlap there is; this probably doesn't matter much for our purposes, though
each endpoint appears on a different row
all nodes at the same depth appear in the same column
all internal nodes appear on the same row as their rightmost descendant endpoint
So, let's see:
Walk the tree depth-first, right-to-left.
For each endpoint, record its depth and label.
For each 2-child internal, record its depth, label and the indices of both rightmost and leftmost child endpoints.
Sort the whole lot by depth -- this gives you the column ordering, with the number of distinct depths giving the actual number of columns. (All other ordering should come out automatically from the walk, I think, but that's not the case here because any branch can be any depth.)
Place all the nodes in the grid.
Mark empty cells to the right of each non-endpoint node as horizontal branches.
Mark empty cells down from each internal node to the row above its left child as vertical branches, and the cell at the level of the left child as a junction.
Print with appropriate ASCII decoration.
Update:
As you say, the positioning is enough to unambiguously determine the connections, but you still need to do some bottom-up work to get that right, so I'd probably still do the "mark" steps during the grid building.
I sort of thought the printing was trivial enough to gloss over, but:
Iterate down each column and determine the column width as size of fixed elements + max label length + floor(log10(depth) + 1). (Fixed elements might be [ and ]-, for example. We can substitute ]\n as the suffix for endpoints.)
For each row
for each column
if cell contains a node or endpoint
print fixed prefix
print label
print depth
print fill spaces (max label length - current label length)
print appropriate suffix
if node is an endpoint, skip to next row
if cell is empty, print fill spaces to width of column
if cell contains a vertical, print some chosen prefix number of spaces, a bar, and fill with spaces
if cell contains a junction, print some chosen prefix number of spaces, a backslash, and fill with hyphens
if cell contains a horizontal, print full column width of hyphens
Converting this to print diagonals might be easiest if you generate the straight version first and then do some substitutions in the character array -- otherwise you can get cases where you're rendering a long vertical branch in a different column than the one in which it originated.
At some point I may try to put this into code, but it probably won't be today -- stuff to do!
Looks like an interesting problem; I'd be happy to give it a try, if I had more time.
I'd probably go with the following approach :
Start rendering "right" (or in your case, "top") nodes, until I reach the end. (i.e.: render a, b, c, and d)
Go back to the last node with a child (i.e.: c), and do the same thing recursively
You would have to keep a global variable indicating which row you are printing on. Each recursive call increases this variable.
edit: ok, couldn't resist trying to write some untested pseudo-code, hope it works:
function print_tree(Node n) {
print "\n" // begin on a fresh new line
childs = new Array();
do {
if (n.hasLeftChild) {
childs.push(n.leftChild)
}
print "---" + n.id //this needs a lot of tweaking, but you get the idea
} while(n = n.rightChild)
childs.reverse()
foreach(child in childs) {
print_tree(child);
}
}
Start with a label width for each level (not including the [] characters), equal to the widest label at that level (in this example the widths are mostly 2, except for j10 which is 3, and levels 2 and 7 which are 0).
Have each level with non-zero max label width equally spaced with one - character between each level, so you can calculate initial level y locations.
Give each node its line number.
Then adjust the level locations based on the maximum number of lines between children for a level.
Added 2 to level 1 for a0 to g1
Added 1 to level 2 for g1 to k3
Added 1 to level 4 for b3 to [ ]
Use \ and ` characters for diagonals.
[a0]---------[b3]-------[c5]------[d8]
\ \ `----------[e9]
\ `-----[f5]
`[g1]--------[h4]------[i6]
\ `--------------------[j10]
`[k3]
Below is fully functional C# code that does exactly what you want. How it does it:
The tree is represented as objects from classes that inherit from Node
First compute the number of leaves and create an array of that many lines
Then for each level:
find out which lines we are going to write on
for those lines, compute the maximum of what is already on those lines
write all the nodes to column max(number from previous step, end of previous level)+1; prepend with - to get to that column
write diagonal lines from all binary nodes up to the line of their right child (in my program first child is left, second is right, you have it the other way around)
advance one level
The algorithm makes sure that each level starts only after the previous one ends. That is probably a good choice for short names, but for longer names it probably shouldn't be enforced.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace SO_ASCII_tree
{
class Program
{
static void Main()
{
Node root = …;
StringBuilder[] lines = Enumerable.Range(0, root.Leaves).Select(i => new StringBuilder()).ToArray();
Node[] currentLevel = new Node[] { root };
int level = 0;
int min = 0;
int max = 0;
while (currentLevel.Any())
{
NamedNode[] namedNodes = currentLevel.OfType<NamedNode>().ToArray();
if (namedNodes.Any())
{
min = namedNodes.Select(node => lines[node.Line].Length).Max();
min = Math.Max(min, max);
if (min != 0)
min++;
foreach (NamedNode namedNode in namedNodes)
WriteAtPosition(lines[namedNode.Line], namedNode.Write(level), min, '-');
max = namedNodes.Select(node => lines[node.Line].Length).Max();
// change to max = min + 1; for long names
}
foreach (Node node in currentLevel)
node.SetChildLines();
Binary[] binaries = namedNodes.OfType<Binary>().ToArray();
foreach (Binary binary in binaries)
GoDown(lines, binary.Line, binary.Right.Line);
currentLevel = currentLevel.SelectMany(node => node.Children).ToArray();
level++;
}
foreach (StringBuilder line in lines)
Console.WriteLine(line.ToString());
}
static void WriteAtPosition(StringBuilder line, string message, int position, char prepend = ' ')
{
if (line.Length > position)
throw new ArgumentException();
line.Append(prepend, position - line.Length);
line.Append(message);
}
static void GoDown(StringBuilder[] lines, int from, int to)
{
int line = from + 1;
int position = lines[from].Length;
for (; line <= to; line++, position++)
WriteAtPosition(lines[line], "\\", position);
}
}
abstract class Node
{
public int Line
{ get; set; }
public abstract int Leaves
{ get; }
public abstract IEnumerable<Node> Children
{ get; }
public virtual void SetChildLines()
{ }
}
abstract class NamedNode : Node
{
public string Name
{ get; set; }
public string Write(int level)
{
return '[' + Name + level.ToString() + ']';
}
}
class Binary : NamedNode
{
public Node Left
{ get; set; }
public Node Right
{ get; set; }
int? leaves;
public override int Leaves
{
get
{
if (leaves == null)
leaves = Left.Leaves + Right.Leaves;
return leaves.Value;
}
}
public override IEnumerable<Node> Children
{
get
{
yield return Left;
yield return Right;
}
}
public override void SetChildLines()
{
Left.Line = Line;
Right.Line = Line + Left.Leaves;
}
}
class Unary : Node
{
public Node Child
{ get; set; }
int? leaves;
public override int Leaves
{
get
{
if (leaves == null)
leaves = Child.Leaves;
return leaves.Value;
}
}
public override IEnumerable<Node> Children
{
get
{
yield return Child;
}
}
public override void SetChildLines()
{
Child.Line = Line;
}
}
class Leaf : NamedNode
{
public override int Leaves
{
get
{
return 1;
}
}
public override IEnumerable<Node> Children
{
get
{
yield break;
}
}
}
}
EDIT: Your example tree gets rendered exactly the same as your rendering:
[a0]-----------[b3]------[c5]------[d8]
\ \ \----------[e9]
\ \----[f5]
\-[g1]--------[h4]------[i6]
\ \--------------------[j10]
\-[k3]
You'd probably need to perform a depth first search if not a search of the entire tree in order to properly size it for output along 2 dimensions.

Resources