How do I get one row for every Min or Max on every column of a dataframe in Pyspark efficiently? - performance

I'm trying to reduce a big dataset to rows having minimum and maximum values for each column. In other words, I would like, for every column of this dataset to get one row that has the minimum value on that column, as well as another that has the maximum value on the same column. I should mention that I do not know in advance what columns this dataset will have. Here's an example:
+----+----+----+ +----+----+----+
|Col1|Col2|Col3| ==> |Col1|Col2|Col3|
+----+----+----+ +----+----+----+
| F | 99 | 17 | | A | 34 | 25 |
| M | 32 | 20 | | Z | 51 | 49 |
| D | 2 | 84 | | D | 2 | 84 |
| H | 67 | 90 | | F | 99 | 17 |
| P | 54 | 75 | | C | 18 | 9 |
| C | 18 | 9 | | H | 67 | 90 |
| Z | 51 | 49 | +----+----+----+
| A | 34 | 25 |
+----+----+----+
The first row is selected because A is the smallest value on Col1. The second because Z is the largest value on Col1. The third because 2 is the smallest on Col2, and so on. The code below seems to do the right thing (correct me if I'm wrong), but performance is sloooow. I start with getting a dataframe from a random .csv file:
input_file = (sqlContext.read
.format("csv")
.options(header="true", inferSchema="true", delimiter=";", charset="UTF-8")
.load("/FileStore/tables/random.csv")
)
Then I create two other dataframes that each have one row with the min and respectively, max values of each column:
from pyspark.sql.functions import col, min, max
min_values = input_file.select(
*[min(col(col_name)).name(col_name) for col_name in input_file.columns]
)
max_values = input_file.select(
*[max(col(col_name)).name(col_name) for col_name in input_file.columns]
)
Finally, I repeatedly join the original input file to these two dataframes holding minimum and maximum values, using every column in turn, and do a union between all the results.
min_max_rows = (
input_file
.join(min_values, input_file[input_file.columns[0]] == min_values[input_file.columns[0]])
.select(input_file["*"]).limit(1)
.union(
input_file
.join(max_values, input_file[input_file.columns[0]] == max_values[input_file.columns[0]])
.select(input_file["*"]).limit(1)
)
)
for c in input_file.columns[1:]:
min_max_rows = min_max_rows.union(
input_file
.join(min_values, input_file[c] == min_values[c])
.select(input_file["*"]).limit(1)
.union(
input_file
.join(max_values, input_file[c] == max_values[c])
.select(input_file["*"]).limit(1)
)
)
min_max_rows.dropDuplicates()
For my test dataset of 500k rows, 40 columns, doing all this takes about 7-8 minutes on a standard Databricks cluster. I'm supposed to sift through more than 20 times this amount of data regularly. Is there any way to optimize this code? I'm quite afraid I've taken the naive approach to it, since I'm quite new to Spark.
Thanks!

Does not seem to be a popular question, but interesting (for me). And a lot of work for 15 pts. In fact I got it wrong first time round.
Here is a scaleable solution that you can partition accordingly to increase throughput.
Hard to explain, manipulation of the data and transposing the data is
the key issue here - and some lateral thinking.
I did not focus on variable columns all sorts of data types. That needs to be solved by yourself, can be done but some if else logic required to check if alpha or double or numeric. Mixing data types and applying to stuff gets problematic, but can be solved. I gave a notion of num_string, but did not complete that.
I have focused on the scalability issue and approach, with less procedural logic. Smaller sample size with all numbers, but correct as now as far as I can see. General principle is there.
Try it. Success.
Code:
from pyspark.sql.functions import *
from pyspark.sql.types import *
def reshape(df, by):
cols, dtypes = zip(*((c, t) for (c, t) in df.dtypes if c not in by))
kvs = explode(array([
struct(lit(c).alias("key"), col(c).alias("val")) for c in cols
])).alias("kvs")
return df.select(by + [kvs]).select(by + ["kvs.key", "kvs.val"])
df1 = spark.createDataFrame(
[(4, 15, 3), (200, 100, 25), (7, 16, 4)], ("c1", "c2", "c3"))
df1 = df1.withColumn("rowId", monotonically_increasing_id())
df1.cache
df1.show()
df2 = reshape(df1, ["rowId"])
df2.show()
# In case you have other types like characters in the other column - not focusing on that aspect
df3 = df2.withColumn("num_string", format_string("%09d", col("val")))
# Avoid column name issues.
df3 = df3.withColumn("key1", col("key"))
df3.show()
df3 = df3.groupby('key1').agg(min(col("val")).alias("min_val"), max(col("val")).alias("max_val"))
df3.show()
df4 = df3.join(df2, df3.key1 == df2.key)
new_column_condition = expr(
"""IF(val = min_val, -1, IF(val = max_val, 1, 0))"""
)
df4 = df4.withColumn("col_class", new_column_condition)
df4.show()
df5 = df4.filter( '(min_val = val or max_val = val) and col_class <> 0' )
df5.show()
df6 = df5.join(df1, df5.rowId == df1.rowId)
df6.show()
df6.select([c for c in df6.columns if c in ['c1','c2', 'c3']]).distinct().show()
Returns:
+---+---+---+
| c1| c2| c3|
+---+---+---+
| 4| 15| 3|
|200|100| 25|
+---+---+---+
Data wrangling the clue here.

Related

How to mark the min and the max values in an Open Office Calc column?

I have a table like this:
| A | B |
|---|---|
| a | 5 | <- max, should be red
| b | 1 | <- min, should be green
| c | 0 | <- zero, should not count
| d | 1 | <- min, should be green
| e | 3 |
| f | 5 | <- max, should be red
| g | 4 |
| h | 0 | <- zero, should not count
The objective is to get the maximum values formatted red and the minimum values green. The cells with the value 0 should not count (as minimum value).
I tried conditional formatting with following rules:
Condition 1
Formula is: MAX(E2:E40)
Cell Style: max
Condition 2
Formula is: MINIFS(E2:E40;E2:E40;">0")
Cell Style: max
But the result is, that all cells with value > 0 get marked red.
How to mark the greatest and the lowest values in a column and ignore the cells with a defined value?
The trick with conditional formatting is that the current cell is referenced by the first cell, not by a range of cells. That is, E2 refers to the current cell, applying to E3, E4 and so on throughout the conditionally formatted range.
References in formulas change for each cell unless they are fixed with $, so in the formula below, $E2 is used to fix the reference to column E (because the value is in column E even when we're formatting column D) but lets the reference to row 2 change for each row that needs to be formatted. In contrast, the range to check for min and max values should not change no matter what the current cell, so that is $E$2:$E$40.
Anyway, whether you followed that explanation or not, here are the two formulas.
$E2 = MAX($E$2:$E$40)
$E2 = MINIFS($E$2:$E$40;$E$2:$E$40;">0")

Solving a constrained system of linear equations

I have a system of equations of the form y=Ax+b where y, x and b are n×1 vectors and A is a n×n (symmetric) matrix.
So here is the wrinkle. Not all of x is unknown. Certain rows of x are specified and the corresponding rows of y are unknown. Below is an example
| 10 | | 5 -2 1 | | * | | -1 |
| * | = | -2 2 0 | | 1 | + | 1 |
| 1 | | 1 0 1 | | * | | 2 |
where * designates unknown quantities.
I have built a solver for problems such as the above in Fortran, but I wanted to know if there is a decent robust solver out-there as part of Lapack or MLK for these types of problems?
My solver is based on a sorting matrix called pivot = [1,3,2] which rearranges the x and y vectors according to known and unknown
| 10 | | 5 1 -2 | | * | | -1 |
| 1 | | 1 1 0 | | * | + | 2 |
| * | | -2 0 2 | | 1 | | 1 |
and the solving using a block matrix solution & LU decomposition
! solves a n×n system of equations where k values are known from the 'x' vector
function solve_linear_system(A,b,x_known,y_known,pivot,n,k) result(x)
use lu
integer(c_int),intent(in) :: n, k, pivot(n)
real(c_double),intent(in) :: A(n,n), b(n), x_known(k), y_known(n-k)
real(c_double) :: x(n), y(n), r(n-k), A1(n-k,n-k), A3(n-k,k), b1(n-k)
integer(c_int) :: i, j, u, code, d, indx(n-k)
u = n-k
!store known `x` and `y` values
x(pivot(u+1:n)) = x_known
y(pivot(1:u)) = y_known
!define block matrices
! |y_known| = | A1 A3 | | * | + |b1|
| | * | = | A3` A2 | | x_known | |b2|
A1 = A(pivot(1:u), pivot(1:u))
A3 = A(pivot(1:u), pivot(u+1:n))
b1 = b(pivot(1:u))
!define new rhs vector
r = y_known -matmul(A3, x_known)-b1
% solve `A1*x=r` with LU decomposition from NR book for 'x'
call ludcmp(A1,u,indx,d,code)
call lubksb(A1,u,indx,r)
% store unknown 'x' values (stored into 'r' by 'lubksb')
x(pivot(1:u)) = r
end function
For the example above the solution is
| 10.0 | | 3.5 |
y = | -4.0 | x = | 1.0 |
| 1.0 | | -4.5 |
PS. The linear systems have typically n<=20 equations.
The problem with only unknowns is a linear least squares problem.
Your a-priori knowledge can be introduced with equality-constraints (fixing some variables), transforming it to an linear equality-constrained least squares problem.
There is indeed an algorithm within lapack solving the latter, called xGGLSE.
Here is some overview.
(It also seems, you need to multiply b with -1 in your case to be compatible with the definition)
Edit: On further inspection, i missed the unknowns within y. Ouch. This is bad.
First, i would rewrite your system into a AX=b form where A and b are known. In your example, and provided that i didn't make any mistakes, it would give :
5 0 1 x1 13
A = 2 1 0 X = x2 and b = 3
1 0 1 x3 -1
Then you can use plenty of methods coming from various libraries, like LAPACK or BLAS depending on the properties of your matrix A (positive-definite ,...). As a starting point, i would suggest a simple method with a direct inversion of the matrix A, especially if your matrix is small. There are also many iterative approach ( Jacobi, Gradients, Gauss seidel ...) that you can use for bigger cases.
Edit : An idea to solve it in 2 steps
First step : You can rewrite your system in 2 subsystem that have X and Y as unknows but dimension are equals to the numbers of unknowns in each vector.
The first subsystem in X will be AX = b which can be solved by direct or iterative methods.
Second step : The second system in Y can be directly resolved once you know X cause Y will be expressed in the form Y = A'X + b'
I think this approach is more general.

Minimum cost within limited time for a timetable?

I have a timetable like this:
+-----------+-------------+------------+------------+------------+------------+-------+----+
| transport | trainnumber | departcity | arrivecity | departtime | arrivetime | price | id |
+-----------+-------------+------------+------------+------------+------------+-------+----+
| Q | Q00 | BJ | TJ | 13:00:00 | 15:00:00 | 10 | 1 |
| Q | Q01 | BJ | TJ | 18:00:00 | 20:00:00 | 10 | 2 |
| Q | Q02 | TJ | BJ | 16:00:00 | 18:00:00 | 10 | 3 |
| Q | Q03 | TJ | BJ | 21:00:00 | 23:00:00 | 10 | 4 |
| Q | Q04 | HA | DL | 06:00:00 | 11:00:00 | 50 | 5 |
| Q | Q05 | HA | DL | 14:00:00 | 19:00:00 | 50 | 6 |
| Q | Q06 | HA | DL | 18:00:00 | 23:00:00 | 50 | 7 |
| Q | Q07 | DL | HA | 07:00:00 | 12:00:00 | 50 | 8 |
| Q | Q08 | DL | HA | 15:00:00 | 20:00:00 | 50 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ...|
+-----------+-------------+------------+------------+------------+------------+-------+----+
In this table, there 13 cities and 116 routes altogether and the smallest unit of time is half an hour.
There are difference transports, which doesn't matter. As you can see, there can be multiple edges with same departcity and arrivecity but difference time and difference price. The time is constant everyday.
Now, here arises a problem.
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends on whether the user wants it to be, that is, there are two problems), within X hours and also least costs under above conditions.
Before this problem, I have solved another simpler problem.
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends), with least costs under above conditions.
Here is how I solve it (just take not in order as an example):
Sort the must-pass cities:C1, C2, C3...Cn. Let C0 = A, C(n+1) = B, minCost.cost = INFINITE;
i = 0, j = 1, W = {};
Find a least cost way S from Ci to Cj using Dijkstra Algorithm with price as the weight of edges. W=W∪S;
i = i + 1, j = j + 1;
If j <= n + 1, goto 3;
if W.cost < minCost.cost, minCost = W;
If next permutation for C1...Cn exists, rearrange list C1...Cn in order of the next permutation for C1...Cn and goto 2;
Return minCost;
However, I cannot come up with a efficient solution to the first problem, Please help me, thanks.
I'll be appreciated if anyone can solve another problem:
A user wonder how he can travel from city A to city B (A and B may be one city), with passing zero or some cities C, D...(whether they should be in order depends), within least time under above conditions.
It's quite a big problem, so I will just sketch a solution.
First, remodel your graph as follows. Instead of each vertex representing a city, let a vertex represent a tuple of (city, time). This is feasible as there are only 13 cities and only (time_limit - current_time) * 2 possible time slots as the smallest unit of time is half an hour. Now connect vertices according to the given timetable with prices as their weights as before. Don't forget that the user can stay at any city for any amount of time for free. All nodes with city A are start nodes, all nodes with city B are target nodes. Take the minimum value of all (B, time) vertices to get the solution with least cost. If there are multiple, take the one with the smallest time.
Now on towards forcing the user to pass through certain cities in order. If there are n cities to pass through (plus start and target city), you need n+2 copies of the same graph which act as different levels. The level represents how many cities of your list you have already passed. So you start in level 0 on vertex A. Once you get to C1 in level 0 you move to the vertex C1 in level 1 of the graph (connect the vertices by 0-weight edges). This means that when you are in level k, you have already passed cities C1 to Ck and you can get to the next level only by going through C(k+1). The vertices of city B in the last level are your target nodes.
Note: I said copies of the same graph, but that is not exactly true. You can't allow the user to reach C(k+2), ..., B in level k, that would violate the required order.
To enforce passing cities in any order, a different scheme of connecting the levels (and modifying them during runtime) is required. I'll leave this to you.

Intersection ranges (algorithm)

As example I have next arrays:
[100,192]
[235,280]
[129,267]
As intersect arrays we get:
[129,192]
[235,267]
Simple exercise for people but problem for creating algorithm that find second multidim array…
Any language, any ideas..
If somebody do not understand me:
I'll assume you wish to output any range that has 2 or more overlapping intervals.
So the output for [1,5], [2,4], [3,3] will be (only) [2,4].
The basic idea here is to use a sweep-line algorithm.
Split the ranges into start- and end-points.
Sort the points.
Now iterate through the points with a counter variable initialized to 0.
If you get a start-point:
Increase the counter.
If the counter's value is now 2, record that point as the start-point for a range in the output.
If you get an end-point
Decrease the counter.
If the counter's value is 1, record that point as the end-point for a range in the output.
Note:
If a start-point and an end-point have the same value, you'll need to process the end-point first if the counter is 1 and the start-point first if the counter is 2 or greater, otherwise you'll end up with a 0-size range or a 0-size gap between two ranges in the output.
This should be fairly simple to do by having a set of the following structure:
Element
int startCount
int endCount
int value
Then you combine all points with the same value into one such element, setting the counts appropriately.
Running time:
O(n log n)
Example:
Input:
[100, 192]
[235, 280]
[129, 267]
(S for start, E for end)
Points | | 100 | 129 | 192 | 235 | 267 | 280 |
Type | | Start | Start | End | Start | End | End |
Count | 0 | 1 | 2 | 1 | 2 | 1 | 0 |
Output | | | [129, | 192] | [235, | 267] | |
This is python implementation of intersection algorithm. Its computcomputational complexity O(n^2).
a = [[100,192],[235,280],[129,267]]
def get_intersections(diapasons):
intersections = []
for d in diapasons:
for check in diapasons:
if d == check:
continue
if d[0] >= check[0] and d[0] <= check[1]:
right = d[1]
if check[1] < d[1]:
right = check[1]
intersections.append([d[0], right])
return intersections
print get_intersections(a)

Counting the ways to build a wall with two tile sizes [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
You are given a set of blocks to build a panel using 3”×1” and 4.5”×1" blocks.
For structural integrity, the spaces between the blocks must not line up in adjacent rows.
There are 2 ways in which to build a 7.5”×1” panel, 2 ways to build a 7.5”×2” panel, 4 ways to build a 12”×3” panel, and 7958 ways to build a 27”×5” panel. How many different ways are there to build a 48”×10” panel?
This is what I understand so far:
with the blocks 3 x 1 and 4.5 x 1
I've used combination formula to find all possible combinations that the 2 blocks can be arranged in a panel of this size
C = choose --> C(n, k) = n!/r!(n-r)! combination of group n at r at a time
Panel: 7.5 x 1 = 2 ways -->
1 (3 x 1 block) and 1 (4.5 x 1 block) --> Only 2 blocks are used--> 2 C 1 = 2 ways
Panel: 7.5 x 2 = 2 ways
I used combination here as well
1(3 x 1 block) and 1 (4.5 x 1 block) --> 2 C 1 = 2 ways
Panel: 12 x 3 panel = 2 ways -->
2(4.5 x 1 block) and 1(3 x 1 block) --> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block) --> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
(This is where I get confused)
Panel 27 x 5 panel = 7958 ways
6(4.5 x 1 block) and 0(3 x 1) --> 6 C 0 = 1 way
4(4.5 x 1 block) and 3(3 x 1 block) --> 7 C 3 = 35 ways
2(4.5 x 1 block) and 6(3 x 1 block) --> 8 C 2 = 28 ways
0(4.5 x 1 block) and 9(3 x 1 block) --> 9 C 0 = 1 way
1 way + 35 ways + 28 ways + 1 way = 65 ways
As you can see here the number of ways is nowhere near 7958. What am I doing wrong here?
Also how would I find how many ways there are to construct a 48 x 10 panel?
Because it's a little difficult to do it by hand especially when trying to find 7958 ways.
How would write a program to calculate an answer for the number of ways for a 7958 panel?
Would it be easier to construct a program to calculate the result? Any help would be greatly appreciated.
I don't think the "choose" function is directly applicable, given your "the spaces between the blocks must not line up in adjacent rows" requirement. I also think this is where your analysis starts breaking down:
Panel: 12 x 3 panel = 2 ways -->
2(4.5 x 1 block) and 1(3 x 1 block)
--> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block)
--> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
...let's build some panels (1 | = 1 row, 2 -'s = 1 column):
+---------------------------+
| | | | |
| | | | |
| | | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
Here we see that there are 4 different basic row types, but none of these are valid panels (they all violate the "blocks must not line up" rule). But we can use these row types to create several panels:
+---------------------------+
| | | | |
| | | | |
| | | |
+---------------------------+
+---------------------------+
| | | | |
| | | | |
| | | |
+---------------------------+
+---------------------------+
| | | | |
| | | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
...
But again, none of these are valid. The valid 12x3 panels are:
+---------------------------+
| | | | |
| | | |
| | | | |
+---------------------------+
+---------------------------+
| | | |
| | | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
+---------------------------+
| | | |
| | | |
| | | |
+---------------------------+
So there are in fact 4 of them, but in this case it's just a coincidence that it matches up with what you got using the "choose" function. In terms of total panel configurations, there are quite more than 4.
Find all ways to form a single row of the given width. I call this a "row type". Example 12x3: There are 4 row types of width 12: (3 3 3 3), (4.5 4.5 3), (4.5 3 4.5), (3 4.5 4.5). I would represent these as a list of the gaps. Example: (3 6 9), (4.5 9), (4.5 7.5), (3 7.5).
For each of these row types, find which other row types could fit on top of it.
Example:
a. On (3 6 9) fits (4.5 7.5).
b. On (4.5 9) fits (3 7.5).
c: On (4.5 7.5) fits (3 6 9).
d: On (3 7.5) fits (4.5 9).
Enumerate the ways to build stacks of the given height from these rules. Dynamic programming is applicable to this, as at each level, you only need the last row type and the number of ways to get there.
Edit: I just tried this out on my coffee break, and it works. The solution for 48x10 has 15 decimal digits, by the way.
Edit: Here is more detail of the dynamic programming part:
Your rules from step 2 translate to an array of possible neighbours. Each element of the array corresponds to a row type, and holds that row type's possible neighbouring row types' indices.
0: (2)
1: (3)
2: (0)
3: (1)
In the case of 12×3, each row type has only a single possible neighbouring row type, but in general, it can be more.
The dynamic programming starts with a single row, where each row type has exactly one way of appearing:
1 1 1 1
Then, the next row is formed by adding for each row type the number of ways that possible neighbours could have formed on the previous row. In the case of a width of 12, the result is 1 1 1 1 again. At the end, just sum up the last row.
Complexity:
Finding the row types corresponds to enumerating the leaves of a tree; there are about (/ width 3) levels in this tree, so this takes a time of O(2w/3) = O(2w).
Checking whether two row types fit takes time proportional to their length, O(w/3). Building the cross table is proportional to the square of the number of row types. This makes step 2 O(w/3·22w/3) = O(2w).
The dynamic programming takes height times the number of row types times the average number of neighbours (which I estimate to be logarithmic to the number of row types), O(h·2w/3·w/3) = O(2w).
As you see, this is all dominated by the number of row types, which grow exponentially with the width. Fortunately, the constant factors are rather low, so that 48×10 can be solved in a few seconds.
This looks like the type of problem you could solve recursively. Here's a brief outline of an algorithm you could use, with a recursive method that accepts the previous layer and the number of remaining layers as arguments:
Start with the initial number of layers (e.g. 27x5 starts with remainingLayers = 5) and an empty previous layer
Test all possible layouts of the current layer
Try adding a 3x1 in the next available slot in the layer we are building. Check that (a) it doesn't go past the target width (e.g. doesn't go past 27 width in a 27x5) and (b) it doesn't violate the spacing condition given the previous layer
Keep trying to add 3x1s to the current layer until we have built a valid layer that is exactly (e.g.) 27 units wide
If we cannot use a 3x1 in the current slot, remove it and replace with a 4.5x1
Once we have a valid layer, decrement remainingLayers and pass it back into our recursive algorithm along with the layer we have just constructed
Once we reach remainingLayers = 0, we have constructed a valid panel, so increment our counter
The idea is that we build all possible combinations of valid layers. Once we have (in the 27x5 example) 5 valid layers on top of each other, we have constructed a complete valid panel. So the algorithm should find (and thus count) every possible valid panel exactly once.
This is a '2d bin packing' problem. Someone with decent mathematical knowledge will be able to help or you could try a book on computational algorithms. It is known as a "combinatorial NP-hard problem". I don't know what that means but the "hard" part grabs my attention :)
I have had a look at steel cutting prgrams and they mostly use a best guess. In this case though 2 x 4.5" stacked vertically can accommodate 3 x 3" inch stacked horizontally. You could possibly get away with no waste. Gets rather tricky when you have to figure out the best solution --- the one with minimal waste.
Here's a solution in Java, some of the array length checking etc is a little messy but I'm sure you can refine it pretty easily.
In any case, I hope this helps demonstrate how the algorithm works :-)
import java.util.Arrays;
public class Puzzle
{
// Initial solve call
public static int solve(int width, int height)
{
// Double the widths so we can use integers (6x1 and 9x1)
int[] prev = {-1}; // Make sure we don't get any collisions on the first layer
return solve(prev, new int[0], width * 2, height);
}
// Build the current layer recursively given the previous layer and the current layer
private static int solve(int[] prev, int[] current, int width, int remaining)
{
// Check whether we have a valid frame
if(remaining == 0)
return 1;
if(current.length > 0)
{
// Check for overflows
if(current[current.length - 1] > width)
return 0;
// Check for aligned gaps
for(int i = 0; i < prev.length; i++)
if(prev[i] < width)
if(current[current.length - 1] == prev[i])
return 0;
// If we have a complete valid layer
if(current[current.length - 1] == width)
return solve(current, new int[0], width, remaining - 1);
}
// Try adding a 6x1
int total = 0;
int[] newCurrent = Arrays.copyOf(current, current.length + 1);
if(current.length > 0)
newCurrent[newCurrent.length - 1] = current[current.length - 1] + 6;
else
newCurrent[0] = 6;
total += solve(prev, newCurrent, width, remaining);
// Try adding a 9x1
if(current.length > 0)
newCurrent[newCurrent.length - 1] = current[current.length - 1] + 9;
else
newCurrent[0] = 9;
total += solve(prev, newCurrent, width, remaining);
return total;
}
// Main method
public static void main(String[] args)
{
// e.g. 27x5, outputs 7958
System.out.println(Puzzle.solve(27, 5));
}
}

Resources