Given a binary matrix (entries 0 or 1), adjacent 1-entries form "hills". Given some number k, find the minimum number of 0s you need to "flip" to 1 in order to form a hill of size at least k.
Edit: For clarification, adjacent means left-right-up-down neighborhoods. Diagonals do not count as adjacent. For example,
[0 1
0 1]
is one hill of size 2,
[0 1
1 0]
defines 2 hills of size 1,
[0 1
1 1]
defines 1 hill of size 3, and
[1 1
1 1]
defines 1 hill of size 4.
Also for clarification, size is defined by the area formed by the adjacent blob of 1's.
My initial idea is to turn each existing hill into a node of a graph, with edge costs given by the minimal flip path between hills, and then run a DFS (or a similar search) to find the minimum cost.
This fails in cases where choosing one path reduces the cost of another edge, and the workarounds I can think of are too close to brute force.
Your problem is closely related to the rectilinear Steiner tree problem.
A Steiner tree connects a set of points together using line segments, minimising the total length of the line segments. The line segments can meet in arbitrary places, not necessarily at points in the set (so it is not the same thing as a minimum spanning tree). For example, given three points at the corners of an equilateral triangle, the Euclidean Steiner tree connects them by meeting in the middle:
A rectilinear Steiner tree is the same, except you minimise the total Manhattan distance instead of the total Euclidean distance.
In your problem, instead of joining your hills with line segments whose length is measured by Euclidean distance, you are joining your hills by adding pixels. The total number of 0s you need to flip to join two cells in your array is equal to the Manhattan distance between those two cells, minus 1.
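In code, the flip cost between two 1-cells is simply (coordinates as 0-indexed (row, column) pairs; the helper name is mine):

```python
def flips_to_connect(a, b):
    """Number of 0s to flip to join two 1-cells: the Manhattan distance
    between them, minus 1 (the intermediate cells on any monotone path)."""
    (r1, c1), (r2, c2) = a, b
    return abs(r1 - r2) + abs(c1 - c2) - 1
```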
The rectilinear Steiner tree problem is known to be NP-complete, even when restricted to points with integer coordinates. Your problem generalises it, with two differences:
The "minus 1" part when measuring the Manhattan distance. I doubt that this subtle difference is enough to bring the problem into a lower complexity class, though I don't have a proof for you.
The coordinates of your integer points are bounded by the size of the matrix (as pointed out by Albert Hendriks in the comments). This does matter — it means that pseudo-polynomial time for the rectilinear Steiner tree problem would be polynomial time for your problem.
This means that your problem may or may not be NP-hard, depending on whether the rectilinear Steiner tree problem is weakly NP-complete or strongly NP-complete. I wasn't able to find a definitive answer to this in the literature, and there isn't much information about the problem other than in academic literature. It does at least appear that there isn't a known pseudo-polynomial time algorithm, as far as I can tell.
Given that, your most likely options are some kind of backtracking search for an exact solution, or applying a heuristic to get a "good enough" solution. One possible heuristic as described by Wikipedia is to compute a rectilinear minimum spanning tree and then try to improve on the RMST using an iterative improvement method. The RMST itself gives a solution within a constant factor of 1.5 of the true optimum.
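The heuristic above needs a rectilinear MST as a starting point; a minimal Prim's-algorithm sketch under Manhattan distance (hill extraction and the iterative improvement step are omitted, and the function name is mine):

```python
def rmst_cost(points):
    """Prim's algorithm for the minimum spanning tree of a point set
    under Manhattan distance; returns the total edge length."""
    points = list(points)
    in_tree = {0}
    # best known Manhattan distance from each point to the growing tree
    best = [abs(points[0][0] - x) + abs(points[0][1] - y) for x, y in points]
    total = 0
    while len(in_tree) < len(points):
        nxt = min((i for i in range(len(points)) if i not in in_tree),
                  key=lambda i: best[i])
        total += best[nxt]
        in_tree.add(nxt)
        px, py = points[nxt]
        for i, (x, y) in enumerate(points):
            d = abs(px - x) + abs(py - y)
            if d < best[i]:
                best[i] = d
    return total
```

For the hill-joining cost you would still subtract 1 per edge and then apply local improvements, per the Wikipedia heuristic.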
A hill is composed of four sequences of 1's:
The right sequence is composed of r 'bits', the up sequence has u bits, and so on.
A hill of size k satisfies k = 1 + r + l + u + d (one central cell plus the four sequences), where each length v satisfies 0 <= v < k.
The problem is combinatorial: for each cell, all possible combinations of {r, l, u, d} that satisfy the relation above should be tested.
When testing a combination at a cell, count the existing 1s along each sequence; those don't need to "flip". This also lets you skip some combinations early.
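A sketch of the enumeration described above (note it only considers plus-shaped candidate hills, one centre plus four straight arms; all names are mine):

```python
def min_flips_plus_shape(grid, k):
    """For every cell, try all arm lengths (r, l, u, d) with
    1 + r + l + u + d == k, counting only the 0s that must actually
    flip (existing 1s along the arms are free)."""
    m, n = len(grid), len(grid[0])
    best = None
    for i in range(m):
        for j in range(n):
            for r in range(k):
                for l in range(k - r):
                    for u in range(k - r - l):
                        d = k - 1 - r - l - u
                        # the four arms must fit inside the grid
                        if j + r >= n or j - l < 0 or i - u < 0 or i + d >= m:
                            continue
                        cells = [(i, j)]
                        cells += [(i, j + x) for x in range(1, r + 1)]
                        cells += [(i, j - x) for x in range(1, l + 1)]
                        cells += [(i - y, j) for y in range(1, u + 1)]
                        cells += [(i + y, j) for y in range(1, d + 1)]
                        cost = sum(1 for (y, x) in cells if grid[y][x] == 0)
                        if best is None or cost < best:
                            best = cost
    return best
```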
Related
I came across this question while preparing for my final exam, and I could not find the recursive formula, although I have seen similar questions.
I'd appreciate any help!
The problem is:
Suppose we are given a set L of n line segments in the plane, where the endpoints of each segment lie on the unit circle x^2 + y^2 = 1, and all 2n endpoints are distinct. Describe and analyze an algorithm to compute the largest subset of L in which every pair of segments intersects.
The solution needs to be a dynamic-programming algorithm (based on a recursive formula).
I am assuming the question ("the largest subset of L...") is asking about the subset's size, not that the subset cannot be extended. If the latter were intended, the problem would be trivial and a simple greedy algorithm would work.
Now to your question. Following Matt Timmermans' hint (can you prove it?), this can be viewed as the longest common subsequence problem, except that we don't know what the two input strings are, i.e. where the splitting point between the two sequence occurrences is.
The longest common subsequence problem can be solved in O(m*n) time and linear memory. By moving the splitting point along your 2n-length array you create 2n instances of the LCS problem, each of which can be solved in O(n^2) time, which yields a total time complexity of O(n^3).
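For reference, the inner O(m*n) routine is the standard textbook LCS dynamic program; a sketch (the splitting-point loop around it is omitted):

```python
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) LCS dynamic program over a full table.
    (A two-row variant would give the linear-memory version.)"""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]
```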
Your problem is known as the maximum clique problem of a circle graph (line segments correspond to graph nodes, and segment intersections to graph edges); it was shown in 2010 to have a solution with O(n^2 log n) time complexity.
Please note that the maximum clique problem (the decision version) is NP-hard (NP-complete, to be exact) on an arbitrary graph.
I have a set of N points (specifically, binary strings), and for each pair of points i and j I have a discrete metric (the Hamming distance) D_ij.
I want to find a subset of k elements (with k < N, of course) such that the distance between these k points is as large as possible.
In other words, I want to find a sort of "border points" that cover the maximum area in the space of the points.
If k = 2 the answer is trivial: search the distance matrix for the two most distant elements. But how can I generalize this when k > 2?
Any suggestions? Is this an NP-hard problem?
Thanks for the answers.
One generalisation would be "find k points such that the minimum distance between any two of these k points is as large as possible".
Unfortunately, I think this is hard, because I think if you could do this efficiently you could find cliques efficiently. Suppose somebody gives you a matrix of distances and asks you to find a k-clique. Create another matrix with entries 1 where the original matrix had infinity, and entries 1000000 where the original matrix had any finite distance. Now a set of k points in the new matrix where the minimum distance between any two points in that set is 1000000 corresponds to a set of k points in the original matrix which were all connected to each other - a clique.
This construction does not take account of the fact that the points correspond to bit-vectors and the distance between them is the Hamming distance, but I think it can be extended to cope with this. To show that a program capable of solving the original problem can be used to find cliques I need to show that, given an adjacency matrix, I can construct a bit-vector for each point so that pairs of points connected in the graph, and so with 1 in the adjacency matrix, are at distance roughly A from each other, and pairs of points not connected in the graph are at distance B from each other, where A > B. Note that A could be quite close to B. In fact, the triangle inequality will force this to be the case. Once I have shown this, k points all at distance A from each other (and so with minimum distance A, and a sum of distances of k(k-1)A/2) will correspond to a clique, so a program finding such points will find cliques.
To do this I will use bit-vectors of length kn(n-1)/2, where k will grow with n, so the length of the bit-vectors could be as much as O(n^3). I can get away with this because this is still only polynomial in n. I will divide each bit-vector into n(n-1)/2 fields each of length k, where each field is responsible for representing the connection or lack of connection between two points. I claim that there is a set of bit-vectors of length k so that all of the distances between these k-long bit-vectors are roughly the same, except that two of them are closer together than the others. I also claim that there is a set of bit-vectors of length k so that all of the distances between them are roughly the same, except that two of them are further apart than the others. By choosing between these two different sets, and by allocating the nearer or further pair to the two points owning the current bit-field of the n(n-1)/2 fields within the bit-vector I can create a set of bit-vectors with the required pattern of distances.
I think these exist because I think there is a construction that creates such patterns with high probability. Create n random bit-vectors of length k. Any two such bit-vectors have an expected Hamming distance of k/2 with a variance of k/4 so a standard deviation of sqrt(k)/2. For large k we expect the different distances to be reasonably similar. To create within this set two points that are very close together, make one a copy of the other. To create two points that are very far apart, make one the not of the other (0s in one where the other has 1s and vice versa).
Given any two points their expected distance from each other will be (n(n-1)/2 - 1)k/2 + k (if they are supposed to be far apart) and (n(n-1)/2 -1)k/2 (if they are supposed to be close together) and I claim without proof that by making k large enough the expected difference will triumph over the random variability and I will get distances that are pretty much A and pretty much B as I require.
@mcdowella, I think I probably didn't explain my problem very well.
In my problem I have binary strings, and for each pair I can compute the Hamming distance between them.
This gives me a distance matrix D with a finite value in every entry D(i,j).
I can view this distance matrix as a graph: each row is a vertex, and column j holds the weight of the arc connecting vertex Vi to vertex Vj.
For the reason I explained, this graph is complete, so it is a clique of itself.
Consequently, if I pick k vertices at random from the original graph I obtain a subgraph that is also complete.
From all the possible subgraphs of order k I want to choose the best one.
What is the best one? A graph in which the distances between the vertices are not only as large as possible but also as uniform as possible.
Suppose I have two vertices v1 and v2 in my subgraph at distance 25, and three other vertices v3, v4, v5 such that
d(v1, v3) = 24, d(v1, v4) = 7, d(v2, v3) = 5, d(v2, v4) = 22, d(v1, v5) = 14, d(v2, v5) = 14
With these distances, v3 is far from v1 but very near v2, and the opposite holds for v4, which is far from v2 but near v1.
I would instead prefer to add v5 to my subgraph, because it is distant from the other two in a more uniform way.
I hope my problem is clear now.
Do you think your formulation is already correct for this?
I have claimed that the problem of finding k points such that the minimum distance between them, or the sum of the distances between them, is as large as possible is NP-complete, so there is presumably no polynomial-time exact algorithm. This suggests looking for some sort of heuristic solution, so here is one, based on an idea used for clustering. I will describe it for maximising the total distance; I think it can be adapted to maximise the minimum distance, and perhaps other goals.
Pick k arbitrary points and note down, for each point, the sum of the distances to the other points. For each other point in the data, look at the sum of the distances to the k chosen points and see if replacing any of the chosen points with that point would increase the sum. If so, replace whichever point increases the sum most and continue. Keep trying until none of the points can be used to increase the sum. This is only a local optimum, so repeat with another set of k arbitrary/random points in the hope of finding a better one until you get fed up.
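A sketch of this heuristic in Python, using first-improvement swaps rather than always swapping the best point (a minor variation; all names are mine):

```python
import random

def max_dispersion(dist, k, restarts=20, seed=0):
    """Local-search heuristic for maximising the sum of pairwise
    distances among k chosen points. dist is a full symmetric matrix.
    Restarted from random subsets; returns (subset, total distance)."""
    rng = random.Random(seed)
    n = len(dist)

    def total(sub):
        return sum(dist[a][b] for i, a in enumerate(sub) for b in sub[i + 1:])

    best, best_val = None, -1
    for _ in range(restarts):
        sub = rng.sample(range(n), k)
        improved = True
        while improved:                       # keep swapping until a local optimum
            improved = False
            for p in range(n):
                if p in sub:
                    continue
                for i in range(k):
                    cand = sub[:i] + [p] + sub[i + 1:]
                    if total(cand) > total(sub):
                        sub = cand
                        improved = True
        if total(sub) > best_val:
            best, best_val = sorted(sub), total(sub)
    return best, best_val
```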
This inherits from its clustering forebear the following property, which might at least be useful for testing: if the points can be divided into k classes such that the distance between any two points in the same class is always less than the distance between any two points in different classes then, when you have found k points where no local improvement is possible, these k points should all be from different classes (because if not, swapping out one of a pair of points from the same class would increase the sum of distances between them).
This problem is known as the MaxMin Diversity Problem (MMDP). It is known to be NP-hard. However, there are algorithms for giving good approximate solutions in reasonable time, such as this one.
I'm answering this question years after it was asked because I was looking for algorithms to solve the same problem, and had trouble even finding out what to call it.
This is an algorithmic question I came up with myself, but I couldn't think of an easy solution.
The problem is inspired by merging two famous problems: minimum segment coverage and the knapsack problem. The description is as follows:
Given n segments [l_i, r_i], where all l_i, r_i in [1,M]. n, M are known.
Each segment has a value v_i, what is the maximum total value you can get if you can choose any number of non-overlapping segments? (touching is ok)
I have a strong feeling that my approach is over-complicated, but the solution in my head uses dynamic programming, the way we solve knapsack:
Sort the segments by r_i in ascending order
Define DP(i) := maximum value we can get using segment [0,i], here the index is the sorted index after step 1
DP(i) = max(DP(j) + v[i], DP(i-1)) where j is the largest index where r_j <= l_i, which can be found using binary search
I think this solution is O(N lg N). Now my questions are:
Is this solution correct?
Is there any easier, better-performance solution?
Segment overlaps can be represented by a graph called an interval graph. Since you don't want to take two overlapping segments, you are looking for a Maximum Weighted Independent Set in an interval graph. This problem is NP-hard on general graphs, but fortunately it is easy to solve on interval graphs. If you look at the GraphClasses website you can see that the problem is solvable in linear time, even for chordal graphs (a larger class than interval graphs), and you have the reference to the original paper that proves it.
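For the record, the DP from the question is the standard weighted interval scheduling recurrence and can be sketched in a few lines:

```python
import bisect

def max_value_segments(segments):
    """Weighted interval scheduling. segments is a list of (l, r, v)
    tuples; touching endpoints are allowed. Runs in O(n log n)."""
    segs = sorted(segments, key=lambda s: s[1])   # sort by right endpoint
    rights = [r for _, r, _ in segs]
    dp = [0] * (len(segs) + 1)                    # dp[i] = best using first i segments
    for i, (l, r, v) in enumerate(segs):
        j = bisect.bisect_right(rights, l)        # segments ending at or before l
        dp[i + 1] = max(dp[i], dp[j] + v)         # skip segment i, or take it
    return dp[-1]
```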
I have a bunch of points on a 2-dimensional Grid. I want to group the Points into pairs, while minimizing the sum of the euclidean distances between the points of the pairs.
Example:
Given the points:
p1: (1,1)
p2: (5,5)
p3: (1,3)
p4: (6,6)
Best solution:
pair1 = (p1,p3), distance = 2
pair2 = (p2,p4), distance = √2 ≈ 1.41
Minimized total distance: 2 + √2 ≈ 3.41
I suspect this problem might be solvable with a variant of the Hungarian Algorithm?!
What is the fastest way to solve the problem?
(Little remark: I will always have fewer than 12 points.)
The problem you are trying to solve is similar to the shortest path through a fully connected (mesh) network, where you are not allowed to visit each vertex/node more than once, and you don't care about connecting the minimal pairs.
This problem is approachable using techniques from graph theory, metric spaces, and other results from computational geometry.
This problem is similar to the wiki article on the closest pair of points problem, and that article offers some useful insights regarding Voronoi diagrams and Delaunay triangulation, as well as recursive divide-and-conquer algorithms.
Note that repeatedly pairing the closest points is not the solution: with four collinear points (A, B, C, D) where d(B,C) is least, pairing (B,C) forces the pair (A,D), and d(B,C) + d(A,D) can be larger than d(A,B) + d(C,D).
This stackoverflow question explains how to find the shortest distance between two points, and has a useful hint to skip computing the square root while comparing distances. Answers suggest using a divide and conquer approach (linear), but observe that splitting both X and Y coordinates might partition more appropriately.
This math stackexchange question addresses a similar problem, and suggests using Prim's algorithm, Kruskal's algorithm, or notes that this is a special case of the Travelling Salesman problem, which is NP-hard.
My approach would be to solve your problem (pairing the closest points) by computing a minimal spanning tree with a greedy algorithm, and then removing edges from the spanning tree until only disconnected pairs remain, likely using a second greedy variant.
There are so few pairings possible for 12 or fewer points (about 10,000, as pointed out in a comment) that you can check all pairings by brute force; even with this solution you can solve about 10,000 such problems per second on a modern personal computer. If you want a faster solution, you can enumerate each point's nearest neighbors in order and then check only the pairings that are minimal with respect to which nearest neighbors are used. In the worst case I don't think this gives a speed-up, but if, for example, your 12 points come in 6 pairs of very close points (with unpaired points far away), you would find the solution very quickly, because the minimal pairing with respect to nearest neighbors matches each point with its first nearest neighbor.
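A brute-force sketch for the exact pairing (assumes an even number of points; all names are mine):

```python
from math import dist, inf

def min_pairing(points):
    """Exact minimum-cost pairing by brute force over all (n-1)!!
    pairings; comfortably fast for n <= 12 (10395 pairings)."""
    def solve(idx):
        if not idx:
            return 0.0, []
        first, rest = idx[0], idx[1:]
        best, best_pairs = inf, []
        for j, other in enumerate(rest):
            sub = rest[:j] + rest[j + 1:]         # pair `first` with `other`
            cost, pairs = solve(sub)
            cost += dist(points[first], points[other])
            if cost < best:
                best, best_pairs = cost, [(first, other)] + pairs
        return best, best_pairs
    return solve(tuple(range(len(points))))
```

On the example above, `min_pairing([(1,1), (5,5), (1,3), (6,6)])` pairs indices (0, 2) and (1, 3) at total cost 2 + √2.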
I was trying to solve the following problem:
An m×n maze is an m×n rectangular grid with walls placed between grid cells such that there is exactly one path from the top-left square to any other square.
The following are examples of a 9×12 maze and a 15×20 maze:
Let C(m,n) be the number of distinct m×n mazes. Mazes which can be formed by rotation or reflection from another maze are considered distinct.
It can be verified that C(1,1) = 1, C(2,2) = 4, C(3,4) = 2415, and C(9,12) = 2.5720e46 (in scientific notation rounded to 5 significant digits).
Find C(100,500)
Now, there is an explicit formula which gives the right result, and it is perfectly computable. However, as I understand it, solutions to Project Euler problems should be clever algorithms rather than explicit formula computations. Trying to formulate the solution as a recursion, I could only arrive at a linear system whose number of variables grows exponentially with the size of the maze. More precisely, if one writes a recursion for the number of m×n mazes with m held fixed, one of the variables is the number of m×n mazes with the property given in the problem statement, while the others count m×n mazes with more than one connected component touching the boundary of the maze in some specific "configuration", and the number of such configurations seems to grow exponentially with m. So while this approach is feasible for m = 2, 3, 4, etc., it does not seem to work for m = 100.
I also thought of reducing the problem to subproblems that can be solved more easily, then reusing their solutions when constructing solutions to larger subproblems (the dynamic programming approach), but here I stumbled on the fact that the subproblems seem to involve mazes of irregular shapes, and again the number of such mazes is exponential in m and n.
If someone knows of a feasible approach (m=100, n=500) other than using explicit formulas or some ad hoc theorems, and can hint where to look, for me it would be quite interesting.
This is basically a spanning tree counting problem. Specifically, it is counting the number of spanning trees in a grid graph.
Counting Spanning Trees in a Grid Graph
From the "Counting spanning trees" section of the Wikipedia entry:
The number t(G) of spanning trees of a connected graph is a well-studied invariant. In some cases, it is easy to calculate t(G) directly. For example, if G is itself a tree, then t(G) = 1, while if G is the cycle graph C_n with n vertices, then t(G) = n. For any graph G, the number t(G) can be calculated using Kirchhoff's matrix-tree theorem...
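For small grids, the matrix-tree theorem can be applied directly: build the grid graph's Laplacian, delete one row and column, and take the determinant. A sketch using exact rational arithmetic (only practical for small m and n; it reproduces the problem's check values):

```python
from fractions import Fraction

def grid_spanning_trees(m, n):
    """Count spanning trees of the m-by-n grid graph via Kirchhoff's
    matrix-tree theorem: any cofactor of the graph Laplacian."""
    N = m * n
    idx = lambda i, j: i * n + j
    L = [[Fraction(0)] * N for _ in range(N)]
    for i in range(m):
        for j in range(n):
            for di, dj in ((0, 1), (1, 0)):      # right and down neighbours
                ii, jj = i + di, j + dj
                if ii < m and jj < n:
                    u, v = idx(i, j), idx(ii, jj)
                    L[u][u] += 1; L[v][v] += 1
                    L[u][v] -= 1; L[v][u] -= 1
    # cofactor: drop the last row and column, then Gaussian elimination
    A = [row[:N - 1] for row in L[:N - 1]]
    det, size = Fraction(1), N - 1
    for c in range(size):
        piv = next(r for r in range(c, size) if A[r][c] != 0)
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, size):
            f = A[r][c] / A[c][c]
            for k in range(c, size):
                A[r][k] -= f * A[c][k]
    return int(det)
```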
Related Algorithms
Here are a few papers or posts related to counting the number of spanning trees in grid graphs:
"Counting Spanning Trees in Grid Graphs", Melissa Desjarlais and Robert Molina
Department of Mathematics and Computer Science, Alma College, August 17, 2012? (publish date uncertain)
"Counting the number of spanning trees in a graph - A spectral approach", from Univ. of Maryland class notes for CMSC858W: Algorithms for Biosequence Analysis,
April 29th, 2010
"Automatic Generation of Generating Functions for Counting the Number of Spanning Trees for Grid Graphs (and more general creatures) of Fixed (but arbitrary!) Width", by Shalosh B. Ekhad and Doron Zeilberger
The latter by Ekhad and Zeilberger provided the following, with answers that matched up with the problem-at-hand:
If you want to see explicit expressions (as rational functions in z) for the formal power series whose coefficient of z^n in its Maclaurin expansion (with respect to z) would give you the number of spanning trees of the m by n grid graph (the Cartesian product of a path of m vertices and a path of length n) for m=2 to m=6, then the input gives the output.
Specifically, see the output.
Sidenote: without the provided solution values suggesting otherwise, a valid interpretation could be that the external structure of the maze matters. Two or more mazes with identical internal paths would then be different and distinct, since there could be 3 options for entering or exiting a maze at a corner: the top-left corner open at the top, open on the left, or open on both top and left, and similarly for a corner exit. Representing these maze possibilities as a tree, two nodes may converge on entry rather than just diverging from start to finish, with one or more additional nodes for the exit possibilities. This would increase the value of C(m,n).
The insight here comes from the question (emphasis mine)
A .. maze is a rectangular grid with walls placed between grid cells such that there is exactly one path from the top-left square to any other square.
If you think of the dual of the maze, i.e. the spaces one can occupy, it is clear that a maze must form a graph. Not just any graph, either: for there to be a single path, the graph must not contain any cycles, which makes it a tree. This reduction to a combinatorics problem suggests an algorithm. In the spirit of Project Euler, the rest is left as an exercise to the reader.
SPOILER AHEAD
I was wrong, stating in one of the comments that "Now, there is a general theorem about spanning trees in a graph, but it does not seem to give a computationally feasible way to compute the number sought". The "general theorem", being the matrix-tree theorem attributed to Kirchhoff and referred to in one of the answers here, gives the result not only as the product of the nonzero eigenvalues of the graph Laplacian divided by the order of the graph, but also as the absolute value of any cofactor of the Laplacian, which in this case is the absolute value of the determinant of a 49999×49999 matrix. But although that matrix is very sparse, it still looked out of reach to me.
However, the reference
http://arxiv.org/pdf/0712.0681.pdf
("Determinants of block tridiagonal matrices", by Luca Guido Molinari),
allowed me to reduce the problem to evaluating the determinant of a dense 100×100 integer matrix with very large integer entries.
Further, the reference
http://www.ams.org/journals/mcom/1968-22-103/S0025-5718-1968-0226829-0/S0025-5718-1968-0226829-0.pdf
by Erwin H. Bareiss (usually one just speaks of the "Bareiss algorithm", but the recursion I used, referred to as formula (8) in the reference, seems to be due to Charles Dodgson, a.k.a. Lewis Carroll :) ), then permitted me to evaluate this last determinant and thus obtain the answer to the original problem.
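A sketch of that fraction-free (Bareiss) determinant recursion for integer matrices; every division in it is exact, so all intermediate values stay integral:

```python
def bareiss_det(M):
    """Fraction-free determinant of a square integer matrix via the
    Bareiss recursion (formula (8) in the reference)."""
    A = [row[:] for row in M]
    n = len(A)
    sign, prev = 1, 1
    for c in range(n - 1):
        if A[c][c] == 0:                      # pivot by row swap if needed
            piv = next((r for r in range(c + 1, n) if A[r][c] != 0), None)
            if piv is None:
                return 0                      # singular matrix
            A[c], A[piv] = A[piv], A[c]
            sign = -sign
        for r in range(c + 1, n):
            for k in range(c + 1, n):
                # the division by the previous pivot is always exact
                A[r][k] = (A[r][k] * A[c][c] - A[r][c] * A[c][k]) // prev
            A[r][c] = 0
        prev = A[c][c]
    return sign * A[-1][-1]
```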
I would say that finding an explicit formula is a correct way to solve a Project Euler problem. It will be fast, and it scales. Just go for it :)