Fill grid with squares which are all connected by free space - algorithm

I have a grid with x fields. This grid should be filled with as many squares (let's call them "farms") of size 2x2 (so each farm covers 4 fields) as possible. Each farm has to be connected to a certain field (the "root") through "roads".
I have written a kind of brute-force algorithm which tries every combination of farms and roads. Every time a farm is placed on the grid, the algorithm checks whether the farm has a connection to the root using the A* algorithm. It works very well on small grids, but on large grids it's too time-consuming.
Here is a small, already solved grid:
http://www.tmk-stgeorgen.at/algo/small.png
Blue squares are the farms, red squares are free space or "roads" and the filled red square is the root field, to which every farm needs a connection.
I need to solve this grid:
http://www.tmk-stgeorgen.at/algo/grid.png
Is there any fast standard algorithm, which I can use?

I think the following is better than a search, but it's based on a search, so I'll describe that first:
Search
You can make a basic search efficient in various ways.
First, you need to enumerate the possible arrangements efficiently. I think I would do this by storing the number of shifts relative to the first position where a farm can be placed, starting from the bottom (near the root). So (0) would be a single farm at the left of the bottom line; (1) would be that farm shifted one to the right; (0,0) would be two farms, the first as in (0), the second at the first possible position scanning upwards (second line, touching the first farm); (0,1) would have the second farm one to the right; etc.
Second, you need to prune as efficiently as possible. There it's a trade-off between doing smart but expensive things and dumb but fast things. Dumb but fast would be a flood fill from the root, checking whether all farms can be reached. Smarter would be working out how to do that incrementally when you add one farm; for example, you know that you can rely on the previous flood fill for cells numbered lower than the lowest cell the new farm covers. Even smarter would be identifying which roads are critical (the only access to another farm) and "protecting" them in some way.
Third, there may be extra tweaks you can do at a higher level. For example, it might be better to solve for a symmetric grid (and use symmetry to avoid repeating the same pattern in different ways) and then check which solutions are consistent with the grid you actually have. Another approach that might be useful, but that I can't see how to make work, is to focus on the roads rather than the farms.
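To make the "dumb but fast" pruning test concrete, here is a minimal Haskell sketch (not the answerer's code): it flood-fills from the root through cells that no farm covers, then checks that every farm touches a reached road cell. The cell representation, the convention that a farm is named by its corner (r, c) and covers (r, c), (r + 1, c), (r, c + 1) and (r + 1, c + 1), the assumption that roads connect through shared edges rather than corners, and the helper names farmCells, neighbours, reachableRoads and allFarmsConnected are all hypothetical.

```haskell
import qualified Data.Set as Set

type Cell = (Int, Int) -- (row, column)

-- Hypothetical convention: a farm is named by its corner (r, c) and covers
-- (r, c), (r + 1, c), (r, c + 1) and (r + 1, c + 1).
farmCells :: Cell -> [Cell]
farmCells (r, c) = [(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)]

-- Orthogonal neighbours (assumption: roads connect through shared edges).
neighbours :: Cell -> [Cell]
neighbours (r, c) = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

-- Flood fill from the root through grid cells that no farm covers.
reachableRoads :: Set.Set Cell -> Set.Set Cell -> Cell -> Set.Set Cell
reachableRoads gridCells covered root = go Set.empty [root]
  where
    isFree x = Set.member x gridCells && not (Set.member x covered)
    go seen [] = seen
    go seen (x : xs)
      | Set.member x seen || not (isFree x) = go seen xs
      | otherwise = go (Set.insert x seen) (neighbours x ++ xs)

-- Pruning test: every farm must touch at least one road cell the fill reached.
allFarmsConnected :: Set.Set Cell -> [Cell] -> Cell -> Bool
allFarmsConnected gridCells farms root = all touchesRoad farms
  where
    covered = Set.fromList (concatMap farmCells farms)
    roads = reachableRoads gridCells covered root
    touchesRoad f = any (`Set.member` roads) (concatMap neighbours (farmCells f))
```

This is the non-incremental version, rerun from scratch after each placement; the incremental and critical-road refinements would replace the full flood fill.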
Caching
Here's the secret sauce. The search I have described "fills up" farms into the space from the bottom, scanning left to right.
Now imagine that you have run the search to the point where the space is full, with a nearly optimal distribution. It may be that to improve that solution you have to backtrack almost to the start to rearrange a few farms "near the bottom", which is expensive because you then have to continue the search to refill the space above.
But you don't need to repeat the entire search if the "boundary" around the farms is the same as in an earlier arrangement, because you've already "filled in" above that boundary in some optimal way. So you can cache the best result for a given boundary and simply look those solutions up.
The boundary description must include the shape of the boundary and the positions of the roads that provide access to the root. That is all.

Here's something kind of crude in Haskell, which could probably benefit from optimization, memoization, and better heuristics...
The idea is to start with a grid that is all farm and place roads on it, starting with the root and expanding from there. The recursion uses a basic heuristic, where the candidates are chosen from all adjacent straight-two-block segments all along the road/s, and only if they satisfy the requirement that adding the segment will increase the number of farms connected to the road/s (overlapping segments are just added as one block rather than two).
import qualified Data.Map as M
import Data.List (nubBy)

-- row number -> list of the columns that lie on the grid in that row
-- (grid'/root' describe the small example, grid/root the large one)
grid' = M.fromList [(9,[6])
                   ,(8,[5..7])
                   ,(7,[4..8])
                   ,(6,[3..9])
                   ,(5,[2..10])
                   ,(4,[1..11])
                   ,(3,[2..10])
                   ,(2,[3..9])
                   ,(1,[4..7])]

grid = M.fromList [(19,[10])
                  ,(18,[9..11])
                  ,(17,[8..12])
                  ,(16,[7..13])
                  ,(15,[6..14])
                  ,(14,[5..15])
                  ,(13,[4..16])
                  ,(12,[3..17])
                  ,(11,[2..18])
                  ,(10,[1..19])
                  ,(9,[1..20])
                  ,(8,[1..19])
                  ,(7,[2..18])
                  ,(6,[3..17])
                  ,(5,[4..16])
                  ,(4,[5..15])
                  ,(3,[6..14])
                  ,(2,[7..13])
                  ,(1,[8..11])]

root' = (1,7) --(row,column)
root = (1,11) --(row,column)

isOnGrid (row,col) =
  case M.lookup row grid of
    Nothing -> False
    Just a -> elem col a

-- a farm is named by its top-left tile and also covers the tile to its
-- right and the two tiles in the row below (row numbers grow upwards)
isFarm (topLeftRow,topLeftCol) =
  and (map isOnGrid [(topLeftRow,topLeftCol),(topLeftRow,topLeftCol + 1)
                    ,(topLeftRow - 1,topLeftCol),(topLeftRow - 1,topLeftCol + 1)])

isNotOnFarm tile@(r,c) farm@(fr,fc) =
  not (elem r [fr,fr - 1]) || not (elem c [fc, fc + 1])

isOnFarm tile@(r,c) farm@(fr,fc) =
  elem r [fr,fr - 1] && elem c [fc, fc + 1]

farmOnFarm farm@(fr,fc) farm' =
  or (map (flip isOnFarm farm') [(fr,fc),(fr,fc + 1),(fr - 1,fc),(fr - 1,fc + 1)])

-- turn a tile into road: drop farms it covers, add newly adjacent farms
addRoad tile@(r,c) result@(road,(numFarms,farms))
  | not (isOnGrid tile) || elem tile road = result
  | otherwise = (tile:road,(length $ nubBy (\a b -> farmOnFarm a b) farms',farms'))
  where
    newFarms' = filter (isNotOnFarm tile) farms
    newFarms = foldr comb newFarms' adjacentFarms
    farms' = newFarms ++ adjacentFarms
    comb adjFarm newFarms'' =
      foldr (\a b -> if farmOnFarm a adjFarm || a == adjFarm then b else a:b) [] newFarms''
    adjacentFarms = filter (\x -> isFarm x && and (map (flip isNotOnFarm x) road))
                      [(r - 1,c - 1),(r - 1,c),(r,c - 2),(r + 1,c - 2)
                      ,(r + 2,c - 1),(r + 2,c),(r + 1,c + 1),(r,c + 1)]

-- two-tile road segments next to the current road that increase the farm count
candidates result@(road,(numFarms,farms)) =
  filter ((>numFarms) . fst . snd)
  $ map (\roads -> foldr (\a b -> addRoad a b) result roads)
  $ concatMap (\(r,c) -> [[(r + 1,c),(r + 1,c - 1)],[(r + 1,c),(r + 1,c + 1)]
                         ,[(r,c - 1),(r + 1,c - 1)],[(r,c - 1),(r - 1,c - 1)]
                         ,[(r,c + 1),(r + 1,c + 1)],[(r,c + 1),(r - 1,c + 1)]
                         ,[(r - 1,c),(r - 1,c - 1)],[(r - 1,c),(r - 1,c + 1)]
                         ,[(r + 1,c),(r + 2,c)],[(r,c - 1),(r,c - 2)]
                         ,[(r,c + 1),(r,c + 2)],[(r - 1,c),(r - 2, c)]]) road

solve = solve' (addRoad root ([],(0,[]))) where
  solve' result@(road,(numFarms,farms)) =
    if null candidates'
      then [result]
      else do candidate <- candidates'
              solve' candidate
    where candidates' = candidates result

-- first solution with at least n farms
b n = let (road,(numFarms,farms)) = head $ filter ((>=n) . fst . snd) solve
      in (road,(numFarms,nubBy (\a b -> farmOnFarm a b) farms))
Output, small grid:
format: (road/s,(numFarms,farms))
*Main> b 8
([(5,5),(5,4),(6,6),(4,6),(5,6),(4,8),(3,7),(4,7),(2,7),(2,6),(1,7)]
,(8,[(2,4),(3,8),(5,9),(8,6),(6,7),(5,2),(4,4),(7,4)]))
(0.62 secs, 45052432 bytes)
Diagram (O's are roads):
     X
    XXX
   XXXXX
  XXXOXXX
 XXOOOXXXX
XXXXXOOOXXX
 XXXXXOXXX
  XXXOOXX
   XXXO
Output, large grid:
format: (road/s,(numFarms,farms))
*Main> b 30
([(9,16),(9,17),(13,8),(13,7),(16,10),(7,6),(6,6),(9,3),(8,4),(9,4),(8,5)
,(8,7),(8,6),(9,7),(10,8),(10,7),(11,8),(12,9),(12,8),(14,9),(13,9),(14,10)
,(15,10),(14,11),(13,12),(14,12),(13,14),(13,13),(12,14),(11,15),(11,14)
,(10,15),(8,15),(9,15),(8,14),(8,13),(7,14),(7,15),(5,14),(6,14),(5,12)
,(5,13),(4,12),(3,11),(4,11),(2,11),(2,10),(1,11)]
,(30,[(2,8),(4,9),(6,10),(4,13),(6,15),(7,12),(9,11),(10,13),(13,15),(15,13)
,(12,12),(13,10),(11,9),(9,8),(10,5),(8,2),(10,1),(11,3),(5,5),(7,4),(7,7)
,(17,8),(18,10),(16,11),(12,6),(14,5),(15,7),(10,18),(8,16),(11,16)]))
(60.32 secs, 5475243384 bytes)
*Main> b 31
still waiting....

I don't know if this solution will maximize your number of farms, but you can try to place them in a regular way: align them horizontally or vertically. You can stick 2 columns (or rows) together for the best density of farms; just take care to leave 1 free space on top/bottom (or left/right).
When you can't fit another column (row), just check whether you can put some farms near the border of your grid.
I hope this helps!

Related

What is an efficient "modified" goldmine problem algorithm?

We are given an M x N map (basically a 2D array) of values (which can be negative), and you have to find the path that makes the most money.
The trick is that the "drill" you're using is three units wide.
So to drill a hole in a certain place (blue), you have to make sure the three cells above it (red) are drilled as well;
and this cascades upward: if you want to dig a deeper hole, you have to dig the three above it, the three above each of those, and so on.
So far I have an inefficient, semi-brute-force algorithm that's O(n^2), so as soon as the sample size goes up (a 900 x 1200 sample, for example), the algorithm can't finish in time (we have a 3-minute limit).
I suspect dynamic programming could be a way, but I'm not at all sure how to implement it.
Let me know if anything else comes to mind.
We've worked with Python by the way.
Thank you guys in advance.
You can calculate all the values in O(n * m) instead of O((n * m)^2).
Let's say we have input matrix A and want to calculate the result for each position resulting in matrix B.
The first row of B is just the same as A.
For the second row of B, add the cell's own value to the 3 values above it (or 2 on the edges).
For all the following rows (r = row, c = column):
c not on the edge: B[r][c] = A[r][c] + B[r - 1][c - 1] + B[r - 1][c + 1] + A[r - 1][c] - B[r - 2][c].
c on the left edge: B[r][0] = A[r][0] + B[r - 1][1] + A[r - 1][0]
c on the right edge: B[r][len - 1] = A[r][len - 1] + B[r - 1][len - 2] + A[r - 1][len - 1]
If you color the matrix and the values you sum, you can easily see what we are doing. Basically, we take the cell's own value and add the results of its left and right neighbours above. Those two miss the value directly above, so we add it, and they both contain the result two rows above, which is therefore counted twice, so we subtract it.
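A minimal Haskell sketch of this recurrence, assuming row 0 is the top row and that B[r][c] includes the cell's own value (so the first row of B equals A). It computes B[r][c] = A[r][c] + B[r-1][c-1] + B[r-1][c+1] + A[r-1][c] - B[r-2][c], dropping the missing diagonal term and the subtraction on the edges, and treats a one-column grid separately. The name coneSums is hypothetical; the lazily self-referencing Data.Array evaluates each cell at most once, so the work is O(n * m).

```haskell
import Data.Array

-- B ! (r, c): sum of A over (r, c) and the upward-widening cone above it.
-- Row 0 is the top row. Assumes a non-empty rectangular input.
coneSums :: [[Int]] -> Array (Int, Int) Int
coneSums rows = b
  where
    nR = length rows
    nC = length (head rows)
    a = listArray ((0, 0), (nR - 1, nC - 1)) (concat rows)
    -- b refers to itself: boxed array elements are lazy thunks,
    -- so each cell is computed once, on demand
    b = listArray (bounds a) [ cell r c | r <- [0 .. nR - 1], c <- [0 .. nC - 1] ]
    -- cells left/right of the grid or above row 0 contribute nothing
    bAt r c
      | r < 0 || c < 0 || c >= nC = 0
      | otherwise = b ! (r, c)
    cell r c
      | r == 0 = a ! (0, c)
      | nC == 1 = a ! (r, 0) + b ! (r - 1, 0) -- one column: the cone is the column
      | otherwise = a ! (r, c) + bAt (r - 1) (c - 1) + bAt (r - 1) (c + 1)
                    + a ! (r - 1, c) - overlap
      where
        -- the two diagonal cones overlap in the cone of (r - 2, c);
        -- on an edge one of them is missing, so there is nothing to subtract
        overlap
          | c > 0 && c < nC - 1 = bAt (r - 2) c
          | otherwise = 0
```

For a 900 x 1200 input an unboxed array filled row by row would likely be faster than this boxed, lazy one; the sketch favours brevity.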

Haskell - Grouping specific nearest neighbours in a cartesian grid

I'm after some direction in this puzzle where I need to group specific nearest neighbours together.
My input data is:
import Data.Map (Map, fromList)

myList :: Map (Int, Int) Int
myList =
  fromList
    [((-2,-2),0),((-2,-1),0),((-2,0),2),((-2,1),0),((-2,2),0)
    ,((-1,-2),1),((-1,-1),3),((-1,0),0),((-1,1),0),((-1,2),1)
    ,((0,-2),0),((0,-1),0),((0,0),0),((0,1),0),((0,2),0)
    ,((1,-2),0),((1,-1),0),((1,0),0),((1,1),2),((1,2),1)
    ,((2,-2),0),((2,-1),2),((2,0),0),((2,1),0),((2,2),0)]
Which is a data representation of this 5 x 5 grid (brown land, blue water):
I'm using (Int,Int) as XY coordinates because of the way the list had to be generated (and thus its ordering): in a spiral on a Cartesian coordinate grid, with (0,0) as the origin. The remaining Int is the population size, 0 being water and 1..9 being land.
Because of the ordering of my Map, I've been struggling to find a way to traverse my data and return the 4 groups of land items that are connected to each other (including diagonally), so I'm looking for a result like the one below:
[ [(-1, 2)]
, [(1, 2),(1,1)]
, [(-2, 0),(-1,-1),(-1,-2)]
, [(2, -1)]]
I've researched and tried various algorithms like BFS and flood fill, but my input data never fit their structural requirements, or my understanding of the subjects doesn't allow me to convert them to using coordinates.
Is there a way I can run an algorithm directly on the data, or should I be looking in another direction?
I'm sorry there are no code examples of what I have so far, but I haven't been able to create anything remotely useful.
I recommend using a union-find data structure. Loop over all positions; if it is land, mark it equivalent to any positions immediately NE, N, NW, or W of it that are also land. (It will automatically get marked equivalent to any land that exists E, SW, S, or SE of it when you visit that other land. The critical property of the set D={NE, N, NW, W} is that if you mirror all the directions in D to get M, then M∪D contains every direction; any other set D with this property will do fine, too.) The equivalence classes returned by the data structure at the end of this process will be your connected land chunks.
If n is the total number of positions, this process is O(n*log n); the log n component comes from the Map lookups needed to determine if a neighbor is land or water.
You should consider making the Map sparse if you can -- storing only the key-value pairs corresponding to land and skipping the water keys -- to graduate to O(m*log m), where m is the total number of land positions rather than the total number of positions. If you cannot (because you must remember the difference between water and non-existent positions, say), you could consider switching to an array as your backing store to graduate to O(n*α(n)), where α is the inverse Ackermann function, and so the whole shebang would basically be as close to O(n) as it is possible to get without actually being O(n).
Whether O(m*log m) or O(n*a n) is preferable when both are an option is a matter for empirical exploration on some data sets that you believe represent your typical use case.
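A small, pure Haskell sketch of the scan described above, using a Data.Map of parent pointers as the union-find structure. It is an illustration under assumptions, not the answerer's code: findRep, unite and islands are hypothetical names, a population above 0 marks land, +y is taken as north, and there is no path compression or union by rank, so it does not reach the near-linear bound mentioned above.

```haskell
import qualified Data.Map as M
import Data.List (foldl', sort)

type Coord = (Int, Int) -- (x, y)

-- Follow parent links up to the set's representative
-- (no path compression or union by rank, to keep the sketch short).
findRep :: M.Map Coord Coord -> Coord -> Coord
findRep parent p = case M.lookup p parent of
  Just q | q /= p -> findRep parent q
  _ -> p

-- Merge the sets containing the two coordinates.
unite :: M.Map Coord Coord -> (Coord, Coord) -> M.Map Coord Coord
unite parent (p, q)
  | rp == rq = parent
  | otherwise = M.insert rp rq parent
  where
    rp = findRep parent p
    rq = findRep parent q

-- Group land cells (population > 0) into 8-connected islands, sorted for stable output.
islands :: M.Map Coord Int -> [[Coord]]
islands grid = sort (map sort (M.elems groups))
  where
    isLand p = maybe False (> 0) (M.lookup p grid)
    land = filter isLand (M.keys grid)
    -- only W, NW, N and NE; their mirror images cover the other four directions
    edges = [ (p, q) | p@(x, y) <- land
                     , q <- [(x - 1, y), (x - 1, y + 1), (x, y + 1), (x + 1, y + 1)]
                     , isLand q ]
    parent = foldl' unite (M.fromList [ (p, p) | p <- land ]) edges
    groups = M.fromListWith (++) [ (findRep parent p, [p]) | p <- land ]
```

Applied to the question's 5 x 5 map, this should yield the same four groups the question lists, each sorted.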
I ended up going with this solution by Chris Penner via the FP Slack channel; it uses the union-find algorithm (I've added comments to the code to help a little):
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Arrow ((&&&))
import Control.Monad (void, (>=>))
import Data.Foldable (for_)
import Data.Maybe (catMaybes)
import Data.Traversable (for)
import Data.Tuple (swap)
import qualified Data.Map as M
import qualified Data.UnionFind.IO as U -- from the union-find package

type Coordinate = (Int, Int)

-- | Take a Map of land coordinates and return a list of grouped land items
-- forming islands, using the union-find algorithm
findIslands :: M.Map Coordinate Coordinate -> IO [[Coordinate]]
findIslands land = do
  -- create a fresh point for every land coordinate
  pointMap <- traverse U.fresh land
  -- traverse each point, uniting it with its land neighbours at these four
  -- offsets (together with their mirror images they cover all eight directions)
  void . flip M.traverseWithKey pointMap $ \(x, y) point ->
    for_ (catMaybes (flip M.lookup pointMap <$> [(x + 1, y), (x, y + 1), (x + 1, y + 1), (x - 1, y + 1)]))
      $ \neighbourPoint ->
        U.union point neighbourPoint
  -- traverse pointMap, replacing each point by its representative's descriptor
  withUnionKey :: (M.Map Coordinate Coordinate) <- for pointMap (U.repr >=> U.descriptor)
  -- swap coordinates around: (representative, coordinate)
  let unionKeyToCoord :: [(Coordinate, Coordinate)] = (swap <$> M.toList withUnionKey)
      -- combine coordinates sharing a representative to create islands
      results :: M.Map Coordinate [Coordinate] = M.fromListWith (<>) (fmap (:[]) <$> unionKeyToCoord)
  -- return just the elements from the Map
  return (M.elems results)

convertTolandGrid :: [Coordinate] -> M.Map Coordinate Coordinate
convertTolandGrid = M.fromList . fmap (id &&& id)

Lazily Tying the Knot for 1 Dimensional Dynamic Programming

Several years ago I took an algorithms course where we were given the following problem (or one like it):
There is a building of n floors with an elevator that can only go up 2 floors at a time and down 3 floors at a time. Using dynamic programming write a function that will compute the number of steps it takes the elevator to get from floor i to floor j.
This is obviously easy using a stateful approach: you create an array n elements long and fill it up with the values. You could even use a technically non-stateful approach that involves accumulating a result by recursively passing it around. My question is how to do this in a non-stateful manner by using lazy evaluation and tying the knot.
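I think I've devised the correct mathematical formula:
$$f(i, j) = \begin{cases} 0 & \text{if } i = j \\ 1 + \min\big(f(i+2, j),\ f(i-3, j)\big) & \text{otherwise} \end{cases}$$
where i+2 and i-3 are within the allowed values.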
Unfortunately I can't get it to terminate. If I put the i+2 case first and then choose an even floor I can get it to evaluate the even floors below the target level but that's it. I suspect that it shoots straight to the highest even floor for everything else, drops 3 levels, then repeats, forever oscillating between the top few floors.
So it's probably exploring the infinite space (or finite but with loops) in a depth-first manner. I can't think of how to explore the space in a breadth-first fashion without using a whole lot of data structures in between that effectively mimic a stateful approach.
Although this simple problem is disappointingly difficult I suspect that having seen a solution in 1 dimension I might be able to make it work for a 2 dimensional variation of the problem.
EDIT: A lot of the answers tried to solve the problem in a different way. The problem itself isn't interesting to me; the question is about the method used. chaosmasttter's approach of creating a minimal function which can compare potentially infinite numbers is possibly a step in the right direction. Unfortunately, if I try to create a list representing a building with 100 floors, the result takes too long to compute, since the solutions to subproblems are not reused.
I made an attempt to use a self-referencing data structure but it doesn't terminate, there is some kind of infinite loop going on. I'll post my code so you can understand what it is I'm going for. I'll change the accepted answer if someone can actually solve the problem using dynamic programming on a self-referential data structure using laziness to avoid computing things more than once.
levels = go [0..10]
  where
    go [] = []
    go (x:xs) = minimum
        [ if i == 7
            then 0
            else 1 + levels !! i
        | i <- filter (\n -> n >= 0 && n <= 10) [x+2,x-3] ]
      : go xs
You can see how 1 + levels !! i tries to reference the previously calculated result and how filter (\n -> n >= 0 && n <= 10) [x+2,x-3] tries to limit the values of i to valid ones. As I said, this doesn't actually work, it simply demonstrates the method by which I want to see this problem solved. Other ways of solving it are not interesting to me.
Since you're trying to solve this in two dimensions, and for other problems than the one described, let's explore some more general solutions. We are trying to solve the shortest path problem on directed graphs.
Our representation of a graph is currently something like a -> [a], where the function returns the vertices reachable from the input. Any implementation will additionally require that we can compare to see if two vertices are the same, so we'll need Eq a.
The following graph is problematic, and introduces almost all of the difficulty in solving the problem in general:
problematic 1 = [2]
problematic 2 = [3]
problematic 3 = [2]
problematic 4 = []
When trying to reach 4 from 1, there is a cycle involving 2 and 3 that must be detected to determine that there is no path from 1 to 4.
Breadth-first search
The algorithm Will presented has, if applied to the general problem for finite graphs, worst case performance that is unbounded in both time and space. We can modify his solution to attack the general problem for graphs containing only finite paths and finite cycles by adding cycle detection. Both his original solution and this modification will find finite paths even in infinite graphs, but neither is able to reliably determine that there is no path between two vertices in an infinite graph.
import Data.Maybe (listToMaybe)

acyclicPaths :: (Eq a) => (a->[a]) -> a -> a -> [[a]]
acyclicPaths steps i j = map (tail . reverse) . filter ((== j).head) $ queue
  where
    queue = [[i]] ++ gen 1 queue
    gen d _ | d <= 0 = []
    gen d (visited:t) = let r = filter ((flip notElem) visited) . steps . head $ visited
                        in map (:visited) r ++ gen (d+length r-1) t

shortestPath :: (Eq a) => (a->[a]) -> a -> a -> Maybe [a]
shortestPath succs i j = listToMaybe (acyclicPaths succs i j)
Reusing the step function from Will's answer as the definition of your example problem, we could get the length of the shortest path from floor 4 to 5 of an 11 story building by fmap length $ shortestPath (step 11) 4 5. This returns Just 3.
Let's consider a finite graph with v vertices and e edges. A graph with v vertices and e edges can be described by an input of size n ~ O(v+e). The worst-case graph for this algorithm is to have one unreachable vertex, j, and the remaining vertices and edges devoted to creating the largest number of acyclic paths starting at i. This is probably something like a clique containing all the vertices that aren't i or j, with edges from i to every other vertex that isn't j. The number of vertices in a clique with e edges is O(e^(1/2)), so this graph has e ~ O(n), v ~ O(n^(1/2)). This graph would have O((n^(1/2))!) paths to explore before determining that j is unreachable.
The memory required by this function for this case is O((n^(1/2))!), since it only requires a constant increase in the queue for each path.
The time required by this function for this case is O((n^(1/2))! * n^(1/2)). Each time it expands a path, it must check that the new node isn't already in the path, which takes O(v) ~ O(n^(1/2)) time. This could be improved to O(log (n^(1/2))) if we had Ord a and used a Set a or similar structure to store the visited vertices.
For non-finite graphs, this function should fail to terminate exactly when there doesn't exist a finite path from i to j but there does exist a non-finite path from i to j.
Dynamic Programming
A dynamic programming solution doesn't generalize in the same way; let's explore why.
To start with, we'll adapt chaosmasttter's solution to have the same interface as our breadth-first search solution:
instance Show Natural where
  show = show . toNum

infinity = Next infinity

shortestPath' :: (Eq a) => (a->[a]) -> a -> a -> Natural
shortestPath' steps i j = go i
  where
    go i | i == j = Zero
         | otherwise = Next . foldr minimal infinity . map go . steps $ i
This works nicely for the elevator problem, shortestPath' (step 11) 4 5 is 3. Unfortunately, for our problematic problem, shortestPath' problematic 1 4 overflows the stack. If we add a bit more code for Natural numbers:
fromInt :: Int -> Natural
fromInt x = (iterate Next Zero) !! x

instance Eq Natural where
  Zero == Zero = True
  (Next a) == (Next b) = a == b
  _ == _ = False

instance Ord Natural where
  compare Zero Zero = EQ
  compare Zero _ = LT
  compare _ Zero = GT
  compare (Next a) (Next b) = compare a b
we can ask whether the shortest path is shorter than some upper bound. In my opinion, this really shows off what's happening with lazy evaluation: shortestPath' problematic 1 4 < fromInt 100 is False and shortestPath' problematic 1 4 > fromInt 100 is True.
Next, to explore dynamic programming, we'll need to introduce some dynamic programming. Since we will build a table of the solutions to all of the sub-problems, we will need to know the possible values that the vertices can take. This gives us a slightly different interface:
import Data.Array

shortestPath'' :: (Ix a) => (a->[a]) -> (a, a) -> a -> a -> Natural
shortestPath'' steps bounds i j = go i
  where
    go i = lookupTable ! i
    lookupTable = buildTable bounds go2
    go2 i | i == j = Zero
          | otherwise = Next . foldr minimal infinity . map go . steps $ i

-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array bounds . map (\x -> (x, f x)) $ range bounds
We can use this like shortestPath'' (step 11) (1,11) 4 5 or shortestPath'' problematic (1,4) 1 4 < fromInt 100. This still can't detect cycles...
Dynamic programming and cycle detection
The cycle detection is problematic for dynamic programming, because the sub-problems aren't the same when they are approached from different paths. Consider a variant of our problematic problem.
problematic' 1 = [2, 3]
problematic' 2 = [3]
problematic' 3 = [2]
problematic' 4 = []
If we are trying to get from 1 to 4, we have two options:
go to 2 and take the shortest path from 2 to 4
go to 3 and take the shortest path from 3 to 4
If we choose to explore 2, we will be faced with the following option:
go to 3 and take the shortest path from 3 to 4
We want to combine the two explorations of the shortest path from 3 to 4 into the same entry in the table. If we want to avoid cycles, this is really something slightly more subtle. The problems we faced were really:
go to 2 and take the shortest path from 2 to 4 that doesn't visit 1
go to 3 and take the shortest path from 3 to 4 that doesn't visit 1
After choosing 2
go to 3 and take the shortest path from 3 to 4 that doesn't visit 1 or 2
These two questions about how to get from 3 to 4 have two slightly different answers. They are two different sub-problems which can't fit in the same spot in a table. Answering the first question eventually requires determining that you can't get to 4 from 2. Answering the second question is straightforward.
We could make a bunch of tables, one for each possible set of previously visited vertices, but that doesn't sound very efficient. I've almost convinced myself that we can't do reachability as a dynamic programming problem using only laziness.
Breadth-first search redux
While working on a dynamic programming solution with reachability or cycle detection, I realized that once we have seen a node in the options, no later path visiting that node can ever be optimal, whether or not we follow that node. If we reconsider problematic':
If we are trying to get from 1 to 4, we have two options:
go to 2 and take the shortest path from 2 to 4 without visiting 1, 2, or 3
go to 3 and take the shortest path from 3 to 4 without visiting 1, 2, or 3
This gives us an algorithm to find the length of the shortest path quite easily:
import Data.List (findIndex)
import qualified Data.Set as Set

-- Vertices first reachable in each generation
generations :: (Ord a) => (a->[a]) -> a -> [Set.Set a]
generations steps i = takeWhile (not . Set.null) $ Set.singleton i : go (Set.singleton i) (Set.singleton i)
  where
    go seen previouslyNovel =
      let reachable = Set.fromList (Set.toList previouslyNovel >>= steps)
          novel = reachable `Set.difference` seen
          nowSeen = reachable `Set.union` seen
      in novel : go nowSeen novel

lengthShortestPath :: (Ord a) => (a->[a]) -> a -> a -> Maybe Int
lengthShortestPath steps i j = findIndex (Set.member j) $ generations steps i
As expected, lengthShortestPath (step 11) 4 5 is Just 3 and lengthShortestPath problematic 1 4 is Nothing.
In the worst case, generations requires space that is O(v*log v), and time that is O(v*e*log v).
The problem is that min needs to fully evaluate both calls to f, so if one of them loops infinitely, min will never return.
So you have to create a new type, encoding that the number returned by f is either Zero or the successor (Next) of another such number.
data Natural = Next Natural
             | Zero

toNum :: Num n => Natural -> n
toNum Zero = 0
toNum (Next n) = 1 + (toNum n)

minimal :: Natural -> Natural -> Natural
minimal Zero _ = Zero
minimal _ Zero = Zero
minimal (Next a) (Next b) = Next $ minimal a b

f i j | i == j = Zero
      | otherwise = Next $ minimal (f l j) (f r j)
  where l = i + 2
        r = i - 3
This code actually works.
Standing on floor i of an n-story building, find the minimal number of steps it takes to get to floor j, where
step n i = [i-3 | i-3 > 0] ++ [i+2 | i+2 <= n]
Thus we have a tree. We need to search it in breadth-first fashion until we get to a node holding the value j; its depth is the number of steps. We build a queue carrying the depth levels:
{-# LANGUAGE TupleSections #-}

solution n i j = case dropWhile ((/= j).snd) queue of
    [] -> Nothing
    ((k,_):_) -> Just k
  where
    queue = [(0,i)] ++ gen 1 queue
The function gen d p takes its input p from d notches back from its production point along the output queue:
    gen d _ | d <= 0 = []
    gen d ((k,i1):t) = let r = step n i1
                       in map (k+1 ,) r ++ gen (d+length r-1) t
Uses TupleSections. There's no knot tying here, just corecursion, i.e. (optimistic) forward production and frugal exploration. Works fine without knot tying because we only look for the first solution. If we were searching for several of them, then we'd need to eliminate the cycles somehow.
see also: https://en.wikipedia.org/wiki/Corecursion#Discussion
With the cycle detection:
solutionCD1 n i j = case dropWhile ((/= j).snd) queue of
    [] -> Nothing
    ((k,_):_) -> Just k
  where
    step n i visited = [i2 | let i2=i-3, not $ elem i2 visited, i2 > 0]
                    ++ [i2 | let i2=i+2, not $ elem i2 visited, i2 <=n]
    queue = [(0,i)] ++ gen 1 queue [i]
    gen d _ _ | d <= 0 = []
    gen d ((k,i1):t) visited = let r = step n i1 visited
                               in map (k+1 ,) r ++
                                  gen (d+length r-1) t (r++visited)
E.g. solutionCD1 100 100 7 runs instantly, producing Just 31. The visited list is pretty much a copy of the instantiated prefix of the queue itself. It could be maintained as a Map to improve time complexity (as it is, solutionCD1 10000 10000 7 => Just 3331 takes 1.27 secs on Ideone).
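Following the suggestion to keep the visited floors in a search tree rather than a list, here is a hedged sketch of that change. solutionCDSet is a hypothetical name, and it uses Data.Set (only membership is needed, not a Map's values); the queue and gen are otherwise the same corecursive construction, so each membership test drops from O(v) to O(log v).

```haskell
{-# LANGUAGE TupleSections #-}
import qualified Data.Set as Set

-- solutionCD1 with the visited floors kept in a Set instead of a list
solutionCDSet :: Int -> Int -> Int -> Maybe Int
solutionCDSet n i j = case dropWhile ((/= j) . snd) queue of
    [] -> Nothing
    ((k, _) : _) -> Just k
  where
    -- unvisited floors reachable in one step (down 3 or up 2, staying in 1..n)
    step i1 visited = [ i2 | let i2 = i1 - 3, i2 > 0, not (Set.member i2 visited) ]
                   ++ [ i2 | let i2 = i1 + 2, i2 <= n, not (Set.member i2 visited) ]
    -- the queue is defined in terms of itself; gen reads d entries behind its output
    queue = (0, i) : gen 1 queue (Set.singleton i)
    gen d _ _ | d <= 0 = []
    gen d ((k, i1) : t) visited =
      let r = step i1 visited
      in map (k + 1,) r ++ gen (d + length r - 1) t (foldr Set.insert visited r)
```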
Some explanations seem to be in order.
First, there's nothing 2D about your problem, because the target floor j is fixed.
What you seem to want is memoization, as your latest edit indicates. Memoization is useful for recursive solutions, and your function is indeed recursive: it analyzes its argument into sub-cases and synthesizes its result from the results of calling itself on sub-cases (here, i+2 and i-3) which are closer to the base case (here, i==j).
Because arithmetic is strict, your formula diverges in the presence of any infinite path in the tree of steps (going from floor to floor). The answer by chaosmasttter, by using lazy arithmetic instead, turns it automagically into a breadth-first search algorithm which diverges only if there are no finite paths in the tree, exactly like my first solution above (save for the fact that it's not checking for out-of-bounds indices). But it is still recursive, so memoization is indeed called for.
The usual way to approach it first, is to introduce sharing by "going through a list" (inefficient, because of sequential access; for efficient memoization solutions see hackage):
f n i j = g i
  where
    gs = map g [0..n]   -- floors 1,...,n (0 is unused)
    g i | i == j = Zero
        | r > n = Next (gs !! l)   -- assuming there's enough floors in the building
        | l < 1 = Next (gs !! r)
        | otherwise = Next $ minimal (gs !! l) (gs !! r)
      where r = i + 2
            l = i - 3
not tested.
My solution is corecursive. It needs no memoization (just needs to be careful with the duplicates), because it is generative, like the dynamic programming is too. It proceeds away from its starting case, i.e. the starting floor. An external accessor chooses the appropriate generated result.
It does tie a knot - it defines queue by using it - queue is on both sides of the equation. I consider it the simpler case of knot tying, because it is just about accessing the previously generated values, in disguise.
The knot tying of the 2nd kind, the more complicated one, is usually about putting some yet-undefined value in some data structure and returning it to be defined by some later portion of the code (e.g. a back-link pointer in a doubly-linked circular list); this is indeed not what my¹ code is doing. What it does do is generate a queue, adding at its end and "removing" from its front; in the end it's just the difference-list technique of Prolog, the open-ended list with its end pointer maintained and updated, the top-down list building of tail recursion modulo cons: all the same things conceptually. First described (though not named) in 1974, AFAIK.
¹ Based entirely on the code from Wikipedia.
Others have answered your direct question about dynamic programming. However, for this kind of problem I think the greedy approach works best. Its implementation is very straightforward:
f :: Int -> Int -> Int
f i j = snd $ until (\(i,_) -> i == j)
                    (\(i,x) -> (i + if i < j then 2 else (-3),x+1))
                    (i,0)

Algorithm for calculating the sum-of-squares distance of a rolling window from a given line function

Given a line function y = a*x + b (a and b are previously known constants), it is easy to calculate the sum-of-squares distance between the line and a window of samples (1, Y1), (2, Y2), ..., (n, Yn) (where Y1 is the oldest sample and Yn is the newest):
sum((Yx - (a*x + b))^2 for x in 1,...,n)
I need a fast algorithm for calculating this value for a rolling window (of length n) - I cannot rescan all the samples in the window every time a new sample arrives.
Obviously, some state should be saved and updated for every new sample that enters the window and every old sample that leaves the window.
Notice that when a sample leaves the window, the indices of the rest of the samples change as well: every Yx becomes Y(x-1). Therefore, when a sample leaves the window, every other sample in the window contributes a different value to the new sum: (Yx - (a*(x-1) + b))^2 instead of (Yx - (a*x + b))^2.
Is there a known algorithm for calculating this? If not, can you think of one? (It is ok to have some mistakes due to first-order linear approximations).
Won't a straightforward approach do the trick?...
By 'straightforward' I mean maintaining a queue of samples. Once a new sample arrives, you would:
1. pop the oldest sample from the queue
2. subtract its distance from your sum
3. append the new sample to the queue
4. calculate its distance and add it to your sum
As for time, everything here is O(1) if the queue is implemented as a linked list or something similar. You would want to store the distance with your samples in the queue, too, so you calculate it only once. The memory usage is thus 3 floats per sample, i.e. O(n).
If you expand the term (Yx - (a*x + b))^2 the terms break into three parts:
1. Terms of only a, x and b. These sum to a constant over the window that does not change as the window slides, so it can be computed once and otherwise ignored.
2. Terms of only Yx and b (Yx^2 and -2*b*Yx). These can be handled in the style of a boxcar integrator, as @Xion described, by keeping running sums of Yx and Yx^2.
3. One term of -2*Yx*a*x. The -2*a is a constant, so factor it out. Consider the partial sum S = Y1*1 + Y2*2 + Y3*3 + ... + Yn*n. Given Y1 and a running sum R = Y1 + Y2 + ... + Yn, you can compute S - R, which eliminates Y1*1 and reduces each of the other coefficients by one, leaving you with Y2*1 + Y3*2 + ... + Yn*(n-1). Now update the running sum R as for (2) by subtracting off Y1 and adding Y(n+1). Finally, add the new sample's term Y(n+1)*n to S, since it now sits at position n.
Now just add up all those partial terms.
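Putting the three parts together, here is a minimal Python sketch under the conventions above (the oldest sample in the window is x = 1, the newest is x = n). It keeps R (sum of Yx), Q (sum of Yx^2) and S (sum of x*Yx) plus the constant part C, so each new sample costs O(1). While the window is still filling, the new sample simply takes the next index. The class and method names are hypothetical.

```python
from collections import deque


class RollingLineSSD:
    """Sum of squared distances between the line y = a*x + b and the last n samples.

    The oldest sample in the window has x = 1, the newest has x = len(window).
    """

    def __init__(self, a: float, b: float, n: int):
        self.a, self.b, self.n = a, b, n
        self.window = deque()  # the samples themselves, oldest first
        self.R = 0.0  # sum of Y_x
        self.Q = 0.0  # sum of Y_x ** 2
        self.S = 0.0  # sum of x * Y_x
        self.C = 0.0  # sum of (a*x + b) ** 2 over the current window length

    def push(self, y: float) -> None:
        if len(self.window) < self.n:
            # Window still filling: the new sample gets index m = len + 1.
            m = len(self.window) + 1
            self.C += (self.a * m + self.b) ** 2
        else:
            old = self.window.popleft()
            # S - R shifts every remaining index down by one and drops old * 1.
            self.S -= self.R
            self.R -= old
            self.Q -= old * old
            m = self.n  # the new sample takes the last position
        self.window.append(y)
        self.R += y
        self.Q += y * y
        self.S += m * y

    def value(self) -> float:
        # Sum of Y^2 - 2*a*x*Y - 2*b*Y + (a*x + b)^2 over the window.
        return self.Q - 2 * self.a * self.S - 2 * self.b * self.R + self.C
```

For example, with a = 1, b = 0 and n = 3, pushing 1, 2, 3, 10 leaves the window [2, 3, 10] at x = 1, 2, 3, and value() returns 51.0 (1 + 1 + 49). Over a long stream, rounding error in the running sums can accumulate, so recomputing them from the stored window every so often may be worthwhile.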

Smoothing of Sequences

I think there should be an algorithm for this out there - probably in a field like bioinformatics (the problem reminds me a bit of sequence alignment) so I hope someone can help me out here.
The problem is as follows: Assume I have classified some data into two different classes X and Y. The result of this may look something like this: ..XXX Y XXX.. Further assume that we have some domain knowledge about those classes and know that it's extremely unlikely to have less than a certain number of instances in a row (i.e. it's unlikely that there are less than 4 Xs or Ys in a sequence; preferably I could use a different threshold per class, but that's not a must). So if we use this domain knowledge, it's "obvious" that we'd like to replace the single Y in the middle with an X.
So the algorithm should take a sequence of classified instances and the thresholds for the classes (or one threshold for all if it simplifies the problem) and try to find a sequence that fulfills the property (no runs of a class shorter than the given threshold). Obviously there can be an extremely large number of correct solutions (e.g. in the above example we could also replace all Xs with Ys), so I think a reasonable optimization criterion would be to minimize the number of replacements.
I don't need an especially efficient algorithm here since the number of instances will be rather small (say < 4k) and we'll only have two classes. Also since this is obviously only a heuristic I'm fine with some inaccuracies if they vastly simplify the algorithm.
A very similar problem to this can be solved as a classic dynamic programming shortest path problem. We wish to find the sequence which minimises some notion of cost. Penalise each character in the sequence that is different from the corresponding character in the original sequence. Penalise each change of character in the sequence, so penalise each change from X to Y and vice versa.
This is not quite what you want because the penalty for YYYXYYY is the same as the penalty for YXXXXXXY - one penalty for YX and one for XY - however it may be a good approximation because e.g. if the base sequence says YYY....YXY....YY then it will be cheaper to change the central X to a Y than to pay the cost of XY and YX - and you can obviously fiddle with the different cost penalties to get something that looks plausible.
Now think of each position in the sequence as being two points, one above the other, one point representing "X goes here" and one representing "Y goes here". You can link points with lines of cost depending on whether the corresponding character is X or Y in the original sequence, and whether the line joins an X with an X or an X with a Y or so on. Then work out the shortest path from left to right using a dynamic program that works out the best paths terminating in X and Y at position i+1, given knowledge of the cost of the best paths terminating in X and Y at position i.
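A minimal Python sketch of the two-row shortest path described above, assuming a cost of `mismatch_cost` for each changed character and `switch_cost` for each X/Y transition (the function name and the default costs 1 and 2 are illustrative choices, not part of the answer). Each position keeps the best cost of ending in X and in Y plus a back-pointer, which is the min-sum (Viterbi-style) recurrence over a two-state trellis.

```python
def smooth_two_state(seq: str, mismatch_cost: float = 1.0, switch_cost: float = 2.0) -> str:
    """Cheapest X/Y sequence: pay mismatch_cost per changed character and
    switch_cost per X<->Y transition (two-state shortest path)."""
    states = "XY"
    n = len(seq)
    if n == 0:
        return ""
    # best[i][s] = cheapest cost of a sequence for seq[0..i] that ends in state s
    best = [{s: (0.0 if seq[0] == s else mismatch_cost) for s in states}]
    back = [{}]  # back[i][s] = state at position i-1 on that cheapest path
    for i in range(1, n):
        row, prev_row = {}, {}
        for s in states:
            local = 0.0 if seq[i] == s else mismatch_cost
            # Stay in the same state for free, or switch from the other one.
            p = min(states, key=lambda t: best[i - 1][t] + (switch_cost if t != s else 0.0))
            row[s] = best[i - 1][p] + (switch_cost if p != s else 0.0) + local
            prev_row[s] = p
        best.append(row)
        back.append(prev_row)
    # Trace the cheapest path back from the last position.
    s = min(states, key=lambda t: best[n - 1][t])
    out = [s]
    for i in range(n - 1, 0, -1):
        s = back[i][s]
        out.append(s)
    return "".join(reversed(out))
```

With these defaults, a lone Y between runs of X costs 1 to flip but 4 in switch penalties to keep, so it gets flipped; raising `switch_cost` relative to `mismatch_cost` makes the output smoother.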
If you really want to penalise short-lived changes more harshly than long-lived changes, you can probably do so by increasing the number of points in the path-finding representation: you would have points that correspond to "X here and the most recent Y was 3 characters ago". But depending on what you want for a penalty, you might end up with an inconveniently large number of points at each character.
You can use dynamic programming as in the following pseudocode sketch (for simplicity, this code assumes the threshold is 3 Xs or Ys in a row, rather than 4):
def min_switch(s):
    n = len(s)
    if n == 0:
        return 0
    INF = float("inf")
    # Row 0 is unused; row 3 means "3 or more in a row".
    optx = [[INF] * n for _ in range(4)]  # initialize all values to infinity
    opty = [[INF] * n for _ in range(4)]  # initialize all values to infinity
    if s[0] == 'X':
        optx[1][0] = 0
        opty[1][0] = 1
    else:
        optx[1][0] = 1
        opty[1][0] = 0
    for i in range(1, n):
        x = s[i]
        if x == 'X':
            optx[1][i] = opty[3][i - 1]          # a new X run may only follow 3+ Ys
            optx[2][i] = optx[1][i - 1]
            optx[3][i] = min(optx[2][i - 1], optx[3][i - 1])
            opty[1][i] = 1 + optx[3][i - 1]      # a new Y run may only follow 3+ Xs
            opty[2][i] = 1 + opty[1][i - 1]
            opty[3][i] = 1 + min(opty[2][i - 1], opty[3][i - 1])
        else:
            optx[1][i] = 1 + opty[3][i - 1]
            optx[2][i] = 1 + optx[1][i - 1]
            optx[3][i] = 1 + min(optx[2][i - 1], optx[3][i - 1])
            opty[1][i] = optx[3][i - 1]
            opty[2][i] = opty[1][i - 1]
            opty[3][i] = min(opty[2][i - 1], opty[3][i - 1])
    # The last run must also be long enough.
    return min(optx[3][n - 1], opty[3][n - 1])
The above code essentially computes the lowest cost of creating a smooth sequence up to the ith character, storing the optimal value for every relevant number of consecutive Xs or Ys in a row (1, 2, or 3 in a row). More formally:
optx[i][k] stores the smallest cost to convert the string s[0...k] into a smooth sequence that ends in i consecutive Xs. Runs of 3 or more are accounted for in optx[3][k].
opty[j][k] stores the smallest cost to convert the string s[0...k] into a smooth sequence that ends in j consecutive Ys. Runs of 3 or more are accounted for in opty[3][k].
It is straightforward to convert this to an algorithm that returns the sequence as well as the optimal cost.
Note that some of the cases in the above code are probably unnecessary, it's just a straightforward recurrence derived from the constraints.
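Following the remark that the recurrence converts easily into one that also returns the sequence, here is a minimal Python sketch that keeps a back-pointer per state. It also accepts one threshold per class, the variant the question said it would prefer. The function name `smooth_runs` and the dict-of-thresholds parameter are hypothetical; the state (c, r) plays the role of optx[r]/opty[r], with r equal to the class threshold meaning "that many or more".

```python
def smooth_runs(s: str, thresholds: dict) -> tuple:
    """Fewest replacements so every run of class c has length >= thresholds[c].

    Returns (cost, smoothed_string), or (inf, None) if no valid sequence exists.
    """
    INF = float("inf")
    classes = list(thresholds)
    if not s:
        return 0, ""
    # cost[(c, r)]: best cost for the current prefix ending in r consecutive c's
    cost = {(c, r): INF for c in classes for r in range(1, thresholds[c] + 1)}
    for c in classes:
        cost[(c, 1)] = 0 if s[0] == c else 1
    parents = [{}]  # parents[i][state] = state at position i-1 on the best path
    for i in range(1, len(s)):
        new_cost = {state: INF for state in cost}
        par = {}
        for (c, r), v in cost.items():
            if v == INF:
                continue
            t = thresholds[c]
            nexts = [(c, min(r + 1, t))]  # extend the current run
            if r == t:  # the run is long enough to end here
                nexts += [(d, 1) for d in classes if d != c]
            for nc, nr in nexts:
                nv = v + (0 if s[i] == nc else 1)
                if nv < new_cost[(nc, nr)]:
                    new_cost[(nc, nr)] = nv
                    par[(nc, nr)] = (c, r)
        cost = new_cost
        parents.append(par)
    # Only states whose final run is long enough are valid end states.
    end = min(((c, thresholds[c]) for c in classes), key=lambda st: cost[st])
    if cost[end] == INF:
        return INF, None
    out, state = [], end
    for i in range(len(s) - 1, -1, -1):
        out.append(state[0])
        if i > 0:
            state = parents[i][state]
    return cost[end], "".join(reversed(out))
```

Ties between equally cheap sequences are broken by whichever transition is found first, so the returned string is one optimal answer, not necessarily the only one.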

Resources