Optimizing a sudoku solver in Haskell - performance

I have written a sudoku solver in Haskell. It goes through a list and when it finds '0' (an empty cell) it will get the numbers that could fit and try them:
import Data.List (group, (\\), sort)
import Data.Maybe (fromMaybe)
row :: Int -> [Int] -> [Int]
row y grid = foldl (\acc x -> (grid !! x):acc) [] [y*9 .. y*9+8]
  where y' = y*9
column :: Int -> [Int] -> [Int]
column x grid = foldl (\acc n -> (grid !! n):acc) [] [x,x+9..80]
box :: Int -> Int -> [Int] -> [Int]
box x y grid = foldl (\acc n -> (grid !! n):acc) [] [x+y*9*3+y' | y' <- [0,9,18], x <- [x'..x'+2]]
  where x' = x*3
isValid :: [Int] -> Bool
isValid grid = and [isValidRow, isValidCol, isValidBox]
  where isValidRow = isValidDiv row
        isValidCol = isValidDiv column
        isValidBox = and $ foldl (\acc (x,y) -> isValidList (box x y grid):acc) [] [(x,y) | x <- [0..2], y <- [0..2]]
        isValidDiv f = and $ foldl (\acc x -> isValidList (f x grid):acc) [] [0..8]
        isValidList = all (\x -> length x <= 1) . tail . group . sort -- tail removes entries that are '0'
isComplete :: [Int] -> Bool
isComplete grid = length (filter (== 0) grid) == 0
solve :: Maybe [Int] -> Maybe [Int]
solve grid' = foldl f Nothing [0..80]
  where grid = fromMaybe [] grid'
        f acc x
          | isValid grid = if isComplete grid then grid' else f' acc x
          | otherwise = acc
        f' acc x
          | (grid !! x) == 0 = case guess x grid of
              Nothing -> acc
              Just x -> Just x
          | otherwise = acc
guess :: Int -> [Int] -> Maybe [Int]
guess x grid
  | length valid /= 0 = foldl f Nothing valid
  | otherwise = Nothing
  where valid = [1..9] \\ (row rowN grid ++ column colN grid ++ box (fst boxN) (snd boxN) grid) -- remove numbers already used in row/column/box
        rowN = x `div` 9 -- e.g. 0/9=0 75/9=8
        colN = x - (rowN * 9) -- e.g. 0-0=0 75-72=3
        boxN = (colN `div` 3, rowN `div` 3)
        before x = take x grid
        after x = drop (x+1) grid
        f acc y = case solve $ Just $ before x ++ [y] ++ after x of
          Nothing -> acc
          Just x -> Just x
For some puzzles this works, for example this one:
sudoku :: [Int]
sudoku = [5,3,0,6,7,8,0,1,2,
Took under a second, however this one:
sudoku :: [Int]
sudoku = [5,3,0,0,7,0,0,1,2,
I have not seen finish. I don't think this is a problem with the method, as it does return correct results.
Profiling showed that most of the time was spent in the "isValid" function. Is there something obviously inefficient/slow about that function?

The implementation is of course improvable, but that's not the problem. The problem is that for the second grid, the simple guess-and-check algorithm needs a lot of backtracking. Even if you speed up each of your functions 1000-fold, there will be grids where it still needs several times the age of the universe to find the (first, if the grid is not unique) solution.
You need a better algorithm to avoid that. A fairly efficient way to avoid such cases is to guess the square with the fewest possibilities first. That doesn't avoid all bad cases, but it greatly reduces them.
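A minimal sketch of that heuristic, assuming the same flat 81-element grid with 0 for an empty cell. The names candidates and bestCell are hypothetical and do not appear in the question's code; a solver would branch on the cell that bestCell returns instead of scanning cells in order.

```haskell
import Data.List (minimumBy, (\\))
import Data.Ord (comparing)

-- Digits that can still go into cell i (0..80) of a flat 9x9 grid.
candidates :: [Int] -> Int -> [Int]
candidates grid i = [1..9] \\ used
  where
    (r, c) = i `divMod` 9
    -- Every value sharing a row, column or 3x3 box with cell i.
    used = [ grid !! j
           | j <- [0..80]
           , let (r', c') = j `divMod` 9
           , r' == r || c' == c
               || (r' `div` 3 == r `div` 3 && c' `div` 3 == c `div` 3)
           ]

-- The empty cell with the fewest candidates, or Nothing if the grid is full.
bestCell :: [Int] -> Maybe (Int, [Int])
bestCell grid
  | null empties = Nothing
  | otherwise    = Just (minimumBy (comparing (length . snd)) options)
  where
    empties = [i | (i, 0) <- zip [0..] grid]
    options = [(i, candidates grid i) | i <- empties]
```

If the chosen cell has an empty candidate list, the current branch is already a dead end and can be abandoned without any further validity check.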
One thing that you should also do is replace the length thing == 0 check with null thing. With the relatively short lists occurring here, the effect is limited, but in general it can be dramatic (and in general you should also not use length list <= 1, use null $ drop 1 list instead).
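As a small sketch of that advice: isComplete is the question's function rewritten with null, and atMostOne is a hypothetical helper name for the null $ drop 1 idiom.

```haskell
-- Stops at the first 0 instead of counting every 0 in the grid.
isComplete :: [Int] -> Bool
isComplete grid = null (filter (== 0) grid)

-- True for lists of length 0 or 1; inspects at most two cells.
atMostOne :: [a] -> Bool
atMostOne = null . drop 1
```

Because neither function walks the whole list, both also terminate on very long or infinite lists, which length does not.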

isValidList = all (\x -> length x <= 1) . tail . group . sort -- tail removes entries that are '0'
If the original list does not contain any zeros, tail will remove something else, perhaps a list of two ones. I'd replace tail . group. sort with group . sort . filter (/= 0).
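Put together with the null . drop 1 idiom from the other answer, the suggested replacement might look like this sketch (same name as in the question, written as a standalone top-level function):

```haskell
import Data.List (group, sort)

-- True when no non-zero digit occurs more than once.
isValidList :: [Int] -> Bool
isValidList = all (null . drop 1) . group . sort . filter (/= 0)
```

With the original tail version, a list such as [1,1,2,3] would be accepted, because tail discards the group of ones rather than a group of zeros.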
I don't understand why isValidBox and isValidDiv use foldl as map appears to be adequate. Have I missed something / are they doing something terribly clever?


Haskell groupBy depending on accumulator value

I have a list of (view, width) pairs representing content labels and their widths, which I want to group into lines (if the next content label doesn't fit in the line, it goes onto another line). So we have: viewList = [(View1, 45), (View2, 223.5), (View3, 14), (View4, 42)].
I want to write a function groupViews :: [a] -> [[a]] to group this list into a list of sublists where each sublist will contain only views with sum of widths less than the maximum specified width (let's say 250).
So for a sorted viewList this function will return : [[(View3, 14), (View4, 42), (View1, 45)],[(View2, 223.5)]]
It looks similar to groupBy. However, groupBy doesn't maintain an accumulator. I tried to use scanl + takeWhile(<250) combination but in this case I was able to receive only first valid sublist. Maybe use iterate + scanl + takeWhile somehow? But this looks very cumbersome and not functional at all. Any help will be much appreciated.
I would start with a recursive definition like this:
groupViews :: Double -> (a -> Double) -> [a] -> [[a]]
groupViews maxWidth width = go (0, [[]])
  where
    go (current, acc : accs) (view : views)
      | current + width view <= maxWidth
      = go (current + width view, (view : acc) : accs) views
      | otherwise = go (width view, [view] : acc : accs) views
    go (_, accs) []
      = reverse $ map reverse accs
Invoked like groupViews 250 snd (sortOn snd viewList). The first thing I notice is that it can be represented as a left fold:
-- needs: import Data.List (foldl')
groupViews' maxWidth width
  = reverse . map reverse . snd . foldl' go (0, [[]])
  where
    go (current, acc : accs) view
      | current + width view <= maxWidth
      = (current + width view, (view : acc) : accs)
      | otherwise
      = (width view, [view] : acc : accs)
I think this is fine, though you could factor it further if you like, into one scan to accumulate the widths modulo the max width, and another pass to group the elements into ascending runs. For example, here’s a version that works on integer widths:
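-- needs: import Data.List (groupBy); import Data.Function (on)
groupViews'' maxWidth width views
  = map (map fst) -- groupBy yields a list of groups, so strip the sums inside each group
  $ groupBy ((<) `on` snd)
  $ zip views
  $ drop 1
  $ scanl (\ current view -> (current + width view) `mod` maxWidth) 0 views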
And of course you can include the sort in these definitions instead of passing the sorted list from outside.
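For instance, the left-fold version with the sort folded in could be sketched as follows. The name groupSorted is hypothetical, and the final clause of step exists only to keep the pattern match total.

```haskell
import Data.List (foldl', sortOn)

-- Greedy line breaking over the views sorted by ascending width.
groupSorted :: Double -> (a -> Double) -> [a] -> [[a]]
groupSorted maxWidth width =
  reverse . map reverse . snd . foldl' step (0, [[]]) . sortOn width
  where
    step (current, acc : accs) view
      | current + width view <= maxWidth =
          (current + width view, (view : acc) : accs)
      | otherwise = (width view, [view] : acc : accs)
    -- Unreachable: the accumulator always holds at least one line.
    step (_, []) view = (width view, [[view]])
```

Called as groupSorted 250 snd viewList, this produces the grouping from the question. Note that an empty input yields a single empty line, [[]], just like the fold it is based on.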
I don't know a clever way to do this just by combining functions from the standard library, but I do think you can do better than just implementing it from scratch.
This problem fits into a class of problems that I've seen before: "batch up items from this list somehow, and combine its items into batches according to some combination rule and some rule for deciding when a batch is too big". Years ago, when I was writing Clojure, I built a function that abstracted out this idea of batched combinations, just asking you to specify the rules for batching, and was able to use it in a surprising number of places.
Here's how I think it might be reimagined in Haskell:
glue :: Monoid a => (a -> Bool) -> [a] -> [a]
glue tooBig = go mempty
  where go current [] = [current]
        go current (x:xs) | tooBig x' = current : go x xs
                          | otherwise = go x' xs
          where x' = current `mappend` x
If you had such a glue function already, you could build a simple data type with the appropriate Monoid instance (a list of objects and their cumulative sum), and then let glue do the heavy lifting:
import Data.Monoid (Sum(..))
data ViewGroup contents size = ViewGroup {totalSize :: size,
                                          elements :: [(contents, size)]}
  deriving Show -- needed for print below
-- GHC 8.4 and later require a Semigroup instance alongside Monoid
instance Semigroup b => Semigroup (ViewGroup a b) where
  ViewGroup lSize lElts <> ViewGroup rSize rElts =
    ViewGroup (lSize <> rSize) (lElts ++ rElts)
instance Monoid b => Monoid (ViewGroup a b) where
  mempty = ViewGroup mempty []
viewGroups = let views = [("a", 14), ("b", 42), ("c", 45), ("d", 223.5)]
             in glue ((> 250) . totalSize) [ViewGroup (Sum width) [(x, Sum width)]
                                           | (x, width) <- views]
main = print (viewGroups :: [ViewGroup String (Sum Double)])
[ViewGroup {totalSize = Sum {getSum = 101.0},
elements = [("a",Sum {getSum = 14.0}),
("b",Sum {getSum = 42.0}),
("c",Sum {getSum = 45.0})]},
ViewGroup {totalSize = Sum {getSum = 223.5},
elements = [("d",Sum {getSum = 223.5})]}]
On the one hand this looks like quite a bit of work for a simple function, but on the other it's rather nice to have a type that describes the cumulative summing you're doing, and Monoid instances are nice to have anyway...and after defining the type and the Monoid instance there's almost no work left to do in the calling of glue itself.
Well, I don't know, maybe it's still too much work, especially if you don't believe you can reuse that type. But I do think it's useful to recognize that this is a specific case of a more general problem, and try to solve the more general problem as well.
Given that groupBy and span themselves are defined by manual recursive functions, our modified functions will use the same mechanism.
Let us first define a general function groupAcc which takes an initial value for the accumulator, and then a function which takes an element in the list, the current accumulator state and potentially produces a new accumulated value (Nothing means the element is not accepted):
{-# LANGUAGE LambdaCase #-}
import Data.List (sortOn)
import Control.Arrow (first, second)
spanAcc :: z -> (a -> z -> Maybe z) -> [a] -> ((z, [a]), [a])
spanAcc z0 p = \case
  xs@[] -> ((z0, xs), xs)
  xs@(x:xs') -> case p x z0 of
    Nothing -> ((z0, []), xs)
    Just z1 -> first (\(z2, xt) -> (if null xt then z1 else z2, x : xt)) $
               spanAcc z1 p xs'
groupAcc :: z -> (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc z p = \case
  [] -> []
  xs -> uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs
For our specific problem, we define:
threshold :: (Num a, Ord a) => a -> a -> a -> Maybe a
threshold max a z0 = let z1 = a + z0 in if z1 < max then Just z1 else Nothing
groupViews :: (Ord z, Num z) => [(lab, z)] -> [[(lab, z)]]
groupViews = fmap snd . groupAcc 0 (threshold 250 . snd)
Which finally gives us:
groupFinal :: (Num a, Ord a) => [(lab, a)] -> [[(lab, a)]]
groupFinal = groupViews . sortOn snd
And ghci gives us:
> groupFinal [("a", 45), ("b", 223.5), ("c", 14), ("d", 42)]
If we want to, we can simplify groupAcc by assuming that z is a Monoid wherefore mempty may be used, such that:
groupAcc2 :: Monoid z => (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc2 p = \case
  [] -> []
  xs -> let z = mempty in
        uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs

a haskell function to test if an integer appears after another integer

I'm writing a function called after which takes a list of integers and two integers as parameters. after list num1 num2 should return True if num1 occurs in the list and num2 occurs in the list after num1 (not necessarily immediately after).
after [] _ _=False
after [x:xs] b c
|x==b && c `elem` xs =True
|x/=b && b `elem` xs && b `elem` xs=True
This is what I have so far; my biggest problem is that I don't know how to force num2 to be after num1.
There are a few different ways to approach this one; while it's tempting to go straight for recursion, it's nice to avoid using recursion explicitly if there's another option.
Here's a simple version using some list utilities. Note that it's a Haskell idiom that the object we're operating over is usually the last argument. In this case switching the arguments lets us write it as a pipeline with its third argument (the list) passed implicitly:
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)
Hopefully this is pretty easy to understand; we drop elements of the list until we hit an a, assuming we find one we check if there's a b in the remaining list. If there was no a, this list is [] and obviously there's no b there, so it returns False as expected.
You haven't specified what happens if 'a' and 'b' are equal, so I'll leave it up to you to adapt it for that case. HINT: add a tail somewhere ;)
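One possible reading of that hint, as a sketch: the name afterStrict is hypothetical, and it assumes that when a and b are equal you want a second occurrence to be required. It uses drop 1 rather than tail so that an empty remainder is handled without an error.

```haskell
-- b must occur strictly after the first a.
afterStrict :: Int -> Int -> [Int] -> Bool
afterStrict a b = elem b . drop 1 . dropWhile (/= a)
```

Here drop 1 skips the matched a itself, so afterStrict 1 1 [1] is False while afterStrict 1 1 [1,1] is True.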
Here are a couple of other approaches if you're interested:
This is pretty easily handled using a fold. We have three states to model: either we're looking for the first elem, or we're looking for the second elem, or we've found them (in the right order).
data State =
  FindA | FindB | Found
  deriving Eq
Then we can 'fold' (aka reduce) the list down to the result of whether it matches or not.
after :: Int -> Int -> [Int] -> Bool
after a b xs = foldl go FindA xs == Found
  where
    go FindA x = if x == a then FindB else FindA
    go FindB x = if x == b then Found else FindB
    go Found _ = Found
You can also do it recursively if you like:
after :: Int -> Int -> [Int] -> Bool
after _ _ [] = False
after a b (x:xs)
  | x == a = b `elem` xs
  | otherwise = after a b xs
You can split it into two parts: the first one will find the first occurrence of num1. After that, you just need to drop all elements before it and just check that num2 is in the remaining part of the list.
There's a standard function elemIndex for the first part. The second one is just elem.
import Data.List (elemIndex)
after xs x y =
  case x `elemIndex` xs of
    Just i -> y `elem` (drop (i + 1) xs)
    Nothing -> False
If you'd like to implement it without elem or elemIndex, you could include a subroutine. Something like:
after xs b c = go xs False
  where go [] _ = False -- empty input: nothing to find
        go (x:xs) bFound
          | x == b && not (null xs) = go xs True
          | bFound && x == c = True
          | null xs = False
          | otherwise = go xs bFound

Project Euler No. 14 Haskell

I'm trying to resolve problem 14 of Project Euler (http://projecteuler.net/problem=14) and I hit a dead end using Haskell.
Now, I know that the numbers may be small enough and I could do a brute force, but that isn't the purpose of my exercise.
I am trying to memoize the intermediate results in a Map of type Map Integer (Bool, Integer) with the meaning of:
- the first Integer (the key) holds the number
- the Tuple (Bool, Integer) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number before it
for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)
Now when I search for another number like 40 I know that the chain has (10 - 1) length and so on.
I want now, if I search for 10, not only to tell me that length of 10 is (10 - 3) length and update the map, but also I want to update 20, 40 in case they are still (False, _)
My code:
import Data.Map as Map
solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
  case Map.lookup x table of
    Nothing -> countF x 1 (x:xs) table
    Just (b, _) ->
      case b of
        True -> solve' xs table
        False -> {-WRONG-} solve' xs table
f :: Integer -> Integer
f x
  | x `mod` 2 == 0 = x `quot` 2
  | otherwise = 3 * x + 1
countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
  | n == 1 = solve' xs (Map.insert x (True, cnt) table)
  | otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table
checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
  case Map.lookup n table of
    Nothing -> Map.insert n (False, rez) table
    Just _ -> table
At the {-WRONG-} part we should update all the values like in the following example:
--We are looking for 10:
10 - (False, 20)
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
The problem is that I don't know if it's possible to do two things in a function, like updating a number and continuing the recursion. In a C-like language I might do something like this (pseudocode):
void f(int n, tuple(b,nr), int &length, table)
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
length = nr;
// Since this is a recurence it would work as a stack, producing the right output
table update(n, --cnt);
The last instruction would work since we are sending cnt by reference. Also we always know that it will finish at some point and cnt should not be < 1.
The easiest optimization (as you have identified) is memoization. You have attempted to create a memoization system yourself, but have run into issues with how to store the memoized values. There are maintainable ways of doing this, such as using a State monad or an STArray. However, there is a much simpler solution to your problem: use Haskell's existing memoization. Haskell remembers constant values by default, so if you create a value that stores the collatz values, it will be automatically memoized!
A simple example of this is the following fibonacci definition:
fib :: Int -> Integer
fib n = fibValues !! n where
  fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)
The fibValues is a [Integer], and as it is just a constant value, it is memoized. However, that doesn't mean it is all memoized at once; since it is an infinite list, that would never finish. Instead, the values are only calculated when needed, as Haskell is lazy.
So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.
collatzMemoized :: Array Integer Int
Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).
collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
  maxNumberToMemoize = 1000000
That is pretty straightforward: listArray is given the bounds and a list of all the collatz values in that range. Remember that this won't calculate all the collatz values straight away, as the values are lazy.
Now, the collatz function can be written. The most important part is to only check the collatzMemoized array if the number being checked is within its bounds:
collatz :: Integer -> Int
collatz 1 = 1
collatz n
  | inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
  | otherwise = 1 + collatz nextValue
  where
    nextValue = case n of
      1 -> 1
      n | even n -> n `div` 2
        | otherwise -> 3 * n + 1
In ghci, you can now see the effectiveness of the memoization. Try collatz 200000. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.
Finally, the solution can be found:
maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n])
and then printed:
main = print $ maxCollatzUpTo 1000000
If you run main, the result will be printed in about 10 seconds.
Now, a small problem with this approach is that it uses a lot of stack space. It will work fine in ghci (which seems to be more flexible with regard to stack space). However, if you compile it and try to run the executable, it will crash with a stack overflow. So to run the program, you have to specify a larger stack when you compile it. This can be done by adding -with-rtsopts='-K64m' to the compile options, which increases the stack to 64 MB.
Now the program can be compiled and run:
> ghc -O3 --make -with-rtsopts='-K64m' problem.hs
Running ./problem will give the result in less than a second.
You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:
collatzLength :: Integer -> Integer
collatzLength n
  | n == 1 = 1
  | even n = 1 + collatzLength (n `div` 2)
  | otherwise = 1 + collatzLength (3*n + 1)
The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:
collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
  | n == 1 = 1
  | even n = 1 + r (n `div` 2)
  | otherwise = 1 + r (3*n + 1)
Now we could define the recursive algorithm as
collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength
Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):
-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array bounds $ map (\x -> (x, f x)) $ range bounds
collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
  where
    bounds = (1, n)
    table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
    collatzLengthTableLookup =
      \x -> case inRange bounds x of
        True -> table ! x
        _ -> (collatzLengthDef collatzLengthTableLookup) x
This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:
tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
  where
    table = buildTable bounds (definition tableLookup)
    tableLookup =
      \x -> case inRange bounds x of
        True -> table ! x
        _ -> (definition tableLookup) x
collatzLengthTabled n = tableRange (1, n) collatzLengthDef
You just need to make sure that you
let memoized = collatzLengthTabled 10000000
... memoized ...
So that only one table is built in memory.
I remember finding memoisation of dynamic programming algorithms very counterintuitive in Haskell, and it's been a while since I've done it, but hopefully the following trick works for you.
But first, I don't quite understand your current DP scheme, though I suspect it may be quite inefficient as it seems like it will need to update many entries for each answer. (a) I don't know how to do this in Haskell, and (b) you don't need to do this to solve the problem efficiently ;-)
I suggest the following approach instead: first build an ordinary recursive function that computes the right answer for an input number. (Hint: it will have a signature like collatzLength :: Int -> Int.) When you have this function working, just replace its definition with the definition of an array whose elements are defined lazily with the array function using an association list, and replace all recursive calls to the function to array lookups (e.g. collatzLength 42 would become collatzLength ! 42). This will automagically populate the array in the necessary order! So your "top-level" collatzLength object will now actually be an array, rather than a function.
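A minimal sketch of that idea, under a few assumptions: it uses listArray instead of array with an association list, the names limit and collatzTable are hypothetical, and Int is assumed to be 64 bits wide so that intermediate values do not overflow. Values that leave the table are computed by plain recursion instead of being looked up.

```haskell
import Data.Array (Array, bounds, inRange, listArray, (!))

-- Hypothetical table size; the problem asks about starting numbers below one million.
limit :: Int
limit = 1000000

-- Lazily filled table: each element is a thunk that may refer to other elements.
collatzTable :: Array Int Int
collatzTable = listArray (1, limit) (map chainLength [1 .. limit])
  where
    chainLength 1 = 1
    chainLength n = 1 + lookupLength (step n)
    step n
      | even n    = n `div` 2
      | otherwise = 3 * n + 1
    -- Inside the table: look the value up. Outside: recurse without memoizing.
    lookupLength m
      | inRange (bounds collatzTable) m = collatzTable ! m
      | otherwise                       = 1 + lookupLength (step m)
```

Evaluating collatzTable ! 13 forces only the entries on the chain from 13 down to 1, which is what "populating the array in the necessary order" means in practice.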
As I suggested above, I would use an array instead of a map datatype to hold the DP table, since you will need to store values for all integer indices from 1 up to 1,000,000.
I don't have a Haskell compiler handy, so I apologize for any broken code.
Without memoization, there's a function
collatzLength :: Integer -> Integer
collatzLength n
  | n == 1 = 1
  | even n = 1 + collatzLength (n `div` 2)
  | otherwise = 1 + collatzLength (3*n + 1)
With memoization, the type signature is
memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)
since memoCL receives a table as input and gives the updated table as output. What memoCL needs to do is intercept the return of the recursive call with a let form and insert the new result.
-- table must have an initial entry for 1
memoCL table n = case Map.lookup n table of
  Just m -> (table, m)
  Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)
collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1
At some point you'll get sick of the above idiom. Then it's time for monads.
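Before reaching for monads, here is a sketch of what threading the table by hand looks like when searching a whole range. memoCL and collatzStep are repeated from above so the block stands alone; longestChain is a hypothetical driver, not part of the original answer.

```haskell
import Data.List (foldl')
import qualified Data.Map as Map

collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3 * n + 1

-- The table must already contain the entry for 1.
memoCL :: Map.Map Integer Integer -> Integer -> (Map.Map Integer Integer, Integer)
memoCL table n = case Map.lookup n table of
  Just m  -> (table, m)
  Nothing ->
    let (table', m) = memoCL table (collatzStep n)
    in (Map.insert n (1 + m) table', 1 + m)

-- Starting number in 1..limit with the longest chain, paired with that length.
longestChain :: Integer -> (Integer, Integer)
longestChain limit = best
  where
    (_, best) = foldl' step (Map.singleton 1 1, (1, 1)) [2 .. limit]
    -- Each step passes the updated table on to the next lookup.
    step (table, (bestN, bestLen)) n =
      let (table', len) = memoCL table n
      in if len > bestLen then (table', (n, len)) else (table', (bestN, bestLen))
```

The explicit (table, result) plumbing in step is exactly the pattern that a State monad would hide.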
I eventually modify the {-WRONG-} part to do what it should with a call to mark x (b, n) [] xs table where
mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
mark crtElem (b, n) list xs table
  | b == False = mark n (findElem n table) (crtElem:list) xs table
  | otherwise = continueWith n list xs table
continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
continueWith _ [] xs table = solve' xs table
continueWith cnt (y:ys) xs table = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)
findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
findElem n table =
  case Map.lookup n table of
    Nothing -> (False, 0)
    Just (b, nr) -> (b, nr)
But it seems that there are better (and far less verbose) answers than this one.
Maybe you might find it interesting how I solved the problem. It is pretty functional, though it might not be the most efficient thing on earth :)
You can find the code here: https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs
P.S.: Disclaimer: I was doing Project Euler exercises in order to learn Haskell, so the quality of the solution could be debatable.
Since we are studying recursion schemes, here's one for you.
Let's consider functor N(A,B,X)=A+B*X, which is a stream of Bs with the last element being A.
{-# LANGUAGE DeriveFunctor
, TypeFamilies
, TupleSections #-}
import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int
data N a b x = Z a | S b x deriving (Functor)
This stream is handy for several kinds of iterations. For one, we can use it to represent a chain of Ints in a Collatz sequence:
type instance Base Int64 = N Int Int64
instance Foldable Int64 where
  project 1 = Z 1
  project x | odd x = S x $ 3*x+1
  project x = S x $ x `div` 2
This is just an algebra, not an initial one, because the transformation is not an isomorphism (the same chain of Ints is part of a chain for 2*x and (x-1)/3), but this is sufficient to represent the fixpoint Base Int64 Int64.
With this definition, cata is going to feed the chain to the algebra given to it, and you can use it to construct a memo Map of integers to the chain length. Finally, anamorphism can use it to generate a stream of solutions to the problem of different sizes:
problems = ana (uncurry $ cata . phi) (M.empty, 1) where
  phi :: M.Map Int64 Int ->
         Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
         Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
  phi m (Z v) = found m 1 v
  phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
                                        M.lookup x m
The ~ before (Cons ...) means lazy pattern matching. We don't touch the pattern until the values are needed. If not for lazy pattern matching, it would always construct the whole chain, and using the map would be useless. With lazy pattern matching we only construct the values v' and m' if the chain length for x was not in the map.
Helper functions construct the stream of (Int, chain length) pairs:
found m x v = Cons (x, v) (m, x+1)
notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)
Now just take the first 999999 problems, and figure out the one that has the longest chain:
main = print $ maximumBy (compare `on` snd) $ take 999999 problems
This is slower than the array-based solution, because Map lookup is logarithmic in the map size, but this solution is not fixed size. Still, it finishes in about 5 seconds.

How can I remove the first appearance of a number in a list? Haskell

I need to make a function that takes a list and an element and returns a list in which the first occurrence of the element is removed: something like
removeFst [1,5,2,3,5,3,4,5,6] 5
What I tried is:
main :: IO()
main = do
  putStr ( show $ removeFst [1,5,2,3,5,3,4,5,6] 5)
removeFst :: [Int] -> Int -> [Int]
removeFst [] m = []
removeFst [x] m
  | x == m = []
  | otherwise = [x]
removeFst (x:xs) m
  | x == m = xs
  | otherwise = removeFst xs m
But this doesn't work... it returns the list without the first elements. I think I should make the recursive call to make the list something like:
removeFst (x:xs) m
| x == m = xs
| otherwise = removeFst (-- return the whole list till element x) m
You are very close. What you are missing is prepending the elements before the first found m to the result list:
removeFst :: [Int] -> Int -> [Int]
removeFst [] m = []
removeFst (x:xs) m
  | x == m = xs
  | otherwise = x : removeFst xs m
--              ^^^ keep x /= m
Note that the special case for one-element lists is superfluous.
Also note that removeFst = flip delete with delete from Data.List.
It should be mentioned that your function is equivalent to Data.List.delete.
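As a short sketch of that equivalence: the only difference is the argument order, so flip is enough. The Eq a signature here generalizes the question's [Int] version.

```haskell
import Data.List (delete)

-- delete takes the element first and the list second, so flip the arguments.
removeFst :: Eq a => [a] -> a -> [a]
removeFst = flip delete
```

With this definition, removeFst [1,5,2,3,5,3,4,5,6] 5 evaluates to [1,2,3,5,3,4,5,6].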
Here is another version:
import Data.List
removeFst xs x = front ++ drop 1 back where
  (front, back) = break (==x) xs

Removing elements in a functional style

I have been struggling with something that looks like a simple algorithm, but can't find a clean way to express it in a functional style so far. Here is an outline of the problem: suppose I have 2 arrays X and Y,
X = [| 1; 2; 2; 3; 3 |]
Y = [| 5; 4; 4; 3; 2; 2 |]
What I want is to retrieve the elements that match, and the unmatched elements, like:
matched = [| 2; 2; 3 |]
unmatched = [| 1; 3 |], [| 4; 4; 5 |]
In pseudo-code, this is how I would think of approaching the problem:
let rec match matches x y =
let m = find first match from x in y
if no match, (matches, x, y)
let x' = remove m from x
let y' = remove m from y
let matches' = add m to matches
match matches' x' y'
The problem I run into is the "remove m from x" part - I can't find a clean way to do this (I have working code, but it's ugly as hell). Is there a nice, idiomatic functional way to approach that problem, either the removal part, or a different way to write the algorithm itself?
This could be solved easily using the right data structures, but in case you wanted to do it manually, here's how I would do it in Haskell. I don't know F# well enough to translate this, but I hope it is similar enough. So, here goes, in (semi-)literate Haskell.
overlap xs ys =
I start by sorting the two sequences to get away from the problem of having to know about previous values.
  go (sort xs) (sort ys)
The two base cases for the recursion are easy enough to handle -- if either list is empty, the result includes the other list in the list of elements that are not overlapping.
  where
    go xs [] = ([], (xs, []))
    go [] ys = ([], ([], ys))
I then inspect the first elements in each list. If they match, I can be sure that the lists overlap on that element, so I add that to the included elements, and I let the excluded elements be. I continue the search for the rest of the list by recursing on the tails of the lists.
    go (x:xs) (y:ys)
      | x == y = let (included, excluded) = go xs ys
                 in (x:included, excluded)
Then comes the interesting part! What I essentially want to know is if the first element of one of the lists does not exist in the second list – in that case I should add it to the excluded lists and then continue the search.
      | x < y = let (included, (xex, yex)) = go xs (y:ys)
                in (included, (x:xex, yex))
      | y < x = let (included, (xex, yex)) = go (x:xs) ys
                in (included, (xex, y:yex))
And this is actually it. It seems to work for at least the example you gave.
> let (matched, unmatched) = overlap x y
> matched
> unmatched
It seems that you're describing multiset (bag) and its operations.
If you use the appropriate data structures, operations are very easy to implement:
// Assume that X, Y are initialized bags
let matches = X.IntersectWith(Y)
let x = X.Difference(Y)
let y = Y.Difference(X)
There's no built-in Bag collection in the .NET Framework. You could use the Power Collections library, whose Bag class provides the function signatures used above.
You can represent a bag by a weakly ascending list. Here is an improved version of @kqr's answer in F# syntax:
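For comparison, the same bag operations can be sketched in Haskell with plain lists, since the (\\) operator from Data.List removes one occurrence per element and so behaves like multiset difference. The name overlapBag is hypothetical, and this version is quadratic, so it only suits small inputs.

```haskell
import Data.List ((\\))

-- (matched, (only in xs, only in ys)), treating both lists as bags.
overlapBag :: Eq a => [a] -> [a] -> ([a], ([a], [a]))
overlapBag xs ys = (xs \\ onlyXs, (onlyXs, onlyYs))
  where
    onlyXs = xs \\ ys -- what is left of xs once every y has cancelled one match
    onlyYs = ys \\ xs
```

For the arrays in the question, overlapBag [1,2,2,3,3] [5,4,4,3,2,2] gives ([2,2,3],([1,3],[5,4,4])); the unmatched elements keep their original order rather than being sorted.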
let overlap xs ys =
    let rec loop (matches, ins, outs) xs ys =
        match xs, ys with
        // found a match
        | x::xs', y::ys' when x = y -> loop (x::matches, ins, outs) xs' ys'
        // `x` is smaller than every element in `ys`, put `x` into `ins`
        | x::xs', y::ys' when x < y -> loop (matches, x::ins, outs) xs' ys
        // `y` is smaller than every element in `xs`, put `y` into `outs`
        | x::xs', y::ys' -> loop (matches, ins, y::outs) xs ys'
        // copy remaining elements in `xs` to `ins`
        | x::xs', [] -> loop (matches, x::ins, outs) xs' ys
        // copy remaining elements in `ys` to `outs`
        | [], y::ys' -> loop (matches, ins, y::outs) xs ys'
        | [], [] -> (List.rev matches, List.rev ins, List.rev outs)
    loop ([], [], []) (List.sort xs) (List.sort ys)
After two calls to List.sort, which are probably O(n log n), finding matches is linear in the sum of the lengths of the two lists.
If you need a quick-and-dirty bag module, I would suggest a module signature like this:
type Bag<'T> = Bag of 'T list
module Bag =
    val count : 'T -> Bag<'T> -> int
    val insert : 'T -> Bag<'T> -> Bag<'T>
    val intersect : Bag<'T> -> Bag<'T> -> Bag<'T>
    val union : Bag<'T> -> Bag<'T> -> Bag<'T>
    val difference : Bag<'T> -> Bag<'T> -> Bag<'T>
