Haskell to find the distance between the two closest points - algorithm

Given a list of points in a two dimensional space, you want to perform a function in
Haskell to find the distance between the two closest points.
example:
Input: project [(1,5), (3,4), (2,8), (-1,2), (-8.6), (7.0), (1.5), (5.5), (4.8), (7.4)]
Output: 2.0
Assume that the distance between the two farthest points in the list is at most 10000.
HereĀ“s my code:
import Data.List
import System.Random
sort_ :: Ord a => [a] -> [a]
sort_ [] = []
sort_ [x] = [x]
sort_ xs = merge (sort_ left) (sort_ right)
where
(left, right) = splitAt (length xs `div` 2) xs
merge [] xs = xs
merge xs [] = xs
merge (x:xs) (y:ys)=
if x <= y then
x : merge xs (y:ys)
else y : merge (x:xs) ys
project :: [(Float,Float)] -> Float
project [] = 0
project (x:xs)=
if null (xs) then
error "The list have only 1 point"
else head(sort_(dstList(x:xs)))
distance :: (Float,Float)->(Float,Float) -> Float
distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2)
dstList :: [(Float,Float)] -> [Float]
dstList (x:xs)=
if length xs == 1 then
(dstBetween x xs):[]
else (dstBetween x xs):(dstList xs)
dstBetween :: (Float,Float) -> [(Float,Float)] -> Float
dstBetween pnt (x:xs)=
if null (xs) then
distance pnt x
else minimum ((distance pnt ):((dstBetween pnt xs)):[])
{-
Calling generator to create a file created at random points
-}
generator = do
putStrLn "Enter File Name"
file <- getLine
g <- newStdGen
let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2)
$ randomRs(-1,1) g :: [(Float,Float)]
writeFile file . show $ pts
{-
Call the main to read a file and pass it to the function of project
The function of the project should keep the name 'project' as described
in the statement
-}
main= do
putStrLn "Enter filename to read"
name <- getLine
file <- readFile name
putStrLn . show . project $ readA file
readA::String->[(Float,Float)]
readA = read
I can perform a run of the program as in the example or using the generator as follows:
in haskell interpreter must type "generator", the program will ask for a file name containing a thousand points here. and after the file is generated in the Haskell interpreter must write main, and request a file name, which is the name of the file you create with "generator".
The problem is that for 1000 points randomly generated my program takes a long time, about 3 minutes on a computer with dual core processor. What am I doing wrong? How I can optimize my code to work faster?

You are using a quadratic algorithm:
project [] = error "Empty list of points"
project [_] = error "Single point is given"
project ps = go 10000 ps
where
go a [_] = a
go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
in a2 `seq` go a2 ps
You should use a better algorithm. Search computational-geometry tag on SO for a better algorithm.
See also http://en.wikipedia.org/wiki/Closest_pair_of_points_problem .
#maxtaldykin proposes a nice, simple and effective change to the algorithm, which should make a real difference for random data -- pre-sort the points by X coordinate, and never try points more than d units away from the current point, in X coordinate (where d is the currently known minimal distance):
import Data.Ord (comparing)
import Data.List (sortBy)
project2 ps#(_:_:_) = go 10000 p1 t
where
(p1:t) = sortBy (comparing fst) ps
go d _ [] = d
go d p1#(x1,_) t = g2 d t
where
g2 d [] = go d (head t) (tail t)
g2 d (p2#(x2,_):r)
| x2-x1 >= d = go d (head t) (tail t)
| d2 >= d = g2 d r
| otherwise = g2 d2 r -- change it "mid-flight"
where
d2 = distance p1 p2
On random data, g2 will work in O(1) time so that go will take O(n) and the whole thing will be bounded by a sort, ~ n log n.
Empirical orders of growth show ~ n^2.1 for the first code (on 1k/2k range) and ~n^1.1 for the second, on 10k/20k range, testing it quick'n'dirty compiled-loaded into GHCi (with second code running 50 times faster than first for 2,000 points, and 80 times faster for 3,000 points).

It's possible to slightly modify your bruteforce search to get better performance on random data.
Main idea is to sort points by x coordinate and, while comparing distances in loop, consider only points that have horizontal distance not grater than current minimum distance.
This could be order of magnitude faster but in the worst case it is still O(n^2).
Actually, on 2000 points it is 50 times faster on my machine.
project points = loop1 10000 byX
where
-- sort points by x coordinate
-- (you need import Data.Ord to use `comparing`)
byX = sortBy (comparing fst) points
-- loop through all points from left to right
-- threading `d` through iterations as a minimum distance so far
loop1 d = foldl' loop2 d . tails
-- `tail` drops leftmost points one by one so `x` is moving from left to right
-- and `xs` contains all points to the right of `x`
loop2 d [] = d
loop2 d (x:xs) = let
-- we take only those points of `xs` whose horizontal distance
-- is not greater than current minimum distance
xs' = takeWhile ((<=d) . distanceX x) xs
distanceX (a,_) (b,_) = b - a
-- then just get minimum distance from `x` to those `xs'`
in minimum $ d : map (distance x) xs'
Btw, please don't use so many parentheses. Haskell does not require to enclose function arguments.

Related

Generate all unique directed graphs with 2 inputs to each node

I'm trying to generate all unique digraphs that fit a spec:
each node must have exactly 2 inputs
and are allowed arbitrarily many outputs to other nodes in the graph
My current solution is slow. Eg for 6 nodes, the algo has taken 1.5 days to get where I think it's complete, but it'll probably be checking for a few more days still.
My algorithm for a graph with n nodes:
generate all n-length strings of 0, where one symbol is a 1, eg, for n=3, [[0,0,1], [0,1,0], [1,0,0]]. These can be thought of as rows from an identity matrix.
generate all possible n * n matrixes where each row is all possible combinations of step 1. + step 1.
This is the connectivity matrix where each cell represents a connection from column-index to row-index
So, for n=3, these are possible:
[0,1,0] + [1,0,0] = [1,1,0]
[1,0,0] + [1,0,0] = [2,0,0]
These represent the inputs to a node, and by adding step 1 to itself, the result will always represent 2 inputs.
For ex:
A B C
A' [[0,1,1],
B' [0,2,0],
C' [1,1,0]]
So B and C connect to A once each: B -> A', C -> A',
And B connects to itself twice: B => B'
I only want unique ones, so for each connectivity matrix generated, I can only keep it if it is not isomorphic to an already-seen graph.
This step is expensive. I need to convert the graph to a "canonical form" by running through each permutation of isomorphic graphs, sorting them, and considering the first one as the "canonical form".
If anyone dives into testing any of this out, here are the count of unique graphs for n nodes:
2 - 6
3 - 44
4 - 475
5 - 6874
6 - 109,934 (I think, it's not done running yet but I haven't found a new graph in >24 hrs.)
7 - I really wanna know!
Possible optimizations:
since I get to generate the graphs to test, is there a way of ruling them out, without testing, as being isomorphic to already-seen ones?
is there a faster graph-isomorphism algorithm? I think this one is related to "Nauty", and there are others I've read of in papers, but I haven't had the expertise (or bandwidth) to implement them yet.
Here's a demonstrable connectivity matrix that can be plotted at graphonline.ru for fun, showing self connections, and 2 connections to t he same node:
1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 0,
0, 1, 0, 1, 0, 0,
0, 1, 2, 0, 0, 0,
0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0,
here's the code in haskell if you want to play with it, but I'm more concerned about getting the algorithm right (eg pruning down the search space), than the implementation:
-- | generate all permutations of length n given symbols from xs
npermutations :: [a] -> Int -> [[a]]
npermutations xs size = mapM (const xs) [1..size]
identity :: Int -> [[Int]]
identity size = scanl
(\xs _ -> take size $ 0 : xs) -- keep shifting right
(1 : (take (size - 1) (repeat 0))) -- initial, [1,0,0,...]
[1 .. size-1] -- correct size
-- | return all possible pairings of [Column]
columnPairs :: [[a]] -> [([a], [a])]
columnPairs xs = (map (\x y -> (x,y)) xs)
<*> xs
-- | remove duplicates
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
rmdups' _ [] = []
rmdups' a (b : c) = if Set.member b a
then rmdups' a c
else b : rmdups' (Set.insert b a) c
-- | all possible patterns for inputting 2 things into one node.
-- eg [0,1,1] means cells B, and C project into some node
-- [0,2,0] means cell B projects twice into one node
binaryInputs :: Int -> [[Int]]
binaryInputs size = rmdups $ map -- rmdups because [1,0]+[0,1] is same as flipped
(\(x,y) -> zipWith (+) x y)
(columnPairs $ identity size)
transposeAdjMat :: [[Int]] -> [[Int]]
transposeAdjMat ([]:_) = []
transposeAdjMat m = (map head m) : transposeAdjMat (map tail m)
-- | AdjMap [(name, inbounds)]
data AdjMap a = AdjMap [(a, [a])] deriving (Show, Eq)
addAdjColToMap :: Int -- index
-> [Int] -- inbound
-> AdjMap Int
-> AdjMap Int
addAdjColToMap ix col (AdjMap xs) =
let conns = foldl (\c (cnt, i) -> case cnt of
1 -> i:c
2 -> i:i:c
_ -> c
)
[]
(zip col [0..]) in
AdjMap ((ix, conns) : xs)
adjMatToMap :: [[Int]] -> AdjMap Int
adjMatToMap cols = foldl
(\adjMap#(AdjMap nodes) col -> addAdjColToMap (length nodes) col adjMap)
(AdjMap [])
cols
-- | a graph's canonical form : http://mfukar.github.io/2015/09/30/haskellxiii.html
-- very expensive algo, of course
canon :: (Ord a, Enum a, Show a) => AdjMap a -> String
canon (AdjMap g) = minimum $ map f $ Data.List.permutations [1..(length g)]
where
-- Graph vertices:
vs = map fst g
-- Find, via brute force on all possible orderings (permutations) of vs,
-- a mapping of vs to [1..(length g)] which is minimal.
-- For example, map [1, 5, 6, 7] to [1, 2, 3, 4].
-- Minimal is defined lexicographically, since `f` returns strings:
f p = let n = zip vs p
in (show [(snd x, sort id $ map (\x -> snd $ head $ snd $ break ((==) x . fst) n)
$ snd $ take_edge g x)
| x <- sort snd n])
-- Sort elements of N in ascending order of (map f N):
sort f n = foldr (\x xs -> let (lt, gt) = break ((<) (f x) . f) xs
in lt ++ [x] ++ gt) [] n
-- Get the first entry from the adjacency list G that starts from the given node X
-- (actually, the vertex is the first entry of the pair, hence `(fst x)`):
take_edge g x = head $ dropWhile ((/=) (fst x) . fst) g
-- | all possible matrixes where each node has 2 inputs and arbitrary outs
binaryMatrixes :: Int -> [[[Int]]]
binaryMatrixes size = let columns = binaryInputs size
unfiltered = mapM (const columns) [1..size] in
fst $ foldl'
(\(keep, seen) x -> let can = canon . adjMatToMap $ x in
(if Set.member can seen
then keep
else id $! x : keep
, Set.insert can seen))
([], Set.fromList [])
unfiltered
There are a number of approaches you could try. One thing that I do note is that having loops with multi-edges (colored loops?) is a little unusual, but is probably just needs a refinement of existing techniques.
Filter the output of another program
The obvious candidate here is of course nAUTy/traces (http://pallini.di.uniroma1.it/) or similar (saucy, bliss, etc). Depending on how you want to do this, it could be as simple as run nauty (for example) and output to file, then read in the list filtering as you go.
For larger values of n this could start to be a problem if you are generating huge files. I'm not sure whether you start to run out of space before you run out of time, but still. What might be better is to generate and test them as you go, throwing away candidates. For your purposes, there may be an existing library for generation - I found this one but I have no idea how good it is.
Use graph invariants
A very easy first step to more efficient listing of graphs is to filter using graph invariants. An obvious one would be degree sequence (the ordered list of degrees of the graph). Others include the number of cycles, the girth, and so on. For your purposes, there might be some indegree/outdegree sequence you could use.
The basic idea is to use the invariant as a filter to avoid expensive checks for isomorphism. You can store the (list of ) invariants for already generated graphs, and check the new one against the list first. The canonical form of a structure is a kind of invariant.
Implement an algorithm
There are lost of GI algorithms, including the ones used by nauty and friends. However, they do tend to be quite hard! The description given in this answer is an excellent overview, but the devil is in the details of course.
Also note that the description is for general graphs, while you have a specific subclass of graph that might be easier to generate. There may be papers out there for digraph listing (generating) but I have not checked.

sort a list of numbers by their 'visual similarity'

consider a function, which rates the level of 'visual similarity' between two numbers: 666666 and 666166 would be very similar, unlike 666666 and 111111
type N = Int
type Rate = Int
similar :: N -> N -> Rate
similar a b = length . filter id . zipWith (==) a' $ b'
where a' = show a
b' = show b
similar 666666 666166
--> 5
-- high rate : very similar
similar 666666 111111
--> 0
-- low rate : not similar
There will be more sophisticated implementations for this, however this serves the purpose.
The intention is to find a function that sorts a given list of N's, so that each item is the most similar one to it's preceding item. Since the first item does not have a predecessor, there must be a given first N.
similarSort :: N -> [N] -> [N]
Let's look at some sample data: They don't need to have the same arity but it makes it easier to reason about it.
sample :: [N]
sample = [2234, 8881, 1222, 8888, 8822, 2221, 5428]
one could be tempted to implement the function like so:
similarSortWrong x xs = reverse . sortWith (similar x) $ xs
but this would lead to a wrong result:
similarSortWrong 2222 sample
--> [2221,1222,8822,2234,5428,8888,8881]
In the beginning it looks correct, but it's obvious that 8822 should rather be followed by 8881, since it's more similar that 2234.
So here's the implementation I came up with:
similarSort _ [] = []
similarSort x xs = x : similarSort a as
where (a:as) = reverse . sortWith (similar x) $ xs
similarSort 2222 sample
--> [2222,2221,2234,1222,8822,8888,8881]
It seems to work. but it also seems to do lot more more work than necessary. Every step the whole rest is sorted again, just to pick up the first element. Usually lazyness should allow this, but reverse might break this again. I'd be keen to hear, if someone know if there's a common abstraction for this problem.
It's relatively straightforward to implement the greedy algorithm you ask for. Let's start with some boilerplate; we'll use the these package for a zip-like that hands us the "unused" tail ends of zipped-together lists:
import Data.Align
import Data.These
sampleStart = "2222"
sampleNeighbors = ["2234", "8881", "1222", "8888", "8822", "2221", "5428"]
Instead of using numbers, I'll use lists of digits -- just so we don't have to litter the code with conversions between the form that's convenient for the user and the form that's convenient for the algorithm. You've been a bit fuzzy about how to rate the similarity of two digit strings, so let's make it as concrete as possible: any digits that differ cost 1, and if the digit strings vary in length we have to pay 1 for each extension to the right. Thus:
distance :: Eq a => [a] -> [a] -> Int
distance l r = sum $ alignWith elemDistance l r where
elemDistance (These l r) | l == r = 0
elemDistance _ = 1
A handy helper function will pick the smallest element of some list (by a user-specified measure) and return the rest of the list in some implementation-defined order.
minRestOn :: Ord b => (a -> b) -> [a] -> Maybe (a, [a])
minRestOn f [] = Nothing
minRestOn f (x:xs) = Just (go x [] xs) where
go min rest [] = (min, rest)
go min rest (x:xs) = if f x < f min
then go x (min:rest) xs
else go min (x:rest) xs
Now the greedy algorithm almost writes itself:
greedy :: Eq a => [a] -> [[a]] -> [[a]]
greedy here neighbors = here : case minRestOn (distance here) neighbors of
Nothing -> []
Just (min, rest) -> greedy min rest
We can try it out on your sample:
> greedy sampleStart sampleNeighbors
["2222","1222","2221","2234","5428","8888","8881","8822"]
Just eyeballing it, that seems to do okay. However, as with many greedy algorithms, this one only minimizes the local cost of each edge in the path. If you want to minimize the total cost of the path found, you need to use another algorithm. For example, we can pull in the astar package. For simplicity, I'm going to do everything in a very inefficient way, but it's not too hard to do it "right". We'll need a fair chunk more imports:
import Data.Graph.AStar
import Data.Hashable
import Data.List
import Data.Maybe
import qualified Data.HashSet as HS
Unlike before, where we only wanted the nearest neighbor, we'll now want all the neighbors. (Actually, we could probably implement the previous use of minRestOn using the following function and minimumOn or something. Give it a try if you're interested!)
neighbors :: (a, [a]) -> [(a, [a])]
neighbors (_, xs) = go [] xs where
go ls [] = []
go ls (r:rs) = (r, ls ++ rs) : go (r:ls) rs
We can now call the aStar search method with appropriate parameters. We'll use ([a], [[a]]) -- representing the current list of digits and the remaining lists that we can choose from -- as our node type. The arguments to aStar are then, in order: the function for finding neighboring nodes, the function for computing distance between neighboring nodes, the heuristic for how far we have left to go (we'll just say 1 for each unique element in the list), whether we've reached a goal node, and the initial node to start the search from. We'll call fromJust, but it should be okay: all nodes have at least one path to a goal node, just by choosing the remaining lists of digits in order.
optimal :: (Eq a, Ord a, Hashable a) => [a] -> [[a]] -> [[a]]
optimal here elsewhere = (here:) . map fst . fromJust $ aStar
(HS.fromList . neighbors)
(\(x, _) (y, _) -> distance x y)
(\(x, xs) -> HS.size (HS.fromList (x:xs)) - 1)
(\(_, xs) -> null xs)
(here, elsewhere)
Let's see it run in ghci:
> optimal sampleStart sampleNeighbors
["2222","1222","8822","8881","8888","5428","2221","2234"]
We can see that it's done better this time by adding a pathLength function that computes all the distances between neighbors in a result.
pathLength :: Eq a => [[a]] -> Int
pathLength xs = sum [distance x y | x:y:_ <- tails xs]
In ghci:
> pathLength (greedy sampleStart sampleNeighbors)
15
> pathLength (optimal sampleStart sampleNeighbors)
14
In this particular example, I think the greedy algorithm could have found the optimal path if it had made the "right" choices whenever there were ties for minimal next step; but I expect it is not too hard to cook up an example where the greedy algorithm is forced into bad early choices.

Solving CHRL4 on code chef (CHEF and Way) in Haskell

I am trying to solve this question in Haskell but the codechef compiler keeps on saying it is the wrong answer. The question is as follows:
After visiting a childhood friend, Chef wants to get back to his home. Friend lives at the first street, and Chef himself lives at the N-th (and the last) street. Their city is a bit special: you can move from the X-th street to the Y-th street if and only if 1 <= Y - X <= K, where K is the integer value that is given to you. Chef wants to get to home in such a way that the product of all the visited streets' special numbers is minimal (including the first and the N-th street). Please, help him to find such a product.
Input
The first line of input consists of two integer numbers - N and K - the number of streets and the value of K respectively. The second line consist of N numbers - A1, A2, ..., AN respectively, where Ai equals to the special number of the i-th street.
The output should be modulo 1000000007
Input
4 2
1 2 3 4
Output
8
The solution I used is as follows:
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromJust)
findMinIndex x index minIndex n
| index == n = minIndex
| (x!!index) < (x!!minIndex) = findMinIndex x (index+1) index n
| otherwise = findMinIndex x (index+1) minIndex n
minCost [] _ = 1
minCost (x:xs) k = let indexList = take k xs
minIndex = findMinIndex indexList 0 0 (length indexList)
in x * minCost(drop minIndex xs) k
main :: IO()
main = do
t <- B.getContents
let inputs = B.lines t
let firstLine = inputs !! 0
let secondLine = inputs !! 1
let [n,k] = map (fst . fromJust . B.readInt) $ B.split ' ' firstLine
let specialNums = reverse $ map (fst . fromJust . B.readInteger) $ B.split ' ' secondLine
putStrLn $ show ((minCost specialNums k) `mod` 1000000007)
It worked for the given test case and a few other test cases I tries out. But it is not being accepted by codechef. I followed the editorial for the problem and made it. Basically starting from the last number in the list of special numbers the program search it's immediate k predecessors and finds the minimum one in that range and multiplies it with the current value and so on till the beginning of the list
Your algorithm doesn't always give the smallest product for all the inputs, e.g. this one:
5 2
3 2 3 2 3
The editorial explained the problem throughout, you really should read it again.
This problem is basically a shortest path problem, streets are vertices, possible movements from street to street are edges of the graph, the weight of an edge is determined by the special value of the tail alone. While the total movement cost is defined as the product but not the sum of all the costs, the question can be normalized by taking logarithms of all the special values, since
a * b = exp(log(a) + log(b))
Given log is monotonically increasing function, the minimal product is just the minimal sum of logarithms.
In editorial the editor picked Dijkstra's algorithm, but after taking the log transformation, it will be a standard shortest path problem and can be solved with any shortest path algorithm you like.
There are many implementations of Dijkstra's algorithm in Haskell, I found two on Hackage and one here. The parsing and graph initializing code is straight forward.
import Control.Monad (foldM)
import Control.Monad.ST
import Data.Array
import Data.Array.MArray
import Data.Array.ST
import Data.Function (on)
import Data.IntMap.Strict as M
import Data.List (groupBy)
import Data.Set as S
-- Code from http://rosettacode.org/wiki/Dijkstra's_algorithm#Haskell
dijkstra :: (Ix v, Num w, Ord w, Bounded w) => v -> v -> Array v [(v,w)] -> (Array v w, Array v v)
dijkstra src invalid_index adj_list = runST $ do
min_distance <- newSTArray b maxBound
writeArray min_distance src 0
previous <- newSTArray b invalid_index
let aux vertex_queue =
case S.minView vertex_queue of
Nothing -> return ()
Just ((dist, u), vertex_queue') ->
let edges = adj_list Data.Array.! u
f vertex_queue (v, weight) = do
let dist_thru_u = dist + weight
old_dist <- readArray min_distance v
if dist_thru_u >= old_dist then
return vertex_queue
else do
let vertex_queue' = S.delete (old_dist, v) vertex_queue
writeArray min_distance v dist_thru_u
writeArray previous v u
return $ S.insert (dist_thru_u, v) vertex_queue'
in
foldM f vertex_queue' edges >>= aux
aux (S.singleton (0, src))
m <- freeze min_distance
p <- freeze previous
return (m, p)
where b = bounds adj_list
newSTArray :: Ix i => (i,i) -> e -> ST s (STArray s i e)
newSTArray = newArray
shortest_path_to :: (Ix v) => v -> v -> Array v v -> [v]
shortest_path_to target invalid_index previous =
aux target [] where
aux vertex acc | vertex == invalid_index = acc
| otherwise = aux (previous Data.Array.! vertex) (vertex : acc)
-- Code I wrote
instance Bounded Double where
minBound = -1e100
maxBound = 1e100
constructInput :: Int -> Int -> M.IntMap Integer -> Array Int [(Int, Double)]
constructInput n k specMap =
let
specMap' = fmap (log . fromIntegral) specMap
edges = [(src, [(dest, specMap' M.! dest) | dest <- [src+1..src+k], dest <= n]) | src <- [1..n]]
in
array (1, n) edges
main :: IO ()
main = do
rawInput <- getContents
let
[l, l'] = lines rawInput
[n,k] = fmap read . words $ l
specs = fmap read . words $ l'
specMap = M.fromList $ [1..n] `zip` specs
adj_list = constructInput n k specMap
(_, previous) = dijkstra 1 0 adj_list
path = shortest_path_to n 0 previous
weight = (product $ fmap (specMap M.!) path) `mod` 1000000007
print weight
PS: My program scores 30 with a lot of TLE (short for "Too Long Execution" I guess) on CodeChief, for the full mark you may have to try it yourself and get a better solution.

Recursion confusion in Haskell again - subsets with an inclusion test

I'm testing a simple program to generate subsets with an inclusion test. For example, given
*Main Data.List> factorsets 7
[([2],2),([2,3],1),([3],1),([5],1),([7],1)]
calling chooseP 3 (factorsets 7), I would like to get (read from right to left, a la cons)
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([2],2)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
But my program is returning an extra [([7],1),([5],1),([3],1)] (and missing a [([7],1),([5],1),([2],2)]):
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([3],1)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
The inclusion test is: members' first part of the tuple must have a null intersection.
Once tested as working, the plan is to sum the internal products of each subset's snds, rather than accumulate them.
Since I've asked a similar question before, I imagine that an extra branch is generated since when the recursion splits at [2,3], the second branch runs over the same possibilities once it passes the skipped section. Any pointers on how to resolve that would be appreciated; and if you'd like to share ideas about how to enumerate and sum such product combinations more efficiently, that would be great, too.
Haskell code:
chooseP k xs = chooseP' xs [] 0 where
chooseP' [] product count = if count == k then [product] else []
chooseP' yys product count
| count == k = [product]
| null yys = []
| otherwise = f ++ g
where (y:ys) = yys
(factorsY,numY) = y
f = let zzs = dropWhile (\(fs,ns) -> not . and . map (null . intersect fs . fst) $ product) yys
in if null zzs
then chooseP' [] product count
else let (z:zs) = zzs in chooseP' zs (z:product) (count + 1)
g = if and . map (null . intersect factorsY . fst) $ product
then chooseP' ys product count
else chooseP' ys [] 0
Your code is complicated enough that I might recommend starting over. Here's how I would proceed.
Write a specification. Let it be as stupidly inefficient as necessary -- for example, the spec I choose below will build all combinations of k elements from the list, then filter out the bad ones. Even the filter will be stupidly slow.
sorted xs = sort xs == xs
unique xs = nub xs == xs
disjoint xs = and $ liftM2 go xs xs where
go x1 x2 = x1 == x2 || null (intersect x1 x2)
-- check that x is valid according to all the validation functions in fs
-- (there are other fun ways to spell this, but this is particularly
-- readable and clearly correct -- just what we want from a spec)
allFuns fs x = all ($x) fs
choosePSpec k = filter good . replicateM k where
good pairs = allFuns [unique, disjoint, sorted] (map fst pairs)
Just to make sure it's right, we can test it at the prompt:
*Main> mapM_ print $ choosePSpec 3 [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
[([2],2),([3],1),([5],1)]
[([2],2),([3],1),([7],1)]
[([2],2),([5],1),([7],1)]
[([2,3],1),([5],1),([7],1)]
[([3],1),([5],1),([7],1)]
Looks good.
Now that we have a spec, we can try to improve the speed one refactoring at a time, always checking that it matches the spec. The first thing I'd want to do is notice that we can ensure uniqueness and sortedness just by sorting the input and picking things "in an increasing way". To do this, we can define a function which chooses subsequences of a given length. It piggy-backs on the tails function, which you can think of as nondeterministically choosing a place to split its input list.
subseq 0 xs = [[]]
subseq n xs = do
x':xt <- tails xs
xs' <- subseq (n-1) xt
return (x':xs')
Here's an example of this function in action:
*Main> subseq 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Now we can write a slightly faster chooseP by replacing replicateM with subseq. Recall that we're assuming the inputs are already sorted and unique, though.
choosePSlow k = filter good . subseq k where
good pairs = disjoint $ map fst pairs
We can sanity-check that it's working by running it on the particular input we have from above:
*Main> let i = [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
*Main> choosePSlow 3 i == choosePSpec 3 i
True
Or, better yet, we can stress-test it with QuickCheck. We'll need a tiny bit more code. The condition k < 5 is just because the spec is so hopelessly slow that bigger values of k take forever.
propSlowMatchesSpec :: NonNegative Int -> OrderedList ([Int], Int) -> Property
propSlowMatchesSpec (NonNegative k) (Ordered xs)
= k < 5 && unique (map fst xs)
==> choosePSlow k xs == choosePSpec k xs
*Main> quickCheck propSlowMatchesSpec
+++ OK, passed 100 tests.
There are several more opportunities to make things faster. For instance, the disjoint test could be sped up using choose 2 instead of liftM2; or we might be able to ensure disjointness during element selection and prune the search even earlier; etc. How you want to improve it from here I leave to you -- but the basic technique (start with stupid and slow, then make it smarter, testing as you go) should be helpful to you.

Writing infinite list to skip every factor of p?

How can I efficiently represent the list [0..] \\ [t+0*p, t+1*p ..]?
I have defined:
Prelude> let factors p t = [t+0*p, t+1*p ..]
I want to efficiently represent an infinite list that is the difference of [0..] and factors p t, but using \\ from Data.List requires too much memory for even medium-sized lists:
Prelude Data.List> [0..10000] \\ (factors 5 0)
<interactive>: out of memory
I know that I can represent the values between t+0*p and t+1*p with:
Prelude> let innerList p1 p2 t = [t+p1+1, t+p1+2 .. t+p2-1]
Prelude> innerList 0 5 0
[1,2,3,4]
However, repeatedly calculating and concatenating innerList for increasing intervals seems clumsy.
Can I efficiently represent [0..] \\ (factors p t) without calculating rem or mod for each element?
For the infinite list [0..] \\ [t,t+p..],
yourlist t p = [0..t-1] ++ [i | m <- [0,p..], i <- [t+m+1..t+m+p-1]]
Of course this approach doesn't scale, at all, if you'd want to remove some other factors, like
[0..] \\ [t,t+p..] \\ [s,s+q..] \\ ...
in which case you'll have to remove them in sequence with minus, mentioned in Daniel Fischer's answer. There is no magic bullet here.
But there's also a union, with which the above becomes
[0..] \\ ( [t,t+p..] `union` [s,s+q..] `union` ... )
the advantage is, we can arrange the unions in a tree, and get algorithmic improvement.
You can't use (\\) for that, because
(\\) :: (Eq a) => [a] -> [a] -> [a]
(\\) = foldl (flip delete)
the list of elements you want to remove is infinite, and a left fold never terminates when the list it folds over is infinite.
If you rather want to use something already written than write it yourself, you can use minus from the data-ordlist package.
The performance should be adequate.
Otherwise,
minus :: Ord a => [a] -> [a] -> [a]
minus xxs#(x:xs) yys#(y:ys)
| x < y = x : minus xs yys
| x == y = minus xs ys
| otherwise = minus xss ys
minus xs _ = xs
You can use a list comprehesion with a predicate, using rem:
>>> let t = 0
>>> let p = 5
>>> take 40 $ [ x | x <- [1..], x `rem` p /= t ]
[1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39,41,42,43,44,46,47,48,49]
If you want efficiency, why does your solution have to use list comprehension syntax?
Why not something like this?
gen' n i p | i == p = gen' (n + p) 1 p
gen' n i p = (n+i) : gen' n (i+1) p
gen = gen' 0 1
and then do
gen 5
Because you have ascending lists, you can simply lazily merge them:
nums = [1..]
nogos = factors p t
result = merge nums (dropWhile (<head nums) nogos) where
merge (a:as) (b:bs)
| a < b = a : merge as (b:bs)
| a == b = merge as bs
| otherwise = error "should not happen"
Writing this in a general way so that we have a function that builds the difference of two infinite lists, provided only that they are in ascending order, is left as exercise. In the end, the following should be possible
[1..] `infiniteDifference` primes `infiniteDifference` squares
For this, make it a left associative operator.

Resources