I have two implementations of the function tabulate which, given a function f :: Int -> a and a number n, should produce the list [f 0, f 1, ..., f (n-1)]. I'm trying to work out which one is better in terms of work and span.
tabulate1 :: (Int -> a) -> Int -> [a]
tabulate1 f n = tab (\x -> f (n - x)) n
  where
    tab _ 0 = []
    tab g n = let (x, xs) = (g n) ||| (tab g (n - 1))
              in (x : xs)
tabulate2 :: (Int -> a) -> Int -> [a]
tabulate2 f n = tab f 0 (n - 1)
  where
    tab f n m
      | n > m     = []
      | n == m    = [f n]
      | otherwise = let i      = (n + m) `div` 2
                        (l, r) = (tab f n i) ||| (tab f (i + 1) m)
                    in (l ++ r)
While the first one avoids using (++), which has linear work and span, the second one computes the two halves in parallel but uses (++).
So... which one is better?
Time and space complexity in Haskell is often non-trivial because it is a lazy language. This means that while a function might be O(n!), its result might never be needed and therefore never evaluated. Or, as in this case, if your function returns a list and only the first 3 elements are needed by other functions, only those are evaluated.
Anyway, your function is just a particular case of map, and as such it can be written in a much more readable way:
tabulate f n = map f [0 .. n-1]
map is implemented in terms of a fold and is probably the most optimised version you could get.
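For example, tabulate (*2) 5 evaluates to [0,2,4,6,8].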
Written in Haskell, here is the data type that proves that one list is a permutation of another:
data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
  BelongsHere  :: Belongs x xs (x ': xs)
  BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
  PermutationEmpty :: Permutation '[] '[]
  PermutationCons  :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys
With a Permutation, we can now permute a record:
data Rec :: (u -> *) -> [u] -> * where
  RNil :: Rec f '[]
  (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)

insertRecord :: Belongs x ys zs -> f x -> Rec f ys -> Rec f zs
insertRecord BelongsHere      v rs        = v :& rs
insertRecord (BelongsThere b) v (r :& rs) = r :& insertRecord b v rs

permute :: Permutation xs ys -> Rec f xs -> Rec f ys
permute PermutationEmpty          RNil      = RNil
permute (PermutationCons b pnext) (r :& rs) = insertRecord b r (permute pnext rs)
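For concreteness, a small usage sketch (the example and its names are mine, assuming the usual DataKinds/GADTs/PolyKinds extensions): the permutation that swaps the fields of a two-element record is

swap01 :: Permutation '[a, b] '[b, a]
swap01 = PermutationCons (BelongsThere BelongsHere)       -- the first field ends up in position 1
           (PermutationCons BelongsHere PermutationEmpty) -- the second field ends up in front

and permute swap01 (x :& y :& RNil) evaluates to y :& x :& RNil.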
This works fine. However, permute is O(n^2) where n is the length of the record. I'm wondering if there is a way to get it to be any faster by using a different data type to represent a permutation.
For comparison, in a mutable and untyped setting (which I know is a very different setting indeed), we could apply a permutation to a heterogeneous record like this in O(n) time. You represent the record as an array of values and the permutation as an array of new positions (no duplicates are allowed, and every index must be between 0 and n-1). Applying the permutation is just iterating over that array and indexing into the record's array with those positions.
I don't expect that an O(n) permutation is possible in a more rigorously typed setting. But it seems like O(n*log(n)) might be possible. I appreciate any feedback, and let me know if I need to clarify anything. Also, answers to this can use Haskell, Agda, or Idris, depending on what feels easier to communicate with.
A faster, simpler solution is to compare sorted versions of the two lists.
Given lists A and B, there exist the sorted versions
As = sort(A)
Bs = sort(B)
As is a permutation of A, and Bs is a permutation of B.
If As == Bs, then A is a permutation of B.
Thus this algorithm is O(n log(n)) < O(n²).
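For plain lists (not the typed records above) the check is just, as a sketch:

import Data.List (sort)

isPermutationOf :: Ord a => [a] -> [a] -> Bool
isPermutationOf a b = sort a == sort b   -- O(n log n)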
And this leads to the optimal solution.
Using a different storage format for the permutation yields O(n).
Using the statements from above, we change the storage format of each permutation to hold both:
the sorted data
the original unsorted data
To determine whether one list is a permutation of another, a simple comparison of the sorted data is then enough -> O(n).
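A small sketch of that storage format (the names are mine):

import Data.List (sort)

data Perm a = Perm { original :: [a], sorted :: [a] }

mkPerm :: Ord a => [a] -> Perm a
mkPerm xs = Perm xs (sort xs)          -- O(n log n), paid once per list

isPermOf :: Eq a => Perm a -> Perm a -> Bool
isPermOf p q = sorted p == sorted q    -- O(n) per comparison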
This answers the question correctly, but the effort is hidden in creating the doubled data storage ^^ So whether this is a real advantage or not will depend on the use case.
I decided to solve the first programming assignment from the Stanford algorithms course https://class.coursera.org/algo-005 using Haskell. Although I am very new to the language, I implemented it much faster than in C++. I have 6+ years of work experience in C++, so that impressed me a bit. But the performance is disappointing: 0.19 sec (C++) vs 9.88 sec (Haskell).
How can I improve the performance of the Haskell implementation so that it is comparable to the C++ one?
Here is my Haskell code:
data SortedList = SortedList {
    inversionCount :: Int,
    list :: [Int]
  } deriving (Show)
-- first list accumulator
packm :: Int -> SortedList -> Int -> SortedList
packm x (SortedList count xs) add = SortedList (count + add) (x:xs)
merge2 :: [Int] -> [Int] -> SortedList
merge2 [] xs = SortedList 0 xs
merge2 xs [] = SortedList 0 xs
merge2 xlist@(x:xs) ylist@(y:ys)
  | x < y     = packm x (merge2 xs ylist) 0
  | otherwise = packm y (merge2 xlist ys) $ length xlist
countAndMerge :: SortedList -> SortedList -> SortedList
countAndMerge (SortedList lcount lxs) (SortedList rcount rxs) =
  let merged = merge2 lxs rxs
  in SortedList (lcount + rcount + inversionCount merged) $ list merged
mergesort :: [Int] -> SortedList
mergesort [] = SortedList 0 []
mergesort [x] = SortedList 0 [x]
mergesort xs =
    let leftsorted  = mergesort $ take halfElements xs
        rightsorted = mergesort $ drop halfElements xs
    in countAndMerge leftsorted rightsorted
    where halfElements = length xs `div` 2
main = do
  contents <- getContents
  let intlist = [ read x :: Int | x <- lines contents ]
  print $ inversionCount $ mergesort intlist
The biggest problem is that the asymptotic performance isn't right to begin with; it's O(n^2 * log n) rather than the optimal O(n * log n). The culprit is merge2:
| otherwise = packm y (merge2 xlist ys) $ length xlist
length xlist is O(n). Supposing a random input list, we need to compute length xlist on about half of the merge2 calls, thus making one level of merging O(n^2).
| otherwise = packm y (merge2 xlist ys) $ length xlist
This computes length at every other step of the merge on the average. This makes the whole business quadratic.
If you track the lengths of the lists not by counting elements but by passing the count down from the top level, you restore the O(N log N) behaviour. For a list of 100000 elements this means execution time goes down from 20 seconds to 0.45 second (on my machine, with -O2).
Scaling it further up without changing the algorithm is problematic, because it currently runs in linear stack space and cannot cope with 1 million elements under the default RTS options. If you change mergesort to a bottom-up, merge-adjacent-pairs version, it is likely to run much better.
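A sketch of that change, keeping the names from the question (this is a sketch, not necessarily the exact code whose timings are quoted above): merge2, countAndMerge and mergesort now receive the length of the relevant list instead of recomputing it.

data SortedList = SortedList { inversionCount :: Int, list :: [Int] }

packm :: Int -> SortedList -> Int -> SortedList
packm x (SortedList count xs) add = SortedList (count + add) (x : xs)

-- merge2 now takes the length of its first list as an argument.
merge2 :: Int -> [Int] -> [Int] -> SortedList
merge2 _ [] ys = SortedList 0 ys
merge2 _ xs [] = SortedList 0 xs
merge2 nx xlist@(x:xs) ylist@(y:ys)
  | x < y     = packm x (merge2 (nx - 1) xs ylist) 0
  | otherwise = packm y (merge2 nx xlist ys) nx      -- nx inversions, found in O(1)

countAndMerge :: Int -> SortedList -> SortedList -> SortedList
countAndMerge llen (SortedList lcount lxs) (SortedList rcount rxs) =
  let merged = merge2 llen lxs rxs
  in SortedList (lcount + rcount + inversionCount merged) (list merged)

-- mergesort receives the length of its argument and passes it down.
mergesort :: Int -> [Int] -> SortedList
mergesort _ []  = SortedList 0 []
mergesort _ [x] = SortedList 0 [x]
mergesort len xs =
  let half     = len `div` 2
      (ls, rs) = splitAt half xs
  in countAndMerge half (mergesort half ls) (mergesort (len - half) rs)

-- main would then call: print . inversionCount $ mergesort (length intlist) intlist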
The following (suboptimal) code generates all the subsets of size n of a given set.
This code works but, as I said, is highly suboptimal. Using an intermediate list to avoid the O(log n) cost of Set.insert doesn't seem to help, due to the large cost of later converting the list back to a Set.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set
subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
      | n == 0          = Set.singleton Set.empty
      | Set.size s == n = Set.singleton s
      | otherwise =
          case Set.minView s of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partialN n = doSubsetsOfSizeN n restS
              in Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
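For example, [1,2,3] `choose` 2 evaluates to [[1,2],[1,3],[2,3]].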
This code works but, as I said, is highly unoptimal.
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
Using an intermediate list to avoid the O(log n) cost of Set.insert doesn't seem to help, due to the large cost of later converting the list back to a Set.
Hmm, I found that using lists gives better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic, which can reuse the spine of the set.
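That is, the quoted line becomes:
Set.mapMonotonic (Set.insert firstS) (partialN (n-1))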
And that principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields:
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
  | n == 0                  = Set.singleton Set.empty
  | Set.size s < n || n < 0 = error "onlyLists: out of range n"
  | Set.size s == n         = Set.singleton s
  | otherwise = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                  go n (Set.size s) (Set.toList s)
  where
    go 1 _ xs = map return xs
    go k l (x:xs)
      | k == l    = [x:xs]
      | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.
And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420's.
subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs
First, use a better algorithm.
Look at your final line:
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
Evaluating doSubsetsOfSizeN k (Set.fromList $ 1:2:xs) will involve evaluating doSubsetsOfSizeN (k-1) (Set.fromList xs) twice (once when inserting 1, and once when inserting 2). This duplication is wasteful.
Enter a better algorithm.
mine :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
mine n s | Set.size s < n || n < 0 = Set.empty
         | otherwise               = Set.foldr cons nil s !! n
  where
    nil :: Ord a => [Set.Set (Set.Set a)]
    nil = Set.singleton Set.empty : repeat Set.empty
    cons :: Ord a => a -> [Set.Set (Set.Set a)] -> [Set.Set (Set.Set a)]
    cons x sets = zipWith Set.union sets
                          (Set.empty : map (Set.map $ Set.insert x) sets)
mine 9 (Data.Set.fromList [0..18]) `seq` () is faster than subsetsOfSizeN 9 (Data.Set.fromList [0..18]) `seq` () and should have better asymptotic performance.
I haven't tried optimising this any further. There may be a better algorithm still.
(If the cost of insert and fromList are issues, you should consider giving back a list of lists instead of a set of sets.)
I found this; maybe it can help you.
f [] = [[1]]
f l  = (:) [u] l'
  where
    u  = succ (head (head l))
    l' = (++) l (map (\x -> (:) u x) l)

fix f n = if n == 0 then [] else f (fix f (n-1))
To test it:
length (fix f 10)  =>  1023  -- the empty set is not included; counting it would give 1024
I am wondering how to implement the following efficiently:
A hash map that maps each subset of a set to another set. For example, for the set {1, 2}, the insertions into the map are {1} -> {100}, {2} -> {100}, {1, 2} -> {100} (the empty set is not considered).
The insertion complexity is about 2^|X|, since that is the number of subsets of X (members of the power set of X) to insert (|X| stands for the cardinality of the set X).
How can I use Haskell's Map and Set to implement this efficiently? Or should I construct new data structures and schemes on top of the existing Map and Set?
To be more specific:
Insertions: take every subset of the key set and insert a mapping from each subset to the value set; if a key is already present, union the new value set with the existing one.
Lookups: given a set A, first get all the subsets of A; then, for each subset, look it up in the SetMap, and union the results as the return value.
For example:
Insertions:
If we first insert {1, 2} -> {100} into an empty map M, the insertions are ({1}, {100}), ({2}, {100}), ({1,2}, {100}).
If we then insert {2,3} -> {200}, then for each subset of {2, 3}: if the key is already in M, union {200} with its value; if not, just set the value to {200}. So the map will now look like ({1}, {100}), ({2}, {100,200}), ({3}, {200}), ({1,2}, {100}), ({2,3}, {200}).
Lookups:
Looking up {2,3} works almost like insertion: first get the subsets of {2,3}, which are {2}, {3}, {2,3}; for each of those subsets, look it up in M; then union all the results, which gives {100,200} union {200} union {200} = {100,200} (after the two insertions above). (Union with the empty set if a key is not found.)
Also, as an aside: how can the power set be generated efficiently?
Here is an observation: when looking up a set, it is sufficient to look up each of its elements and union those results together. I therefore propose that we keep two pieces of data: the insertions, and a mapping from elements to the sets that have those elements.
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
flatLookup :: Ord k => Map k (Set v) -> k -> [v]
flatLookup m k = maybe [] S.toList (M.lookup k m)
type SetMap k v = (Map (Set k) (Set v), Map k (Set (Set k)))
emptySM :: SetMap k v
emptySM = (M.empty, M.empty)
insertSM :: Ord k => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values (setMap, elemMap) = (setMap', elemMap') where
  setMap'  = M.insert keys values setMap
  newElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
  elemMap' = M.unionWith S.union elemMap newElems

lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (setMap, elemMap) = S.fromList $
  S.toList keys >>= flatLookup elemMap >>= flatLookup setMap

deleteSM :: Ord k => Set k -> SetMap k v -> SetMap k v
deleteSM keys (setMap, elemMap) = (setMap', elemMap') where
  setMap'  = M.delete keys setMap
  delElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
  elemMap' = M.unionWith S.difference elemMap delElems
In ghci:
*Main> let x = insertSM (S.fromList [2,3]) (S.fromList [200]) . insertSM (S.fromList [1,2]) (S.fromList [100]) $ emptySM
*Main> lookupSM (S.fromList [2,3]) x
fromList [100,200]
So at least your test-case passes. =)
Insertion is pretty cheap. Suppose you've already inserted n keysets of average size m and the new set has size k. The expensive part is doing k insertions into the second map. Finding the right place to insert does O(log n) comparisons of sets of size m, so the top-level insertion takes O(km*log n) time.
Lookup is not terrible, either. Suppose you've already inserted n keysets of average size m. The expensive part is looking up the keysets that your keys participate in; in the worst case, we must look up all n keysets. This takes O(nm*log n) time total.
If you don't care about removing elements, this can be simplified significantly (and the complexity improved, as well):
type SetMap' k v = Map k (Set v)
emptySM' :: SetMap' k v
emptySM' = M.empty
insertSM' :: (Ord k, Ord v) => Set k -> Set v -> SetMap' k v -> SetMap' k v
insertSM' keys values setMap = M.unionWith S.union setMap newMap where
  newMap = M.fromList [(k, values) | k <- S.toList keys]
lookupSM' :: (Ord k, Ord v) => Set k -> SetMap' k v -> Set v
lookupSM' keys setMap = S.fromList $ S.toList keys >>= flatLookup setMap
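A quick ghci check of the simplified version, mirroring the session above (my own example):
*Main> let y = insertSM' (S.fromList [2,3]) (S.fromList [200]) . insertSM' (S.fromList [1,2]) (S.fromList [100]) $ emptySM'
*Main> lookupSM' (S.fromList [2,3]) y
fromList [100,200]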
A shameless rip-off of @Daniel Wagner's solution, with the counting-set implementation I suggested:
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
import Control.Arrow ( (&&&) )
type SetMap k v = (Map k (Map v Int), Map (Set k) (Set v))

emptySM :: SetMap k v
emptySM = (M.empty, M.empty)

insertSM :: (Ord k, Ord v) => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values setMap = (keyMap', keySetMap') where
  (keyMap, keySetMap) = deleteSM keys setMap
  keyMap' = M.unionWith (M.unionWith (+)) keyMap newKeyMap where
    newKeyMap = fromSet keys . fromSet values $ 1
  keySetMap' = M.insert keys values keySetMap

lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (keyMap, _) = S.fromList $ S.toList keys >>= flatLookup keyMap where
  flatLookup m k = maybe [] M.keys . M.lookup k $ m

deleteSM :: (Ord k, Ord v) => Set k -> SetMap k v -> SetMap k v
deleteSM keys setMap@(keyMap, keySetMap) = maybe setMap setMap' valuesIndex where
  setMap' = keyMap' &&& keySetMap'
  valuesIndex = M.lookupIndex keys keySetMap
  keyMap' i = differenceUntil (differenceUntil (-) (==0)) M.null keyMap oldKeyMap where
    values    = snd . M.elemAt i $ keySetMap
    oldKeyMap = fromSet keys . fromSet values $ 1
  keySetMap' i = M.deleteAt i keySetMap

differenceUntil :: Ord k => (a -> b -> a) -> (a -> Bool) -> Map k a -> Map k b -> Map k a
differenceUntil f p = M.differenceWith $
  \a b -> let a' = f a b in if p a' then Nothing else Just a'

fromSet :: Ord k => Set k -> v -> Map k v
fromSet s v = M.fromDistinctAscList [(k, v) | k <- S.toAscList s]
Ok, so you are saying that when you insert a set into this map, all the subsets of the set get mapped as well.
You have analyzed that the insertion complexity, if you use a Data.Map, is O(2^n) where n is the size of the set to insert. That's about right; it's actually O(n 2^n) because insertion is O(log n).
Therefore, if you want to avoid exponential time, you can't take this route. Inserting all the subsets when you insert a set will take exponential time. So we have to figure something else out. What sorts of operations do you need on the map? I will assume that you just need lookup: i.e. find the value assigned to a given set. The choice of solution always depends on both your insertion and your lookup interface.
Other things to consider: is it okay if insertion is more expensive than lookup? Or the other way around? What are the usage patterns?
So we have two operations (I will call the type we are designing SetMap). Also note that all this code will be pseudocode: there will probably be some class constraints on the variables, and I'm assuming the existence of functions whose semantics correspond to their names.
insert :: Set a -> b -> SetMap a b -> SetMap a b
lookup :: SetMap a b -> Set a -> Maybe b
We could implement SetMap a b = [(Set a, b)], and then insertion would be O(1) and lookup would be O(n m) (where n is the number of items inserted and m is the size of the key being looked up). Not great, obviously.
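As a sketch (shadowing the Prelude's lookup; this is just the naive baseline described above, with lookup doing an exact match on the key set):

type SetMap a b = [(Set a, b)]

insert :: Set a -> b -> SetMap a b -> SetMap a b
insert k v m = (k, v) : m                 -- O(1)

lookup :: Eq a => SetMap a b -> Set a -> Maybe b
lookup m x =                              -- O(n m): n entries, O(m) per set comparison
  case [v | (k, v) <- m, k == x] of
    (v : _) -> Just v
    []      -> Nothing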
What if you organized the sets in the map in a tree, by unions?
data SetMap a b
  = Branch (Set a) [SetMap a b]
  | Leaf b
So, for example, the root of the tree would be the union of every key in the map. Then, to look up a key in the map, you check whether the key is a subset of the root; if so, you try to look it up in each of the children; otherwise, you fail:
lookup :: SetMap a b -> Set a -> Maybe b
lookup (Branch s children) x
  | x `subset` s = msum $ map (`lookup` x) children
  | otherwise    = Nothing
lookup (Leaf b) x = Just b
I am not quite sure how to analyze the complexity of this data structure, it kind of depends on the shape of your sets.
I'm sure there are other options, that's the first thing that came to mind. I'd be happy to help more on this problem (perhaps on a new SO question), but could you describe what you are doing in more detail? It's hard to tell the constraints of your problem from your limited description. Don't be shy, describe your goal in practical terms; a lot of times I reduce my problems to the need for a certain data structure, when the best solution comes from taking a step back and re-expressing the problem so I don't need that data structure.