It could very well be that the answer to this question is an obvious and resounding "there's no such thing", but I'll give it a shot: Is there a functional map-like data structure that is more efficient than a standard map when the keys have an arbitrary, often very big, size?
For the sake of concreteness, consider the Haskell type
(Ord k) => Map [k] v
in which lookups can take a very long time if lists need to be compared down to a deep level. I guess hashing is also out of the question due to the arbitrary length of the lists. I still can't help but think there could be a clever data structure out there.
Is hashing out of the question? There's no prefix of the key structure that can be computed efficiently?
If not, how about a hashmap? Take the very big key, reduce it to something very small, use that as an index into a structure.
The hashmap package on Hackage.
Johan Tibell's talk on hash tree structures
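For example, a minimal sketch (mine, not the answerer's) using Data.HashMap.Strict from unordered-containers; the hashable package provides a Hashable instance for lists, so a long key is hashed in a single pass, and full list comparisons only happen on hash collisions:

import qualified Data.HashMap.Strict as HM

-- Keys are hashed once; equality on the lists is only checked
-- when two hashes collide.
table :: HM.HashMap [Int] String
table = HM.fromList [([1,2,3], "alice"), ([4,5,6], "bob")]

-- HM.lookup [1,2,3] table == Just "alice"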
A trie?
If you have two long keys that are almost identical, a Map will compare them both from the beginning, but a trie will only compare the suffixes that haven't already been eliminated by previous comparisons (if you see what I mean). So a trie would be more time-efficient in that situation.
Tries can be optimised in various ways, and you might also want to look at ternary search trees.
Here's one:
module ListMap where

import qualified Data.Map as M
import Data.Map (Map)

data ListMap k v = ListMap { ifEmpty :: Maybe v, ifFull :: Map k (ListMap k v) }

empty :: ListMap k v
empty = ListMap Nothing M.empty

singleton :: [k] -> v -> ListMap k v
singleton []     v = ListMap.empty { ifEmpty = Just v }
singleton (k:ks) v = ListMap.empty { ifFull = M.singleton k (ListMap.singleton ks v) }

lookup :: Ord k => [k] -> ListMap k v -> Maybe v
lookup []     lm = ifEmpty lm
lookup (k:ks) lm = M.lookup k (ifFull lm) >>= ListMap.lookup ks

insert :: Ord k => [k] -> v -> ListMap k v -> ListMap k v
insert []     v lm = lm { ifEmpty = Just v }
insert (k:ks) v lm = lm { ifFull = M.alter (Just . insertion) k (ifFull lm) }
  where insertion = maybe (ListMap.singleton ks v) (ListMap.insert ks v)
It's essentially creating a prefix tree on the list elements so you only compare as far as necessary.
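A quick usage check (my example, not part of the answer); keys sharing the prefix "abcd" live along a single path, so a lookup only pays for the distinct suffix:

ghci> let lm = ListMap.insert "abcde" 1 (ListMap.singleton "abcdf" 2)
ghci> ListMap.lookup "abcde" lm
Just 1
ghci> ListMap.lookup "abcdf" lm
Just 2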
I'm studying the Floyd-Warshall algorithm. I've managed to implement it in Haskell, but my implementation mirrors how it is done in imperative languages (that is, it uses a list of lists to simulate a 2D array). This is really inefficient, given that accessing an element in a list is much slower than in an array.
Is there a smarter way to do this in Haskell? I thought I could do it by concatenating some lists, but I keep failing.
My Code:
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall lst = fwAlg 1 $ initMatrix 0 $ list2matrix lst

fwAlg :: Int -> [[Weight]] -> [[Weight]]
fwAlg k m
  | k < rows m = let n  = rows m
                     m' = foldl (\m (i,j) -> updateDist i j k m) m
                                [(i,j) | i <- [0..n-1], j <- [0..n-1]]
                 in fwAlg (k+1) m'
  | otherwise  = m

-- a special case where k is 0
initMatrix :: Int -> [[Weight]] -> [[Weight]]
initMatrix n m = if n == rows m then m else initMatrix (n+1) $ updateAtM 0.0 (n,n) m

updateDist :: Int -> Int -> Int -> [[Weight]] -> [[Weight]]
updateDist i j k m =
  let w = min (weight i j m) (weight i k m + weight k j m)
  in updateAtM w (i, j) m

weight :: Vertice -> Vertice -> [[Weight]] -> Weight
weight i j m = let Just w = elemAt (i, j) m in w
The algorithm has a regular access pattern so we can avoid a lot of
indexing and still write it with lists, with (I think) the same
asymptotic performance as the imperative version.
If you do want to use arrays for more speed, you might still want to do
something similar to this with bulk operations on rows and columns
rather than reading and writing individual cells.
import Data.List (transpose)

-- Let's have a type for weights. We could use Maybe but the ordering
-- behaviour is wrong - when there's no weight it should be like
-- +infinity.
data Weight = Weight Int | None deriving (Eq, Ord, Show)

addWeights :: Weight -> Weight -> Weight
addWeights (Weight x) (Weight y) = Weight (x + y)
addWeights _ _ = None

-- The main function just steps the matrix a number of times equal to
-- the node count. Also pass along k at each step.
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall m = snd (iterate step (0, m) !! length m)

-- step takes k and the matrix for k, returns k+1 and the matrix for
-- k+1.
step :: (Int, [[Weight]]) -> (Int, [[Weight]])
step (k, m) = (k + 1, zipWith (stepRow ktojs) istok m)
  where
    ktojs = m !! k           -- current k to each j
    istok = transpose m !! k -- each i to current k

-- Make shortest paths from one i to all j.
-- We need the shortest paths from the current k to all j,
-- the shortest path from this i to the current k,
-- and the shortest paths from this i to all j.
stepRow :: [Weight] -> Weight -> [Weight] -> [Weight]
stepRow ktojs itok itojs = zipWith stepOne itojs ktojs
  where
    stepOne itoj ktoj = itoj `min` (itok `addWeights` ktoj)

-- example from Wikipedia for testing
test :: [[Weight]]
test = [[Weight 0, None,        Weight (-2), None],
        [Weight 4, Weight 0,    Weight 3,    None],
        [None,     None,        Weight 0,    Weight 2],
        [None,     Weight (-1), None,        Weight 0]]
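As a sanity check (mine, not part of the original answer), this reproduces the known all-pairs distances for the Wikipedia example graph:

ghci> mapM_ print (floydwarshall test)
[Weight 0,Weight (-1),Weight (-2),Weight 0]
[Weight 4,Weight 0,Weight 2,Weight 4]
[Weight 5,Weight 1,Weight 0,Weight 2]
[Weight 3,Weight (-1),Weight 1,Weight 0]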
I don't know how to achieve peak performance, but I can give you some tips on making your code abstract so that you can work on performance tuning more easily.
First of all, it would be nice if, when you change around your data types, you don't have to rewrite everything. Right now, you've made everything concretely about lists of lists, so let's see if we can abstract that out. First, we have to figure out what your minimal matrix interface is. Glancing at your code, you appear to have initMatrix, list2matrix, rows, elemAt, and updateAtM. These are the functions that query or modify your matrix, and these are what you would need to implement to make a new version of this code for a different Matrix type.
One way to organize this interface is to make a class out of it. For instance:
class Matrix m where
  list2matrix :: [[a]] -> m a
  matrix2List :: m a -> [[a]]
  rows        :: m a -> Int
  elemAt      :: Int -> Int -> m a -> a
  updateAtM   :: a -> (Int, Int) -> m a -> m a
  setDiag     :: a -> m a -> m a
(I went ahead and added a matrix2List function for extracting your result and renamed/modified initMatrix into setDiag, which felt a little more general.)
We can then update your code to use this new class:
floydwarshall :: Matrix m => [[Weight]] -> m Weight
floydwarshall lst = fwAlg 1 $ initMatrix $ list2matrix lst

fwAlg :: Matrix m => Int -> m Weight -> m Weight
fwAlg k m
  | k < rows m = let n  = rows m
                     m' = foldl (\m (i,j) -> updateDist i j k m) m
                                [(i,j) | i <- [0..n-1], j <- [0..n-1]]
                 in fwAlg (k+1) m'
  | otherwise  = m

initMatrix :: Matrix m => m Weight -> m Weight
initMatrix = setDiag 0

updateDist :: Matrix m => Int -> Int -> Int -> m Weight -> m Weight
updateDist i j k m =
  let w = min (elemAt i j m) (elemAt i k m + elemAt k j m)
  in updateAtM w (i, j) m

dist :: Matrix m => Int -> Int -> Int -> m Weight -> Weight
dist i j 0 m = elemAt i j m
dist i j k m = min (dist i j (k-1) m) (dist i k (k-1) m + dist k j (k-1) m)
Now all we need to do is start defining some Matrix types and see how performance is!
Let's start with lists, since you've already done this work. We'll have to use a newtype wrapper to make GHC happy, but ignoring the wrapping and unwrapping, this is morally the same as the code you wrote:
newtype ListMatrix a = ListMatrix { getListMatrix :: [[a]] }

instance Matrix ListMatrix where
  list2matrix = ListMatrix
  matrix2List = getListMatrix
  rows = length . getListMatrix
  elemAt i j (ListMatrix m) = m !! i !! j
  updateAtM a (i,j) (ListMatrix m) =
    let (firstRows, row:laterRows) = splitAt i m
        (firstCols, _:laterCols)   = splitAt j row
    in ListMatrix $ firstRows <> ((firstCols <> (a:laterCols)) : laterRows)
  setDiag x = go 0
    where go n m = if n == rows m then m else go (n+1) $ updateAtM x (n,n) m
(Also, I filled in elemAt and updateAtM.) You should be able to run
matrix2List @ListMatrix $ floydwarshall myList
and get the same result (and performance) that you currently have.
Now, on to the experimentation! All that's necessary is for us to define new instances of Matrix and see what happens. Perhaps we should try pure functions:
data FunMatrix a = FunMatrix { size :: Int, getFunMatrix :: Int -> Int -> a }

instance Matrix FunMatrix where
  list2matrix l = FunMatrix (length l) (\i j -> l !! i !! j)
  matrix2List (FunMatrix s f) = (\i -> f i <$> [0..s-1]) <$> [0..s-1]
  rows = size
  elemAt i j m = getFunMatrix m i j
  updateAtM a (i,j) (FunMatrix s f) =
    FunMatrix s (\i' j' -> if i==i' && j==j' then a else f i' j')
  setDiag x (FunMatrix s f) = FunMatrix s (\i j -> if i==j then x else f i j)
How does that perform? One problem is that the starting lookup function is still just indexing into the list of lists, which is slow. One fix would be to convert to an array or vector first and then index. Because we've nicely abstracted everything, all that would need to change is the definition of list2matrix right here, and you'll probably get a nice performance boost!
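For instance (my sketch, not from the original answer), the FunMatrix instance's list2matrix could build a Data.Vector up front, so the captured lookup function is O(1):

import qualified Data.Vector as V

-- In the Matrix FunMatrix instance: the closure now captures a
-- vector of vectors, so each lookup is O(1) instead of O(i + j).
list2matrix l = FunMatrix (length l) (\i j -> v V.! i V.! j)
  where v = V.fromList (map V.fromList l)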
On the topic of performance, there's one other note I can point out. The definition of dist does some serious "dynamic programming". This could work fine if you were writing and reading directly into an array, but in this recursive form, you may end up doing a lot of duplicate work. One fix is to memoize. My go-to memoization package is MemoTrie, which makes it really easy to memoize things. In this case, you could change dist to:
import Data.MemoTrie (memo3)

dist :: Matrix m => m Weight -> Int -> Int -> Int -> Weight
dist m = go'
  where
    go' = memo3 go
    go i j 0 = elemAt i j m
    go i j k = min (go' i j (k-1)) (go' i k (k-1) + go' k j (k-1))
That might give you a bit of a boost!
You might consider taking @chi's advice and using STUArray, but you'll run into a problem: the STUArray interface requires that array lookups happen in a monad. It's still possible to use the abstraction method I show off above, but you'll have to change the types of the functions. And, because you change the types in the interface, you'll need to update your algorithm code to be monadic. It can be a bit of a pain, but it might be necessary to get optimal performance.
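For the record, here is one possible shape such a monadic interface could take (purely my sketch; the class and method names are made up, not from any library):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, readArray, writeArray)

-- A matrix interface whose reads and writes live in a monad m.
class Monad m => MatrixM m mat | mat -> m where
  rowsM   :: mat -> m Int
  readAt  :: Int -> Int -> mat -> m Double
  writeAt :: Double -> (Int, Int) -> mat -> m ()

-- Pair the array with its dimension so rowsM needs no bounds query.
instance MatrixM (ST s) (Int, STUArray s (Int, Int) Double) where
  rowsM (n, _)            = pure n
  readAt i j (_, a)       = readArray a (i, j)
  writeAt x (i, j) (_, a) = writeArray a (i, j) x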
Imagine an array that associates numbers and words: exactly one number for each word, and vice versa.
dic = [0: 'food',
1: 'dinner',
2.5: 'breakfast',
...]
Now I want to access dic[0] and get 'food', and something['food'] and get 0. Is there any kind of reversible hashtable in the wild? As far as I know, only keeping a duplicate can solve this problem.
Yes, this is generally done by combining two maps. Modifications are a bit tricky, because they have to maintain the one-to-one rule.
Let's start with a class for finite maps, and a couple sample instances for containers types:
{-# LANGUAGE
FunctionalDependencies
, FlexibleInstances
, UndecidableInstances
, GeneralizedNewtypeDeriving #-}
module Mappy where
import Prelude hiding (lookup)
-- Example base maps
import qualified Data.Map.Strict as M
import Data.Map (Map)
import qualified Data.IntMap.Strict as IM
import Data.IntMap (IntMap)
class Mappy k v m | m -> k v where
  empty  :: m
  insert :: k -> v -> m -> m
  delete :: k -> m -> m
  lookup :: k -> m -> Maybe v

instance Ord k => Mappy k a (Map k a) where
  empty  = M.empty
  insert = M.insert
  delete = M.delete
  lookup = M.lookup

instance Mappy Int a (IntMap a) where
  empty  = IM.empty
  insert = IM.insert
  delete = IM.delete
  lookup = IM.lookup
Now we can build our type for bidirectional maps:
data Bimap m n = Bimap !m !n

instance Show m => Show (Bimap m n) where
  showsPrec p (Bimap m _) = showParen (p > 10) $
    showString "Bimap " . showsPrec 11 m

invert :: Bimap m n -> Bimap n m
invert (Bimap m n) = Bimap n m

instance (Mappy k v kv, Mappy v k vk) => Mappy k v (Bimap kv vk) where
  empty = Bimap empty empty
  -- To maintain the one-to-one invariant, we must evict any stale
  -- pairing in both directions before inserting the new one.
  insert k v (Bimap kv vk) = Bimap (insert k v kv') (insert v k vk')
    where
      kv' = maybe kv (`delete` kv) (lookup v vk)
      vk' = maybe vk (`delete` vk) (lookup k kv)
  delete k m@(Bimap kv vk)
    | Just v <- lookup k kv = Bimap (delete k kv) (delete v vk)
    | otherwise             = m
  lookup k (Bimap kv _) = lookup k kv
We can also define some wrappers to make it easier to write our desired map types.
newtype MapMap k v = MapMap (Bimap (Map k v) (Map v k)) deriving (Show, Mappy k v)
newtype IMM v = IMM (Bimap (IntMap v) (Map v Int)) deriving (Show, Mappy Int v)
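And a quick usage sketch for the asker's dictionary (my example, not from the answer):

type Dic = Bimap (Map Double String) (Map String Double)

dic :: Dic
dic = insert 2.5 "breakfast" $ insert 1 "dinner" $ insert 0 "food" empty

-- lookup 0 dic               == Just "food"
-- lookup "food" (invert dic) == Just 0.0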
Written in Haskell, here is the data type that proves that one list is a permutation of another:
data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
  BelongsHere  :: Belongs x xs (x ': xs)
  BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
  PermutationEmpty :: Permutation '[] '[]
  PermutationCons  :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys
With a Permutation, we can now permute a record:
data Rec :: (u -> *) -> [u] -> * where
  RNil :: Rec f '[]
  (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)

insertRecord :: Belongs x ys zs -> f x -> Rec f ys -> Rec f zs
insertRecord BelongsHere      v rs        = v :& rs
insertRecord (BelongsThere b) v (r :& rs) = r :& insertRecord b v rs

permute :: Permutation xs ys -> Rec f xs -> Rec f ys
permute PermutationEmpty          RNil      = RNil
permute (PermutationCons b pnext) (r :& rs) = insertRecord b r (permute pnext rs)
This works fine. However, permute is O(n^2) where n is the length of the record. I'm wondering if there is a way to get it to be any faster by using a different data type to represent a permutation.
For comparison, in a mutable and untyped setting (which I know is a very different setting indeed), we could apply a permutation to a heterogeneous record like this in O(n) time. You represent the record as an array of values and the permutation as an array of new positions (no duplicates are allowed, and every position must be between 0 and n-1). Applying the permutation is just iterating over that array and indexing into the record's array at those positions.
I don't expect that an O(n) permutation is possible in a more rigorously typed setting. But it seems like O(n*log(n)) might be possible. I appreciate any feedback, and let me know if I need to clarify anything. Also, answers can use Haskell, Agda, or Idris, depending on what feels easiest to communicate with.
A simpler, faster approach is to compare sorted copies of the two lists.
Given permutations A and B,
there exist the sorted permutations
As = sort(A)
Bs = sort(B)
As is a permutation of A, and Bs is a permutation of B.
If As == Bs, then A is a permutation of B.
Thus this algorithm runs in O(n log(n)), which beats O(n²).
And this leads toward the optimal solution.
Using a different representation of permutations yields O(n)
Following the statements above, we change the stored format of each permutation to include both
the sorted data
the original unsorted data
To determine whether one list is a permutation of another, a simple comparison of the sorted data is then all that is necessary: O(n).
This answers the question correctly, but the effort is hidden in building the doubled data storage, so whether this is a real advantage depends on the use case.
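Rendered in Haskell, the sorted-comparison check is a two-liner (my sketch):

import Data.List (sort)

-- O(n log n): two lists are permutations of one another exactly
-- when their sorted forms coincide.
isPermutationOf :: Ord a => [a] -> [a] -> Bool
isPermutationOf xs ys = sort xs == sort ys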
The following (suboptimal) code generates all the subsets of size N of a given set.
This code works but, as I said, it is highly suboptimal. Using an intermediate list to avoid the O(log n) of Set.insert doesn't seem to help, due to the large cost of later converting the list back to a Set.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set

subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
      | n == 0          = Set.singleton Set.empty
      | Set.size s == n = Set.singleton s
      | otherwise =
          case Set.minView s of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partialN n = doSubsetsOfSizeN n restS
              in Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
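For example:

ghci> [1,2,3] `choose` 2
[[1,2],[1,3],[2,3]]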
This code works but, as I said, is highly unoptimal.
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem help due to the large cost of later reconverting the list to a Set.
Hmm, I found that using lists gives better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic that can reuse the spine of the set.
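That is, the offending line becomes:

Set.mapMonotonic (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n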
And that principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
  | n == 0                  = Set.singleton Set.empty
  | Set.size s < n || n < 0 = error "onlyLists: out of range n"
  | Set.size s == n         = Set.singleton s
  | otherwise = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                  go n (Set.size s) (Set.toList s)
  where
    go 1 _ xs = map return xs
    go k l (x:xs)
      | k == l    = [x:xs]
      | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.
And that, in turn, is in my criterion benchmarks nearly twice as fast as dave4420's:
subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs
First, use a better algorithm.
Look at your final line:
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
Evaluating doSubsetsOfSizeN k (Set.fromList $ 1:2:xs) will involve evaluating doSubsetsOfSizeN (k-1) (Set.fromList xs) twice (once when inserting 1, and once when inserting 2). This duplication is wasteful.
Enter a better algorithm.
mine :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
mine n s
  | Set.size s < n || n < 0 = Set.empty
  | otherwise               = Set.foldr cons nil s !! n
  where
    nil :: Ord a => [Set.Set (Set.Set a)]
    nil = Set.singleton Set.empty : repeat Set.empty
    cons :: Ord a => a -> [Set.Set (Set.Set a)] -> [Set.Set (Set.Set a)]
    cons x sets = zipWith Set.union sets
                          (Set.empty : map (Set.map $ Set.insert x) sets)
mine 9 (Data.Set.fromList [0..18]) `seq` () is faster than subsetsOfSizeN 9 (Data.Set.fromList [0..18]) `seq` () and should have better asymptotic performance.
I haven't tried optimising this any further. There may be a better algorithm still.
(If the cost of insert and fromList are issues, you should consider giving back a list of lists instead of a set of sets.)
I found this; maybe it can help you:
f [] = [[1]]
f l  = [u] : l'
  where
    u  = succ (head (head l))
    l' = l ++ map (u:) l

fix f n = if n == 0 then [] else f (fix f (n-1))
To test it:
ghci> length (fix f 10)
1023  -- the empty set is never included; counting it would give 1024
I am wondering how to implement the following efficiently:
A hash map that maps every subset of a set to another set. For example, for the set {1, 2}, the insertions into the map are {1} -> {100}, {2} -> {100}, {1, 2} -> {100} (the empty set is not considered).
The insertion complexity is thus about 2^|X|, the number of non-empty subsets of X to insert (|X| stands for the cardinality of the set X).
How can Haskell's map and set be used to implement this efficiently? Or can a new data structure and scheme be built on top of the existing map and set?
To be more specific:
Insertion: take every subset of the key set and insert a mapping from each subset to the value set; if a key is already there, union the new value set with the existing one.
Lookup: given a set A, first get all the subsets of A, then look each one up in the SetMap, and union the results as the return value.
For example:
First insert {1, 2} -> {100} into an empty map M.
Insertions:
The insertions are then ({1}, {100}), ({2}, {100}), ({1,2}, {100}).
If we further insert {2,3} -> {200}, then for each subset of {2, 3}: if the key is already in M, union {200} with that value; if not, just set the value to {200}. The map will now look like ({1}, {100}), ({2}, {100,200}), ({3}, {200}), ({1,2}, {100}), ({2,3}, {200}).
Lookups:
Looking up {2,3} works much like insertion: first get the subsets of {2,3}, which are {2}, {3}, {2,3}; look each of them up in M; and union all the results: {100,200} union {200} union {200} = {100,200} (treating a missing key as the empty set).
Also, an aside: how do I efficiently generate the powerset?
Here is an observation: when looking up a set, it is sufficient to look up each of its elements and union those results together. I therefore propose that we keep two pieces of data: the insertions, and a mapping from elements to the sets that have those elements.
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
flatLookup :: Ord k => Map k (Set v) -> k -> [v]
flatLookup m k = maybe [] S.toList (M.lookup k m)
type SetMap k v = (Map (Set k) (Set v), Map k (Set (Set k)))
emptySM :: SetMap k v
emptySM = (M.empty, M.empty)
insertSM :: Ord k => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values (setMap, elemMap) = (setMap', elemMap') where
  setMap'  = M.insert keys values setMap
  newElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
  elemMap' = M.unionWith S.union elemMap newElems

lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (setMap, elemMap) = S.fromList $
  S.toList keys >>= flatLookup elemMap >>= flatLookup setMap

deleteSM :: Ord k => Set k -> SetMap k v -> SetMap k v
deleteSM keys (setMap, elemMap) = (setMap', elemMap') where
  setMap'  = M.delete keys setMap
  delElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
  elemMap' = M.unionWith S.difference elemMap delElems
In ghci:
*Main> let x = insertSM (S.fromList [2,3]) (S.fromList [200]) . insertSM (S.fromList [1,2]) (S.fromList [100]) $ emptySM
*Main> lookupSM (S.fromList [2,3]) x
fromList [100,200]
So at least your test-case passes. =)
Insertion is pretty cheap. Suppose you've already inserted n keysets of average size m and the new set has size k. The expensive part is doing k insertions into the second map. Finding the right place to insert does O(log n) comparisons of sets of size m, so the top-level insertion takes O(km*log n) time.
Lookup is not terrible, either. Suppose you've already inserted n keysets of average size m. The expensive part is looking up the keysets that your keys participate in; in the worst case, we must look up all n keysets. This takes O(nm*log n) time total.
If you don't care about removing elements, this can be simplified significantly (and the complexity improved, as well):
type SetMap' k v = Map k (Set v)

emptySM' :: SetMap' k v
emptySM' = M.empty

insertSM' :: (Ord k, Ord v) => Set k -> Set v -> SetMap' k v -> SetMap' k v
insertSM' keys values setMap = M.unionWith S.union setMap newMap where
  newMap = M.fromList [(k, values) | k <- S.toList keys]

lookupSM' :: (Ord k, Ord v) => Set k -> SetMap' k v -> Set v
lookupSM' keys setMap = S.fromList $ S.toList keys >>= flatLookup setMap
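The GHCi session from before behaves the same with the primed variants (my check):

*Main> let x = insertSM' (S.fromList [2,3]) (S.fromList [200]) . insertSM' (S.fromList [1,2]) (S.fromList [100]) $ emptySM'
*Main> lookupSM' (S.fromList [2,3]) x
fromList [100,200]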
A shameless rip-off of @Daniel Wagner's solution with the counting set implementation I suggested:
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
import Control.Arrow ( (&&&) )
type SetMap k v = (Map k (Map v Int), Map (Set k) (Set v))

emptySM :: SetMap k v
emptySM = (M.empty, M.empty)

insertSM :: (Ord k, Ord v) => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values setMap = (keyMap', keySetMap') where
  (keyMap, keySetMap) = deleteSM keys setMap
  keyMap' = M.unionWith (M.unionWith (+)) keyMap newKeyMap where
    newKeyMap = fromSet keys . fromSet values $ 1
  keySetMap' = M.insert keys values keySetMap

lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (keyMap, _) = S.fromList $ S.toList keys >>= flatLookup keyMap where
  flatLookup m k = maybe [] M.keys . M.lookup k $ m

deleteSM :: (Ord k, Ord v) => Set k -> SetMap k v -> SetMap k v
deleteSM keys setMap@(keyMap, keySetMap) = maybe setMap setMap' valuesIndex where
  setMap'     = keyMap' &&& keySetMap'
  valuesIndex = M.lookupIndex keys keySetMap
  keyMap' i = differenceUntil (differenceUntil (-) (==0)) M.null keyMap oldKeyMap where
    values    = snd . M.elemAt i $ keySetMap
    oldKeyMap = fromSet keys . fromSet values $ 1
  keySetMap' i = M.deleteAt i keySetMap

differenceUntil :: Ord k => (a -> b -> a) -> (a -> Bool) -> Map k a -> Map k b -> Map k a
differenceUntil f p = M.differenceWith $
  \a b -> let a' = f a b in if p a' then Nothing else Just a'

fromSet :: Ord k => Set k -> v -> Map k v
fromSet s v = M.fromDistinctAscList [(k, v) | k <- S.toAscList s]
Ok, so you are saying that when you insert a set into this map, all the subsets of the set get mapped as well.
You have analyzed that the insertion complexity, if you use a Data.Map, is O(2^n), where n is the size of the set to insert. That's about right; it's actually O(n 2^n), because each of the 2^n insertions into a map with up to 2^n entries costs O(log 2^n) = O(n).
Therefore, if you want to avoid exponential time, you can't take this route: inserting all the subsets when you insert a set will take exponential time. So we have to figure something else out. What sorts of operations do you need on the map? I will assume that you just need lookup: i.e., find the value assigned to a given set. The choice of solution always depends on both your insertion and your lookup interface.
Other things to consider: is it okay if insertion is more expensive than lookup? Or the other way around? What are the usage patterns?
So we have two operations (I will call the type we are designing SetMap). Also note that all this code will be pseudocode: there will probably be some class constraints on the variables, and I'm assuming the existence of functions whose semantics correspond to their names.
insert :: Set a -> b -> SetMap a b -> SetMap a b
lookup :: SetMap a b -> Set a -> Maybe b
We could implement SetMap a b = [(Set a, b)], and then insertion would be O(1) and lookup would be O(n m) (where n is the number of items inserted and m is the size of the key being looked up). Not great, obviously.
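A sketch of that association-list version (my code, just to make the cost model concrete):

import qualified Data.Set as S

type SetMapAL a b = [(S.Set a, b)]

-- O(1): just cons the new pair onto the front.
insertAL :: S.Set a -> b -> SetMapAL a b -> SetMapAL a b
insertAL s v m = (s, v) : m

-- O(n) key comparisons, each up to O(m): scan for an equal set.
lookupAL :: Ord a => SetMapAL a b -> S.Set a -> Maybe b
lookupAL m x = Prelude.lookup x m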
What if you organized the sets in the map into a tree by unions?
data SetMap a b
  = Branch (Set a) [SetMap a b]
  | Leaf b
So, for example, the root of the tree would be the union of every key in the map. To look up a key, you check whether it is a subset of the root; if so, you try to look it up in each of the children; otherwise, you fail:
lookup :: SetMap a b -> Set a -> Maybe b
lookup (Branch s children) x
  | x `subset` s = msum $ map (`lookup` x) children
  | otherwise    = Nothing
lookup (Leaf b) _ = Just b
I am not quite sure how to analyze the complexity of this data structure, it kind of depends on the shape of your sets.
I'm sure there are other options, that's the first thing that came to mind. I'd be happy to help more on this problem (perhaps on a new SO question), but could you describe what you are doing in more detail? It's hard to tell the constraints of your problem from your limited description. Don't be shy, describe your goal in practical terms; a lot of times I reduce my problems to the need for a certain data structure, when the best solution comes from taking a step back and re-expressing the problem so I don't need that data structure.