Is there any "reversible hashtable" like data structure?

Is there any "reversible hashtable" like data structure? - data-structures

Imagine an array that associate numbers and words. Exactly one number for one word and vice versa.
dic = [0: 'food',
1: 'dinner',
2.5: 'breakfast',
...]
Now I want to access dic[0] and get food and something['food'] and get 0. Is there any kind of reversible hashtable in the wild ? As far as I know only doing a duplicate can solve this problem.

Yes, this is generally done by combining two maps. Modifications are a bit tricky, because they have to maintain the one-to-one rule.
Let's start with a class for finite maps, and a couple sample instances for containers types:
{-# LANGUAGE
FunctionalDependencies
, FlexibleInstances
, UndecidableInstances
, GeneralizedNewtypeDeriving #-}
module Mappy where
import Prelude hiding (lookup)
-- Example base maps
import qualified Data.Map.Strict as M
import Data.Map (Map)
import qualified Data.IntMap.Strict as IM
import Data.IntMap (IntMap)
class Mappy k v m | m -> k v where
empty :: m
insert :: k -> v -> m -> m
delete :: k -> m -> m
lookup :: k -> m -> Maybe v
instance Ord k => Mappy k a (Map k a) where
empty = M.empty
insert = M.insert
delete = M.delete
lookup = M.lookup
instance Mappy Int a (IntMap a) where
empty = IM.empty
insert = IM.insert
delete = IM.delete
lookup = IM.lookup
Now we can build our type for bidirectional maps:
data Bimap m n = Bimap !m !n
instance Show m => Show (Bimap m n) where
showsPrec p (Bimap m _) = showParen (p > 10) $
showString "Bimap " . showsPrec 11 m
invert :: Bimap m n -> Bimap n m
invert (Bimap m n) = Bimap n m
instance (Mappy k v kv, Mappy v k vk) => Mappy k v (Bimap kv vk) where
empty = Bimap empty empty
insert k v (Bimap kv vk)
| Just k' <- lookup v vk
= Bimap (insert k v $ delete k' kv) (insert v k vk)
| otherwise
= Bimap (insert k v kv) (insert v k vk)
delete k m#(Bimap kv vk)
| Just v <- lookup k kv
= Bimap (delete k kv) (delete v vk)
| otherwise
= m
lookup k (Bimap kv _) = lookup k kv
We can also define some wrappers to make it easier to write our desired map types.
newtype MapMap k v = MapMap (Bimap (Map k v) (Map v k)) deriving (Show, Mappy k v)
newtype IMM v = IMM (Bimap (IntMap v) (Map v Int)) deriving (Show, Mappy Int v)

Related

Floyd-Warshall Algorithm in Haskell

I'm studying Floyd-Warshall algorithm. Now having managed to implement it in Haskell, the way I implement it is similar to how it is implemented in imperative languages (that is to say, use list of lists to simulate 2D arrays), but this is really inefficient giving that accessing an element in a list is much more slower than in a array.
Is there a smarter way to do this in Haskell? I thought I could do this by concate some lists but keep failing.
My Code:
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall lst = fwAlg 1 $ initMatrix 0 $ list2matrix lst
fwAlg :: Int -> [[Weight]] -> [[Weight]]
fwAlg k m | k < rows m = let n = rows m
m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
in fwAlg (k+1) m'
| otherwise = m
-- a special case where k is 0
initMatrix :: Int -> [[Weight]] -> [[Weight]]
initMatrix n m = if n == rows m then m else initMatrix (n+1) $ updateAtM 0.0 (n,n) m
updateDist :: Int -> Int -> Int -> [[Weight]] -> [[Weight]]
updateDist i j k m =
let w = min (weight i j m) (weight i k m + weight k j m)
in updateAtM w (i, j) m
weight :: Vertice -> Vertice -> [[Weight]] -> Weight
weight i j m = let Just w = elemAt (i, j) m in w

The algorithm has a regular access pattern so we can avoid a lot of
indexing and still write it with lists, with (I think) the same
asymptotic performance as the imperative version.
If you do want to use arrays for more speed, you might still want to do
something similar to this with bulk operations on rows and columns
rather than reading and writing individual cells.
-- Let's have a type for weights. We could use Maybe but the ordering
-- behaviour is wrong - when there's no weight it should be like
-- +infinity.
data Weight = Weight Int | None deriving (Eq, Ord, Show)
addWeights :: Weight -> Weight -> Weight
addWeights (Weight x) (Weight y) = Weight (x + y)
addWeights _ _ = None
-- the main function just steps the matrix a number of times equal to
-- the node count. Also pass along k at each step.
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall m = snd (iterate step (0, m) !! length m)
-- step takes k and the matrix for k, returns k+1 and the matrix for
-- k+1.
step :: (Int, [[Weight]]) -> (Int, [[Weight]])
step (k, m) = (k + 1, zipWith (stepRow ktojs) istok m)
where
ktojs = m !! k -- current k to each j
istok = transpose m !! k -- each i to current k
-- Make shortest paths from one i to all j.
-- We need the shortest paths from the current k to all j
-- and the shortest path from this i to the current k
-- and the shortest paths from this i to all j
stepRow :: [Weight] -> Weight -> [Weight] -> [Weight]
stepRow ktojs itok itojs = zipWith stepOne itojs ktojs
where
stepOne itoj ktoj = itoj `min` (itok `addWeights` ktoj)
-- example from wikipedia for testing
test :: [[Weight]]
test = [[Weight 0, None, Weight (-2), None],
[Weight 4, Weight 0, Weight 3, None],
[None, None, Weight 0, Weight 2],
[None, Weight (-1), None, Weight 0]]

I don't know how to achieve peak performance, but I can give you some tips on making your code abstract so that you can work on performance tuning more easily.
First of all, it would be nice if, when you change around your data types, you don't have to rewrite everything. Right now, you've made everything concretely about lists of lists, so let's see if we can abstract that out. First, we have to figure out what your minimal matrix interface is. Glancing at your code, you appear to have initMatrix, list2matrix, rows, elemAt, and updateAtM. These are the functions that query or modify your matrix, and these are what you would need to implement to make a new version of this code for a different Matrix type.
One way to organize this interface is to make a class out of it. For instance:
class Matrix m where
list2matrix :: [[a]] -> m a
matrix2List :: m a -> [[a]]
rows :: m a -> Int
elemAt :: Int -> Int -> m a -> a
updateAtM :: a -> (Int, Int) -> m a -> m a
setDiag :: a -> m a -> m a
(I went ahead and added a matrix2List function for extracting your result and renamed/modified initMatrix into setDiag, which felt a little more general.)
We can then update your code to use this new class:
floydwarshall :: Matrix m => [[Weight]] -> m Weight
floydwarshall lst = fwAlg 1 $ initMatrix $ list2matrix lst
fwAlg :: Matrix m => Int -> m Weight -> m Weight
fwAlg k m | k < rows m = let n = rows m
m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
in fwAlg (k+1) m'
| otherwise = m
initMatrix :: Matrix m => m Weight -> m Weight
initMatrix = setDiag 0
updateDist :: Matrix m => Int -> Int -> Int -> m Weight -> m Weight
updateDist i j k m =
let w = min (elemAt i j m) (elemAt i k m + elemAt k j m)
in updateAtM w (i, j) m
dist :: Matrix m => Int -> Int -> Int -> m Weight -> Weight
dist i j 0 m = elemAt i j m
dist i j k m = min (dist i j (k-1) m) (dist i k (k-1) m + dist k j (k-1) m)
Now all we need to do is start defining some Matrix types and see how performance is!
Let's start with lists, since you've already done this work. We'll have to use a newtype wrapper to make GHC happy, but ignoring the wrapping and unwrapping, this is morally the same as the code you wrote:
newtype ListMatrix a = ListMatrix { getListMatrix :: [[a]] }
instance Matrix ListMatrix where
list2matrix = ListMatrix
matrix2List = getListMatrix
rows = length . getListMatrix
elemAt i j (ListMatrix m) = m !! i !! j
updateAtM a (i,j) (ListMatrix m) =
let (firstRows, row:laterRows) = splitAt i m
(firstCols, _:laterCols) = splitAt j row
in ListMatrix $ firstRows <> ((firstCols <> (a:laterCols)):laterRows)
setDiag x = go 0
where go n m = if n == rows m then m else go (n+1) $ updateAtM x (n,n) m
(Also, I filled in elemAt and updateAtM.) You should be able to run
matrix2List #ListMatrix $ floydwarshall myList
and get the same result (and performance) that you currently have.
Now, on to the experimentation! All that's necessary is for us to define new instances of Matrix and see what happens. Perhaps we should try pure functions:
data FunMatrix a = FunMatrix { size :: Int, getFunMatrix :: Int -> Int -> a }
instance Matrix FunMatrix where
list2matrix l = FunMatrix (length l) (\i j -> l !! i !! j)
matrix2List (FunMatrix s f) = (\i -> f i <$> [0..s-1]) <$> [0..s-1]
rows = size
elemAt i j m = getFunMatrix m i j
updateAtM a (i,j) (FunMatrix s f) = FunMatrix s (\i' j' -> if i==i' && j==j' then a else f i' j')
setDiag x (FunMatrix s f) = FunMatrix s (\i j -> if i==j then x else f i j)
How does that perform? One problem is that the starting lookup function is still just indexing into the list of lists, which is slow. One fix would be to convert to an array or vector first and then index. Because we've nicely abstracted everything, all that would need to change is the definition of list2matrix right here, and you'll probably get a nice performance boost!
On the topic of performance, there's one other note I can point out. The definition of dist does some serious "dynamic programming". This could work fine if you were writing and reading directly into an array, but in this recursive form, you may end up doing a lot of duplicate work. One fix is to memoize. My goto memoization package is MemoTrie, which makes it really easy to memoize things. In this case, you could change dist to:
dist :: Matrix m => m Weight -> Int -> Int -> Int -> Weight
dist m = go'
where
go' = memo3 go
go i j 0 = elemAt i j m
go i j k = min (go' i j (k-1)) (go' i k (k-1) + go' k j (k-1))
That might give you a bit of a boost!
You might consider taking #Chi's advice and use STUArray, but you'll run into a problem: the STUArray interface demands that array lookups are in a monad. It's still possible to use the abstraction method I show off above, but you'll have to change the types of the functions. And, because you change the types in the interface, you'll need to update your algorithm code to be monadic. It can be a bit of a pain, but it might be necessary to get optimal performance.

Which one of the following is better?

So I have two implementations of the function tabulate, which, given a function f :: Int -> a and a number n, should produce the list [f 0, f 1, ..., f (n-1)]. I'm trying to guess which one is better in terms of work and span.
tabulate1 :: (Int -> a) -> Int -> [a]
tabulate1 f n = tab (\x -> f (n - x)) n where
tab _ 0 = []
tab g n = let (x,xs) = (g n) ||| (tab g (n-1))
in (x:xs)
tabulate2 :: (Int -> a) -> Int -> [a]
tabulate2 f n = tab f 0 (n-1) where
tab f n m
| n > m = []
| n == m = [f n]
| otherwise = let i = (n + m) `div` 2
(l, r) = (tab f n i) ||| (tab f i+1 m)
in (l ++ r)
While the first one avoids the using of (++), which has linear work and span, the second one computes the two sublists in parallel but uses (++).
So... which one is better?

Time and space complexity in Haskell is often non-trivial as it is a lazy language. This means that while a function might be O(n!), its result might never be needed and therefore never evaluated. Or like in this case, if your function returns a list, and only the first 3 elements are needed by other functions, only those are evaluated.
Anyways, your functions is just a particular case of map, and as such it could be coded in a much more readable way:
tabulate f n = map f [0..n]
Map is implemented with a fold, and is probably the most optimised version you could get

permutations acting on lists of length n

In math, when I want to rearrange a list of length n, I'll act on the list with a permutation. For example:
(1 2) * (x, y, z) = (y, x, z)
(1 n 2) * (v[1], v[2], ..., v[n]) = (v[n], v[1], ..., v[2])
perm * (v[1], v[2], ..., v[n]) = ( v[perm(1)], v[perm(2)], ..., v[perm(n)] )
How would I do this in Haskell?

I would use the input permutation to build a map from old indices to new indices.
import Prelude hiding ((*))
import qualified Data.Map as M
infixr 5 * -- right-associative so we can compose permutations conveniently
(*) :: [Int] -> [a] -> [a]
perm * xs = zipWith (\i _ -> xs !! M.findWithDefault i i ixMap) [0..] xs
where ixMap = M.fromList (zip perm (drop 1 perm ++ take 1 perm))
You can see it in action in the ghci prompt (though as usual in programming it uses 0-based rather than 1-based indexing):
> [0,1] * "xyz"
"yxz"
> [0,4,1] * "abcde"
"eacdb"
This costs O(n^2 log m) where n is the length of xs and m is the length of perm. You can reduce this to O(n log(nm)) by switching from (!!) to M.lookup for the indexing into xs, too.

Fast obtention of all the subsets of size N in Haskell

The following (unoptimal) code generates all the subsets of size N for certain subset.
This code works but, as I said, is highly unoptimal. Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem help due to the large cost of later reconverting the list to a Set
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set
subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
| Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
| otherwise = doSubsetsOfSizeN n s
where doSubsetsOfSizeN n s
| n == 0 = Set.singleton Set.empty
| Set.size s == n = Set.singleton s
| otherwise =
case Set.minView s of
Nothing -> Set.empty
Just (firstS, restS) ->
let partialN n = doSubsetsOfSizeN n restS in
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n

This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k

This code works but, as I said, is highly unoptimal.
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem help due to the large cost of later reconverting the list to a Set.
Hmm, I found using lists to give better performance. Not asymptotically, I think, but a not negligible more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic that can reuse the spine of the set.
And that principle is also what makes lists attractive, the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
| n == 0 = Set.singleton Set.empty
| Set.size s < n || n < 0 = error "onlyLists: out of range n"
| Set.size s == n = Set.singleton s
| otherwise = Set.fromDistinctAscList . map Set.fromDistinctAscList $
go n (Set.size s) (Set.toList s)
where
go 1 _ xs = map return xs
go k l (x:xs)
| k == l = [x:xs]
| otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.
And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420's.

subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs

First, use a better algorithm.
Look at your final line:
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
Evaluating doSubsetsOfSizeN k (Set.fromList $ 1:2:xs) will involve evaluating doSubsetsOfSizeN (k-1) (Set.fromList xs) twice (once when inserting 1, and once when inserting 2). This duplication is wasteful.
Enter a better algorithm.
mine :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
mine n s | Set.size s < n || n < 0 = Set.empty
| otherwise = Set.foldr cons nil s !! n
where
nil :: Ord a => [Set.Set (Set.Set a)]
nil = Set.singleton Set.empty : repeat Set.empty
cons :: Ord a => a -> [Set.Set (Set.Set a)] -> [Set.Set (Set.Set a)]
cons x sets = zipWith Set.union sets
(Set.empty : map (Set.map $ Set.insert x) sets)
mine 9 (Data.Set.fromList [0..18]) `seq` () is faster than subsetsOfSizeN 9 (Data.Set.fromList [0..18]) `seq` () and should have better asymptotic performance.
I haven't tried optimising this any further. There may be a better algorithm still.
(If the cost of insert and fromList are issues, you should consider giving back a list of lists instead of a set of sets.)

I found this, may be it can help you
f [] = [[1]]
f l = (:) [u] l'
where
u = succ (head (head l))
l' = (++) l (map(\x->(:) u x) l)
fix f n = if (n==0) then [] else f (fix f (n-1))
To test it
$ length $ (fix f 10) => 1023 -- The empty set is always include then == 1024

Some efficient way to implement insertion and lookup for this kind of map?

I am wondering how to implement the following efficiently:
A hash map that maps a subset of a set to another set. For example, for the set {1, 2}, then the insertions to the map are {1} -> {100}, {2} -> {100}, {1, 2}-> {100} (the empty set is not considered)
While the insertion complexity is almost 2^|X|, which is the number of subset of powerset of X to insert ( |X| stands for the cardinality of the set X).
How to use Haskell map and set to efficiently implement this? Or construct new data structures and schemes based on Haskell existing map and set?
Be more specific
Insertions: get subset of a set, insert the mapping from each subset to the value set, if the key is already there, union the result.
Lookups: given a set A, first get all the subset of A, then for each subset, look it up in the SetMap, then union the results as the return value.
For example:
If first insert {1, 2} -> {100} in a empty map M,
Insertions:
then insertions are: ({1}, {100}), ({2}, {100}), ({1,2}, {100}).
if further to insert {2,3}->{200}, then for each subset of {2, 3}, if there is already key in M, union {200} with that value. If not, just set the value as {200}. So, the map will now look like ({1}, {100}),({2}, {100,200}), ({3},{200}), ({1,2}, {100}), ({2,3}, {200})
Lookups:
lookup {2,3}, works almost like insertions: first get subsets of {2,3}, which are {2},{3}, {2,3}, for each of those subsets, look it up in the M, union all the results, which are {100, 200} union {200} union {200} = {100,200} (after two steps in 1.). (union empty set if the key not found)
Also the Aside problem: how to efficiently implement generating powerset!

Here is an observation: when looking up a set, it is sufficient to look up each of its elements and union those results together. I therefore propose that we keep two pieces of data: the insertions, and a mapping from elements to the sets that have those elements.
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
flatLookup :: Ord k => Map k (Set v) -> k -> [v]
flatLookup m k = maybe [] S.toList (M.lookup k m)
type SetMap k v = (Map (Set k) (Set v), Map k (Set (Set k)))
emptySM :: SetMap k v
emptySM = (M.empty, M.empty)
insertSM :: Ord k => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values (setMap, elemMap) = (setMap', elemMap') where
setMap' = M.insert keys values setMap
newElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
elemMap' = M.unionWith S.union elemMap newElems
lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (setMap, elemMap) = S.fromList $
S.toList keys >>= flatLookup elemMap >>= flatLookup setMap
deleteSM :: Ord k => Set k -> SetMap k v -> SetMap k v
deleteSM keys (setMap, elemMap) = (setMap', elemMap') where
setMap' = M.delete keys setMap
delElems = M.fromList [(k, S.singleton keys) | k <- S.toList keys]
elemMap' = M.unionWith S.difference elemMap delElems
In ghci:
*Main> let x = insertSM (S.fromList [2,3]) (S.fromList [200]) . insertSM (S.fromList [1,2]) (S.fromList [100]) $ emptySM
*Main> lookupSM (S.fromList [2,3]) x
fromList [100,200]
So at least your test-case passes. =)
Insertion is pretty cheap. Suppose you've already inserted n keysets of average size m and the new set has size k. The expensive part is doing k insertions into the second map. Finding the right place to insert does O(log n) comparisons of sets of size m, so the top-level insertion takes O(km*log n) time.
Lookup is not terrible, either. Suppose you've already inserted n keysets of average size m. The expensive part is looking up the keysets that your keys participate in; in the worst case, we must look up all n keysets. This takes O(nm*log n) time total.
If you don't care about removing elements, this can be simplified significantly (and the complexity improved, as well):
type SetMap' k v = Map k (Set v)
emptySM' :: SetMap' k v
emptySM' = M.empty
insertSM' :: (Ord k, Ord v) => Set k -> Set v -> SetMap' k v -> SetMap' k v
insertSM' keys values setMap = M.unionWith S.union setMap newMap where
newMap = M.fromList [(k, values) | k <- S.toList keys]
lookupSM' :: (Ord k, Ord v) => Set k -> SetMap' k v -> Set v
lookupSM' keys setMap = S.fromList $ S.toList keys >>= flatLookup setMap

A shameless rip-off of #Daniel Wagner's solution with the counting set implementation I suggested:
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
import Control.Arrow ( (&&&) )
type SetMap k v = (Map k (Map v Int), Map (Set k) (Set v))
emptySM :: SetMap k v
emptySM = (M.empty, M.empty)
insertSM :: (Ord k, Ord v) => Set k -> Set v -> SetMap k v -> SetMap k v
insertSM keys values setMap = (keyMap', keySetMap') where
(keyMap, keySetMap) = deleteSM keys setMap
keyMap' = M.unionWith (M.unionWith (+)) keyMap newKeyMap where
newKeyMap = fromSet keys . fromSet values $ 1
keySetMap' = M.insert keys values keySetMap
lookupSM :: (Ord k, Ord v) => Set k -> SetMap k v -> Set v
lookupSM keys (keyMap, _) = S.fromList $ S.toList keys >>= flatLookup keyMap where
flatLookup m k = maybe [] M.keys . M.lookup k $ m
deleteSM :: (Ord k, Ord v) => Set k -> SetMap k v -> SetMap k v
deleteSM keys setMap#(keyMap, keySetMap) = maybe setMap setMap' valuesIndex where
setMap' = keyMap' &&& keySetMap'
valuesIndex = M.lookupIndex keys keySetMap
keyMap' i = differenceUntil (differenceUntil (-) (==0)) M.null keyMap oldKeyMap where
values = snd . M.elemAt i $ keySetMap
oldKeyMap = fromSet keys . fromSet values $ 1
keySetMap' i = M.deleteAt i keySetMap
differenceUntil :: Ord k => (a -> b -> a) -> (a -> Bool) -> Map k a -> Map k b -> Map k a
differenceUntil f p = M.differenceWith $
\a b -> let a' = f a b in if p a' then Nothing else Just a'
fromSet :: Ord k => Set k -> v -> Map k v
fromSet s v = M.fromDistinctAscList [(k, v) | k <- S.toAscList s]

Ok, so you are saying that when you insert a set into this map, all the subsets of the set get mapped as well.
You have analyzed that the insertion complexity, if you use a Data.Map, is O(2^n) where n is the size of the set of the set to insert. That's about right, it's actually O(n 2^n) because insertion is O(log n).
Therefore, if you want to avoid exponential time, you can't take this route. Inserting all the subsets when you insert a set will be exponential time. So we have to figure something else out. What sorts of operations do you need on the map? I will assume that you just need lookup: i.e. find the value assigned to a given set. The choice of solution is always dependent on both you insertion and your lookup interface.
Other things to consider: is it okay if insertion is more expensive than lookup? Or the other way around? What are the usage patterns?
So we have two operations (I will call the type we are designing SetMap). Also note that all this code will be pseudocode: there will probably be some class constraints on the variables, and I'm assuming the existence of functions whose semantics correspond to their names.
insert :: Set a -> b -> SetMap a b -> SetMap a b
lookup :: SetMap a b -> Set a -> Maybe b
We could implement SetMap a b = [(Set a, b)], and then insertion would be O(1) and lookup would be O(n m) (where n is the number of items inserted and m is the size of the key being looked up). Not great, obviously.
What if you organized the sets in the map in a tree by unions.
data SetMap a b
= Branch (Set a) [SetMap a b]
| Leaf b
So, for example, the root of the tree would be the union of every key in the map. Then to lookup a key in the map you see if the key is a subset of the root; if so, then you try to lookup the key in each of its children, otherwise you fail:
lookup :: SetMap a b -> Set a -> Maybe b
lookup (Branch s children) x
| x `subset` s = msum $ map (`lookup` x) children
| otherwise = Nothing
lookup (Leaf b) x = Just b
I am not quite sure how to analyze the complexity of this data structure, it kind of depends on the shape of your sets.
I'm sure there are other options, that's the first thing that came to mind. I'd be happy to help more on this problem (perhaps on a new SO question), but could you describe what you are doing in more detail? It's hard to tell the constraints of your problem from your limited description. Don't be shy, describe your goal in practical terms; a lot of times I reduce my problems to the need for a certain data structure, when the best solution comes from taking a step back and re-expressing the problem so I don't need that data structure.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Is there any "reversible hashtable" like data structure? - data-structures

Related

Floyd-Warshall Algorithm in Haskell

Which one of the following is better?

permutations acting on lists of length n

Fast obtention of all the subsets of size N in Haskell

Some efficient way to implement insertion and lookup for this kind of map?

Categories

Resources