Problem description
Given a set of vertices V, which can be seen as named "propositions".
Given weights:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| Invalidates -- ^ Denotes that a "proposition" invalidates another.
In a linear ordering, if A requires B, then B must come before A; conversely, if A invalidates B, then B must come after A.
Given a weighted directed multigraph (multidigraph) with at most 2 parallel edges, where a vertex can require the inclusion of another vertex at most once, and invalidate another vertex at most once...
G = (V, E)
E = (V, V, W)
Or, alternatively, represented as a directed cyclic graph with no self-loops, where the only cycles form directly between one vertex and another, and with the weights changed to:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| InvalidatedBy -- ^ Denotes that a "proposition" is invalidated by another.
Given that vertices may occur more than once in the ordering...
How can a linear ordering be constructed from such a graph?
Additionally, if the tail of the linear ordering ends with a vertex V that was included because it was InvalidatedBy another vertex, then it may be omitted if the head of the ordering starts with V.
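For illustration, a minimal sketch of this rule on a plain list (dropWrappedTail is a hypothetical name; the check that the tail element owes its inclusion to an InvalidatedBy edge is assumed to happen elsewhere):

-- A sketch: drop the final vertex when it wraps around to the head.
-- Assumes the caller has verified the tail element was included only
-- because it was InvalidatedBy another vertex.
dropWrappedTail :: Eq a => [a] -> [a]
dropWrappedTail xs@(h:_:_) | last xs == h = init xs
dropWrappedTail xs = xs

For example, dropWrappedTail [1, 2, 1] yields [1, 2].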
Some desired properties are:
Minimality - there should be as little duplication of vertices as possible.
Stability - the ordering should stay as close as possible to the order in which vertices on the same "level" were added when the graph was constructed.
Run-time complexity - the number of vertices is not that high, but the run-time complexity should still be as low as possible.
If various algorithms fulfill these to varying degrees, I'd love to see all of them with their trade-offs.
Algorithms written in any language, or pseudocode, are welcome.
Example graphs:
Example graph 1:
B `requires` A
C `requires` A
D `requires` A
E `invalidates` A
F `invalidates` A
G `invalidates` A
With minimal linear ordering: [A, B, C, D, E, F, G]
Example graph 2:
C `requires` A
C `invalidates` A
B `requires` A
With minimal linear ordering: [A, B, C]
Example graph 3:
B `requires` A
B `invalidates` A
C `requires` A
C `invalidates` A
With minimal linear ordering: [A, B, A, C]
Naive implementation
A naive implementation constructs a linear ordering by starting with all nodes with no incoming edges and, for each of those nodes:
fetches all outgoing edges
partitions those by requires/invalidates
constructs the linear ordering of "requires" and puts that first
adds the current node
constructs the linear ordering of "invalidates" and adds that.
Here's a Haskell implementation of this description:
import Data.List (partition)
import Data.Maybe (fromJust)
import Control.Arrow ((***))
import Data.Graph.Inductive.Graph
fboth :: Functor f => (a -> b) -> (f a, f a) -> (f b, f b)
fboth f = fmap f *** fmap f

-- | The outgoing adjacency and label of a node.
outs :: Graph gr => gr a b -> Node -> (Adj b, a)
outs gr n = let (_, _, l, o) = fromJust $ fst $ match n gr in (o, l)

-- | The (adjacency, label) pairs of all nodes with outgoing edges.
starts :: Graph gr => gr a b -> [(Adj b, a)]
starts gr = filter (not . null . fst) $ outs gr <$> nodes gr

-- | Partition an adjacency into (requires, invalidates).
partW :: Adj W -> (Adj W, Adj W)
partW = partition ((Requires ==) . fst)

linearize :: Graph gr => gr a W -> [a]
linearize gr = concat $ linearize' gr <$> starts gr

-- | The "requires" orderings first, then the node itself, then the
-- "invalidates" orderings.
linearize' :: Graph gr => gr a W -> (Adj W, a) -> [a]
linearize' gr (o, a) = concat req ++ [a] ++ concat inv
  where (req, inv) = fboth (linearize' gr . outs gr . snd) $ partW o
The ordering can then be optimized by removing consecutive duplicate elements (using the LambdaCase extension), like so:
-- | Remove consecutive elements which are equal to a previous element.
-- Runtime complexity: O(n), space: O(1)
removeConsecutiveEq :: Eq a => [a] -> [a]
removeConsecutiveEq = \case
  [] -> []
  [x] -> [x]
  (h:t) -> h : ug h t
  where
    ug e = \case
      [] -> []
      (x:xs) | e == x -> ug x xs
      (x:xs) | otherwise -> x : ug x xs
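For example, in ghci:
> removeConsecutiveEq [1, 1, 2, 2, 3, 1]
[1,2,3,1]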
Edit: Using DCG, SCC, and topsort
With the algorithm described by @Cirdec:
1. Given a directed cyclic graph (DCG) where edges of the form (f, t) denote that f must come before t in the ordering.
2. Compute the condensation of the DCG in 1.
3. Turn each SCC in the condensation in 2. into a palindrome.
4. Compute the topsort of the graph in 3.
5. Concatenate the computed ordering.
In Haskell:
{-# LANGUAGE LambdaCase #-}
import Data.List (nub)
import Data.Maybe (fromJust)
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.NodeMap
import Data.Graph.Inductive.Query.DFS
data MkEdge = MkEdge Bool Int Int
req = MkEdge True
inv = MkEdge False
toGraph :: [MkEdge] -> [(Int, Int, Bool)] -> Gr Int Bool
toGraph edges es = run_ empty nm
  where ns = nub $ edges >>= \(MkEdge _ f t) -> [f, t]
        nm = insMapNodesM ns >> insMapEdgesM es
-- | Make graph into a directed cyclic graph (DCG).
-- "Requires" denotes a forward edge.
-- "Invalidates" denotes a backward edge.
toDCG :: [MkEdge] -> Gr Int Bool
toDCG edges = toGraph edges $
  (\(MkEdge w f t) -> if w then (t, f, w) else (f, t, w)) <$> edges
-- | Make a palindrome of the given list by computing: [1 .. n] ++ [n - 1 .. 1].
-- Runtime complexity: O(n).
palindrome :: [a] -> [a]
palindrome = \case
  [] -> []
  xs -> xs ++ tail (reverse xs)
linearize :: Gr Int a -> [Int]
linearize dcg = concat $ topsort' scc2
  where scc = nmap (fmap (fromJust . lab dcg)) $ condensation dcg
        scc2 = nmap palindrome scc
For the graph g2:
g2 = [ 2 `req` 1
     , 2 `inv` 1
     , 3 `req` 1
     , 3 `inv` 1
     , 4 `req` 1
     , 5 `inv` 1
     ]
> prettyPrint $ toDCG g2
1:2->[(False,2)]
2:1->[(True,1),(True,3),(True,4)]
3:3->[(False,2)]
4:4->[]
5:5->[(False,2)]
> prettyPrint $ condensation $ toDCG g2
1:[5]->[((),2)]
2:[1,2,3]->[((),3)]
3:[4]->[]
> linearize $ toDCG g2
[5,2,1,3,1,2,4]
This ordering is neither minimal nor valid, since it violates the dependencies: 5 invalidates 1, which 2 depends on, and 2 invalidates 1, which 4 depends on.
A valid and minimal ordering is [1,4,2,1,3,5]. By rotating the list to the right, we get [5,1,4,2,1,3], which is also a valid ordering.
If the direction of the graph is flipped, the ordering becomes [4,2,1,3,1,2,5]. This is not a valid ordering either: at the boundaries, 5 can happen, and then 4, but 5 invalidates 1, which 4 depends on.
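To make "valid" precise, here is a self-contained sketch of a checker (isValid and Constraint are names of my own): an occurrence of a vertex is accepted when everything it requires is currently "live", i.e. has occurred earlier and has not been invalidated since.

import Data.List (foldl')
import qualified Data.Set as S

data W = Requires | Invalidates deriving (Eq, Show)

-- | (a, Requires, b) means a requires b; (a, Invalidates, b) means a invalidates b.
type Constraint = (Int, W, Int)

-- | An ordering is valid when every vertex occurrence finds all of its
-- requirements live: established earlier and not invalidated since.
isValid :: [Constraint] -> [Int] -> Bool
isValid cs = fst . foldl' step (True, S.empty)
  where
    reqs v = [b | (a, Requires, b) <- cs, a == v]
    invs v = [b | (a, Invalidates, b) <- cs, a == v]
    step (ok, live) v =
      ( ok && all (`S.member` live) (reqs v)
      , S.insert v (foldr S.delete live (invs v)) )

With g2's constraints, isValid accepts [1,4,2,1,3,5] and rejects [5,2,1,3,1,2,4].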
I believe the following algorithm will find a minimal string of vertices in linear time:
Decompose the graph into its strongly connected components. Existing algorithms do this in linear time.
In each strongly connected component, each node needs to be listed both before and after every other node. List the nodes [1..n] of each strongly connected component in the following order: [1..n] ++ [n-1..1].
Concatenate the strongly connected components together in order by a topological sort. Existing algorithms topologically sort directed acyclic graphs like this in linear time.
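For reference, a compact sketch of these three steps using Data.Graph from containers rather than fgl (linearizeSCC is my name; stronglyConnComp returns components reverse topologically sorted, hence the reverse):

import Data.Graph (flattenSCC, stronglyConnComp)
import Data.List (nub)

-- Edges (f, t) mean f must come before t in the ordering.
linearizeSCC :: [(Int, Int)] -> [Int]
linearizeSCC es = concatMap (palindrome . flattenSCC) (reverse sccs)
  where
    vs = nub (es >>= \(f, t) -> [f, t])
    sccs = stronglyConnComp [(v, v, [t | (f, t) <- es, f == v]) | v <- vs]
    palindrome [] = []
    palindrome xs = xs ++ tail (reverse xs)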
Consider a function that rates the level of 'visual similarity' between two numbers: 666666 and 666166 would be very similar, unlike 666666 and 111111.
type N = Int
type Rate = Int
similar :: N -> N -> Rate
similar a b = length . filter id . zipWith (==) a' $ b'
  where a' = show a
        b' = show b
similar 666666 666166
--> 5
-- high rate : very similar
similar 666666 111111
--> 0
-- low rate : not similar
There could be more sophisticated implementations of this, but it serves the purpose.
The intention is to find a function that sorts a given list of N's so that each item is the one most similar to its preceding item. Since the first item does not have a predecessor, there must be a given first N.
similarSort :: N -> [N] -> [N]
Let's look at some sample data. The numbers don't need to have the same number of digits, but it makes them easier to reason about.
sample :: [N]
sample = [2234, 8881, 1222, 8888, 8822, 2221, 5428]
One could be tempted to implement the function like so:
similarSortWrong x xs = reverse . sortWith (similar x) $ xs -- sortWith from GHC.Exts
but this would lead to a wrong result:
similarSortWrong 2222 sample
--> [2221,1222,8822,2234,5428,8888,8881]
At the beginning it looks correct, but it's obvious that 8822 should rather be followed by 8881, since it's more similar than 2234.
So here's the implementation I came up with:
similarSort x [] = [x]
similarSort x xs = x : similarSort a as
  where (a:as) = reverse . sortWith (similar x) $ xs

similarSort 2222 sample
--> [2222,2221,2234,1222,8822,8888,8881,5428]
It seems to work, but it also seems to do a lot more work than necessary: at every step the whole rest is sorted again, just to pick off the first element. Usually laziness should allow this, but reverse might break it again. I'd be keen to hear if someone knows a common abstraction for this problem.
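One candidate abstraction is unfoldr: each step picks the most similar remaining element with minimumBy instead of sorting the whole rest. A sketch reusing similar and N from above (similarSort' is my name; ties may break differently than in the original):

import Data.List (delete, minimumBy, unfoldr)
import Data.Ord (comparing, Down(..))

-- A sketch: at each step, pick the neighbor most similar to the previous
-- element (O(n) per step via minimumBy, instead of a full sort).
similarSort' :: N -> [N] -> [N]
similarSort' x xs = x : unfoldr step (x, xs)
  where
    step (_, []) = Nothing
    step (prev, ys) =
      let best = minimumBy (comparing (Down . similar prev)) ys
      in Just (best, (best, delete best ys))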
It's relatively straightforward to implement the greedy algorithm you ask for. Let's start with some boilerplate; we'll use the these package for a zip-like function that hands us the "unused" tail ends of zipped-together lists:
import Data.Align
import Data.These
sampleStart = "2222"
sampleNeighbors = ["2234", "8881", "1222", "8888", "8822", "2221", "5428"]
Instead of using numbers, I'll use lists of digits -- just so we don't have to litter the code with conversions between the form that's convenient for the user and the form that's convenient for the algorithm. You've been a bit fuzzy about how to rate the similarity of two digit strings, so let's make it as concrete as possible: any digits that differ cost 1, and if the digit strings vary in length we have to pay 1 for each extension to the right. Thus:
distance :: Eq a => [a] -> [a] -> Int
distance l r = sum $ alignWith elemDistance l r where
  elemDistance (These l r) | l == r = 0
  elemDistance _ = 1
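For instance, in ghci (mirroring the similarity example from the question):
> distance "666666" "666166"
1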
A handy helper function will pick the smallest element of some list (by a user-specified measure) and return the rest of the list in some implementation-defined order.
minRestOn :: Ord b => (a -> b) -> [a] -> Maybe (a, [a])
minRestOn f [] = Nothing
minRestOn f (x:xs) = Just (go x [] xs) where
  go min rest [] = (min, rest)
  go min rest (x:xs) = if f x < f min
    then go x (min:rest) xs
    else go min (x:rest) xs
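For example, in ghci:
> minRestOn id [3, 1, 2]
Just (1,[2,3])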
Now the greedy algorithm almost writes itself:
greedy :: Eq a => [a] -> [[a]] -> [[a]]
greedy here neighbors = here : case minRestOn (distance here) neighbors of
  Nothing -> []
  Just (min, rest) -> greedy min rest
We can try it out on your sample:
> greedy sampleStart sampleNeighbors
["2222","1222","2221","2234","5428","8888","8881","8822"]
Just eyeballing it, that seems to do okay. However, as with many greedy algorithms, this one only minimizes the local cost of each edge in the path. If you want to minimize the total cost of the path found, you need to use another algorithm. For example, we can pull in the astar package. For simplicity, I'm going to do everything in a very inefficient way, but it's not too hard to do it "right". We'll need a fair chunk more imports:
import Data.Graph.AStar
import Data.Hashable
import Data.List
import Data.Maybe
import qualified Data.HashSet as HS
Unlike before, where we only wanted the nearest neighbor, we'll now want all the neighbors. (Actually, we could probably implement the previous use of minRestOn using the following function and minimumOn or something. Give it a try if you're interested!)
neighbors :: (a, [a]) -> [(a, [a])]
neighbors (_, xs) = go [] xs where
  go ls [] = []
  go ls (r:rs) = (r, ls ++ rs) : go (r:ls) rs
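For the curious, here is one sketch of that suggestion, recovering minRestOn from neighbors with plain minimumBy (minimumOn itself lives in the extra package; minRestOn' is my name):

import Data.List (minimumBy)
import Data.Ord (comparing)

-- A sketch: the minimal element (by f) paired with the rest, via neighbors.
-- neighbors ignores its first component, so passing x there is arbitrary.
minRestOn' :: Ord b => (a -> b) -> [a] -> Maybe (a, [a])
minRestOn' _ [] = Nothing
minRestOn' f xs@(x:_) = Just $ minimumBy (comparing (f . fst)) (neighbors (x, xs))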
We can now call the aStar search method with appropriate parameters. We'll use ([a], [[a]]) -- representing the current list of digits and the remaining lists that we can choose from -- as our node type. The arguments to aStar are then, in order: the function for finding neighboring nodes, the function for computing distance between neighboring nodes, the heuristic for how far we have left to go (we'll just say 1 for each unique element in the list), whether we've reached a goal node, and the initial node to start the search from. We'll call fromJust, but it should be okay: all nodes have at least one path to a goal node, just by choosing the remaining lists of digits in order.
optimal :: (Eq a, Ord a, Hashable a) => [a] -> [[a]] -> [[a]]
optimal here elsewhere = (here:) . map fst . fromJust $ aStar
  (HS.fromList . neighbors)
  (\(x, _) (y, _) -> distance x y)
  (\(x, xs) -> HS.size (HS.fromList (x:xs)) - 1)
  (\(_, xs) -> null xs)
  (here, elsewhere)
Let's see it run in ghci:
> optimal sampleStart sampleNeighbors
["2222","1222","8822","8881","8888","5428","2221","2234"]
We can see that it's done better this time by adding a pathLength function that computes all the distances between neighbors in a result.
pathLength :: Eq a => [[a]] -> Int
pathLength xs = sum [distance x y | x:y:_ <- tails xs]
In ghci:
> pathLength (greedy sampleStart sampleNeighbors)
15
> pathLength (optimal sampleStart sampleNeighbors)
14
In this particular example, I think the greedy algorithm could have found the optimal path if it had made the "right" choices whenever there were ties for minimal next step; but I expect it is not too hard to cook up an example where the greedy algorithm is forced into bad early choices.
I have a combinatorial problem that can be solved inefficiently using the Cartesian product of multiple sets. Concretely, I have multiple items and multiple elements that satisfy each item. The problem consists of finding all possible combinations of elements that satisfy all items. For example:
items -> elements
------------------------
1 -> {a,b} // a and b cover the item 1
2 -> {a,b} // a and b cover the item 2
3 -> {a,b,c} // a, b and c cover the item 3
4 -> {a,b,e,f} // a, b, e, f cover the item 4
Alternative representation:
element -> items covered
------------------------
a -> {1,2,3,4}
b -> {1,2,3,4}
c -> {3}
e -> {4}
f -> {4}
The goal is to find all combinations that cover items 1,2,3,4.
Valid solutions are:
{a},{a,b},{a,c},{a,e},{a,f},{a,b,c},{a,b,e},{a,b,f},{a,b,c,e},{a,b,c,f}
{b},{b,c},{b,e},{b,f},{b,c,e},{b,c,f}
Note that the order is not important, so {a,b} = {b,a} ({a,b} x {c,d} = {c,d} x {a,b}).
Also, note that {a,a,a,a}, {a,a,a,b}... are redundant combinations.
As you can see, this problem is similar to the set cover problem, where the universe of elements for this example is the set of items U={1,2,3,4}, and the set of subsets of U is S={ab={1,2,3,4}, c={3}, ef={4}}, where {1,2,3,4} is the set of items covered by the elements a and b, {3} is the set of items covered by c, and {4} is the set of items covered by the elements e and f. However, the goal here is not finding the minimal combination of sets from S that covers all of U, but finding all combinations of elements {a,b,c,e,f} that cover all items {1,2,3,4}.
A naive implementation could be done by performing a Cartesian product of the sets for 1, 2, 3, and 4, and then filtering out the combinations that are redundant. However, this approach is very inefficient. Suppose I have this situation:
1 -> {a,b,c,d,e,f,g,h}
2 -> {a,b,c,d,e,f,g,h}
3 -> {a,b,c,d,e,f,g,h}
4 -> {a,b,c,d,e,f,g,h}
5 -> {a,b,c,d,e,f,g,h}
6 -> {a,b,c,d,e,f,g,h,i}
A Cartesian product of the six sets results in 8^5*9 = 294912 combinations, when there are actually many fewer: the subsets of {a,b,c,d,e,f,g,h}, together with each of those subsets extended by {i}.
Another way to solve this problem is to enumerate all elements, skipping the combinations that are equivalent to others previously generated, as well as combinations with repeated elements. This is fairly easy to compute and can be implemented as an iterator that returns one combination at a time, but I don't know if there is a better way to solve this problem, or if it has been studied before.
How would you solve this problem?
First, realize that if a set of elements does not satisfy all items, neither does any of its subsets.
Second, realize that if a set satisfies all items, so do all its supersets.
Now, all you have to do is:
Let S be the set of all elements.
Let R be the empty set.
Define a function find(s, r) which does:
  If r includes s, return r.
  If s does not satisfy all items, return r.
  Otherwise add s to r.
  For every element e in s,
    let s' be s - e
    let r be find(s', r)
  Return r.
Just call find(S,R) and you have your answer.
This method performs some duplicate tests, but it always kills a branch as soon as it is identified as a dead end. This leads to a lot of pruning on a large set of elements.
Both lookup of whether r includes a particular set of elements and the check if s satisfies all items can be made very fast at the expense of extra memory.
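A Haskell sketch of this recursion, under my own naming (covers, satisfiesAll, find) and hard-coding the question's first example; the result set r doubles as the visited-set:

import qualified Data.Set as S

type Item = Int
type Elem = Char

-- The question's first example: which items each element covers.
covers :: Elem -> S.Set Item
covers 'a' = S.fromList [1,2,3,4]
covers 'b' = S.fromList [1,2,3,4]
covers 'c' = S.fromList [3]
covers 'e' = S.fromList [4]
covers 'f' = S.fromList [4]
covers _   = S.empty

satisfiesAll :: S.Set Item -> S.Set Elem -> Bool
satisfiesAll items s = items `S.isSubsetOf` S.unions (map covers (S.toList s))

-- find(s, r): prune if s is already recorded or fails to satisfy all
-- items; otherwise record s and recurse on each one-element-smaller set.
find :: S.Set Item -> S.Set Elem -> S.Set (S.Set Elem) -> S.Set (S.Set Elem)
find items s r
  | s `S.member` r             = r
  | not (satisfiesAll items s) = r
  | otherwise = foldl (\acc e -> find items (S.delete e s) acc)
                      (S.insert s r) (S.toList s)

Calling find (S.fromList [1,2,3,4]) (S.fromList "abcef") S.empty returns all 24 satisfying subsets of the example.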
What if you did this:
1 -> {a,b}
2 -> {b,c}
3 -> {a,b,c}
4 -> {a,e,f}
=>
a -> [1,3,4]
b -> [1,2,3]
c -> [2,3]
e -> [4]
f -> [4]
Then enumerate the combinations of the left side that provide (at least) [1,2,3,4]: for each item in the set of all-satisfying sets, enumerate combinations with the other items.
All-Satisfying-Sets: {{a,b},{b,e},{b,f}}
Combinations within All-Satisfying-Sets: {{a,b,e},{a,b,f},{b,e,f},{a,b,e,f}}
Others: {c}
Combinations with Others: {{a,b,c},{b,e,c},{b,f,c}
,{a,b,e,c},{a,b,f,c},{b,e,f,c},{a,b,e,f,c}}
Or you could do this in Haskell:
import Data.List (union, subsequences, sort)
example1 = [(["a"],[1,2,3,4])
,(["b"],[1,2,3,4])
,(["c"],[3])
,(["e"],[4])
,(["f"],[4])]
example2 = [(["a"],[1,2,3,4,5,6])
,(["b"],[1,2,3,4,5,6])
,(["c"],[1,2,3,4,5,6])
,(["e"],[1,2,3,4,5,6])
,(["f"],[1,2,3,4,5,6])
,(["g"],[1,2,3,4,5,6])
,(["h"],[1,2,3,4,5,6])
,(["i"],[6])]
combs items list =
let unify (a,b) (a',b') = (sort (a ++ a'), sort (union b b'))
in map fst
. filter ((==items) . snd)
. map (foldr unify ([],[]))
. subsequences
$ list
OUTPUT:
*Main> combs [1..4] example1
[["a"],["b"],["a","b"],["a","c"],["b","c"],["a","b","c"],["a","e"],["b","e"],
["a","b","e"],["a","c","e"],["b","c","e"],["a","b","c","e"],["a","f"],["b","f"],
["a","b","f"],["a","c","f"],["b","c","f"],["a","b","c","f"],["a","e","f"],
["b","e","f"],["a","b","e","f"],["a","c","e","f"],["b","c","e","f"],
["a","b","c","e","f"]]