Rather general question. I have a list like this:
A B
A C
C A
D E
F G
E F
C L
M N
and so on.
What I want to do is figure out all the relations and put everything that's related on a single line. The example above would become:
A B C L
D E F G
M N
so that every letter appears only once, and the letters that are related to each other are on one line (list, array, whatever).
Is this some kind of known problem with a well-defined algorithm? Does it have a name? It sounds like it should. I'd assume some kind of recursive solution would work.
One way to solve this is to use an undirected graph G=(V,E). Each pair in your input represents an edge in E, and the output you want is the connected components of G. There are some great Python graph modules such as NetworkX.
Demo
>>> data
[['A', 'B'], ['A', 'C'], ['C', 'A'], ['D', 'E'], ['F', 'G'], ['E', 'F'], ['C', 'L'], ['M', 'N']]
>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edges_from( data )
>>> components = nx.connected_components( G )
>>> print("\n".join([ " ".join(sorted(cc)) for cc in components ]))
A B C L
D E F G
M N
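If installing NetworkX is not an option, the same connected-components idea can be sketched with a plain breadth-first search over an adjacency map. This is a minimal illustration, not the NetworkX implementation; the function name `connected_components` and the variable names are my own.

```python
from collections import defaultdict, deque

def connected_components(pairs):
    """Group nodes that are linked directly or indirectly by the given pairs."""
    # Build an undirected adjacency map: every pair is an edge in both directions.
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

data = [['A', 'B'], ['A', 'C'], ['C', 'A'], ['D', 'E'],
        ['F', 'G'], ['E', 'F'], ['C', 'L'], ['M', 'N']]
for component in connected_components(data):
    print(" ".join(component))
```

Each node and each edge is visited a constant number of times, so the search itself is linear in the size of the input.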
https://en.wikipedia.org/wiki/Connected_component_(graph_theory)
(but don't worry too much about their suggested algorithms, because you have a list of edges, whereas they assume that you don't.)
Let's call a letter a Node, and a set of nodes a Component. You need to produce a set of Components given a list of edges.
First, map Nodes to Components:
Map<Node, Component> map.
Then:
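    For each edge (A, B):
        Component ca = map.get (A)
        Component cb = map.get (B)
        if neither exists then:
            ca = new Component
        else if only cb exists then:
            ca = cb
        else if both exist and ca != cb then:
            move every node of cb into ca, updating map for each moved node
        ca.add (A); ca.add (B)
        map.put (A, ca); map.put (B, ca)
    For each distinct Component C in map.values ():
        Print (sort C's nodes)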
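The node-to-component map above is essentially a disjoint-set (union-find) structure. A minimal Python sketch of that idea follows; it swaps the explicit Component objects for a parent map, which is my own choice rather than part of the answer, and `connected_groups` is a hypothetical name.

```python
def connected_groups(pairs):
    """Merge nodes into groups using a simple union-find over a parent map."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    # Collect nodes by their final root.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())

data = [['A', 'B'], ['A', 'C'], ['C', 'A'], ['D', 'E'],
        ['F', 'G'], ['E', 'F'], ['C', 'L'], ['M', 'N']]
for group in connected_groups(data):
    print(" ".join(group))
```

Unlike a graph search, this processes the edges in a single pass and never needs the full adjacency structure, which can be convenient when the pairs arrive as a stream.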
Related
I am looking for an algorithm to identify non-intersecting (super-)sets in a set of sets.
Let's assume I have a set of sets containing the sets A, B, C and D, i.e. {A, B, C, D}. Each set may or may not intersect some or all of the other sets.
I would like to identify non-intersecting (super-)sets.
Examples:
If A & B intersect and C & D intersect but (A union B) does not intersect (C union D), I would like the output of {(A union B), (C union D)}
If only C & D intersect, I would like the output {A, B, (C union D)}
I am sure this problem has long been solved. Can somebody point me in the right direction?
Even better would be of course if somebody had already done the work and had an implementation in python they were willing to share. :-)
I would turn this from a set problem into a graph problem by constructing a graph whose nodes are the sets, with edges connecting any two sets that intersect.
Here is some code that does it. It takes a dictionary mapping the name of the set to the set. It returns an array of sets of set names that connect.
def set_supersets(sets_by_label):
    element_mappings = {}
    for label, this_set in sets_by_label.items():
        for elt in this_set:
            if elt not in element_mappings:
                element_mappings[elt] = set()
            element_mappings[elt].add(label)
    graph_conn = {}
    for elt, sets in element_mappings.items():
        for s in sets:
            if s not in graph_conn:
                graph_conn[s] = set()
            for t in sets:
                if t != s:
                    graph_conn[s].add(t)
    seen = set()
    answer = []
    for s, sets in graph_conn.items():
        if s not in seen:
            todo = [s]
            this_group = set()
            while 0 < len(todo):
                t = todo.pop()
                if t not in seen:
                    this_group.add(t)
                    seen.add(t)
                    for u in graph_conn[t]:
                        todo.append(u)
            answer.append(this_group)
    return answer
print(set_supersets({
    "A": set([1, 2]),
    "B": set([1, 3]),
    "C": set([4, 5]),
    "D": set([3, 6])
}))
Problem description
Given vertices V which can be seen as named "propositions".
Given weights:
data W
  = Requires    -- ^ Denotes that a "proposition" depends on another.
  | Invalidates -- ^ Denotes that a "proposition" invalidates another.
In a linear ordering, if A requires B, then B must come before A; conversely, if A invalidates B, then B must come after A.
Given a weighted directed multigraph (multidigraph) with at most 2 parallel edges... Where a vertex can only require the inclusion of another vertex once, and only invalidates another vertex once...
G = (V, E)
E = (V, V, W)
Or alternatively represented as a directed cyclic graph with no self-loops and where the only cycles form directly between one vertex and another. With weights changed to:
data W
  = Requires      -- ^ Denotes that a "proposition" depends on another.
  | InvalidatedBy -- ^ Denotes that a "proposition" is invalidated by another.
Given that vertices may occur more than once in the ordering...
How can a linear ordering be constructed from such a graph?
Additionally, if the tail of the linear ordering ends with a vertex V which was included due to being InvalidatedBy another vertex, then it may be omitted if the head of the ordering starts with V.
Some desired properties are:
Minimality - there should be as little duplication of vertices as possible
Stability - the ordering should be as similar as possible to the order between vertices on the same "level" in which the graph was constructed
Run-time complexity - The number of vertices is not that high, but still... the run-time complexity should be as low as possible.
If various algorithms fulfill these to varying degrees, I'd love to see all of them with their trade offs.
Algorithms written in any language, or pseudocode, are welcome.
Example graphs:
Example graph 1:
B `requires` A
C `requires` A
D `requires` A
E `invalidates` A
F `invalidates` A
G `invalidates` A
With minimal linear ordering: [A, B, C, D, E, F, G]
Example graph 2:
C `requires` A
C `invalidates` A
B `requires` A
With minimal linear ordering: [A, B, C]
Example graph 3:
B `requires` A
B `invalidates` A
C `requires` A
C `invalidates` A
With minimal linear ordering: [A, B, A, C]
Naive implementation
A naive implementation constructs a linear ordering by starting with all nodes with no incoming edges and for all of those nodes:
fetches all outgoing edges
partitions those by requires/invalidates
constructs the linear ordering of "requires" and puts that first
adds the current node
constructs the linear ordering of "invalidates" and adds that.
Here's a Haskell implementation of this description:
import Data.List (partition)
import Data.Maybe (fromJust)
import Control.Arrow ((***))
import Data.Graph.Inductive.Graph
fboth :: Functor f => (a -> b) -> (f a, f a) -> (f b, f b)
fboth f = fmap f *** fmap f
outs :: Graph gr => gr a b -> Node -> (Adj b, a)
outs gr n = let (_, _, l, o) = fromJust $ fst $ match n gr in (o, l)
starts :: Graph gr => gr a b -> [(Adj b, a)]
starts gr = filter (not . null . fst) $ outs gr <$> nodes gr
partW :: Adj W -> (Adj W, Adj W)
partW = partition ((Requires ==) . fst)
linearize :: Graph gr => gr a W -> [a]
linearize gr = concat $ linearize' gr <$> starts gr
linearize' :: Graph gr => gr a W -> (Adj W, a) -> [a]
linearize' gr (o, a) = concat req ++ [a] ++ concat inv
  where (req, inv) = fboth (linearize' gr . outs gr . snd) $ partW o
The ordering can then be optimized by removing equal consecutive elements like so:
-- | Remove consecutive elements which are equal to a previous element.
-- Runtime complexity: O(n), space: O(1)
removeConsequtiveEq :: Eq a => [a] -> [a]
removeConsequtiveEq = \case
  []    -> []
  [x]   -> [x]
  (h:t) -> h : ug h t
  where
    ug e = \case
      []     -> []
      (x:xs) | e == x    -> ug x xs
      (x:xs) | otherwise -> x : ug x xs
Edit: Using DCG, SCC, and topsort
With the algorithm described by @Cirdec:
1. Given a directed cyclic graph (DCG) where edges of the form (f, t) denote that f must come before t in the ordering.
2. Compute the condensation of the DCG in 1.
3. Turn each SCC in the condensation in 2. into a palindrome.
4. Compute the topsort of the graph in 3.
5. Concatenate the computed ordering.
In Haskell:
{-# LANGUAGE LambdaCase #-}
import Data.List (nub)
import Data.Maybe (fromJust)
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.NodeMap
import Data.Graph.Inductive.Query.DFS
data MkEdge = MkEdge Bool Int Int
req = MkEdge True
inv = MkEdge False
toGraph :: [MkEdge] -> [(Int, Int, Bool)] -> Gr Int Bool
toGraph edges es = run_ empty nm
  where ns = nub $ edges >>= \(MkEdge _ f t) -> [f, t]
        nm = insMapNodesM ns >> insMapEdgesM es
-- | Make graph into a directed cyclic graph (DCG).
-- "Requires" denotes a forward edge.
-- "Invalidates" denotes a backward edge.
toDCG :: [MkEdge] -> Gr Int Bool
toDCG edges = toGraph edges $
  (\(MkEdge w f t) -> if w then (t, f, w) else (f, t, w)) <$> edges
-- | Make a palindrome of the given list by computing: [1 .. n] ++ [n - 1 .. 1].
-- Runtime complexity: O(n).
palindrome :: [a] -> [a]
palindrome = \case
  [] -> []
  xs -> xs ++ tail (reverse xs)
linearize :: Gr Int a -> [Int]
linearize dcg = concat $ topsort' scc2
  where scc  = nmap (fmap (fromJust . lab dcg)) $ condensation dcg
        scc2 = nmap palindrome scc
For the graph g2:
g2 = [ 2 `req` 1
     , 2 `inv` 1
     , 3 `req` 1
     , 3 `inv` 1
     , 4 `req` 1
     , 5 `inv` 1
     ]
> prettyPrint $ toDCG g2
1:2->[(False,2)]
2:1->[(True,1),(True,3),(True,4)]
3:3->[(False,2)]
4:4->[]
5:5->[(False,2)]
> prettyPrint $ condensation $ toDCG g2
1:[5]->[((),2)]
2:[1,2,3]->[((),3)]
3:[4]->[]
> linearize $ toDCG g2
[5,2,1,3,1,2,4]
This ordering is neither minimal nor valid since the ordering violates the dependencies. 5 invalidates 1, which 2 depends on. 2 invalidates 1 which 4 depends on.
A valid and minimal ordering is: [1,4,2,1,3,5]. By shifting the list to the right, we get [5,1,4,2,1,3] which is also a valid ordering.
If the direction of the graph is flipped, the ordering becomes: [4,2,1,3,1,2,5]. This is not a valid ordering either... At the boundaries, 5 can happen, and then 4, but 5 invalidates 1 which 4 depends on.
I believe the following algorithm will find a minimal string of vertices in linear time:
Decompose the graph into its strongly connected components. Existing algorithms do this in linear time.
In each strongly connected component each node needs to be listed both before and after every other node. List the nodes [1..n] of each strongly connected component in the following order [1..n] ++ [n-1..1]
Concatenate the strongly connected components together in order by a topological sort. Existing algorithms topologically sort directed acyclic graphs like this in linear time.
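The three steps above can be sketched in Python as follows. This is only an illustration of the decompose, palindrome, concatenate recipe, under the assumption that an edge (f, t) means "f must come before t"; the function names are hypothetical, and the sketch makes no claim that the resulting ordering satisfies every requirement stated in the question.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm. Components come out in reverse topological order."""
    succ = defaultdict(list)
    for f, t in edges:
        succ[f].append(t)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp[::-1])

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs

def palindrome(xs):
    """[1..n] ++ [n-1..1]"""
    return xs + xs[-2::-1]

def linearize(nodes, edges):
    order = []
    # Reversing Tarjan's output gives a topological order of the condensation.
    for comp in reversed(strongly_connected_components(nodes, edges)):
        order.extend(palindrome(comp))
    return order

# (f, t): f must come before t. Hypothetical encoding of a small graph with one cycle-heavy component.
edges = [(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (5, 1)]
print(linearize([1, 2, 3, 4, 5], edges))
```

The order of nodes inside each component depends on the traversal order, so different implementations can produce different (equally long) strings for the same graph.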
You are planning the group seating arrangement for an open-book test, given a list V of students from different schools. Assume that students who know each other, directly or indirectly, are more likely to cheat than strangers sitting together.
Suppose you are also given a lookup table T, where T[u] for u ∈ V is a list of the students that u knows. If u knows v, then v knows u. You are required to arrange the seating such that no student at a table knows any other student sitting at the same table, either directly or through some other student sitting at the same table. For example, if x knows y, and y knows z, then x, y, z can sit at the same table. Describe an efficient algorithm that, given V and T, returns the minimum number of tables needed to achieve this requirement. Analyze the running time of your algorithm.
Follow each student's relations out to two edges to get a graph:
a - e - j
\ q
b - d
\ t
r - w - x - y - z
All the students in the same subgraph have to be separated, so the minimum number of tables is one for each student in the largest group - in this example the largest subgraph is r-w-x-y-z, so 5 tables.
Untested Python pseudocode:
# Given a student list
# a b c d e f j q r t w x y z
# start a chain at a
# a b c d e f j q r t w x y z
# .
# visit friends of a
# a b c d e f j q r t w x y z
# . .
# visit friends of a's friends
# a b c d e f j q r t w x y z
# . . . .
# if e and j are friends, don't double-count
# Get a count of 4 starting at person a
# Repeat for all students
# Report the longest chain.
friendCounts = {}

def countFriendsOf(T, student, friendTracker, moreSteps=2):
    friendTracker[student] = True  # quicker to set it regardless,
                                   # than to check if it's set
    if not moreSteps:
        return
    for friend in T[student]:
        countFriendsOf(T, friend, friendTracker, moreSteps - 1)
    return friendTracker

for u in V:
    friends = countFriendsOf(T, u, friendTracker={})
    friendCounts[u] = (len(friends), friends)

results = sorted(friendCounts.items(), key=lambda x: x[1][0], reverse=True)
(student, (friendCount, friends)) = results[0]
print("The smallest number of tables is:", friendCount)
print("Mandated by the friend group of:", student)
print()
from pprint import pprint
pprint(friends)
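For trying the idea out, here is a small self-contained variant of the same "follow relations out to two edges" count, written iteratively. The function name `reachable_within` and the toy table `T` (just the chain r-w-x-y-z) are made up for illustration, and like the code above this only mirrors the described approach; it is not a verified solution to the exercise.

```python
def reachable_within(T, start, steps=2):
    """Return the set of students reachable from `start` in at most `steps` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(steps):
        next_frontier = []
        for student in frontier:
            for friend in T[student]:
                if friend not in seen:  # don't double-count mutual friends
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return seen

# Hypothetical lookup table: a single chain r - w - x - y - z.
T = {"r": ["w"], "w": ["r", "x"], "x": ["w", "y"], "y": ["x", "z"], "z": ["y"]}
counts = {u: len(reachable_within(T, u)) for u in T}
largest = max(counts, key=counts.get)
print(largest, counts[largest])
```

Because each student is expanded at most once per call, one call costs time proportional to the friend lists it touches rather than revisiting the same students repeatedly.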
Analyze the running time of your algorithm.
Analysis: Fine on any computer more powerful than a snowglobe.
Not sure. Best case: students have no friends - linear with respect to number of students. O(n). Worst case: every student is friends with every other student, then it does lookups for every student for every student, so O(n^3). Ew.
It was running more like O(n^2) until I realised that version was definitely wrong.
This version is only not-definitely-wrong, it isn't definitely-right.
I didn't even start it as a recursive solution, it just ended up going that way. friendTracker use is a nasty side-effect, and the recursive call is not tail recursion optimizable. Not that Python does that.
Given 5 finite sets a,b,c,d,e. Each set is assigned the arbitrary number:
a = 100, b = 34, c = 15, d = 89, e = 57
The complement of each set is assigned the same number, negated; e.g. for a' it is -100.
We need to find an intersection of all these sets or their complements (one of each) such that the resulting set is not empty and the sum of the assigned numbers is maximal.
I only see one brute force solution to this problem, but it will be very inefficient and it's not elegant. In this case we just generate all combinations and resolve them to see if they are not empty, combinations look like this:
{a∩b'∩c'∩d'∩e'}, {a'∩b∩c'∩d∩e'}, {a'∩b'∩c∩d'∩e'}, {a'∩b'∩c'∩d∩e'}, {a'∩b'∩c'∩d'∩e} {a∩b∩c'∩d'∩e'}, {a∩b'∩c∩d'∩e'}, {a∩b'∩c'∩d∩e}, {a∩b'∩c'∩d'∩e}, {a'∩b∩c∩d'∩e'} {a'∩b∩c'∩d∩e'} {a'∩b∩c'∩d'∩e} ...
and then just pick the max number.
Looking forward to seeing if someone can think of something better :)
Define score(x, X) to be the value of set X if x is in X, otherwise its negation.
Then, letting * represent an element that's not in any of the 5 sets, the highest score possible is:
max_{x in union(A, B, C, D, E, {*})} sum_{X in {A, B, C, D, E}} score(x, X)
This follows from the observation that any particular x is either in a set or its complement. You don't actually have to compute the union here. In Python you might write:
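def max_config(A, B, C, D, E):
    # Start below any possible sum; max(None, number) fails in Python 3.
    best = float("-inf")
    # {None} stands for the element * that lies outside every set.
    for S in A, B, C, D, E, {None}:
        for x in S:
            best = max(best, sum(score(x, X) for X in (A, B, C, D, E)))
    return best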
Assuming a set membership test is O(1), this has complexity O(N), where N is the total size of the given sets.
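To make the idea concrete, here is a self-contained sketch that carries the assigned numbers alongside the sets. The function name `best_signed_sum`, the dictionary layout, and the example set contents are assumptions for illustration; only the five values come from the question. It also assumes, as above, that the universe contains at least one element outside every set.

```python
def best_signed_sum(sets, values):
    """sets: name -> set of elements; values: name -> assigned number.

    Every element x picks, for each set, either the set (x is in it) or its
    complement (x is not), so scanning elements covers all non-empty intersections.
    """
    # Score of an element outside every set: all complements are chosen.
    best = -sum(values.values())
    for x in set().union(*sets.values()):
        total = sum(values[name] if x in members else -values[name]
                    for name, members in sets.items())
        best = max(best, total)
    return best

values = {"a": 100, "b": 34, "c": 15, "d": 89, "e": 57}
# Hypothetical set contents, chosen only to exercise the function.
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {1}, "e": {2}}
print(best_signed_sum(sets, values))
```

In this toy input, element 2 lies in a, b and e only, giving 100 + 34 + 57 - 15 - 89 = 87, which beats element 1 (83), element 3 (-197) and the outside element (-295).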
I'm having some trouble with a homework question involving using Tarjan's algorithm on a provided graph to find the particular SCC's for that graph. While (according to my professor) I have found the correct SCC's by using the pseudo-code algorithm found here, some of the nodes in my SCC's do not share the same lowest-link number as the root node for that SCC.
From what I can gather from the pseudo-code, this is because if an un-referenced node i (which is the input node for the current recursive call to the algorithm) has an arc to an already visited node i + 1 which is not the root node, then the algorithm sets i.LL = MIN(i.LowestLink, (i + 1).index), and (i + 1).index may not be equal to its own lowest-link value anymore.
For example (this is similar to a part of the graph from the problem I'm trying to solve): if we have nodes in N = {a, b, c, d}, and arcs in E = {a ⇒ c, c ⇒ b, c ⇒ d, b ⇒ a, d ⇒ b}, and our root node which we start the algorithm from is a, then:
1.1) We set a.index = 1 (using 1 rather than 0), a.LL = 1, and push a onto the stack; a has a single arc to c, so we check c; finding that it is undiscovered, we call the algorithm on c.
2.1) We set c.index = 2, c.LL = 2, and push c onto the stack; c has two arcs, one to b, and one to d. Assume our for loop checks b first; b is undiscovered, and so we call the algorithm on b.
3.1) We set b.index = 3, b.LL = 3, and push b onto the stack; b has one arc to a; checking a we find that it is already on the stack, and so (by the pseudo-code linked above) we set b.LL = MIN(b.LL, a.index) = a.index = 1; b has no further arcs, so we exit our for loop, and check if b.LL = b.index, it does not, so we end this instance of the algorithm.
2.2) Now that the recursive call on b has ended, we set c.LL = MIN(c.LL, b.LL) = b.LL = 1. c still has the arc from c to d remaining; checking d we find it is undiscovered, so we call the algorithm on d.
4.1) d.index is set to 4, d.LL is set to 4, and we push d onto the stack. d has one arc from d to b, so we check b; we find that b is already in the stack, so we set d.LL = MIN(d.LL, b.index) = b.index = 3. d has no further arcs, so we exit our for loop and check if d.LL = d.index; it does not, so we end this instance of the algorithm.
2.3) With the recursive call on d ended, we again set c.LL = MIN(c.LL, d.LL) = c.LL = 1. c has no further arcs, and so we end our for loop. We check to see if c.LL = c.index; it does not, so we end this instance of the algorithm.
1.2) With the recursive call on c ended, we set a.LL = MIN(a.LL, c.LL) = 1. a has no further arcs, so we end our for loop. We check if a.LL = a.index; they are equal, so we have found a root node for this SCC; we create a new SCC, and pop each item in the stack into this SCC until we find a in the stack (which also goes into this SCC).
After these steps all the nodes in the graph are discovered, so running the algorithm with the other nodes initially does nothing, and we have one SCC = {a, b, c, d}. However, d.LL = 3, which is not equal to the rest of the nodes' lowest-links (which are all 1).
Have I done something wrong here? Or is it possible in this situation to have an SCC with differing lowest-links among its nodes?
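One way to double-check a hand trace like this is to run the same steps in code. The sketch below is a minimal Python version of Tarjan's algorithm using the update rule described above (MIN with the index of a node that is already on the stack) and 1-based indices. The function name `tarjan` and the data layout are my own assumptions, not taken from the linked pseudo-code.

```python
def tarjan(nodes, arcs):
    """Return (index, lowlink, sccs) for a directed graph given as a list of arcs."""
    succ = {n: [] for n in nodes}
    for u, v in arcs:
        succ[u].append(v)
    index, low, stack, sccs = {}, {}, [], []

    def strongconnect(v):
        index[v] = low[v] = len(index) + 1  # 1-based, as in the trace
        stack.append(v)
        for w in succ[v]:
            if w not in index:
                # Undiscovered: recurse, then take the child's lowest-link.
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in stack:
                # Already on the stack: use the successor's index (list scan is fine for a tiny graph).
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is a root: pop the stack down to and including v.
            scc = []
            while True:
                w = stack.pop()
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for n in nodes:
        if n not in index:
            strongconnect(n)
    return index, low, sccs

nodes = ["a", "b", "c", "d"]
arcs = [("a", "c"), ("c", "b"), ("c", "d"), ("b", "a"), ("d", "b")]
index, low, sccs = tarjan(nodes, arcs)
print(index, low, sccs)
```

Running it reproduces the hand trace: a single SCC containing all four nodes, with d's lowest-link equal to 3 and the others equal to 1. With this update rule the lowest-link values inside one SCC are not guaranteed to be equal; the algorithm only relies on the root being the one node whose lowest-link equals its own index.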