VF2 algorithm implementation

I have a problem with my VF2 algorithm implementation. It seems to work correctly in many cases, but there is one problem I cannot solve.
The algorithm does not work on the example below, in which two identical graphs are compared (see image below). The starting vertex is 0.
The set P, which is calculated inside s0, stores the powerset of all pairs of vertices.
Below is the pseudocode from the publications about VF2 on which I based my implementation:
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=B51AD0DAEDF60D6C8AB589A39A570257?doi=10.1.1.101.5342&rep=rep1&type=pdf
http://www.icst.pku.edu.cn/intro/leizou/teaching/2012-autumn/papers/part2/VF2%20A%20%28sub%29Graph%20Isomorphism%20Algorithm%20For%20Matching%20Large%20Graphs.pdf
Comments starting with /* describe the way I understand the code.
I'm not sure whether creating the P() set as described below is valid. The pairs are iterated in lexicographical order, first by the first and then by the second value of each pair.
PROCEDURE Match(s)
  INPUT: an intermediate state s; the initial state s0 has M(s0) = empty
  OUTPUT: the mappings between the two graphs
  IF M(s) covers all the nodes of G2 THEN
    OUTPUT M(s)
  ELSE
    Compute the set P(s) of the pairs candidate for inclusion in M(s)
      /* built from the successors of the vertices already matched in M(s) if not empty,
      /* or from the predecessors of the vertices already matched in M(s) if not empty,
      /* or from all vertices not yet included in M(s)
    FOREACH (n, m) ∈ P(s)
      IF F(s, n, m) THEN
        Compute the state s' obtained by adding (n, m) to M(s)
          /* add n to M1(s), exclude it from T1(s)
          /* add m to M2(s), exclude it from T2(s)
          /* M1(s) is now M1(s'); the other structures belong to s' too
        CALL Match(s')
      END IF
    END FOREACH
    Restore data structures
      /* restore all structures to their state from before the FOREACH
  END IF
END PROCEDURE
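In Python-like code, the way I understand the backtracking is roughly this (helper names are just placeholders, not my real code):

def match(state):
    if state.covers_all_nodes_of_g2():      # M(s) covers all nodes of G2
        state.output_mapping()
        return
    for n, m in state.candidate_pairs():    # P(s)
        if state.feasible(n, m):            # F(s, n, m)
            undo = state.add_pair(n, m)     # turn s into s' in place, remember what changed
            match(state)                    # CALL Match(s')
            state.restore(undo)             # undo the changes for (n, m) before the next pair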
When the algorithm reaches s4 and returns from the function, it loses the information about the good vertex matches.
As a result it finds only a subgraph isomorphism ({(0,0),(1,1),(2,2),(5,3),(6,4)}), even though the graphs are isomorphic.
What am I doing wrong here?

I think that to answer your question "what am I doing wrong here", it is necessary to include some of your code. Did you re-implement the algorithm yourself, based on the pseudocode presented in the paper, or did you do the matching with the help of some graph-processing package?
I didn't have time to dig into the details, but I work with graphs as well, so I tried networkx (a Python package) and the Boost 1.55.0 library (a very extensive C++ graph library). Your example, and another example of a graph with 1000 nodes and 1500 edges, both return the correct matching (the trivial case of matching a graph with itself).
import networkx as nx
G1 = nx.Graph()
G2 = nx.Graph()
G1.add_nodes_from(range(0, 7))
G2.add_nodes_from(range(0, 7))
G1.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)])
G2.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)])
from networkx.algorithms import isomorphism
GM = isomorphism.GraphMatcher(G2, G1)
print(GM.is_isomorphic())  # True
print(GM.mapping)          # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}

Related

perfect hash function for random integer

Here's the problem:
X is a set of non-negative integers with n distinct elements that I know in advance, all less than or equal to m. I want a collision-free hash function, as simple as possible, that maps them to 0..n-1.
For example:
X = [31,223,121,100,123,71], so n = 6, m = 223.
I want to find a hash function to map them to [0, 1, 2, 3, 4, 5].
If mapping to 0..n-1 is too difficult, then mapping X to some other small range would also be acceptable.
Finding such a function is not too difficult, but finding one that is simple and easy to generate is hard.
Ideally, it would preserve the order of X.
Any clues?
My favorite perfect hash is pretty easy.
The hash function you generate has the form:
hash = table1[h1(key)%N] + table2[h2(key)%N]
h1 and h2 are randomly generated hash functions. In your case, you can generate random constants and then have h1(key)=key*C1/m and h2(key)=key*C2/m or something similarly simple
To generate the perfect hash:
Generate random constants C1 and C2
Imagine the bipartite graph, with table1 slots and table2 slots as vertices and an edge for each key between table1[h1(key)%N] and table2[h2(key)%N]. Run a DFS to see if the graph is acyclic. If not, go back to step 1.
Now that you have an acyclic graph, start at any key/edge in each connected component, and set its slots in table1 and table2 however you like to give it whatever hash you like.
Traverse the tree starting at the vertices adjacent to the edge you just set. For every edge you traverse, one of its slots will already be set. Set the other one to make the hash value come out however you like.
That's it. All of steps (2), (3) and (4) can be combined into a single DFS traversal pretty easily.
The complete description and analysis is in this paper.
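To make the construction concrete, here is a rough Python sketch of it (my own illustration, not code from the paper). It assigns each key its index in the input list, so passing in a sorted list also preserves order, as the question asked:

import random
from collections import defaultdict

def build_perfect_hash(keys):
    # Illustration of the scheme above: maps each key to its index in `keys`.
    n = len(keys)
    N = 2 * n + 1                          # spare slots make an acyclic graph likely
    while True:
        C1 = random.randrange(1, 1 << 31)
        C2 = random.randrange(1, 1 << 31)
        # Bipartite graph: table1 slots are vertices 0..N-1, table2 slots are
        # vertices N..2N-1; one edge per key, labelled with the key's index
        # (its desired hash value).
        adj = defaultdict(list)
        for i, k in enumerate(keys):
            u, v = (k * C1) % N, N + (k * C2) % N
            adj[u].append((v, i))
            adj[v].append((u, i))
        # Single DFS pass: check acyclicity and assign slot values so that
        # value[u] + value[v] == i (mod n) for each key's edge (u, v).
        value, seen_edge, ok = {}, set(), True
        for start in list(adj):
            if start in value:
                continue
            value[start] = 0
            stack = [start]
            while stack and ok:
                u = stack.pop()
                for v, i in adj[u]:
                    if i in seen_edge:
                        continue
                    if v in value:         # reached an already-visited vertex: cycle
                        ok = False
                        break
                    seen_edge.add(i)
                    value[v] = (i - value[u]) % n
                    stack.append(v)
            if not ok:
                break
        if ok:
            break                          # acyclic, tables are ready; otherwise retry
    table1 = [value.get(s, 0) for s in range(N)]
    table2 = [value.get(N + s, 0) for s in range(N)]
    return lambda k: (table1[(k * C1) % N] + table2[(k * C2) % N]) % n

X = [31, 71, 100, 121, 123, 223]           # sorted, so the hash preserves order
h = build_perfect_hash(X)
print([h(x) for x in X])                   # [0, 1, 2, 3, 4, 5]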

Finding certain arrangements of all 2-combinations for a given list

Given a list L of an even number (2k) of elements, I'm looking for an algorithm to produce a list of 2k-1 sublists with the following properties:
each sublist includes exactly k 2-combinations (pairs where the order does not matter) of elements from L,
each sublist includes every element from L exactly once, and
the union of all elements from all sublists is exactly the set of all possible 2-combinations of the elements from L.
For example, if the input list is L = [a, b, c, d], we have k = 2 with 3 sublists, each including 2 pairs. A possible solution would look like [[ab, cd], [ac, bd], [ad, bc]]. If we ignore the ordering for all elements in the lists (think of all lists as sets), it turns out that this is also the only solution for k = 2.
My aim now is not only to find a single solution but all possible solutions. As the number of involved combinations grows pretty quickly, it would be nice to have all results be constructed in a clever way instead of generating a huge list of candidates and removing the elements from it that don't satisfy the given properties. Such a naïve algorithm could look like the following:
Find the set C of all 2-combinations for L.
Find the set D of all k-combinations for C.
Choose all sets from D whose union equals L; call the new set D'.
Find the set E of all (2k-1)-combinations for D'.
Choose all sets from E whose union is the set C, and let the new set be the final output.
This algorithm is easy to implement, but it's incredibly slow for bigger input lists. So is there a way to construct the result list more efficiently?
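In Python, a direct transcription of these five steps looks like this (for illustration only):

from itertools import combinations

def naive(L):
    k = len(L) // 2
    C = list(combinations(L, 2))                       # step 1: all 2-combinations
    D = combinations(C, k)                             # step 2: all k-combinations of C
    Dp = [d for d in D                                 # step 3: keep those whose union is L
          if set(x for pair in d for x in pair) == set(L)]
    E = combinations(Dp, 2 * k - 1)                    # step 4: (2k-1)-combinations of D'
    return [e for e in E                               # step 5: keep those whose union is C
            if set(pair for d in e for pair in d) == set(C)]

print(naive('abcd'))   # the single solution for k = 2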
Edit: Here is the result for L = [a,b,c,d,e,f] with k = 3, calculated by the above algorithm:
[[[ab,cd,ef],[ac,be,df],[ad,bf,ce],[ae,bd,cf],[af,bc,de]],
[[ab,cd,ef],[ac,bf,de],[ad,be,cf],[ae,bc,df],[af,bd,ce]],
[[ab,ce,df],[ac,bd,ef],[ad,be,cf],[ae,bf,cd],[af,bc,de]],
[[ab,ce,df],[ac,bf,de],[ad,bc,ef],[ae,bd,cf],[af,be,cd]],
[[ab,cf,de],[ac,bd,ef],[ad,bf,ce],[ae,bc,df],[af,be,cd]],
[[ab,cf,de],[ac,be,df],[ad,bc,ef],[ae,bf,cd],[af,bd,ce]]]
All properties are satisfied:
each sublist has k = 3 2-combinations,
each sublist only includes each element once, and
the union of all 2k-1 = 5 sublists for one solution is exactly the set of all possible 2-combinations for L.
Edit 2: Based on user58697's answer, I improved the calculation algorithm by using the round-robin tournament scheduling:
Let S be the result set, starting with an empty set, and P be the set of all permutations of L.
Repeat the following until P is empty:
Select an arbitrary permutation from P
Perform full RRT scheduling for this permutation. In each round, the arrangement of elements from L forms a permutation of L. Remove all these 2k permutations from P.
Add the resulting schedule to S.
Remove all lists from S if the union of their sublists has duplicate elements (i.e. doesn't add up to all 2-combinations of L).
This algorithm performs much better than the first one. I was able to calculate the number of results for k = 4 as 960 and for k = 5 as 67200. The fact that there doesn't seem to be an OEIS entry for this sequence makes me wonder whether the numbers are actually correct, though, i.e. whether the algorithm produces the complete solution set.
This is round-robin tournament scheduling:
A pair is a match,
A list is a round (each team plays some other team),
A set of lists is an entire tournament (each team plays every other team exactly once).
Take a look here.
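For illustration, here is a sketch of the standard "circle method" in Python: fix one team and rotate the rest; each rotation gives one round, and the rounds together cover every pair exactly once.

def round_robin(teams):
    # Circle method: fix teams[0] and rotate the others; yields 2k-1 rounds,
    # each a list of k pairs, together covering every pair exactly once.
    fixed, rest = teams[0], list(teams[1:])
    rounds = []
    for _ in range(len(teams) - 1):
        circle = [fixed] + rest
        n = len(circle)
        rounds.append([(circle[i], circle[n - 1 - i]) for i in range(n // 2)])
        rest = rest[1:] + rest[:1]   # rotate everyone except the fixed team
    return rounds

for rnd in round_robin('abcdef'):
    print([''.join(sorted(p)) for p in rnd])
# Five rounds, together covering all 15 pairs exactly once.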
This was an interesting question. In the process of answering it (basically after writing the program included below, and looking up the sequence on OEIS), I learned that the problem has a name and rich theory: what you want is to generate all 1-factorizations of the complete graph K_2k.
Let's first restate the problem in that language:
You are given a number k, and a list (set) L of size 2k. We can view L as the vertex set of the complete graph K_2k.
For example, with k=3, L could be {a, b, c, d, e, f}
A 1-factor (aka perfect matching) is a partition of L into unordered pairs (sets of size 2). That is, it is a set of k pairs, whose disjoint union is L.
For example, ab-cd-ef is a 1-factor of L = {a, b, c, d, e, f}. This means that a is matched with b, c is matched with d, and e is matched with f. This way, L has been partitioned into three sets {a, b}, {c, d}, and {e, f}, whose union is L.
Let S (called C in the question) denote the set of all pairs of elements of L. (In terms of the complete graph, if L is its vertex set, S is its edge set.) Note that S contains (2k choose 2) = k(2k-1) pairs. So for k = 0, 1, 2, 3, 4, 5, 6…, S has size 0, 1, 6, 15, 28, 45, 66….
For example, S = {ab, ac, ad, ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef} for our L above (k = 3, so |S| = k(2k-1) = 15).
A 1-factorization is a partition of S into sets, each of which is itself a 1-factor (perfect matching). Note that as each of these matchings has k pairs, and S has size k(2k-1), the partition has size 2k-1 (i.e., is made of 2k-1 matchings).
For example, this is a 1-factorization: {ab-cd-ef, ac-be-df, ad-bf-ce, ae-bd-cf, af-bc-de}
In other words, every element of S (every pair) occurs in exactly one element of the 1-factorization, and every element of L occurs exactly once in each element of the 1-factorization.
The problem asks to generate all 1-factorizations.
Let M denote the set of all 1-factors (all perfect matchings) of L. It is easy to prove that M contains (2k)!/(k!2^k) = 1×3×5×…×(2k-1) matchings. For k = 0, 1, 2, 3, 4, 5, 6…, the size of M is 1, 1, 3, 15, 105, 945, 10395….
For example, for our L above, M = {ab-cd-ef, ab-ce-df, ab-cf-de, ac-bd-ef, ac-be-df, ac-bf-de, ad-bc-ef, ad-be-cf, ad-bf-ce, ae-bc-df, ae-bd-cf, ae-bf-cd, af-bc-de, af-bd-ce, af-be-cd}. (For k=3 this number 15 is the same as the number of pairs, but that is just a coincidence, as you can see from the other numbers: the size of M grows much faster than the number of pairs.)
M is easy to generate:
def perfect_matchings(l):
    if len(l) == 0:
        yield []
    for i in range(1, len(l)):
        first_pair = l[0] + l[i]
        for matching in perfect_matchings(l[1:i] + l[i+1:]):
            yield [first_pair] + matching
For example, calling perfect_matchings('abcdef') yields the 15 elements ['ab', 'cd', 'ef'], ['ab', 'ce', 'df'], ['ab', 'cf', 'de'], ['ac', 'bd', 'ef'], ['ac', 'be', 'df'], ['ac', 'bf', 'de'], ['ad', 'bc', 'ef'], ['ad', 'be', 'cf'], ['ad', 'bf', 'ce'], ['ae', 'bc', 'df'], ['ae', 'bd', 'cf'], ['ae', 'bf', 'cd'], ['af', 'bc', 'de'], ['af', 'bd', 'ce'], ['af', 'be', 'cd'] as expected.
By definition, a 1-factorization is a partition of S into elements from M. Or equivalently, any (2k-1) disjoint elements of M form a 1-factorization. This lends itself to a straightforward backtracking algorithm:
start with an empty list (partial factorization)
for each matching from the list of perfect matchings, try adding it to the current partial factorization, i.e. check whether it's disjoint (it should not contain any pair already used)
if fine, add it to the partial factorization, and try extending
In code:
from collections import defaultdict

matching_list = []
pair_used = defaultdict(lambda: False)
known_matchings = []  # Populate this list using perfect_matchings()

def extend_matching_list(r, need):
    """Finds ways of extending the matching list by `need`, using matchings r onwards."""
    if need == 0:
        use_result(matching_list)
        return
    for i in range(r, len(known_matchings)):
        matching = known_matchings[i]
        conflict = any(pair_used[pair] for pair in matching)
        if conflict:
            continue  # Can't use this matching. Some of its pairs have already appeared.
        # Else, use this matching in the current matching list.
        for pair in matching:
            pair_used[pair] = True
        matching_list.append(matching)
        extend_matching_list(i + 1, need - 1)
        matching_list.pop()
        for pair in matching:
            pair_used[pair] = False
If you call it with extend_matching_list(0, len(l) - 1) (after populating known_matchings), it generates all 1-factorizations. I've put the full program that does this here. With k=4 (specifically, the list 'abcdefgh'), it outputs 6240 1-factorizations; the full output is here.
It was at this point that I fed the sequence 1, 6, 6240 into OEIS, and discovered OEIS A000438, the sequence 1, 1, 6, 6240, 1225566720, 252282619805368320, …. It shows that for k=6 the number of solutions is ≈2.5×10^17, which means we can give up hope of generating all solutions. Even for k=5, the ≈1 billion solutions (recall that we're trying to find 2k-1=9 disjoint sets out of the |M|=945 matchings) will require some carefully optimized programs.
The first optimization (which, embarrassingly, I only realized later by looking closely at trace output for k=4) is that (under natural lexicographic numbering) the index of the first matching chosen in the partition cannot be greater than the number of matchings for k-1. This is because the lexicographically first element of S (like "ab") occurs only in those matchings, and if we start later than this one we'll never find it again in any other matching.
The second optimization comes from the fact that the bottleneck of a backtracking program is usually the testing for whether a current candidate is admissible. We need to test disjointness efficiently: whether a given matching (in our partial factorization) is disjoint with the union of all previous matchings. (Whether any of its k pairs is one of the pairs already covered by earlier matchings.) For k=5, it turns out that the size of S, which is (2k choose 2) = 45, is less than 64, so we can compactly represent a matching (which is after all a subset of S) in a 64-bit integer type: if we number the pairs as 0 to 44, then any matching can be represented by an integer having 1s in the positions corresponding to elements it contains. Then testing for disjointness is a simple bitwise operation on integers: we just check whether the bitwise-AND of the current candidate matching and the cumulative union (bitwise-OR) of previous matchings in our partial factorization is zero.
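A sketch of that representation (my own illustration; the real implementation is the C/C++ program linked below):

from itertools import combinations

# Number the 45 pairs (k = 5, so L has 10 elements) as 0..44.
all_pairs = [''.join(p) for p in combinations('abcdefghij', 2)]
pair_index = {p: i for i, p in enumerate(all_pairs)}

def matching_mask(matching):
    # A matching of 5 pairs becomes an integer with 5 bits set.
    mask = 0
    for pair in matching:
        mask |= 1 << pair_index[pair]
    return mask

# In the backtracking loop, keep `used` = bitwise OR of the chosen masks;
# a candidate matching is admissible iff matching_mask(candidate) & used == 0.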
A C++ program that does this is here, and just the backtracking part (specialized for k=5) does not need any C++ features so it's extracted out as a C program here. It runs in about 4–5 hours on my laptop, and finds all 1225566720 1-factorizations.
Another way to look at this problem is to say that two elements of M have an edge between them if they intersect (have a pair, i.e. an element of S, in common), and that we're looking for all maximum independent sets in M. Again, the simplest way to solve that problem would still probably be backtracking (we'd write the same program).
Our programs could be made quite a lot more efficient by exploiting the symmetry in our problem: for example, we could pick any matching as the first 1-factor of the 1-factorization (and then generate the rest by relabelling, being careful to avoid duplicates). This is how the number of 1-factorizations of K_12 (the current record) was calculated.
A note on the wisdom of generating all solutions
In The Art of Computer Programming Volume 4A, at the end of section 7.2.1.2 Generating All Permutations, Knuth has this important piece of advice:
Think twice before you permute. We have seen several attractive algorithms for permutation generation in this section, but many algorithms are known by which permutations that are optimum for particular purposes can be found without running through all possibilities. For example, […] the best way to arrange records on a sequential storage […] takes only O(n log n) steps. […] the assignment problem, which asks how to permute the columns of a square matrix so that the sum of the diagonal elements is maximized […] can be solved in at most O(n^3) operations, so it would be foolish to use a method of order n! unless n is extremely small. Even in cases like the traveling salesrep problem, when no efficient algorithm is known, we can usually find a much better approach than to examine every possible solution. Permutation generation is best used when there is good reason to look at each permutation individually.
This is what seems to have happened here (from the comments below the question):
I wanted to calculate all solutions to run different attribute metrics on these and find an optimal match […]. As the number of results seems to grow quicker than expected, this is impractical.
Generally, if you're trying to "generate all solutions" and you don't have a very good reason for looking at each one (and one almost never does), there are many other approaches that are preferable, ranging from directly trying to solve an optimization problem, to generating random solutions and looking at them, or generating solutions from some subset (which is what you seem to have done).
Further reading
Following up references from OEIS led to a rich history and theory.
On 1-factorizations of the complete graph and the relationship to round robin schedules, Gelling (M. A. Thesis), 1973
On the number of 1-factorizations of the complete graph, Charles C. Lindner, Eric Mendelsohn, Alexander Rosa (1974?) -- this shows that the number of nonisomorphic 1-factorizations of K_2n goes to infinity as n goes to infinity.
E. Mendelsohn and A. Rosa. On some properties of 1-factorizations of complete graphs. Congr. Numer, 24 (1979): 739–752
E. Mendelsohn and A. Rosa. One factorizations of the complete graph: A survey. Journal of Graph Theory, 9 (1985): 43–65 (As long ago as 1985, this exact question was studied well-enough to need a survey!)
Via papers of Dinitz:
D. K. Garnick and J. H. Dinitz, On the number of one-factorizations of the complete graph on 12 points, Congressus Numerantium, 94 (1993), pp. 159-168. They announced they were computing the number of nonisomorphic 1-factorizations of K_12. Their algorithm was basically backtracking.
Jeffrey H. Dinitz, David K. Garnick, Brendan D. McKay: There are 526,915,620 nonisomorphic one-factorizations of K_12 (also here), Journal of Combinatorial Designs 2 (1994), pp. 273-285. They completed the computation and reported the numbers they found for K_12 (526,915,620 nonisomorphic; 252,282,619,805,368,320 total).
Various One-Factorizations of Complete Graphs by Gopal, Kothapalli, Venkaiah, Subramanian (2007). A paper that is relevant to this question, and has many useful references.
W. D. Wallis, Introduction to Combinatorial Designs, Second Edition (2007). Chapter 10 is "One-Factorizations", Chapter 11 is "Applications of One-Factorizations". Both are very relevant and have many useful references.
Charles J. Colbourn and Jeffrey H. Dinitz, Handbook of Combinatorial Designs, Second Edition (2007). A goldmine. See chapters VI.3 Balanced Tournament Designs, VI.51 Scheduling a Tournament, VII.5 Factorizations of Graphs (including its sections 5.4 Enumeration and Tables, 5.5 Some 1-Factorizations of Complete Graphs), VII.6 Computational Methods in Design Theory (6.2 Exhaustive Search). This last chapter references:
[715] How K_12 was calculated (the "orderly algorithm", a backtracking search) -- the Dinitz-Garnick-McKay paper mentioned above.
[725] “Contains, among many other subjects related to factorization, a fast algorithm for finding 1-factorizations of K_2n.” (J. H. Dinitz and D. R. Stinson, "Room squares and related designs")
[1270] (P. Kaski and P. R. J. Östergård, One-factorizations of regular graphs of order 12, Electron. J. Comb. 12, Research Paper 2, 25 pp. (2005))
[1271] “Contains the 1-factorizations of complete graphs up to order 10 in electronic form.” (P. Kaski and P. R. J. Östergård, Classification Algorithms for Codes and Designs, Springer, Berlin, 2006.)
[1860] “A survey on perfect 1-factorizations of K_2n” (E. S. Seah, Perfect one-factorizations of the complete graph—A survey, Bull. Inst. Combin. Appl. 1 (1991), 59–70)
[2107] “A survey of 1-factorizations of complete graphs including most of the material of this chapter.” W. D. Wallis, One-factorizations of complete graphs, in Dinitz and Stinson (ed), Contemporary Design Theory, 1992
[2108] “A book on 1-factorizations of graphs.” W. D. Wallis, "One-Factorizations", Kluwer, Dordrecht, 1997
Some other stuff:
Factors and Factorizations of Graphs by Jin Akiyama and Mikio Kano (2007). This looks like a great book. “Frank Harary predicted that graph theory will grow so much that each chapter of his book Graph Theory will eventually expand to become a book on its own. He was right. This book is an expansion of his Chapter 9, Factorization.” There's not much about this particular topic (1-factorizations of complete graphs), but there is a proof in Chapter 4 (Theorem 4.1.1) that K_2n always has a 1-factorization.
Papers on special types of 1-factorizations:
[Symmetry Groups Of] Some Perfect 1-Factorizations Of Complete Graphs, B. A. Anderson, 1977 (1973). Considers 1-factorizations that are in fact "perfect", having the property that the union of any two 1-factors (matchings) is a Hamiltonian cycle. (There's one up to isomorphism for K_2k with k ≤ 5, and two for K_12.)
On 4-semiregular 1-factorizations of complete graphs and complete bipartite graphs.
Low Density MDS Codes and Factors of Complete Graphs -- also about perfect 1-factorizations
Self-invariant 1-Factorizations of Complete Graphs and Finite Bol Loops of Exponent 2
See also OEIS index entry for [sequences related to tournaments].
AMS feature column: Mathematics and Sports (April 2010) -- despite the overly broad name, it is quite relevant.

Dijkstra's algorithm: is my implementation flawed?

In order to train myself both in Python and graph theory, I tried to implement Dijkstra's algorithm in Python 3 and submitted it to several online judges to see whether it was correct.
It works well in many cases, but not always.
For example, I am stuck on this one: the sample test case works fine, and I have also tried custom test cases of my own, but when I submit the following solution, the judge keeps telling me "wrong answer", and indeed the expected result is very different from my output.
Notice that the judge tests it against quite a complex graph (10000 nodes with 100000 edges), while all the cases I tried before never exceeded 20 nodes and around 20-40 edges.
Here is my code.
Given al, an adjacency list in the following form:
al[n] = [(a1, w1), (a2, w2), ...]
where
n is the node id;
a1, a2, etc. are its adjacent nodes, and w1, w2, etc. are the respective weights of the edges;
and supposing that the maximum distance never exceeds 1 billion, I implemented Dijkstra's algorithm this way:
import queue

distance = [1000000000] * (N + 1)  # shortest known distance between node 1 and each other node
distance[1] = 0                    # starting from node 1 with distance 0
pq = queue.PriorityQueue()
pq.put((0, 1))                     # same as above
visited = [False] * (N + 1)
while not pq.empty():
    n = pq.get()[1]
    if visited[n]:
        continue
    visited[n] = True
    for edge in al[n]:
        if distance[edge[0]] > distance[n] + edge[1]:
            distance[edge[0]] = distance[n] + edge[1]
            pq.put((distance[edge[0]], edge[0]))
Could you please help me understand whether my implementation is flawed, or whether I simply ran into a buggy online judge?
Thank you very much.
UPDATE
As requested, I'm providing the snippet I use to populate the adjacency list al for the linked problem.
N, M = input().split()
N, M = int(N), int(M)
al = [[] for n in range(N + 1)]
for m in range(M):
    try:
        a, b, w = input().split()
        a, b, w = int(a), int(b), int(w)
        al[a].append((b, w))
        al[b].append((a, w))
    except:
        pass
(Please don't mind the ugly "except: pass", I was using it just for debugging purposes... :P)
Primary problem in interpreting the question:
According to your parsing code, you are treating the input data as an undirected graph, i.e. each edge from A to B is also an edge from B to A.
It seems like this premise is not valid, and the graph should instead be directed, i.e. you have to remove this line:
al[b].append((a, w)) # no back reference!
Previous problem, now already fixed in the code:
Currently, you are using the never-changing weight of the edges in your queue:
pq.put((edge[1], edge[0]))
This way, a node always ends up at the same position in the queue, no matter at what stage of the algorithm it is reached and how long the path to reach it actually is.
Instead, you should use the new distance to the target node edge[0], i.e. distance[edge[0]] as the priority in the queue:
pq.put((distance[edge[0]], edge[0]))
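Putting both fixes together, a minimal corrected sketch (using heapq instead of queue.PriorityQueue, which behaves the same here but is faster since it does no locking):

import heapq

def dijkstra(al, N, source=1):
    # al[n] = [(neighbor, weight), ...]; returns shortest distances from source.
    INF = 10**9
    distance = [INF] * (N + 1)
    distance[source] = 0
    visited = [False] * (N + 1)
    pq = [(0, source)]
    while pq:
        d, n = heapq.heappop(pq)
        if visited[n]:
            continue
        visited[n] = True
        for neighbor, weight in al[n]:
            if distance[neighbor] > d + weight:
                distance[neighbor] = d + weight
                heapq.heappush(pq, (distance[neighbor], neighbor))
    return distance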

Implementing cartesian product, such that it can skip iterations

I want to implement a function which returns the cartesian product of a set repeated a given number of times. For example:
input: {a, b}, 2
output:
aa
ab
bb
ba
input: {a, b}, 3
aaa
aab
aba
abb
baa
bab
bba
bbb
However, the only way I can implement it is to first compute the cartesian product of two sets ("ab", "ab"), and then take the product of that output with the same set again. Here is pseudocode:
function product(A, B):
    result = []
    for i in A:
        for j in B:
            result.append([i, j])
    return result

function product1(chars, count):
    result = product(chars, chars)
    for i in range(2, count):
        result = product(result, chars)
    return result
What I want is to start computing the last set directly, without computing all of the sets before it. A solution which gives me a similar result but isn't exactly a cartesian product is also acceptable.
I don't have a problem reading most general-purpose programming languages, so if you need to post code you can do it in any language you feel comfortable with.
Here's a recursive algorithm that builds S^n without building S^(n-1) "first". Imagine an infinite k-ary tree where |S| = k. Label the edges connecting each parent to its k children with the elements of S. An element of S^m can be thought of as a path of length m from the root, and the set S^m as the set of all such paths. Now the problem of finding S^n is the problem of enumerating all paths of length n, and we can name a path by the sequence of edge labels from beginning to end. We want to generate S^n directly, without first enumerating all of S^(n-1), so a depth-first search modified to find all nodes at depth n seems appropriate. This is essentially how the algorithm below works:
// collection to hold generated output
members = []

// recursive function to explore product space
Products(set[1...n], length, current[1...m])
    // if the product we're working on is of the
    // desired length then record it and return
    if m = length then
        members.append(current)
        return
    // otherwise we add each possible value to the end
    // and generate all products of the desired length
    // with the new vector as a prefix
    for i = 1 to n do
        current.addLast(set[i])
        Products(set, length, current)
        current.removeLast()

// reset the result collection and request the set be generated
members = []
Products([a, b], 3, [])
Now, a breadth-first approach is no less efficient than a depth-first one, and if you think about it, it would be no different from exactly what you're already doing. Indeed, any approach that generates S^n must necessarily generate S^(n-1) at least once, since S^(n-1) can be found inside a solution to S^n.
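For what it's worth, in Python the same depth-first idea is a short generator, and the standard library's itertools.product(s, repeat=n) implements exactly this iteration:

from itertools import product

def products(s, length, current=()):
    # Depth-first enumeration of s^length, yielding each tuple directly.
    if len(current) == length:
        yield current
        return
    for x in s:
        yield from products(s, length, current + (x,))

print([''.join(p) for p in products('ab', 3)])
# ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']
print([''.join(p) for p in product('ab', repeat=3)])   # same output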

Distinct subsequences summing to a given number in an array

During my current interview preparation, I encountered a question for which I am having some difficulty finding an optimal solution.
We are given an array A and an integer Sum; we need to find all distinct subsequences of A whose sum equals Sum.
For example, for A = {1,2,3,5,6} and Sum = 6, the answer should be:
{1,2,3}
{1,5}
{6}
Presently I can think of two ways of doing this:
Use recursion (which I suppose should be the last thing to consider for an interview question)
Use integer partitioning to partition Sum and check whether the elements of each partition are present in A
Please guide my thoughts.
I agree with Jason. This solution comes to mind:
(complexity is O(sum*|A|) if you represent the map as an array)
Call the input set A and the target sum sum
Have a map B, with each element being x:y, where x (the map key) is a sum and y (the map value) is the number of ways to reach it.
Starting off, add 0:1 to the map - there is 1 way to get to 0 (obviously by using no elements).
For each element a in A, consider each element x:y in B.
If x+a > sum, don't do anything.
If an element with the key x+a already exists in B, say x+a:z, modify it to x+a:y+z.
If an element with the key doesn't exist, simply add x+a:y to the map.
Finally, look up the element with key sum, i.e. sum:x - x is our desired value.
If B is sorted (or an array), you can simply skip the rest of the elements in B during the "don't do anything" step.
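In Python, a minimal sketch of this counting procedure (illustration, not production code):

from collections import Counter

def count_subsets(A, target):
    B = Counter({0: 1})                   # one way to reach sum 0: use no elements
    for a in A:
        for x, y in list(B.items()):      # snapshot, so each element is used at most once
            if x + a <= target:
                B[x + a] += y
    return B[target]

print(count_subsets([1, 2, 3, 5, 6], 6))  # 3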
Tracing it back:
The above just gives the count; the following modification gives the actual subsequences.
At each element in B, instead of the number of ways, store all the source sums and the elements used to get there (i.e. have a list of pairs at each element of B).
For 0:{} there are no source elements.
For x+a:y, the source element is x and the element to get there is a.
During the above process, if an element with the key already exists, enqueue the pair x/a to the element x+a (enqueue is an O(1) operation).
If an element with the key doesn't exist, simply create a list with one pair x/a at the element x+a.
To reconstruct, simply start at sum and recursively trace your way back.
We have to be careful of duplicate sequences (do we?) and sequences with duplicate elements here.
Example - not tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:1
Consider 1
Add 0+1
B = 0:1, 1:1
Consider 2
Add 0+2:1, 1+2:1
B = 0:1, 1:1, 2:1, 3:1
Consider 3
Add 0+3:1 (already exists -> add 1 to it), 1+3:1, 2+3:1, 3+3:1
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:1, 6:1
Consider 5
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:2
Generated sums thrown away = 7:1, 8:2, 9:1, 10:1, 11:1
Consider 6
B = 0:1, 1:1, 2:1, 3:2, 4:1, 5:2, 6:3
Generated sums thrown away = 7:1, 8:1, 9:2, 10:1, 11:2, 12:2
Then, from 6:3, we know we have 3 ways to get to 6.
Example - tracing it back:
A={1,2,3,5,6}
sum = 6
B = 0:{}
Consider 1
B = 0:{}, 1:{0/1}
Consider 2
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2}
Consider 3
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3}, 6:{3/3}
Consider 5
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5}
Generated sums thrown away = 7, 8, 9, 10, 11
Consider 6
B = 0:{}, 1:{0/1}, 2:{0/2}, 3:{1/2,0/3}, 4:{1/3}, 5:{2/3,0/5}, 6:{3/3,1/5,0/6}
Generated sums thrown away = 7, 8, 9, 10, 11, 12
Then, tracing back from 6: (not in {} means an actual element, in {} means a map entry)
{6}
{3}+3
{1}+2+3
{0}+1+2+3
1+2+3
Output {1,2,3}
{0}+3+3
3+3
Invalid - 3 is duplicate
{1}+5
{0}+1+5
1+5
Output {1,5}
{0}+6
6
Output {6}
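Here's a compact Python sketch of the trace-back version (my illustration; duplicates are filtered during reconstruction, as discussed above):

def all_subsets(A, target):
    B = {0: []}                              # sum -> list of (source_sum, element) pairs
    for a in A:
        for x in sorted(B, reverse=True):    # descending: use each element once per pass
            if x + a <= target:
                B.setdefault(x + a, []).append((x, a))
    results = set()
    def trace(s, tail):
        if s == 0:
            results.add(tuple(sorted(tail)))
            return
        for source, elem in B.get(s, []):
            if elem not in tail:             # reject sequences with duplicate elements
                trace(source, tail + [elem])
    trace(target, [])
    return results

print(all_subsets([1, 2, 3, 5, 6], 6))       # {(1, 2, 3), (1, 5), (6,)}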
This is a variant of the subset-sum problem. The subset-sum problem asks whether there is a subset that sums to a given value. You are asking for all of the subsets that sum to a given value.
The subset-sum problem is hard (more precisely, it's NP-Complete) which means that your variant is hard too (it's not NP-Complete, because it's not a decision problem, but it is NP-Hard).
The classic approach to the subset-sum problem is either recursion or dynamic programming. It's obvious how to modify the recursive solution to the subset-sum problem to answer your variant. I suggest that you also take a look at the dynamic programming solution to subset-sum and see if you can modify it for your variant (to be clear: I do not know if this is actually possible). That would certainly be a very valuable learning exercise whether or not it is possible, as it would enhance your understanding of dynamic programming either way.
It would surprise me, though, if the expected answer to your question were anything but the recursive solution. It's easy to come up with, and it is an acceptable approach to the problem. Asking for the dynamic programming solution on the fly is a bit much.
You did, however, neglect to mention a very naïve approach to this problem: generate all subsets, and for each subset check if it sums to the given value or not. Obviously that's exponential, but it does solve the problem.
I assumed that the given array contains distinct numbers.
Let's define a function f(i, s) giving the number of ways to choose some of the numbers with indices in the range [1, i] so that their sum is s.
Let's store all values in a two-dimensional matrix, i.e. in cell (i, j) we will have the value of f(i, j). If we have already calculated the values for cells located above or to the left of cell (i, s), we can calculate the value of f(i, s): f(i, s) = f(i - 1, s) (not taking the number with index i), and if s >= a[i], then also f(i, s) += f(i - 1, s - a[i]) (taking it). We can use a bottom-up approach to fill the whole matrix, setting f(0, 0) = 1; f(0, i) = 0 for 1 <= i <= s; and f(i, 0) = 1 for 1 <= i <= n. Once the whole matrix is calculated, the answer is in cell f(n, S). Thus we have time complexity O(n*S) and memory complexity O(n*S).
We can improve the memory complexity if we note that in every iteration we only need information from the previous row, which means we can store a matrix of size 2xS instead of nxS. That reduces the memory complexity to linear in S. The problem is NP-complete, so no polynomial-time algorithm is known, and this pseudo-polynomial approach is about the best we can do.
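For illustration, a bottom-up sketch of that matrix in Python:

def count_subsets_dp(a, S):
    # f[i][s] = number of subsets of a[:i] summing to s.
    n = len(a)
    f = [[0] * (S + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        f[i][0] = 1                                 # the empty subset sums to 0
    for i in range(1, n + 1):
        for s in range(1, S + 1):
            f[i][s] = f[i - 1][s]                   # don't take a[i-1]
            if s >= a[i - 1]:
                f[i][s] += f[i - 1][s - a[i - 1]]   # take a[i-1]
    return f[n][S]

print(count_subsets_dp([1, 2, 3, 5, 6], 6))   # 3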
