Unrealistic Eigenvalues in networkx - computational-geometry

Good afternoon,
A relatively well known mathematical theorem in graph theory (see the Introduction of "Bipartite and Neighborhood Graphs and the Spectrum of the Normalized Graph Laplace Operator" by Bauer and Jost) states that the spectrum of the normalized Laplace operator is always bounded from above by 2 and the upper bound is attained if and only if the graph is bipartite. I'm working with Networkx and the Laplacian spectrum (an array generated by Networkx utilities) is returning values far greater than 2. For the below example, the largest eigenvalue I am getting is 18.137. The graph is not bipartite so the largest eigenvalue should be strictly less than 2. Here is a sample of the code:
import networkx as nx
Graph = nx.karate_club_graph()
print(nx.laplacian_spectrum(Graph))
1.137978592311377107e-15,
4.685252267013915728e-01,
9.092476638033122338e-01,
1.125010718244666030e+00,
1.259404110121709719e+00,
1.599283075429581258e+00,
1.761898621144031507e+00,
1.826055209825464098e+00,
1.955050447337369102e+00,
1.999999999999998446e+00,
1.999999999999999556e+00,
2.000000000000000000e+00,
2.000000000000000444e+00,
2.000000000000001332e+00,
2.487091734464515369e+00,
2.749157175276658815e+00,
3.013962966251617193e+00,
3.242067477421745725e+00,
3.376154092871075374e+00,
3.381966011250106874e+00,
3.472187399726446522e+00,
4.275876820141818691e+00,
4.480007671029976102e+00,
4.580792668029516790e+00,
5.378595077669420910e+00,
5.618033988749897567e+00,
6.331592223669625596e+00,
6.515544628031584296e+00,
6.996197033107128149e+00,
9.777240952801486529e+00,
1.092106753013355558e+01,
1.330612231276679225e+01,
1.705517119099513224e+01,
1.813669597300440017e+01
I understand this Networkx function is most likely using the Laplacian spectrum and not the normalized Laplacian spectrum. However, since they are "Similar" (in the mathematical sense) matrices they should have the same eigenvalues. Where am I going wrong? I may be doing something silly, I just don't see it.

In networkx the Laplacian you are looking for is called "normalized_laplacian".
The two definitions give matrices that don't have the same eigenvalues.
The wikipedia page https://en.wikipedia.org/wiki/Laplacian_matrix has a decent discussion.
In [1]: import networkx as nx
In [2]: G = nx.karate_club_graph()
In [3]: from scipy.linalg import eigvalsh
In [4]: eigvalsh(nx.laplacian_matrix(G).todense())
Out[4]:
array([ -5.97438766e-15, 4.68525227e-01, 9.09247664e-01,
1.12501072e+00, 1.25940411e+00, 1.59928308e+00,
1.76189862e+00, 1.82605521e+00, 1.95505045e+00,
2.00000000e+00, 2.00000000e+00, 2.00000000e+00,
2.00000000e+00, 2.00000000e+00, 2.48709173e+00,
2.74915718e+00, 3.01396297e+00, 3.24206748e+00,
3.37615409e+00, 3.38196601e+00, 3.47218740e+00,
4.27587682e+00, 4.48000767e+00, 4.58079267e+00,
5.37859508e+00, 5.61803399e+00, 6.33159222e+00,
6.51554463e+00, 6.99619703e+00, 9.77724095e+00,
1.09210675e+01, 1.33061223e+01, 1.70551712e+01,
1.81366960e+01])
In [5]: eigvalsh(nx.normalized_laplacian_matrix(G).todense())
Out[5]:
array([ 6.28463560e-16, 1.32272329e-01, 2.87048985e-01,
3.87313233e-01, 6.12230540e-01, 6.48992947e-01,
7.07208202e-01, 7.39957989e-01, 7.70910617e-01,
8.22942852e-01, 8.64832945e-01, 9.06816002e-01,
1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
1.00000000e+00, 1.10538084e+00, 1.15929996e+00,
1.26802355e+00, 1.35177826e+00, 1.39310454e+00,
1.41691585e+00, 1.44857938e+00, 1.49703011e+00,
1.56950660e+00, 1.58333333e+00, 1.61190959e+00,
1.71461135e+00])

I think this is an issue of different people having different definitions for the Laplacian.
In Bauer and Jost:
the Laplace operator ∆ can be considered as ∆ =: I − P, where I denotes the identity and P is transition probability operator of a random walk (or sometimes called the Markov operator), respectively. We should point out here that the normalized graph Laplace operator ∆ is not exactly the one studied by Fan Chung [10]. However, both Laplace operators are unitarily equivalent and therefore have the same spectrum.
In Chung's work,
L(u, v) = 1 if u = v, -1/sqrt(d_u d_v) if u and v are joined by an edge, and 0 otherwise.
In Networkx,
The graph Laplacian is the matrix L = D - A, where A is the adjacency matrix and D is the diagonal matrix of node degrees.
I'll take Bauer and Jost's word that theirs is equivalent to what Fan Chung did (which I confess is not obvious to me at first glance). But I'm not at all convinced that what she did would have the same eigenvalues as D-A.
Edit: Aric's answer makes clear that this is the issue, and networkx also has the normalized matrix you are looking for.
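To make the difference between the three definitions concrete, here is a minimal sketch using plain NumPy. The small example graph (a triangle with one pendant vertex) and all variable names are assumptions chosen for illustration; it is not the karate club graph. It builds D - A, Chung's symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}, and the random-walk form I - P with P = D^{-1} A, then compares their spectra.

```python
import numpy as np

# Illustrative non-bipartite graph: a triangle (0, 1, 2) plus a pendant vertex 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
I = np.eye(len(A))

L_comb = np.diag(deg) - A                      # combinatorial Laplacian D - A
d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_sym = I - d_inv_sqrt @ A @ d_inv_sqrt        # Chung's normalized Laplacian
L_rw = I - np.diag(1.0 / deg) @ A              # I - P with P = D^{-1} A

ev_comb = np.linalg.eigvalsh(L_comb)
ev_sym = np.linalg.eigvalsh(L_sym)
# L_rw is not symmetric, but it is similar to L_sym, so its eigenvalues are real.
ev_rw = np.sort(np.linalg.eigvals(L_rw).real)

print("largest eigenvalue of D - A:        ", ev_comb.max())
print("largest eigenvalue of normalized L: ", ev_sym.max())
print("I - P and Chung's L share a spectrum:", np.allclose(ev_sym, ev_rw))
```

The last two matrices are related by conjugation with D^{1/2}, which is why their eigenvalues agree; D - A is not related to them by a similarity transform unless all degrees are equal, so its spectrum is free to exceed 2.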


Finding a perfect matching in graphs

I have this question :
An airline company has N different planes and T pilots. Every pilot has a list of planes he can fly. Every flight needs 2 pilots. The company wants to have as many simultaneous flights as possible. Find an algorithm that determines whether all the planes can fly simultaneously.
The solution I thought of is to find the max flow on this graph:
I am just not sure what the capacity should be. Can you help me with that?
Great idea to find the max flow.
For each edge from source --> pilot, assign a capacity of 1. Each pilot can only fly one plane at a time since they are running simultaneously.
For each edge from pilot --> plane, assign a capacity of 1. If this edge is filled with flow of 1, it represents that the given pilot is flying that plane.
For each edge from plane --> sink, assign a capacity of 2. This represents that each plane must be supplied by exactly 2 pilots.
Now, find a maximum flow. If the resulting maximum flow is two times the number of planes, then it's possible to satisfy the constraints. In this case, the edges between planes and pilots that are at capacity represent the matching.
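As a rough illustration of the capacities described above, here is a small pure-Python sketch. The function names, the node labels and the dict-based graph representation are assumptions made for this example, and the flow is computed with a plain Edmonds-Karp (BFS augmenting path) routine rather than any particular library solver.

```python
from collections import defaultdict, deque

def max_flow(cap, source, sink):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (modified in place)."""
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

def can_staff_all_planes(planes, can_fly):
    """can_fly maps each pilot to the planes that pilot is able to fly."""
    cap = defaultdict(lambda: defaultdict(int))
    for pilot, flyable in can_fly.items():
        cap["source"][("pilot", pilot)] = 1             # a pilot flies at most one plane
        for plane in flyable:
            cap[("pilot", pilot)][("plane", plane)] = 1
    for plane in planes:
        cap[("plane", plane)]["sink"] = 2               # a plane needs two pilots
    return max_flow(cap, "source", "sink") == 2 * len(planes)

print(can_staff_all_planes(["A", "B"],
                           {"p1": ["A"], "p2": ["A", "B"], "p3": ["B"], "p4": ["B"]}))
```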
The other answer is fine but you don't really need to involve flow as this can be reduced just as well to ordinary maximum bipartite matching:
For each plane, add another auxiliary plane to the plane partition with edges to the same pilots as the first plane.
Find a maximum bipartite matching M.
The answer is now true if and only if |M| = 2N.
If you like, you can think of this as saying that each plane needs a pilot and a co-pilot, and the two vertices associated to each plane now represents those two roles.
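The pilot/co-pilot view can be sketched in a few lines of pure Python. This is only an illustration: the function names and the list-of-lists input format are assumptions, and the matching is found with the simple augmenting-path method (often called Kuhn's algorithm), not with Hopcroft–Karp.

```python
def max_bipartite_matching(adj, n_right):
    """Size of a maximum matching; adj[u] lists the right vertices adjacent to left vertex u."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Use v if it is free, or if its current partner can be moved elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))

def can_fly_all_planes(plane_pilots, n_pilots):
    """plane_pilots[p] lists the pilots (numbered 0..n_pilots-1) able to fly plane p."""
    doubled = plane_pilots + plane_pilots  # each plane plus its auxiliary copy
    return max_bipartite_matching(doubled, n_pilots) == 2 * len(plane_pilots)

print(can_fly_all_planes([[0, 1], [1, 2, 3]], 4))
```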
The reduction to maximum bipartite matching is linear time, so using e.g. the Hopcroft–Karp algorithm to find the matching, you can solve the problem in O(|E| √|V|), where |E| is the number of edges between the partitions and |V| = T + 2N is the number of vertices once the auxiliary planes have been added.
In practice, the improvement over using a maximum flow based approach should depend on the quality of your implementations as well as the particular choice of representation of the graph, but chances are that you're better off this way.
Implementation example
To illustrate the last point, let's give an idea of how the two reductions could look in practice. One representation of a graph that's often useful due to its built-in memory locality is that of a CSR matrix, so let us assume that the input is such a matrix, whose rows correspond to the planes, and whose columns correspond to the pilots.
We will use the Python library SciPy which comes with algorithms for both maximum bipartite matching and maximum flow, and which works with CSR matrix representations for graphs under the hood.
In the algorithm given above, we will then need to construct the biadjacency matrix of the graph with the additional vertices added. This is nothing but the result of stacking the input matrix on top of itself, which is straightforward to phrase in terms of the CSR data structures: Following Wikipedia's notation, COL_INDEX should just be repeated, and ROW_INDEX should be replaced with ROW_INDEX concatenated with a copy of ROW_INDEX in which all elements are increased by the final element of ROW_INDEX.
In SciPy, a complete implementation which answers yes or no to the problem in OP would look as follows:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def reduce_to_max_matching(a):
    i, j = a.shape  # i planes (rows), j pilots (columns)
    data = np.ones(a.nnz * 2, dtype=bool)
    # Stack the biadjacency matrix on top of itself: one copy of each plane per seat
    indices = np.concatenate([a.indices, a.indices])
    indptr = np.concatenate([a.indptr, a.indptr[1:] + a.indptr[-1]])
    graph = csr_matrix((data, indices, indptr), shape=(2*i, j))
    # All 2*i plane vertices must be matched
    return (maximum_bipartite_matching(graph) != -1).sum() == 2 * i
In the maximum flow approach given by @HeatherGuarnera's answer, we will need to set up the full adjacency matrix of the new graph. This is also relatively straightforward; the input matrix will appear as a certain submatrix of the adjacency matrix, and we need to add a row for the source vertex and a column for the target. The example section of the documentation for SciPy's max flow solver actually contains an illustration of what this looks like in practice. Adopting this, a complete solution looks as follows:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

def reduce_to_max_flow(a):
    i, j = a.shape  # i planes (rows), j pilots (columns)
    n = a.nnz
    # Vertex 0 is the source, 1..i are planes, i+1..i+j are pilots, i+j+1 is the sink.
    # Source -> plane edges have capacity 2; all other edges have capacity 1.
    data = np.concatenate([2*np.ones(i, dtype=int), np.ones(n + j, dtype=int)])
    indices = np.concatenate([np.arange(1, i + 1),
                              a.indices + i + 1,
                              np.repeat(i + j + 1, j)])
    indptr = np.concatenate([[0],
                             a.indptr + i,
                             np.arange(n + i + 1, n + i + j + 1),
                             [n + i + j]])
    graph = csr_matrix((data, indices, indptr), shape=(2+i+j, 2+i+j))
    flow = maximum_flow(graph, 0, graph.shape[0]-1)
    return flow.flow_value == 2*i
Let us compare the timings of the two approaches on a single example consisting of 40 planes and 100 pilots, on a graph whose edge density is 0.1:
from scipy.sparse import random
inp = random(40, 100, density=.1, format='csr', dtype=bool)
%timeit reduce_to_max_matching(inp) # 191 µs ± 3.57 µs per loop
%timeit reduce_to_max_flow(inp) # 1.29 ms ± 20.1 µs per loop
The matching-based approach is faster, but not by a crazy amount. On larger problems, we'll start to see the advantages of using matching instead; with 400 planes and 1000 pilots:
inp = random(400, 1000, density=.1, format='csr', dtype=bool)
%timeit reduce_to_max_matching(inp) # 473 µs ± 5.52 µs per loop
%timeit reduce_to_max_flow(inp) # 68.9 ms ± 555 µs per loop
Again, this exact comparison relies on the use of specific predefined solvers from SciPy and how those are implemented, but if nothing else, this hints that simpler is better.

numerical diagonalization of a unitary matrix

To numerically diagonalize a unitary matrix I use the LAPACK routine zgeev.
The problem is: In case of degeneracies the degenerate subspace is not orthonormalized, since the routine is for general matrices.
However, since in my case the matrices are unitary, the basis can be always orthonormalized. Is there a better solution than applying QR-algorithm afterwards to the degenerate subspace?
Short answer: Schur decomposition!
If a square matrix A is complex, then its Schur factorization is A=ZTZ*, where Z is unitary and T is upper triangular.
If A happens to be unitary, T must also be unitary. Since T is both unitary and triangular, it is diagonal (proof here, or there).
Let's consider the vectors Z.e_i, where e_i are the vectors of the canonical basis. These vectors obviously form an orthonormal basis. Moreover, these vectors are eigenvectors of the matrix A.
Hence, the columns of the unitary matrix Z are eigenvectors of the unitary matrix A and form an orthonormal basis.
As a consequence, computing a Schur decomposition of a unitary matrix is equivalent to finding an orthonormal basis made of its eigenvectors.
ZGEESX computes the eigenvalues, the Schur form, and, optionally, the matrix of Schur vectors for GE matrices
The resulting T can also be tested to check that A is unitary.
Here is a piece of python code testing it, though scipy's scipy.linalg.schur makes use of Lapack's zgees for Schur decomposition. I used hpaulj's code to generate random unitary matrix as shown in How to create random orthonormal matrix in python numpy
import numpy as np
import scipy.linalg

# from hpaulj, https://stackoverflow.com/questions/38426349/how-to-create-random-orthonormal-matrix-in-python-numpy
def rvs(dim=3):
    random_state = np.random
    H = np.eye(dim)
    D = np.ones((dim,))
    for n in range(1, dim):
        x = random_state.normal(size=(dim-n+1,))
        D[n-1] = np.sign(x[0])
        x[0] -= D[n-1]*np.sqrt((x*x).sum())
        # Householder transformation
        Hx = (np.eye(dim-n+1) - 2.*np.outer(x, x)/(x*x).sum())
        mat = np.eye(dim)
        mat[n-1:, n-1:] = Hx
        H = np.dot(H, mat)
    # Fix the last sign such that the determinant is 1
    D[-1] = (-1)**(1-(dim % 2))*D.prod()
    # Equivalent to np.dot(np.diag(D), H) but faster, apparently
    H = (D*H.T).T
    return H

n = 42
A = rvs(n)
A = A.astype(complex)
T, Z = scipy.linalg.schur(A, output='complex', lwork=None, overwrite_a=False, sort=None, check_finite=True)
# print(T)
normT = np.linalg.norm(T, ord=None)  # Frobenius norm (the default for matrices)
eigenvalues = []
for i in range(n):
    eigenvalues.append(T[i, i])
    T[i, i] = 0.
normTu = np.linalg.norm(T, ord=None)  # norm of the off-diagonal part of T
print('must be very low if A is unitary: ', normTu/normT)
# print(Z)
for i in range(n):
    v = Z[:, i]
    w = A.dot(v) - eigenvalues[i]*v
    print(i, 'must be very low if column i of Z is eigenvector of A: ', np.linalg.norm(w, ord=None)/np.linalg.norm(v, ord=None))
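The snippet above checks that the columns of Z are eigenvectors, but not the point raised in the question, namely orthonormality inside a degenerate subspace. Here is a small additional sketch for that case. The test matrix (a unitary matrix built with a threefold eigenvalue 1) and the variable names are assumptions made for the illustration.

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
n = 4
# Random unitary Q from the QR factorization of a complex Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
# Unitary matrix with a threefold degenerate eigenvalue 1 and a simple eigenvalue i.
A = Q @ np.diag([1, 1, 1, 1j]) @ Q.conj().T

T, Z = scipy.linalg.schur(A, output='complex')

orth_err = np.linalg.norm(Z.conj().T @ Z - np.eye(n))     # Z^H Z should be the identity
offdiag = np.linalg.norm(T - np.diag(np.diag(T)))         # T should be diagonal
resid = np.linalg.norm(A @ Z - Z @ np.diag(np.diag(T)))   # columns of Z are eigenvectors
print(orth_err, offdiag, resid)
```

All three numbers should be at the level of rounding error, including for the columns of Z that span the degenerate subspace.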

Finding certain arrangements of all 2-combinations for a given list

Given a list L of an even number (2k) of elements, I'm looking for an algorithm to produce a list of 2k-1 sublists with the following properties:
each sublist includes exactly k 2-combinations (pairs where the order does not matter) of elements from L,
each sublist includes every element from L exactly once, and
the union of all elements from all sublists is exactly the set of all possible 2-combinations of the elements from L.
For example, if the input list is L = [a, b, c, d], we have k = 2 with 3 sublists, each including 2 pairs. A possible solution would look like [[ab, cd], [ac, bd], [ad, bc]]. If we ignore the ordering for all elements in the lists (think of all lists as sets), it turns out that this is also the only solution for k = 2.
My aim now is not only to find a single solution but all possible solutions. As the number of involved combinations grows pretty quickly, it would be nice to have all results be constructed in a clever way instead of generating a huge list of candidates and removing the elements from it that don't satisfy the given properties. Such a naïve algorithm could look like the following:
Find the set C of all 2-combinations for L.
Find the set D of all k-combinations for C.
Choose all sets from D whose union equals L; call the new set D'.
Find the set E of all (2k-1)-combinations for D'.
Choose all sets from E whose union is the set C, and let the new set be the final output.
This algorithm is easy to implement but it's incredibly slow for bigger input lists. So is there a way to construct the result list more efficiently?
Edit: Here is the result for L = [a,b,c,d,e,f] with k = 3, calculated by the above algorithm:
[[[ab,cd,ef],[ac,be,df],[ad,bf,ce],[ae,bd,cf],[af,bc,de]],
[[ab,cd,ef],[ac,bf,de],[ad,be,cf],[ae,bc,df],[af,bd,ce]],
[[ab,ce,df],[ac,bd,ef],[ad,be,cf],[ae,bf,cd],[af,bc,de]],
[[ab,ce,df],[ac,bf,de],[ad,bc,ef],[ae,bd,cf],[af,be,cd]],
[[ab,cf,de],[ac,bd,ef],[ad,bf,ce],[ae,bc,df],[af,be,cd]],
[[ab,cf,de],[ac,be,df],[ad,bc,ef],[ae,bf,cd],[af,bd,ce]]]
All properties are satisfied:
each sublist has k = 3 2-combinations,
each sublist only includes each element once, and
the union of all 2k-1 = 5 sublists for one solution is exactly the set of all possible 2-combinations for L.
Edit 2: Based on user58697's answer, I improved the calculation algorithm by using the round-robin tournament scheduling:
Let S be the result set, starting with an empty set, and P be the set of all permutations of L.
Repeat the following until P is empty:
Select an arbitrary permutation from P
Perform full RRT scheduling for this permutation. In each round, the arrangement of elements from L forms a permutation of L. Remove all these 2k permutations from P.
Add the resulting schedule to S.
Remove all lists from S if the union of their sublists has duplicate elements (i.e. doesn't add up to all 2-combinations of L).
This algorithm is much more performant than the first one. I was able to calculate the number of results for k = 4 as 960 and k = 5 as 67200. The fact that there doesn't seem to be an OEIS result for this sequence makes me wonder if the numbers are actually correct, though, i.e. if the algorithm is producing the complete solution set.
It is a round-robin tournament scheduling:
A pair is a match,
A list is a round (each team plays with some other team)
A set of list is an entire tournament (each team plays each other team exactly once).
Take a look here.
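For reference, a single schedule of this kind can be built with the classic circle method: keep one element fixed and rotate the others. The sketch below is an illustration with an assumed function name; it produces one valid arrangement, not all of them.

```python
def round_robin(items):
    """One round-robin schedule (circle method) for an even number of items."""
    items = list(items)
    n = len(items)
    if n % 2:
        raise ValueError("need an even number of items")
    fixed, rest = items[0], items[1:]
    rounds = []
    for r in range(n - 1):
        # Rotate everything except the fixed element, then pair opposite positions.
        line = [fixed] + rest[r:] + rest[:r]
        rounds.append([(line[i], line[n - 1 - i]) for i in range(n // 2)])
    return rounds

for rnd in round_robin("abcd"):
    print(rnd)
```

For 2k items this gives 2k-1 rounds of k pairs each, with every pair appearing in exactly one round.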
This was an interesting question. In the process of answering it (basically after writing the program included below, and looking up the sequence on OEIS), I learned that the problem has a name and rich theory: what you want is to generate all 1-factorizations of the complete graph K2k.
Let's first restate the problem in that language:
You are given a number k, and a list (set) L of size 2k. We can view L as the vertex set of a complete graph K2k.
For example, with k=3, L could be {a, b, c, d, e, f}
A 1-factor (aka perfect matching) is a partition of L into unordered pairs (sets of size 2). That is, it is a set of k pairs, whose disjoint union is L.
For example, ab-cd-ef is a 1-factor of L = {a, b, c, d, e, f}. This means that a is matched with b, c is matched with d, and e is matched with f. This way, L has been partitioned into three sets {a, b}, {c, d}, and {e, f}, whose union is L.
Let S (called C in the question) denote the set of all pairs of elements of L. (In terms of the complete graph, if L is its vertex set, S is its edge set.) Note that S contains (2k choose 2) = k(2k-1) pairs. So for k = 0, 1, 2, 3, 4, 5, 6…, S has size 0, 1, 6, 15, 28, 45, 66….
For example, S = {ab, ac, ad, ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef} for our L above (k = 3, so |S| = k(2k-1) = 15).
A 1-factorization is a partition of S into sets, each of which is itself a 1-factor (perfect matching). Note that as each of these matchings has k pairs, and S has size k(2k-1), the partition has size 2k-1 (i.e., is made of 2k-1 matchings).
For example, this is a 1-factorization: {ab-cd-ef, ac-be-df, ad-bf-ce, ae-bd-cf, af-bc-de}
In other words, every element of S (every pair) occurs in exactly one element of the 1-factorization, and every element of L occurs exactly once in each element of the 1-factorization.
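These two conditions are easy to check mechanically. The following sketch (the function name and the string-based input format are assumptions for illustration) tests whether a given list of matchings is a 1-factorization.

```python
from itertools import combinations

def is_one_factorization(elements, factorization):
    """factorization is an iterable of matchings; each matching is a list of 2-character strings."""
    all_pairs = {frozenset(p) for p in combinations(elements, 2)}
    seen = set()
    for matching in factorization:
        pairs = [frozenset(p) for p in matching]
        # Every element of L must occur exactly once in each matching.
        if sorted(x for p in pairs for x in p) != sorted(elements):
            return False
        # No pair may be reused across matchings.
        if any(p in seen for p in pairs):
            return False
        seen.update(pairs)
    # Every pair of S must be covered.
    return seen == all_pairs

example = ["ab-cd-ef", "ac-be-df", "ad-bf-ce", "ae-bd-cf", "af-bc-de"]
print(is_one_factorization("abcdef", [m.split("-") for m in example]))
```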
The problem asks to generate all 1-factorizations.
Let M denote the set of all 1-factors (all perfect matchings) of L. It is easy to prove that M contains (2k)!/(k!2^k) = 1×3×5×…×(2k-1) matchings. For k = 0, 1, 2, 3, 4, 5, 6…, the size of M is 1, 1, 3, 15, 105, 945, 10395….
For example, for our L above, M = {ab-cd-ef, ab-ce-df, ab-cf-de, ac-bd-ef, ac-be-df, ac-bf-de, ad-bc-ef, ad-be-cf, ad-bf-ce, ae-bc-df, ae-bd-cf, ae-bf-cd, af-bc-de, af-bd-ce, af-be-cd} (For k=3 this number 15 is the same as the number of pairs, but this is just a coincidence as you can see from the other numbers: this number grows much faster than the number of pairs.)
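The two counting formulas can be checked directly. This is a small sketch with assumed helper names; it evaluates |S| = k(2k-1) and |M| = (2k)!/(k! 2^k), and compares the latter with the product of odd numbers.

```python
from math import factorial

def num_pairs(k):
    """Size of S: the number of 2-element subsets of a 2k-element set."""
    return k * (2 * k - 1)

def num_perfect_matchings(k):
    """Size of M: (2k)! / (k! * 2^k)."""
    return factorial(2 * k) // (factorial(k) * 2 ** k)

def odd_product(k):
    """1 * 3 * 5 * ... * (2k - 1)."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result

for k in range(7):
    print(k, num_pairs(k), num_perfect_matchings(k), odd_product(k))
```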
M is easy to generate:
def perfect_matchings(l):
    if len(l) == 0:
        yield []
    for i in range(1, len(l)):
        first_pair = l[0] + l[i]
        for matching in perfect_matchings(l[1:i] + l[i+1:]):
            yield [first_pair] + matching
For example, calling perfect_matchings('abcdef') yields the 15 elements ['ab', 'cd', 'ef'], ['ab', 'ce', 'df'], ['ab', 'cf', 'de'], ['ac', 'bd', 'ef'], ['ac', 'be', 'df'], ['ac', 'bf', 'de'], ['ad', 'bc', 'ef'], ['ad', 'be', 'cf'], ['ad', 'bf', 'ce'], ['ae', 'bc', 'df'], ['ae', 'bd', 'cf'], ['ae', 'bf', 'cd'], ['af', 'bc', 'de'], ['af', 'bd', 'ce'], ['af', 'be', 'cd'] as expected.
By definition, a 1-factorization is a partition of S into elements from M. Or equivalently, any (2k-1) disjoint elements of M form a 1-factorization. This lends itself to a straightforward backtracking algorithm:
start with an empty list (partial factorization)
for each matching from the list of perfect matchings, try adding it to the current partial factorization, i.e. check whether it's disjoint (it should not contain any pair already used)
if fine, add it to the partial factorization, and try extending
In code:
from collections import defaultdict

matching_list = []
pair_used = defaultdict(lambda: False)
known_matchings = []  # Populate this list using perfect_matchings()

def extend_matching_list(r, need):
    """Finds ways of extending the matching list by `need`, using matchings r onwards."""
    if need == 0:
        use_result(matching_list)  # use_result is whatever you want done with a solution, e.g. print
        return
    for i in range(r, len(known_matchings)):
        matching = known_matchings[i]
        conflict = any(pair_used[pair] for pair in matching)
        if conflict:
            continue  # Can't use this matching. Some of its pairs have already appeared.
        # Else, use this matching in the current matching list.
        for pair in matching:
            pair_used[pair] = True
        matching_list.append(matching)
        extend_matching_list(i + 1, need - 1)
        matching_list.pop()
        for pair in matching:
            pair_used[pair] = False
If you call it with extend_matching_list(0, len(l) - 1) (after populating known_matchings), it generates all 1-factorizations. I've put the full program that does this here. With k=4 (specifically, the list 'abcdefgh'), it outputs 6240 1-factorizations; the full output is here.
It was at this point that I fed the sequence 1, 6, 6240 into OEIS, and discovered OEIS A000438, sequence 1, 1, 6, 6240, 1225566720, 252282619805368320,…. It shows that for k=6, the number of solutions (≈2.5×10^17) means that we can give up hope of generating all solutions. Even for k=5, the ≈1 billion solutions (recall that we're trying to find 2k-1=9 disjoint sets out of the |M|=945 matchings) will require some carefully optimized programs.
The first optimization (which, embarrassingly, I only realized later by looking closely at trace output for k=4) is that (under natural lexicographic numbering) the index of the first matching chosen in the partition cannot be greater than the number of matchings for k-1. This is because the lexicographically first element of S (like "ab") occurs only in those matchings, and if we start later than this one we'll never find it again in any other matching.
The second optimization comes from the fact that the bottleneck of a backtracking program is usually the testing for whether a current candidate is admissible. We need to test disjointness efficiently: whether a given matching (in our partial factorization) is disjoint with the union of all previous matchings. (Whether any of its k pairs is one of the pairs already covered by earlier matchings.) For k=5, it turns out that the size of S, which is (2k choose 2) = 45, is less than 64, so we can compactly represent a matching (which is after all a subset of S) in a 64-bit integer type: if we number the pairs as 0 to 44, then any matching can be represented by an integer having 1s in the positions corresponding to elements it contains. Then testing for disjointness is a simple bitwise operation on integers: we just check whether the bitwise-AND of the current candidate matching and the cumulative union (bitwise-OR) of previous matchings in our partial factorization is zero.
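The bitmask idea can be illustrated in Python, where integers are arbitrary precision, so the 64-bit limit does not apply. This is only a sketch with assumed names; it is not the C/C++ program mentioned in this answer and makes no attempt at the same speed. Each pair gets one bit, each matching becomes the OR of its pairs' bits, and disjointness is a single AND.

```python
from itertools import combinations

def perfect_matchings(items):
    """Yield every perfect matching of a sorted list as a list of (smaller, larger) pairs."""
    if not items:
        yield []
        return
    first = items[0]
    for i in range(1, len(items)):
        for rest in perfect_matchings(items[1:i] + items[i + 1:]):
            yield [(first, items[i])] + rest

def count_one_factorizations(k):
    items = list(range(2 * k))
    # One bit per pair; a matching is the bitwise OR (here: sum) of its pairs' bits.
    pair_bit = {pair: 1 << n for n, pair in enumerate(combinations(items, 2))}
    masks = [sum(pair_bit[p] for p in m) for m in perfect_matchings(items)]
    full = (1 << len(pair_bit)) - 1  # every pair covered

    def extend(start, used):
        if used == full:
            return 1
        total = 0
        for idx in range(start, len(masks)):
            if masks[idx] & used == 0:  # disjoint from everything chosen so far
                total += extend(idx + 1, used | masks[idx])
        return total

    return extend(0, 0)

print([count_one_factorizations(k) for k in range(1, 4)])
```

For k = 2 and k = 3 this reproduces the counts 1 and 6 discussed in the question.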
A C++ program that does this is here, and just the backtracking part (specialized for k=5) does not need any C++ features so it's extracted out as a C program here. It runs in about 4–5 hours on my laptop, and finds all 1225566720 1-factorizations.
Another way to look at this problem is to say that two elements of M have an edge between them if they intersect (have a pair (element of S) in common), and that we're looking for all maximum independent sets in M. Again, the simplest way to solve that problem would still probably be backtracking (we'd write the same program).
Our programs can be made quite a lot more efficient by exploiting the symmetry in our problem: for example we could pick any matching as our first 1-factor in the 1-factorization (and then generate the rest by relabelling, being careful to avoid duplicates). This is how the number of 1-factorizations for K12 (the current record) was calculated.
A note on the wisdom of generating all solutions
In The Art of Computer Programming Volume 4A, at the end of section 7.2.1.2 Generating All Permutations, Knuth has this important piece of advice:
Think twice before you permute. We have seen several attractive algorithms for permutation generation in this section, but many algorithms are known by which permutations that are optimum for particular purposes can be found without running through all possibilities. For example, […] the best way to arrange records on a sequential storage […] takes only O(n log n) steps. […] the assignment problem, which asks how to permute the columns of a square matrix so that the sum of the diagonal elements is maximized […] can be solved in at most O(n^3) operations, so it would be foolish to use a method of order n! unless n is extremely small. Even in cases like the traveling salesrep problem, when no efficient algorithm is known, we can usually find a much better approach than to examine every possible solution. Permutation generation is best used when there is good reason to look at each permutation individually.
This is what seems to have happened here (from the comments below the question):
I wanted to calculate all solutions to run different attribute metrics on these and find an optimal match […]. As the number of results seems to grow quicker than expected, this is impractical.
Generally, if you're trying to "generate all solutions" and you don't have a very good reason for looking at each one (and one almost never does), there are many other approaches that are preferable, ranging from directly trying to solve an optimization problem, to generating random solutions and looking at them, or generating solutions from some subset (which is what you seem to have done).
Further reading
Following up references from OEIS led to a rich history and theory.
On 1-factorizations of the complete graph and the relationship to round robin schedules, Gelling (M. A. Thesis), 1973
On the number of 1-factorizations of the complete graph, Charles C Lindner, Eric Mendelsohn, Alexander Rosa (1974?) -- this shows that the number of nonisomorphic 1-factorizations on K2n goes to infinity as n goes to infinity.
E. Mendelsohn and A. Rosa. On some properties of 1-factorizations of complete graphs. Congr. Numer, 24 (1979): 739–752
E. Mendelsohn and A. Rosa. One factorizations of the complete graph: A survey. Journal of Graph Theory, 9 (1985): 43–65 (As long ago as 1985, this exact question was studied well-enough to need a survey!)
Via papers of Dinitz:
D. K. Garnick and J. H. Dinitz, On the number of one-factorizations of the complete graph on 12 points, Congressus Numerantium, 94 (1993), pp. 159-168. They announced they were computing the number of nonisomorphic 1-factorizations of K12. Their algorithm was basically backtracking.
Jeffrey H. Dinitz, David K. Garnick, Brendan D. McKay: There are 526,915,620 nonisomorphic one-factorizations of K12 (also here), Journal of Combinatorial Designs 2 (1994), pp. 273 - 285: They completed the computation, and reported the numbers they found for K12 (526,915,620 nonisomorphic, 252,282,619,805,368,320 total).
Various One-Factorizations of Complete Graphs by Gopal, Kothapalli, Venkaiah, Subramanian (2007). A paper that is relevant to this question, and has many useful references.
W. D. Wallis, Introduction to Combinatorial Designs, Second Edition (2007). Chapter 10 is "One-Factorizations", Chapter 11 is "Applications of One-Factorizations". Both are very relevant and have many useful references.
Charles J. Colbourn and Jeffrey H. Dinitz, Handbook of Combinatorial Designs, Second Edition (2007). A goldmine. See chapters VI.3 Balanced Tournament Designs, VI.51 Scheduling a Tournament, VII.5 Factorizations of Graphs (including its sections 5.4 Enumeration and Tables, 5.5 Some 1-Factorizations of Complete Graphs), VII.6 Computational Methods in Design Theory (6.2 Exhaustive Search). This last chapter references:
[715] How K12 was calculated ("orderly algorithm"), a backtracking -- the Dinitz-Garnick-McKay paper mentioned above
[725] “Contains, among many other subjects related to factorization, a fast algorithm for finding 1-factorizations of K2n.” ("Room squares and related designs", J. H. Dinitz and D. R. Stinson)
[1270] (P. Kaski and P. R. J. Östergård, One-factorizations of regular graphs of order 12, Electron. J. Comb. 12, Research Paper 2, 25 pp. (2005))
[1271] “Contains the 1-factorizations of complete graphs up to order 10 in electronic form.” (P. Kaski and P. R. J. Östergård, Classification Algorithms for Codes and Designs, Springer, Berlin, 2006.)
[1860] “A survey on perfect 1-factorizations of K2n” (E. S. Seah, Perfect one-factorizations of the complete graph—A survey, Bull. Inst. Combin. Appl. 1 (1991) 59–70)
[2107] “A survey of 1-factorizations of complete graphs including most of the material of this chapter.” W. D. Wallis, One-factorizations of complete graphs, in Dinitz and Stinson (ed), Contemporary Design Theory, 1992
[2108] “A book on 1-factorizations of graphs.” W. D. Wallis, "One-Factorizations", Kluwer, Dordrecht, 1997
Some other stuff:
Factors and Factorizations of Graphs by Jin Akiyama and Mikio Kano (2007). This looks like a great book. “Frank Harary predicted that graph theory will grow so much that each chapter of his book Graph Theory will eventually expand to become a book on its own. He was right. This book is an expansion of his Chapter 9, Factorization.” There's not much about this particular topic (1-factorizations of complete graphs), but there is a proof in Chapter 4 (Theorem 4.1.1) that K2n always has a 1-factorization.
Papers on special types of 1-factorizations:
[Symmetry Groups Of] Some Perfect 1-Factorizations Of Complete Graphs, B. A. Anderson, 1977 (1973). Considers 1-factorizations that are in fact "perfect", having the property that the union of any two 1-factors (matchings) is a Hamiltonian cycle. (There's one up to isomorphism for K2k k ≤ 5, and two for K12.)
On 4-semiregular 1-factorizations of complete graphs and complete bipartite graphs.
Low Density MDS Codes and Factors of Complete Graphs -- also about perfect 1-factorizations
Self-invariant 1-Factorizations of Complete Graphs and Finite Bol Loops of Exponent 2
See also OEIS index entry for [sequences related to tournaments].
AMS feature column: Mathematics and Sports (April 2010) -- despite the overly broad name, is quite related.

Solving double integral numerically in matlab

In the paper "The fractional Laplacian operator on bounded domains as a special case of the nonlocal diffusion operator", the author solves a fractional Laplacian equation on a bounded domain as a nonlocal diffusion equation.
I am trying to implement the finite element approximation of the one-dimensional problem (please refer to page 14 of the above-mentioned paper) in MATLAB.
I am using the following definition of $\phi_k$, since the paper states that $\phi$ is a hat function:
\begin{equation}
\phi_{k}(x)=\begin{cases} {x-x_{k-1} \over x_k\,-x_{k-1}} & \mbox{ if } x \in [x_{k-1},x_k], \\
{x_{k+1}\,-x \over x_{k+1}\,-x_k} & \mbox{ if } x \in [x_k,x_{k+1}], \\
0 & \mbox{ otherwise},\end{cases}
\end{equation}
$\Omega=(-1,1)$ and $\Omega_I=(-1-\lambda,-1) \cup (1,1+\lambda)$ so that $\Omega\cup\Omega_I=(-1-\lambda,1+\lambda)$
For the integers K,N we define the partition of $\overline{\Omega\cup\Omega_I}=[-1-\lambda,1+\lambda]$ as,
\begin{equation}
-1-\lambda=x_{-K}<...
\end{equation}
Finally, the equations that we have to solve to get the solution $\tilde{u}_N=\sum_{j=-K}^{K+N}U_j\phi_j(x)$ for some coefficients $U_j$ are:
Where $i=1,...,N-1$.
I need pointers on how to simplify and solve the LHS double integral in MATLAB. The paper says (page 15) that I should use four-point Gauss quadrature for the inner integral and the quadgk.m function for the outer integral, but since the limits of the inner integral depend on x, how can I apply four-point Gauss quadrature to it? Any help will be appreciated.
Thanks.
You can find the original question here (since SO does not support LaTeX).
For a first stab at the problem, take a look at dblquad and/or quad2d.
In the end, you'll want custom quadrature methods, so you should do something like the following:
% The integrand is a function of both x and y (element-wise operators, so y may be a vector)
integrand = @(x,y) (phi_j(y) - phi_j(x)).*(phi_i(y) - phi_i(x))./abs(y-x).^(2*s+1);
% The inner integral is a function of x, and integrates over y;
% arrayfun is needed because quadgk passes a vector of x values to the outer integrand
inner = @(x) arrayfun(@(xi) quadgk(@(y) integrand(xi,y), xi-lambda, xi+lambda), x);
% The inner integral is integrated over x to yield the value of the double integral
dblIntegral = quadgk(inner, -(1+lambda), 1+lambda)
where I've used quadgk twice, but you can replace by any other (custom) quadrature method you please.
By the way -- what is the reason for the authors to suggest a (non-adaptive) 4-point Gauss method? That way, you have no estimation of (and/or control over) the errors made in the inner integral...
You can do a 4 point 1D Gaussian quadrature. You seem to assume that it means a 2D integral. Not so - this is assuming a higher-order quadrature over 1D.
If you're solving a 1D finite element problem, it makes no sense whatsoever to integrate over a 2D domain.
I didn't read the paper, but that's what I recall from FEA that I learned.
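On the question of x-dependent limits: a fixed 4-point Gauss-Legendre rule is defined on [-1, 1], and an affine change of variables carries it to any interval [a, b], including a = x - lambda, b = x + lambda. Below is a minimal sketch in Python, not the paper's code. The names gauss4 and double_integral are made up for illustration, the outer integral uses a composite 4-point rule as a stand-in for quadgk, and the integrand is a smooth test function rather than the singular kernel from the paper.

```python
# 4-point Gauss-Legendre nodes and weights on the reference interval [-1, 1]
NODES = (-0.8611363115940526, -0.3399810435848563,
         0.3399810435848563, 0.8611363115940526)
WEIGHTS = (0.3478548451374538, 0.6521451548625461,
           0.6521451548625461, 0.3478548451374538)


def gauss4(f, a, b):
    """Approximate the integral of f over [a, b] with the 4-point rule."""
    half = 0.5 * (b - a)   # scale factor of the affine map
    mid = 0.5 * (a + b)    # midpoint of [a, b]
    return half * sum(w * f(mid + half * t) for t, w in zip(NODES, WEIGHTS))


def double_integral(g, x_lo, x_hi, lam, panels=8):
    """Integrate g(x, y) over y in [x - lam, x + lam], then x in [x_lo, x_hi]."""
    def inner(x):
        # The limits depend on x; the affine map in gauss4 handles that.
        return gauss4(lambda y: g(x, y), x - lam, x + lam)

    width = (x_hi - x_lo) / panels
    # Composite rule for the outer integral (hypothetical stand-in for quadgk)
    return sum(gauss4(inner, x_lo + k * width, x_lo + (k + 1) * width)
               for k in range(panels))


lam = 0.5
value = double_integral(lambda x, y: (y - x) ** 2, -1 - lam, 1 + lam, lam)
print(value)  # the exact value for this test integrand is 0.25
```

For the real problem the kernel 1/|y-x|^(2s+1) is singular at y = x, so a fixed rule gives no error estimate there; that is the concern raised above about the non-adaptive method.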

How do I convert between a measure of similarity and a measure of difference (distance)?

Is there a general way to convert between a measure of similarity and a measure of distance?
Consider a similarity measure like the number of 2-grams that two strings have in common.
2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
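For concreteness, here is a small sketch that reproduces the two counts above. It assumes character 2-grams and counts each distinct 2-gram once; the function names are only illustrative.

```python
def char_ngrams(text, n=2):
    """Return the set of distinct character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngrams(a, b, n=2):
    """Count the distinct n-grams that occur in both strings."""
    return len(char_ngrams(a, n) & char_ngrams(b, n))


print(shared_ngrams('beta', 'delta'))     # 1
print(shared_ngrams('apple', 'dappled'))  # 4
```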
What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?
This is just an example...I'm looking for a general solution, if one exists. Like how to go from Levenshtein distance to a measure of similarity?
I appreciate any guidance you may offer.
Let d denote distance and s denote similarity. To convert a distance measure to a similarity measure, first normalize d to [0, 1] using d_norm = d/max(d). The similarity measure is then given by:
s = 1 - d_norm,
where s is in the range [0, 1], with 1 denoting the highest similarity (the items being compared are identical) and 0 denoting the lowest similarity (largest distance).
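A minimal sketch of this normalization in Python. The function name is made up, and how to handle the case where every distance is zero is my own assumption:

```python
def distances_to_similarities(distances):
    """Map distances to similarities in [0, 1] via s = 1 - d / max(d)."""
    d_max = max(distances)
    if d_max == 0:
        # Assumption: if all distances are zero, every item is maximally similar.
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]


print(distances_to_similarities([0, 2, 4]))  # [1.0, 0.5, 0.0]
```

Note that the result depends on max(d), so the similarities change if the set of compared items changes.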
If your similarity measure (s) is between 0 and 1, you can use one of these:
1-s
sqrt(1-s)
-log(s)
(1/s)-1
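To compare the four options side by side, here is a small sketch (the function name and dictionary keys are only for illustration). The last two are unbounded as s approaches 0, which the sketch reports as infinity:

```python
import math


def similarity_to_distances(s):
    """Apply the four conversions to a similarity s in [0, 1]."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    return {
        "1-s": 1.0 - s,
        "sqrt(1-s)": math.sqrt(1.0 - s),
        # The next two diverge as s -> 0, so s == 0 is mapped to infinity.
        "-log(s)": -math.log(s) if s > 0 else math.inf,
        "(1/s)-1": 1.0 / s - 1.0 if s > 0 else math.inf,
    }


for name, d in similarity_to_distances(0.25).items():
    print(name, d)
```

All four give distance 0 when s = 1; they differ in how strongly they stretch low similarities.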
Doing 1/similarity is not going to keep the properties of the distribution.
The best way is
distance (a->b) = highest similarity - similarity (a->b),
with highest similarity being the largest similarity value. You hence flip your distribution:
the highest similarity becomes 0, and so on.
Yes, there is a most general way to change between similarity and distance: a strictly monotone decreasing function f(x).
That is, with f(x) you can make similarity = f(distance) or distance = f(similarity). It works in both directions. Such a function works because the relation between similarity and distance is that one decreases when the other increases.
Examples:
These are some well-known strictly monotone decreasing candidates that work for non-negative similarities or distances:
f(x) = 1 / (a + x)
f(x) = exp(- x^a)
f(x) = arccot(ax)
You can choose parameter a>0 (e.g., a=1)
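A quick sketch of the three candidates in Python, assuming x >= 0 and a > 0. The function names are mine, and arccot is written as pi/2 - atan(ax), which agrees with arccot on non-negative arguments:

```python
import math


def inverse(x, a=1.0):
    """f(x) = 1 / (a + x)"""
    return 1.0 / (a + x)


def exp_power(x, a=1.0):
    """f(x) = exp(-(x ** a))"""
    return math.exp(-(x ** a))


def arccot(x, a=1.0):
    """f(x) = arccot(a * x), computed as pi/2 - atan(a * x) for x >= 0"""
    return math.pi / 2 - math.atan(a * x)


xs = [0.0, 0.5, 1.0, 2.0, 5.0]
for f in (inverse, exp_power, arccot):
    print(f.__name__, [round(f(x), 4) for x in xs])
```

Each printed list is strictly decreasing, but the ranges differ (for a = 1 the values at x = 0 are 1, 1 and pi/2), so rescale if you need values in [0, 1].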
Edit 2021-08
A very practical approach is to use the function sim2diss from the statistical software R. This function provides up to 13 methods to compute dissimilarities from similarities. Sadly, the methods are not explained at all: you have to look into the code :-\
similarity = 1/difference
and watch out for difference = 0
According to scikit learn:
Kernels are measures of similarity, i.e. s(a, b) > s(a, c) if objects a and b are considered “more similar” than objects a and c. A kernel must also be positive semi-definite.
There are a number of ways to convert between a distance metric and a similarity measure, such as a kernel. Let D be the distance, and S be the kernel:
S = np.exp(-D * gamma), where one heuristic for choosing gamma is 1 / num_features
S = 1. / (D / np.max(D))
In the case of Levenshtein distance, you could increase the sim score by 1 for every time the sequences match; that is, 1 for every time you didn't need a deletion, insertion or substitution. That way the metric would be a linear measure of how many characters the two strings have in common.
In one of my projects (based on collaborative filtering) I had to convert a correlation (the cosine between vectors), which ranged from -1 to 1 (closer to 1 is more similar, closer to -1 is more diverse), to a normalized distance (close to 0 means a smaller distance, close to 1 a larger one).
In this case: distance ~ diversity
My formula was: dist = 1 - (cor + 1)/2
If you are converting between similarity and diversity and the domain is [0, 1] in both cases, the simplest way is:
dist = 1 - sim
sim = 1 - dist
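As a sketch, the correlation formula above amounts to a linear rescaling of [-1, 1] onto [0, 1], followed by a flip (the function name is just for illustration):

```python
def correlation_to_distance(cor):
    """Map a correlation in [-1, 1] to a distance in [0, 1]."""
    if not -1.0 <= cor <= 1.0:
        raise ValueError("cor must lie in [-1, 1]")
    return 1.0 - (cor + 1.0) / 2.0


for cor in (1.0, 0.0, -1.0):
    print(cor, correlation_to_distance(cor))
```

A correlation of 1 maps to distance 0, a correlation of 0 to 0.5, and a correlation of -1 to 1.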
Cosine similarity is widely used for n-gram count or TFIDF vectors.
from math import pi, acos
def similarity(x, y):
    return sum(x[k] * y[k] for k in x if k in y) / sum(v**2 for v in x.values())**.5 / sum(v**2 for v in y.values())**.5
Cosine similarity can be used to compute a formal distance metric according to Wikipedia. It obeys all the properties of a distance that you would expect (symmetry, nonnegativity, etc):
def distance_metric(x, y):
    return 1 - 2 * acos(similarity(x, y)) / pi
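Both of these metrics range between 0 and 1 for count vectors. Note that, as written, distance_metric returns 1 for identical vectors, so it is really the angular similarity; the angular distance itself is 2 * acos(similarity(x, y)) / pi.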
If you have a tokenizer that produces N-grams from a string you could use these metrics like this:
>>> import Tokenizer
>>> tokenizer = Tokenizer(ngrams=2, lower=True, nonwords_set=set(['hello', 'and']))
>>> from collections import Counter
>>> list(tokenizer('Hello World again and again?'))
['world', 'again', 'again', 'world again', 'again again']
>>> Counter(tokenizer('Hello World again and again?'))
Counter({'again': 2, 'world': 1, 'again again': 1, 'world again': 1})
>>> x = _
>>> Counter(tokenizer('Hi world once again.'))
Counter({'again': 1, 'world once': 1, 'hi': 1, 'once again': 1, 'world': 1, 'hi world': 1, 'once': 1})
>>> y = _
>>> sum(x[k]*y[k] for k in x if k in y) / sum(v**2 for v in x.values())**.5 / sum(v**2 for v in y.values())**.5
0.42857142857142855
>>> distance_metric(x, y)
0.28196592805724774
I found the elegant inner product of Counter in this SO answer
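If you don't have such a tokenizer at hand, the numbers above can be checked with a self-contained sketch that starts from the two Counter objects shown in the session. The names cosine_similarity and angular_distance are mine, and the clamp before acos is an added safeguard against rounding:

```python
from collections import Counter
from math import acos, pi, sqrt


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(x[k] * y[k] for k in x if k in y)
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)


def angular_distance(x, y):
    """Angular distance in [0, 1] for vectors with non-negative entries."""
    # Clamp so that rounding slightly above 1 cannot break acos.
    c = min(1.0, max(-1.0, cosine_similarity(x, y)))
    return 2 * acos(c) / pi


x = Counter({'again': 2, 'world': 1, 'again again': 1, 'world again': 1})
y = Counter({'again': 1, 'world once': 1, 'hi': 1, 'once again': 1,
             'world': 1, 'hi world': 1, 'once': 1})

print(cosine_similarity(x, y))     # about 3/7
print(angular_distance(x, y))      # about 0.718
print(1 - angular_distance(x, y))  # about 0.282, the value printed above
```

Here the dot product is 2*1 + 1*1 = 3 and both vectors have squared norm 7, which gives the 3/7 seen in the session.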

Resources