Fastest way to find triangles from connected line points - algorithm

I want to find a way to detect all triangles formed by connected lines. I have an array list with all available lines, called lines, and when I add another line I want to find out whether it creates a new triangle. As shown in the image below, as soon as I add the new line f, which connects the points with indices 1 and 3, I want a fast way to detect that a new triangle has been created and to add the indices 1,3,5 to another array list called triangles.
// array with line indices; each line consists of two point indices
val lines: ArrayList<Int> = arrayListOf(
    0,1, // a
    1,5, // b
    5,4, // c
    5,3, // d
    5,2  // e
)
val triangles: ArrayList<Int> = arrayListOf()
One way I came up with is to have an array list that contains a separate array list for each point, holding all the indices that point is connected to. For example, the point with index 5 is connected to the points with indices [1,2,3,4]. So when a new line that connects the indices 3 and 1 is added, we can check whether the array lists circlesHash[3] and circlesHash[1] contain a common element. In our case both arrays contain the index 5, which means the triangle with indices 3,1,5 is created. If the array lists are sorted I can use the method Arrays.binarySearch. But if I have a large number of points, for example 1000-2000, what is the quickest way to find whether a new triangle is created?
val circlesHash: ArrayList<ArrayList<Int>> = arrayListOf(
    arrayListOf(1),       // 0
    arrayListOf(0,5),     // 1
    arrayListOf(5),       // 2
    arrayListOf(5),       // 3
    arrayListOf(5),       // 4
    arrayListOf(1,2,3,4)  // 5
)
Another question is whether there is an existing algorithm to find all existing triangles just from the indices of the existing lines. And which structure is preferable: HashMap, HashSet, or some sort of binary tree?

The simplest solution would be to pair edges by shared points: two edges are paired if they have one end-point in common. Then for each pair take the two nodes that are not shared by the edges; insert an edge between them and you've formed a triangle. Assuming your graph isn't a multigraph, this algorithm won't produce any duplicates and finds all triangles.
Taking it from there, we can create a set of edges that don't exist yet but would create a triangle. As pseudocode:
triangle_candidates(g):
    triangle_edges = set()
    for e1 in g.edges:
        for e2 in g.edges:
            if e1 and e2 share a node:
                a, b = nodes not shared by e1 and e2
                if a and b are not neighbors:
                    triangle_edges.add(tuple(a, b))
    return triangle_edges

creates_new_triangle(g, a, b):
    return tuple(a, b) in triangle_candidates(g)
Note that tuples are assumed to be non-ordered! The above code is a demonstration of the basic principle; there's still plenty of room for optimizations in triangle_candidates.
The basic idea is to create a set of all node-pairs that are not neighbors, but would complete a cycle of length three if connected. Checking whether a new edge would create a triangle is a simple set-lookup, which should be fairly fast assuming a proper implementation.
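The candidate set can also be maintained incrementally with one hash set of neighbours per point: when a new line a-b arrives, every point in the intersection of the two endpoints' neighbour sets closes a triangle. A minimal Kotlin sketch, assuming integer point indices (the names adjacency and addLine are illustrative, not from the question):

// adjacency[p] holds every point currently connected to p.
val adjacency = HashMap<Int, HashSet<Int>>()
val triangles = ArrayList<Int>()

// Adds the line a-b and records every triangle it completes.
// Expected cost per line is O(min(deg(a), deg(b))) with hash sets.
fun addLine(a: Int, b: Int) {
    val nbrsA = adjacency.getOrPut(a) { HashSet() }
    val nbrsB = adjacency.getOrPut(b) { HashSet() }
    // Iterate over the smaller set and test membership in the larger one.
    val (small, large) = if (nbrsA.size <= nbrsB.size) nbrsA to nbrsB else nbrsB to nbrsA
    for (c in small) {
        if (c in large) triangles.addAll(listOf(a, b, c))
    }
    nbrsA.add(b)
    nbrsB.add(a)
}

For 1000-2000 points this is typically cheaper than scanning sorted lists with Arrays.binarySearch, because each new line only touches the neighbours of its two endpoints.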

Related

Generate random graph with probability p

Write a function in main.cpp, which creates a random graph of a certain size as follows. The function takes two parameters. The first parameter is the number of vertices n. The second parameter p (1 >= p >= 0) is the probability that an edge exists between a pair of nodes. In particular, after instantiating a graph with n vertices and 0 edges, go over all possible vertex pairs one by one, and for each such pair, put an edge between the vertices with probability p.
How do I know whether an edge exists between two vertices?
Here is the full question
PS: I don't need the code implementation
The problem statement clearly says that the first input parameter is the number of nodes and the second parameter is the probability p that an edge exists between any 2 nodes.
What you need to do is as follows (Updated to amend a mistake that was pointed out by #user17732522):
1- Create a bool matrix (2D nested array) of size n*n initialized with false.
2- Run a loop over the rows:
   - Run an inner loop over the columns:
     - if row_index != col_index do:
       - curr_p = random() // random() returns a number between 0 and 1 inclusive
       - if curr_p <= p: set matrix[row_index][col_index] = true
         else: set matrix[row_index][col_index] = false
       - For an undirected graph, also set matrix[col_index][row_index] = true/false based on curr_p
Note: Since we are setting both cells (both directions) in the matrix in case of a probability hit, we could potentially set an edge twice. This doesn't corrupt the correctness of the probability and isn't much additional work. It helps to keep the code clean.
If you want to optimize this solution, you could run the loop such that you only visit the lower-left triangle (excluding the diagonal) and just mirror the results you get for those cells to the upper-right triangle.
That's it.
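Although the assignment targets C++ (main.cpp), a short Kotlin sketch of the lower-triangle variant may help illustrate the logic (the function name randomGraph is mine):

import kotlin.random.Random

// Builds an undirected random graph as an n x n boolean adjacency matrix.
// Only the pairs below the diagonal are sampled; each result is mirrored.
fun randomGraph(n: Int, p: Double): Array<BooleanArray> {
    val matrix = Array(n) { BooleanArray(n) }       // initialised with false
    for (row in 1 until n) {
        for (col in 0 until row) {                  // lower-left triangle, diagonal excluded
            val hasEdge = Random.nextDouble() < p   // nextDouble() is uniform in [0, 1)
            matrix[row][col] = hasEdge
            matrix[col][row] = hasEdge              // mirror for the undirected case
        }
    }
    return matrix
}

Checking whether an edge exists between vertices i and j is then just matrix[i][j].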

Looking for algorithm to match up objects from 2 lists depending of distance

So I have 2 lists of objects, with a position for each one. I would like to match every object from the first list with an object of the second list.
Once an object from the second list has been selected for a match, we remove it from the list (so it cannot be matched with another one). And most importantly, the total sum of distances between the matched objects should be as small as possible.
For example:
list1 { A, B, C } list2 { X, Y, Z }
So if I match up A->X (dist: 3meters) B->Z (dist: 2meters) C->Y (dist: 4meters)
Total sum = 3 + 2 + 4 = 9meters
We could have another match up with A->Y (4meters) B->X (1meter) C->Z (3meters)
Total sum = 4 + 1 + 3 = 8meters <======= Better solution
Thank you for your help.
Extra: Lists could have different length.
This problem is known as the Assignment Problem (a weighted matching in bipartite graphs).
An algorithm which solves this is the Hungarian algorithm. At the bottom of the Wikipedia article there is also a list of implementations.
If your data has special properties, for example if your two sets are 2D points and the weight of an edge is the Euclidean distance, then there are better algorithms for this.
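For very small, equal-length lists, a brute-force Kotlin sketch that tries every permutation makes the objective concrete; this is not the Hungarian algorithm, and the name bestAssignment and the square distance matrix are assumptions of mine:

// dist[i][j] is the distance between list1[i] and list2[j].
// Tries every assignment and keeps the one with minimum total distance.
// O(n!), so only usable for tiny inputs; the Hungarian algorithm solves
// the same problem in O(n^3).
fun bestAssignment(dist: Array<DoubleArray>): Pair<List<Int>, Double> {
    val n = dist.size
    var best: List<Int> = emptyList()
    var bestCost = Double.MAX_VALUE

    fun permute(remaining: List<Int>, chosen: List<Int>, cost: Double) {
        if (remaining.isEmpty()) {
            if (cost < bestCost) { bestCost = cost; best = chosen }
            return
        }
        val i = chosen.size                       // next index of the first list
        for (j in remaining) {
            permute(remaining - j, chosen + j, cost + dist[i][j])
        }
    }

    permute((0 until n).toList(), emptyList(), 0.0)
    return Pair(best, bestCost)                   // best[i] = matched index in list2
}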

Implementing cartesian product, such that it can skip iterations

I want to implement a function which returns the Cartesian product of a set, repeated a given number of times. For example
input: {a, b}, 2
output:
aa
ab
bb
ba
input: {a, b}, 3
aaa
aab
aba
abb
baa
bab
bba
bbb
However, the only way I can implement it is by first computing the Cartesian product of 2 sets ("ab", "ab"), and then, from that output, taking the product with the same set again. Here is pseudo-code:
function product(A, B):
    result = []
    for i in A:
        for j in B:
            result.append([i,j])
    return result

function product1(chars, count):
    result = product(chars, chars)
    for i in range(2, count):
        result = product(result, chars)
    return result
What I want is to start computing the last set directly, without computing all of the sets before it. Is this possible? A solution which gives me a similar result but isn't a Cartesian product is also acceptable.
I don't have a problem reading most general-purpose programming languages, so if you need to post code you can do it in any language you feel comfortable with.
Here's a recursive algorithm that builds S^n without building S^(n-1) "first". Imagine an infinite k-ary tree where |S| = k. Label each of the edges connecting a parent to its k children with the elements of S. An element of S^m can be thought of as any path of length m from the root, and the set S^m, in that way of thinking, is the set of all such paths. Now the problem of finding S^n is a problem of enumerating all paths of length n, and we can name a path by the sequence of edge labels from beginning to end. We want to generate S^n directly without first enumerating all of S^(n-1), so a depth-first search modified to find all nodes at depth n seems appropriate. This is essentially how the algorithm below works:
// collection to hold generated output
members = []

// recursive function to explore the product space
Products(set[1...n], length, current[1...m])
    // if the product we're working on is of the
    // desired length then record it and return
    if m = length then
        members.append(current)
        return
    // otherwise we add each possible value to the end
    // and generate all products of the desired length
    // with the new vector as a prefix
    for i = 1 to n do
        current.addLast(set[i])
        Products(set, length, current)
        current.removeLast()

// reset the result collection and request the set be generated
members = []
Products([a, b], 3, [])
Now, a breadth-first approach is no less efficient than a depth-first one, and if you think about it, it would be no different from exactly what you're already doing. Indeed, an approach that generates S^n must necessarily generate S^(n-1) at least once, since S^(n-1) can be found within a solution to S^n.
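The same depth-first scheme in Kotlin, as a sketch (the generic function name products is mine):

// Depth-first generation of all length-`length` tuples over `set`,
// without materialising the shorter products as separate collections.
fun <T> products(set: List<T>, length: Int): List<List<T>> {
    val members = mutableListOf<List<T>>()
    val current = ArrayDeque<T>()

    fun recurse() {
        if (current.size == length) {
            members.add(current.toList())   // record a completed tuple
            return
        }
        for (element in set) {
            current.addLast(element)
            recurse()
            current.removeLast()
        }
    }

    recurse()
    return members
}

// Usage: products(listOf("a", "b"), 3) yields the 8 tuples of {a, b}^3.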

How to store a Euler graph struct?

I'm working on the Euler path problem and ran into a question: how do I define or store an Euler graph struct?
A usual way is using an adjacency matrix, where c[i][j] stores the edge between i and j. It's concise and effective! But this kind of matrix is limited to the situation where the edge between 2 nodes is unique (figure 1).
class EulerPath
{
    int[][] c; // adjacency matrix, c[i][j] means the edge between i and j
}
What if there are several edges (figure 2)? My solution might be to use customised classes, like "Graph", "Node" and "Edge", to store the graph, but dividing the graph into discrete structs means we have to take more class details into consideration, which may hurt efficiency and concision. So I'm very eager to hear your advice! Thanks a lot!
class EulerPath
{
    class Graph
    {
        Node[] Nodes;
        Edge[] Edges;
    }
    class Node { ... }
    class Edge { ... }
}
You can use an adjacency matrix to store graphs with multi-edges. You just let the value of c[i][j] be the number of times that vertex i is adjacent to vertex j. In your first case, it's 1, in your second case, it's 3. See also Wikipedia -- adjacency matrices aren't defined as being composed of only 1 and 0, that's just the special case of an adjacency matrix for a simple graph.
EDIT: You can represent your second graph in an adjacency matrix like this:
  1 2 3 4
1 0 3 1 1
2 3 0 1 1
3 1 1 0 0
4 1 1 0 0
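A tiny Kotlin sketch of such a counting matrix, for illustration (the class and method names are mine):

// Adjacency matrix for a multigraph: each cell stores how many edges
// connect i and j rather than just 0/1.
class MultiGraph(n: Int) {
    val c = Array(n) { IntArray(n) }

    fun addEdge(i: Int, j: Int) {
        c[i][j]++
        c[j][i]++            // undirected: mirror the count
    }

    fun edgeCount(i: Int, j: Int): Int = c[i][j]
}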
You can do this in at least three ways:
Adjacency list
Meaning that you have a 2D array called al[N][N]:
al[N][N]: the first N is the node index
al[N][N]: the second N is the neighbour node index
Example, a graph with this input:
0 => 1
1 => 2
2 => 3
3 => 1
The adjacency list will look like this:
0 [1]
1 [2,3]
2 [1,3]
3 [1,2]
PS: Since this is a 2D array and not all horizontal cells are going to be used, you need to keep track of the number of connected neighbours for each node index, because some programming languages initialise array values with zero, which is itself a valid node index in the graph. This can be done easily by creating another array that counts the number of neighbours for each node index. Example for this case: numLinks: [1,2,2,2]
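A Kotlin sketch of that fixed-width structure together with the companion count array (the names and the maxDegree bound are mine):

// al[node] holds neighbour indices; only the first numLinks[node]
// entries of each row are meaningful.
class AdjacencyList(n: Int, maxDegree: Int) {
    val al = Array(n) { IntArray(maxDegree) }
    val numLinks = IntArray(n)

    fun addNeighbour(node: Int, neighbour: Int) {
        al[node][numLinks[node]] = neighbour
        numLinks[node]++
    }
}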
Matrix
With a matrix, you create an N x N 2D array and put a 1 at the intersection of the row and column of neighbouring nodes:
Example with the same input above:
  0 1 2 3
0 0 1 0 0
1 1 0 1 1
2 0 1 0 1
3 0 1 1 0
Class Node
The last method is creating a class called Node that contains a dynamic array of type Node, in which you store the other nodes it is connected to.
Consider using a vector of linked lists. Add a class that has a field for a Vertex as well as the Weight (let's name it Entry). Your weights should preferably be another vector or linked list (preferably a linked list) containing all possible weights to the corresponding Vertex. Your main class will have a vector of vectors, or a vector of linked lists (I'd prefer linked lists, since you will most likely not need random access and will be iterating through every Entry when performing any operation). Your main class will have one more vector containing all vertices. In C++ this would look like this:
class Graph {
    std::vector<std::forward_list<Entry>> adj_list;
    std::vector<Vertex> vertices;
};
Where the Vertex that corresponds to vertices[i] has the corresponding list in adj_list[i]. Since every Entry contains the info regarding the Vertex to which you are connected and the according weights, you will have your graph represented by this class.
Efficiency for what type of operation?
If you want to find a route between two IP addresses on the internet, then your adjacency matrix might be a million nodes squared, ie a gigabyte of entries. And as finding all the nodes connected to a given node goes up as n, you could be looking at a million lookups per node just to find the nodes connected to that node. Horribly inefficient.
If your problem only involves a few nodes and is run infrequently, then adjacency matrices are simple and intuitive.
For most problems which involve traversing graphs, a better solution could be to create a class called node, which has a property a collection (say a List) of all the nodes it is connected to. For most real world applications, the list of connected nodes is much less than the total number of all nodes, so this works out as more compact. Plus it is highly efficient in finding edges - you can get a list of all connected nodes in fixed time per node.
If you use this structure, where you have a node class which contains as a property a collection of all the nodes it is connected to, then when you create a new edge (say between node A and node B) then you add B to the collection of nodes to which A is connected, and A to the collection of nodes to which B is connected. Excuse my Java/C#, something like
class Node {
    ArrayList<Node> connectedNodes;

    public Node() // initializer
    {
        connectedNodes = new ArrayList<Node>();
    }
}
// and somewhere else you have this definition:
public void addEdgeBetween(Node firstNode, Node secondNode) {
    firstNode.connectedNodes.add(secondNode);
    secondNode.connectedNodes.add(firstNode);
}
And similarly, to delete an edge, remove the reference to B from A's collection and vice versa. There is no need to define a separate edge class; edges are implicit in the structure which cross-links the two nodes.
And that's about all you have to do to implement this structure, which (for most real-world problems) uses far less memory than an adjacency matrix, is much faster for large numbers of nodes, and is ultimately far more flexible.
Defining a node class also opens up a logical place to add enhancements of many sorts. For example, you might decide to generate for each node a list of all the nodes which are two steps away, because this improves path finding. You can easily add this as another collection within the node class; it would be a pretty messy thing to do with adjacency matrices. You can obviously squeeze a lot more functionality into a class than into a matrix of ints.
Your question concerning multiple links is unclear to me. If you want multiple edges between the same two points, then this can be accommodated in both approaches. In an adjacency matrix, simply store at that row and column a number which indicates the number of links. If you use a node class, just add each edge separately. The same goes for directed graphs: an edge pointing from A to B has a reference to B in A's list of connected nodes, but B doesn't have A in its list.

Algorithm/Data Structure for finding combinations of minimum values easily

I have a symmetric matrix like shown in the image attached below.
I've made up the notation A.B which represents the value at grid point (A, B). Furthermore, writing A.B.C gives me the minimum grid point value like so: MIN((A,B), (A,C), (B,C)).
As another example A.B.D gives me MIN((A,B), (A,D), (B,D)).
My goal is to find the minimum values for ALL combinations of letters (without repeats), one row at a time. For this example I need to find the min values with respect to row A, which are given by the calculations:
A.B = 6
A.C = 8
A.D = 4
A.B.C = MIN(6,8,6) = 6
A.B.D = MIN(6, 4, 4) = 4
A.C.D = MIN(8, 4, 2) = 2
A.B.C.D = MIN(6, 8, 4, 6, 4, 2) = 2
I realize that certain calculations can be reused which becomes increasingly important as the matrix size increases, but the problem is finding the most efficient way to implement this reuse.
Can someone point me in the right direction to an efficient algorithm/data structure I can use for this problem?
You'll want to think about the lattice of subsets of the letters, ordered by inclusion. Essentially, you have a value f(S) given for every subset S of size 2 (that is, every off-diagonal element of the matrix - the diagonal elements don't seem to occur in your problem), and the problem is to find, for each subset T of size greater than two, the minimum f(S) over all S of size 2 contained in T. (And then you're interested only in sets T that contain a certain element "A" - but we'll disregard that for the moment.)
First of all, note that if you have n letters, this amounts to asking Omega(2^n) questions, roughly one for each subset. (Excluding the zero- and one-element subsets and those that don't include "A" saves you n + 1 sets and a factor of two, respectively, which is allowed for big Omega.) So if you want to store all these answers for even moderately large n, you'll need a lot of memory. If n is large in your applications, it might be best to store some collection of pre-computed data and do some computation whenever you need a particular data point; I haven't thought about what would work best, but for example computing data only for a binary tree contained in the lattice would not necessarily gain you anything over precomputing nothing at all.
With these things out of the way, let's assume you actually want all the answers computed and stored in memory. You'll want to compute these "layer by layer", that is, starting with the three-element subsets (since the two-element subsets are already given by your matrix), then four-element, then five-element, etc. This way, for a given subset T, when we're computing f(T) we will already have computed all f(S) for S strictly contained in T. There are several ways that you can make use of this, but I think the easiest might be to use two such smaller subsets: let t1 and t2 be two different elements of T that you may select however you like; let S be the subset of T that you get when you remove t1 and t2. Write S1 for S plus t1 and write S2 for S plus t2. Now every pair of letters contained in T is either fully contained in S1, or it is fully contained in S2, or it is {t1, t2}. Look up f(S1) and f(S2) in your previously computed values, then look up f({t1, t2}) directly in the matrix, and store f(T) = the minimum of these 3 numbers.
If you never select "A" for t1 or t2, then indeed you can compute everything you're interested in while not computing f for any sets T that don't contain "A". (This is possible because the steps outlined above are only interesting whenever T contains at least three elements.) Good! This leaves just one question - how to store the computed values f(T). What I would do is use a 2^(n-1)-sized array; represent each subset-of-your-alphabet-that-includes-"A" by the (n-1)-bit number where the ith bit is 1 whenever the (i+1)th letter is in that set (so 0010110, which has bits 1, 2, and 4 set, represents the subset {"A", "C", "D", "F"} out of the alphabet "A" .. "H" - note I'm counting bits starting at 0 from the right, and letters starting at "A" = 0). This way, you can actually iterate through the sets in numerical order and don't need to think about how to iterate through all k-element subsets of an n-element set. (You do need to include a special case for when the set under consideration has 0 or 1 element, in which case you'll want to do nothing, or 2 elements, in which case you just copy the value from the matrix.)
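A compact Kotlin sketch of the layer-by-layer computation over bitmask subsets; for simplicity it stores f for every subset rather than only those containing "A", so the 2^n-sized array limits it to a few dozen letters (the function name subsetMinima is mine):

// f[mask] = minimum matrix value over all pairs of letters contained in mask.
// Subsets of a mask are numerically smaller, so iterating masks in increasing
// order guarantees f(T \ {t1}) and f(T \ {t2}) are already known.
fun subsetMinima(matrix: Array<IntArray>): IntArray {
    val n = matrix.size
    val f = IntArray(1 shl n) { Int.MAX_VALUE }
    for (mask in 0 until (1 shl n)) {
        if (Integer.bitCount(mask) < 2) continue        // nothing to do for 0/1-element sets
        val t1 = Integer.numberOfTrailingZeros(mask)    // lowest set bit
        val t2 = Integer.numberOfTrailingZeros(mask and (mask - 1))  // second-lowest set bit
        f[mask] = if (Integer.bitCount(mask) == 2) {
            matrix[t1][t2]                              // base case: copy from the matrix
        } else {
            // every pair inside mask avoids t1, avoids t2, or is exactly {t1, t2}
            minOf(matrix[t1][t2],
                  f[mask and (1 shl t1).inv()],         // f(T \ {t1})
                  f[mask and (1 shl t2).inv()])         // f(T \ {t2})
        }
    }
    return f
}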
Well, it looks simple to me, but perhaps I misunderstand the problem. I would do it like this:
let P be a pattern string in your notation X1.X2. ... .Xn, where Xi is a column in your matrix
first compute the array CS = [ (X1, X2), (X1, X3), ... (X1, Xn) ], which contains all combinations of X1 with every other element in the pattern; CS has n-1 elements, and you can easily build it in O(n)
now you must compute min (CS), i.e. finding the minimum value of the matrix elements corresponding to the combinations in CS; again you can easily find the minimum value in O(n)
done.
Note: since your matrix is symmetric, given P you just need to compute CS by combining the first element of P with all other elements: (X1, Xi) is equal to (Xi, X1)
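A one-function Kotlin sketch of this basic step, following this answer's interpretation of the problem (the name patternMin and the integer column indices are mine):

// pattern = [X1, X2, ..., Xn] as column indices; by symmetry only the
// pairs (X1, Xi) are looked up, each in O(1), so the whole step is O(n).
fun patternMin(matrix: Array<IntArray>, pattern: List<Int>): Int {
    val x1 = pattern.first()
    return pattern.drop(1).minOf { xi -> matrix[x1][xi] }
}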
If your matrix is very large, and you want to do some optimization, you may consider prefixes of P: let me explain with an example
when you have solved the problem for P = X1.X2.X3, store the result in an associative map, where X1.X2.X3 is the key
later on, when you solve a problem P' = X1.X2.X3.X7.X9.X10.X11 you search for the longest prefix of P' in your map: you can do this by starting with P' and removing one component (Xi) at a time from the end until you find a match in your map or you end up with an empty string
if you find a prefix of P' in your map, then you already know the solution for that sub-problem, so you just have to solve the problem resulting from combining the first element of the prefix with the suffix, and then compare the two results: in our example the prefix is X1.X2.X3, so you just have to solve the problem for X1.X7.X9.X10.X11, and then compare the two values and choose the min (don't forget to update your map with the new pattern P')
if you don't find any prefix, then you must solve the entire problem for P' (and again don't forget to update the map with the result, so that you can reuse it in the future)
This technique is essentially a form of memoization.

Resources