I have been using Union-Find (disjoint set) for a lot of graph problems and know how it works, but I have almost always used this data structure with integers. While solving this leetcode problem I need to group strings, and I am thinking of using Union-Find for it, but I do not know how to use it with strings. Looking for suggestions.
TLDR: Use the same union-find code you would for integers, but use a hash map instead of an array to store the parent of each element. This approach generalizes to any data type that can be stored in a hash map, not just strings; i.e., in the code below the two unordered maps could have keys other than strings or ints.
class UnionFind {
public:
    string find(string s) {
        // Lazily register elements we haven't seen before as their own root.
        if (!parent.count(s)) { parent[s] = s; sz[s] = 1; }
        string stringForPathCompression = s;
        while (parent[s] != s) s = parent[s];
        // The following loop implements path compression, which reduces the time complexity.
        while (stringForPathCompression != s) {
            string temp = parent[stringForPathCompression];
            parent[stringForPathCompression] = s;
            stringForPathCompression = temp;
        }
        return s;
    }
    void unify(string s1, string s2) {
        string rootS1 = find(s1), rootS2 = find(s2);
        if (rootS1 == rootS2) return;
        // We unify the smaller component into the bigger component, preserving the bigger one.
        // This is known as union by size, and reduces the time complexity.
        if (sz[rootS1] < sz[rootS2]) parent[rootS1] = rootS2, sz[rootS2] += sz[rootS1];
        else parent[rootS2] = rootS1, sz[rootS1] += sz[rootS2];
    }
private:
    // If we were storing numbers in our union find, both of the hash maps below could be arrays.
    unordered_map<string, int> sz; // component sizes
    unordered_map<string, string> parent;
};
Union Find doesn't really care what kind of data is in the objects. You can decide what strings to union in your main code, and then union find their representative values.
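To make that concrete in another language, here is a minimal sketch of the same idea in Python, using a plain dict in place of the unordered maps (the class and method names here are my own, not from the answer above):

```python
# Union-find keyed by arbitrary hashable values (here, strings),
# with path compression and union by size.
class UnionFind:
    def __init__(self):
        self.parent = {}   # element -> parent element
        self.size = {}     # root -> size of its component

    def find(self, x):
        # Lazily register unseen elements as their own root.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint x and its ancestors at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller component to the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

uf = UnionFind()
uf.union("apple", "banana")
uf.union("banana", "cherry")
print(uf.find("apple") == uf.find("cherry"))  # True
print(uf.find("apple") == uf.find("durian"))  # False
```

Because the dict is keyed by the values themselves, there is no need to map strings to integer ids first.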
I am working on a project that heavily uses a tree structure for data processing. I am looking for a method to find matching patterns in the tree. For example, consider a tree like:
(1:a) ---- (2:b) ---- (4:c) ---- (5:e) ---- (8:c) ---- (9:f)
  |--- (3:d)            |--- (6:f)            |--- (10:g)
                        |--- (7:g)
(1 has two children, 2 and 3; 4 has children 5, 6, and 7; and 8 has children 9 and 10.) The letters are the values of each node.
I need to find all the occurrences of something like
c ---- f
|--- g
which should return 4 and 8 as the indexes of the parent nodes. What is a good algorithm for that? It is probably BFS, but is there a more specialized algorithm for this kind of search?
This is some of my theory crafting, so feel free to correct me when I am wrong.
It is influenced by a prefix/suffix trie structure, which enables one to find matching substrings in a string. Although the data structure I will choose is more tree-like, it is also very graph-like in nature, connecting references of nodes.
The output will ultimately (hopefully) show all indexes of the sub-tree roots that contain the pattern, quickly.
The data structure I will use is similar to a tree node: it contains the string value, the indexes of every location where this value occurs, the indexes of all possible parents of the nodes containing the common value, and children stored in a Map for O(1) average-case lookup.
All of the following code is in C#.
public class Node
{
    public String value; // This will be the value, e.g. "a"
    public Dictionary<int, int> connections; // {int reference (key), int parent (value)} pairs
    public Dictionary<String, Node> childs;  // all children, keyed by their value

    public Node()
    {
        connections = new Dictionary<int, int>();
        childs = new Dictionary<String, Node>();
    }
}
Second, we assume that your base data is a very traditional tree structure, although there may be a few differences.
public class TreeNode
{
    public int index;
    public String value;
    public List<TreeNode> childs;

    public TreeNode()
    {
        childs = new List<TreeNode>();
    }

    public TreeNode(String value)
    {
        childs = new List<TreeNode>();
        this.value = value;
    }

    public void add(String name)
    {
        TreeNode child = new TreeNode(name);
        childs.Add(child);
    }
}
Finally, the nodes of the base TreeNode structure are all indexed. (In your example you used a 1-based index; the following uses a 0-based index.)
int index = 0;
TreeNode temp;
Queue<TreeNode> tempQ = new Queue<TreeNode>();
tempQ.Enqueue(root);
while (tempQ.Count > 0)
{
    temp = tempQ.Dequeue();
    temp.index = index;
    index++;
    foreach (TreeNode tn in temp.childs)
    {
        tempQ.Enqueue(tn);
    }
}
return root;
After we initialize our structure, assuming that the base data is stored in a traditional TreeNode structure, we will do three things:
1. Build a graph-like structure using the base TreeNode.
2. Represent each unique value with only ONE node. For example, {C}, {F}, and {G} from your example will each be represented by a single node instead of two. (Simply stated, all nodes with a common value are grouped into one.)
3. Attach all unique nodes (from step 2) to the root element, and "rebuild" the tree by connecting references to references. (A graphic representation is shown below.)
Here is the C# code to build the structure, which runs in O(n):
private Node convert(TreeNode root)
{
    Node comparisonRoot = new Node(); // root of our new comparison data structure.
                                      // this main root will contain no data except
                                      // for the childs inside its child map, which
                                      // holds all childs with unique values.
    TreeNode dataNode = root;         // root of the base data.
    Node workingNode = new Node();    // workingNode is our data structure's
                                      // copy of the base data tree's root.
    workingNode.value = root.value;
    workingNode.connections.Add(0, -1);

    // add workingNode to our data structure, because workingNode.value
    // is currently unique to the empty child map of the root.
    comparisonRoot.childs.Add(workingNode.value, workingNode);

    Stack<TreeNode> s = new Stack<TreeNode>();
    s.Push(dataNode); // initialize the stack with the root.
    while (s.Count > 0) // iteratively traverse the tree using a stack
    {
        TreeNode temp = s.Pop();
        foreach (TreeNode tn in temp.childs)
        {
            // fill the stack with childs
            s.Push(tn);
        }
        // update workingNode to be the "parent" of the upcoming childs.
        workingNode = comparisonRoot.childs[temp.value];
        foreach (TreeNode child in temp.childs)
        {
            if (!comparisonRoot.childs.ContainsKey(child.value))
            {
                // the value of the node is unique:
                // create a new node for the unique value
                Node tempChild = new Node();
                tempChild.value = child.value;
                // store the reference/parent pair
                tempChild.connections.Add(child.index, temp.index);
                // because this is the first appearance of a unique value,
                // add the node to the parent AND the root.
                workingNode.childs.Add(tempChild.value, tempChild);
                comparisonRoot.childs.Add(tempChild.value, tempChild);
            }
            else
            {
                // the value of the node is not unique (it already exists within our structure):
                // update values, no need to create a new node.
                Node tempChild = comparisonRoot.childs[child.value];
                tempChild.connections.Add(child.index, temp.index);
                if (!workingNode.childs.ContainsKey(tempChild.value))
                {
                    workingNode.childs.Add(tempChild.value, tempChild);
                }
            }
        }
    }
    return comparisonRoot;
}
All unique values are attached to a valueless root, just for the purpose of using this root node as a map to quickly jump to any reference. (Shown below.)
Here you can see that all connections are made based on the original example tree, except that there is only one node instance per unique value. Finally, you can see that all of the nodes are also connected to the root.
The whole point is that there is only one real Node object per unique value, and it points to all possible connections by holding references to other nodes as childs. It's kind of like a graph structure with a root.
Each Node will contain all pairs of {[index], [parent index]}.
Here is a string representation of this data structure:
Childs { A, B, D, C, E, F, G }
Connections { A=[0, -1]; B=[1, 0]; D=[2, 0]; C=[3, 1][7, 4];
E=[4, 3]; F=[5, 3][8, 7]; G=[6, 3][9, 7] }
Here, the first thing you may notice is that node A, which has no true parent in your example, has -1 for its parent index. That simply states that node A has no parent and is the root.
Another thing you may notice is that C has index values 3 and 7, whose parents are 1 and 4 respectively, which you can see are node B and node E (check your example if this doesn't make sense).
So hopefully, this was a good explanation of the structure.
So why would I decide to use this structure, and how will this help find out the index of the nodes when matched up with a certain pattern?
Similar to suffix tries, I thought the most elegant solution would return all "successful searches" in a single operation, rather than traversing all nodes to check whether each is a successful match (brute force).
So here is how the search will work.
Say we have the pattern
c ---- f
|--- g
from the example.
In a recursive approach, leaves simply return all their possible parentIndex values (retrieved from our [index, parentIndex] pairs).
Afterwards, in a natural DFS-type traversal, C will receive the return values of both F and G.
Here, we do an intersection (AND) operation across all the children's sets to see which parentIndex values they share.
Next, we do another AND operation, this time between the result of the previous step and all possible indexes of C (our current branch).
By doing so, we now have the set of all C indexes whose nodes contain both G and F as children.
Although this pattern is only two levels deep, for a deeper pattern we simply take the resulting set of C indexes, look up the parents of those indexes using our [index, parentIndex] map, return that set of parentIndexes, and repeat from step 2. (See the recursion?)
Here is the C# implementation of what was just explained.
private HashSet<int> search(TreeNode pattern, Node graph, bool isRoot)
{
    if (pattern.childs.Count == 0)
    {
        // We are at a leaf: return the set of parent values.
        HashSet<int> set = new HashSet<int>();
        if (!isRoot)
        {
            // If we are not at the root of the pattern, we return the possible
            // indexes of parents that can hold this leaf.
            foreach (int i in graph.connections.Keys)
            {
                set.Add(graph.connections[i]);
            }
        }
        else
        {
            // However, if we are at the root of the pattern, we don't want to
            // return the parent indexes. We simply return all indexes of this leaf.
            foreach (int i in graph.connections.Keys)
            {
                set.Add(i);
            }
        }
        return set;
    }
    else
    {
        // We are at a branch. We recursively call this method down to the leaves.
        HashSet<int> temp = null;
        foreach (TreeNode tn in pattern.childs)
        {
            String value = tn.value;
            // Check whether our structure has a possible connection with the next
            // node down the pattern; return an empty set if the connection is not
            // found (the pattern does not exist).
            if (!graph.childs.ContainsKey(value))
            {
                return new HashSet<int>();
            }
            Node n = graph.childs[value];
            // Recursively call this method down to the leaves, and intersect
            // the results of the recursive calls.
            if (temp == null)
            {
                temp = search(tn, n, false);
            }
            else
            {
                temp.IntersectWith(search(tn, n, false));
            }
        }
        // Now that we have the intersection over all the leaves, we do a final
        // intersection with the current branch's index set.
        temp.IntersectWith(graph.connections.Keys);
        // Now we have all possible indexes; we have to return the possible
        // parent indexes.
        if (isRoot)
        {
            // If we are at the root of the pattern, we don't want to return the
            // parent indexes; we return the result of the intersection itself.
            return temp;
        }
        else
        {
            // But if we are not at the root of the pattern, we return the
            // possible parent indexes.
            HashSet<int> returnTemp = new HashSet<int>();
            foreach (int i in temp)
            {
                returnTemp.Add(graph.connections[i]);
            }
            return returnTemp;
        }
    }
}
To call this method, simply:
// pattern - root of the pattern, a TreeNode object
// root - root of our generated structure, which was made with the convert() method
// the boolean is a helper flag so the final call returns its own indexes
// as the result instead of its parents' indexes
HashSet<int> answers = search(pattern, root.childs[pattern.value], true);
Phew, that was a long answer, and I'm not even sure if this is as efficient as other algorithms out there! I am also sure that there may be more efficient and elegant ways to search for a subtree inside a larger tree, but this was a method that came into my head! Feel free to leave any criticism, advice, edit, or optimize my solution :)
Idea 1
One simple way to improve the speed is to precompute a map from each letter to a list of all the locations in the tree where that letter occurs.
So in your example, c would map to [4,8].
Then when you search for a given pattern, you will only need to explore subtrees which have at least the first element correct.
Idea 2
An extension to this that might help for certain usage patterns is to also precompute a second map from each letter to a list of the parents of all locations in the tree where that letter occurs.
So for example, f would map to [4,8] and e to [4].
If the lists of locations are stored in sorted order then these maps can be used to efficiently find patterns with a head and certain children.
We get a list of possible locations by using the first map to look up the head, and additional lists by using the second map to look up the children.
You can then merge these lists (this can be done efficiently because the lists are sorted) to find entries that appear in every list - these will be all the matching locations.
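A sketch of Idea 2 in Python, assuming both maps have been precomputed with sorted location lists. All names here are illustrative, and the two maps are hand-built for the example tree from the question (1-based node indexes):

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists, O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def match_head_and_children(locations, parent_locations, head, children):
    # Start from every location of the head value, then keep only those
    # locations that also appear as a parent of every required child value.
    result = locations.get(head, [])
    for child in children:
        result = intersect_sorted(result, parent_locations.get(child, []))
    return result

# Hand-built maps for the example tree in the question:
locations = {'c': [4, 8], 'e': [5], 'f': [6, 9], 'g': [7, 10]}
parent_locations = {'e': [4], 'f': [4, 8], 'g': [4, 8]}
print(match_head_and_children(locations, parent_locations, 'c', ['f', 'g']))  # [4, 8]
```

Because every list is kept sorted, each intersection is a linear merge rather than a nested scan.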
I was working on Project Euler Problem 18 (I did solve the problem; I'm not cheating. "Proof" here) and found myself in need of a way to represent a data structure that looks like a Pascal triangle, but with different values. It looks very similar to a binary tree, but there's a very important distinction: a node's children are not exclusively its children. So the first three rows look like this:
    75
   /  \
  95    64
 /  \  /  \
17    47    82
Note that 47 has two parents.
It's pretty easy to represent this as a linked structure, or even a two-dimensional array, but I'm hoping that there's a more elegant way. I love binary trees, mainly for how you can allocate a single chunk of memory, treat it as an array, and navigate between children and parent with a couple of arithmetic operations or integer division. Is there a way to do the same for this data structure?
My best solution involved using a two-dimensional array (where it's very easy to find children and parents). I dislike this implementation because (at least the way I did it) I called malloc for every row, even though I knew how big the structure would be ahead of time.
My question is very similar to this one, but I wasn't happy with the accepted answer. A comment alludes to the solution I seek, but no explanation is given.
Edit: To clarify, I'm looking for a way to index into a one-dimensional array in the same way that a binary tree stuffed sequentially into an array (starting at 1) gives the property that the children of a node at index i are at indexes 2 * i and 2 * i + 1. I'm also not very concerned about being able to find parents, so don't worry too much about the weird two-parent structure.
Yes, it is possible to store a triangular data structure in a one-dimensional array (example in Java):
class Triangle<T> {
    private T[] triangle;

    public Triangle(T[] array, int rows) {
        if (array.length != triangleNumber(rows)) {
            throw new IllegalArgumentException("Array wrong size");
        }
        triangle = array;
    }

    public T get(int row, int col) {
        return triangle[index(row, col)];
    }

    public void set(int row, int col, T val) {
        triangle[index(row, col)] = val;
    }

    private int triangleNumber(int rows) {
        return rows * (rows + 1) / 2;
    }

    private int index(int row, int col) {
        if (row < 0 || col < 0 || col > row) {
            throw new IndexOutOfBoundsException("Trying to access outside of triangle");
        }
        return triangleNumber(row) + col;
    }
}
The array passed to the constructor is formed by concatenating the rows of the triangle one-by-one into the array: [t(0,0), t(1,0), t(1,1), t(2,0), t(2,1), t(2,2), ... , t(rows-1, rows-1)], where t(R, C) is the triangle cell at triangle row R and triangle column C.
For any cell (row, col):
left child would be at row+1, col
right child would be at row+1, col+1
left parent would be at row-1, col-1
right parent would be at row-1, col
Two parents and two children do not exist for all cells because they would lie outside the triangle. See the exception check in the index method.
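For a quick sanity check of the same layout, the indexing arithmetic can be sketched in Python (function names here are mine):

```python
def tri(n):
    # n-th triangular number: the number of cells in the first n rows
    return n * (n + 1) // 2

def index(row, col):
    # cell (row, col) lives at offset tri(row) + col in the flat array
    if row < 0 or col < 0 or col > row:
        raise IndexError("outside of triangle")
    return tri(row) + col

def children(row, col):
    # positions of the left and right child in the flat array
    return index(row + 1, col), index(row + 1, col + 1)

# First three rows of the Euler-18 example, flattened row by row:
flat = [75, 95, 64, 17, 47, 82]
left, right = children(1, 0)      # children of the cell holding 95
print(flat[left], flat[right])    # 17 47
```

Note that 47, the shared cell, is reachable both as the right child of (1,0) and as the left child of (1,1), which is exactly the two-parent property from the question.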
Yes, there is:
We start with your idea of a two-dimensional array, but with irregular row lengths, so each element is indexed by a two-dimensional index (r,c):
(1,1)
(2,1)(2,2)
(3,1)(3,2)(3,3)
(4,1)(4,2)(4,3)(4,4)
Because the relationships are regular, you can express the positions directly: the children of a node (r,c) are (r+1,c) and (r+1,c+1), and its parents are (r-1,c-1) and (r-1,c), where a parent only exists if its column index lies within [1, r-1].
I am working on an implementation of Dijkstra's Algorithm to retrieve the shortest path between interconnected nodes on a network of routes. I have the implementation working. It returns all the shortest paths to all the nodes when I pass the start node into the algorithm.
My question:
How does one go about retrieving all possible paths from Node A to, say, Node G or even all possible paths from Node A and back to Node A?
Finding all possible paths is a hard problem, since there is an exponential number of simple paths. Even finding the kth shortest simple path [or the longest path] is NP-hard.
One possible solution to find all paths [or all paths up to a certain length] from s to t is BFS without keeping a visited set, or, for the weighted version, uniform-cost search.
Note also that in any graph that has cycles [i.e., is not a DAG], there may be an infinite number of paths between s and t.
I've implemented a version that basically finds all possible paths from one node to the other, but doesn't count 'cycles' (the graph I'm using is cyclic), so no node appears twice within the same path. If the graph were acyclic, you could say it finds all possible paths between the two nodes. It seems to work fine, and for my graph size of ~150 it runs almost instantly on my machine, though the running time must be something like exponential, so it will start to get slow quickly as the graph grows.
Here is some Java code that demonstrates what I'd implemented. I'm sure there must be more efficient or elegant ways to do it as well.
Stack<Object> connectionPath = new Stack<>();
List<Stack<Object>> connectionPaths = new ArrayList<>();

// Push onto connectionPath the object that will be passed as the parameter 'node' into the method below
void findAllPaths(Object node, Object targetNode) {
    for (Object nextNode : nextNodes(node)) {
        if (nextNode.equals(targetNode)) {
            Stack<Object> temp = new Stack<>();
            for (Object node1 : connectionPath)
                temp.add(node1);
            connectionPaths.add(temp);
        } else if (!connectionPath.contains(nextNode)) {
            connectionPath.push(nextNode);
            findAllPaths(nextNode, targetNode);
            connectionPath.pop();
        }
    }
}
I'm gonna give you a (somewhat small) version (although comprehensible, I think) of a scientific proof that you cannot do this under a feasible amount of time.
What I'm gonna prove is that the time complexity to enumerate all simple paths between two selected and distinct nodes (say, s and t) in an arbitrary graph G is not polynomial. Notice that, as we only care about the amount of paths between these nodes, the edge costs are unimportant.
Sure that, if the graph has some well selected properties, this can be easy. I'm considering the general case though.
Suppose that we have a polynomial algorithm that lists all simple paths between s and t.
If G is connected, the list is nonempty. If G is not connected and s and t are in different components, it's really easy to list all paths between them, because there are none! If they are in the same component, we can pretend that the whole graph consists only of that component. So let's assume G is indeed connected.
The number of listed paths must then be polynomial, otherwise the algorithm couldn't return them all. Since it enumerates all of them, the longest one must be in the list, and a simple scan of the list will point out which one it is.
We can show (although I can't think of a cohesive way to say it) that this longest path has to traverse all vertices of G. Thus, we have just found a Hamiltonian path with a polynomial procedure! But this is a well-known NP-hard problem.
We can then conclude that this polynomial algorithm we thought we had is very unlikely to exist, unless P = NP.
The following functions (a modified BFS plus a recursive path-finding function between two nodes) will do the job for an acyclic graph:
from collections import defaultdict, deque

# modified BFS
def find_all_parents(G, s):
    Q = deque([s])
    parents = defaultdict(set)
    while len(Q) != 0:
        v = Q.popleft()
        for w in G.get(v, []):
            parents[w].add(v)
            Q.append(w)
    return parents

# recursive path-finding function (assumes that there exists a path in G from a to b);
# paths are built up by string concatenation, so this expects single-character node labels
def find_all_paths(parents, a, b):
    return [a] if a == b else [y + b for x in list(parents[b]) for y in find_all_paths(parents, a, x)]
For example, with the following graph (DAG) G given by
G = {'A':['B','C'], 'B':['D'], 'C':['D', 'F'], 'D':['E', 'F'], 'E':['F']}
if we want to find all paths between the nodes 'A' and 'F' (using the above-defined functions as find_all_paths(find_all_parents(G, 'A'), 'A', 'F')), it will return the following five paths (in some order, since the parent sets are unordered): 'ACF', 'ACDF', 'ABDF', 'ACDEF', 'ABDEF'.
Here is an algorithm that finds and prints all paths from s to t using a modification of DFS. Dynamic programming can also be used to find the count of all possible paths in a DAG. The pseudocode for the counting version looks like this:
AllPaths(G(V,E), s, t)
    C[1...n]                    // C[i] stores the path count from s to vertex i
    TopologicallySort(G(V,E))   // suppose s ends up at index i0 and t at index i1
    for i <- 0 to n
        if i < i0
            C[i] <- 0           // no path from s reaches a vertex ordered before s in the topological order
        else if i == i0
            C[i] <- 1
        else
            C[i] <- 0
            for each j such that (j,i) is an edge   // predecessors of i in the topological order
                C[i] <- C[i] + C[j]
    return C[i1]
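A runnable sketch of this counting DP in Python, assuming the DAG is given as an adjacency dict. The helper names are mine, and the topological order is produced by a standard reverse-postorder DFS:

```python
def topological_order(graph):
    # Reverse postorder DFS gives a topological order for a DAG.
    order, seen = [], set()
    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for w in graph.get(v, []):
            visit(w)
        order.append(v)
    for v in graph:
        visit(v)
    return order[::-1]

def count_paths(graph, s, t):
    order = topological_order(graph)
    count = {v: 0 for v in order}
    count[s] = 1
    # Process vertices in topological order, pushing each count
    # forward along the outgoing edges.
    for v in order:
        for w in graph.get(v, []):
            count[w] += count[v]
    return count.get(t, 0)

G = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'F'], 'D': ['E', 'F'], 'E': ['F']}
print(count_paths(G, 'A', 'F'))  # 5
```

Pushing counts forward along outgoing edges is equivalent to the pseudocode's sum over predecessors; by the time v is processed, count[v] is final.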
If you actually care about ordering your paths from shortest to longest, then it would be far better to use a modified A* or Dijkstra algorithm. With a slight modification the algorithm will return as many of the possible paths as you want, in order of shortest first. So if what you really want is all possible paths ordered from shortest to longest, this is the way to go.
If you want an A*-based implementation capable of returning all paths ordered from shortest to longest, the following will accomplish that. It has several advantages. First, it is efficient at sorting from shortest to longest. It also computes each additional path only when needed, so if you stop early because you don't need every single path, you save some processing time. It reuses data from previous paths each time it calculates the next one, which makes it more efficient. Finally, if you find the path you want, you can abort early, saving some computation time. Overall this should be the most efficient approach if you care about sorting by path length.
import java.util.*;

public class AstarSearch {

    private final Map<Integer, Set<Neighbor>> adjacency;
    private final int destination;
    private final NavigableSet<Step> pending = new TreeSet<>();

    public AstarSearch(Map<Integer, Set<Neighbor>> adjacency, int source, int destination) {
        this.adjacency = adjacency;
        this.destination = destination;
        this.pending.add(new Step(source, null, 0));
    }

    public List<Integer> nextShortestPath() {
        Step current = this.pending.pollFirst();
        while (current != null) {
            if (current.getId() == this.destination)
                return current.generatePath();
            for (Neighbor neighbor : this.adjacency.get(current.id)) {
                if (!current.seen(neighbor.getId())) {
                    final Step nextStep = new Step(neighbor.getId(), current,
                            current.cost + neighbor.cost + predictCost(neighbor.id, this.destination));
                    this.pending.add(nextStep);
                }
            }
            current = this.pending.pollFirst();
        }
        return null;
    }

    protected int predictCost(int source, int destination) {
        return 0; // behaves identically to Dijkstra's algorithm; override to make it A*
    }

    private static class Step implements Comparable<Step> {
        final int id;
        final Step parent;
        final int cost;

        public Step(int id, Step parent, int cost) {
            this.id = id;
            this.parent = parent;
            this.cost = cost;
        }

        public int getId() {
            return id;
        }

        public Step getParent() {
            return parent;
        }

        public int getCost() {
            return cost;
        }

        public boolean seen(int node) {
            if (this.id == node)
                return true;
            else if (parent == null)
                return false;
            else
                return this.parent.seen(node);
        }

        public List<Integer> generatePath() {
            final List<Integer> path;
            if (this.parent != null)
                path = this.parent.generatePath();
            else
                path = new ArrayList<>();
            path.add(this.id);
            return path;
        }

        @Override
        public int compareTo(Step step) {
            if (step == null)
                return 1;
            if (this.cost != step.cost)
                return Integer.compare(this.cost, step.cost);
            if (this.id != step.id)
                return Integer.compare(this.id, step.id);
            if (this.parent != null)
                return this.parent.compareTo(step.parent);
            if (step.parent == null)
                return 0;
            return -1;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Step step = (Step) o;
            return id == step.id &&
                    cost == step.cost &&
                    Objects.equals(parent, step.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, parent, cost);
        }
    }

    /*******************************************************
     * Everything below here just sets up your adjacency.  *
     * It will just be helpful for you to be able to test. *
     * It isn't part of the actual A* search algorithm.    *
     *******************************************************/

    private static class Neighbor {
        final int id;
        final int cost;

        public Neighbor(int id, int cost) {
            this.id = id;
            this.cost = cost;
        }

        public int getId() {
            return id;
        }

        public int getCost() {
            return cost;
        }
    }

    public static void main(String[] args) {
        final Map<Integer, Set<Neighbor>> adjacency = createAdjacency();
        final AstarSearch search = new AstarSearch(adjacency, 1, 4);
        System.out.println("printing all paths from shortest to longest...");
        List<Integer> path = search.nextShortestPath();
        while (path != null) {
            System.out.println(path);
            path = search.nextShortestPath();
        }
    }

    private static Map<Integer, Set<Neighbor>> createAdjacency() {
        final Map<Integer, Set<Neighbor>> adjacency = new HashMap<>();
        // This sets up the adjacencies. In this case all adjacencies have a cost of 1, but they don't need to.
        addAdjacency(adjacency, 1, 2, 1, 5, 1);             // {1 | 2,5}
        addAdjacency(adjacency, 2, 1, 1, 3, 1, 4, 1, 5, 1); // {2 | 1,3,4,5}
        addAdjacency(adjacency, 3, 2, 1, 5, 1);             // {3 | 2,5}
        addAdjacency(adjacency, 4, 2, 1);                   // {4 | 2}
        addAdjacency(adjacency, 5, 1, 1, 2, 1, 3, 1);       // {5 | 1,2,3}
        return Collections.unmodifiableMap(adjacency);
    }

    private static void addAdjacency(Map<Integer, Set<Neighbor>> adjacency, int source, Integer... dests) {
        if (dests.length % 2 != 0)
            throw new IllegalArgumentException("dests must have an even number of arguments; each pair is the id and cost for that traversal");
        final Set<Neighbor> destinations = new HashSet<>();
        for (int i = 0; i < dests.length; i += 2)
            destinations.add(new Neighbor(dests[i], dests[i + 1]));
        adjacency.put(source, Collections.unmodifiableSet(destinations));
    }
}
The output from the above code is the following:
[1, 2, 4]
[1, 5, 2, 4]
[1, 5, 3, 2, 4]
Notice that each time you call nextShortestPath() it generates the next shortest path for you on demand. It only calculates the extra steps needed and doesn't traverse any old paths twice. Moreover, if you decide you don't need all the paths and end execution early, you've saved yourself considerable computation time. You only compute up to the number of paths you need and no more.
Finally, it should be noted that the A* and Dijkstra algorithms do have some minor limitations, though I don't think they would affect you. Namely, they will not work correctly on a graph that has negative weights.
Here is a link to JDoodle where you can run the code yourself in the browser and see it working. You can also change around the graph to show it works on other graphs as well: http://jdoodle.com/a/ukx
find_paths[s, t, d, k]
This question is now a bit old... but I'll throw my hat into the ring.
I personally find an algorithm of the form find_paths[s, t, d, k] useful, where:
s is the starting node
t is the target node
d is the maximum depth to search
k is the number of paths to find
Using your programming language's form of infinity for d and k will give you all paths§.
§ obviously if you are using a directed graph and you want all undirected paths between s and t you will have to run this both ways:
find_paths[s, t, d, k] <join> find_paths[t, s, d, k]
Helper Function
I personally like recursion, although it can be difficult at times; anyway, first let's define our helper function:
def find_paths_recursion(graph, current, goal, current_depth, max_depth, num_paths, current_path, paths_found):
    current_path.append(current)
    if current_depth > max_depth:
        current_path.pop()
        return
    if current == goal:
        if len(paths_found) < num_paths:
            paths_found.append(copy(current_path))
        current_path.pop()
        return
    else:
        for successor in graph[current]:
            find_paths_recursion(graph, successor, goal, current_depth + 1, max_depth, num_paths, current_path, paths_found)
        current_path.pop()
Main Function
With that out of the way, the core function is trivial:
def find_paths(graph, s, t, d, k):
    paths_found = []  # PASSING THIS BY REFERENCE
    find_paths_recursion(graph, s, t, 0, d, k, [], paths_found)
First, let's note a few things:
the above pseudo-code is a mash-up of languages - but most strongly resembling python (since I was just coding in it). A strict copy-paste will not work.
[] is an uninitialized list, replace this with the equivalent for your programming language of choice
paths_found is passed by reference. It is clear that the recursion function doesn't return anything. Handle this appropriately.
here graph is assuming some form of hashed structure. There are a plethora of ways to implement a graph. Either way, graph[vertex] gets you a list of adjacent vertices in a directed graph - adjust accordingly.
this assumes you have pre-processed to remove "buckles" (self-loops), cycles and multi-edges
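Under those caveats, a concrete runnable Python version might look like this. I added an explicit guard that skips nodes already on the current path, so it also behaves on graphs that still contain cycles; that guard is my addition, not part of the pseudocode above:

```python
from copy import copy

def find_paths_recursion(graph, current, goal, current_depth, max_depth,
                         num_paths, current_path, paths_found):
    current_path.append(current)
    if current_depth <= max_depth:
        if current == goal:
            if len(paths_found) < num_paths:
                paths_found.append(copy(current_path))
        else:
            for successor in graph.get(current, []):
                if successor not in current_path:  # keep paths simple (my addition)
                    find_paths_recursion(graph, successor, goal,
                                         current_depth + 1, max_depth,
                                         num_paths, current_path, paths_found)
    current_path.pop()

def find_paths(graph, s, t, d, k):
    paths_found = []  # passed by reference into the recursion
    find_paths_recursion(graph, s, t, 0, d, k, [], paths_found)
    return paths_found

G = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D', 'F'], 'D': ['E', 'F'], 'E': ['F']}
print(find_paths(G, 'A', 'F', 10, 100))  # five simple paths from A to F
```

Setting d and k to large values (or infinity) recovers the "all paths" behaviour described above.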
You usually don't want to, because there is an exponential number of them in nontrivial graphs; if you really want to get all (simple) paths, or all (simple) cycles, you just find one (by walking the graph), then backtrack to another.
I think what you want is some form of the Ford–Fulkerson algorithm, which is based on BFS. It's used to calculate the max flow of a network by finding all augmenting paths between two nodes.
http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm
There's a nice article which may answer your question (only it prints the paths instead of collecting them).
Please note that you can experiment with the C++/Python samples in the online IDE.
http://www.geeksforgeeks.org/find-paths-given-source-destination/
I suppose you want to find 'simple' paths (a path is simple if no node appears in it more than once, except maybe the 1st and the last one).
Since the problem is NP-hard, you might want to do a variant of depth-first search.
Basically, generate all possible paths from A and check whether they end up in G.
Is it possible to create a collision-free hash function for a data structure with specific properties?
The data structure is int[][][].
It contains no duplicates.
The range of integers contained in it is defined. Let's say it's 0..1000; the maximal integer is definitely not greater than 10000.
A big problem is that this hash function should also be very fast.
Is there a way to create such a hash function? Maybe at run time, depending on the integer range?
ADDITION: I should say that the purpose of this hash function is to quickly check whether a particular combination has been processed. So when some combination of numbers in the data structure is processed, I calculate the hash value and store it. Then, when processing another combination of numbers within the data structure, I compare the hash values.
I think what you want is a "perfect hash" or even a "minimal perfect hash":
http://en.wikipedia.org/wiki/Perfect_hash_function
Edit: That said, if you're sure and certain you'll never go above [0...1000] and depending on what you need to do you probably can simply "bucket" your results directly in an array. If you don't have many elements, that array would be sparse (and hence a bit of a waste) but for at most 1001 elements going from [0...1000] an Object[1001] (or int[1001] or whatever) will probably do.
What if you just use a 64-bit value and store the location at each level of the hierarchy in its own section of bits?
Something like (off the top of my head): hash = (a << 34) | (b << 17) | c
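A sketch of that bit-packing idea in Python: since the values never exceed 10000, 14 bits per level suffice (2**14 = 16384), so three levels fit in 42 bits and the packed value is collision-free by construction (all names here are mine):

```python
BITS = 14                 # enough for values up to 16383
MASK = (1 << BITS) - 1

def pack(a, b, c):
    # Each coordinate occupies its own 14-bit field, so distinct
    # (a, b, c) triples always produce distinct packed values.
    assert 0 <= a <= MASK and 0 <= b <= MASK and 0 <= c <= MASK
    return (a << (2 * BITS)) | (b << BITS) | c

def unpack(h):
    return (h >> (2 * BITS)) & MASK, (h >> BITS) & MASK, h & MASK

h = pack(500, 600, 700)
print(unpack(h) == (500, 600, 700))  # True
```

Because pack is injective on the stated range, comparing packed values is equivalent to comparing the triples themselves, which is exactly the collision-free property the question asks for.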
A perfect hash is likely not feasible, because it can take a lot of computation time to find one for your data set.
Would a bool[][][] work for you, where true means a certain x,y,z combination has been processed? Below is a prototype for a three-dimensional bit array. Because of the limits of an Int32, this will only work up to a maximum index of about 1,024 (but would fit within 128 MB). You could get to 10,000 by creating a BitArray[][]. However, this is probably not practical at that size, because it would occupy over 116 GB of RAM.
Depending on your exact problem size and needs, a plain old hash table (with collisions) may be your best bet. That said, here is the prototype code:
public class ThreeDimensionalBitArray
{
    // todo: consider making the size configurable
    private const int MAX_INDEX = 1000;

    private BitArray _bits = new BitArray(MAX_INDEX * MAX_INDEX * MAX_INDEX);

    public bool this[int x, int y, int z]
    {
        get { return _bits[getBitIndex(x, y, z)]; }
        set { _bits[getBitIndex(x, y, z)] = value; }
    }

    public ThreeDimensionalBitArray()
    {
    }

    private static int getBitIndex(int x, int y, int z)
    {
        // todo: bounds check x, y, and z
        return (x * MAX_INDEX * MAX_INDEX) + (y * MAX_INDEX) + z;
    }
}

public class BitArrayExample
{
    public static void Main()
    {
        ThreeDimensionalBitArray bitArray = new ThreeDimensionalBitArray();
        Console.WriteLine(bitArray[500, 600, 700]); // "false"
        bitArray[500, 600, 700] = true;
        Console.WriteLine(bitArray[500, 600, 700]); // "true"
    }
}