I have a question about k-nearest neighbors (using branch distance) for trees. My understanding is that running, say, k=3 nearest neighbors on a particular node of a dendrogram (labelled here as 'Viewing') can give a result where the neighbors all follow a single lineage within the tree, as in the example. Is there an algorithm that finds the nearest neighbors with the constraint that the nodes found lie along different, distinct branches? In other words, the leaves descending from each node do not overlap with the leaves of the other nodes. Any ideas would be much appreciated.
Image of tree example
I have stored a set of Points in a Quadtree. Once the quadtree has been created with the points, I then add all the edges to the quadtree such that each edge gets stored in ALL the leaf nodes that it crosses, begins or ends at.
Now, I have a point, say A, and I need to find the closest edge to it. In my current algorithm, I recurse to the leaf node that contains point A and compute the distances between A and all line segments contained in that leaf node.
This may look like the right solution, but it isn't, since I also have to compare against edges stored in adjacent nodes to give an accurate answer.
Now my questions are
a) How do I go about extracting the closest edges?
b) Should I just compare all edges contained in the parent (relative to the point of interest) node?
(But I know intuitively that putting a hard limit on the number of levels one must go up to find the nearest edge is incorrect.)
Every node of the quad-tree represents a box in space (where some sides may extend to infinity), and you can calculate the minimum distance between that box and the target point A. Note that the distance is 0 for boxes containing A.
Starting from the root node, calculate the distance from each of its child boxes (nodes) to A and insert them into a min-heap.
Iteratively, pop the nearest box from the top of the heap and repeat the process. When you reach a leaf node, search for the nearest edge to A inside it by brute force.
Once the distance of the box at the top of the heap is greater than the distance of the nearest edge found so far, you can stop the search.
Update: BTW, this is actually the general approach for searching for anything using a quad-tree, a kd-tree, or probably most other spatial structures.
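A minimal sketch of that best-first search, assuming quadtree nodes expose a bounding box (xmin, ymin, xmax, ymax), a children list (empty for leaves), an edges list on leaves, and a caller-supplied point-to-segment distance function; all of those names are illustrative, not from any particular library:

    import heapq

    def point_box_distance(p, box):
        """Minimum distance from point p = (x, y) to an axis-aligned box
        (xmin, ymin, xmax, ymax); 0 if p lies inside the box."""
        x, y = p
        xmin, ymin, xmax, ymax = box
        dx = max(xmin - x, 0.0, x - xmax)
        dy = max(ymin - y, 0.0, y - ymax)
        return (dx * dx + dy * dy) ** 0.5

    def nearest_edge(root, p, segment_distance):
        """Best-first search: expand quadtree nodes in order of their box's
        distance to p, and stop once that distance already exceeds the best
        edge distance found so far."""
        best_dist, best_edge = float("inf"), None
        heap = [(point_box_distance(p, root.box), id(root), root)]
        while heap:
            d, _, node = heapq.heappop(heap)
            if d >= best_dist:              # every remaining box is at least this far away
                break
            if node.children:               # internal node: push its children
                for child in node.children:
                    heapq.heappush(heap, (point_box_distance(p, child.box), id(child), child))
            else:                           # leaf: brute force over its edges
                for edge in node.edges:
                    dist = segment_distance(p, edge)
                    if dist < best_dist:
                        best_dist, best_edge = dist, edge
        return best_edge, best_dist

The id(node) entry in each heap tuple is only a tie-breaker so the heap never has to compare node objects directly.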
You can also try a Voronoi diagram and look only at the edges inside the Voronoi cell containing the point.
Input:
a rooted tree with n nodes;
each node p has positive integer weight w(p);
a node can have more than two children.
Problem:
divide the tree into k subtrees/partitions (obviously by removing k-1 edges);
subtree weight W(p) is the weight of all the nodes in a subtree rooted at node p;
all the subtrees should be weighted as evenly as possible - the difference between min(W(p)) and max(W(p)) should be as small as possible.
I've yet to find a suitable algorithm for this. Where should I start? Tips, instructions and pseudocode appreciated.
Assume you can't modify the tree other than to remove edges to create subtrees.
First, understand that you cannot guarantee that simply removing edges will give you subtrees within an arbitrary bound. There are trees for which no split produces subtrees within a target bound. For example:
a(b(c,d,e,f),g)
You cannot split that into two balanced sections. The best you can do is remove the edge from a to b:
a(g) and b(c,d,e,f)
Also, this criterion is a little underdefined when k > 2. Which is better: a split of 10,10,10,1 or 10,10,6,5?
But you can come up with a method to split trees up in the most balanced way possible.
Implement your tree such that each node holds a count of all of its descendants. You can add this to any tree fairly efficiently. (E.g. when you add a node, you iterate up the chain of parent nodes incrementing the count; remove a node and you iterate up subtracting from the count.)
Then, starting from the root, iterate down in a breadth-first manner until you find a set of nodes that dominates the child nodes in the most balanced way. I don't have an algorithm for this at the ready, but I think you can find one pretty readily.
When you want to divide into k subtrees, I think you can keep an array of k candidate roots. One of those nodes must always be the root of the current tree; then you iterate down looking for nodes that, swapped in for one of the other k-1 candidates, improve the partitioning. You'll want some kind of terminating condition so you don't iterate down to every leaf node, e.g. it never makes sense to subdivide anything but the largest candidate subtree.
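As a rough sketch of a simpler greedy variant of this idea (the Node class, the function names, and the split-the-heaviest-piece-in-half criterion are illustrative assumptions, not a prescribed algorithm), here it is in Python using the cached subtree weights described above:

    def subtree_weights(root):
        """Post-order pass computing the total weight of every subtree.
        Assumes each node exposes .weight and .children."""
        w = {}
        def rec(n):
            w[n] = n.weight + sum(rec(c) for c in n.children)
            return w[n]
        rec(root)
        return w

    def greedy_partition(root, k):
        """Repeat k - 1 times: take the heaviest remaining piece and cut the
        edge whose subtree weight is closest to half of that piece.
        Returns the list of piece roots.  Heuristic, not optimal."""
        w = subtree_weights(root)                 # adjusted in place as pieces are cut off
        parent, stack = {root: None}, [root]
        while stack:
            n = stack.pop()
            for c in n.children:
                parent[c] = n
                stack.append(c)

        piece_root = {n: root for n in w}         # which piece each node belongs to
        pieces = [root]
        for _ in range(k - 1):
            heaviest = max(pieces, key=lambda r: w[r])
            cands = [n for n in w if piece_root[n] is heaviest and n is not heaviest]
            if not cands:
                break
            cut = min(cands, key=lambda n: abs(w[n] - w[heaviest] / 2))
            a = parent[cut]                       # remove the cut weight from its old piece
            while a is not None and piece_root[a] is heaviest:
                w[a] -= w[cut]
                a = parent[a]
            stack = [cut]                         # relabel the cut subtree as its own piece
            while stack:
                n = stack.pop()
                piece_root[n] = cut
                stack.extend(c for c in n.children if piece_root[c] is heaviest)
            pieces.append(cut)
        return pieces

    class Node:
        def __init__(self, weight, children=()):
            self.weight, self.children = weight, list(children)

    # a(b(c,d,e,f),g) with unit weights, split into 2 pieces
    b = Node(1, [Node(1), Node(1), Node(1), Node(1)])
    a = Node(1, [b, Node(1)])
    print(greedy_partition(a, 2) == [a, b])   # True: pieces a(g) and b(c,d,e,f)

On the a(b(c,d,e,f),g) example above with unit weights and k=2, this reproduces the a(g) / b(c,d,e,f) split.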
I have the following queries:
I want to know how to create a graph dynamically
How to manage multiple weights on a graph
How to find the distance from a particular node to another in a minimum spanning tree produced by Kruskal's. In Kruskal's the minimum spanning tree is output as a vector of edges, so the vertices are not explicitly stored. I do not know how to get the distance from, say, node 0 to the node furthest from it. I tried getting the vertices using source and target and storing the vertices in an array. After that, I located node 0 and from there iterated and reverse-iterated through the vertices, calculating the weights to find the largest distance from node 0. But I feel I'm using the most roundabout way of going about it. There must be a function for this, or perhaps a cleverer way of going about it.
Does Kruskal's store the edges of the spanning tree in the order of the spanning tree? Or at least, is the first node of the first edge stored the actual first node? How can I get the order of the nodes in the spanning tree with Kruskal's?
Similarly, how can I get the weight of the spanning tree using Prim's? The way I did it was to use the predecessor array, where predecessors are stored, find each corresponding edge in weightsMap, and add it up. Is there an easier way? And in Prim's, the distanceMap stores the distance from node 0 to the others in the original graph and not the spanning tree, right?
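Not a Boost.Graph call, but here is a small Python sketch of the approach described above, assuming the MST has been pulled out of Kruskal's output as a list of (u, v, weight) tuples via source/target and the weight map. Since the MST is a tree, one traversal from node 0 gives the distance to every node, and the maximum of those is the furthest node:

    from collections import defaultdict
    import heapq

    def farthest_from(mst_edges, start=0):
        """Build an adjacency list from the MST edge list and run Dijkstra
        from `start`.  In a tree each path is unique, so this just
        accumulates edge weights along every branch."""
        adj = defaultdict(list)
        for u, v, w in mst_edges:
            adj[u].append((v, w))
            adj[v].append((u, w))
        dist, heap = {start: 0}, [(0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        far_node = max(dist, key=dist.get)
        return far_node, dist[far_node]

The total weight of the spanning tree, whichever algorithm produced it, is simply the sum of those edge weights.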
Say I have a series of several thousand nodes. For each pair of nodes I have a distance metric. This distance metric could be a physical distance ( say x,y coordinates for every node ) or other things that make nodes similar.
Each node can connect to up to N other nodes, where N is small - say 6.
How can I construct a graph that is connected (i.e. I can travel between any two nodes following graph edges) while minimizing the total distance between all graph nodes?
That is, I don't want a graph where the total distance for any traversal is minimized, but one where, for any node, the total distance of all the links from that node is minimized.
I don't need an absolute minimum - I think that is likely NP-complete - but a relatively efficient method of getting a graph that is close to the true absolute minimum.
I'd suggest a greedy heuristic where you select edges until all vertices have 6 neighbors. For example, start with a minimum spanning tree. Then, for some random pairs of vertices, find a shortest path between them that uses at most one of the unselected edges (using Dijkstra's algorithm on two copies of the graph with the selected edges, connected by the unselected edges). Then select the edge that yields the largest total decrease in distance.
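A sketch of that "Dijkstra on two copies" step in Python (the function name and edge format are illustrative assumptions): it finds the shortest s-t path using exactly one currently unselected edge and reports which edge that was, so the caller can compare it against the selected-only distance and keep the edge giving the biggest total improvement while respecting the degree cap.

    import heapq
    from collections import defaultdict

    def best_single_addition(selected, unselected, s, t):
        """Shortest s-t path that uses exactly one edge from `unselected`,
        found by running Dijkstra over two layers: layer 0 = no unselected
        edge used yet, layer 1 = one used.  Selected edges stay within a
        layer; unselected edges jump from layer 0 to layer 1.
        Edges are (u, v, weight) tuples."""
        adj_sel, adj_uns = defaultdict(list), defaultdict(list)
        for u, v, w in selected:
            adj_sel[u].append((v, w))
            adj_sel[v].append((u, w))
        for u, v, w in unselected:
            adj_uns[u].append((v, w, (u, v, w)))
            adj_uns[v].append((u, w, (u, v, w)))

        INF = float("inf")
        dist = {(s, 0): 0.0}
        used = {(s, 0): None}                 # which unselected edge the best path took
        heap = [(0.0, s, 0)]
        while heap:
            d, u, layer = heapq.heappop(heap)
            if d > dist.get((u, layer), INF):
                continue
            for v, w in adj_sel[u]:           # selected edge: stay in the same layer
                if d + w < dist.get((v, layer), INF):
                    dist[(v, layer)] = d + w
                    used[(v, layer)] = used[(u, layer)]
                    heapq.heappush(heap, (d + w, v, layer))
            if layer == 0:                    # may spend the one allowed unselected edge
                for v, w, e in adj_uns[u]:
                    if d + w < dist.get((v, 1), INF):
                        dist[(v, 1)] = d + w
                        used[(v, 1)] = e
                        heapq.heappush(heap, (d + w, v, 1))
        return dist.get((t, 1), INF), used.get((t, 1))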
You can use a kernel to create edges only for nodes under a certain cutoff distance.
If you want unweighted edges, you could simply use a basic cutoff to start with: add an edge between two points v1, v2 if d(v1,v2) < R.
You can tweak your cutoff R to get the right average number of edges per node.
If you want a weighted graph, the preferred kernel is often the Gaussian one, with
K(x,y) = e^(-d(x,y)^2/d_0)
together with a cutoff to drop pairs whose values are too low. d_0 is the parameter to tweak to get the weights that suit you best.
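As an illustration (the function name is an assumption, not from any library), a small Python sketch that builds such a weighted edge list with the cutoff R and the Gaussian weight from the formula above:

    import math

    def kernel_graph(points, d, R, d0):
        """Connect points i, j only when d(points[i], points[j]) < R (the
        cutoff), with Gaussian weight exp(-d^2 / d0) as in the formula
        above.  `d` is a distance function; R and d0 are the knobs to tune."""
        edges = []
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                dij = d(points[i], points[j])
                if dij < R:
                    edges.append((i, j, math.exp(-dij * dij / d0)))
        return edges

    # Example: 2-D points with Euclidean distance
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    euclid = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    print(kernel_graph(pts, euclid, R=2.0, d0=1.0))   # only the two nearby points get an edge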
While looking for references, I found this blog post that I didn't know about, but it seems very explanatory, with many more details: http://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
This method is used in graph-based semi-supervised machine learning tasks, for instance in image recognition, where you tag a small part of an object and use efficient label propagation to identify the whole object.
You can search Google for: semi-supervised learning with graphs.
Yes this is homework. I was wondering if someone could explain the process of Sollin's (or Borůvka's) algorithm for determining a minimum spanning tree. Also if you could explain how to determine the number of iterations in the worst case, that would be great.
On a top level, the algorithm works as follows:
Maintain a collection of spanning trees of subgraphs. Initially, every vertex of the graph is its own m.s.t. with no edges.
In each iteration, for each of your spanning trees, find a cheapest edge connecting it to another spanning tree and add it, merging the two trees. (This is a simplification.)
The worst case in terms of iterations is that you always merge pairs of trees. In that case, the number of trees you have will halve in each iteration, so the number of iterations is logarithmic in the number of nodes.
Also note that there is a special trick involved in choosing the edges to add: if you were not careful, you might introduce a cycle when tree A connects to tree B, tree B connects to tree C, and tree C connects to tree A. (This can only happen if all three chosen edges have the same weight. The trick is to have an arbitrary but fixed tie-breaker, like a fixed order of the edges.)
So there, that's my back-of-index-card overview.
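For concreteness, here is a compact Python sketch of that process, using a union-find to track the components and the (weight, edge index) tie-breaker mentioned above; the function name and edge format are only illustrative:

    def boruvka_mst(n, edges):
        """Borůvka/Sollin's MST.  `edges` is a list of (w, u, v) with vertices
        0..n-1.  Each round, every component picks its cheapest outgoing edge
        (ties broken by edge index, which avoids the cycle problem noted
        above); components then merge, so their number at least halves each
        round, giving O(log n) rounds."""
        parent = list(range(n))

        def find(x):                      # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst, components = [], n
        while components > 1:
            cheapest = {}                 # component root -> (w, index, u, v)
            for idx, (w, u, v) in enumerate(edges):
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                for r in (ru, rv):
                    if r not in cheapest or (w, idx) < cheapest[r][:2]:
                        cheapest[r] = (w, idx, u, v)
            if not cheapest:              # graph is disconnected
                break
            for w, idx, u, v in cheapest.values():
                ru, rv = find(u), find(v)
                if ru != rv:              # may already have merged this round
                    parent[ru] = rv
                    mst.append((u, v, w))
                    components -= 1
        return mst

    print(boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]))
    # -> [(0, 1, 1), (1, 2, 2), (2, 3, 3)]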
I'm using layman's terminology.
First, select a vertex.
Check all the edges from that vertex and select the one with the minimum weight.
Do this for all the vertices (some edges may be selected more than once).
You will get connected components.
Now treat each connected component as a single vertex and repeat: for each component, select the minimum-weight edge leaving it.
When only one component remains, your spanning tree with minimum weight will have been formed.