Tree with exponential split factor - data-structures

What do you call a search tree with a split factor of 2^k, where k is the dimensionality of the data points stored within the tree? (The data points are vectors x_1, ... x_k)
For k=1 we would get a normal binary search tree. For k=2 we would split into 4 quadrants in every node in the tree, etc.
What would be the proper name for such a tree for arbitrary k?

There are many data structures like this and I don't know if there's a specific name for it. For example, the quadtree and octree structures have these branching factors for k = 2 and k = 3, and the R-tree data structure does this in higher dimensional spaces (but also has some extra structure layered on top).
Typically, high-dimensional data structures don't have huge branching factors like this. Data structures like the k-d tree (or, more generally, BSP trees) store high-dimensional data but have a fixed branching factor of two to avoid exponentially increasing the space usage for high dimensions. Segment trees in high dimensions often use fractional cascading, which lets them use a low branching factor without sacrificing performance.
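To make the 2^k branching concrete, here is a minimal sketch of how a node in such a tree could pick one of its 2^k children: one comparison per dimension, with each result packed into one bit of the child index. The function name `child_index` and the idea of a per-node `center` point are assumptions for illustration, not part of any particular library.

```python
def child_index(point, center):
    """Return which of the 2**k children of a node a k-dimensional point falls into.

    Bit i of the result is set when point[i] >= center[i].
    """
    index = 0
    for i, (p, c) in enumerate(zip(point, center)):
        if p >= c:
            index |= 1 << i
    return index


# k = 1: two children, as in a binary search tree.
print(child_index((3,), (5,)))            # 0
# k = 2: four quadrants, as in a quadtree.
print(child_index((7, 2), (5, 5)))        # 1
# k = 3: eight octants, as in an octree.
print(child_index((7, 9, 6), (5, 5, 5)))  # 7
```

Each node would then hold a child array of length 2^k, which is exactly the exponential space cost mentioned above.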
Hope this helps!

Related

Dynamically building a balanced BST with values "in the leaves"?

In their book Computational Geometry (2008), de Berg, et al., describe the data structure underlying their range search algorithm as a balanced BST where "leaves of T store the points of P and the internal nodes of T store splitting values to guide the search."
The Wikipedia page on range trees (link), which cites de Berg, says: "A 1-dimensional range tree on a set of n points is a binary search tree" such that "each node which is not a leaf stores the largest value of its left subtree."
Examples online construct such trees statically, by first sorting the set of points and then recursively pairing up nodes.
Does there exist an algorithm to build a BST of this nature dynamically (i.e., with the ability to insert additional values into the tree)? Where is it described?
It's possible to adapt just about any tree balancing procedure to work with these two examples, just by treating the leaves separately -- make a balanced tree of the internal nodes, and then take care to keep the leaves in order. Each operation, including balancing, will require you to recalculate the "summary statistics" on at most O(log N) nodes. Those are all the nodes that were updated and their ancestors.
This can be a little complicated, though, and doesn't work for the multi-dimensional range tree, because every level is treated differently from the ones above and below, and that makes tree rotations (which most balancing operations require) invalid.
For these kinds of trees, therefore, where different levels are handled differently, it is usually best to just avoid tree rotations by using a low-order B+tree variant like a 2-3 tree. In a tree like this, nodes can be split and merged, but they never have to change height -- you can implement them so that leaves are always leaves and internal nodes are always internal. The height of the tree is only ever changed by adding or removing the root.
Of course, if you use a tree that can have more than 2 children per node, then your search algorithms will need to change, but the changes are typically trivial.
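As a starting point, here is a rough sketch of the "values in the leaves" layout with dynamic insertion, following the Wikipedia convention that each internal node stores the largest value of its left subtree. It deliberately does no rebalancing, so it is not the 2-3 tree described above, only the leaf/internal separation that such a tree would build on. The names `Leaf`, `Internal`, `insert` and `range_query` are made up for this example.

```python
class Leaf:
    def __init__(self, value):
        self.value = value


class Internal:
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.split = max_value(left)  # largest value in the left subtree


def max_value(node):
    """Follow right children down to the largest leaf."""
    while isinstance(node, Internal):
        node = node.right
    return node.value


def insert(node, value):
    """Insert value and return the (possibly new) subtree root. No rebalancing."""
    if node is None:
        return Leaf(value)
    if isinstance(node, Leaf):
        if value == node.value:
            return node  # ignore duplicates
        lo, hi = sorted((node.value, value))
        return Internal(Leaf(lo), Leaf(hi))
    if value <= node.split:
        # Everything inserted here is <= split, so split stays valid.
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node


def range_query(node, lo, hi, out):
    """Append every stored value in [lo, hi] to out, in sorted order."""
    if node is None:
        return out
    if isinstance(node, Leaf):
        if lo <= node.value <= hi:
            out.append(node.value)
        return out
    if lo <= node.split:
        range_query(node.left, lo, hi, out)
    if hi > node.split:
        range_query(node.right, lo, hi, out)
    return out


root = None
for v in [5, 2, 8, 1, 9, 3]:
    root = insert(root, v)
print(range_query(root, 2, 8, []))  # [2, 3, 5, 8]
```

In this simple version a leaf only ever gets replaced by an internal node with two leaves, so internal nodes stay internal; a balanced variant would add node splitting and merging on top.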

Graph represented as adjacency list, as binary tree, is it possible?

Apologies first, English is not my first language.
So here's my understanding of a graph that's represented as an adjacency list: it's usually used for sparse graphs, which is the case for most graphs, and it uses V (number of vertices) lists. So, V head pointers + 2E (E = number of edges) nodes for an undirected graph. Therefore, space complexity = O(V + E).
Since any node can have up to V-1 edges (excluding itself), it takes O(V) time to check a node's adjacency.
To check all the edges, it takes O(2E + V), so O(V + E).
Now, since it's mostly used for sparse graphs, checking adjacency rarely costs O(V); it costs only the number of edges the given vertex has (which is O(V) at worst, since V-1 is the possible maximum).
What I'm wondering is, is it possible to make each list (the edge nodes) a binary tree? Then finding out whether node A is adjacent to node B would take O(log n) time rather than linear O(n).
If it is possible, is it actually done often? Also, what is that kind of data structure called? I've been googling whether such combinations are possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about the linked list representation; I thought I made that obvious when I said heads of the lists, but wow.
There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others. So your "adjacency list" for a particular vertex would be very small. Whereas it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked list overhead is one pointer per node. Unless, of course, you're using a doubly linked list. Binary tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
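If you want to experiment with the tradeoff, here is a small sketch. Python has no built-in balanced binary tree, so this swaps in a sorted array per vertex with binary search from the standard `bisect` module. That gives the same O(log d) adjacency test a balanced tree would (d being the vertex's degree), but insertion is O(d) rather than O(log d), so it is only a stand-in for the tree version. The class name `SortedAdjacencyGraph` is invented for this example.

```python
import bisect


class SortedAdjacencyGraph:
    """Undirected graph whose per-vertex neighbor lists are kept sorted."""

    def __init__(self, num_vertices):
        self.adj = [[] for _ in range(num_vertices)]

    def add_edge(self, u, v):
        # Insert each endpoint into the other's sorted list, skipping duplicates.
        for a, b in ((u, v), (v, u)):
            row = self.adj[a]
            i = bisect.bisect_left(row, b)
            if i == len(row) or row[i] != b:
                row.insert(i, b)

    def is_adjacent(self, u, v):
        # Binary search: O(log d) comparisons for a vertex of degree d.
        row = self.adj[u]
        i = bisect.bisect_left(row, v)
        return i < len(row) and row[i] == v


g = SortedAdjacencyGraph(5)
g.add_edge(0, 3)
g.add_edge(0, 1)
g.add_edge(3, 4)
print(g.adj[0])              # [1, 3]
print(g.is_adjacent(3, 0))   # True
print(g.is_adjacent(1, 4))   # False
```

Timing `is_adjacent` against a plain linear scan on your own data is the easiest way to see where, if anywhere, the logarithmic lookup starts to pay off.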

Do I have to implement Adjacency matrix with BFS?

I am trying to implement the BFS algorithm using a queue, and for learning purposes I do not want to look at any online code. All I am doing is following the algorithm description and trying to implement it. I have a question regarding the adjacency matrix (a data structure for graphs).
I know that one common graph data structure is the adjacency matrix. So my question is: do I have to implement an adjacency matrix along with the BFS algorithm, or does it not matter?
I am really confused.
One of the things that confuses me: where should the graph's data be stored if there is no data structure?
Sincerely
Breadth-first search assumes you have some way of representing the graph structure that you're working with, and its efficiency depends on the representation you choose, but you aren't constrained to use an adjacency matrix. Many implementations of BFS have the graph represented implicitly somehow (for example, as a 2D array storing a maze or as some sort of game) and work just fine. You can also use an adjacency list, which is particularly efficient for use in BFS.
The particular code you'll be writing will depend on how the graph is represented, but don't feel constrained to do it one way. Choose whatever's easiest for your application.
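One way to see that BFS doesn't care about the representation is to write it against a "give me the neighbors of this node" function. The sketch below is one possible arrangement under that assumption; the names `bfs`, `list_neighbors` and `matrix_neighbors` and the tiny example graph are all made up for illustration.

```python
from collections import deque


def bfs(start, neighbors):
    """Return the nodes reachable from start, in breadth-first order.

    neighbors is any callable mapping a node to an iterable of adjacent nodes.
    """
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order


# The same directed graph in two representations.
adj_list = {0: [1, 2], 1: [3], 2: [3], 3: []}
adj_matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def list_neighbors(v):
    return adj_list[v]


def matrix_neighbors(v):
    # Scans a whole row, so this costs O(n) per node.
    return [u for u, bit in enumerate(adj_matrix[v]) if bit]


print(bfs(0, list_neighbors))    # [0, 1, 2, 3]
print(bfs(0, matrix_neighbors))  # [0, 1, 2, 3]
```

For an implicit graph such as a maze, the neighbors function would compute the adjacent cells on the fly instead of looking them up.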
The best way to choose data structures is in terms of the operations. With a complete list of operations in hand, evaluate implementations wrt criteria important to the problem: space, speed, code size, etc.
For BFS, the operations are pretty simple:
Set<Node> getSources(Graph graph) // all in graph with no in-edges
Set<Node> getNeighbors(Node node) // all reachable from node by out-edges
Now we can evaluate graph data structure options in terms of n=number of nodes:
Adjacency matrix:
getSources is O(n^2) time
getNeighbors is O(n) time
Vector of adjacency lists (alone):
getSources is O(n) time
getNeighbors is O(1) time
"Clever" vector of adjacency lists:
getSources is O(1) time
getNeighbors is O(1) time
The cleverness is just maintaining the sources set as the graph is constructed, so the cost is amortized over node and edge insertion. I.e., as you create a node, add it to the sources set because it has no in-edges yet. As you add an edge, remove the to-node from the sources set.
Now you can make an informed choice based on run time. Do the same for space, simplicity, or whatever other considerations are in play. Then choose and implement.
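For what it's worth, the "clever" variant might look roughly like this. It is only a sketch under the assumptions above: the class and method names are hypothetical, and nodes are assumed to be hashable values.

```python
class Graph:
    """Directed graph as adjacency lists, with the set of sources kept up to date."""

    def __init__(self):
        self.out_edges = {}   # node -> list of nodes reachable by one out-edge
        self.sources = set()  # nodes that currently have no in-edges

    def add_node(self, node):
        if node not in self.out_edges:
            self.out_edges[node] = []
            self.sources.add(node)  # a new node has no in-edges yet

    def add_edge(self, src, dst):
        self.add_node(src)
        self.add_node(dst)
        self.out_edges[src].append(dst)
        self.sources.discard(dst)  # dst now has an in-edge

    def get_sources(self):
        return self.sources  # O(1): no scan needed

    def get_neighbors(self, node):
        return self.out_edges[node]  # O(1) to hand back the list


g = Graph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("d", "c")
print(sorted(g.get_sources()))  # ['a', 'd']
print(g.get_neighbors("b"))     # ['c']
```

Note that this version never puts a node back into the sources set, so it assumes edges are only added, never removed.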

Is kd-tree always balanced?

I used the kd-tree algorithm to build a tree.
But I found that the tree is not balanced, so my question is: if we use the kd-tree algorithm, is the resulting tree always balanced? If not, how can we make it balanced?
Can we use another algorithm, like AVL or red-black, to balance a kd-tree?
I have some sample data; I used the kd-tree algorithm on it, but the resulting tree is not balanced.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
This is a fairly broad topic and the questions themselves are kind of general.
Hopefully this will give you some useful insights and material to work with:
Kd tree is not always balanced.
AVL and red-black rebalancing will not work with k-d trees; you will have to either construct a balanced variant such as a K-D-B-tree or use other balancing techniques.
K-d trees are commonly used to store geospatial data because they let you search over more than one key, unlike a 'traditional' tree, which only supports one-dimensional search. Geospatial data certainly cannot be represented in a single dimension.
Note that there are also specialized databases for geospatial data, so it might be worth checking whether the work could be shifted to them instead of building your own solution. I don't have much experience with this, but PostGIS may be worth a look.
Here are some useful links showing how to build balanced K-D tree variant and usage of K-D trees with Spatial data:
balancing K-D-Tree
K-D-B-tree
spatial data k-d-trees
It depends on how you build the tree.
If built as originally published, the tree will be balanced, i.e., the depths of its leaves will differ by at most 1. If your data set has 2^n-1 elements, the tree will be perfectly balanced.
When constructed with the median, then half of the objects must be on either branch of the tree, thus it has minimal height and is balanced.
However, this tree cannot be changed afterwards. I am not aware of an insert or remove algorithm that would preserve this property, but YMMV. I bet there are two dozen kd-tree extensions that aim at rebalancing and making insertions/deletions more efficient.
The k-d-tree is not designed for changes, and will quickly lose efficiency. It relies on the median, and thus any change to the tree would worst-case propagate through all of the tree. Therefore, you need to allow some tolerance in the tree quality to support changes. It appears to be a common approach to just keep track of insertions/deletions and rebuild the tree eventually. You cannot combine it with red-black-trees or AVL-trees, because data with more than 1 dimension is not ordered; these trees only work for ordered data. Upon rotation of the tree the splitting axis changes; and there may be elements in either half that suddenly would need to move to the other branch. This does not happen in AVL or red-black trees.
But as you can imagine, people have published several indexes that remain balanced. Such as k-d-b-trees, and R-trees. These also work better for large data that needs to be stored on disk.
To make your k-d tree balanced, split on the median value at each node.
(14,31), (15,32), (17,42), (16,44), (18,52), (16,62)
At the root, choose the median of the x-coordinates [14,15,16,16,17,18], which is 16. All elements with x less than 16 go to the left part of the tree, and those with x greater than or equal to 16 go to the right.
The left subtree now consists of (14,31) and (15,32); for it, find the median of the y-coordinates [31,32], and continue in the same way at every level so that the tree stays balanced.
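A minimal sketch of that median construction on the sample data, assuming the tree is built once from the full point set (no later insertions). The dictionary node layout and the names `build_kdtree` and `height` are just choices for this example. Ties on the splitting coordinate are sent to the right, matching the "greater than or equal to" rule above; with many duplicate coordinates that rule can make the two halves uneven.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Step back to the first point with the median coordinate, so that
    # everything left of the node is strictly smaller on this axis.
    while mid > 0 and pts[mid - 1][axis] == pts[mid][axis]:
        mid -= 1
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))


points = [(14, 31), (15, 32), (17, 42), (16, 44), (18, 52), (16, 62)]
tree = build_kdtree(points)
print(tree["point"])          # (16, 44)
print(tree["left"]["point"])  # (15, 32)
print(height(tree))           # 3
```

A height of 3 is the minimum possible for six points, so for this data set the median split does give a balanced tree.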

Storing a big factor graph using linked lists

Could you please tell me whether we can use adjacency linked lists to store a big factor graph (an undirected factor graph with 500,000 nodes and around 1,000,000 edges), considering that each node is a vector? If we cannot, what is the best option for a factor graph implementation?
Thanks in advance.
I don't see why not, although using arrays will be more compact than linked lists.
If the factor graph has a fixed regular structure (e.g. a grid or a long HMM) then you could do away with the adjacency lists entirely if you index your nodes to take advantage of the structure.
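To illustrate the "arrays are more compact" point, here is one possible layout (often called compressed sparse row): all neighbor lists are packed into a single flat array, with a second array of offsets marking where each node's neighbors start. This is only a sketch with invented function names, and it assumes nodes are numbered 0..n-1 with variable and factor nodes sharing one numbering.

```python
def build_csr(num_nodes, edges):
    """Pack an undirected edge list into (offsets, neighbors) arrays."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # offsets[i] is where node i's neighbors begin in the flat array.
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + degree[i]

    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1]  # next free slot per node (slicing makes a copy)
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def neighbors_of(node, offsets, neighbors):
    return neighbors[offsets[node]:offsets[node + 1]]


# Tiny example: nodes 0-2 are variables, nodes 3-4 are factors.
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
offsets, neighbors = build_csr(5, edges)
print(offsets)                              # [0, 1, 3, 4, 6, 8]
print(neighbors_of(1, offsets, neighbors))  # [3, 4]
print(neighbors_of(3, offsets, neighbors))  # [0, 1]
```

For a graph of the size in the question, that is one array of about 500,001 offsets and one of about 2,000,000 neighbor entries, with no per-edge pointer overhead. The per-node vectors could live in a separate array indexed by node number.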

Resources