I have built a d-dimensional KD-tree and want to do range search on it. Wikipedia mentions range search in KD-trees, but doesn't describe the implementation/algorithm in any way. Can someone please help me with this? If not for arbitrary d, help for at least d = 2 and d = 3 would be great. Thanks!
There are multiple variants of kd-tree. The one I used had the following specs:
Each internal node has at most two children.
Each leaf node can hold at most maxCapacity points.
No internal node stores any points.
Side note: there are also versions where each node (irrespective of whether it is internal or a leaf) stores exactly one point. The algorithm below can be tweaked for those too; it's mainly in buildTree where the key difference lies.
I wrote an algorithm for this some two years back, thanks to the resource pointed to by #9mat.
Suppose the task is to find the number of points that lie in a given hyper-rectangle ("d" dimensions). The task could instead be to list all points, or all points that lie in the given range and satisfy some other criteria, etc.; that is a straightforward change to the code below.
Define a base node class as:
template <typename T>
class kdNode {
public:
    kdNode() {}
    virtual ~kdNode() {}
    virtual int rangeQuery(const T* q_min, const T* q_max) const { return 0; }
};
Then, an internal node (non-leaf node) can look like this:
template <typename T>
class internalNode : public kdNode<T> {
    const kdNode<T> *left = nullptr, *right = nullptr; // left and right subtrees
    int axis; // the axis on which the split of points is done
    T value;  // the value at which the points are split
public:
    internalNode() {}
    void buildTree(...) {
        // builds the tree recursively
    }
    // returns the number of points in this subtree that lie inside the hyper-rectangle formed by q_min and q_max
    int rangeQuery(const T* q_min, const T* q_max) const override {
        // num of points that satisfy the range query conditions
        int rangeCount = 0;
        // the query box reaches into the left half-space
        if (left && q_min[axis] <= value) {
            rangeCount += left->rangeQuery(q_min, q_max);
        }
        // the query box reaches into the right half-space
        if (right && q_max[axis] >= value) {
            rangeCount += right->rangeQuery(q_min, q_max);
        }
        return rangeCount;
    }
};
Finally, the leaf node would look like:
// d (the number of dimensions) and maxCapacity (a hyper-parameter: the max number of points
// a leaf may hold) are compile-time constants defined elsewhere; needs #include <array>
template <typename T>
class leaf : public kdNode<T> {
    std::array<T, d> points[maxCapacity];
    int keyCount = 0; // the actual number of points in this leaf (keyCount <= maxCapacity)
public:
    leaf() {}
    void addPoint(const T* p) {
        // add a point p to the leaf node
    }
    // check if points[index] lies inside the hyper-rectangle formed by q_min and q_max
    inline bool containsPoint(const int index, const T* q_min, const T* q_max) const {
        for (int i = 0; i < d; i++) {
            if (points[index][i] > q_max[i] || points[index][i] < q_min[i]) {
                return false;
            }
        }
        return true;
    }
    // returns the number of points in this leaf node that lie inside the hyper-rectangle formed by q_min and q_max
    int rangeQuery(const T* q_min, const T* q_max) const override {
        // num of points that satisfy the range query conditions
        int rangeCount = 0;
        for (int i = 0; i < keyCount; i++) {
            if (containsPoint(i, q_min, q_max)) {
                rangeCount++;
            }
        }
        return rangeCount;
    }
};
In the leaf node's range query it is also possible to do a binary search instead of a pure linear search. If the points in a leaf are kept sorted along the split axis, you can binary-search for bounds l and r using q_min and q_max, and then scan linearly from l to r instead of from 0 to keyCount-1. (Of course, in the worst case this won't help, but in practice, especially if maxCapacity is large, it can.)
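A rough sketch of that idea, assuming (unlike the leaf class above) the leaf also stores its parent's split axis in a member called axis and keeps points sorted by that coordinate; needs #include <algorithm>:
int rangeQuery(const T* q_min, const T* q_max) const {
    // first point whose coordinate on `axis` is >= q_min[axis]
    int l = int(std::lower_bound(points, points + keyCount, q_min[axis],
                 [&](const std::array<T, d>& p, T v) { return p[axis] < v; }) - points);
    // one past the last point whose coordinate on `axis` is <= q_max[axis]
    int r = int(std::upper_bound(points, points + keyCount, q_max[axis],
                 [&](T v, const std::array<T, d>& p) { return v < p[axis]; }) - points);
    int rangeCount = 0;
    for (int i = l; i < r; i++) {          // only scan the candidate window
        if (containsPoint(i, q_min, q_max)) {
            rangeCount++;
        }
    }
    return rangeCount;
}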
This is my solution for a KD-tree where each node stores a point (so not just the leaves). (Adapting it for the variant where points are stored only in the leaves is really easy.)
I leave some of the optimizations out and explain them at the end, to keep the solution simple.
The get_range function takes varargs at the end and can be called like
x1, y1, x2, y2 or
x1, y1, z1, x2, y2, z2 etc., where the low values of the range are given first and then the high values.
(You can use as many dimensions as you like.)
static public <T> void get_range(K_D_Tree<T> tree, List<T> result, float... range) {
if (tree.root == null) return;
float[] node_region = new float[tree.DIMENSIONS * 2];
for (int i = 0; i < tree.DIMENSIONS; i++) {
node_region[i] = -Float.MAX_VALUE;
node_region[i+tree.DIMENSIONS] = Float.MAX_VALUE;
}
_get_range(tree, result, tree.root, node_region, 0, range);
}
The node_region represents the region covered by the node; we start as large as possible, since for all we know the root's region covers the whole space.
Here is the recursive _get_range implementation:
static public <T> void _get_range(K_D_Tree<T> tree, List<T> result, K_D_Tree_Node<T> node, float[] node_region, int dimension, float[] target_region) {
if (dimension == tree.DIMENSIONS) dimension = 0;
if (_contains_region(tree, node_region, target_region)) {
_add_whole_branch(node, result);
}
else {
float value = _value(tree, dimension, node);
if (node.left != null) {
float[] node_region_left = new float[tree.DIMENSIONS*2];
System.arraycopy(node_region, 0, node_region_left, 0, node_region.length);
node_region_left[dimension + tree.DIMENSIONS] = value;
if (_intersects_region(tree, node_region_left, target_region)){
_get_range(tree, result, node.left, node_region_left, dimension+1, target_region);
}
}
if (node.right != null) {
float[] node_region_right = new float[tree.DIMENSIONS*2];
System.arraycopy(node_region, 0, node_region_right, 0, node_region.length);
node_region_right[dimension] = value;
if (_intersects_region(tree, node_region_right, target_region)){
_get_range(tree, result, node.right, node_region_right, dimension+1, target_region);
}
}
if (_region_contains_node(tree, target_region, node)) {
result.add(node.point);
}
}
}
One important thing that the other answer does not provide is this part:
if (_contains_region(tree, node_region, target_region)) {
_add_whole_branch(node, result);
}
With a range search in a KD-tree there are 3 possibilities for a node's region relative to the query region; it is:
fully outside
intersecting
fully contained
Once you know a region is fully contained, then you can add the whole branch without doing any dimension checks.
To make it more clear, here is the _add_whole_branch:
static public <T> void _add_whole_branch(K_D_Tree_Node<T> node, List<T> result) {
result.add(node.point);
if (node.left != null) _add_whole_branch(node.left, result);
if (node.right != null) _add_whole_branch(node.right, result);
}
In this image, all the big white dots were added using _add_whole_branch, and only for the red dots did a check over all dimensions have to be done.
Optimization
1)
Instead of starting the _get_range function at the root node, you can find the split node. This is the first node that has its point within the query range. To find the split node you still need to start at the root node, but the calculations are a bit cheaper (because you only descend either left or right, never both).
2)
Now I create the float[] node_region_left and float[] node_region_right, and since this happens in a recursive function it can allocate quite a few arrays. However, you can reuse the left one for the right; I didn't do that in this example for clarity.
I can also imagine storing the node's region in the node itself, but that takes quite a bit more memory and might lead to a lot of cache misses.
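For completeness, the region tests used in _get_range above boil down to per-dimension interval checks. The answer doesn't show its Java helpers, so here is a small sketch in C++ of the underlying idea (contains_region and intersects_region are illustrative names, not the answer's methods; the region layout is the same as node_region: low values first, then high values):
// true iff the `inner` box lies fully inside the `outer` box
bool contains_region(const float* outer, const float* inner, int dims) {
    for (int i = 0; i < dims; ++i)
        if (inner[i] < outer[i] || inner[i + dims] > outer[i + dims])
            return false;  // inner pokes out of outer in dimension i
    return true;
}
// true iff boxes a and b overlap
bool intersects_region(const float* a, const float* b, int dims) {
    for (int i = 0; i < dims; ++i)
        if (a[i + dims] < b[i] || b[i + dims] < a[i])
            return false;  // the intervals in dimension i don't overlap
    return true;
}
In the recursion above, the whole branch is added when the node's region is contained in the target region, and a child is only descended into when its (shrunk) region intersects the target region.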
I am interested in finding sets of vertices that are not ordered in a directed acyclic graph (in the sense of a topological order).
That is, for example: two vertices in non-connected subgraphs, or the pairs (B,C), (B,D) in cases such as :
The naive possibility I thought of was to enumerate all the topological sorts (in this case [A, B, C, D] and [A, C, D, B]) and find all pairs whose order ends up being different in at least two sorts, but this would be pretty expensive computationally.
Are there other, faster possibilities for what I want to achieve? I am using boost.graph.
Basically what you want are the pairs of nodes (u,v) such that there is no path from u to v and no path from v to u. For each node, you can find all nodes reachable from it using a DFS, for a total complexity of O(n(n+m)).
Now all you have to do is check, for each pair, that neither of the two nodes is reachable from the other.
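A minimal sketch of that approach (plain adjacency lists and illustrative names; the same can be done over a boost::adjacency_list):
#include <utility>
#include <vector>

// marks in `seen` every node reachable from u (u itself included)
void dfs(int u, const std::vector<std::vector<int>>& adj, std::vector<char>& seen) {
    seen[u] = 1;
    for (int w : adj[u])
        if (!seen[w]) dfs(w, adj, seen);
}

// returns all pairs (u,v), u < v, with no path u->v and no path v->u
std::vector<std::pair<int,int>> incomparable_pairs(const std::vector<std::vector<int>>& adj) {
    const int n = int(adj.size());
    std::vector<std::vector<char>> reach(n, std::vector<char>(n, 0));
    for (int u = 0; u < n; ++u)  // one DFS per node: O(n(n+m)) in total
        dfs(u, adj, reach[u]);
    std::vector<std::pair<int,int>> out;
    for (int u = 0; u < n; ++u)
        for (int v = u + 1; v < n; ++v)
            if (!reach[u][v] && !reach[v][u])
                out.push_back(std::make_pair(u, v));
    return out;
}
For the graph in the question (A->B, A->C, C->D with A,B,C,D numbered 0..3) this returns exactly (B,C) and (B,D).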
You can start with a simple topological sort. Boost's implementation conveniently returns a reverse ordered list of vertices.
You can iterate that list, marking each initial leaf node with a new branch id until a shared node is encountered.
Demo Time
Let's start with the simplest of graph models:
#include <boost/graph/adjacency_list.hpp>
using Graph = boost::adjacency_list<>;
We wish to map branches:
using BranchID = int;
using BranchMap = std::vector<BranchID>; // maps vertex id -> branch id
We want to build, map and visualize the mappings:
Graph build();
BranchMap map_branches(Graph const&);
void visualize(Graph const&, BranchMap const& branch_map);
int main() {
// sample data
Graph g = build();
// do the topo sort and distinguish branches
BranchMap mappings = map_branches(g);
// output
visualize(g, mappings);
}
Building Graph
Just the sample data from the question:
Graph build() {
Graph g(4);
enum {A,B,C,D};
add_edge(A, B, g);
add_edge(A, C, g);
add_edge(C, D, g);
return g;
}
Mapping The Branches
As described in the introduction:
#include <boost/graph/topological_sort.hpp>
std::vector<BranchID> map_branches(Graph const& g) {
std::vector<Vertex> reverse_topo;
boost::topological_sort(g, back_inserter(reverse_topo));
// traverse the output to map to unique branch ids
std::vector<BranchID> branch_map(num_vertices(g));
BranchID branch_id = 0;
for (auto v : reverse_topo) {
auto degree = out_degree(v, g);
if (0 == degree) // is leaf?
++branch_id;
if (degree < 2) // "unique" path
branch_map[v] = branch_id;
}
return branch_map;
}
Visualizing
Let's write a graph-viz representation with each branch colored:
#include <boost/graph/graphviz.hpp>
#include <iostream>
void visualize(Graph const& g, BranchMap const& branch_map) {
// display helpers
std::vector<std::string> const colors { "gray", "red", "green", "blue" };
auto name = [](Vertex v) -> char { return 'A'+v; };
auto color = [&](Vertex v) -> std::string { return colors[branch_map.at(v) % colors.size()]; };
// write graphviz:
boost::dynamic_properties dp;
dp.property("node_id", transform(name));
dp.property("color", transform(color));
write_graphviz_dp(std::cout, g, dp);
}
This uses a tiny shorthand helper to create the transforming property maps:
// convenience short-hand to write transformed property maps
template <typename F>
static auto transform(F f) { return boost::make_transform_value_property_map(f, boost::identity_property_map{}); };
To compile this on a non-C++14 compiler you can replace the call to transform with the expanded body.
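For example, the two dp.property(...) calls in visualize would then read roughly like this (a sketch of the expansion, reusing the name and color lambdas already defined there):
boost::dynamic_properties dp;
dp.property("node_id",
    boost::make_transform_value_property_map(name, boost::identity_property_map{}));
dp.property("color",
    boost::make_transform_value_property_map(color, boost::identity_property_map{}));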
Full Listing
Live On Coliru
#include <boost/graph/adjacency_list.hpp>
using Graph = boost::adjacency_list<>;
using BranchID = int;
using BranchMap = std::vector<BranchID>; // maps vertex id -> branch id
Graph build();
BranchMap map_branches(Graph const&);
void visualize(Graph const&, BranchMap const& branch_map);
int main() {
// sample data
Graph g = build();
// do the topo sort and distinguish branches
BranchMap mappings = map_branches(g);
// output
visualize(g, mappings);
}
using Vertex = Graph::vertex_descriptor;
Graph build() {
Graph g(4);
enum {A,B,C,D};
add_edge(A, B, g);
add_edge(A, C, g);
add_edge(C, D, g);
return g;
}
#include <boost/graph/topological_sort.hpp>
std::vector<BranchID> map_branches(Graph const& g) {
std::vector<Vertex> reverse_topo;
boost::topological_sort(g, back_inserter(reverse_topo));
// traverse the output to map to unique branch ids
std::vector<BranchID> branch_map(num_vertices(g));
BranchID branch_id = 0;
for (auto v : reverse_topo) {
auto degree = out_degree(v, g);
if (0 == degree) // is leaf?
++branch_id;
if (degree < 2) // "unique" path
branch_map[v] = branch_id;
}
return branch_map;
}
#include <boost/property_map/transform_value_property_map.hpp>
// convenience short-hand to write transformed property maps
template <typename F>
static auto transform(F f) { return boost::make_transform_value_property_map(f, boost::identity_property_map{}); };
#include <boost/graph/graphviz.hpp>
#include <iostream>
void visualize(Graph const& g, BranchMap const& branch_map) {
// display helpers
std::vector<std::string> const colors { "gray", "red", "green", "blue" };
auto name = [](Vertex v) -> char { return 'A'+v; };
auto color = [&](Vertex v) -> std::string { return colors[branch_map.at(v) % colors.size()]; };
// write graphviz:
boost::dynamic_properties dp;
dp.property("node_id", transform(name));
dp.property("color", transform(color));
write_graphviz_dp(std::cout, g, dp);
}
Printing
digraph G {
A [color=gray];
B [color=red];
C [color=green];
D [color=green];
A->B ;
A->C ;
C->D ;
}
And the rendered graph:
Summary
Nodes in branches with different colors cannot be compared.
I am writing a Dijkstra program as per Cormen's book. Here I have a class Graph which has a method dijsktra. I need to use a priority queue Q inside this method. There are two arrays, shortest[] and pred[]. This Q should order the vertices based on their shortest[] values.
Each node is represented as an int.
I am getting an error when declaring a priority queue with a comparator.
Any help is much appreciated.
class Graph {
private:
vector<int> vertex;
map<int, vector<int> > adjlist;
int *shortest,*pred;
public:
Graph(int V){
fore(i,V) { vertex.pb(i); }
shortest = new int[V];
pred = new int[V];
}
struct Comp { // Error!
Comp(Graph& g): graph(g){ // Error!
}
Graph& graph;
bool operator()(int i1, int i2) {
return graph->shortest[i1] > graph->shortest[i2];
}
};
Comp comp;
void dijsktra(int s);
};
void Graph::dijsktra(int s) {
int n = vertex.size();
fore(i,n) {
shortest[i] = 99;
pred[i] = 0;
}
shortest[s] = 0;
priority_queue<int,vector<int>, Comp> Q(comp); // Error!
}
[EDIT]
I tried various methods. I believe the error is in how the comparator is added.
Error:
dag.cpp: In constructor ‘Graph::Graph(int)’:
dag.cpp:15:14: error: no matching function for call to ‘Graph::Comp::Comp()’
Graph(int V){
^
dag.cpp:15:14: note: candidates are:
dag.cpp:21:3: note: Graph::Comp::Comp(Graph&)
Comp(Graph& g): graph(g){
^
dag.cpp:21:3: note: candidate expects 1 argument, 0 provided
dag.cpp:20:9: note: constexpr Graph::Comp::Comp(const Graph::Comp&)
struct Comp {
^
dag.cpp:20:9: note: candidate expects 1 argument, 0 provided
dag.cpp:20:9: note: constexpr Graph::Comp::Comp(Graph::Comp&&)
dag.cpp:20:9: note: candidate expects 1 argument, 0 provided
Compilation failed.
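For what it's worth, a minimal sketch of one way to make this compile (assuming the rest of the class stays as posted, with the fore/pb macros spelled out): Comp has no default constructor, so the comp member must be initialized in Graph's member-initializer list, and since graph is a reference it is accessed with . rather than ->:
#include <map>
#include <queue>
#include <vector>
using namespace std;

class Graph {
private:
    vector<int> vertex;
    map<int, vector<int> > adjlist;
    int *shortest, *pred;
public:
    struct Comp {
        Comp(Graph& g) : graph(g) {}
        Graph& graph;
        bool operator()(int i1, int i2) const {
            return graph.shortest[i1] > graph.shortest[i2]; // '.' not '->': graph is a reference
        }
    };
    Comp comp;
    Graph(int V) : comp(*this) { // initialize comp here; it cannot be default-constructed
        for (int i = 0; i < V; ++i) vertex.push_back(i);
        shortest = new int[V];
        pred = new int[V];
    }
    void dijsktra(int s) {
        int n = vertex.size();
        for (int i = 0; i < n; ++i) { shortest[i] = 99; pred[i] = 0; }
        shortest[s] = 0;
        priority_queue<int, vector<int>, Comp> Q(comp); // now compiles
        // ... rest of Dijkstra as in Cormen ...
    }
};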
I think this is more of a philosophical question about readability and tuple types in C++11.
I am writing some code to produce Gaussian Mixture Models (the details are kind of irrelevant, but it serves as a nice example). My code is below:
GMM.hpp
#pragma once
#include <opencv2/opencv.hpp>
#include <vector>
#include <tuple>
#include "../Util/Types.hpp"
namespace LocalDescriptorAndBagOfFeature
{
// Weighted gaussian is defined as a (weight, mean vector, covariance matrix)
typedef std::tuple<double, cv::Mat, cv::Mat> WeightedGaussian;
class GMM
{
public:
GMM(int numGaussians);
void Train(const FeatureSet &featureSet);
std::vector<double> Supervector(const BagOfFeatures &bof);
int NumGaussians(void) const;
double operator ()(const cv::Mat &x) const;
private:
static double ComputeWeightedGaussian(const cv::Mat &x, WeightedGaussian wg);
std::vector<WeightedGaussian> _Gaussians;
int _NumGaussians;
};
}
GMM.cpp
#include "GMM.hpp"
#include <cmath>
#include <numeric>
using namespace LocalDescriptorAndBagOfFeature;
double GMM::ComputeWeightedGaussian(const cv::Mat &x, WeightedGaussian wg)
{
double weight;
cv::Mat mean, covariance;
std::tie(weight, mean, covariance) = wg;
cv::Mat precision;
cv::invert(covariance, precision);
double detp = cv::determinant(precision);
double outter = std::sqrt(detp / 2.0 * M_PI);
cv::Mat meanDist = x - mean;
cv::Mat meanDistTrans;
cv::transpose(meanDist, meanDistTrans);
cv::Mat symmetricProduct = meanDistTrans * precision * meanDist; // This is a "1x1" matrix e.g. a scalar value
double inner = symmetricProduct.at<double>(0,0) / -2.0;
return weight * outter * std::exp(inner);
}
double GMM::operator ()(const cv::Mat &x) const
{
return std::accumulate(_Gaussians.begin(), _Gaussians.end(), 0.0, [&x](double val, WeightedGaussian wg) { return val + ComputeWeightedGaussian(x, wg); });
}
In this case, am I gaining anything (clarity, readability, speed, ...) by using a tuple representation for the weighted Gaussian distribution over using a struct, or even a class with its own operator()?
You're reducing the size of your source code a little bit, but I'd argue that you're reducing its overall readability and type safety. Specifically, if you defined:
struct WeightedGaussian {
double weight;
cv::Mat mean, covariance;
};
then you wouldn't have a chance of writing the incorrect
std::tie(weight, covariance, mean) = wg;
and you'd guarantee that your users would use wg.mean instead of std::get<0>(wg). The biggest downside is that std::tuple comes with definitions of operator< and operator==, while you have to implement them yourself for a custom struct:
bool operator<(const WeightedGaussian& lhs, const WeightedGaussian& rhs) {
return std::tie(lhs.weight, lhs.mean, lhs.covariance) <
std::tie(rhs.weight, rhs.mean, rhs.covariance);
}
Assume you have many elements, and you need to keep track of the equivalence relations between them. If element A is equivalent to element B, it is equivalent to all the other elements B is equivalent to.
I am looking for an efficient data structure to encode this information. It should be possible to dynamically add new elements through an equivalence with an existing element, and from that information it should be possible to efficiently compute all the elements the new element is equivalent to.
For example, consider the following equivalence sets of the elements [0,1,2,3,4]:
0 = 1 = 2
3 = 4
where the equality sign denotes equivalence. Now we add a new element 5
0 = 1 = 2
3 = 4
5
and enforcing the equivalence 5=3, the data structure becomes
0 = 1 = 2
3 = 4 = 5
From this, one should be able to iterate efficiently through the equivalence set for any element. For 5, this set would be [3,4,5].
Boost already provides a convenient data structure called disjoint_sets that seems to meet most of my requirements. Consider this simple program that illustrates how to implement the above example:
#include <cstdio>
#include <vector>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
/*
Equivalence relations
0 = 1 = 2
3 = 4
*/
int main(int , char* [])
{
typedef std::vector<int> VecInt;
typedef boost::unordered_set<int> SetInt;
VecInt rank (100);
VecInt parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
SetInt elements;
for (int i=0; i<5; ++i) {
ds.make_set(i);
elements.insert(i);
}
ds.union_set(0,1);
ds.union_set(1,2);
ds.union_set(3,4);
printf("Number of sets:\n\t%d\n", (int)ds.count_sets(elements.begin(), elements.end()));
// normalize set so that parent is always the smallest number
ds.normalize_sets(elements.begin(), elements.end());
for (SetInt::const_iterator i = elements.begin(); i != elements.end(); ++i) {
printf("%d %d\n", *i, ds.find_set(*i));
}
return 0;
}
As seen above one can efficiently add elements, and dynamically expand the disjoint sets. How can one efficiently iterate over the elements of a single disjoint set, without having to iterate over all the elements?
Most probably you can't do that: disjoint_sets doesn't support iteration over one set only. The underlying data structure and algorithms wouldn't be able to do it efficiently anyway; that is, even if disjoint_sets had built-in support for iterating over one set only, it would be just as slow as iterating over all the elements and filtering out those belonging to other sets.
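To make that filtering concrete, here is a sketch built on the question's program (set_of is just an illustrative name, not part of Boost):
// collect the members of the set containing element e by scanning all elements
// and keeping those with the same representative; this costs O(number of elements)
std::vector<int> set_of(boost::disjoint_sets<int*,int*>& ds,
                        const boost::unordered_set<int>& elements, int e)
{
    std::vector<int> members;
    const int rep = ds.find_set(e);
    for (boost::unordered_set<int>::const_iterator it = elements.begin(); it != elements.end(); ++it)
        if (ds.find_set(*it) == rep)
            members.push_back(*it);
    return members;
}
For the sets built in the question, set_of(ds, elements, 3) yields 3 and 4 (in some order).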
Either I am missing something, you forgot to mention something, or maybe you were overthinking this ;)
Happily, equivalence is not equality. For A and B to be equivalent, they only need to share an attribute with the same value; this could be a scalar or even a vector. Anyway, I think your posted requirements can be achieved just using std::multiset and its std::multiset::equal_range() member function.
#include <cassert>
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>
//////////////////////////////////////
class E
{
//could be a GUID or something instead but the time complexity of
//std::multiset::equal_range with a simple int comparison should be logarithmic
static size_t _faucet;
public:
struct LessThan
{
bool operator()(const E* l, const E* r) const { return (l->eqValue() < r->eqValue()); }
};
using EL=std::vector<const E*>;
using ES=std::multiset<const E*, E::LessThan>;
using ER=std::pair<ES::iterator, ES::iterator>;
static size_t NewValue() { return ++_faucet; }
~E() { eqRemove(); }
E(size_t val) : _eqValue(val) {}
E(std::string name) : Name(name), _eqValue(NewValue()) { E::Elementals.insert(this); }
//not really a great idea to use operator=() for this; demo only..
const E& operator=(const class E& other) { eqValue(other); return *this; }
//overriddable default equivalence interface
virtual size_t eqValue() const { return _eqValue; };
//clearly it matters how mutable you need your equivalence relationships to be;
//in this implementation, if an element's equivalence relation changes then
//the element is going to be erased and re-inserted.
virtual void eqValue(const class E& other)
{
if (_eqValue == other._eqValue) return;
eqRemove();
_eqValue=other._eqValue;
E::Elementals.insert(this);
};
ES::iterator eqRemove()
{
auto range=E::Elementals.equal_range(this);
//worst-case complexity should be aprox linear over the range
for (auto it=range.first; it!=range.second; it++)
if (this == (*it))
return E::Elementals.erase(it);
return E::Elementals.end();
}
std::string Name; //some other attribute unique to the instance
static ES Elementals; //canonical set of elements with equivalence relations
protected:
size_t _eqValue=0;
};
size_t E::_faucet=0;
E::ES E::Elementals{};
//////////////////////////////////////
//random specialisation providing
//dynamic class-level equivalence
class StarFish : public E
{
public:
static void EqAssign(const class E& other)
{
if (StarFish::_id == other.eqValue()) return;
E el(StarFish::_id);
auto range=E::Elementals.equal_range(&el);
StarFish::_id=other.eqValue();
E::EL insertList(range.first, range.second);
E::Elementals.erase(range.first, range.second);
E::Elementals.insert(insertList.begin(), insertList.end());
}
StarFish() : E("starfish") {}
//base-class overrides
virtual size_t eqValue() const { return StarFish::_id; };
protected: //equivalence is at the class level
virtual void eqValue(const class E& other) { assert(0); }
private:
static size_t _id;
};
size_t StarFish::_id=E::NewValue();
//////////////////////////////////////
void eqPrint(const E& e)
{
std::cout << std::endl << "elementals equivalent to " << e.Name << ": ";
auto range=E::Elementals.equal_range(&e);
for (auto it=range.first; it!=range.second; it++)
std::cout << (*it)->Name << " ";
std::cout << std::endl << std::endl;
}
//////////////////////////////////////
void eqPrint()
{
for (auto it=E::Elementals.begin(); it!=E::Elementals.end(); it++)
std::cout << (*it)->Name << ": " << (*it)->eqValue() << " ";
std::cout << std::endl << std::endl;
}
//////////////////////////////////////
int main()
{
E e0{"zero"}, e1{"one"}, e2{"two"}, e3{"three"}, e4{"four"}, e5{"five"};
//per the OP
e0=e1=e2;
e3=e4;
e5=e3;
eqPrint(e0);
eqPrint(e3);
eqPrint(e5);
eqPrint();
StarFish::EqAssign(e3);
StarFish starfish1, starfish2;
starfish1.Name="fred";
eqPrint(e3);
//re-assignment
StarFish::EqAssign(e0);
e3=e0;
{ //out of scope removal
E e6{"six"};
e6=e4;
eqPrint(e4);
}
eqPrint(e5);
eqPrint(e0);
eqPrint();
return 0;
}
online demo
NB: C++ class inheritance also provides another kind of immutable equivalence that can be quite useful ;)