Boost R tree node remove - boost

I want to remove the nearest point from the R-tree, but only if it lies within a distance limit.
I think my code is not efficient, though.
How can I improve it?
for (int j = 0; j < 3; j++) {
bgi::rtree< value, bgi::quadratic<16> > nextRT;
// search for nearest neighbours
std::vector<value> matchPoints;
vector<pair<float, float>> pointList;
for (unsigned i = 0; i < keypoints[j + 1].size(); ++i) {
point p = point(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y);
nextRT.insert(std::make_pair(p, i));
RT.query(bgi::nearest(p, 1), std::back_inserter(matchPoints));
if (bg::distance(p, matchPoints.back().first) > 3) matchPoints.pop_back();
else {
pointList.push_back(make_pair(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y));
RT.remove(matchPoints.back());
}
}
}
I am also curious about the contents of matchPoints.
After the query function runs, matchPoints contains values.
The first member is a point, and the second looks like some kind of index number.
I don't know what the second one means.

Q. I am also curious about the contents of matchPoints. After the query function runs, matchPoints contains values. The first member is a point, and the second looks like some kind of index number. I don't know what the second one means.
Well, that's got to be a data member in your value type. What is in it depends solely on what you inserted into the rtree. It wouldn't surprise me if it was an ID that describes the geometry.
Since you do not even show the type of RT, we can only assume it is the same as nextRT. If so, we can assume that value is likely a pair like pair<point, unsigned> (because of what you insert). So, look at what got inserted for the unsigned value of the pair in RT...
Q.
if (bg::distance(p, matchPoints.back().first) > 3) matchPoints.pop_back();
else {
pointList.push_back(make_pair(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y));
RT.remove(matchPoints.back());
}
Simplify your code! Distilling the requirements:
It looks to me that for 4 sets of "key points", you want to create 4 rtrees containing all those key points with sequentially increasing ids.
Also for those 4 sets of "key points", you want to create a list of key points for which a geometry can be found within a radius of 3.
As a side-effect, remove those closely-matching geometries from the original rtree RT.
DECISION: Because these tasks are independent, let's do them separately:
// making up types that match the usage in your code:
struct keypoint_t { point pt; };
std::array<std::vector<keypoint_t>, 4> keypoints;
Now, let's do the tasks:
Note how RT is not used here:
for (auto const& current_key_set : keypoints) {
bgi::rtree< value, bgi::quadratic<16> > nextRT; // use a better name...
int i = 0;
for (auto const& kpd : current_key_set)
nextRT.insert(std::make_pair(kpd.pt, i++));
}
Creating the vector containing matched key-points (those with near geometries in RT):
for (auto const& current_key_set : keypoints) {
std::vector<point> matched_key_points;
for (auto const& kpd : current_key_set) {
point p = kpd.pt;
value match;
if (!RT.query(bgi::nearest(p, 1), &match))
continue;
if (bg::distance(p, match.first) <= 3) {
matched_key_points.push_back(p);
RT.remove(match);
}
}
}
Ironically, removing the matching geometries from RT became a bit of a minor issue in this: you can either delete by iterator or by a value. In this case, we use the overload that takes a value.
Summary
It was hard to understand the code enough to see what it did. I have shown how to clean up the code, and make it work. Maybe these aren't the things you need, but hopefully using the better separated code, you should be able to get further.
Note that the algorithms have side effects. This makes it hard to understand what really will happen. E.g.:
removing points from the original RT affects what the subsequent key points (even from subsequent sets (next j)) can match with
if you have the same key point multiple times, they may match more than 1 source RT point (because after removal of the first match, there might be a second match within radius 3)
key points are checked strictly sequentially. This means that if the first keypoint roughly matches a point X, this might cause a later keypoint to fail to match, even though the point X might be closer to that keypoint...
I'd suggest you THINK about the requirements really hard before implementing things with these side-effects. Study the sample cases in the live demo below. If all these side-effects are exactly what you wanted, be sure to use much better naming and proper comments to describe what the code is doing.
Live Demo
#include <boost/geometry.hpp>
#include <boost/geometry/io/io.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iostream>
namespace bg = boost::geometry;
namespace bgi = bg::index;
typedef bg::model::point<float, 2, bg::cs::cartesian> point;
typedef std::pair<point, unsigned> pvalue;
typedef pvalue value;
int main() {
bgi::rtree< value, bgi::quadratic<16> > RT;
{
int i = 0;
for (auto p : { point(2.0f, 2.0f), point(2.5f, 2.5f) })
RT.insert(std::make_pair(p, i++));
}
struct keypoint_t { point pt; };
using keypoints_t = std::vector<keypoint_t>;
keypoints_t const keypoints[] = {
keypoints_t{ keypoint_t { point(-2, 2) } }, // should not match anything
keypoints_t{ keypoint_t { point(-1, 2) } }, // should match (2,2)
keypoints_t{ keypoint_t { point(2.0, 2.0) }, // matches (2.5,2.5)
{ point(2.5, 2.5) }, // nothing anymore...
},
};
for (auto const& current_key_set : keypoints) {
bgi::rtree< pvalue, bgi::quadratic<16> > nextRT; // use a better name...
int i = 0;
for (auto const& kpd : current_key_set)
nextRT.insert(std::make_pair(kpd.pt, i++));
}
for (auto const& current_key_set : keypoints) {
std::cout << "-----------\n";
std::vector<point> matched_key_points;
for (auto const& kpd : current_key_set) {
point p = kpd.pt;
std::cout << "Key: " << bg::wkt(p) << "\n";
value match;
if (!RT.query(bgi::nearest(p, 1), &match))
continue;
if (bg::distance(p, match.first) <= 3) {
matched_key_points.push_back(p);
std::cout << "\tRemoving close point: " << bg::wkt(match.first) << "\n";
RT.remove(match);
}
}
std::cout << "\nMatched keys: ";
for (auto& p : matched_key_points)
std::cout << bg::wkt(p) << " ";
std::cout << "\n\tElements remaining: " << RT.size() << "\n";
}
}
Prints
-----------
Key: POINT(-2 2)
Matched keys:
Elements remaining: 2
-----------
Key: POINT(-1 2)
Removing close point: POINT(2 2)
Matched keys: POINT(-1 2)
Elements remaining: 1
-----------
Key: POINT(2 2)
Removing close point: POINT(2.5 2.5)
Key: POINT(2.5 2.5)
Matched keys: POINT(2 2)
Elements remaining: 0


Computed Members in C++ Class by Empty Struct Members With Overloaded Implicit Conversions

In some data structures, it would be useful to have members whose values are computed from the other data members upon access instead of stored.
For example, a typical rect class might store its left, top, right and bottom coordinates in member data fields, and provide getter methods that return the computed width and height based on those values, for clients which require the relative dimensions instead of the absolute positions.
struct rect
{
int left, top, right, bottom;
// ...
int get_width() const { return right - left; }
int get_height() const { return bottom - top; }
};
This implementation allows us to get and set the absolute coordinates of the rectangle's sides,
float center_y = (float)(box.top + box.bottom) / 2.0;
and additionally to get its relative dimensions, albeit using the slightly different method-call operator expression syntax:
float aspect = (float)box.get_width() / (float)box.get_height();
The Problem
One could argue, however, that it is equally valid to store the relative width and height instead of absolute right and bottom coordinates, and require clients that need to compute the right and bottom values to use getter methods.
My Solution
In order to avoid the need to remember which case requires method call vs. data member access operator syntax, I have come up with some code that works in the current stable gcc and clang compilers. Here is a fully functional example implementation of a rect data structure:
#include <iostream>
struct rect
{
union {
struct {
union { int l; int left; };
union { int t; int top; };
union { int r; int right; };
union { int b; int bot; int bottom; };
};
struct {
operator int() {
return ((rect*)this)->r - ((rect*)this)->l;
}
} w, width;
struct {
operator int() {
return ((rect*)this)->b - ((rect*)this)->t;
}
} h, height;
};
rect(): l(0), t(0), r(0), b(0) {}
rect(int _w, int _h): l(0), t(0), r(_w), b(_h) {}
rect(int _l, int _t, int _r, int _b): l(_l), t(_t), r(_r), b(_b) {}
template<class OStream> friend OStream& operator<<(OStream& out, const rect& ref)
{
return out << "rect(left=" << ref.l << ", top=" << ref.t << ", right=" << ref.r << ", bottom=" << ref.b << ")";
}
};
/// @brief Small test program showing that rect.w and rect.h behave like data members
int main()
{
rect t(3, 5, 103, 30);
std::cout << "sizeof(rect) is " << sizeof(rect) << std::endl;
std::cout << "t is " << t << std::endl;
std::cout << "t.w is " << t.w << std::endl;
std::cout << "t.h is " << t.h << std::endl;
return 0;
}
Is there anything wrong with what I am doing here?
Something about the pointer-casts in the nested empty struct types' implicit conversion operators, i.e. these lines:
return ((rect*)this)->r - ((rect*)this)->l;
feels dirty, as though I may be violating good C++ style convention. If this or some other aspect of my solution is wrong, I'd like to know what the reasoning is, and ultimately, if this is bad practice then is there a valid way to achieve the same results.
One thing that I would normally expect to work doesn't:
auto w = t.w;
Also, one of the following lines works, the other does not:
t.l += 3;
t.w += 3; // compile error
Thus, you have not changed the fact that users need to know which members are data and which are functions.
I'd just make all of them functions. It is better encapsulation anyway. And I would prefer the full names, i.e. left, top, bottom, right, width and height. It might be a few more characters to write, but most code is read much more often than it is written. The extra few characters will pay off.

Finding every possible word out of a bigger word [closed]

Hi, I'm looking for an algorithm to extract every possible word out of a single word in C++.
For example, from the word "overflow" I can get these: "love", "flow", "for", "row", "over"...
How can I efficiently get only valid English words?
Note: I have a dictionary, a big word list.
I can't think how to do this without brute-forcing it with all the permutations.
Something like this:
#include <string>
#include <algorithm>
int main()
{
using size_type = std::string::size_type;
std::string word = "overflow";
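// next_permutation only visits every permutation if it starts from the sorted order
std::sort(word.begin(), word.end());
// examine every permutation of the letters contained in word
do
{
// examine each non-empty prefix of this permutation
for(size_type s = 1; s <= word.size(); ++s)
{
std::string sub = word.substr(0, s);
// look up sub in a dictionary here...
}
}
while(std::next_permutation(word.begin(), word.end()));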
return 0;
}
I can think of 2 ways to speed this up.
1) Keep a check on substrings of a given permutation already tried to avoid unnecessary dictionary lookups (std::set or std::unordered_set maybe).
2) Cache popular results, keeping the most frequently requested words (std::map or std::unordered_map perhaps).
NOTE:
It turns out that even after adding caching at various levels, this is indeed a very slow algorithm for larger words.
However this uses a much faster algorithm:
#include <set>
#include <string>
#include <cstring>
#include <fstream>
#include <iostream>
#include <algorithm>
#define con(m) std::cout << m << '\n'
std::string& lower(std::string& s)
{
std::transform(s.begin(), s.end(), s.begin(), tolower);
return s;
}
std::string& trim(std::string& s)
{
static const char* t = " \t\n\r";
s.erase(s.find_last_not_of(t) + 1);
s.erase(0, s.find_first_not_of(t));
return s;
}
void usage()
{
con("usage: anagram [-p] -d <word-file> -w <word>");
con(" -p - (optional) find only perfect anagrams.");
con(" -d <word-file> - (required) A file containing a list of possible words.");
con(" -w <word> - (required) The word to find anagrams of in the <word-file>.");
}
int main(int argc, char* argv[])
{
std::string word;
std::string wordfile;
bool perfect_anagram = false;
for(int i = 1; i < argc; ++i)
{
if(!strcmp(argv[i], "-p"))
perfect_anagram = true;
else if(!strcmp(argv[i], "-d"))
{
if(!(++i < argc))
{
usage();
return 1;
}
wordfile = argv[i];
}
else if(!strcmp(argv[i], "-w"))
{
if(!(++i < argc))
{
usage();
return 1;
}
word = argv[i];
}
}
if(wordfile.empty() || word.empty())
{
usage();
return 1;
}
std::ifstream ifs(wordfile);
if(!ifs)
{
con("ERROR: opening dictionary: " << wordfile);
return 1;
}
// for analyzing the relevant characters and their
// relative abundance
std::string sorted_word = lower(word);
std::sort(sorted_word.begin(), sorted_word.end());
std::string unique_word = sorted_word;
unique_word.erase(std::unique(unique_word.begin(), unique_word.end()), unique_word.end());
// This is where the successful words will go
// using a set to ensure uniqueness
std::set<std::string> found;
// plow through the dictionary
// (storing it in memory would increase performance)
std::string line;
while(std::getline(ifs, line))
{
// quick rejects
if(trim(line).size() < 2)
continue;
if(perfect_anagram && line.size() != word.size())
continue;
if(line.size() > word.size())
continue;
// This may be needed if dictionary file contains
// upper-case words you want to match against
// such as acronyms and proper nouns
// lower(line);
// for analyzing the relevant characters and their
// relative abundance
std::string sorted_line = line;
std::sort(sorted_line.begin(), sorted_line.end());
std::string unique_line = sorted_line;
unique_line.erase(std::unique(unique_line.begin(), unique_line.end()), unique_line.end());
// closer rejects
if(unique_line.find_first_not_of(unique_word) != std::string::npos)
continue;
if(perfect_anagram && sorted_word != sorted_line)
continue;
// final check if candidate line from the dictionary
// contains only the letters (in the right quantity)
// needed to be an anagram
bool match = true;
for(auto c: unique_line)
{
auto n1 = std::count(sorted_word.begin(), sorted_word.end(), c);
auto n2 = std::count(sorted_line.begin(), sorted_line.end(), c);
if(n1 < n2)
{
match = false;
break;
}
}
if(!match)
continue;
// we found a good one
found.insert(std::move(line));
}
con("Found: " << found.size() << " word" << (found.size() == 1?"":"s"));
for(auto&& word: found)
con(word);
}
Explanation:
This algorithm works by concentrating on known good patterns (dictionary words) rather than the vast number of bad patterns generated by the permutation solution.
So it trundles through the dictionary looking for words to match the search term. It successively discounts the words based on tests that increase in accuracy as the more obvious words are discounted.
The crux of the logic is to check that each surviving dictionary word contains only letters that occur in the search term. This is done by building, for both the search term and the dictionary word, a string that contains exactly one of each of its letters (std::unique applied to the sorted letters) and rejecting the word if its string contains a letter that is missing from the search term's string. If the word survives this test, the code goes on to check that the search term has at least as many of each letter as the dictionary word needs. This uses std::count().
A perfect_anagram is detected only if all the letters match in the dictionary word and the search term. Otherwise it is sufficient that the search term contains at least enough of the correct letters.

All of the option to replace an unknown number of characters

I am trying to find an algorithm that for an unknown number of characters in a string, produces all of the options for replacing some characters with stars.
For example, for the string "abc", the output should be:
*bc
a*c
ab*
**c
*b*
a**
***
It is simple enough with a known number of stars: just run through all of the options with for loops. I'm having difficulty doing it for every possible number of stars.
Every star combination corresponds to a binary number, so you can use a simple loop
for i = 1 to 2^n-1
where n is the string length,
and put stars at the positions of the 1-bits in the binary representation of i.
For example: i=5=101b => * b *
This is basically a binary increment problem.
You can create a vector of integer variables to represent a binary array isStar and for each iteration you "add one" to the vector.
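// Increments the binary counter stored in isStar (most significant digit first).
// Returns true once the counter overflows, i.e. every combination has been visited.
bool AddOne (int* isStar, int size) {
isStar[size - 1] += 1;
for (int i = size - 1; i >= 0; i--) {
if (isStar[i] > 1) {
if (i == 0) { return true; }
isStar[i] = 0;
isStar[i - 1] += 1;
}
}
return false;
}
That way you still have the original string while replacing the characters.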
This is a simple binary counting problem, where * corresponds to a 1 and the original letter to a 0. So you could do it with a counter, applying a bit mask to the string, but it's just as easy to do the "counting" in place.
Here's a simple implementation in C++:
(Edit: The original question seems to imply that at least one character must be replaced with a star, so the count should start at 1 instead of 0. Or, in the following, the post-test do should be replaced with a pre-test for.)
#include <iostream>
#include <string>
// A cleverer implementation would implement C++'s iterator protocol.
// But that would cloud the simple logic of the algorithm.
class StarReplacer {
public:
StarReplacer(const std::string& s): original_(s), current_(s) {}
const std::string& current() const { return current_; }
// returns true unless we're at the last possibility (all stars),
// in which case it returns false but still resets current to the
// original configuration.
bool advance() {
for (int i = current_.size()-1; i >= 0; --i) {
if (current_[i] == '*') current_[i] = original_[i];
else {
current_[i] = '*';
return true;
}
}
return false;
}
private:
std::string original_;
std::string current_;
};
int main(int argc, const char** argv) {
for (int a = 1; a < argc; ++a) {
StarReplacer r(argv[a]);
do {
std::cout << r.current() << std::endl;
} while (r.advance());
std::cout << std::endl;
}
return 0;
}

How to partly sort arrays on CUDA?

Problem
Provided I have two arrays:
const int N = 1000000;
float A[N];
myStruct *B[N];
The numbers in A can be positive or negative (e.g. A[N]={3,2,-1,0,5,-2}). How can I make the array A partly sorted on the GPU, with all positive values first (they do not need to be sorted) and then the negative values (e.g. A[N]={3,2,5,0,-1,-2} or A[N]={5,2,3,0,-2,-1})? The array B should be rearranged along with A (A holds the keys, B the values).
Since A and B can be very large, I think the sort algorithm should be implemented on the GPU (specifically with CUDA, because that is the platform I use). I know thrust::sort_by_key can do this, but it does much extra work, since I do not need A and B to be sorted entirely.
Has anyone come across this kind of problem?
Thrust example
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
thrust::greater<float>() );
Thrust's documentation on GitHub is not up-to-date. As @JaredHoberock said, thrust::partition is the way to go since it now supports stencils. You may need to get a copy from the GitHub repository:
git clone git://github.com/thrust/thrust.git
Then run scons doc in the Thrust folder to get an updated documentation, and use these updated Thrust sources when compiling your code (nvcc -I/path/to/thrust ...). With the new stencil partition, you can do:
#include <thrust/partition.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
thrust::partition(thrust::host, // if you want to test on the host
thrust::make_zip_iterator(thrust::make_tuple(keyVec.begin(), valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(keyVec.end(), valVec.end())),
keyVec.begin(),
is_positive());
This returns:
Before:
keyVec = 0 -1 2 -3 4 -5 6 -7 8 -9
valVec = 0 1 2 3 4 5 6 7 8 9
After:
keyVec = 0 2 4 6 8 -5 -3 -7 -1 -9
valVec = 0 2 4 6 8 5 3 7 1 9
Note that the 2 partitions are not necessarily sorted. Also, the order may differ between the original vectors and the partitions. If this is important to you, you can use thrust::stable_partition:
stable_partition differs from partition in that stable_partition is
guaranteed to preserve relative order. That is, if x and y are
elements in [first, last), such that pred(x) == pred(y), and if x
precedes y, then it will still be true after stable_partition that x
precedes y.
If you want a complete example, here it is:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <iostream>
struct is_positive
{
__host__ __device__
bool operator()(const int &x)
{
return x >= 0;
}
};
void print_vec(const thrust::host_vector<int>& v)
{
for(size_t i = 0; i < v.size(); i++)
std::cout << " " << v[i];
std::cout << "\n";
}
int main ()
{
const int N = 10;
thrust::host_vector<int> keyVec(N);
thrust::host_vector<int> valVec(N);
int sign = 1;
for(int i = 0; i < N; ++i)
{
keyVec[i] = sign * i;
valVec[i] = i;
sign *= -1;
}
// Copy host to device
thrust::device_vector<int> d_keyVec = keyVec;
thrust::device_vector<int> d_valVec = valVec;
std::cout << "Before:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
// Partition key-val on device
thrust::partition(thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.begin(), d_valVec.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_keyVec.end(), d_valVec.end())),
d_keyVec.begin(),
is_positive());
// Copy result back to host
keyVec = d_keyVec;
valVec = d_valVec;
std::cout << "After:\n keyVec = ";
print_vec(keyVec);
std::cout << " valVec = ";
print_vec(valVec);
}
UPDATE
I made a quick comparison with the thrust::sort_by_key version, and the thrust::partition implementation does seem to be faster (which is what we could naturally expect). Here is what I obtain on NVIDIA Visual Profiler, with N = 1024 * 1024, with the sort version on the left, and the partition version on the right. You may want to do the same kind of tests on your own.
How about this?:
Count how many positive numbers to determine the inflexion point
Evenly divide each side of the inflexion point into groups (negative-groups are all same length but different length to positive-groups. these groups are the memory chunks for the results)
Use one kernel call (one thread) per chunk pair
Each kernel swaps any out-of-place elements in the input groups into the desired output groups. You will need to flag any chunks that have more swaps than the maximum so that you can fix them during subsequent iterations.
Repeat until done
Memory traffic is swaps only (from original element position, to sorted position). I don't know if this algorithm sounds like anything already defined...
You should be able to achieve this in thrust simply with a modification of your comparison operator:
struct my_compare
{
__device__ __host__ bool operator()(const float x, const float y) const
{
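// A comparator must be a strict weak ordering (comp(x, x) == false):
// non-negative keys are ordered before negative ones.
return (x >= 0.0f) && (y < 0.0f);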
}
};
thrust::sort_by_key(thrust::device_ptr<float> (A),
thrust::device_ptr<float> ( A + N ),
thrust::device_ptr<myStruct> ( B ),
my_compare() );

Range-based for loop with boost::adaptor::indexed

The C++11 range-based for loop dereferences the iterator. Does that mean that it makes no sense to use it with boost::adaptors::indexed? Example:
auto numbers = boost::counting_range(10, 20);
for(auto i : numbers | indexed(0)) {
cout << "number = " << i
/* << " | index = " << i.index() */ // i is an integer!
<< "\n";
}
I can always use a counter but I like indexed iterators.
Is it possible to use them somehow with range-based for loops?
What is the idiom for using range-based loops with an index? (just a plain counter?)
This was fixed in Boost 1.56 (released August 2014); the element is indirected behind a value_type with index() and value() member functions.
Example: http://coliru.stacked-crooked.com/a/e95bdff0a9d371ea
auto numbers = boost::counting_range(10, 20);
for (auto i : numbers | boost::adaptors::indexed())
std::cout << "number = " << i.value()
<< " | index = " << i.index() << "\n";
It seems more useful when iterating over collection, where you may need the index position (to print the item number if not for anything else):
#include <boost/range/adaptors.hpp>
std::vector<std::string> list = {"boost", "adaptors", "are", "great"};
for (auto v: list | boost::adaptors::indexed(0)) {
printf("%ld: %s\n", v.index(), v.value().c_str());
}
Prints:
0: boost
1: adaptors
2: are
3: great
Any innovation for simply iterating over an integer range is strongly challenged by the classic for loop, which is still a very strong competitor:
for (int a = 10; a < 20; a++)
While this can be twisted up in a number of ways, it is not so easy to propose something that is obviously much more readable.
The short answer (as everyone in the comments mentioned) is "right, it makes no sense." I have also found this annoying. Depending on your programming style, you might like the "zipfor" package I wrote (just a header), available on GitHub.
It allows syntax like
std::vector v;
zipfor(x,i eachin v, icounter) {
// use x as dereferenced element of v
// and i as index
}
Unfortunately, I cannot figure out a way to use the range-based for syntax and have to resort to the "zipfor" macro :(
The header was originally designed for things like
std::vector v,w;
zipfor(x,y eachin v,w) {
// x is element of v
// y is element of w (both iterated in parallel)
}
and
std::map m;
mapfor(k,v eachin m)
// k is key and v is value of pair in m
My tests on g++ 4.8 with full optimizations show that the resulting code is no slower than writing it by hand.
