The C++11 range-based for loop dereferences the iterator. Does that mean that it makes no sense to use it with boost::adaptors::indexed? Example:
auto numbers = boost::counting_range(10, 20);
for (auto i : numbers | indexed(0)) {
    cout << "number = " << i
        /* << " | index = " << i.index() */ // i is an integer!
        << "\n";
}
I can always use a counter but I like indexed iterators.
Is it possible to use them somehow with range-based for loops?
What is the idiom for using range-based loops with an index? (just a plain counter?)
This was fixed in Boost 1.56 (released August 2014); the element is indirected behind a value_type with index() and value() member functions.
Example: http://coliru.stacked-crooked.com/a/e95bdff0a9d371ea
auto numbers = boost::counting_range(10, 20);
for (auto i : numbers | boost::adaptors::indexed())
    std::cout << "number = " << i.value()
              << " | index = " << i.index() << "\n";
It seems more useful when iterating over a collection where you may need the index position (to print the item number, if nothing else):
#include <boost/range/adaptors.hpp>
#include <cstdio>
#include <string>
#include <vector>

std::vector<std::string> list = {"boost", "adaptors", "are", "great"};
for (auto v : list | boost::adaptors::indexed(0)) {
    printf("%ld: %s\n", static_cast<long>(v.index()), v.value().c_str());
}
Prints:
0: boost
1: adaptors
2: are
3: great
For simply iterating over an integer range, any innovation is strongly challenged by the classic for loop, still a very strong competitor:
for (int a = 10; a < 20; a++)
While this can be twisted up in a number of ways, it is not so easy to propose something that is obviously much more readable.
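If one does want to twist it up, here is a minimal sketch of a hand-rolled enumerate-style adapter (the enumerate_iter/enumerate_range/enumerate names are made up for illustration, not a library facility; requires C++14 for the deduced return types):

#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Wraps an iterator together with a running counter.
template <typename Iter>
struct enumerate_iter {
    Iter it;
    std::size_t idx;
    bool operator!=(const enumerate_iter& other) const { return it != other.it; }
    void operator++() { ++it; ++idx; }
    auto operator*() const { return std::make_pair(idx, *it); }
};

// Exposes begin()/end() so it can be used in a range-based for.
template <typename Range>
struct enumerate_range {
    Range& r;
    auto begin() { return enumerate_iter<decltype(std::begin(r))>{std::begin(r), 0}; }
    auto end()   { return enumerate_iter<decltype(std::end(r))>{std::end(r), 0}; }
};

template <typename Range>
enumerate_range<Range> enumerate(Range& r) { return {r}; }

int main() {
    std::vector<std::string> words = {"an", "indexed", "loop"};
    for (auto p : enumerate(words))
        std::cout << p.first << ": " << p.second << "\n";
}

Whether this actually beats a plain counter in readability is debatable, which is rather the point made above.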
The short answer (as everyone in the comments mentioned) is "right, it makes no sense." I have also found this annoying. Depending on your programming style, you might like the "zipfor" package I wrote (just a header), available on github.
It allows syntax like
std::vector<int> v;
zipfor(x, i eachin v, icounter) {
    // use x as the dereferenced element of v
    // and i as the index
}
Unfortunately, I cannot figure out a way to use the range-based for syntax and have to resort to the "zipfor" macro :(
The header was originally designed for things like
std::vector<int> v, w;
zipfor(x, y eachin v, w) {
    // x is an element of v
    // y is an element of w (both iterated in parallel)
}
and
std::map<std::string, int> m; // any key/value types
mapfor(k, v eachin m)
    // k is the key and v is the value of a pair in m
My tests on g++ 4.8 with full optimizations show that the resulting code is no slower than writing it by hand.
I have two messages:
messageA: "Frank is one of the "best" students topicId{} "
messageB: "Frank is one of the "top" students topicId{} "
I need to find a partial SHA256 collision of these two messages (the first 8 hex digits).
That is, the first 8 hex digits of SHA256(messageA) == the first 8 hex digits of SHA256(messageB).
We can put any letters and numbers in {}; both {} should contain the same string.
I have tried brute force and a birthday attack with a hash table to solve this problem, but it costs too much time. I know cycle-detection algorithms like Floyd's and Brent's, but I have no idea how to construct the cycle for this problem. Are there any other methods to solve it? Thank you so much!
This is pretty trivial to solve with a birthday attack. Here's how I did it in Python (v2):
def find_collision(ntries):
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%d} '
    str2 = 'Frank is one of the "top" students topicId{%d} '
    seen = {}
    for n in xrange(ntries):
        h = sha256(str1 % n).digest()[:4].encode('hex')
        seen[h] = n
    for n in xrange(ntries):
        h = sha256(str2 % n).digest()[:4].encode('hex')
        if h in seen:
            print str1 % seen[h]
            print str2 % n

find_collision(100000)
If your attempt took too long to find a solution, then either you simply made a mistake in your coding somewhere, or you were using the wrong data type.
Python's dictionary data type is implemented using hash tables. That means you can search for dictionary elements in constant time. If you implemented seen using a list instead of a dict in the above code, then the membership test h in seen would take an awful lot longer.
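For a C++ rendering of the same point (a sketch; the helper names are made up): a hash-set lookup is amortized O(1), while a linear scan of a vector costs O(n) per query, so a list-backed version of seen makes the whole attack quadratic.

#include <string>
#include <unordered_set>
#include <vector>

// O(1) average per query: this is what the dict gives you in Python.
bool seen_in_hash(const std::unordered_set<std::string>& seen, const std::string& h) {
    return seen.count(h) != 0;
}

// O(n) per query: a list-based "seen" structure behaves like this.
bool seen_in_list(const std::vector<std::string>& seen, const std::string& h) {
    for (const auto& s : seen)
        if (s == h)
            return true;
    return false;
}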
Edit:
If the two topicId tokens have to be identical, then, as pointed out in the comments, there is little option but to grind through somewhere in the order of 2^31 values. You will find a collision eventually, but it could take a long time.
Just leave this running overnight and with a bit of luck you'll have an answer in the morning:
def find_collision():
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%x} '
    str2 = 'Frank is one of the "top" students topicId{%x} '
    n = 0
    while True:
        if sha256(str1 % n).digest()[:4] == sha256(str2 % n).digest()[:4]:
            print str1 % n
            print str2 % n
            break
        n += 1

find_collision()
If you're in a hurry, you could maybe look into using a GPU to speed up the hash calculations.
I'm assuming the space at the end of the strings in the question was intentional so I left it in.
"Frank is one of the "top" students topicId{59220691223} "
6026d9b323898bcd7ecdbcbcd575b0a1d9dc22fd9e60074aefcbaade494a50ae
"Frank is one of the "best" students topicId{59220691223} "
6026d9b31ba780bb9973e7cfc8c9f74a35b54448d441a61cc9bf8db0fcae5280
It actually took about 7 billion tries to find one using brute force, a lot more than I expected.
I figure 2^32 is roughly 4.3 billion, so the chance of not finding any match after 4.3 billion tries is about 36.78%.
I actually found a match after about 7 billion tries; there was less than a 20% chance of no matches in 7 billion tries.
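For reference, the arithmetic behind those percentages: each try misses with probability 1 - 2^-32, so after N independent tries the chance of finding no match is about (1 - 2^-32)^N ≈ e^(-N/2^32). For N = 4.3 billion that is e^-1 ≈ 36.8%, and for N = 7 billion it is e^(-7/4.3) ≈ e^-1.63 ≈ 19.6%.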
This is the C++ code I used, running on 7 threads; each thread gets a different starting point, and it quits once a match is found on any thread. Each thread also updates its progress to cout every 1 million attempts.
I've fast-forwarded to where the match was found on threadId=5, so it takes less than a minute to run. But if you change the starting point you can look for other matches.
And I'm not sure either how one would use Floyd's or Brent's algorithm, since the strings have to use the same topicId, so you are locked in on both the prefix and suffix.
/*
To compile, get the picosha2 header file from https://github.com/okdshin/PicoSHA2
Copy this code into the same directory as the picosha2.h file; save it as hash.cpp, for example.
On Linux, go to the command line and cd to the directory where these files are.
To compile it:
g++ -O2 -o hash hash.cpp -l pthread
And run it:
./hash
*/
#include <iostream>
#include <string>
#include <thread>
#include <mutex>
// I used picoSHA2 header only file for the hashing
// https://github.com/okdshin/PicoSHA2
#include "picosha2.h"
// return 1st 4 bytes (8 chars) of SHA256 hash
std::string hash8(const std::string& src_str) {
    std::vector<unsigned char> hash(picosha2::k_digest_size);
    picosha2::hash256(src_str.begin(), src_str.end(), hash.begin(), hash.end());
    return picosha2::bytes_to_hex_string(hash.begin(), hash.begin() + 4);
}
bool done = false;
std::mutex mtxCout;
void work(unsigned long long threadId) {
    std::string a = "Frank is one of the \"best\" students topicId{",
                b = "Frank is one of the \"top\" students topicId{";
    // Each thread gets a different starting point. I've fast-forwarded to the part
    // where I found the match, so this won't take long to run if you try it, < 1 minute.
    // If you want it to run for a while, drop the last "+ 150000000ULL" term and it will run
    // for about 1 billion total (150 million each thread, assuming 7 threads), taking
    // about 30 minutes on Linux.
    // The collision occurred on threadId = 5, so if you change it to use fewer than 6 threads
    // then your mileage may vary.
    unsigned long long start = threadId * (11666666667ULL + 147000000ULL) + 150000000ULL;
    unsigned long long x = start;
    for (;;) {
        // Not concerned with making reads/updates of the "done" flag atomic; it's unlikely
        // 2 collisions are found at once on separate threads, and writing to cout
        // is guarded anyway.
        if (done) return;
        std::string xs = std::to_string(x++);
        std::string hashA = hash8(a + xs + "} "), hashB = hash8(b + xs + "} ");
        if (hashA == hashB) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "*** SOLVED ***" << std::endl;
            std::cout << (x - 1) << std::endl;
            std::cout << "\"" << a << (x - 1) << "} \" = " << hashA << std::endl;
            std::cout << "\"" << b << (x - 1) << "} \" = " << hashB << std::endl;
            done = true;
            return;
        }
        if (((x - start) % 1000000ULL) == 0) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "thread: " << threadId << " = " << (x - start)
                      << " tries so far" << std::endl;
        }
    }
}
void runBruteForce() {
    const int NUM_THREADS = 7;
    std::thread threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) threads[i] = std::thread(work, i);
    for (int i = 0; i < NUM_THREADS; i++) threads[i].join();
}
int main(int argc, char** argv) {
    runBruteForce();
    return 0;
}
I want to remove the nearest point node, subject to a distance limit.
But I think my code is not efficient.
How can I modify it?
for (int j = 0; j < 3; j++) {
    bgi::rtree< value, bgi::quadratic<16> > nextRT;
    // search for nearest neighbours
    std::vector<value> matchPoints;
    vector<pair<float, float>> pointList;
    for (unsigned i = 0; i < keypoints[j + 1].size(); ++i) {
        point p = point(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y);
        nextRT.insert(std::make_pair(p, i));
        RT.query(bgi::nearest(p, 1), std::back_inserter(matchPoints));
        if (bg::distance(p, matchPoints.back().first) > 3) matchPoints.pop_back();
        else {
            pointList.push_back(make_pair(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y));
            RT.remove(matchPoints.back());
        }
    }
}
I'm also curious about the result of matchPoints.
After the query function runs, there are values in matchPoints.
The first one is a point, and the second one looks like some indexing number.
I don't know what the second one means.
Q. I'm also curious about the result of matchPoints. After the query function runs, there are values in matchPoints. The first one is a point, and the second one looks like some indexing number. I don't know what the second one means.
Well, that's got to be a data member in your value type. What is in it depends solely on what you inserted into the rtree. It wouldn't surprise me if it was an ID that describes the geometry.
Since you do not even show the type of RT, we can only assume it is the same as nextRT. If so, we can assume that value is likely a pair like pair<point, unsigned> (because of what you insert). So, look at what got inserted as the unsigned value of the pair in RT...
Q.
if (bg::distance(p, matchPoints.back().first) > 3) matchPoints.pop_back();
else {
    pointList.push_back(make_pair(keypoints[j + 1][i].pt.x, keypoints[j + 1][i].pt.y));
    RT.remove(matchPoints.back());
}
Simplify your code! Distilling the requirements:
It looks to me that for 4 sets of "key points", you want to create 4 rtrees containing all those key points with sequentially increasing ids.
Also for those 4 sets of "key points", you want to create a list of key points for which a geometry can be found within a radius of 3.
As a side-effect, remove those closely-matching geometries from the original rtree RT.
DECISION: Because these tasks are independent, let's do them separately:
// making up types that match the usage in your code:
struct keypoint_t { point pt; };
std::array<std::vector<keypoint_t>, 4> keypoints;
Now, let's do the tasks:
Note how RT is not used here:
for (auto const& current_key_set : keypoints) {
    bgi::rtree< value, bgi::quadratic<16> > nextRT; // use a better name...
    int i = 0;
    for (auto const& kpd : current_key_set)
        nextRT.insert(std::make_pair(kpd.pt, i++));
}
Creating the vector containing matched key-points (those with near geometries in RT):
for (auto const& current_key_set : keypoints) {
    std::vector<point> matched_key_points;
    for (auto const& kpd : current_key_set) {
        point p = kpd.pt;
        value match;
        if (!RT.query(bgi::nearest(p, 1), &match))
            continue;
        if (bg::distance(p, match.first) <= 3) {
            matched_key_points.push_back(p);
            RT.remove(match);
        }
    }
}
Ironically, removing the matching geometries from RT became a bit of a minor issue in this: you can either delete by iterator or by a value. In this case, we use the overload that takes a value.
Summary
It was hard to understand the code well enough to see what it did. I have shown how to clean up the code and make it work. Maybe these aren't the things you need, but hopefully, using the better separated code, you should be able to get further.
Note that the algorithms have side effects. This makes it hard to understand what really will happen. E.g.:
removing points from the original RT affects what the subsequent key points (even from subsequent sets (next j)) can match with
if you have the same key point multiple times, they may match more than 1 source RT point (because after removal of the first match, there might be a second match within radius 3)
key points are checked strictly sequentially. This means that if the first keypoint roughly matches a point X, this might cause a later keypoint to fail to match, even though the point X might be closer to that keypoint...
I'd suggest you THINK about the requirements really hard before implementing things with these side-effects. Study the sample cases in the live demo below. If all these side-effects are exactly what you wanted, be sure to use much better naming and proper comments to describe what the code is doing.
Live Demo
Live On Coliru
#include <boost/geometry.hpp>
#include <boost/geometry/io/io.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iostream>
#include <vector>

namespace bg = boost::geometry;
namespace bgi = bg::index;

typedef bg::model::point<float, 2, bg::cs::cartesian> point;
typedef std::pair<point, unsigned> pvalue;
typedef pvalue value;

int main() {
    bgi::rtree< value, bgi::quadratic<16> > RT;
    {
        int i = 0;
        for (auto p : { point(2.0f, 2.0f), point(2.5f, 2.5f) })
            RT.insert(std::make_pair(p, i++));
    }

    struct keypoint_t { point pt; };
    using keypoints_t = std::vector<keypoint_t>;

    keypoints_t const keypoints[] = {
        keypoints_t{ keypoint_t { point(-2, 2) } },  // should not match anything
        keypoints_t{ keypoint_t { point(-1, 2) } },  // should match (2,2)
        keypoints_t{ keypoint_t { point(2.0, 2.0) }, // matches (2.5,2.5)
                     { point(2.5, 2.5) },            // nothing anymore...
        },
    };

    for (auto const& current_key_set : keypoints) {
        bgi::rtree< pvalue, bgi::quadratic<16> > nextRT; // use a better name...
        int i = 0;
        for (auto const& kpd : current_key_set)
            nextRT.insert(std::make_pair(kpd.pt, i++));
    }

    for (auto const& current_key_set : keypoints) {
        std::cout << "-----------\n";
        std::vector<point> matched_key_points;
        for (auto const& kpd : current_key_set) {
            point p = kpd.pt;
            std::cout << "Key: " << bg::wkt(p) << "\n";
            value match;
            if (!RT.query(bgi::nearest(p, 1), &match))
                continue;
            if (bg::distance(p, match.first) <= 3) {
                matched_key_points.push_back(p);
                std::cout << "\tRemoving close point: " << bg::wkt(match.first) << "\n";
                RT.remove(match);
            }
        }

        std::cout << "\nMatched keys: ";
        for (auto& p : matched_key_points)
            std::cout << bg::wkt(p) << " ";
        std::cout << "\n\tElements remaining: " << RT.size() << "\n";
    }
}
Prints
-----------
Key: POINT(-2 2)
Matched keys:
Elements remaining: 2
-----------
Key: POINT(-1 2)
Removing close point: POINT(2 2)
Matched keys: POINT(-1 2)
Elements remaining: 1
-----------
Key: POINT(2 2)
Removing close point: POINT(2.5 2.5)
Key: POINT(2.5 2.5)
Matched keys: POINT(2 2)
Elements remaining: 0
I want to generate pseudo-random numbers in C++, and the two likely options are the <random> feature of C++11 and the Boost counterpart. They are used in essentially the same way, but the native one in my tests is roughly 4 times slower.
Is that due to design choices in the library, or am I missing some way of disabling debug code somewhere?
Update: Code is here, https://github.com/vbeffara/Simulations/blob/master/tests/test_prng.cpp and looks like this:
cerr << "boost::bernoulli_distribution ... \ttime = ";
s=0; t=time();
boost::bernoulli_distribution<> dist(.5);
boost::mt19937 boostengine;
for (int i=0; i<n; ++i) s += dist(boostengine);
cerr << time()-t << ", \tsum = " << s << endl;
cerr << "C++11 style ... \ttime = ";
s=0; t=time();
std::bernoulli_distribution dist2(.5);
std::mt19937_64 engine;
for (int i=0; i<n; ++i) s += dist2(engine);
cerr << time()-t << ", \tsum = " << s << endl;
(Using std::mt19937 instead of std::mt19937_64 makes it even slower on my system.)
That’s pretty scary.
Let’s have a look:
boost::bernoulli_distribution<>

if(_p == RealType(0))
    return false;
else
    return RealType(eng()-(eng.min)()) <= _p * RealType((eng.max)()-(eng.min)());
std::bernoulli_distribution

__detail::_Adaptor<_UniformRandomNumberGenerator, double> __aurng(__urng);
if ((__aurng() - __aurng.min()) < __p.p() * (__aurng.max() - __aurng.min()))
    return true;
return false;
Both versions invoke the engine and check if the output lies in a portion of the range of values proportional to the given probability.
The big difference is that the gcc version calls the functions of a helper class _Adaptor.
This class' min and max functions return 0 and 1 respectively, and operator() then calls std::generate_canonical with the given URNG to obtain a value between 0 and 1.
std::generate_canonical is a 20-line function with a loop, which will never iterate more than once in this case, but it adds complexity.
Apart from that, boost uses the param_type only in the constructor of the distribution, but then saves _p as a double member, whereas gcc has a param_type member and has to "get" the value of it.
This all comes together and the compiler fails to optimize.
Clang chokes even more on it.
If you hammer hard enough you can even get std::mt19937 and boost::mt19937 on par for gcc.
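To illustrate the kind of hammering meant here, a sketch (assumed code, not the library's implementation) of a Bernoulli draw that stays in integer space the way boost does: precompute the threshold p * (max - min) once, then compare the raw engine output against it, with no floating-point work per draw.

#include <cstdint>
#include <iostream>
#include <random>

int main() {
    std::mt19937 engine;
    const double p = 0.5;
    // mt19937 produces values in [0, 2^32 - 1]; precompute p * (max - min) once.
    const std::uint32_t threshold = static_cast<std::uint32_t>(
        p * static_cast<double>(std::mt19937::max() - std::mt19937::min()));
    long long s = 0;
    for (int i = 0; i < 100000000; ++i)
        s += (engine() - std::mt19937::min()) <= threshold;
    std::cout << "sum = " << s << "\n";
}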
It would be nice to test libc++ too; maybe I'll add that later.
tested versions: boost 1.55.0, libstdc++ headers of gcc 4.8.2
line numbers on request^^
Possible Duplicate:
Finding a single number in a list
Given an array of numbers in which all numbers except one occur twice, what algorithm finds the number that occurs only once in the array?
Example
a[1..n] = [1,2,3,4,3,1,2]
should return 4
Let the number which occurs only once in the array be x
x <- a[1]
for i <- 2 to n
    x <- x ^ a[i]
return x
Since a ^ a = 0 and a ^ 0 = a, numbers which occur in pairs cancel out, and the result is left in x.
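For the example array this works out as 1^2^3^4^3^1^2 = (1^1) ^ (2^2) ^ (3^3) ^ 4 = 0^0^0^4 = 4, since XOR is commutative and associative, so the pairs can be regrouped freely.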
Working code in C++
#include <cstddef>
#include <iostream>

template<typename T, std::size_t N>
std::size_t size(T (&a)[N])
{
    return N;
}

int main()
{
    int a[] = {1,2,3,4,3,1,2};
    int x = a[0];
    for (std::size_t i = 1; i < size(a); ++i)
    {
        x = x ^ a[i];
    }
    std::cout << x;
}
Create a new int i = 0
XOR i with each item
After all iterations, the expected number will be in i
If you have quantities which cannot be reasonably XORed (big integers or numbers represented as strings, for example), an alternate approach which is also O(n) time (but O(n) space rather than O(1) space) would be to simply use a hash table. The algorithm looks like:
Create a hash table of the same size as the list
For every item in the list:
    If the item is a key in the hash table
        then remove the item from the hash table
        else add the item to the hash table with a nominal value
At the end, there should be exactly one item in the hash table
I would give C or C++ code, but neither of them has hash tables built in. (Don't ask me why C++ doesn't have a hash table in the STL, but does have a map based on a red-black tree, because I have no idea what they were thinking.) And, unfortunately, I don't have a C# compiler handy to test for syntax errors, so I'm giving you Java code. It's pretty similar, though.
import java.util.Hashtable;
import java.util.List;

class FindUnique {
    public static <T> T findUnique(List<T> list) {
        Hashtable<T,Character> ht = new Hashtable<T,Character>(list.size());
        for (T item : list) {
            if (ht.containsKey(item)) {
                ht.remove(item);
            } else {
                ht.put(item, 'x');
            }
        }
        return ht.keys().nextElement();
    }
}
Well, I only know of the brute-force approach, which is to traverse the whole array and check.
The code will be like this (in C#):
int k = 0;
for (int i = 0; i < array.Length; i++)
{
    k ^= array[i];
}
return k;
zerkms' answer in C++:
#include <functional>
#include <numeric>

int a[] = { 1,2,3,4,3,1,2 };
int i = std::accumulate(a, a + 7, 0, std::bit_xor<int>());
You could sort the array and then find the first element that doesn't have a pair (a sketch follows below). That would require several loops for sorting and one loop for finding the single element.
A simpler method would be setting the duplicate keys to zero, or to a value that is not possible in the current format. This depends on the programming language as well, since you cannot change key types in C++, unlike C#.
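A sketch of the sort-based approach mentioned above (O(n log n) rather than the O(n) of the XOR trick, shown only to make the idea concrete):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a = {1, 2, 3, 4, 3, 1, 2};
    std::sort(a.begin(), a.end()); // pairs become adjacent: 1 1 2 2 3 3 4
    for (std::size_t i = 0; i < a.size(); i += 2) {
        // If the pair is broken (or we are at the last element), a[i] is the single one.
        if (i + 1 == a.size() || a[i] != a[i + 1]) {
            std::cout << a[i] << "\n";
            break;
        }
    }
}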
Consider you have some expression like
i = j = 0
supposing this is well-defined in your language of choice. Would it generally be better to split this up into two expressions like
i = 0
j = 0
I see this sometimes in library code. It doesn't seem to buy you much in terms of brevity and shouldn't perform any better than the two statements (though that may be compiler dependent). So, is there a reason to use one over the other? Or is it just personal preference? I know this sounds like a silly question but it's been bugging me for a long time now :-).
Once upon a time there was a performance difference, which is one of the reasons that this kind of assignment was used. Compilers would turn i = 0; j = 0; into:
load 0
store [i]
load 0
store [j]
So you could save an instruction by using i = j = 0 as the compiler would turn this into:
load 0
store [j]
store [i]
Nowadays compilers can do this type of optimisations by themselves. Also, as the current CPUs run several instructions at once, performance can no longer simply be measured in number of instructions. Instructions where one action doesn't rely on the result of another can run in parallel, so the version that uses a separate value for each variable might actually be faster.
Regarding programming style, you should use the way that best expresses the intention of the code.
You can for example chain the assignments when you simply want to clear some variables, and make it separate assignments when the value has a specific meaning. Especially if the meaning of setting one variable to the value is different from setting the other variable to the same value.
The two forms reflect different points of view on assignment.
The first case treats assignment (at least the inner one) as an operator (a function returning a value).
The second case treats assignment as a statement (a command to do something).
There are some cases where assignment as an operator has its point, mostly for brevity, or for use in contexts that expect a result. However, I find it confusing, for several reasons:
Assignment operators are basically side-effect operators, and nowadays they are a problem for compilers to optimize. In languages like C and C++ they lead to many undefined-behavior cases, or to unoptimized code.
It is unclear what an assignment operator should return: the value that has been assigned, or the address of the place where it has been stored? One or the other could be useful, depending on the context.
With composite assignments like +=, it's even worse. It is unclear whether the operator should return the initial value, the combined result, or even the place it was stored to.
Assignment as a statement sometimes leads to intermediate variables, but that's the only drawback I see. It is clear, and compilers know how to optimize successive such statements efficiently.
Basically, I would avoid assignment as an operator whenever possible. The presented case is very simple and not really confusing, but as a general rule I would still prefer
i = 0
j = 0
or
i, j = 0, 0
for languages that support parallel assignment.
It depends on the language. In highly-object-oriented languages, double assignment results in the same object being assigned to multiple variables, so changes in one variable are reflected in the other.
$ python -c 'a = b = [] ; a.append(1) ; print b'
[1]
Firstly, at a semantic level, it depends whether you want to say that i and j are the same value, or just happen to both have the same value.
For example, if i and j are the indexes into a 2D array, they both start at zero. j = i = 0 says i starts at zero, and j starts where i started. If you wanted to start at the second row, you wouldn't necessarily want to start at the second column, so I wouldn't initialise them both in the same statement - the indices for rows and columns independently happen to both start at zero.
Also, in languages where i and j represent complicated objects rather than integral variables, or where assignment may cause an implicit conversion, they are not equivalent:
#include <iostream>

class ComplicatedObject
{
public:
    const ComplicatedObject& operator= ( const ComplicatedObject& other ) {
        std::cout << " ComplicatedObject::operator= ( const ComplicatedObject& )\n";
        return *this;
    }
    const ComplicatedObject& operator= ( int value ) {
        std::cout << " ComplicatedObject::operator= ( int )\n";
        return *this;
    }
};

int main ()
{
    {
        // the functions called are not the same
        ComplicatedObject i;
        ComplicatedObject j;
        std::cout << "j = i = 0:\n";
        j = i = 0;
        std::cout << "i = 0; j = 0:\n";
        i = 0;
        j = 0;
    }
    {
        // the result of the first assignment is
        // affected by implicit conversion
        double j;
        int i;
        std::cout << "j = i = 0.1:\n";
        j = i = 0.1;
        std::cout << " i == " << i << '\n'
                  << " j == " << j << '\n'
                  ;
        std::cout << "i = 0.1; j = 0.1:\n";
        i = 0.1;
        j = 0.1;
        std::cout << " i == " << i << '\n'
                  << " j == " << j << '\n'
                  ;
    }
}
Most people will find both possibilities equally readable. Some of them will have a personal preference for either way. But there are people who might, at first glance, get confused by the "double assignment". I personally like the separate approach, because
It is 100% readable
It is not really verbose compared to the double variant
It allows me to forget the rules of associativity for the = operator
The second way is more readable and clear; I prefer it.
However, I try to avoid duplicated declarations, using
int i, j;
instead of
int i;
int j;
if they're declared consecutively anyway. Especially in the case of MyVeryLong.AndComplexType.