Boost.Variant Vs Virtual Interface Performance

I'm trying to measure the performance difference between Boost.Variant and virtual interfaces. For example, suppose I want to increment different types of numbers uniformly. With Boost.Variant, I would use a boost::variant over int and float together with a static visitor that increments each of them. With class interfaces, I would use a pure virtual class number and derive number_int and number_float classes from it, each implementing an "increment" method.
From my testing, using interfaces is far faster than using Boost.Variant.
I ran the code at the bottom and got these results:
Virtual: 00:00:00.001028
Variant: 00:00:00.012081
Why is there such a difference? I expected Boost.Variant to be a lot faster.
Note: Boost.Variant normally uses heap allocations to guarantee that the variant is never empty. However, I read in the Boost.Variant documentation that if boost::has_nothrow_copy is true, it does not use heap allocations, which should make things significantly faster. For int and float, boost::has_nothrow_copy is true.
Here is the code I used to measure the two approaches against each other.
#include <iostream>
#include <boost/variant/variant.hpp>
#include <boost/variant/static_visitor.hpp>
#include <boost/variant/apply_visitor.hpp>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
#include <boost/date_time/posix_time/posix_time_io.hpp>
#include <boost/format.hpp>
const int iterations_count = 100000;
// a visitor that increments a variant by N
template <int N>
struct add : boost::static_visitor<> {
template <typename T>
void operator() (T& t) const {
t += N;
}
};
// a number interface
struct number {
virtual void increment() = 0;
};
// number interface implementation for all types
template <typename T>
struct number_ : number {
number_(T t = 0) : t(t) {}
virtual void increment() {
t += 1;
}
T t;
};
void use_virtual() {
number_<int> num_int;
number* num = &num_int;
for (int i = 0; i < iterations_count; i++) {
num->increment();
}
}
void use_variant() {
typedef boost::variant<int, float, double> number;
number num = 0;
for (int i = 0; i < iterations_count; i++) {
boost::apply_visitor(add<1>(), num);
}
}
int main() {
using namespace boost::posix_time;
ptime start, end;
time_duration d1, d2;
// virtual
start = microsec_clock::universal_time();
use_virtual();
end = microsec_clock::universal_time();
// store result
d1 = end - start;
// variant
start = microsec_clock::universal_time();
use_variant();
end = microsec_clock::universal_time();
// store result
d2 = end - start;
// output
std::cout <<
boost::format(
"Virtual: %1%\n"
"Variant: %2%\n"
) % d1 % d2;
}

For those interested: after some frustration, I passed the -O2 option to the compiler, and boost::variant was way faster than a virtual call.
Thanks

It is obvious why -O2 reduces the variant time: the whole loop is optimized away. Change the implementation to return the accumulated result to the caller, so that the optimizer cannot remove the loop, and you'll get the real difference:
Output:
Virtual: 00:00:00.000120 = 10000000
Variant: 00:00:00.013483 = 10000000
#include <iostream>
#include <boost/variant/variant.hpp>
#include <boost/variant/static_visitor.hpp>
#include <boost/variant/apply_visitor.hpp>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
#include <boost/date_time/posix_time/posix_time_io.hpp>
#include <boost/format.hpp>
const int iterations_count = 100000000;
// a visitor that increments a variant by N
template <int N>
struct add : boost::static_visitor<> {
template <typename T>
void operator() (T& t) const {
t += N;
}
};
// reads the value currently held by a variant, converted to T
template <typename T, typename V>
T get(const V& v) {
struct getter : boost::static_visitor<T> {
T operator() (T t) const { return t; }
};
return boost::apply_visitor(getter(), v);
}
// a number interface
struct number {
virtual void increment() = 0;
};
// number interface implementation for all types
template <typename T>
struct number_ : number {
number_(T t = 0) : t(t) {}
virtual void increment() { t += 1; }
T t;
};
int use_virtual() {
number_<int> num_int;
number* num = &num_int;
for (int i = 0; i < iterations_count; i++) {
num->increment();
}
return num_int.t;
}
int use_variant() {
typedef boost::variant<int, float, double> number;
number num = 0;
for (int i = 0; i < iterations_count; i++) {
boost::apply_visitor(add<1>(), num);
}
return get<int>(num);
}
int main() {
using namespace boost::posix_time;
ptime start, end;
time_duration d1, d2;
// virtual
start = microsec_clock::universal_time();
int i1 = use_virtual();
end = microsec_clock::universal_time();
// store result
d1 = end - start;
// variant
start = microsec_clock::universal_time();
int i2 = use_variant();
end = microsec_clock::universal_time();
// store result
d2 = end - start;
// output
std::cout <<
boost::format(
"Virtual: %1% = %2%\n"
"Variant: %3% = %4%\n"
) % d1 % i1 % d2 % i2;
}
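As a rough sketch of the same idea using only the standard library: the measured function returns its result, and the result is written to a volatile object so the compiler has to produce it. The helper names time_us and count_up are made up for illustration, and std::chrono::steady_clock stands in for the Boost.Date_Time clock used above. This does not stop the compiler from simplifying the loop itself, so treat any numbers it yields with care.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs work(), hands its result back through `result`,
// and returns the elapsed wall-clock time in microseconds.
template <typename Work>
long long time_us(Work work, long long& result) {
    typedef std::chrono::steady_clock clock;
    const clock::time_point start = clock::now();
    // A write to a volatile object is an observable side effect, so the
    // compiler has to produce the value instead of discarding the call.
    volatile long long sink = work();
    const clock::time_point stop = clock::now();
    result = sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Example workload standing in for use_virtual() / use_variant().
long long count_up(long long n) {
    assert(n >= 0);
    long long total = 0;
    for (long long i = 0; i < n; ++i) {
        total += 1;
    }
    return total;
}
```

A call would look like time_us([] { return count_up(100000000); }, result), printing both the elapsed time and result, as the program above does.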

Related

How to fix "segmentation fault (core dumped)" dependent on size

I created a class "config" that contains 12 bool values, organized in a std::array. The class has an "icing" function that returns a double value.
When I try to order a vector of 2^12 (4096) configs with std::sort (from <algorithm>) using a predicate I have written, I get a segmentation fault.
Shrinking the vector to 205 elements (not one more) eliminates the error, but I don't know why.
If I make the vector 4096 long and sort only a small part of it, it works until the part reaches a length of 175 or more.
Shrinking the vector to, for example, around 1000 limits the partial sort to around 20 elements before it gives the segmentation fault.
#include <array>
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
class config {
public:
config (){ //constructor, default
array<bool,12> t;
for (bool& b: t){
b=false;
}
val=t;
g=1;
}
config (const config& fro): val(fro.val){}; //copy constructor
array<bool,12> get_val(){ return val; } //returns the array
void set_tf(int n, bool tf){ val[n]=tf; } //sets a certain boolean in the array to false/true
void set_g(double d){ g=d; } //this sets the constant for calculation to a number
void print(){
cout<<"values: ";
for (auto b: val){ cout<<b<<" "; }
cout<<endl;
}
config & incr(int n=1){ //this increases the vector by 1 following the rules for binary numbers, but has the digits reversed
for(int j=0; j<n; j++){
int i=0;
bool out=false;
while(val[i]==true){
val[i]=false;
i++;
}
val[i]=true;
}
return *this;
}
double energy(){
int ct=0;
int cf=0;
for(auto b:val){ if(b==true){ ct++; } else { cf++; } }
return (abs(ct-cf));
}
double icing(){ //here is the "value" for ordering purposes
int n=0;
for(int i=0; i<11; i++){
if(val[i]!=val[i+1]){ n++; }
}
double temp=-g*n+this->energy();
return temp;
}
private:
array<bool,12> val;
double g;
};
bool pred (config c1, config c2){ return c1.icing()>c2.icing(); } //this sets the ordering predicate
template <typename T> //this orders the vector
void csort (vector <T>& in){
sort(in.begin(), in.end(), pred);
}
int main(){
vector<config> v;
for (int i=0; i<4096; i++){ //loop that creates a vector of successive binaries
for(auto& c:v){
c.incr();
}
config t;
v.push_back(t);
}
sort(v.begin(), v.begin()+174, pred); //this gives seg.fault when 175+
csort(v); //this gives segmentation fault when the vec is 206 long or longer
}
I expected the code to sort the vector, but it crashes with a segmentation fault.
Your program has undefined behaviour inside std::sort. Your predicate takes config by value, so every comparison copies both arguments, and your copy constructor copies only the array val, not g.
bool pred (config c1, config c2){ return c1.icing()>c2.icing(); }
// takes by value, copy ctor is called
config (const config& fro): val(fro.val){}; // only val is copied, g HAS GARBAGE VALUE
// icing in pred uses g !! - strict weak ordering is violated because g has GARBAGE VALUE
Fix 1:
pass config by const config& (icing() and energy() must then be declared const):
bool pred (const config& c1, const config& c2){ return c1.icing()>c2.icing(); }
Fix 2:
initialize g in the copy constructor:
config (const config& fro): val(fro.val), g(fro.g){};
Fix 1 only avoids the copies made for the comparison. std::sort also copies elements internally through the same copy constructor, so fix 2 is needed in any case. Its simplest form is to remove the hand-written copy constructor and let the compiler generate one that copies every member.
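A minimal sketch of fix 2 in its simplest form, assuming the class can rely on the compiler-generated copy operations. With no hand-written copy constructor, every member, including g, is copied, and marking icing() and energy() const lets the predicate take its arguments by const reference. The accessor g_value() is hypothetical and only there so the copy can be checked.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class config {
public:
    config() : val{}, g(1.0) {}  // all 12 flags false, g = 1
    // No user-declared copy constructor: the implicit one copies val and g.

    void set_tf(int n, bool tf) {
        assert(n >= 0 && n < 12);
        val[static_cast<std::size_t>(n)] = tf;
    }
    void set_g(double d) { g = d; }
    double g_value() const { return g; }  // hypothetical accessor for checking

    double energy() const {
        const int ct = static_cast<int>(std::count(val.begin(), val.end(), true));
        const int cf = static_cast<int>(val.size()) - ct;
        return std::abs(ct - cf);
    }
    double icing() const {  // the value used for ordering
        int n = 0;
        for (std::size_t i = 0; i + 1 < val.size(); ++i) {
            if (val[i] != val[i + 1]) { ++n; }
        }
        return -g * n + energy();
    }

private:
    std::array<bool, 12> val;
    double g;
};

// Takes const references, so no copies are made for a comparison.
bool pred(const config& c1, const config& c2) { return c1.icing() > c2.icing(); }
```

With this version, std::sort(v.begin(), v.end(), pred) only ever sees fully initialized objects, whatever the size of the vector.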

error: no matching function for call to 'swap'

I am trying to sort the cakeTypes vector by weight, but I get an error in the sort implementation.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
class CakeType
{
public:
const unsigned int weight_;
const unsigned int value_;
CakeType(unsigned int weight = 0, unsigned int value = 0) :
weight_(weight),
value_(value)
{}
};
bool compareCakes(const CakeType& cake1, const CakeType& cake2) {
return cake1.weight_ < cake2.weight_;
}
unsigned long long maxDuffelBagValue(const std::vector<CakeType>& cakeTypes,
unsigned int weightCapacity)
{
// calculate the maximum value that we can carry
unsigned cakeTypesSize = cakeTypes.size();
unsigned long long valueCalculator[weightCapacity+1][cakeTypesSize+1];
for (unsigned int i = 0; i<=weightCapacity+1; i++) {
valueCalculator[i][0] = 0;
}
for (unsigned int i = 0; i<=cakeTypesSize+1; i++) {
valueCalculator[0][i] = 0;
}
vector<CakeType> sortedCakeTypes(cakeTypes);
sort(sortedCakeTypes.begin(), sortedCakeTypes.end(), compareCakes);
return 0;
}
This is part of the error:
exited with non-zero code (1).
In file included from solution.cc:1:
In file included from /usr/include/c++/v1/iostream:38:
In file included from /usr/include/c++/v1/ios:216:
In file included from /usr/include/c++/v1/__locale:15:
In file included from /usr/include/c++/v1/string:439:
/usr/include/c++/v1/algorithm:3856:17: error: no matching function for call to 'swap'
swap(*__first, *__last);
^~~~
I tried the solution from "sort() - No matching function for call to 'swap'", but it is not the same issue.
The element type that std::sort swaps must be MoveAssignable (and MoveConstructible), so that an operation like the one below is valid:
CakeType c1, c2;
c1 = move(c2); // <- move c2 to c1
But in your case CakeType has const data members. A const data member can only be given a value in a constructor's initializer list; it cannot be assigned afterwards. The code does not compile because the implicitly generated copy and move assignment operators are deleted for a class with const members.
Remove the const specifier from the members in your class definition and the code will work.
class CakeType
{
public:
unsigned int weight_;
unsigned int value_;
CakeType(unsigned int weight = 0, unsigned int value = 0) :
weight_(weight),
value_(value)
{}
};
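To see the requirement directly, here is a small sketch. The names ConstCake, PlainCake and sortedByWeight are invented for this example. The static_asserts check move-assignability with std::is_move_assignable, and the CakeType below keeps its fields read-only from the outside through private members and const accessors rather than const members, which is one possible alternative to simply dropping const.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// Two stripped-down stand-ins that differ only in constness of the members.
struct ConstCake { const unsigned int weight_; const unsigned int value_; };
struct PlainCake { unsigned int weight_; unsigned int value_; };

static_assert(!std::is_move_assignable<ConstCake>::value,
              "const members delete the implicit assignment operators");
static_assert(std::is_move_assignable<PlainCake>::value,
              "non-const members keep the class assignable");

// Fields cannot be modified from outside, but objects stay assignable.
class CakeType {
public:
    CakeType(unsigned int weight = 0, unsigned int value = 0)
        : weight_(weight), value_(value) {}
    unsigned int weight() const { return weight_; }
    unsigned int value() const { return value_; }

private:
    unsigned int weight_;
    unsigned int value_;
};

bool compareCakes(const CakeType& cake1, const CakeType& cake2) {
    return cake1.weight() < cake2.weight();
}

// Takes the vector by value, sorts the copy by weight and returns it.
std::vector<CakeType> sortedByWeight(std::vector<CakeType> cakes) {
    std::sort(cakes.begin(), cakes.end(), compareCakes);
    assert(std::is_sorted(cakes.begin(), cakes.end(), compareCakes));
    return cakes;
}
```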

UVA 820: Internet Bandwidth, getting wrong answer?

I am trying to solve this problem on UVA. The question is about finding the max flow in a graph. I used the Edmonds-Karp algorithm, but I keep getting Wrong Answer. Can anyone tell me what's wrong with my code?
My code:
#include<bits/stdc++.h>
using namespace std;
#define MX 1000000007
#define LL long long
#define ri(x) scanf("%d",&x)
#define rl(x) scanf("%lld",&x)
#define len(x) x.length()
#define FOR(i,a,n) for(int i=a;i<n;i++)
#define FORE(i,a,n) for(int i=a;i<=n;i++)
template<class T1> inline T1 maxi(T1 a,T1 b){return a>b?a:b;}
template<class T2> inline T2 mini(T2 a,T2 b){return a<b?a:b;}
int parent[101],G[101][101],rG[101][101];
bool bfs(int s,int t,int n)
{
bool vis[n+2];
memset(parent,0,sizeof parent);
memset(vis,0,sizeof vis);
queue<int>Q;
Q.push(s);
vis[s]=true;
while(!Q.empty())
{
int fnt=Q.front();
Q.pop();
for(int v=1;v<=n;v++)
{
if(!vis[v] and G[fnt][v]>0)
{
vis[v]=true;
parent[v]=fnt;
Q.push(v);
}
}
}
return vis[t];
}
int main()
{
int n,tst=1;
ri(n);
while(n)
{
int s,t,c,flow=0;
ri(s),ri(t),ri(c);
FORE(i,1,c)
{
int x,y,z;
ri(x),ri(y),ri(z);
G[x][y]+=z;
G[y][x]+=z;
}
while(bfs(s,t,n))
{
int path=9999999;
for(int v=t;v!=s;v=parent[v])
{
int u=parent[v];
path=mini(path,G[u][v]);
}
for(int v=t;v!=s;v=parent[v])
{
int u=parent[v];
G[u][v]-=path;
G[v][u]+=path;
}
flow+=path;
}
printf("Network %d\nThe bandwidth is %d.\n\n", tst++, flow);
ri(n);
}
}
Check the direction in which you push flow:
G[u][v]-=path;
G[v][u]+=path;
This update is only right if G holds residual capacities. If G holds the flow, it should be:
G[u][v] += path;
G[v][u] -= path;
Also, I'm not sure about this part:
if(!vis[v] and G[fnt][v]>0)
[...]
path=mini(path,G[u][v]);
You are also allowed to take edges on which the flow is negative, because that cancels flow sent earlier in the opposite direction. I would not change G at all, since it seems to be your capacity graph. Instead, keep a matrix F that stores how much flow you have sent. Then your two conditions become:
if (!vis[v] && G[fnt][v] != F[fnt][v])
[...]
path = mini(path, G[u][v] - F[u][v]);
And push flow on F, not G.
You seem to have thought about this, since you declared a matrix rG, but you never use it.
Note also that G is a global that is never cleared between test cases, so whatever is left in it after one network carries over into the next. Reset it (and F, if you add it) before reading each network.
There might be other issues too. It's hard to tell without knowing what problems you're seeing.
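For reference, a compact sketch of Edmonds-Karp with a separate flow matrix, along the lines described above. It assumes vertex numbers that fit inside a square capacity matrix; max_flow, find_path and add_link are names chosen for this example, not taken from the question. Because the flow matrix is created inside max_flow, nothing carries over from one network to the next.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <queue>
#include <vector>

typedef std::vector<std::vector<int> > Matrix;

// Adds an undirected link of capacity z between x and y.
void add_link(Matrix& cap, int x, int y, int z) {
    cap[x][y] += z;
    cap[y][x] += z;
}

// BFS over edges that still have spare capacity (cap - flow > 0).
// Fills parent[] and reports whether t was reached from s.
bool find_path(const Matrix& cap, const Matrix& flow, int s, int t,
               std::vector<int>& parent) {
    const int n = static_cast<int>(cap.size());
    std::fill(parent.begin(), parent.end(), -1);
    parent[s] = s;
    std::queue<int> q;
    q.push(s);
    while (!q.empty() && parent[t] == -1) {
        const int u = q.front();
        q.pop();
        for (int v = 0; v < n; ++v) {
            if (parent[v] == -1 && cap[u][v] - flow[u][v] > 0) {
                parent[v] = u;
                q.push(v);
            }
        }
    }
    return parent[t] != -1;
}

int max_flow(const Matrix& cap, int s, int t) {
    const int n = static_cast<int>(cap.size());
    assert(s >= 0 && s < n && t >= 0 && t < n);
    if (s == t) { return 0; }
    Matrix flow(static_cast<std::size_t>(n), std::vector<int>(static_cast<std::size_t>(n), 0));
    std::vector<int> parent(static_cast<std::size_t>(n), -1);
    int total = 0;
    while (find_path(cap, flow, s, t, parent)) {
        int push = INT_MAX;  // bottleneck along the path found by BFS
        for (int v = t; v != s; v = parent[v]) {
            const int u = parent[v];
            push = std::min(push, cap[u][v] - flow[u][v]);
        }
        for (int v = t; v != s; v = parent[v]) {
            const int u = parent[v];
            flow[u][v] += push;  // send flow forward
            flow[v][u] -= push;  // record that it can be cancelled later
        }
        total += push;
    }
    return total;
}
```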

c++11 insert into collection with a lambda functional map

It is kind of exasperating that the std collections don't provide a functional map interface to fill a collection, something like:
std::vector< int > oldV = {1,3,5};
std::vector< int > newV = (oldV % [&](int v)-> int{ return v+1; });
newV.insert( oldV.begin(), oldV.end(), [&](int v)-> int{ return 2*v; });
Is there a simple header library that implements wrappers for functional style programming with std collections?
I don't see a way to do it such that it would apply both to things like std::vector and std::unordered_set without repeating the operator definition for each container. In the case of vector it would be like this:
#include <cstddef>
#include <iostream>
#include <vector>
// applies map to every element of input and collects the results
template <typename T, typename Lambda>
std::vector< T > operator |(const std::vector< T >& input, Lambda map)
{
std::vector< T > output;
output.reserve( input.size() );
for (const T& elem : input)
output.push_back( map(elem) );
return output; // a plain return lets the compiler move or elide; std::move would prevent elision
}
int main()
{
std::vector< int > oldV = {1,3,5};
std::vector< int > newV = oldV | [](int v) -> int { return v + 1; };
for (std::size_t i = 0; i < newV.size(); i++)
{
std::cout << newV[i] << std::endl;
}
}
For std::unordered_set you would only have to replace push_back with insert.
The pipe operator here has the same well-known semantics as in Unix/Linux shells and some languages.
You could use std::generate and std::transform to do this.
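A minimal sketch of the std::transform route, assuming a small hypothetical helper map_into (not part of the standard library). Because std::inserter works with both sequence and associative containers, one definition covers std::vector and std::unordered_set:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: applies f to each element of input and inserts the
// results into a freshly constructed container of type Out.
template <typename Out, typename In, typename F>
Out map_into(const In& input, F f) {
    Out output;
    std::transform(input.begin(), input.end(),
                   std::inserter(output, output.end()), f);
    return output;
}

// Usage example with the values from the question.
inline void map_into_example() {
    const std::vector<int> oldV = {1, 3, 5};
    const std::vector<int> plusOne =
        map_into<std::vector<int> >(oldV, [](int v) { return v + 1; });
    const std::unordered_set<int> doubled =
        map_into<std::unordered_set<int> >(oldV, [](int v) { return 2 * v; });
    assert(plusOne.size() == oldV.size());
    assert(doubled.count(10) == 1);
}
```

The output container type has to be spelled out at the call site, which is the price for not tying the helper to one container.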

Save state of c++11 random generator without using iostream

What is the best way to store the state of a C++11 random generator without using the iostream interface? I would like to do something like the first alternative listed here [1]. However, that approach requires that the object contains the PRNG state and only the PRNG state. In particular, it fails if the implementation uses the pimpl pattern (reloading the state would then most likely crash the application rather than load bad data), or if the PRNG object carries additional state variables that have nothing to do with the generated sequence.
The size of the object is implementation-defined:
g++ (tdm64-1) 4.7.1 gives sizeof(std::mt19937)==2504, but
Ideone http://ideone.com/41vY5j gives 2500.
I am missing member functions like
size_t state_size();
const size_t* get_state() const;
void set_state(size_t n_elems,const size_t* state_new);
(1) shall return the size of the random generator state array.
(2) shall return a pointer to the state array. The pointer is managed by the PRNG.
(3) shall copy std::min(n_elems,state_size()) elements from the buffer pointed to by state_new.
This kind of interface allows more flexible state manipulation. Or are there any PRNGs whose state cannot be represented as an array of unsigned integers?
[1] Faster alternative than using streams to save boost random generator state
I've written a simple(-ish) test for the approach I mentioned in the comments on the question. It's obviously not battle-tested, but it conveys the idea, and you should be able to take it from here.
Since the number of bytes read is so much smaller than when serializing the entire engine, the performance of the two approaches might actually be comparable. Keep in mind, though, that restoring replays invoke_count draws through discard(), so its cost grows with how far the generator has advanced. Testing this hypothesis, as well as further optimization, is left as an exercise for the reader.
#include <iostream>
#include <random>
#include <chrono>
#include <cstdint>
#include <fstream>
using namespace std;
struct rng_wrap
{
// it would also be advisable to somehow
// store what kind of RNG this is,
// so we don't deserialize an mt19937
// as a linear congruential or something,
// but this example only covers mt19937
uint64_t seed;
uint64_t invoke_count;
mt19937 rng;
typedef mt19937::result_type result_type;
rng_wrap(uint64_t _seed) :
seed(_seed),
invoke_count(0),
rng(_seed)
{}
rng_wrap(istream& in) {
in.read(reinterpret_cast<char*>(&seed), sizeof(seed));
in.read(reinterpret_cast<char*>(&invoke_count), sizeof(invoke_count));
rng = mt19937(seed);
rng.discard(invoke_count);
}
void discard(unsigned long long z) {
rng.discard(z);
invoke_count += z;
}
result_type operator()() {
++invoke_count;
return rng();
}
static constexpr result_type min() {
return mt19937::min();
}
static constexpr result_type max() {
return mt19937::max();
}
};
ostream& operator<<(ostream& out, rng_wrap& wrap)
{
out.write(reinterpret_cast<char*>(&(wrap.seed)), sizeof(wrap.seed));
out.write(reinterpret_cast<char*>(&(wrap.invoke_count)), sizeof(wrap.invoke_count));
return out;
}
istream& operator>>(istream& in, rng_wrap& wrap)
{
wrap = rng_wrap(in);
return in;
}
void test(rng_wrap& rngw, int count, bool quiet=false)
{
uniform_int_distribution<int> integers(0, 9);
uniform_real_distribution<double> doubles(0, 1);
normal_distribution<double> stdnorm(0, 1);
if (quiet) {
for (int i = 0; i < count; ++i)
integers(rngw);
for (int i = 0; i < count; ++i)
doubles(rngw);
for (int i = 0; i < count; ++i)
stdnorm(rngw);
} else {
cout << "Integers:\n";
for (int i = 0; i < count; ++i)
cout << integers(rngw) << " ";
cout << "\n\nDoubles:\n";
for (int i = 0; i < count; ++i)
cout << doubles(rngw) << " ";
cout << "\n\nNormal variates:\n";
for (int i = 0; i < count; ++i)
cout << stdnorm(rngw) << " ";
cout << "\n\n\n";
}
}
int main(int argc, char** argv)
{
rng_wrap rngw(123456790ull);
test(rngw, 10, true); // this is just so we don't start with a "fresh" rng
uint64_t seed1 = rngw.seed;
uint64_t invoke_count1 = rngw.invoke_count;
ofstream outfile("rng", ios::binary);
outfile << rngw;
outfile.close();
cout << "Test 1:\n";
test(rngw, 10); // test 1
ifstream infile("rng", ios::binary);
infile >> rngw;
infile.close();
cout << "Test 2:\n";
test(rngw, 10); // test 2 - should be identical to 1
return 0;
}
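The same seed-plus-count idea can be sketched without any stream at all, which is closer to what the question asks for. This rests on several assumptions: rng_snapshot and counting_rng are hypothetical names, only std::mt19937 is covered, distribution objects keep their own state that is not captured here, and restoring costs time proportional to the number of draws made so far because it replays them with discard().

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Plain data describing where a generator is in its sequence.
struct rng_snapshot {
    std::uint64_t seed;
    std::uint64_t invoke_count;
};

// Wraps std::mt19937 and counts how many values have been drawn.
class counting_rng {
public:
    typedef std::mt19937::result_type result_type;

    explicit counting_rng(std::uint32_t seed)
        : seed_(seed), count_(0), rng_(seed) {}

    // Rebuilds the generator by reseeding and skipping the recorded draws.
    explicit counting_rng(const rng_snapshot& s)
        : seed_(s.seed), count_(s.invoke_count),
          rng_(static_cast<result_type>(s.seed)) {
        assert(s.seed <= 0xFFFFFFFFull);  // the seed must fit in 32 bits
        rng_.discard(count_);
    }

    result_type operator()() {
        ++count_;
        return rng_();
    }
    static constexpr result_type min() { return std::mt19937::min(); }
    static constexpr result_type max() { return std::mt19937::max(); }

    rng_snapshot snapshot() const {
        rng_snapshot s = {seed_, count_};
        return s;
    }

private:
    std::uint64_t seed_;
    std::uint64_t count_;
    std::mt19937 rng_;
};
```

The snapshot is two integers, so it can be stored with memcpy, written to a binary file, or kept in memory, whichever the application prefers.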

Resources