Speeding up std::vector filling with the new C++11 std::async

I'm using VC++ 2013 Express Edition.
I'm studying some new C++11 features such as std::async and std::future.
I have a class Foo that holds a std::shared_ptr<std::vector<unsigned int> >.
In the Foo constructor I use std::make_shared to allocate the vector on the heap.
From Foo.h:
#include <memory>
#include <vector>
class Foo {
public:
    Foo();
private:
    std::shared_ptr<std::vector<unsigned int> > testVector;
    unsigned int MAX_ITERATIONS = 800000000;
    void fooFunction();
};
From Foo.cpp:
#include <future>
#include "Foo.h"
Foo::Foo() {
    testVector = std::make_shared<std::vector<unsigned int> >();
    // fooFunction(); // this takes about 20 sec
    std::async(std::launch::async, &Foo::fooFunction, this).get(); // and this takes about the same!
}
void Foo::fooFunction() {
    for (unsigned int i = 0; i < MAX_ITERATIONS; i++) {
        testVector->push_back(i);
    }
}
The problem is that I can't see any gain between calling std::async(std::launch::async, &Foo::fooFunction, this).get(); and calling fooFunction(); directly.
Why?
Any help will be appreciated.
Best regards

std::async returns a std::future.
Calling get() on the future blocks until the result is available, then returns it. So even though the function runs asynchronously, the calling thread does nothing but wait for it to finish.
std::async doesn't magically parallelize the for loop in Foo::fooFunction.
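To actually gain something, the work has to be split into several tasks that are all launched before any get() is called. The following is only a minimal sketch of that idea, not the asker's code: fillParallel is a hypothetical helper, and it assumes the vector can be sized up front so that each task writes to its own disjoint range (concurrent push_back calls on one shared vector would be a data race).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the question): fills a vector with
// 0, 1, ..., count-1 using up to `tasks` asynchronous tasks.
std::vector<unsigned int> fillParallel(std::size_t count, std::size_t tasks)
{
    assert(tasks > 0);
    // Size the vector up front so every task owns a disjoint range.
    std::vector<unsigned int> result(count);
    const std::size_t chunk = (count + tasks - 1) / tasks;

    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        pending.push_back(std::async(std::launch::async, [&result, begin, end] {
            const auto first = result.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = result.begin() + static_cast<std::ptrdiff_t>(end);
            std::iota(first, last, static_cast<unsigned int>(begin));
        }));
    }
    // Block only after every task has been launched.
    for (auto& f : pending) f.get();
    return result;
}
```

A call such as fillParallel(800000000, 4) would mirror the question's loop. Whether it is faster depends on the machine, since filling a vector is largely bound by memory bandwidth rather than by computation.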

Related

How to use std::bind properly with std::unique_ptr

I am trying to use std::bind with class member functions in combination with std::unique_ptr, and I am having a lot of trouble getting it to work.
First, I have two classes:
#include <memory>
#include <vector>
class simpleClass{
public:
    simpleClass(int x){
        this->simpleNumber = x;
    }
    int simpleNumber;
    simpleClass(const simpleClass &toBeClone){
        this->simpleNumber = toBeClone.simpleNumber;
    }
    simpleClass clone(){
        simpleClass *cloned = new simpleClass(*this);
        return *cloned;
    }
};
class className{
public:
    className(double input){
        this->someVariable = input;
    }
    void someFunction(std::vector<double> x, double c, std::unique_ptr<simpleClass> &inputClass, std::vector<double> &output){
        std::vector<double> tempOutput;
        for(int i = 0; i<x.size(); i++){
            tempOutput.push_back(x[i] + c * this->someVariable + inputClass->simpleNumber);
        }
        output = tempOutput;
    }
    double someVariable;
    className(const className &toBeClone){
        this->someVariable = toBeClone.someVariable;
    }
    className clone(){
        className *cloned = new className(*this);
        return *cloned;
    }
};
Both are ordinary classes, but I also implemented a clone function to duplicate an initialized object. When cloning, I need to ensure that the original and the clone live at different addresses, so I use std::unique_ptr for this.
This is the main function, which also shows how I "clone":
int main(){
    className testSubject(5);
    std::vector<std::unique_ptr<className>> lotsOfTestSubject;
    simpleClass easyClass(1);
    std::vector<std::unique_ptr<simpleClass>> manyEasyClass;
    for(int i = 0; i<10; i++){
        std::unique_ptr<className> tempClass(new className(testSubject.clone()));
        lotsOfTestSubject.push_back(std::move(tempClass));
        std::unique_ptr<simpleClass> tempEasyClass(new simpleClass(easyClass.clone()));
        manyEasyClass.push_back(std::move(tempEasyClass));
    }
    std::vector<std::vector<double>> X; //already loaded with numbers
    double C = 2;
    std::vector<std::vector<double>> OUT;
    for(int i = 0; i<10; i++){
        std::vector<double> tempOUT;
        lotsOfTestSubject[i]->someFunction(X[i], C, manyEasyClass[i], tempOUT);
        OUT.push_back(tempOUT);
        //Here if I want to bind
        /*
        std::bind(&className::someFunction, lotsOfTestSubject[i], X[i], C, manyEasyClass[i], tempOUT);
        */
    }
    return 0;
}
The reason I "clone" is that both simpleClass and className take a long time to construct in my implementation, and I need a lot of them. Since many of them will be initialized with the same parameters, I figured this was the easiest way to do it.
The code above works, but I am trying to improve the speed of the loop. The following line is where most of the computation takes place:
lotsOfTestSubject[i]->someFunction(X[i], C, manyEasyClass[i], tempOUT);
So I am attempting to use threads to delegate the work, and as far as I know, I need to use std::bind first. So I tried
std::bind(&className::someFunction, lotsOfTestSubject[i], X[i], C, manyEasyClass[i], tempOUT);
But the compiler prints errors like this:
/usr/include/c++/5/tuple|206| recursively required from ‘constexpr std::_Tuple_impl<_Idx, _Head, _Tail ...>::_Tuple_impl(const _Head&, const _Tail& ...) [with long unsigned int _Idx = 1ul; _Head = std::vector<double>; _Tail = {double, std::unique_ptr<simpleClass, std::default_delete<simpleClass> >, std::unique_ptr<simpleClass, std::default_delete<simpleClass> >, std::vector<double>}]’|
/usr/include/c++/5/tuple|108|error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = className; _Dp = std::default_delete<className>]’|
I have no idea what this means, as I have just started teaching myself C++. Any feedback and guidance is much appreciated.
I am using C++11 and g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609.
Update
Thanks @rafix07, I tried your solution and it works fine. But then I tried to do
auto theBinded = std::bind(&className::someFunction, &lotsOfTestSubject[i],
X[i], C, std::ref(manyEasyClass[i]), tempOUT);
std::thread testThread(theBinded);
and eventually I want to call testThread.join().
But the compiler says
error: pointer to member type ‘void (className::)(std::vector<double>, double, std::unique_ptr<simpleClass>&, std::vector<double>&)’ incompatible with object type ‘std::unique_ptr<className>’|
@kmdreko Thanks for pointing that out! I haven't noticed a memory leak yet, but I will fix it. Do I just use this?
std::unique_ptr<className> tempClass = new className(testSubject);

EDIT
If you want to call someFunction on an instance stored in lotsOfTestSubject, you need to pass a pointer to the className object on which the method will be called, so the line below
std::bind(&className::someFunction, lotsOfTestSubject[i]
should be replaced by:
auto theBinded = std::bind(&className::someFunction, lotsOfTestSubject[i].get(),
                                                                          ^^^
The second change is to use std::ref to pass the original unique_ptr from manyEasyClass instead of a copy. std::bind always copies or moves its arguments (see reference), but unique_ptr is non-copyable, which is why compilation failed.
So the fixed line looks like this:
auto theBinded = std::bind(&className::someFunction, lotsOfTestSubject[i].get(),
X[i], C, std::ref(manyEasyClass[i]), std::ref(tempOUT));
tempOUT must also be passed via std::ref, because you want that vector to be modified when operator() is called on the functor created by bind.
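The two fixes also cover the follow-up attempt with std::thread. Below is a small self-contained sketch of the combination; Worker, Offset and runOnThread are hypothetical stand-ins for the question's classes, reduced to the essentials, and the sketch assumes every referenced object outlives the thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-ins for simpleClass and className.
struct Offset { int value; };

struct Worker {
    double scale;
    void run(std::vector<double> x, double c,
             std::unique_ptr<Offset>& offset, std::vector<double>& out) const {
        assert(offset != nullptr);
        out.clear();
        for (double xi : x) out.push_back(xi + c * scale + offset->value);
    }
};

std::vector<double> runOnThread(const std::vector<double>& x, double c)
{
    std::unique_ptr<Worker> worker(new Worker{5.0});
    std::unique_ptr<Offset> offset(new Offset{1});
    std::vector<double> out;

    // worker.get(): hand bind the raw pointer, because the unique_ptr itself
    // cannot be copied into the bind object.
    // std::ref(...): pass the unique_ptr argument and the output vector by
    // reference instead of by copy.
    auto bound = std::bind(&Worker::run, worker.get(), x, c,
                           std::ref(offset), std::ref(out));

    std::thread t(bound);
    t.join(); // worker, offset and out must stay alive until here
    return out;
}
```

With x = {1.0, 2.0} and c = 2.0 the thread writes 1 + 2*5 + 1 and 2 + 2*5 + 1 into out. Passing a lambda to std::thread would work as well; std::bind is kept here only because the question uses it.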

How to avoid C++ code bloat issued by template instantiation and symbol table?

I started a bare-metal (Cortex-M) project some years ago. At project setup we decided to use the gcc toolchain with C++11 / C++14 enabled, and even to use C++ exceptions and RTTI.
We are currently using gcc 4.9 from launchpad.net/gcc-arm-embedded (some issues currently prevent us from updating to a more recent gcc version).
For example, I wrote a base class and a derived class like this (see also running example here):
#include <cstddef>
#include <cstring>
class OutStream {
public:
    explicit OutStream() {}
    virtual ~OutStream() {}
    OutStream& operator << (const char* s) {
        write(s, strlen(s));
        return *this;
    }
    virtual void write(const void* buffer, size_t size) = 0;
};
class FixedMemoryStream: public OutStream {
public:
    explicit FixedMemoryStream(void* memBuffer, size_t memBufferSize): memBuffer(memBuffer), memBufferSize(memBufferSize) {}
    virtual ~FixedMemoryStream() {}
    const void* getBuffer() const { return memBuffer; }
    size_t getBufferSize() const { return memBufferSize; }
    const char* getText() const { return reinterpret_cast<const char*>(memBuffer); } ///< returns content as zero terminated C-string
    size_t getSize() const { return index; } ///< number of bytes really written to the buffer (max = buffersize-1)
    bool isOverflow() const { return overflow; }
    virtual void write(const void* buffer, size_t size) override { /* ... */ }
private:
    void* memBuffer = nullptr; ///< buffer
    size_t memBufferSize = 0; ///< buffer size
    size_t index = 0; ///< current write index
    bool overflow = false; ///< flag if we are overflown
};
Customers of my class can now write, e.g.:
char buffer[10];
FixedMemoryStream ms1(buffer, sizeof(buffer));
ms1 << "Hello World";
Then I wanted to make the class a bit more comfortable to use and introduced the following template:
template<size_t bufferSize> class FixedMemoryStreamWithBuffer: public FixedMemoryStream {
public:
    explicit FixedMemoryStreamWithBuffer(): FixedMemoryStream(buffer, bufferSize) {}
private:
    uint8_t buffer[bufferSize];
};
With that, my customers can write:
FixedMemoryStreamWithBuffer<10> ms2;
ms2 << "Hello World";
But since then, I have observed the size of my executable binary increasing. It seems that gcc adds symbol information for each distinct template instantiation of FixedMemoryStreamWithBuffer (because we are using RTTI for some reason).
Is there a way to get rid of the symbol information only for specific classes / templates / template instantiations?
A non-portable, gcc-only solution would be fine.
We decided to prefer templates over preprocessor macros, so I want to avoid a preprocessor solution.

First of all, keep in mind that the compiler also generates a separate v-table (as well as RTTI information) for every FixedMemoryStreamWithBuffer<> instantiation, as well as for every class in the inheritance chain.
To resolve the problem, I'd recommend using containment instead of inheritance, with a conversion function and/or operator inside:
template<size_t bufferSize>
class FixedMemoryStreamWithBuffer
{
    uint8_t buffer[bufferSize];
    FixedMemoryStream m_stream; // declared after buffer, so buffer already exists when m_stream is constructed
public:
    explicit FixedMemoryStreamWithBuffer() : m_stream(buffer, bufferSize) {}
    operator FixedMemoryStream&() { return m_stream; }
    FixedMemoryStream& toStream() { return m_stream; }
};

Yes, there's a way to bring the necessary symbols almost down to 0: using the standard library. Your OutStream class is a simplified version of std::basic_ostream. Your OutStream::write is really just std::basic_ostream::write, and so on. Take a look at it here. Overflow is handled in much the same way, although, for completeness' sake, it also deals with underflow, i.e. the need for data retrieval; you may leave that undefined (it's virtual too).
Similarly, your FixedMemoryStream is std::basic_streambuf<T> with a fixed-size (a std::array<T>) get/put area.
So, just make your classes inherit from the standard ones and you'll cut down on binary size, since you're reusing already declared symbols.
Now, regarding template<size_t bufferSize> class FixedMemoryStreamWithBuffer: this class is very similar to std::array<std::uint8_t, bufferSize> in the way memory is specified and acquired. You can't optimize much about that: each instantiation is a different type, with all that implies. The compiler cannot "merge" them or do anything magic about them: each instantiation must have its own type.
So either fall back on std::vector or provide a few fixed-size specialized chunks, like 32, 128 etc., and for any value in between choose the right one; this can be achieved entirely at compile time, so there is no runtime cost.
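The last suggestion can be sketched as follows. This is only an illustration under assumptions that are not in the answer: bucketFor, Buffer and BucketedBuffer are hypothetical names, the buckets are powers of two starting at 32, and Buffer stands in for the question's FixedMemoryStreamWithBuffer.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: rounds a requested size up to the next power-of-two
// bucket, starting at 32. The single-return recursive form keeps it a valid
// constexpr function in C++11.
constexpr std::size_t bucketFor(std::size_t requested, std::size_t bucket = 32)
{
    return bucket >= requested ? bucket : bucketFor(requested, bucket * 2);
}

// Hypothetical stand-in for FixedMemoryStreamWithBuffer<bufferSize>.
template <std::size_t bufferSize>
struct Buffer {
    unsigned char data[bufferSize];
    static constexpr std::size_t capacity() { return bufferSize; }
};

// Every request between 33 and 64 bytes maps to Buffer<64>, so all of them
// share a single instantiation.
template <std::size_t requested>
using BucketedBuffer = Buffer<bucketFor(requested)>;

// The mapping is resolved by the compiler; nothing is computed at run time.
static_assert(std::is_same<BucketedBuffer<40>, BucketedBuffer<64>>::value,
              "40 and 64 share one bucket");
```

The trade-off is some wasted buffer space per object in exchange for fewer distinct types, and therefore fewer v-tables and less RTTI data in a design like the question's.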

Unexpected invocation of deleted move constructor by gcc

I'm trying to write a very simple array class with a function that returns a subsection of itself. It is easier to show it than to explain...
template<typename T>
class myArrayType
{
public:
    // Constructor; the buffer pointed to by 'data' must be held
    // elsewhere and remain valid for the lifetime of the object
    myArrayType(int size, T* data) : p(data), n(size)
    {
    }
    // A move constructor and assign operator wouldn't make
    // much sense for this type of object:
#ifndef _MSC_VER
    myArrayType(myArrayType<T> &&source) = delete;
    myArrayType & operator=(myArrayType<T> &&source) && = delete;
#else
#if _MSC_VER >= 2000
    myArrayType(myArrayType<T> &&source) = delete;
    myArrayType & operator=(myArrayType<T> &&source) && = delete;
#endif
    // Earlier versions of Visual C++ do not generate default move members
#endif
    // Various whole-array operations, which is the main reason for wanting to do this:
    myArrayType & operator+=(const myArrayType &anotherArray) & noexcept
    {
        for (int i=0; i<n; ++i) p[i] += anotherArray.p[i];
        return *this;
    }
    // etc.
    // The interesting bit: create a new myArrayType object which is
    // a subsection of this one and shares the same memory buffer
    myArrayType operator()(int firstelement, int lastelement) noexcept
    {
        // There is no default constructor, so build the subsection directly
        myArrayType newObject(lastelement - firstelement + 1, &p[firstelement]);
        return newObject;
    }
private:
    T* p;
    int n;
};
What I'd like to do, of course, is to be able to write:
double aBigBlobOfMemory[1000]; // Keep it on the stack
myArrayType<double> myArray(1000, aBigBlobOfMemory);
myArrayType<double> mySmallerArray = myArray(250, 750);
...so that 'mySmallerArray' is a fully-formed myArrayType object which contains a pointer to a subset of myArray's memory.
In Visual Studio 2013 this seems to work (or at least, it compiles), but in gcc it fails in a way that I don't understand. The compiler error on the attempted creation of mySmallerArray is:
use of deleted function myArrayType(myArrayType<T> &&)
...with a caret pointing to the end of the line. In other words, gcc seems to think that in invoking the 'subarray operator' I'm actually trying to invoke a move constructor, but I can't for the life of me see where it would want to use one, or why.
Am I missing something really really obvious, or can anyone shed some light on this?

gcc is doing the right thing.
From operator() you are returning newObject, an instance of myArrayType. It has to be moved into the variable mySmallerArray, and that is done with the move constructor. Because you declared the move constructor as deleted, it still takes part in overload resolution, gets selected, and the program is ill-formed.
You need to declare (or default) a move constructor instead of deleting it.
It does make sense for this class to have a move constructor: it can move the pointer p from the existing instance to the new one.
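As a rough sketch of what that could look like, the class below is a hypothetical, reduced version of the question's type (ArrayView is not the asker's name). It assumes a non-owning view for which copying the pointer and the length is all a move has to do, so defaulted members are sufficient.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view, loosely modelled on myArrayType.
template <typename T>
class ArrayView {
public:
    // The buffer behind 'data' must outlive the view.
    ArrayView(std::size_t size, T* data) : p(data), n(size) {}

    // Before C++17, returning an object by value requires an accessible
    // copy or move constructor, even when the compiler elides the call.
    ArrayView(const ArrayView&) = default;
    ArrayView(ArrayView&&) = default;

    // Sub-view over the inclusive range [first, last], sharing the buffer.
    ArrayView operator()(std::size_t first, std::size_t last) const {
        assert(first <= last && last < n);
        return ArrayView(last - first + 1, p + first);
    }

    // Element-wise addition of another view of the same length.
    ArrayView& operator+=(const ArrayView& other) {
        assert(other.n == n);
        for (std::size_t i = 0; i < n; ++i) p[i] += other.p[i];
        return *this;
    }

    std::size_t size() const { return n; }
    T& operator[](std::size_t i) const { return p[i]; }

private:
    T* p;
    std::size_t n;
};
```

With this version, a line such as ArrayView<double> small = big(250, 750); compiles with gcc in C++11 mode, and writes through small are visible in the original buffer.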

Overloaded "operator new" wants to see the type it's allocating

This seems like it ought to be obvious, but I'm blanking on it. I have
class SimpleMemoryPool {
    char buffer[10000];
    size_t idx = 0;
public:
    void *Alloc(size_t nbytes) { idx += nbytes; return &buffer[idx - nbytes]; }
};
inline void* operator new (size_t size, SimpleMemoryPool& pool)
{
    return pool.Alloc(size);
}
inline void* operator new[] (size_t size, SimpleMemoryPool& pool)
{
    return pool.Alloc(size);
}
The idea is that I can allocate new objects out of my SimpleMemoryPool and then they'll all be "released" when the SimpleMemoryPool is destroyed:
void foo()
{
    SimpleMemoryPool pool;
    int *arr = new (pool) int[10];
    double *arr2 = new (pool) double(3.14);
    // ...do things with arr and arr2...
    return; // and arr, arr2 are "released" at this point
}
One nitpick I've simplified away: The above code is sketchy because the double won't be 8-byte-aligned. Don't worry about that; my real SimpleMemoryPool code returns maxaligned chunks.
Here's the next thing you're probably thinking at this point: "Who calls the destructors?!" I.e., if I accidentally write
std::string *arr3 = new (pool) std::string;
then I'm in a world of hurt, because the compiler will generate a call to std::string::string() for me, but nobody will ever call std::string::~string(). Memory leaks, bad stuff follows.
This is the problem I want to solve. What I want to do is basically
class SimpleMemoryPool {
    ...
    // (std::enable_if omitted for brevity)
    template<typename T, typename... Args>
    T *New(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value, "T must be trivially destructible!");
        void *ptr = this->Alloc(sizeof (T));
        return new (ptr) T(std::forward<Args>(args)...);
    }
    template<typename ArrayT>
    auto NewArray(size_t nelem) -> typename std::remove_extent<ArrayT>::type * {
        typedef typename std::remove_extent<ArrayT>::type T;
        static_assert(std::is_trivially_destructible<T>::value, "T must be trivially destructible!");
        void *ptr = this->Alloc(nelem * sizeof (T));
        return new (ptr) T[ nelem ];
    }
};
...
int *arr = pool.NewArray<int>(10);
double *arr2 = pool.New<double>(3.14);
std::string *arr3 = pool.New<std::string>(); // fails the static_assert, hooray!
The problem with this approach is that it's ugly. It looks bad, and it invites later maintainers to come along and "fix" the code by adding a "proper" operator new, at which point we lose the safety of the static_assert.
Is there any way to get the best of both worlds — type-safety via the static_assert, and also a nice syntax?
You may assume C++11. I also welcome C++14 answers, even though they won't be immediately useful to me.
Adding a member operator new to all my classes (in this example int and double) is not acceptable. Whatever I do has to work out-of-the-box without changing a million lines of code.
This is probably a duplicate of Get type of object being allocated in operator new but I'd still like answers tailored to this particular use-case. There might be some nice idiom of which I'm not aware.

const list, non-const element access

I've a problem with boost intrusive containers.
One of my classes has an intrusive list of some objects, whose lifetimes are strictly managed by it. The objects themselves are meant to be modified by the users of the class, but they are not supposed to modify the list itself. That's why I'm only providing access to the list through a "getList" function, which returns a const version of the intrusive list.
The problem with const intrusive lists is that the elements also turn out to be const when you're trying to iterate through them. But the users should be able to iterate through and modify the items.
I don't want to keep a separate list of pointers to give to the users, because that would invalidate one of the biggest advantages of using intrusive containers. Namely, the ability to remove items from the container in constant time, while the only thing you have is a pointer to the item.
It would be sad to have to give a non-const version of my list just because of a limitation of C++. So the question is: Is there a special const version of the boost intrusive containers, which magically allows item modifications while disallowing any modifications on the list itself?

You don't need to return the list; give access to the individual items by reference instead.

OK, I've designed a complete solution to the problem. Andy's solution is nice if you don't need to iterate over the items efficiently. But I wanted something that's semantically equivalent to const std::list. Maybe it's overkill, but performance-wise there's almost no difference after optimizations:
The solution is to privately extend the intrusive list with a class called ConstList, which exposes just enough to let BOOST_FOREACH iterate, but not to make any changes by anyone. I've moved the list hook from the item to a child class, so that an item object cannot be used to change the list either. We're storing the child class with the hook, but our iterators are returning references to the item class. I've coded this solution into two templated classes for easy application to any item class.
I've made a header file with the ConstList class and the HookedItem class, followed by the tests.cpp, used to test and benchmark. You'll see that our ConstList class has equal performance while iterating.
It works quite cleanly, and the user code also stays clean. Then this begs the question: Why isn't this already in boost????!?!?
Feel free to use the following code for any purpose :)
P.S: I've had a moment of revelation while coming up with this solution: "const" is nothing but a syntactic sugar for a special case of what you can already achieve with the right class hierarchy. Is that true, or was I over-generalizing?
------------------ ConstList.h -----------------------
#include <iterator>
#include <boost/intrusive/list_hook.hpp>
template < typename T>
struct type_wrapper{ typedef T type;};
template<class listType, class owner, class item>
class ConstList: private listType {
    friend class type_wrapper<owner>::type;
public:
    class iterator {
        typename listType::iterator it;
    public:
        typedef std::forward_iterator_tag iterator_category;
        typedef item value_type;
        typedef int difference_type;
        typedef item* pointer;
        typedef item& reference;
        template<class T>
        iterator(const T it): it(it){}
        bool operator==(iterator & otherIt) {return it==otherIt.it;}
        iterator & operator++() {
            it++;
            return *this;
        }
        item & operator*() {
            return *it;
        }
    };
    iterator begin() {
        return iterator(listType::begin());
    }
    iterator end() {
        return iterator(listType::end());
    }
};
template<class item, class owner, class hooktype>
class HookedItem: public item {
    friend class type_wrapper<owner>::type;
public:
    hooktype hook_;
    typedef boost::intrusive::member_hook<HookedItem, hooktype, &HookedItem::hook_> MemberHookOption;
private:
    template<class Arg1, class Arg2>
    HookedItem(Arg1 &arg1, Arg2 &arg2): item(arg1, arg2){}
};
------------------ tests.cpp -----------------------
#include<cstdio>
#include<ctime>
#include<boost/checked_delete.hpp>
#include "ConstList.h"
#include<boost/intrusive/list.hpp>
#include<boost/foreach.hpp>
using namespace boost::intrusive;
class myOwner;
class myItem {
public:
    int a,b; //arbitrary members
    myItem(int a, int b): a(a), b(b){};
};
typedef HookedItem<myItem,myOwner,list_member_hook<> > myHookedItem;
typedef list<myHookedItem, typename myHookedItem::MemberHookOption> myItemList;
typedef ConstList<myItemList,myOwner,myItem> constItemList;
class myOwner {
public:
    constItemList constList;
    myItemList & nonConstList;
    myOwner(): nonConstList(constList) {}
    constItemList & getItems() { return constList;}
    myItem * generateItem(int a, int b) {
        myHookedItem * newItem = new myHookedItem(a,b);
        nonConstList.push_back(*newItem);
        return newItem;
    }
    ~myOwner() {nonConstList.clear_and_dispose(boost::checked_delete<myHookedItem>);}
};
int main(int argc, char **argv) {
    myOwner owner;
    int avoidOptimization=0;
    for(int i=0; i<1000000; i++) {
        owner.generateItem(i,i);
    }
    clock_t start = clock();
    for(int i=0; i<1000; i++)
        BOOST_FOREACH(myItem & item, owner.constList)
            avoidOptimization+=item.a;
    printf ( "%f\n", ( (double)clock() - start ) / CLOCKS_PER_SEC );
    start = clock();
    for(int i=0; i<1000; i++)
        BOOST_FOREACH(myHookedItem & item, owner.nonConstList)
            avoidOptimization+=item.a;
    printf ( "%f\n", ( (double)clock() - start ) / CLOCKS_PER_SEC );
    printf ("%d",avoidOptimization);
    return 0;
}
------------ Console Output -----------------
4.690000
4.700000
1764472320
