The regular std::vector has emplace_back which avoid an unnecessary copy. Is there a reason spsc_queue doesn't support this? Is it impossible to do emplace with lock-free queues for some reason?
I'm not a boost library implementer nor maintainer, so the rationale behind why not to include an emplace member function is beyond my knowledge, but it isn't too difficult to implement it yourself if you really need it.
The spsc_queue has a base class of either compile_time_sized_ringbuffer or runtime_sized_ringbuffer depending on if the size of the queue is known at compilation or not. These two classes maintain the actual buffer used with the obvious differences between a dynamic buffer and compile-time buffer, but delegate, in this case, their push member functions to a common base class - ringbuffer_base.
The ringbuffer_base::push function is relatively easy to grok:
bool push(T const & t, T * buffer, size_t max_size)
{
const size_t write_index = write_index_.load(memory_order_relaxed); // only written from push thread
const size_t next = next_index(write_index, max_size);
if (next == read_index_.load(memory_order_acquire))
return false; /* ringbuffer is full */
new (buffer + write_index) T(t); // copy-construct
write_index_.store(next, memory_order_release);
return true;
}
An index into the location where the next item should be stored is done with a relaxed load (which is safe since the intended use of this class is single producer for the push calls) and gets the appropriate next index, checks to make sure everything is in-bounds (with a load-acquire for appropriate synchronization with the thread that calls pop) , but the main statement we're interested in is:
new (buffer + write_index) T(t); // copy-construct
Which performs a placement new copy construction into the buffer. There's nothing inherently thread-unsafe about passing around some parameters to use to construct a T directly from viable constructor arguments. I wrote the following snippet and made the necessary changes throughout the derived classes to appropriately delegate the work up to the base class:
template<typename ... Args>
std::enable_if_t<std::is_constructible<T,Args...>::value,bool>
emplace( T * buffer, size_t max_size,Args&&... args)
{
const size_t write_index = write_index_.load(memory_order_relaxed); // only written from push thread
const size_t next = next_index(write_index, max_size);
if (next == read_index_.load(memory_order_acquire))
return false; /* ringbuffer is full */
new (buffer + write_index) T(std::forward<Args>(args)...); // emplace
write_index_.store(next, memory_order_release);
return true;
}
Perhaps the only difference is making sure that the arguments passed in Args... can actually be used to construct a T, and of course doing the emplacement via std::forward instead of a copy construction.
Related
So my problem sounds like this.
I have some platform dependent code (embedded system) which writes to some MMIO locations that are hardcoded at specific addresses.
I compile this code with some management code inside a standard executable (mainly for testing) but also for simulation (because it takes longer to find basic bugs inside the actual HW platform).
To alleviate the hardcoded pointers, i just redefine them to some variables inside the memory pool. And this works really well.
The problem is that there is specific hardware behavior on some of the MMIO locations (w1c for example) which makes "correct" testing hard to impossible.
These are the solutions i thought of:
1 - Somehow redefine the accesses to those registers and try to insert some immediate function to simulate the dynamic behavior. This is not really usable since there are various ways to write to the MMIO locations (pointers and stuff).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered, extract exactly where the access was made, handle and return. I am not really sure how this would work (and even if it's possible).
3 - Use some sort of emulation. This will surely work, but it will void the whole purpose of running fast and native on a standard computer.
4 - Virtualization ?? Probably will take a lot of time to implement. Not really sure if the gain is justifiable.
Does anyone have any idea if this can be accomplished without going too deep? Maybe is there a way to manipulate the compiler in some way to define a memory area for which every access will generate a callback. Not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform independent way, and since it will be only windows, i will use the available API (which seems to work as expected). Found this Q here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which i will extract all the necessary info, do my stuff then continue execution.
Thanks all for responding.
I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
struct sigaction action = { 0, };
action.sa_sigaction = segv_handler;
action.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &action, NULL);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
ucontext->uc_mcontext.gregs[REG_RAX] = 1234;
ucontext->uc_mcontext.gregs[REG_RIP] += 2;
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.
This is how the Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}
So, the solution (code snippet) is as follows:
First of all, i have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, i do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
static DWORD last_addr;
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
return EXCEPTION_CONTINUE_EXECUTION;
}
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
DWORD old;
VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton for the functionality. Basically I guard the page on which the variable resides, i have some linked lists in which i hold pointers to the function and values for the address in question. I check that the fault generating address is inside my list then i trigger the callback.
On first guard hit, the page protection will be disabled by the system, but i can call my PRE_WRITE callback where i can save the variable state. Because a single step is issued through the EFlags, it will be followed immediately by a single step exception (which means that the variable was written), and i can trigger a WRITE callback. All the data required for the operation is contained inside the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
A PRE_WRITE followed by a WRITE will be triggered,
When i do:
int x = *(int *)&g_test;
A READ will be issued.
In this way i can manipulate the data flow in a way that does not require modifications of the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, W1C (Write 1 to clear) operation can be accomplished:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was possible also with seg-faults on illegal addresses, but i had to issue one for each R/W, and keep track of a "virtual register file" so a bigger penalty hit. In this way i can only guard specific areas of memory or none, depending on the registered monitors.
This is a class which contains image data.
class MyMat
{
public:
int width, height, format;
uint8_t *data;
}
I want to design MyMat with automatic memory management. The image data could be shared among many objects.
Common APIs which I'm going to design:
+) C++ 11
+) Assignment : share data
MyMat a2(w, h, fmt);
.................
a2 = a1;
+) Accessing data should be simple and short.
Can use raw pointer directly.
In general, I want to design MyMat like as OpenCV cv::Mat
Could you suggest me a proper design ?
1) Using std::vector<uint8_t> data
I have to write some code to remove copy constructor and assignment operator because someone can call them and causes memory copy.
The compiler must support copy ellision and return value optimization.
Always using move assignment and passing by reference are inconvenient
a2 = std::move(a1)
void test(MyMat &mat)
std::queue<MyMat> lists;
lists.push_back(std::move(a1))
..............................
2) Use share_ptr<uint8_t> data
Following this guideline http://www.codingstandard.com/rule/17-3-4-do-not-create-smart-pointers-of-array-type/,
we shouldn't create smart pointers of array type.
3) Use share_ptr< std::vector<uint8_t> > data
To access data, use *(a1.data)[0], the syntax is very inconvenient
4) Use raw pointer, uint8_t *data
Write proper constructor and destructor for this class.
To make automatic memory management, use smart pointer.
share_ptr<MyMat> mat
std::queue< share_ptr<MyMat> > lists;
Matrix classes are normally expected to be a value type with deep copying. So, stick with std::vector<uint8_t> and let the user decide whether copy is expensive or not in their specific context.
Instead of raw pointers for arrays prefer std::unique_ptr<T[]> (note the square brackets).
std::array - fixed length in-place buffer (beautified array)
std::vector - variable length buffer
std::shared_ptr - shared ownership data
std::weak_ptr - expiring view on shared data
std::unique_ptr - unique ownership
std::string_view, std::span, std::ref, &, * - reference to data with no assumption of ownership
Simplest design is to have a single owner and RAII-forced life time ensuring everything that needs to be alive at certain time is alive and needs no other ownership, so generally I'd see if I could live std::unique_ptr<T> before complicating further (unless I can fit all my data on the stack, then I don't even need a unique_ptr).
On a side note - shared pointers are not free, they need dynamic memory allocation for the shared state (two allocations if done incorrectly :) ), whereas unique pointers are true "zero" overhead RAII.
Matrixes should use value semantics, and they should be nearly free to move.
Matrixes should support a view type as well.
There are two approaches for a basic Matrix that make sense.
First, a Matrix type that wraps a vector<T> with a stride field. This has an overhead of 3 instead of 2 pointers (or 1 pointer and a size) compared to a hand-rolled one. I don't consider that significant; the ease of debugging a vector<T> etc makes it more than worth that overhead.
In this case you'd want to write a separate MatrixView.
I'd use CRTP to create a common base class for both to implement operator[] and stride fields.
A distinct basic Matrix approach is to make your Matrix immutable. In this case, the Matrix wraps a std::shared_ptr<T const> and a std::shared_ptr<std::mutex> and (local, or stored with the mutex) width, height and stride field.
Copying such a Matrix just duplciates handles.
Modifying such a Matrix causes you to acquire the std::mutex, then check that shared_ptr<T const> has a use_count()==1. If it does, you cast-away const and modify the data referred to in the shared_ptr. If it does not, you duplicate the buffer, create a new mutex, and operate on the new state.
Here is a copy on write matrix buffer:
template<class T>
struct cow_buffer {
std::size_t rows() const { return m_rows; }
std::size_t cols() const { return m_cols; }
cow_buffer( T const* in, std::size_t rows, std::size_t cols, std::size_t stride ) {
copy_in( in, rows, cols, stride );
}
void copy_in( T const* in, std::size_t rows, std::size_t cols, std::size_t stride ) {
// note it isn't *really* const, this matters:
auto new_data = std::make_shared<T[]>( rows*cols );
for (std::size_t i = 0; i < rows; ++i )
std::copy( in+i*stride, in+i*m_stride+m_cols, new_data.get()+i*m_cols );
m_data = new_data;
m_rows = rows;
m_cols = cols;
m_stride = cols;
m_lock = std::make_shared<std::mutex>();
}
template<class F>
decltype(auto) read( F&& f ) const {
return std::forward<F>(f)( m_data.get() );
}
template<class F>
decltype(auto) modify( F&& f ) {
auto lock = std::unique_lock<std::mutex>(*m_lock);
if (m_data.use_count()==1) {
return std::forward<F>(f)( const_cast<T*>(m_data.get()) );
}
auto old_data = m_data;
copy_in( old_data.get(), m_rows, m_cols, m_stride );
return std::forward<F>(f)( const_cast<T*>(m_data.get()) );
}
explicit operator bool() const { return m_data && m_lock; }
private:
std::shared_ptr<T> m_data;
std::shared_ptr<std::mutex> m_lock;
std::size_t m_rows = 0, m_cols = 0, m_stride = 0;
};
something like that.
The mutex is required to ensure synchonization between multiple threads who are sole owners modifying m_data and the data from the previous write not being synchronzied with the current one.
This issue is my misunderstanding of how the standard is using my custom allocator. I have a stateful allocator that keeps a vector of allocated blocks. This vector is pushed into when allocating and searched through during de-allocation.
From my debugging it appears that different instances of my object (this*'s differ) are being called on de-allocation. An example may be that MyAllocator (this* = 1) is called to allocate 20 bytes, then some time later MyAllocator (this* = 2) is called to de-allocate the 20 bytes allocated earlier. Abviously the vector in MyAllocator (this* = 2) doesn't contain the 20 byte block allocated by the other allocator so it fails to de-allocate. My understanding was that C++11 allows stateful allocators, what's going on and how do i fix this?
I already have my operator == set to only return true when this == &rhs
pseudo-code:
template<typename T>
class MyAllocator
{
ptr allocate(int n)
{
...make a block of size sizeof(T) * n
blocks.push_back(block);
return (ptr)block.start;
}
deallocate(ptr start, int n)
{
/*This fails because the the block array is not the
same and so doesn't find the block it wants*/
std::erase(std::remove_if(blocks.begin,blocks.end, []()
{
return block.start >= (uint64_t)ptr && block.end <= ((uint64_t)ptr + sizeof(T)*n);
}), blocks.end);
}
bool operator==(const MyAllocator& rhs)
{
//my attempt to make sure internal states are same
return this == &rhs;
}
private:
std::vector<MemoryBlocks> blocks;
}
Im using this allocator for an std::vector, on gcc. So as far as i know no weird rebind stuff is going on
As #Igor mentioned, allocators must be copyable. Importantly though they must share their state between copies, even AFTER they have been copied from. In this case the fix was easy, i made the blocks vector a shared_ptr as suggested and then now on copy all the updates to that vector occur to the same vector, since they all point to the same thing.
In many places i have code for one time initialization as below
int callback_method(void * userData)
{
/* This piece of code should run one time only */
static int init_flag = 0;
if (0 == init_flag)
{
/* do initialization stuff here */
init_flag = 1;
}
/* Do regular stuff here */
return 0;
}
just now I started using c++11. Is there a better way to replace the one time code with c++11 lambda or std::once functionality or any thing else?.
You can encapsulate your action to static function call or use something like Immediately invoked function expression with C++11 lambdas:
int action() { /*...*/ }
...
static int temp = action();
and
static auto temp = [](){ /*...*/ }();
respectively.
And yes, the most common solution is to use std::call_once, but sometimes it's a little bit overkill (you should use special flag with it, returning to initial question).
With C++11 standard all of these approaches are thread safe (variables will be initialized once and "atomically"), but you still must avoid races inside action if some shared resources used here.
Yes - std::call_once, see the docs on how to use this: http://en.cppreference.com/w/cpp/thread/call_once
In N3059 I found the description of piecewise construction of pairs (and tuples) (and it is in the new Standard).
But I can not see when I should use it. I found discussions about emplace and non-copyable entities, but when I tried it out, I could not create a case where I need piecewiese_construct or could see a performance benefit.
Example. I thought I need a class which is non-copyable, but movebale (required for forwarding):
struct NoCopy {
NoCopy(int, int) {};
NoCopy(const NoCopy&) = delete; // no copy
NoCopy& operator=(const NoCopy&) = delete; // no assign
NoCopy(NoCopy&&) {}; // please move
NoCopy& operator=(NoCopy&&) {}; // please move-assign
};
I then sort-of expected that standard pair-construction would fail:
pair<NoCopy,NoCopy> x{ NoCopy{1,2}, NoCopy{2,3} }; // fine!
but it did not. Actually, this is what I'd expected anyway, because "moving stuff around" rather then copying it everywhere in the stdlib, is it should be.
Thus, I see no reason why I should have done this, or so:
pair<NoCopy,NoCopy> y(
piecewise_construct,
forward_as_tuple(1,2),
forward_as_tuple(2,3)
); // also fine
So, what's a the usecase?
How and when do I use piecewise_construct?
Not all types can be moved more efficiently than copied, and for some types it may make sense to even explicitly disable both copying and moving. Consider std::array<int, BIGNUM> as an an example of the former kind of a type.
The point with the emplace functions and piecewise_construct is that such a class can be constructed in place, without needing to create temporary instances to be moved or copied.
struct big {
int data[100];
big(int first, int second) : data{first, second} {
// the rest of the array is presumably filled somehow as well
}
};
std::pair<big, big> pair(piecewise_construct, {1,2}, {3,4});
Compare the above to pair(big(1,2), big(3,4)) where two temporary big objects would have to be created and then copied - and moving does not help here at all! Similarly:
std::vector<big> vec;
vec.emplace_back(1,2);
The main use case for piecewise constructing a pair is emplacing elements into a map or an unordered_map:
std::map<int, big> map;
map.emplace(std::piecewise_construct, /*key*/1, /*value*/{2,3});
One power piecewise_construct has is to avoid bad conversions when doing overload resolution to construct objects.
Consider a Foo that has a weird set of constructor overloads:
struct Foo {
Foo(std::tuple<float, float>) { /* ... */ }
Foo(int, double) { /* ... */ }
};
int main() {
std::map<std::string, Foo> m1;
std::pair<int, double> p1{1, 3.14};
m1.emplace("Will call Foo(std::tuple<float, float>)",
p1);
m1.emplace("Will still call Foo(std::tuple<float, float>)",
std::forward_as_tuple(2, 3.14));
m1.emplace(std::piecewise_construct,
std::forward_as_tuple("Will call Foo(int, double)"),
std::forward_as_tuple(3, 3.14));
// Some care is required, though...
m1.emplace(std::piecewise_construct,
std::forward_as_tuple("Will call Foo(std::tuple<float, float>)!"),
std::forward_as_tuple(p1));
}