I have a use case where one thread reads a message into a large buffer and then distributes the processing to a bunch of threads. The buffer is shared by multiple threads after that. It's read-only, and when the last thread finishes, the buffer has to be freed. The buffer is allocated from a lock-free slab allocator.
My initial design was to use shared_ptr for the buffer. But the buffer can be of different sizes. My way of getting around that was to do something like this:
struct SharedBuffer {
SharedBuffer (uint16_t len, std::shared_ptr<void> ptr)
: _length(len), _buf(std::move(ptr))
{
}
uint8_t *data () { return static_cast<uint8_t *>(_buf.get()); }
uint16_t _length;
std::shared_ptr<void> _buf; // type-erased shared_ptr, because the SharedBuffer
// needs to be stored in some other structs
};
Now the allocator will allocate the shared_ptr like this:
SharedBuffer allocate (size_t size)
{
auto buf = std::allocate_shared<std::array<uint8_t, 16_K>>(myallocator);
return SharedBuffer{16_K, buf}; // type erase the std::array
}
The SharedBuffer is then enqueued to each thread that wants it.
On reflection, I think I am doing a lot of this unnecessarily, and I can make do with boost::intrusive_ptr using the scheme below. Things are a bit C-ish, as I am using a variable-size array. Here I have replaced the slab allocator with a plain operator new() for the sake of simplicity. I wanted to run it by you to see if this implementation is okay.
template <typename T>
inline int atomicIncrement (T* t)
{
return __atomic_add_fetch(&t->_ref, 1, __ATOMIC_ACQUIRE);
}
template <typename T>
inline int atomicDecrement (T* t)
{
return __atomic_sub_fetch(&t->_ref, 1, __ATOMIC_RELEASE);
}
class SharedBuffer {
public:
friend int atomicIncrement<SharedBuffer>(SharedBuffer*);
friend int atomicDecrement<SharedBuffer>(SharedBuffer*);
SharedBuffer(uint16_t len) : _length(len) {}
uint8_t *data ()
{
return &_data[0];
}
uint16_t length () const
{
return _length;
}
private:
int _ref{0};
const uint16_t _length;
uint8_t _data[];
};
using SharedBufferPtr = boost::intrusive_ptr<SharedBuffer>;
SharedBufferPtr allocate (size_t size)
{
// dummy implementation
void *p = ::operator new (size + sizeof(SharedBuffer));
// I am not explicitly constructing the array of uint8_t
return new (p) SharedBuffer(size);
}
void deallocate (SharedBuffer* sbuf)
{
sbuf->~SharedBuffer();
// dummy implementation
::operator delete ((void *)sbuf);
}
void intrusive_ptr_add_ref(SharedBuffer* sbuf)
{
atomicIncrement(sbuf);
}
void intrusive_ptr_release (SharedBuffer* sbuf)
{
if (atomicDecrement(sbuf) == 0) {
deallocate(sbuf);
}
}
I'd use the simpler implementation (using shared_ptr) unless you are avoiding specific problems (i.e. profile first).
Side note: you can use boost::shared_ptr<> with boost::make_shared<T[]>(N), which is being added to the standard library in C++20.
Note that allocate_shared already embeds the control block into the same allocation like you do with the intrusive approach.
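As a rough illustration of the simpler shared_ptr route for variable-size buffers, here is a minimal sketch. The names SizedBuffer, slab_allocate and slab_free are made up for this example, and plain operator new stands in for the real slab allocator. Unlike allocate_shared, this variant pays for a separate control-block allocation, so treat it as a starting point to profile rather than a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical stand-ins for the slab allocator's entry points.
inline void* slab_allocate(std::size_t bytes) { return ::operator new(bytes); }
inline void slab_free(void* p) noexcept { ::operator delete(p); }

// Hypothetical buffer handle: the size is a runtime value, not part of the type.
struct SizedBuffer {
    std::shared_ptr<std::uint8_t> bytes;  // the deleter hands the block back to the slab
    std::size_t length = 0;

    std::uint8_t* data() const { return bytes.get(); }
};

inline SizedBuffer allocate_buffer(std::size_t size) {
    assert(size > 0);
    auto* raw = static_cast<std::uint8_t*>(slab_allocate(size));
    SizedBuffer buf;
    // If the shared_ptr constructor throws, it still runs the deleter on raw.
    buf.bytes = std::shared_ptr<std::uint8_t>(raw, [](std::uint8_t* p) { slab_free(p); });
    buf.length = size;
    return buf;
}
```

Copying a SizedBuffer into each worker's queue bumps the reference count, and the block goes back to the allocator when the last copy is destroyed.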
Finally, I'd use std::atomic_int so you have a clear contract that cannot (accidentally) be used wrong. At the same time, it'll remove the remaining bit of complexity.
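To make the std::atomic_int suggestion concrete, here is a minimal sketch of a reference counter. The class name RefCounted and the choice of memory orderings (relaxed increment, acquire-release decrement) are my assumptions based on the usual reference-counting pattern, not something taken from the question's code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical base for an intrusively counted object.
class RefCounted {
public:
    void add_ref() noexcept {
        // A new reference is always taken through an existing one,
        // so no ordering is needed on the increment.
        refs_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true when the caller dropped the last reference
    // and is therefore responsible for destroying the object.
    bool release() noexcept {
        // acq_rel: publish this thread's accesses and observe everyone
        // else's before the object is torn down.
        int previous = refs_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        return previous == 1;
    }

    int use_count() const noexcept { return refs_.load(std::memory_order_relaxed); }

private:
    std::atomic_int refs_{0};
};
```

With something like this, intrusive_ptr_add_ref would call add_ref(), and intrusive_ptr_release would call deallocate only when release() returns true.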
Is there a way to change the data stored inside a std::vector from within a const member function? See the following code to understand what I want to accomplish:
// class holding properties and data
class Output{
public:
int * values; // possibility 1: raw pointer
std::vector<int> vc; // possibility 2: std::vector
mutable std::vector<int> vm; // possibility 3: mutable vector
//std::vector<mutable int> vm; something like this,
};
class Node{
Output out;
void test()const{
// I want to change the "data" of the Output, but not the object itself
out.values[0] = 0; // works: I can change the output data
out.values = nullptr; // good: compile error, I can't change the pointer
out.vc[0] = 1; // compile error, not possible :(
out.vm[0] = 1; // that is what I want
out.vm.resize(3); // this is now possible, but should not be
}
};
I can use a raw pointer to achieve my goal, but I would prefer a std::vector if that is possible.
A vector with mutable content might look like this:
template<typename T>
class mutable_vector : public std::vector<T>{
public:
T& operator[](int index)const{
return const_cast<mutable_vector<T>*>(this)->data()[index];
}
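typename std::vector<T>::iterator begin()const{
// qualify the call: an unqualified begin() would find this overload and recurse forever
return const_cast<mutable_vector<T>*>(this)->std::vector<T>::begin();
}
typename std::vector<T>::reverse_iterator rbegin()const{
return const_cast<mutable_vector<T>*>(this)->std::vector<T>::rbegin();
}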
};
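An alternative to inheriting from std::vector is composition: keep the vector private and mutable, and expose only the operations that should be legal on a const object. This is just a sketch of that idea. FixedShapeBuffer is a made-up name, and the Node here is a simplified stand-in for the one in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: elements are writable through a const object,
// but the size can only change through a non-const one.
template <typename T>
class FixedShapeBuffer {
public:
    explicit FixedShapeBuffer(std::size_t n) : items_(n) {}

    T& operator[](std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }
    T* begin() const { return items_.data(); }
    T* end() const { return items_.data() + items_.size(); }
    std::size_t size() const { return items_.size(); }

    void resize(std::size_t n) { items_.resize(n); }  // non-const on purpose

private:
    mutable std::vector<T> items_;
};

class Node {
public:
    Node() : out_(3) {}
    void test() const {
        out_[0] = 1;        // allowed: changes the data only
        // out_.resize(5);  // would not compile: resize() is non-const
    }
    int first() const { return out_[0]; }

private:
    FixedShapeBuffer<int> out_;
};
```

The mutable keyword is confined to one small class, so the rest of the code cannot resize the storage from a const context by accident.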
I want to replace some code that uses boost::interprocess shared memory. One advantage of shared memory is that you can impose limits on the maximum amount of memory it can use. I'm looking for a custom allocator, based off std::allocator that can do this.
Only particular classes in the program will use this allocator, everything else uses the defaulted std::allocator and are only limited by available RAM.
I'm trying to write one of my own but I'm running into issues, mainly with how to share state among the allocator copies that are created by STL containers. State includes the number of free bytes remaining and the maximum size the allocator can use. I thought I could get away with making them thread_local but then several different instances of the same class will all allocate and deallocate from the same limited heap, which is not what I want. I'm beginning to think it's not possible, hence this question here. Neither contiguous allocation nor performance are major requirements for now.
The hard limit on the memory size cannot be a template parameter either, it's read from a config file.
Edit: The issue with sharing state is that some containers call the default constructor of the allocator type. Obviously this constructor cannot easily know anything about the outside world; even if a shared_ptr is used, it will be initialised to nullptr. For example, look at the source code for std::string::clear.
g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
After following the hints above I came up with this which seems to work ok for POD types, but things fall apart when I try to make a Vector or Map that uses String:
#include <string>
#include <vector>
#include <map>
#include <atomic>
#include <memory>
#include <new> // std::bad_alloc
#include <functional> // std::less
struct SharedState
{
SharedState()
: m_maxSize(0),
m_bytesRemaining(0)
{
}
SharedState(std::size_t maxSize)
: m_maxSize(maxSize),
m_bytesRemaining(maxSize)
{
}
void allocate(std::size_t bytes) const {
if (m_bytesRemaining < bytes) {
throw std::bad_alloc();
}
m_bytesRemaining -= bytes;
}
void deallocate(std::size_t bytes) const {
m_bytesRemaining += bytes;
}
std::size_t getBytesRemaining() const {
return m_bytesRemaining;
}
const std::size_t m_maxSize;
mutable std::atomic<std::size_t> m_bytesRemaining;
};
// --------------------------------------
template <typename T>
class BaseLimitedAllocator : public std::allocator<T>
{
public:
using size_type = std::size_t;
using pointer = T*;
using const_pointer = const T*;
using propagate_on_container_move_assignment = std::true_type;
template <typename U>
struct rebind
{
typedef BaseLimitedAllocator<U> other;
};
BaseLimitedAllocator() noexcept = default;
BaseLimitedAllocator(std::size_t maxSize) noexcept
: m_state(new SharedState(maxSize)) {
}
BaseLimitedAllocator(const BaseLimitedAllocator& other) noexcept {
m_state = other.m_state;
}
template <typename U>
BaseLimitedAllocator(const BaseLimitedAllocator<U>& other) noexcept {
m_state = other.m_state;
}
pointer allocate(size_type n, const void* hint = nullptr) {
m_state->allocate(n * sizeof(T));
return std::allocator<T>::allocate(n, hint);
}
void deallocate(pointer p, size_type n) {
std::allocator<T>::deallocate(p, n);
m_state->deallocate(n * sizeof(T));
}
public:
std::shared_ptr<SharedState> m_state; // This must be public for the rebind copy constructor.
};
template <typename T, typename U>
inline bool operator==(const BaseLimitedAllocator<T>& lhs, const BaseLimitedAllocator<U>& rhs) {
// stateful allocators are interchangeable only if they share the same budget
return lhs.m_state == rhs.m_state;
}
template <typename T, typename U>
inline bool operator!=(const BaseLimitedAllocator<T>& lhs, const BaseLimitedAllocator<U>& rhs) {
return !(lhs == rhs);
}
struct LimitedAllocator : public BaseLimitedAllocator<char>
{
LimitedAllocator(std::size_t maxSize)
: BaseLimitedAllocator<char>(maxSize) {
}
template <typename U>
using Other = typename BaseLimitedAllocator<char>::template rebind<U>::other;
};
// -----------------------------------------
// Example usage:
class SomeClass
{
public:
using String = std::basic_string<char, std::char_traits<char>, LimitedAllocator::Other<char>>;
template <typename T>
using Vector = std::vector<T, LimitedAllocator::Other<T>>;
template <typename K, typename V>
using Map = std::map<K, V, std::less<K>, LimitedAllocator::Other<std::pair<const K, V>>>;
SomeClass()
: allocator(256),
s(allocator),
v(allocator),
m(std::less<int>(), allocator) // Cannot only specify the allocator. Annoying.
{
}
const LimitedAllocator allocator;
String s;
Vector<int> v;
Map<int, String> m;
};
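One thing to watch in the SharedState above is that the check and the subtraction in allocate() are two separate steps, so two threads could both pass the check and together overdraw the limit. A possible way around that is a compare-exchange loop. The sketch below uses a made-up class name, ByteBudget, and reports failure with a bool rather than throwing, which is my simplification.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical byte budget shared by all copies of an allocator.
class ByteBudget {
public:
    explicit ByteBudget(std::size_t max_bytes)
        : max_(max_bytes), remaining_(max_bytes) {}

    // Reserves `bytes` only if the whole amount is still available.
    bool try_reserve(std::size_t bytes) noexcept {
        std::size_t current = remaining_.load(std::memory_order_relaxed);
        while (current >= bytes) {
            // On failure, compare_exchange_weak reloads `current` for the next round.
            if (remaining_.compare_exchange_weak(current, current - bytes,
                                                 std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    void release(std::size_t bytes) noexcept {
        std::size_t before = remaining_.fetch_add(bytes, std::memory_order_relaxed);
        assert(before + bytes <= max_);
        (void)before;  // silences the unused warning when assertions are disabled
    }

    std::size_t remaining() const noexcept {
        return remaining_.load(std::memory_order_relaxed);
    }

private:
    const std::size_t max_;
    std::atomic<std::size_t> remaining_;
};
```

An allocator's allocate() could then throw std::bad_alloc when try_reserve() returns false, and call release() from deallocate().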
I have studied the Boost example on creating vectors in shared memory.
Now my data structure looks like this:
enum FuncIndex
{
enmFunc_glBegin,
...
}
class CGLParam {};
class Funcall
{
vector<CGLParam> vecParams;
};
class Global_Funcall
{
typedef allocator<CGLParam*, managed_shared_memory::segment_manager> ShmemAllocator;
typedef vector<CGLParam*, ShmemAllocator> MyVector;
MyVector<FunCall> vecFuncalls;
};
Global_Funcall()
{
shared_memory_object::remove("MySharedMemory");
managed_shared_memory segment(create_only, "MySharedMemory", 65536);
//Initialize shared memory STL-compatible allocator
const ShmemAllocator alloc_inst(segment.get_segment_manager());
//Construct a vector named "MyVector" in shared memory with argument alloc_inst
vecFuncalls= segment.construct<MyVector>("MyVector")(alloc_inst);
}
void InvokeFuncs(CGLParam *presult)
{
managed_shared_memory open_segment(open_only,"MySharedMemory");
listParams = open_segment.find<MyVector>("MyVector").first;
// MyVector::const_iterator it;
// for (it = listParams->cbegin(); it != listParams->cend(); it++)
// {
// (*it)->InvokeFunc(presult);
// }
}
My problem is how to construct vecParams and how to retrieve it. The size of the data is very big (OpenGL function calls).
The structure is used to save the OpenGL function calls.
Besides 'obvious' typos, you try to assign an IPC vector (MyVector*) to a standard vector in the Global_Funcall constructor. That will never work. C++ is a strongly typed language, so the types have to match if you want to assign[1].
Besides this there seems to be a conceptual problem:
If the goal is to have a data collection that could be larger than fits in physical memory, shared memory per se isn't going to help. You'd want to look at memory-mapped files.
If you want shared memory because you can share it between processes (hence Boost Interprocess), you will need to think about process synchronization, or you will see complicated bugs because of data races.
You cannot safely store raw pointers inside these containers. Instead, store the actual elements there (or maybe look at bip::offset_ptr<> if you want to get really fancy).
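To illustrate why raw pointers break and what an offset-based pointer buys you, here is a toy sketch. SelfRelativePtr and Block are made-up names for this example, and this is not how bip::offset_ptr is actually implemented. It only shows the idea of storing a distance from the pointer's own address, with a memcpy standing in for the segment being mapped at a different address in another process.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy pointer that stores the distance from itself to its target.
template <typename T>
class SelfRelativePtr {
public:
    void set(T* target) {
        // Offset 0 is reserved for "null", so the pointer cannot point at itself.
        assert(static_cast<const void*>(target) != static_cast<const void*>(this));
        offset_ = target ? address_of(target) - address_of(this) : 0;
    }
    T* get() const {
        if (offset_ == 0) return nullptr;
        return reinterpret_cast<T*>(address_of(this) + offset_);
    }

private:
    static std::intptr_t address_of(const void* p) {
        return reinterpret_cast<std::intptr_t>(p);
    }
    std::intptr_t offset_ = 0;
};

// Hypothetical record that links to data stored inside the same block.
struct Block {
    SelfRelativePtr<int> link;
    int value;
};

// Copies the block byte for byte to a new address and checks the link.
inline bool link_survives_relocation() {
    Block original;
    original.value = 42;
    original.link.set(&original.value);

    Block moved;
    std::memcpy(&moved, &original, sizeof(Block));  // stand-in for a different mapping address

    // A raw int* copied this way would still point at original.value.
    return moved.link.get() == &moved.value && *moved.link.get() == 42;
}
```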
Here's a 'fixed up' demonstration:
fixing the C++ compilation issues,
changing the element type to be CGLParam instead of CGLParam*,
fixing the member type to match the SHM vector, and
adding basic shared mutex synchronization (this is an art in itself and you will want to read more about this).
See it Live On Coliru[2]
#include <vector>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
namespace bip = boost::interprocess;
using mutex_type = bip::named_mutex;
class CGLParam {};
typedef bip::allocator<CGLParam, bip::managed_shared_memory::segment_manager> ShmemAllocator;
typedef std::vector<CGLParam, ShmemAllocator> MyVector;
class Funcall
{
std::vector<CGLParam> vecParams;
};
struct mutex_remove
{
mutex_remove() { mutex_type::remove("2faa9c3f-4cc0-49c5-8f79-f99ce5a5d526"); }
~mutex_remove(){ mutex_type::remove("2faa9c3f-4cc0-49c5-8f79-f99ce5a5d526"); }
} remover;
static mutex_type mutex(bip::open_or_create,"2faa9c3f-4cc0-49c5-8f79-f99ce5a5d526");
class Global_Funcall
{
MyVector* vecFuncalls;
Global_Funcall()
{
bip::scoped_lock<mutex_type> lock(mutex);
bip::shared_memory_object::remove("MySharedMemory");
bip::managed_shared_memory segment(bip::create_only, "MySharedMemory", 65536);
//Initialize shared memory STL-compatible allocator
const ShmemAllocator alloc_inst(segment.get_segment_manager());
//Construct a vector named "MyVector" in shared memory with argument alloc_inst
vecFuncalls = segment.construct<MyVector>("MyVector")(alloc_inst);
}
};
void InvokeFuncs(CGLParam *presult)
{
bip::scoped_lock<mutex_type> lock(mutex);
bip::managed_shared_memory open_segment(bip::open_only, "MySharedMemory");
auto listParams = open_segment.find<MyVector>("MyVector").first;
MyVector::const_iterator it;
for (it = listParams->cbegin(); it != listParams->cend(); it++)
{
//it->InvokeFunc(presult);
}
}
int main()
{
}
[1] Unless, of course, there's a suitable conversion
[2] Coliru doesn't support the required IPC mechanisms :/
async_read_until expects a basic_streambuf into which the data will be read. I don't want to allocate additional memory; instead I want to use a memory address (from a specified interface that I'm not allowed to change) as the target buffer.
Is it possible to create a streambuf with an external memory address or do I need to write a wrapper-class?
I finally solved the issue by writing my own async_read_until_delim class, which expects a memory pointer and a maximum number of bytes to read. It's as close as possible to the original Boost implementation, but has a few adjustments which should lead to more performant execution.
namespace {
template<typename read_handler>
class async_read_until_delim
{
public:
async_read_until_delim(tcp::socket& socket, void* buffer, std::size_t max_read_size_in_bytes,
char delim, read_handler& handler)
: m_socket(socket), m_cur(static_cast<char*>(buffer)),
m_end(static_cast<char*>(buffer) + max_read_size_in_bytes), m_delim(delim),
m_handler(handler), m_pos(0)
{
read_some();
}
async_read_until_delim(async_read_until_delim const& other)
: m_socket(other.m_socket), m_cur(other.m_cur), m_end(other.m_end), m_delim(other.m_delim),
m_handler(other.m_handler), m_pos(other.m_pos)
{
}
void operator()(boost::system::error_code const& error, std::size_t bytes_transferred)
{
if (!error)
{
// search only the bytes that were just received, not the unread tail of the buffer
char* received_end = m_cur + bytes_transferred;
if (std::find(m_cur, received_end, m_delim) != received_end)
{
m_handler(error, m_pos + bytes_transferred);
return;
}
m_cur = received_end;
m_pos += bytes_transferred;
if (m_cur == m_end)
{
m_handler(boost::asio::error::not_found, -1);
return;
}
read_some();
}
else
m_handler(error, m_pos);
}
private:
void read_some()
{
m_socket.async_read_some(
boost::asio::buffer(m_cur, m_end - m_cur), async_read_until_delim(*this));
}
tcp::socket& m_socket;
char *m_cur,
*m_end;
char m_delim;
read_handler m_handler;
std::size_t m_pos;
};
template<typename read_handler>
inline void do_async_read_until_delim(tcp::socket& socket, void* buffer, std::size_t max_read_size_in_bytes,
char delim, read_handler& handler)
{
async_read_until_delim<read_handler>(socket, buffer, max_read_size_in_bytes, delim, handler);
}
} /* anonymous namespace */
I hope it will be useful for someone else too.
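For anyone who wants to try the scanning logic without a socket, here is a small synchronous sketch of the same idea: data arrives in pieces, lands directly in caller-owned memory, and only the newly received bytes are searched for the delimiter. DelimiterScanner and read_until are made-up names, and the vector of strings merely simulates what async_read_some would deliver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper that tracks a delimiter search in external memory.
class DelimiterScanner {
public:
    enum class Status { need_more, found, buffer_full };

    DelimiterScanner(void* buffer, std::size_t capacity, char delim)
        : cur_(static_cast<char*>(buffer)), end_(cur_ + capacity),
          delim_(delim), scanned_(0) {}

    char* write_pos() const { return cur_; }
    std::size_t space_left() const { return static_cast<std::size_t>(end_ - cur_); }

    // Call after n new bytes have been written at write_pos().
    Status commit(std::size_t n) {
        assert(n <= space_left());
        char* received_end = cur_ + n;
        char* hit = std::find(cur_, received_end, delim_);
        if (hit != received_end) {
            scanned_ += static_cast<std::size_t>(hit - cur_) + 1;  // include the delimiter
            cur_ = received_end;
            return Status::found;
        }
        scanned_ += n;
        cur_ = received_end;
        return cur_ == end_ ? Status::buffer_full : Status::need_more;
    }

    std::size_t scanned() const { return scanned_; }

private:
    char* cur_;
    char* end_;
    char delim_;
    std::size_t scanned_;
};

// Returns the number of bytes up to and including the delimiter,
// or 0 if the buffer fills up or the input ends first.
inline std::size_t read_until(char* buffer, std::size_t capacity, char delim,
                              const std::vector<std::string>& chunks) {
    DelimiterScanner scanner(buffer, capacity, delim);
    for (const std::string& chunk : chunks) {
        std::size_t n = std::min(chunk.size(), scanner.space_left());
        std::memcpy(scanner.write_pos(), chunk.data(), n);
        switch (scanner.commit(n)) {
            case DelimiterScanner::Status::found: return scanner.scanned();
            case DelimiterScanner::Status::buffer_full: return 0;
            case DelimiterScanner::Status::need_more: break;
        }
    }
    return 0;
}
```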
I'm working on some code in the Linux kernel (2.4) and for some reason kmalloc returns the same address (I believe it only happens after the middle of the test). I checked that no calls to kfree were made between the calls to kmalloc (i.e. the memory is still in use).
Maybe I'm out of memory? (kmalloc didn't return NULL...)
Any ideas on how such a thing can happen?
Thanks in advance for the help!
code:
typedef struct
{
char* buffer;
int read_count;
int write_count;
struct semaphore read_sm;
struct semaphore write_sm;
int reader_ready;
int writer_ready;
int createTimeStamp;
} data_buffer_t ;
typedef struct vsf_t vsf_t;
struct vsf_t
{
int minor;
int type;
int open_count;
int waiting_pid;
data_buffer_t* data;
list_t proc_list;
vsf_t* otherSide_vsf;
int real_create_time_stamp;
};
int create_vsf(struct inode *inode, struct file *filp, struct vsf_command_parameters* parms)
{
...
buff_data = allocate_buffer();
if (buff_data == NULL)
{
kfree(this_vsfRead);
kfree(this_vsfWrite);
return -ENOMEM;
}
...
}
data_buffer_t* allocate_buffer()
{
...
data_buffer_t* this_buff = (data_buffer_t*)kmalloc(sizeof(data_buffer_t), GFP_KERNEL);
if (this_buff == NULL)
{
printk( KERN_WARNING "failure at allocating memory\n" );
return NULL;
}
...
return this_buff;
}
*I print after every kmalloc and kfree, and I'm absolutely sure that no kfree is called between the kmalloc calls that return the same address.
I don't know what kmalloc's data structures look like but you could imagine this happening if a previous double free caused a cycle in a linked list of buffers. Further frees could still chain on additional distinct buffers (able to be reallocated) but once those were exhausted that last buffer would be returned indefinitely.
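To show the mechanism in miniature, here is a user-space toy model, written in C++ for convenience. It is not how kmalloc or the 2.4 slab allocator is actually implemented. ToyFreeList is a made-up, deliberately naive singly linked free list, used only to show how a double free can make the list point back at itself.

```cpp
#include <cassert>
#include <cstddef>

struct FreeNode {
    FreeNode* next;
};

// Naive free list with no double-free detection.
class ToyFreeList {
public:
    ToyFreeList(FreeNode* slots, std::size_t count) : head_(nullptr) {
        for (std::size_t i = 0; i < count; ++i) release(&slots[i]);
    }

    FreeNode* allocate() {
        FreeNode* node = head_;
        if (node) head_ = node->next;
        return node;
    }

    void release(FreeNode* node) {
        assert(node != nullptr);
        node->next = head_;  // on a double free, head_ is already node, so node->next == node
        head_ = node;
    }

private:
    FreeNode* head_;
};

// Frees one slot twice, then checks that later allocations repeat its address.
inline bool double_free_repeats_address() {
    FreeNode slots[2];
    ToyFreeList list(slots, 2);

    FreeNode* a = list.allocate();
    list.release(a);
    list.release(a);  // the bug: second free of the same slot creates a cycle

    FreeNode* first = list.allocate();
    FreeNode* second = list.allocate();
    return first == a && second == a;
}
```

In this model the allocator never returns null after the double free, which matches the symptom of getting a valid but repeated address.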