Unordered set (const char) much slower than unordered set (string) - performance

I'm loading a very long list from disk into an unordered_set. If I use a set of strings, it is very fast. A test list of about 7 MB loads in about 1 second. However, using a set of char pointers takes about 2.1 minutes!
Here is the code for the string version:
unordered_set<string> Set;
string key;
while (getline(fin, key))
{
Set.insert(key);
}
Here is the code for the char* version:
struct unordered_eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
struct unordered_deref
{
template <typename T>
size_t operator()(const T* p) const
{
return hash<T>()(*p);
}
};
unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
string key;
while (getline(fin, key))
{
char* str = new(mem) char[key.size()+1];
strcpy(str, key.c_str());
Set.insert(str);
}
The "new(mem)" is there because I'm using a custom memory manager, so I can allocate big blocks of memory and hand them out to tiny objects like C strings. However, I've tested this with regular "new" and the results are identical. I've also used my memory manager in other tools with no problems.
The two structs are necessary to make insert and find hash and compare the actual C string rather than its address. I found unordered_deref here on Stack Overflow.
Eventually I need to load multi-gigabyte files. This is why I'm using a custom memory manager, but it's also why this horrible slowdown is unacceptable. Any ideas?

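The hash functor in the question calls hash<T>()(*p) with T = char, so it hashes only the first character of each string. Every key that starts with the same character lands in the same bucket, and inserts degrade to long linear scans. Hash the whole string instead: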
struct unordered_deref
{
size_t operator()(const char* p) const
{
return hash<string>()(p);
}
};
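Note that hash<string>()(p) builds a temporary std::string for every lookup. If that copy turns out to matter for multi-gigabyte inputs, one possible alternative is to hash the bytes of the C string directly. The sketch below uses 64-bit FNV-1a; the names CStrHash, CStrEqual and CStrSet are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>

// Hypothetical hasher: walks the whole C string (64-bit FNV-1a),
// so no temporary std::string is created per call.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        assert(s != nullptr);
        std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
        for (; *s != '\0'; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;                  // FNV-1a prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Compares contents, not addresses (same idea as unordered_eqstr).
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

using CStrSet = std::unordered_set<const char*, CStrHash, CStrEqual>;
```

The set still stores only pointers, so the strings must outlive it, exactly as in the original char* version.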


Shared buffer using boost::intrusive_ptr

I have a use case where one thread reads a message into a large buffer and then distributes the processing to a bunch of threads. The buffer is shared by multiple threads after that. It's read-only, and when the last thread finishes, the buffer has to be freed. The buffer is allocated from a lock-free slab allocator.
My initial design was to use shared_ptr for the buffer. But the buffer can be of different sizes. My way of getting around that was to do something like this.
struct SharedBuffer {
SharedBuffer (uint16_t len, std::shared_ptr<void> ptr)
: _length(len), _buf(std::move(ptr))
{
}
uint8_t *data () { return static_cast<uint8_t *>(_buf.get()); }
uint16_t length () const { return _length; }
uint16_t _length;
std::shared_ptr<void> _buf; // type-erased, because SharedBuffer needs to be
// stored in other structs regardless of the buffer size
};
Now the allocator will allocate the shared_ptr like this:
SharedBuffer allocate (size_t size)
{
auto buf = std::allocate_shared<std::array<uint8_t, 16_K>>(myallocator);
return SharedBuffer{16_K, buf}; // type erase the std::array
}
And the SharedBuffer is enqueued to each thread that wants it.
Now I think I am doing a lot of unnecessary work, and I could make do with boost::intrusive_ptr using the scheme below. Things are a bit C-ish, as I am using a variable-size array. Here I have replaced the slab allocator with operator new() for the sake of simplicity. I wanted to run it by you to see whether this implementation is okay.
template <typename T>
inline int atomicIncrement (T* t)
{
return __atomic_add_fetch(&t->_ref, 1, __ATOMIC_ACQUIRE);
}
template <typename T>
inline int atomicDecrement (T* t)
{
return __atomic_sub_fetch(&t->_ref, 1, __ATOMIC_RELEASE);
}
class SharedBuffer {
public:
friend int atomicIncrement<SharedBuffer>(SharedBuffer*);
friend int atomicDecrement<SharedBuffer>(SharedBuffer*);
SharedBuffer(uint16_t len) : _length(len) {}
uint8_t *data ()
{
return &_data[0];
}
uint16_t length () const
{
return _length;
}
private:
int _ref{0};
const uint16_t _length;
uint8_t _data[];
};
using SharedBufferPtr = boost::intrusive_ptr<SharedBuffer>;
SharedBufferPtr allocate (size_t size)
{
// dummy implementation
void *p = ::operator new (size + sizeof(SharedBuffer));
// I am not explicitly constructing the array of uint8_t
return new (p) SharedBuffer(size);
}
void deallocate (SharedBuffer* sbuf)
{
sbuf->~SharedBuffer();
// dummy implementation
::operator delete ((void *)sbuf);
}
void intrusive_ptr_add_ref(SharedBuffer* sbuf)
{
atomicIncrement(sbuf);
}
void intrusive_ptr_release (SharedBuffer* sbuf)
{
if (atomicDecrement(sbuf) == 0) {
deallocate(sbuf);
}
}
I'd use the simpler implementation (using shared_ptr) unless you are avoiding specific problems (i.e. profile first).
Side note: you can use boost::shared_ptr<> with boost::make_shared<T[]>(N); array support for std::make_shared is being added to the standard library in C++20.
Note that allocate_shared already embeds the control block into the same allocation like you do with the intrusive approach.
Finally, I'd use std::atomic_int so you have a clear contract that cannot (accidentally) be misused. At the same time, it removes the remaining bit of complexity.
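To make the std::atomic suggestion concrete, here is a minimal sketch of the intrusive variant, under some assumptions of mine: it leaves Boost out (only the two hook functions that boost::intrusive_ptr finds by argument-dependent lookup are shown), it replaces the flexible array member with storage placed directly behind the object, and create() and use_count() are hypothetical helpers. The increment is relaxed and the decrement is acquire-release, which is the usual ordering for reference counts; the question's acquire-on-increment and release-on-decrement pairing does not guarantee that the thread freeing the buffer sees the other threads' last accesses.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>

class SharedBuffer {
public:
    // Hypothetical factory: header and payload share one allocation.
    static SharedBuffer* create(std::uint16_t len) {
        void* p = ::operator new(sizeof(SharedBuffer) + len);
        return new (p) SharedBuffer(len);
    }
    // Payload starts right after the header object.
    std::uint8_t* data() { return reinterpret_cast<std::uint8_t*>(this + 1); }
    std::uint16_t length() const { return _length; }
    int use_count() const { return _ref.load(std::memory_order_relaxed); }

    // Hooks with the names boost::intrusive_ptr looks up via ADL.
    friend void intrusive_ptr_add_ref(SharedBuffer* b) {
        b->_ref.fetch_add(1, std::memory_order_relaxed);
    }
    friend void intrusive_ptr_release(SharedBuffer* b) {
        assert(b->use_count() > 0);
        // fetch_sub returns the previous value; 1 means we were the last owner.
        if (b->_ref.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            b->~SharedBuffer();
            ::operator delete(b);
        }
    }

private:
    explicit SharedBuffer(std::uint16_t len) : _length(len) {}
    ~SharedBuffer() = default;

    std::atomic<int> _ref{0};
    const std::uint16_t _length;
};
```

With Boost available, `boost::intrusive_ptr<SharedBuffer> p(SharedBuffer::create(n));` should pick these hooks up unchanged.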

Factory function for initialization of static const struct with array and lambda

I have a structure that should be statically initialized.
struct Option
{ char Option[8];
void (*Handler)(const char* value);
};
void ParseInto(const char* value, const char** target); // string option
void ParseInto(const char* value, int* target, int min, int max); // int option
static int count = 1;
static const char* name;
static const Option OptionMap[] =
{ { "count", [](const char* value) { ParseInto(value, &count, 1, 100); } }
, { "name", [](const char* value) { ParseInto(value, &name); } }
// ...
};
Up to this point it works.
To get rid of repeating the lambda function definition over and over (there are dozens) I want to use a factory like this:
struct Option
{ const char Option[8];
const void (*Handler)(const char* value);
template<typename ...A>
Option(const char (&option)[8], A... args)
: Option(option)
, Handler([args...](const char* value) { ParseInto(value, args...); })
{}
};
static const Option OptionMap[] =
{ { "count", &count, 1, 100 }
, { "name", &name }
};
This does not work for two reasons:
I did not find a type for the first constructor parameter option that perfectly forwards the initialization of the character array. The difficult part is that the length of the assigned array does not match the array length in general.
The even harder part is that the lambda function has a closure and therefore cannot decay to a function pointer. But all parameters are compile-time constants, so it should be possible to make the constructor constexpr. However, lambdas do not seem to support constexpr at all.
Does anyone have an idea how to solve this challenge?
The current workaround is a variadic macro. Not that pretty, but of course it works.
Context is C++11. I would not like to upgrade for now, but nevertheless a solution with a newer standard would be appreciated. Problems like this tend to reappear from time to time.
There are some further restrictions by the underlying (old) code. struct Option must be a POD type and the first member must be the character array so a cast from Option* to const char* is valid.
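One direction that may be worth trying, since every argument is a compile-time constant: move the arguments into non-type template parameters, so each handler is an ordinary function instantiation whose address is a plain function pointer. This is only a sketch under assumptions. IntHandler and StringHandler are hypothetical names, the ParseInto bodies are stand-ins for the real parsers, and the array member is renamed to Name here purely for readability. It should compile as C++11 and keeps Option an aggregate with the character array first.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <type_traits>

struct Option {
    char Name[8];                        // first member, as the old code requires
    void (*Handler)(const char* value);
};
static_assert(std::is_standard_layout<Option>::value &&
              std::is_trivial<Option>::value, "Option must stay POD");

// Stand-in parsers (assumed behaviour: store the string, clamp the int).
inline void ParseInto(const char* value, const char** target) { *target = value; }
inline void ParseInto(const char* value, int* target, int min, int max) {
    assert(min <= max);
    long v = std::strtol(value, nullptr, 10);
    if (v < min) v = min;
    if (v > max) v = max;
    *target = static_cast<int>(v);
}

// The "captures" become template arguments, so no closure is needed.
template <int* Target, int Min, int Max>
void IntHandler(const char* value) { ParseInto(value, Target, Min, Max); }

template <const char** Target>
void StringHandler(const char* value) { ParseInto(value, Target); }

static int count = 1;
static const char* name = nullptr;

static const Option OptionMap[] = {
    { "count", &IntHandler<&count, 1, 100> },
    { "name",  &StringHandler<&name> },
};
```

The table entries are longer than the wished-for `{ "count", &count, 1, 100 }`, so a small macro may still be wanted for brevity, but the lambdas and the constructor are gone.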

Custom allocator with hard limits

I want to replace some code that uses boost::interprocess shared memory. One advantage of shared memory is that you can impose limits on the maximum amount of memory it can use. I'm looking for a custom allocator, based off std::allocator that can do this.
Only particular classes in the program will use this allocator; everything else uses the default std::allocator and is limited only by available RAM.
I'm trying to write one of my own but I'm running into issues, mainly with how to share state among the allocator copies that are created by STL containers. State includes the number of free bytes remaining and the maximum size the allocator can use. I thought I could get away with making them thread_local but then several different instances of the same class will all allocate and deallocate from the same limited heap, which is not what I want. I'm beginning to think it's not possible, hence this question here. Neither contiguous allocation nor performance are major requirements for now.
The hard limit on the memory size cannot be a template parameter either, it's read from a config file.
Edit: The issue with sharing state is that some containers call the default constructor of the allocator type. This constructor cannot easily know anything about the outside world; even if a shared_ptr is used, it will be initialised to nullptr. For example, look at the source code for std::string::clear
g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
After following the hints above I came up with this which seems to work ok for POD types, but things fall apart when I try to make a Vector or Map that uses String:
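#include <cstddef>    // std::size_t
#include <functional> // std::less
#include <new>        // std::bad_alloc
#include <string>
#include <vector>
#include <map>
#include <atomic>
#include <memory>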
struct SharedState
{
SharedState()
: m_maxSize(0),
m_bytesRemaining(0)
{
}
SharedState(std::size_t maxSize)
: m_maxSize(maxSize),
m_bytesRemaining(maxSize)
{
}
void allocate(std::size_t bytes) const {
if (m_bytesRemaining < bytes) {
throw std::bad_alloc();
}
m_bytesRemaining -= bytes;
}
void deallocate(std::size_t bytes) const {
m_bytesRemaining += bytes;
}
std::size_t getBytesRemaining() const {
return m_bytesRemaining;
}
const std::size_t m_maxSize;
mutable std::atomic<std::size_t> m_bytesRemaining;
};
// --------------------------------------
template <typename T>
class BaseLimitedAllocator : public std::allocator<T>
{
public:
using size_type = std::size_t;
using pointer = T*;
using const_pointer = const T*;
using propagate_on_container_move_assignment = std::true_type;
template <typename U>
struct rebind
{
typedef BaseLimitedAllocator<U> other;
};
BaseLimitedAllocator() noexcept = default;
BaseLimitedAllocator(std::size_t maxSize) noexcept
: m_state(new SharedState(maxSize)) {
}
BaseLimitedAllocator(const BaseLimitedAllocator& other) noexcept {
m_state = other.m_state;
}
template <typename U>
BaseLimitedAllocator(const BaseLimitedAllocator<U>& other) noexcept {
m_state = other.m_state;
}
pointer allocate(size_type n, const void* hint = nullptr) {
m_state->allocate(n * sizeof(T));
return std::allocator<T>::allocate(n, hint);
}
void deallocate(pointer p, size_type n) {
std::allocator<T>::deallocate(p, n);
m_state->deallocate(n * sizeof(T));
}
public:
std::shared_ptr<SharedState> m_state; // This must be public for the rebind copy constructor.
};
template <typename T, typename U>
inline bool operator==(const BaseLimitedAllocator<T>&, const BaseLimitedAllocator<U>&) {
return true;
}
template <typename T, typename U>
inline bool operator!=(const BaseLimitedAllocator<T>&, const BaseLimitedAllocator<U>&) {
return false;
}
struct LimitedAllocator : public BaseLimitedAllocator<char>
{
LimitedAllocator(std::size_t maxSize)
: BaseLimitedAllocator<char>(maxSize) {
}
template <typename U>
using Other = typename BaseLimitedAllocator<char>::template rebind<U>::other;
};
// -----------------------------------------
// Example usage:
class SomeClass
{
public:
using String = std::basic_string<char, std::char_traits<char>, LimitedAllocator::Other<char>>;
template <typename T>
using Vector = std::vector<T, LimitedAllocator::Other<T>>;
template <typename K, typename V>
using Map = std::map<K, V, std::less<K>, LimitedAllocator::Other<std::pair<const K, V>>>;
SomeClass()
: allocator(256),
s(allocator),
v(allocator),
m(std::less<int>(), allocator) // Cannot only specify the allocator. Annoying.
{
}
const LimitedAllocator allocator;
String s;
Vector<int> v;
Map<int, String> m;
};
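For comparison, here is a stripped-down sketch of the same idea written as a minimal C++11 allocator. It rests on assumptions of mine: the names Budget and CappedAllocator are hypothetical, it relies on the container going through std::allocator_traits, and it deliberately has no default constructor, so it will not help with library versions that default-construct the allocator, which is the problem described in the edit above. It differs from the code above in two ways: the budget is reserved with a compare-exchange loop rather than a separate check and subtract, and two allocators compare equal only when they share the same budget.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical shared state: bytes still available to one group of containers.
struct Budget {
    explicit Budget(std::size_t max) : remaining(max) {}
    bool take(std::size_t bytes) {
        std::size_t cur = remaining.load();
        while (cur >= bytes) {
            // On failure cur is reloaded and the limit is re-checked.
            if (remaining.compare_exchange_weak(cur, cur - bytes)) return true;
        }
        return false;
    }
    void give(std::size_t bytes) { remaining += bytes; }
    std::atomic<std::size_t> remaining;
};

template <typename T>
struct CappedAllocator {
    using value_type = T;

    explicit CappedAllocator(std::shared_ptr<Budget> b) : budget(std::move(b)) {
        assert(budget != nullptr);
    }
    template <typename U>
    CappedAllocator(const CappedAllocator<U>& other) noexcept : budget(other.budget) {}

    T* allocate(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        if (!budget->take(bytes)) throw std::bad_alloc();
        void* p = ::operator new(bytes, std::nothrow);
        if (p == nullptr) { budget->give(bytes); throw std::bad_alloc(); }
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        ::operator delete(p);
        budget->give(n * sizeof(T));
    }

    std::shared_ptr<Budget> budget;  // public so the rebinding constructor can copy it
};

template <typename T, typename U>
bool operator==(const CappedAllocator<T>& a, const CappedAllocator<U>& b) {
    return a.budget == b.budget;
}
template <typename T, typename U>
bool operator!=(const CappedAllocator<T>& a, const CappedAllocator<U>& b) {
    return !(a == b);
}
```

Each object that needs its own limit would create one Budget from the configured size and pass allocators built from it to all of its members.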

How do I write binary data to a file in Modern C++?

Writing binary data to a file in C is simple: use fwrite, passing the address of the object you want to write and the size of the object. Is there something more "correct" for Modern C++ or should I stick to using FILE* objects? As far as I can tell the IOStream library is for writing formatted data rather than binary data, and the write member asks for a char* leaving me littering my code with casts.
So the game here is to enable argument dependent lookup on reading and writing, and make sure you don't try to read/write things that are not flat data.
It fails to catch data containing pointers, which also should not be read or written this way, but it is better than nothing.
namespace serialize {
namespace details {
template<class T>
bool write( std::streambuf& buf, const T& val ) {
static_assert( std::is_standard_layout<T>{}, "data is not standard layout" );
auto bytes = sizeof(T);
return buf.sputn(reinterpret_cast<const char*>(&val), bytes) == bytes;
}
template<class T>
bool read( std::streambuf& buf, T& val ) {
static_assert( std::is_standard_layout<T>{}, "data is not standard layout" );
auto bytes = sizeof(T);
return buf.sgetn(reinterpret_cast<char*>(&val), bytes) == bytes;
}
}
template<class T>
bool read( std::streambuf& buf, T& val ) {
using details::read; // enable ADL
return read(buf, val);
}
template<class T>
bool write( std::streambuf& buf, T const& val ) {
using details::write; // enable ADL
return write(buf, val);
}
}
namespace baz {
// plain old data:
struct foo {int x;};
// not standard layout:
struct bar {
bar():x(3) {}
operator int()const{return x;}
void setx(int s){x=s;}
int y = 1;
private:
int x;
};
// adl based read/write overloads:
bool write( std::streambuf& buf, bar const& b ) {
bool worked = serialize::write( buf, (int)b );
worked = serialize::write( buf, b.y ) && worked;
return worked;
}
bool read( std::streambuf& buf, bar& b ) {
int x;
bool worked = serialize::read( buf, x );
if (worked) b.setx(x);
worked = serialize::read( buf, b.y ) && worked;
return worked;
}
}
I hope you get the idea.
Possibly you should restrict this kind of writing to is_pod types rather than standard-layout types, on the idea that if something special has to happen on construction or destruction, you probably shouldn't be binary-blitting the type.
Since you are already bypassing all formatting, I would recommend using the std::filebuf class directly to avoid possible overheads from std::fstream; it's definitely better than FILE* due to RAII.
You can't escape from the casts this way, sadly. But it's not hard to wrap it, like:
template<class T>
void write(std::streambuf& buf, const T& val)
{
std::streamsize to_write = sizeof val; // sputn returns std::streamsize
if (buf.sputn(reinterpret_cast<const char*>(&val), to_write) != to_write)
{
// do some error handling here
}
}
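Putting the pieces together, a round trip might look roughly like the sketch below. The names write_pod, read_pod, save_to_file and Sample are hypothetical, and I check std::is_trivially_copyable here because that is the property that makes a byte-wise copy of an object well defined; like the checks above, it still does not catch members that are pointers.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <sstream>
#include <streambuf>
#include <type_traits>

// Any std::streambuf works: std::filebuf for files, std::stringbuf in memory.
template <class T>
bool write_pod(std::streambuf& buf, const T& val) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    const std::streamsize n = sizeof(T);
    return buf.sputn(reinterpret_cast<const char*>(&val), n) == n;
}

template <class T>
bool read_pod(std::streambuf& buf, T& val) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    const std::streamsize n = sizeof(T);
    return buf.sgetn(reinterpret_cast<char*>(&val), n) == n;
}

// Writes one object to a file; the filebuf closes itself on scope exit (RAII),
// but closing explicitly lets us report a failed flush.
template <class T>
bool save_to_file(const char* path, const T& val) {
    assert(path != nullptr);
    std::filebuf file;
    if (!file.open(path, std::ios::out | std::ios::binary | std::ios::trunc))
        return false;
    const bool ok = write_pod(file, val);
    return file.close() != nullptr && ok;
}

// Example payload type (made up for this sketch).
struct Sample {
    int id;
    double weight;
};
```

The bytes written this way are in the host's layout and byte order, so the file is only portable between builds that agree on both.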

kmalloc returning the same address over and over again [Linux 2.4]

I'm working on some code in the Linux kernel (2.4), and for some reason kmalloc keeps returning the same address (I believe it only happens after the middle of the test). I checked that no calls to kfree were made between the calls to kmalloc (i.e. the memory is still in use).
Maybe I'm out of memory? (kmalloc didn't return NULL...)
Any ideas on how such a thing can happen?
Thanks in advance for the help!
code:
typedef struct
{
char* buffer;
int read_count;
int write_count;
struct semaphore read_sm;
struct semaphore write_sm;
int reader_ready;
int writer_ready;
int createTimeStamp;
} data_buffer_t ;
typedef struct vsf_t vsf_t;
struct vsf_t
{
int minor;
int type;
int open_count;
int waiting_pid;
data_buffer_t* data;
list_t proc_list;
vsf_t* otherSide_vsf;
int real_create_time_stamp;
};
int create_vsf(struct inode *inode, struct file *filp, struct vsf_command_parameters* parms)
{
...
buff_data = allocate_buffer();
if (buff_data == NULL)
{
kfree(this_vsfRead);
kfree(this_vsfWrite);
return -ENOMEM;
}
...
}
data_buffer_t* allocate_buffer()
{
...
data_buffer_t* this_buff = (data_buffer_t*)kmalloc(sizeof(data_buffer_t), GFP_KERNEL);
if (this_buff == NULL)
{
printk( KERN_WARNING "failure at allocating memory\n" );
return NULL;
}
...
return this_buff;
}
*I print after every kmalloc and kfree, and I'm absolutely sure that no kfree is called between the kmalloc calls that return the same address.
I don't know what kmalloc's data structures look like, but you could imagine this happening if an earlier double free caused a cycle in a linked list of free buffers. Further frees could still chain on additional distinct buffers (which can be reallocated), but once those were exhausted, that last buffer would be returned indefinitely.
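To show the mechanism I have in mind, here is a toy user-space free list. It is not kernel code and makes no claim about how the 2.4 allocator is actually implemented; Node and FreeList are made-up names. Releasing the same block twice makes the node point at itself, and from then on every acquire returns that block.

```cpp
#include <cassert>
#include <new>

// A freed block is reused to hold the link to the next free block.
struct Node {
    Node* next;
};

class FreeList {
public:
    // "free": push the block on the front of the list.
    void release(void* p) {
        assert(p != nullptr);
        head_ = new (p) Node{head_};  // a second release of p yields p->next == p
    }
    // "malloc": pop the front block, or nullptr when the list is empty.
    void* acquire() {
        Node* n = head_;
        if (n != nullptr) head_ = n->next;
        return n;
    }

private:
    Node* head_ = nullptr;
};
```

With blocks a and b, the sequence release(a), release(a), release(b) hands out b once and then a on every later call, which matches the symptom: the double free can sit well before the point where the repeated address first shows up.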
