memory pool usage (boost::pool) for variable sized buffers? - memory-pool

The bottleneck of my current project is heap allocation... profiling stated about 50% of the time one critical thread spends with/in the new operator.
The application cannot use stack memory here and needs to allocate a lot of one central job structure—a custom job/buffer implementation: small and short-lived but variable in size. The object are itself heap memory std::shared_ptr/std::weak_ptr objects and carry a classic C-Array (char*) payload.
Depending on the runtime configuration and workload in different parts 300k-500k object might get created and are in use at the same time (but this should usually not happen). Since its a x64 application memory fragmentation isn't that big a deal (but it might get when also targeted at x86).
To increase speed and packet throughput and as well be save to memory fragmentation in the future I was thinking about using some memory management pool which lead me to boost::pool.
Almost all examples use fixed size object... but I'm unsure how to deal with a variable lengthed payload? A simplified object like this could be created using a boost::pool but I'm unsure what to do with the payload? Is it usable with a boost:pool at all?
class job {
public:
static std::shared_ptr<job> newObj();
private:
delegate_t call;
args_t * args;
unsigned char * payload;
size_t payload_size;
}
Usually the objects are destroyed when all references to the shared_ptr run out of scope and I wouldn't want to change the shared-ptr back to a c-ptr. A deferred destruction of the objects should also work to increase performance and from what I read should work better with a boost:pool. I haven't found if the pool supports an interaction with the smart_ptr? The alternative but quirky way would be to save a reference to the shared_ptr on creation together with the pool and release them in blocks.
Does anyone have experiences with the two? boost:pool usage with variable sized objects and smart pointer interaction?
Thank you!

Related

gsoap memory leak C applications

We are using gsoap for C client and server webservices implemented for blackfin running Linux.
We don't use any malloc in the application. But we see memory usage climbs over time. We are using soap_end to do a cleanup at the end of the call. But when the calls are invoked repeatedly memory usage slowly increasing, may be because of memory fragmentation. This is also impacting performance of the system
What's the preferred usage of gsoap where soap_malloc is not used much. For eg: If we use static arrays etc will it help?
Thanks,
nkr
I would not recommend using static data, there is no need for that.
To debug memory use, compile all your sources files with -DDEBUG. When you run your application you will see three files:
SENT.log the messages sent
RECV.log the messages received
TEST.log the debug log
The TEST.log is useful to check on messaging issues.
The other valuable information produced at runtime are error messages related to memory leaks or heap memory that is damaged (e.g. overruns) in your code. It is unlikely these will happen in the gSOAP engine, but better check.
To ensure proper allocation and deallocation of managed data:
soap_destroy(soap);
soap_end(soap);
I am using the auto-generated functions to allocate managed data:
SomeClass *obj = soap_new_SomeClass(soap);
and sporadically use soap_malloc for raw managed allocation, or to allocate an array of pointers, or a C string:
const char *s = soap_malloc(soap, 100);
but better is to allocate strings with:
std::string *s = soap_new_std__string(soap);
and arrays can be allocated with the second parameter, e.g. an array of 10 strings:
std::string *s = soap_new_std__string(soap, 10);
All managed allocations are deleted with soap_destroy() followed by soap_end(). After that, you can start allocating again and delete again, etc.
If you want to preserve data that otherwise gets deleted with these calls, use:
soap_unlink(soap, obj);
Now obj can be removed later with delete obj. But be aware that all pointer members in obj that point to managed data have become invalid after soap_destroy() and soap_end(). So you may have to invoke soap_unlink() on these members or risk dangling pointers.
A new cool feature of gSOAP is to generate deep copy and delete function for any data structures automatically, which saves a HUGE amount of coding time:
SomeClass *otherobj = soap_dup_SomeClass(NULL, obj);
This duplicates obj to unmanaged heap space. This is a deep copy that checks for cycles in the object graph and removes such cycles to avoid deletion issues. You can also duplicate the whole (cyclic) managed object to another context by using soap instead of NULL for the first argument of soap_dup_SomeClass.
To deep delete:
soap_del_SomeClass(obj);
This deletes obj but also the data pointed to by its members, and so on.
To use the soap_dup_X and soap_del_X functions use soapcpp2 with options -Ec and -Ed, respectively.
In principle, static and stack-allocated data can be serialized just as well. But consider using the managed heap instead.
Hope this helps.

When using CoTaskMemAlloc, should I always call CoTaskMemFree?

I'm writing some COM and ATL code, and for some reason all the code uses CoTaskMemAlloc to allocate memory instead of new or malloc. So I followed along this coding style and I also use CoTaskMemAlloc.
My teachers taught me to always delete or free when allocating memory. However I'm not sure if I should always be calling CoTaskMemFree if I use CoTaskMemAlloc?
Using the CRT's provided new/malloc and delete/free is a problem in COM interop. To make them work, it is very important that the same copy of the CRT both allocates and releases the memory. That's impossible to enforce in a COM interop scenario, your COM server and the client are practically guaranteed to use different versions of the CRT. Each using their own heap to allocate from. This causes undiagnosable memory leaks on Windows XP, a hard exception on Vista and up.
Which is why the COM heap exists, a single predefined heap in a process that's used both by the server and the client. IMalloc is the generic interface to access that shared heap, CoTaskMemAlloc() and CoTaskMemFree() are the system provided helper functions to use that interface.
That said, this is only necessary in a case where the server allocates memory and the client has to release it. Or the other way around. Which should always be rare in an interop scenario, the odds for accidents are just too large. In COM Automation there are just two such cases, a BSTR and a SAFEARRAY, types that are already wrapped. You avoid it in other cases by having the method caller provide the memory and the callee fill it in. Which also allows a strong optimization, the memory could come from the caller's stack.
Review the code and check who allocates the memory and who needs to release it. If both exist in the same module then using new/malloc is fine because there's now a hard guarantee that the same CRT instance takes care of it. If that's not the case then consider fixing it so the caller provides the memory and releases it.
The allocation and freeing of memory must always come from the same source. If you use CoTaskMemAlloc then you must use CoTaskMemFree to free the memory.
Note in C++ though the act of managing memory and object construction / destruction (new / delete) are independent actions. It's possible to customize specific objects to use a different memory allocator and still allow for the standard new / delete syntax which is preferred. For example
class MyClass {
public:
void* operator new(size_t size) {
return ::CoTaskMemAlloc(size);
}
void* operator new[](size_t size) {
return ::CoTaskMemAlloc(size);
}
void operator delete(void* pMemory) {
::CoTaskMemFree(pMemory);
}
void operator delete[](void* pMemory) {
::CoTaskMemFree(pMemory);
}
};
Now I can use this type just like any other C++ type and yet the memory will come from the COM heap
// Normal object construction but memory comes from CoTaskMemAlloc
MyClass *pClass = new MyClass();
...
// Normal object destruction and memory freed from CoTaskMemFree
delete pClass;
The answer to the question is: Yes, you should use CoTaskMemFree to free memory allocated with CoTaskMemAlloc.
The other answers do a good job explaining why CoTaskMemAlloc and CoTaskMemFree are necessary for memory passed between COM servers and COM clients, but they didn't directly answer your question.
Your teacher was right: You should always use the corresponding release function for any resource. If you use new, use delete. If you use malloc, use free. If you use CreateFile, use CloseHandle. Etc.
Better yet, in C++, use RAII objects that allocate the resource in the constructor and release the resource in the destructor, and then use those RAII wrappers instead of the bare functions. This makes it easier and cleaner to write code that doesn't leak, even if you get something like an exception.
The standard template library provides containers that implement RAII, which is why you should learn to use a std::vector or std::string rather than allocating bare memory and trying to manage it yourself. There are also smart pointers like std::shared_ptr and std::unique_ptr that can be used to make sure the right release call is always made at the right time.
ATL provides some classes like ATL::CComPtr which are wrapper objects that handle the reference counting of COM objects for you. They are not foolproof to use correctly, and, in fact, have a few more gotchas than most of the modern STL classes, so read the documentation carefully. When used correctly, it's relatively easy to make sure the AddRef and Release calls all match up.

Sharing GlobalAlloc() memory from DLL to multiple Win32 applications

I want to move my caching library to a DLL and allow multiple applications to share a single pointer allocated within the DLL using GlobalAlloc(). How could I accomplish this, and would it result in a significant performance decrease?
You could certainly do this and there won't be any performance implication for a single pointer.
Rather than use GlobalAlloc, a legacy API, you should opt for a different shared heap. For example the simplest to use is the COM allocator, CoTaskMemAlloc. Or you can use HeapAlloc passing the process heap obtained by GetProcessHeap.
For example, and neglecting to show error checking:
void *mem = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
Note that you only need to worry about heap sharing if you expect the memory to be deallocated in a different module from where it was created. If your DLL both creates and destroys the memory then you can use plain old malloc. Because all modules live in the same process address space, memory allocated by any module in that process, can be used by any other module.
Update
I failed on first reading of the question to pick up on the possibility that you may be wanting multiple process to have access to the same memory. If that's what you need then it is only possible with memory mapped files, or perhaps with some form of IPC.

Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?

Consider a complex, memory hungry, multi threaded application running within a 32bit address space on windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnMapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that windows does not (as far as I know) provide any handshake type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried Calling GetProcessHeaps then using HeapLock to lock all of the heaps)
What I'd like to know is there anyway to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control
You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.
Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFileto use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detour Library that I'm not directly familiar with however. I know that detouring is surprisingly common. Security software, virus scanners etc all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;
HANDLE WINAPI GetProcessHeapImpl() {
return g_hndPrivateHeap;
}
struct SDetourGetProcessHeap { // object for exception safety
SDetourGetProcessHeap() {
// put detour in place
}
~SDetourGetProcessHeap() {
// remove detour again
}
};
void MapFile() {
g_hndPrivateHeap = HeapCreate( ... );
{
SDetourGetProcessHeap d;
MapViewOfFile(...);
}
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf
Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documention for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.
As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows

Call to _freea really necessary?

I am developping on Windows with DevStudio, in C/C++ unmanaged.
I want to allocate some memory on the stack instead of the heap because I don't want to have to deal with releasing that memory manually (I know about smart pointers and all those things. I have a very specific case of memory allocation I need to deal with), similar to the use of A2W() and W2A() macros.
_alloca does that, but it is deprecated. It is suggested to use malloca instead. But _malloca documentation says that a call to ___freea is mandatory for each call to _malloca. It then defeats my purpose to use _malloca, I will use malloc or new instead.
Anybody knows if I can get away with not calling _freea without leaking and what the impacts are internally?
Otherwise, I will end-up just using deprecated _alloca function.
It is always important to call _freea after every call to _malloca.
_malloca is like _alloca, but adds some extra security checks and enhancements for your protection. As a result, it's possible for _malloca to allocate on the heap instead of the stack. If this happens, and you do not call _freea, you will get a memory leak.
In debug mode, _malloca ALWAYS allocates on the heap, so also should be freed.
Search for _ALLOCA_S_THRESHOLD for details on how the thresholds work, and why _malloca exists instead of _alloca, and it should make sense.
Edit:
There have been comments suggesting that the person just allocate on the heap, and use smart pointers, etc.
There are advantages to stack allocations, which _malloca will provide you, so there are reasons for wanting to do this. _alloca will work the same way, but is much more likely to cause a stack overflow or other problem, and unfortunately does not provide nice exceptions, but rather tends to just tear down your process. _malloca is much safer in this regard, and protects you, but the cost is that you still need to free your memory with _freea since it's possible (but unlikely in release mode) that _malloca will choose to allocate on the heap instead of the stack.
If your only goal is to avoid having to free memory, I would recommend using a smart pointer that will handle the freeing of memory for you as the member goes out of scope. This would assign memory on the heap, but be safe, and prevent you from having to free the memory. This will only work in C++, though - if you're using plain ol' C, this approach will not work.
If you are trying to allocate on the stack for other reasons (typically performance, since stack allocations are very, very fast), I would recommend using _malloca and living with the fact that you'll need to call _freea on your values.
Another thing to consider is using an RAII class to manage the allocation - of course that's only useful if your macro (or whatever) can be restricted to C++.
If you want to avoid hitting the heap for performance reasons, take a look at the techniques used by Matthew Wilson's auto_buffer<> template class (http://www.stlsoft.org/doc-1.9/classstlsoft_1_1auto__buffer.html). This will allocate on the stack unless your runtime size request exceeds a size specified at compiler time - so you get the speed of no heap allocation for the majority of allocations (if you size the template right), but everything still works correctly if your exceed that size.
Since STLsoft has a whole lot of cruft to deal with portability issues, you may want to look at a simpler version of auto_buffer<> which is described in Wilson's book, "Imperfect C++".
I found it quite handy in an embedded project.
To allocate memory on the stack, simply declare a variable of the appropriate type and size.
I answered this before, but I'd missed something fundamental that meant that it only worked in debug mode. I moved the call to _malloca into the constructor of a class that would auto-free.
In debug this is fine, as it always allocates on the heap. However, in release, it allocates on the stack, and upon returning from the constructor, the stack pointer has been reset, and the memory lost.
I went back and took a different approach, resulting in a combination of using a macro (eurgh) to allocate the memory and instantiate an object that will automatically call _freea on that memory. As it's a macro, it's allocated in the same stack frame, and so will actually work in release mode. It's just as convenient as my class, but slightly less nice to use.
I did the following:
class EXPORT_LIB_CLASS CAutoMallocAFree
{
public:
CAutoMallocAFree( void *pMem ) : m_pMem( pMem ) {}
~CAutoMallocAFree() { _freea( m_pMem ); }
private:
void *m_pMem;
CAutoMallocAFree();
CAutoMallocAFree( const CAutoMallocAFree &rhs );
CAutoMallocAFree &operator=( const CAutoMallocAFree &rhs );
};
#define AUTO_MALLOCA( Var, Type, Length ) \
Type* Var = (Type *)( _malloca( ( Length ) * sizeof ( Type ) ) ); \
CAutoMallocAFree __MALLOCA_##Var( (void *) Var );
This way I can allocate using the following macro call, and it's released when the instantiated class goes out of scope:
AUTO_MALLOCA( pBuffer, BYTE, Len );
Ar.LoadRaw( pBuffer, Len );
My apologies for posting something that was plainly wrong!
If you're using _malloca() then you must call _freea() to prevent memory leak because _malloca() can do the allocation either on stack or heap. It resorts to allocate on heap if the given size exceeds_ALLOCA_S_THRESHOLD value. Thus, it's safer to call _freea() which won't do anything if allocation happened on stack.
If you're using _alloca() which seems to be deprecated as of today; there is no need to call _freea() as the allocation happens on stack.
If your concern is having to free temp memory, and you know all about things like smart-pointers then why not use a similar pattern where memory is freed when it goes out of scope?
template <class T>
class TempMem
{
TempMem(size_t size)
{
mAddress = new T[size];
}
~TempMem
{
delete [] mAddress;
}
T* mAddress;
}
void foo( void )
{
TempMem<int> buffer(1024);
// alternatively you could override the T* operator..
some_memory_stuff(buffer.mAddress);
// temp-mem auto-freed
}

Resources