How does NSMutableArray allocate memory while adding many objects? - cocoa

How does NSMutableArray keep accepting new objects without any apparent limit? Is there some linked-list concept behind the API, where it can allocate memory at runtime and link each new node to the last one? If not, how can an array concept have a dynamic size?
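NSMutableArray's storage is an undocumented implementation detail, but the standard technique behind APIs like this is a growable (dynamic) array, not a linked list: the elements live in one contiguous buffer, and when it fills up a larger buffer is allocated and the contents are copied over. A rough sketch of the idea in C++ (illustrative only, not Apple's code):

#include <cstdlib>

// Illustrative growable buffer: the dynamic-array idea, not Apple's code.
struct GrowableBuffer {
    void** items = nullptr;   // one contiguous block, no linked list
    size_t count = 0, capacity = 0;

    bool add(void* obj) {
        if (count == capacity) {  // full: grow geometrically and copy
            size_t new_cap = capacity ? capacity * 2 : 8;
            void** p = (void**)std::realloc(items, new_cap * sizeof(void*));
            if (!p) return false; // allocation failed; old buffer still valid
            items = p;
            capacity = new_cap;
        }
        items[count++] = obj;     // amortized O(1) append
        return true;
    }
};

Because the capacity doubles each time, the occasional copy is amortized across many cheap appends, which is why there is no practical limit beyond available memory.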

Related

Sized Deallocation Feature In Memory Management in C++1y

The sized deallocation feature has been proposed for inclusion in C++1y. However, I wanted to understand how it would affect or improve current low-level memory management in C++.
The proposal is N3778, which states the following about its intent:
With C++11, programmers may define a static member function operator
delete that takes a size parameter indicating the size of the object
to be deleted. The equivalent global operator delete is not available.
This omission has unfortunate performance consequences.
Modern memory allocators often allocate in size categories, and, for
space efficiency reasons, do not store the size of the object near the
object. Deallocation then requires searching for the size category
store that contains the object. This search can be expensive,
particularly as the search data structures are often not in memory
caches. The solution is to permit implementations and programmers
to define sized versions of the global operator delete. The
compiler shall call the sized version in preference to the unsized
version when the sized version is available.
From the paragraph above, it looks like the size information that operator delete requires can be maintained and passed in by the user program, which would avoid any search for the size during deallocation. But as per my understanding, the memory manager stores the size information in some sort of header while allocating (the boundary-tag method explained in dlmalloc), and that header would be used during deallocation.
T* p = new T();
// As I understand it, the allocator records the size in a header just before
// the object, conceptually: *(size_t*)((char*)p - sizeof(size_t)) = size;
// Wouldn't that header be consulted when we delete the memory?
delete p;
If the size information is stored in the header, why does deallocation require searching for it?
It looks like I am missing something obvious and do not understand this concept completely.
Additionally, how can this feature be used in a program that deals with low-level memory management in C++? I hope somebody can help me understand these concepts.
As in your quote:
[Modern memory allocators] for space efficiency reasons, do not store the size of the object near the object.
Increasing the size of every allocation in order to add explicit size information is obviously going to use more memory than alternatives such as storing the size information once per allocation pool, or supplying the information upon deallocation.
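For concreteness, here is a sketch of what the proposal became in C++14: a replaceable global sized operator delete. The printf is only there to show which overload fires; a real allocator would use the size argument to go straight to the right size class instead of searching for it. Depending on the compiler, a flag such as -fsized-deallocation may be needed for the sized overload to actually be called.

#include <cstdio>
#include <cstdlib>
#include <new>

// Replaceable global allocation functions (C++14). Illustrative only.
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept {                    // unsized fallback
    std::free(p);
}
void operator delete(void* p, std::size_t size) noexcept {  // sized version
    std::printf("deallocating %zu bytes\n", size);  // size supplied by the compiler
    std::free(p);
}

struct T { double payload[4]; };

int main() {
    T* p = new T;
    delete p;  // the compiler may call the sized overload with sizeof(T)
}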

Dynamic array - Does it deallocate memory when elements are removed?

Going by the Wikipedia article on Dynamic Array:
It automatically allocates a geometrically larger block of memory once the last empty cell is filled, and then copies the entire contents into the new array. What happens when one removes more elements than the amount by which the array was grown? Does it automatically deallocate memory too, or does it leave things as they are?
For example, in the image at the top right of the Wikipedia link above, after the last step 2|7|1|3|8|4| one removes all the elements except 2. What happens then? Does it allocate a smaller block and copy the entire contents into it?
Side question: what decides the initial amount of memory allocated to a dynamic array?
The article you cite answers your question:
"Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, [...]"
It's really worth reading ;-)
For cases where you know the required size in advance, some implementations provide a specific method to pre-allocate it (reserve() in the C++ Standard Library).
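For example, std::vector grows geometrically on push_back but deliberately never shrinks on its own; releasing memory after removals takes an explicit, non-binding request. A small sketch:

#include <vector>
#include <cassert>

int main() {
    std::vector<int> v;
    v.reserve(1000);                  // pre-allocate when the size is known up front
    for (int i = 0; i < 1000; ++i) v.push_back(i);

    v.erase(v.begin() + 1, v.end());  // remove everything except the first element
    assert(v.capacity() >= 1000);     // the storage is still there
    v.shrink_to_fit();                // non-binding request to release the excess
}

As for the side question, the initial capacity is implementation-defined; typical growable arrays start with a small fixed block (or allocate nothing until the first insertion) and grow geometrically from there.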

Accessing a read-only global array as fast as possible with CUDA?

I have a huge array that has to be read by different threads in parallel. Each thread has to read different entries at different places in the whole array, from start to finish. The buffer is read-only, so I don't think a "critical section" is required.
I'm afraid this approach (each thread reading scattered entries from global memory) has very bad performance, but I don't see another way to do it. I could load the whole array into shared memory for each block, but I don't think there's enough shared memory for that.
Any ideas?
Edit: Some of you have asked why I have to access different parts of the array, so here is some explanation: I'm trying to implement the "auction algorithm". In one kernel, each thread (person) has to bid on an item, which has a price, depending on its interest in that item. Each thread has to check its interest in a given object from a big array; that is not a problem, and I can coalesce that read through shared memory. The problem is that when a thread has chosen to bid on an item, it has to check the item's price first, and since there are many, many objects to bid on, I can't bring all that information into shared memory. Moreover, each thread has to be able to access the whole buffer of prices, since any thread can bid on any object. My only advantage here is that the buffer is read-only.
The fastest way to access global memory is via coalesced access; however, in your case this may not be possible. You could investigate texture memory, which is read-only and cached, though it is usually used for spatially local 2D access.
Section 3.2 of the CUDA Best Practices Guide has great information about this and other memory techniques.
Reading from shared memory is much faster than reading from global memory. Maybe you can load the subset of the array required by the threads in a block into shared memory. If the threads in a block require values from vastly different parts of the array, you should change your algorithm, since that leads to non-coalesced access, which is slow.
Moreover, while reading from shared memory, be careful of bank conflicts, which occur when two threads read different addresses that fall in the same shared-memory bank. Texture memory may also be a good choice because it is cached.
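To make the read-only suggestion concrete, here is a sketch in CUDA C++. The kernel and buffer names are hypothetical; the const __restrict__ qualifiers, and the explicit __ldg (available on compute capability 3.5 and later), let the compiler route the scattered price loads through the read-only/texture cache:

// Hypothetical kernel: each thread looks up the price of the item it wants
// to bid on. The price buffer is read-only, so it can go through the
// read-only data cache even though the accesses are scattered.
__global__ void check_prices(const float* __restrict__ prices,
                             const int*   __restrict__ wanted_item,
                             float*       bid_out,
                             int          n_threads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_threads) return;

    int item = wanted_item[tid];         // each thread picks its own item
    float price = __ldg(&prices[item]);  // cached read-only load; scattered is OK
    bid_out[tid] = price + 1.0f;         // hypothetical bid increment
}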

Memory footprint of NSDictionary and NSArray

The project I'm working on requires me to temporarily store hundreds and sometimes thousands of entries in a buffer. The easy way is to store each entry in an NSDictionary and all the entries in an NSArray. Each NSDictionary contains about a dozen objects (NSStrings and NSNumbers). During the entire operation, the NSArray with its dictionaries remains in memory hence my question.
Is this an expensive operation in terms of memory usage and what is a good way to test this?
Instruments contains a memory monitoring module. In the bottom-left corner of Instruments, click on the gear icon, then choose Add Instrument > Memory Monitor. Apple's documentation should help you understand how to monitor memory with Instruments. See also this question.
In my experience, NSDictionary and NSArray are both fairly efficient in terms of memory usage. I have written several apps that store thousands of keys/values from .csv or .xml files, and memory usage increases fairly linearly as the NSDictionary fills up. My advice is to profile some corner cases with the Instruments profiler if you can build unit tests for them.
I'm not sure I understand why you're storing the entries in both the NSDictionary and the NSArray, though.
One thing you may want to consider if you're reaching upper bounds on memory usage is to convert the entries into a SQLite database, and then index the columns you want to do lookup on.
EDIT: Be sure to check out this question if you want a deeper understanding of iPhone memory consumption.
Apple's collection classes are more efficient than anything you or I would write, so I wouldn't worry about a dictionary with thousands of small entries. Keep in mind that values aren't copied when added to a dictionary, but keys are. That being said, only keep in memory what you need to.

in-place realloc with gcc/linux

Is there such a thing? I mean some function that would reallocate memory without moving it if possible or do nothing if not possible. In Visual C there is _expand which does what I want. Does anybody know about equivalents for other platforms, gcc/linux in particular? I'm mostly interested in shrinking memory in-place when possible (and standard realloc may move memory even when its size decreases, in case somebody asks).
I know there is no standard way to do this, and I'm explicitly asking for implementation-dependent dirty hackish tricks. List anything you know that works somewhere.
Aside from using mmap and munmap to eliminate the excess you don't need (or mremap, which could do the same but is non-standard), there is no way to reduce the size of an allocated block of memory. And mmap has page granularity (normally 4k), so unless you're dealing with very large objects, using it would be worse than just leaving the over-sized objects alone and not shrinking them at all.
With that said, shrinking memory in-place is probably not a good idea, since the freed memory will be badly fragmented. A good realloc implementation will want to move blocks when significantly shrinking them as an opportunity to defragment memory.
I would guess your situation is that you have an allocated block of memory with lots of other structures holding pointers into it, and you don't want to invalidate those pointers. If this is the case, here is a possible general solution (a code sketch follows the steps):
Break your resizable object up into two allocations, a "head" object of fixed size which points to the second variable-sized object.
For other objects which need to point into the variable-size object, store a pointer to the head object and an integer offset (size_t or ptrdiff_t) into the variable-size object.
Now, even if the variable-size object moves to a new address, none of the references to it are invalidated.
If you're using these objects from multiple threads, you should put a read-write lock in the head object, read-locking it whenever you need to access the variable-sized object, and write-locking it whenever resizing the variable-sized object.
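A minimal sketch of that scheme, assuming a plain byte payload (names are illustrative; single-threaded for brevity, so the read-write lock from the last point is omitted):

#include <cstdlib>
#include <cstddef>

// References store (head, offset) instead of raw pointers into the payload,
// so the payload is free to move when resized.
struct Head {
    char*  payload;  // variable-sized block; may move on resize
    size_t size;
};

struct Ref {
    Head*  head;     // the head object itself never moves
    size_t offset;
    char*  get() const { return head->payload + offset; }  // resolve on each use
};

bool resize(Head* h, size_t new_size) {
    char* p = (char*)std::realloc(h->payload, new_size);
    if (!p && new_size != 0) return false;  // old block is still valid
    h->payload = p;
    h->size = new_size;
    return true;  // every Ref stays valid: lookups go through the head
}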
A similar question was asked on another forum. One of the more reasonable answers I saw involved using mmap for the initial allocation (with the MAP_ANONYMOUS flag) and calling mremap without the MREMAP_MAYMOVE flag. A limitation of this approach, though, is that allocation sizes must be exact multiples of the system's page size.
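That trick might look like the following sketch (Linux-specific; mremap requires _GNU_SOURCE, which g++ defines by default):

#include <sys/mman.h>
#include <unistd.h>

int main() {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t old_len = 16 * page, new_len = 4 * page;

    // Initial allocation via an anonymous private mapping.
    void* p = mmap(nullptr, old_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    // Shrink in place: with flags == 0 the mapping is not allowed to move.
    void* q = mremap(p, old_len, new_len, 0);
    if (q == MAP_FAILED) { munmap(p, old_len); return 1; }
    // q == p here; the trailing pages have been returned to the kernel.

    munmap(q, new_len);
    return 0;
}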
