The project I'm working on requires me to temporarily store hundreds, and sometimes thousands, of entries in a buffer. The easy way is to store each entry in an NSDictionary and all the entries in an NSArray. Each NSDictionary contains about a dozen objects (NSStrings and NSNumbers). During the entire operation, the NSArray with its dictionaries remains in memory, hence my question.
Is this an expensive operation in terms of memory usage and what is a good way to test this?
Instruments contains a memory monitoring module. In the bottom-left corner of Instruments, click on the gear icon, then choose Add Instrument > Memory Monitor. Apple's documentation should help you understand how to monitor memory with Instruments. See also this question.
In my experience, NSDictionary and NSArray are both fairly efficient in terms of memory usage. I have written several apps that store thousands of keys/values from .csv or .xml files, and there's a fairly linear increase in memory usage as the NSDictionary is filled up. My advice is to use the Instruments profiler on some corner cases if you can build unit tests for them.
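If it helps, here's a throwaway sketch of the kind of buffer you'd profile (the entry fields are invented; the real entries would hold about a dozen NSStrings and NSNumbers):

    // Build a buffer of a few thousand small dictionaries and watch it
    // in the Allocations instrument.
    NSUInteger count = 5000;   // "hundreds to thousands" of entries
    NSMutableArray *buffer = [NSMutableArray arrayWithCapacity:count];

    for (NSUInteger i = 0; i < count; i++) {
        NSDictionary *entry = @{
            @"name":  [NSString stringWithFormat:@"entry-%lu", (unsigned long)i],
            @"value": @(i),
            @"flag":  @YES,
        };
        [buffer addObject:entry];
    }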
I'm not sure I understand why you're storing the entries in both the NSDictionary and the NSArray, though.
One thing you may want to consider if you're reaching upper bounds on memory usage is to convert the entries into a SQLite database, and then index the columns you want to do lookup on.
EDIT: Be sure to check out this question if you want to go deep in understanding iPhone memory consumption.
Apple's collection classes are more efficient than anything you or I would write, so I wouldn't worry about a dictionary with thousands of small entries. Keep in mind that the values aren't copied when added to a dictionary, but the keys are. That being said, only keep in memory what you need to.
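For illustration, a minimal sketch of that copy-vs-retain behaviour (the key and value names are made up): keys must conform to NSCopying and are copied on insertion, while values are only retained.

    NSMutableDictionary *entry = [NSMutableDictionary dictionary];

    NSMutableString *key   = [NSMutableString stringWithString:@"speed"];
    NSNumber        *value = @42;

    entry[key] = value;               // 'key' is copied into the dictionary
    [key appendString:@"_changed"];   // the stored key is unaffected by this mutation

    NSLog(@"%@", entry[@"speed"]);    // still prints 42 -- lookup uses the copied key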
I'm getting the following timeline from a very simple site I'm working on. Chrome tells me it's cleaning up 10MB via the GC but I don't know how it could be! Any insight into this would be appreciated.
I thought SO would let you expand the image but I guess not - here's the full size: http://i.imgur.com/FZJiGlH.png
Is this a site that we can check out or is it internal? I'd like to take a peek. I came across the excerpt below while googling on the Google Developers pages, Memory Analysis 101:
Object Sizes
Memory can be held by an object in two ways: directly by the object itself, and implicitly by holding references to other objects, and thus preventing them from being automatically disposed by a garbage collector (GC for short).

The size of memory that is held by the object itself is called shallow size. Typical JavaScript objects have some memory reserved for their description and for storing immediate values. Usually, only arrays and strings can have significant shallow sizes. However, strings often have their main storage in renderer memory, exposing only a small wrapper object on the JavaScript heap.

Nevertheless, even a small object can hold a large amount of memory indirectly, by preventing other objects from being disposed by the automatic garbage collection process. The size of memory that will be freed, when the object itself is deleted and its dependent objects made unreachable from GC roots, is called retained size.
The last bit seems to be potentially your issue.
Investigating memory issues
If you're inclined, you might want to enable this feature of Chrome, chrome --enable-memory-info, and take a peek behind the curtain to see what Chrome is potentially getting hung up on.
Once you have Chrome running with the memory profiling enabled you’ll have access to two new properties:
window.performance.memory.totalJSHeapSize; // total heap memory
window.performance.memory.usedJSHeapSize; // currently used heap memory
This feature is covered in more detail here.
I have a huge array that has to be read by different threads in parallel. Each thread has to read different entries at different places in the whole array, from start to finish. The buffer is read-only, so I don't think a "critical section" is required.
I'm afraid that this approach will have very bad performance, but I don't see another way to do it. I could load the whole array into shared memory for each block, but I don't think there's enough shared memory for that.
Any ideas?
Edit: Some of you have asked me why I have to access different parts of the array, so here is some explanation: I'm trying to implement the "auction algorithm". In one kernel, each thread (person) has to bid on an item, which has a price, depending on its interest for that item. Each thread has to check its interest for a given object in a big array, but that is not a problem and I can coalesce the reading in shared memory. The problem is when a thread has chosen to bid for an item, it has to check its price first, and since there are many many objects to bid for, I can't bring all this info into shared memory. Moreover, each thread has to access the whole buffer of prices since they can bid on any object. My only advantage here is that the buffer is read-only.
The fastest way to access global memory is via coalesced access; however, in your case this may not be possible. You could investigate texture memory, which is read-only, though it is usually used for spatial 2D access.
Section 3.2 of the CUDA Best Practices Guide has great information about this and other memory techniques.
Reading from shared memory is much faster than reading from global memory. Maybe you can load the subset of the array that is required by the threads in a block into shared memory. If the threads in a block require values from vastly different parts of the array, you should change your algorithm, as that leads to non-coalesced access, which is slow.
Moreover, while reading from shared memory, be careful of bank conflicts, which occur when two threads read from the same bank in shared memory. Texture memory may also be a good choice because it is cached.
I'm using ARC and storyboards on my new iPad project. I get no memory leaks when I analyze with Instruments, but I'm seeing heap growth of 6KB-10KB on each switch between UIViewControllers. I'm using the storyboard's built-in methods to do the switches.
Why do I get an increase of 6-10KB? I know 6-10KB is not much, but I can't understand where it is coming from.
/Morten
This kind of memory usage could easily be attributed to the usual allocations that occur while presenting a new view controller. The memory needed for the new CALayers, UIViews, etc. alone could be enough to account for it. It may also be that, since you are using a storyboard, certain pieces of the nib files are getting loaded into memory indefinitely.
These two factors are more than enough to explain why the memory is being allocated. Usually I wouldn't stop to worry about something like 10KB, considering that most iOS devices have around 250MB of memory at your disposal.
I wonder what the guidelines are for:
1 - how often I can read from NSUserDefaults
2 - how much data I can reasonably store in NSUserDefaults
Obviously, there are limits to how much NSUserDefaults can be used but I have trouble determining what's reasonable and what isn't.
Some examples among others:
If my game has an option for the computer to be one of the players, I will use NSUserDefaults to save that boolean value. That much is clear. But is it also reasonable to access NSUserDefaults during my game every time I want to know whether the computer is a player, or should I be using an instance variable for that instead? Assume here I need to check that boolean every second. Is the answer the same if it's 100 ms instead? What about every 10 s?
If my game has 50 moving objects and I want their positions and speeds to be stored when the user quits the app, is NSUserDefaults a reasonable place to store that data? What about 20 moving objects? What about 200?
I wonder what the guidelines are for:
1 - how often I can read from NSUserDefaults
Quite regularly. Expect the defaults' overhead to be similar to that of a thread-safe NSDictionary.
2 - how much data I can reasonably store in NSUserDefaults
Physically, more than you'll need it to. The logical maximum is how fast you need it to be, and how much space it takes on disk. Also remember that this representation is read from and written to disk at startup/shutdown and at various other times.
If my game has an option for the computer to be one of the players, I will use NSUserDefaults to save that boolean value. That much is clear. But is it also reasonable to access NSUserDefaults during my game every time I want to know whether the computer is a player or should I be using an instance variable for that instead?
Just add a const bool to the opponent object. Zero runtime cost, apart from the memory, which will not be significant.
Assume here I need to check that boolean every second. Is the answer the same is it's 100 ms instead? What about every 10 s?
Again, it's like a thread-safe NSDictionary (hashing). It will be fairly fast, and fast enough for reading at that frequency. Whether it's the best design or not depends on the program. If it becomes huge, then yes, the performance will suffer.
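A minimal sketch of that idea (the class and key names are hypothetical): read the preference once when the opponent is created, then check the cached value every frame instead of hitting NSUserDefaults.

    #import <Foundation/Foundation.h>

    @interface Opponent : NSObject
    @property (nonatomic, readonly) BOOL isComputer;   // cached copy of the preference
    @end

    @implementation Opponent {
        BOOL _isComputer;
    }

    - (instancetype)init {
        if ((self = [super init])) {
            // Hypothetical key; NSUserDefaults is consulted only once, not every frame.
            _isComputer = [[NSUserDefaults standardUserDefaults]
                              boolForKey:@"ComputerIsPlayer"];
        }
        return self;
    }

    - (BOOL)isComputer {
        return _isComputer;   // no NSUserDefaults lookup in the game loop
    }
    @end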
If my game has 50 moving objects and I want their positions and speeds to be stored when the user quits the app, is NSUserDefaults a reasonable place to store that data? What about 20 moving objects? What about 200?
It would be fine, although I would not read/write via user defaults during gameplay; just save/load the state as needed.
I don't recommend saving all this in user defaults. Just create a file representation for your game's state and use user defaults for what it's designed for. If it's huge and you write to it often, the implementation may flush the state to disk regularly, which could take a relatively long time.
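As a rough sketch of that file-based approach (the file name and entry fields are invented), a plist in the Documents directory works fine for this kind of state:

    // Save on quit/background: each entry is a small dictionary of plist-friendly types.
    NSURL *docs = [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                          inDomains:NSUserDomainMask] firstObject];
    NSURL *stateURL = [docs URLByAppendingPathComponent:@"GameState.plist"];

    NSArray *state = @[
        @{ @"id": @1, @"x": @10.5, @"y": @3.2, @"speed": @0.8 },
        @{ @"id": @2, @"x": @42.0, @"y": @7.7, @"speed": @1.3 },
    ];
    [state writeToURL:stateURL atomically:YES];

    // ...and load it back on launch.
    NSArray *restored = [NSArray arrayWithContentsOfURL:stateURL];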
Don't worry about limits. Instead, ask yourself this simple question:
Is this a preference?
If it is a preference, then it should be in user defaults. That's what user defaults is for. If not, then it should be in the Documents directory (or, on the Mac, possibly in Application Support).
On iOS, you might tell whether it's a preference or not by whether it would be appropriate to (if possible) put it in your settings bundle for display and editing in the Settings application. On Mac OS X, you can usually tell whether it's a preference or not by whether it would be appropriate to put it in the Preferences window.
Of course, that relies on your judgment. Stanza for Mac, for example, gets it wrong, putting non-preferences in its Preferences window.
You can also consider the question by its converse:
Is this user-created data?
A preference that you will have a default value for is not user-created data; it is user-overridden data. It is no less bad to lose it, but it does inform where you should keep it.
The main performance issue no-one here is mentioning is that the user’s home directory may be on a network volume, and may not be particularly fast. It’s not an ideal situation, but it happens, so if you’re worried about performance that’s what you should be testing against.
That said, NSUserDefaults uses an in-memory cache, and the cost is only incurred when synchronizing. According to the documentation, synchronization happens “automatically … at periodic intervals”; I believe this only applies if something has changed, though.
So, for the case of checking whether the computer is a player, using NSUserDefaults once a frame shouldn’t be a problem since it’s cached. For storing game state, it may be a performance problem if you updated it constantly, and as Peter Hosey says it’s a semantic abuse.
Hundreds to thousands of items are fine in NSUserDefaults (it's basically just a wrapper around property list serialization). With respect to the overhead for your application, the best thing to do is try it and use a profiler.
NSUserDefaults is basically a wrapper for loading an NSDictionary from a .plist file on disk (and also writing it back to disk). You can store as much data in NSUserDefaults as you like, but you have little control over how much memory it uses or when it reads from and writes to the disk.
I would use different technologies for different information/data.
Small bits of data from servers, preferences, user info, et cetera, I would use NSUserDefaults.
For login information (access tokens, sensitive data), I would use the keychain. The keychain could also be used for data that should not be deleted when the app is deleted.
For large amounts of server data or game data, I would write it to the disk, but keep it in memory.
In your situation, I would keep it in memory (probably a @property), but I would periodically write it to disk (perhaps every 1 to 5 times it changes; use an int ivar to count). Make sure this disk-writing method lives in the AppDelegate, so that it doesn't go away when you close the view controller that would otherwise be executing it.
This way, the data is easily accessed, but its also saved to the disk for safe keeping.
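A rough sketch of that pattern (all names here are made up; adapt them to your own model): count changes and only write the data out every few mutations.

    #import <UIKit/UIKit.h>

    @interface AppDelegate : UIResponder <UIApplicationDelegate>
    @property (nonatomic, strong) NSMutableArray *serverData;   // kept in memory for fast access
    @end

    @implementation AppDelegate {
        int _changesSinceLastSave;
    }

    - (void)recordChange {
        if (++_changesSinceLastSave >= 5) {      // write every ~5 changes, not on every change
            [self saveServerData];
            _changesSinceLastSave = 0;
        }
    }

    - (void)saveServerData {
        NSURL *docs = [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                              inDomains:NSUserDomainMask] firstObject];
        NSURL *url  = [docs URLByAppendingPathComponent:@"ServerData.plist"];
        [self.serverData writeToURL:url atomically:YES];   // assumes plist-friendly contents
    }
    @end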
I need to store large amounts of data on-disk in approximately 1k blocks. I will be accessing these objects in a way that is hard to predict, but where patterns probably exist.
Is there an algorithm or heuristic I can use that will rearrange the objects on disk based on my access patterns to try to maximize sequential access, and thus minimize disk seek time?
On modern OSes (Windows, Linux, etc) there is absolutely nothing you can do to optimise seek times! Here's why:
You are in a pre-emptive multitasking system. Your application and all its data can be flushed to disk at any time - the user switches task, the screen saver kicks in, the battery runs out of charge, etc.
You cannot guarantee that the file is contiguous on disk. Doing Aaron's first bullet point will not ensure an unfragmented file. When you start writing the file, the OS doesn't know how big the file is going to be so it could put it in a small space, fragmenting it as you write more data to it.
Memory mapping the file only works as long as the file size is less than the available address range in your application. On Win32, the amount of address space available is about 2Gb - memory used by application. Mapping larger files usually involves un-mapping and re-mapping portions of the file, which won't be the best of things to do.
Putting data in the centre of the file is no help as, for all you know, the central portion of the file could be the most fragmented bit.
To paraphrase Raymond Chen, if you have to ask about OS limits, you're probably doing something wrong. Treat your filesystem as an immutable black box, it just is what it is (I know, you can use RAID and so on to help).
The first step you must take (and must be taken whenever you're optimising) is to measure what you've currently got. Never assume anything. Verify everything with hard data.
From your post, it sounds like you haven't actually written any code yet, or, if you have, there is no performance problem at the moment.
The only real solution is to look at the bigger picture and develop methods to get data off the disk without stalling the application. This would usually be through asynchronous access and speculative loading. If your application is always accessing the disk and doing work with small subsets of the data, you may want to consider reorganising the data to put all the useful stuff in one place and the other data elsewhere. Without knowing the full problem domain, it's not possible to be really helpful.
Depending on what you mean by "hard to predict", I can think of a few options:
If you always seek based on the same block field/property, store the records on disk sorted by that field. This lets you use binary search for O(log n) efficiency (see the sketch after this list).
If you seek on different block fields, consider storing an external index for each field. A b-tree gives you O(log n) efficiency. When you seek, grab the appropriate index, search it for your block's data file address and jump to it.
Better yet, if your blocks are homogeneous, consider breaking them down into database records. A database gives you optimized storage, indexing, and the ability to perform advanced queries for free.
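As a sketch of the first option (the record layout and field names are invented): with fixed-size records sorted by the key field, a binary search over the file needs only O(log n) seeks.

    #include <stdio.h>
    #include <stdint.h>

    typedef struct {
        uint64_t key;            /* the field the file is sorted by */
        char     payload[1016];  /* rest of the ~1k block           */
    } Record;                    /* 1024 bytes total                */

    /* Returns 1 and fills *out if found, 0 otherwise. */
    int find_record(FILE *f, uint64_t key, Record *out) {
        fseek(f, 0, SEEK_END);
        long count = ftell(f) / (long)sizeof(Record);
        long lo = 0, hi = count - 1;

        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;
            fseek(f, mid * (long)sizeof(Record), SEEK_SET);
            if (fread(out, sizeof(Record), 1, f) != 1) return 0;

            if (out->key == key)     return 1;
            else if (out->key < key) lo = mid + 1;
            else                     hi = mid - 1;
        }
        return 0;
    }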
Use memory-mapped file access rather than the usual open-seek-read/write pattern. This technique works on Windows and Unix platforms.
In this way the operating system's virtual memory system will handle the caching for you. Accesses of blocks that are already in memory will result in no disk seek or read time. Writes from memory back to disk are handled automatically and efficiently and without blocking your application.
Aaron's notes are good too, as they will affect initial-load time for a chunk that's not in memory. Combine that with the memory-mapped technique -- after all, it's easier to reorder chunks using memcpy() than by reading/writing from disk and attempting swapouts, etc.
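For the Unix side, a minimal sketch of the memory-mapped approach (error handling trimmed; the file name is arbitrary):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("blocks.dat", O_RDWR);          /* file of ~1k blocks  */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0) return 1;

        char *base = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) return 1;

        /* Block i lives at base + i * 1024: touching it faults the page in,
           and the kernel writes dirty pages back without blocking the app. */
        char *block = base + 42 * 1024;
        block[0] ^= 1;                                /* read-modify-write   */

        msync(base, st.st_size, MS_ASYNC);            /* schedule write-back */
        munmap(base, st.st_size);
        close(fd);
        return 0;
    }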
The simplest way to solve this is to use an OS which solves it for you under the hood, like Linux. Give it enough RAM to hold 10% of the objects and it will try to keep as many of them in the cache as possible, reducing the load time to practically zero. The recent server versions of Windows might work, too (some of them didn't for me, which is why I'm mentioning this).
If this is a no go, try this algorithm:
Create a very big file on the hard disk. It is very important that you write this in one go so the OS will allocate contiguous space on disk.
Write all your objects into that file. Make sure that each object is the same size (or give each the same space in the file and note the length in the first few bytes of each chunk). Use an empty hard disk, or a disk which has just been defragmented.
In a data structure, keep the offsets of each data chunk and how often it is accessed. When a chunk is accessed very often, swap its position in the file with a chunk that is closer to the start of the file and has a lower access count.
[EDIT] Access this file with the memory-mapped API of your OS so the OS can cache the most-used parts effectively, giving you good performance until the next time you optimize the file layout.
Over time, heavily accessed chunks will bubble to the top. Note that you can collect the access patterns over some time, analyze them, and do the reordering overnight when there is little load on your machine. Or you can do the reordering on a completely different machine and swap in the file (and the offset table) when that's done.
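A rough sketch of the bookkeeping for step 3 (the chunk size and structure names are invented): keep an offset/access-count table, and when a chunk becomes hotter than the one just before it, swap the two on disk.

    #include <stdio.h>

    #define CHUNK_SIZE 1024

    typedef struct {
        long offset;        /* where the chunk currently lives in the file */
        long access_count;  /* how often it has been read                  */
    } ChunkInfo;

    /* Swap the on-disk contents of two chunks and their recorded offsets. */
    static void swap_chunks(FILE *f, ChunkInfo *a, ChunkInfo *b) {
        char buf_a[CHUNK_SIZE], buf_b[CHUNK_SIZE];

        fseek(f, a->offset, SEEK_SET); fread(buf_a, CHUNK_SIZE, 1, f);
        fseek(f, b->offset, SEEK_SET); fread(buf_b, CHUNK_SIZE, 1, f);

        fseek(f, a->offset, SEEK_SET); fwrite(buf_b, CHUNK_SIZE, 1, f);
        fseek(f, b->offset, SEEK_SET); fwrite(buf_a, CHUNK_SIZE, 1, f);

        long tmp = a->offset; a->offset = b->offset; b->offset = tmp;
    }

    /* Called on every access; 'chunks' is ordered by file position, so
       chunks[hot - 1] is the chunk sitting just before the one read.   */
    void note_access(FILE *f, ChunkInfo *chunks, long hot) {
        chunks[hot].access_count++;
        if (hot > 0 && chunks[hot].access_count > chunks[hot - 1].access_count) {
            swap_chunks(f, &chunks[hot], &chunks[hot - 1]);
            ChunkInfo tmp   = chunks[hot];       /* keep the table ordered by  */
            chunks[hot]     = chunks[hot - 1];   /* file position after a swap */
            chunks[hot - 1] = tmp;
        }
    }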
That said, you should really rely on a modern OS where a lot of clever people have thought long and hard to solve these issues for you.
That's an interesting challenge. Unfortunately, I don't know how to solve this out of the box, either. Corbin's approach sounds reasonable to me.
Here's a little optimization suggestion, at least: place the most-accessed items at the center of your disk (or unfragmented file), not at the start or end. That way, seeks to lesser-used data will be shorter on average. Err, that's pretty obvious, though.
Please let us know if you figure out a solution yourself.