I wonder what the guidelines are for:
1 - how often I can read from NSUserDefaults
2 - how much data I can reasonably store in NSUserDefaults
Obviously, there are limits to how much NSUserDefaults can be used but I have trouble determining what's reasonable and what isn't.
Some examples among others:
If my game has an option for the computer to be one of the players, I will use NSUserDefaults to save that boolean value. That much is clear. But is it also reasonable to access NSUserDefaults during my game every time I want to know whether the computer is a player or should I be using an instance variable for that instead? Assume here I need to check that boolean every second. Is the answer the same is it's 100 ms instead? What about every 10 s?
If my game has 50 moving objects and I want their positions and speeds to be stored when the user quits the app, is NSUserDefaults a reasonable place to store that data? What about 20 moving objects? What about 200?
I wonder what the guidelines are for:
1 - how often I can read from NSUserDefaults
quite regularly. expect the defaults' overhead to be similar to a thread-safe NSDictionary
2 - how much data I can reasonably store in NSUserDefaults
physically, more than you'll need it to. the logical maximum is how fast you need it to be, and how much space it takes on disk. also remember that this representation is read written to/from disk at startup/shut down and various other times.
If my game has an option for the computer to be one of the players, I will use NSUserDefaults to save that boolean value. That much is clear. But is it also reasonable to access NSUserDefaults during my game every time I want to know whether the computer is a player or should I be using an instance variable for that instead?
just add a const bool to the opponent object. zero runtime loss, apart from the memory, which will not be significant.
Assume here I need to check that boolean every second. Is the answer the same is it's 100 ms instead? What about every 10 s?
again, it's like a thread-safe NSDictionary (hashing). it will be fairly fast, and fast enough for reading at that frequency. whether it's the best design or not depends on the program. if it becomes huge, then yes the performance will suffer.
If my game has 50 moving objects and I want their positions and speeds to be stored when the user quits the app, is NSUserDefaults a reasonable place to store that data? What about 20 moving objects? What about 200?
it would be fine, although i would not read/write via user defaults during game-play; just save/load the state as needed.
i don't recommend saving all this in user defaults. just create a file representation for your game's state and use user defaults for what it's designed for. if it's huge and you write to it often, then the implementation may flush the state to disk regularly, which could take a relatively long time.
Don't worry about limits. Instead, ask yourself this simple question:
Is this a preference?
If it is a preference, then it should be in user defaults. That's what user defaults is for. If not, then it should be in the Documents directory (or, on the Mac, possibly in Application Support).
On iOS, you might tell whether it's a preference or not by whether it would be appropriate to (if possible) put it in your settings bundle for display and editing in the Settings application. On Mac OS X, you can usually tell whether it's a preference or not by whether it would be appropriate to put it in the Preferences window.
Of course, that relies on your judgment. Stanza for Mac, for example, gets it wrong, putting non-preferences in its Preferences window.
You can also consider the question by its converse:
Is this user-created data?
A preference that you will have a default value for is not user-created data; it is user-overridden data. No less bad to lose it, but it informs where you should keep it.
The main performance issue no-one here is mentioning is that the user’s home directory may be on a network volume, and may not be particularly fast. It’s not an ideal situation, but it happens, so if you’re worried about performance that’s what you should be testing against.
That said, NSUserDefaults uses an in-memory cache, and the cost is only incurred when synchronizing. According to the documentation, synchronization happens “automatically … at periodic intervals”; I believe this only applies if something has changed, though.
So, for the case of checking whether the computer is a player, using NSUserDefaults once a frame shouldn’t be a problem since it’s cached. For storing game state, it may be a performance problem if you updated it constantly, and as Peter Hosey says it’s a semantic abuse.
Hundreds to thousands of items are fine in NSUserDefaults (it's basically just a wrapper around property list serialization). With respect to the overhead for your application, the best thing to do is try it and use a profiler.
NSUserDefaults is basically a wrapper for loading a NSDictionary from a .plist file from the disk (and also writing it to the disk). You can store as much data in NSUserDefaults, but you have little control over how much memory it uses, and how it reads from the disk.
I would use different technologies for different information/data.
Small bits of data from servers, preferences, user info, et cetera, I would use NSUserDefaults.
For login information (access tokens, sensitive data), I would use the keychain. The keychain could also be used for data that should not be deleted when the app is deleted.
For large amounts of server data or game data, I would write it to the disk, but keep it in memory.
In your situation, I would keep it in memory (probably a #property), but I would periodically write it to the disk (perhaps every 1 to 5 times it changes, use an int ivar). Make sure that this disk writing method is in the AppDelegate, so that it won't fail when you close the view controller that is executing it.
This way, the data is easily accessed, but its also saved to the disk for safe keeping.
Related
I am building a ML application for binary classification using ML.NET. It will have multiple ML models of varying sizes (built using different training data) which will be stored in SQL server database as Blob. Clients will send items for classification to this app in random order and based on client ID, corresponding model is to be used for classification. To classify item, model needs be read from database and then loaded into memory. Loading model in memory is taking considerable time depending on size and I don't see any way to optimize it. Hence I am planning to cache models in memory. If I cache many heavy models, it may put pressure on memory hampering performance of other processes running on server. So there is no straightforward way to limit caching. So looking for suggestions to handle this.
Spawn a new process
In my opinion this is the only viable option to accomplish what you're trying to do. Spawn a complete new process that communicates (via IPC?) with your "main application". You could set a memory limit using this property https://learn.microsoft.com/en-us/dotnet/api/system.gcmemoryinfo.totalavailablememorybytes?view=net-5.0 or maybe even use a 3rd-party-library (e.g. https://github.com/lowleveldesign/process-governor), that kills your process if it reaches a specific amount of RAM. Both of these approaches are quite rough and will basically kill your process.
If you have control over your side car application running, it might make sense to really monitor the RAM usage with something like this Getting a process's ram usage and gracefully stop the process.
Do it yourself solution (not recommended)
Basically there is no built in way of limiting memory usage by thread or similar.
What counts towards the memory limit?
Shared resources
Since you have a running process, you need to define what exactly counts towards the memory limit. For example if you have some static Dictionary that is manipulated by the running thread - what did it occupy? Only the diff between the old value and the new value? The whole new value? The key and the value?
There are many more cases like this you'll have to take into consideration.
The actual measuring
You need some kind of way to count the actual memory usage. This will probably be hard/near impossible to "implement":
Reference counting needed?
If you have a hostile thread, it might spawn an infinite amount of references to one object, no new keyword used. For each reference you'd have to count 32/64 bits.
What about built in types?
It might be "easy" to measure a byte[] included in your own type definition, but what about built in classes? If someone initializes a string with 100MB this might be an amount you need to keep track of.
... and many more ...
As you maybe noticed with previous samples, there is no easy definition of "RAM used by a thread". This is the reason there also is no easy to get the value of it.
In my opinion it's insanely complex to do such a thing and needs a lot of definition work to do on your side. It might be feasable with lots of effort but I'm not sure if that really is what you want. Even if you manage to - what will do you about it? Only killing the thread might not clean up the ressources.
Therefore I'd really think about having a OS managed, independent, process, that you can kill whenever you feel like it.
How big are your models? Even large models 100meg+ load pretty quickly off of fast/SSD storage. I would consider caching them on fast drives/SSDs, because pulling off of SQL Server is going to be much slower than raw disk. See if this helps your performance.
I am working on an analysis tool that reads output from a process and continuously converts this to an internal format. After the "logging phase" is complete, analysis is done on the data. The data is all held in memory.
However, due to the fact that all logged information is held in memory, there is a limit on the duration of the logging. For most use cases this is ok, but it should be possible to run for longer, even if this will hurt performance.
Ideally, the program should be able to start using hard drive space in addition to RAM once the RAM usage reaches a certain limit.
This leads to my question:
Are there any existing solutions for doing this? It has to work on both Unix and Windows.
To use the disk after memory is full, we use Cache technologies such as EhCache. They can be configured with the amount of memory to use, and to overflow to disk.
But they also have smarter algorithms you can configure as needed, such as sending to disk data not used in the last 10 minutes etc... This could be a plus for you.
Without knowing more about your application it is not possible to provide a perfect answer. However it does sound a bit like you are re-inventing the wheel. Have you considered using an in-process database library like sqlite?
If you used that or similar it will take care of moving the data to and from the disk and memory and give you powerful SQL query capabilities at the same time. Even if your logging data is in a custom format if each item has a key or index of some kind a small light database may be a good fit.
This might seem too obvious, but what about memory mapped files? This does what you want and even allows a 32 bit application to use much more than 4GB of memory. The principle is simple, you allocate the memory you need (on disk) and then map just a portion of that into system memory. You could, for example, map something like 75% of the available physical memory size. Then work on it, and when you need another portion of the data, just re-map. The downside to this is that you have to do the mapping manually, but that's not necessarily bad. The good thing is that you can use more data than what fits into physical memory and into the per-process memory limit. It works really great if you actually use only part of the data at any given time.
There may be libraries that do this automatically, like the one KLE suggested (though I do not know that one). Doing it manually means you'll learn a lot about it and have more control, though I'd prefer a library if it does exactly what you want with regard to how and when the disk is being used.
This works similar on both Windows on Unix. For Windows, here is an article by Raymond Chen that shows a simple example.
Cocoa provides for page-aligned memory areas that it calls Memory Zones, and provides a few memory management functions that take a zone as an argument.
Let's assume you need to allocate a block of memory (not for an object, but for arbitrary data). If you call malloc(size), the buffer will always be allocated in the default zone. However, somebody may have used allocWithZone: to allocate your object in another zone besides the default. In that case, it would seem better to use NSZoneMalloc([self zone], size), which keeps your buffer and owning object in the same area of memory.
Do you follow this practice? Have you ever made use of memory zones?
Update: I think there is a tendency on Stack Overflow to respond to questions about low-level topics with a lecture about premature optimization. I understand that zones probably mattered more in 1993 on NeXT hardware than they do today, and a Google search makes it pretty clear that virtually nobody is concerned with them. I am asking anyway, to see if somebody could describe a project where they made use of memory zones.
I've written software for NeXTStep, GNUstep on Linux and Cocoa on Mac OS X, and have never needed to use custom memory zones. The condition which would suggest it as a good improvement to the software has either never arisen, or never been detected as significant.
You're absolutely right in your entire question, but in practice, nobody really uses zones. As the page you link to puts it:
In most circumstances, using the default zone is faster and more efficient than creating a separate zone.
The benefit of making your own zone is:
If a page fault occurs when trying to access one of the objects, loading the page brings in all of the related objects, which could significantly reduce the number of future page faults.
If a page fault occurs, that means that the system was recently paging things out and is therefore slow anyway, and that either your app is not responsible or the solution is in the part of your app that allocated too much memory at once in the first place.
So, basically, the question is “can you prove that you really do need to create your own zone to fix a performance problem or make your app wicked fast”, and the answer is “no”.
If you find yourself doing this, you're probably operating at a lower level than you really ought to be. The subsystem pretty much ignores them; any calls to +alloc or such will get you objects in the default zone. malloc and NSAllocateCollectable are all you need to know.
I need to store large amounts of data on-disk in approximately 1k blocks. I will be accessing these objects in a way that is hard to predict, but where patterns probably exist.
Is there an algorithm or heuristic I can use that will rearrange the objects on disk based on my access patterns to try to maximize sequential access, and thus minimize disk seek time?
On modern OSes (Windows, Linux, etc) there is absolutely nothing you can do to optimise seek times! Here's why:
You are in a pre-emptive multitasking system. Your application and all it's data can be flushed to disk at any time - user switches task, screen saver kicks in, battery runs out of charge, etc.
You cannot guarantee that the file is contiguous on disk. Doing Aaron's first bullet point will not ensure an unfragmented file. When you start writing the file, the OS doesn't know how big the file is going to be so it could put it in a small space, fragmenting it as you write more data to it.
Memory mapping the file only works as long as the file size is less than the available address range in your application. On Win32, the amount of address space available is about 2Gb - memory used by application. Mapping larger files usually involves un-mapping and re-mapping portions of the file, which won't be the best of things to do.
Putting data in the centre of the file is no help as, for all you know, the central portion of the file could be the most fragmented bit.
To paraphrase Raymond Chen, if you have to ask about OS limits, you're probably doing something wrong. Treat your filesystem as an immutable black box, it just is what it is (I know, you can use RAID and so on to help).
The first step you must take (and must be taken whenever you're optimising) is to measure what you've currently got. Never assume anything. Verify everything with hard data.
From your post, it sounds like you haven't actually written any code yet, or, if you have, there is no performance problem at the moment.
The only real solution is to look at the bigger picture and develop methods to get data off the disk without stalling the application. This would usually be through asynchronous access and speculative loading. If your application is always accessing the disk and doing work with small subsets of the data, you may want to consider reorganising the data to put all the useful stuff in one place and the other data elsewhere. Without knowing the full problem domain it's not possible to to be really helpful.
Depending on what you mean by "hard to predict", I can think of a few options:
If you always seek based on the same block field/property, store the records on disk sorted by that field. This lets you use binary search for O(log n) efficiency.
If you seek on different block fields, consider storing an external index for each field. A b-tree gives you O(log n) efficiency. When you seek, grab the appropriate index, search it for your block's data file address and jump to it.
Better yet, if your blocks are homogeneous, consider breaking them down into database records. A database gives you optimized storage, indexing, and the ability to perform advanced queries for free.
Use memory-mapped file access rather than the usual open-seek-read/write pattern. This technique works on Windows and Unix platforms.
In this way the operating system's virtual memory system will handle the caching for you. Accesses of blocks that are already in memory will result in no disk seek or read time. Writes from memory back to disk are handled automatically and efficiently and without blocking your application.
Aaron's notes are good too as they will affect initial-load time for a chunk that's not in memory. Combine that with the memory-mapped technique -- after all it's easier to reorder chunks using memcpy() than by reading/writing from disk and attempting swapouts etc.
The most simple way to solve this is to use an OS which solves that for you under the hood, like Linux. Give it enough RAM to hold 10% of the objects in RAM and it will try to keep as many of them in the cache as possible reducing the load time to 0. The recent server versions of Windows might work, too (some of them didn't for me, that's why I'm mentioning this).
If this is a no go, try this algorithm:
Create a very big file on the harddisk. It is very important that you write this in one go so the OS will allocate a continuous space on disk.
Write all your objects into that file. Make sure that each object is the same size (or give each the same space in the file and note the length in the first few bytes of of each chunk). Use an empty harddisk or a disk which has just been defragmented.
In a data structure, keep the offsets of each data chunk and how often it is accessed. When it is accessed very often, swap its position in the file with a chunk that is closer to the start of the file and which has a lesser access count.
[EDIT] Access this file with the memory-mapped API of your OS to allow the OS to effectively cache the most used parts to get best performance until you can optimize the file layout next time.
Over time, heavily accessed chunks will bubble to the top. Note that you can collect the access patterns over some time, analyze them and do the reorder over night when there is little load on your machine. Or you can do the reorder on a completely different machine and swap the file (and the offset table) when that's done.
That said, you should really rely on a modern OS where a lot of clever people have thought long and hard to solve these issues for you.
That's an interesting challenge. Unfortunately, I don't know how to solve this out of the box, either. Corbin's approach sounds reasonable to me.
Here's a little optimization suggestion, at least: Place the most-accessed items at the center of your disk (or unfragmented file), not at the start of end. That way, seeking to lesser-used data will be closer by average. Err, that's pretty obvious, though.
Please let us know if you figure out a solution yourself.
In a typical handheld/portable embedded system device Battery life is a major concern in design of H/W, S/W and the features the device can support. From the Software programming perspective, one is aware of MIPS, Memory(Data and Program) optimized code.
I am aware of the H/W Deep sleep mode, Standby mode that are used to clock the hardware at lower Cycles or turn of the clock entirel to some unused circutis to save power, but i am looking for some ideas from that point of view:
Wherein my code is running and it needs to keep executing, given this how can I write the code "power" efficiently so as to consume minimum watts?
Are there any special programming constructs, data structures, control structures which i should look at to achieve minimum power consumption for a given functionality.
Are there any s/w high level design considerations which one should keep in mind at time of code structure design, or during low level design to make the code as power efficient(Least power consuming) as possible?
Like 1800 INFORMATION said, avoid polling; subscribe to events and wait for them to happen
Update window content only when necessary - let the system decide when to redraw it
When updating window content, ensure your code recreates as little of the invalid region as possible
With quick code the CPU goes back to deep sleep mode faster and there's a better chance that such code stays in L1 cache
Operate on small data at one time so data stays in caches as well
Ensure that your application doesn't do any unnecessary action when in background
Make your software not only power efficient, but also power aware - update graphics less often when on battery, disable animations, less hard drive thrashing
And read some other guidelines. ;)
Recently a series of posts called "Optimizing Software Applications for Power", started appearing on Intel Software Blogs. May be of some use for x86 developers.
Zeroith, use a fully static machine that can stop when idle. You can't beat zero Hz.
First up, switch to a tickless operating system scheduler. Waking up every millisecend or so wastes power. If you can't, consider slowing the scheduler interrupt instead.
Secondly, ensure your idle thread is a power save, wait for next interrupt instruction.
You can do this in the sort of under-regulated "userland" most small devices have.
Thirdly, if you have to poll or perform user confidence activities like updating the UI,
sleep, do it, and get back to sleep.
Don't trust GUI frameworks that you haven't checked for "sleep and spin" kind of code.
Especially the event timer you may be tempted to use for #2.
Block a thread on read instead of polling with select()/epoll()/ WaitForMultipleObjects().
Puts stress on the thread scheuler ( and your brain) but the devices generally do okay.
This ends up changing your high-level design a bit; it gets tidier!.
A main loop that polls all the things you Might do ends up slow and wasteful on CPU, but does guarantee performance. ( Guaranteed to be slow)
Cache results, lazily create things. Users expect the device to be slow so don't disappoint them. Less running is better. Run as little as you can get away with.
Separate threads can be killed off when you stop needing them.
Try to get more memory than you need, then you can insert into more than one hashtable and save ever searching. This is a direct tradeoff if the memory is DRAM.
Look at a realtime-ier system than you think you might need. It saves time (sic) later.
They cope better with threading too.
Do not poll. Use events and other OS primitives to wait for notifiable occurrences. Polling ensures that the CPU will stay active and use more battery life.
From my work using smart phones, the best way I have found of preserving battery life is to ensure that everything you do not need for your program to function at that specific point is disabled.
For example, only switch Bluetooth on when you need it, similarly the phone capabilities, turn the screen brightness down when it isn't needed, turn the volume down, etc.
The power used by these functions will generally far outweigh the power used by your code.
To avoid polling is a good suggestion.
A microprocessor's power consumption is roughly proportional to its clock frequency, and to the square of its supply voltage. If you have the possibility to adjust these from software, that could save some power. Also, turning off the parts of the processor that you don't need (e.g. floating-point unit) may help, but this very much depends on your platform. In any case, you need a way to measure the actual power consumption of your processor, so that you can find out what works and what not. Just like speed optimizations, power optimizations need to be carefully profiled.
Consider using the network interfaces the least you can. You might want to gather information and send it out in bursts instead of constantly send it.
Look at what your compiler generates, particularly for hot areas of code.
If you have low priority intermittent operations, don't use specific timers to wake up to deal with them, but deal with when processing other events.
Use logic to avoid stupid scenarios where your app might go to sleep for 10 ms and then have to wake up again for the next event. For the kind of platform mentioned it shouldn't matter if both events are processed at the same time.
Having your own timer & callback mechanism might be appropriate for this kind of decision making. The trade off is in code complexity and maintenance vs. likely power savings.
Simply put, do as little as possible.
Well, to the extent that your code can execute entirely in the processor cache, you'll have less bus activity and save power. To the extent that your program is small enough to fit code+data entirely in the cache, you get that benefit "for free". OTOH, if your program is too big, and you can divide your programs into modules that are more or less independent of the other, you might get some power saving by dividing it into separate programs. (I suppose it's also possible to make a toolchain that spreas out related bundles of code and data into cache-sized chunks...)
I suppose that, theoretically, you can save some amount of unnecessary work by reducing the number of pointer dereferencing, and by refactoring your jumps so that the most likely jumps are taken first -- but that's not realistic to do as a programmer.
Transmeta had the idea of letting the machine do some instruction optimization on-the-fly to save power... But that didn't seem to help enough... And look where that got them.
Set unused memory or flash to 0xFF not 0x00. This is certainly true for flash and eeprom, not sure about s or d ram. For the proms there is an inversion so a 0 is stored as a 1 and takes more energy, a 1 is stored as a zero and takes less. This is why you read 0xFFs after erasing a block.
Rather timely this, article on Hackaday today about measuring power consumption of various commands:
Hackaday: the-effect-of-code-on-power-consumption
Aside from that:
- Interrupts are your friends
- Polling / wait() aren't your friends
- Do as little as possible
- make your code as small/efficient as possible
- Turn off as many modules, pins, peripherals as possible in the micro
- Run as slowly as possible
- If the micro has settings for pin drive strengh, slew rate, etc. check them & configure them, the defaults are often full power / max speed.
- returning to the article above, go back and measure the power & see if you can drop it by altering things.
also something that is not trivial to do is reduce precision of the mathematical operations, go for the smallest dataset available and if available by your development environment pack data and aggregate operations.
knuth books could give you all the variant of specific algorithms you need to save memory or cpu, or going with reduced precision minimizing the rounding errors
also, spent some time checking for all the embedded device api - for example most symbian phones could do audio encoding via a specialized hardware
Do your work as quickly as possible, and then go to some idle state waiting for interrupts (or events) to happen. Try to make the code run out of cache with as little external memory traffic as possible.
On Linux, install powertop to see how often which piece of software wakes up the CPU. And follow the various tips that the powertop site links to, some of which are probably applicable to non-Linux, too.
http://www.lesswatts.org/projects/powertop/
Choose efficient algorithms that are quick and have small basic blocks and minimal memory accesses.
Understand the cache size and functional units of your processor.
Don't access memory. Don't use objects or garbage collection or any other high level constructs if they expands your working code or data set outside the available cache. If you know the cache size and associativity, lay out the entire working data set you will need in low power mode and fit it all into the dcache (forget some of the "proper" coding practices that scatter the data around in separate objects or data structures if that causes cache trashing). Same with all the subroutines. Put your working code set all in one module if necessary to stripe it all in the icache. If the processor has multiple levels of cache, try to fit in the lowest level of instruction or data cache possible. Don't use floating point unit or any other instructions that may power up any other optional functional units unless you can make a good case that use of these instructions significantly shortens the time that the CPU is out of sleep mode.
etc.
Don't poll, sleep
Avoid using power hungry areas of the chip when possible. For example multipliers are power hungry, if you can shift and add you can save some Joules (as long as you don't do so much shifting and adding that actually the multiplier is a win!)
If you are really serious,l get a power-aware debugger, which can correlate power usage with your source code. Like this