I am new to Python, writing something with pygame and it is very bitmap intensive. Here are certain (current) facts about it:
All graphics files have the potential to be reused at any point in a program instance.
It can take up 1GB+ memory if I pre-load everything in the beginning, even when there are no duplicates.
It is not hard to load the images when they are (almost) needed i.e. the file sizes are very small compared to the memory usage, and it is easy to predict what will come next.
There are many suggestions not to use del, and I do not know if that applies to my case. I have thought about utilizing the garbage collection mechanism, by implementing a resource manager that holds the only reference to any loaded image, and it juggles through different images, roughly by removing the reference for one while re-loading an other.
However, I am not very sure if this really frees any memory at any point, and I don't know how make the GC to actually keep the memory down consistently, as it seems that gc calls is quite expensive (and by default too infrequent)
So in summary, I would like to know whether the method outlined above is worth a try, and if not I hope someone could teach me other ways such as properly using del, and whether that fits pygame. Any help will be appreciated.
Try this, see if its good enough. http://www.pygame.org/wiki/LazyImageLoading?parent=CookBook
When you first reference an item in an ImageController instance, it is
loaded and returned. While a reference is kept to the image, it
remains available in the ImageController. When the image no longer has
any active references, it is removed from memory, and will be reloaded
next time it is referenced.
Keep your initial texture manager design as simple as possible. Afterwards, if profiling says you need more performance, then optimize.
Related
I am building a ML application for binary classification using ML.NET. It will have multiple ML models of varying sizes (built using different training data) which will be stored in SQL server database as Blob. Clients will send items for classification to this app in random order and based on client ID, corresponding model is to be used for classification. To classify item, model needs be read from database and then loaded into memory. Loading model in memory is taking considerable time depending on size and I don't see any way to optimize it. Hence I am planning to cache models in memory. If I cache many heavy models, it may put pressure on memory hampering performance of other processes running on server. So there is no straightforward way to limit caching. So looking for suggestions to handle this.
Spawn a new process
In my opinion this is the only viable option to accomplish what you're trying to do. Spawn a complete new process that communicates (via IPC?) with your "main application". You could set a memory limit using this property https://learn.microsoft.com/en-us/dotnet/api/system.gcmemoryinfo.totalavailablememorybytes?view=net-5.0 or maybe even use a 3rd-party-library (e.g. https://github.com/lowleveldesign/process-governor), that kills your process if it reaches a specific amount of RAM. Both of these approaches are quite rough and will basically kill your process.
If you have control over your side car application running, it might make sense to really monitor the RAM usage with something like this Getting a process's ram usage and gracefully stop the process.
Do it yourself solution (not recommended)
Basically there is no built in way of limiting memory usage by thread or similar.
What counts towards the memory limit?
Shared resources
Since you have a running process, you need to define what exactly counts towards the memory limit. For example if you have some static Dictionary that is manipulated by the running thread - what did it occupy? Only the diff between the old value and the new value? The whole new value? The key and the value?
There are many more cases like this you'll have to take into consideration.
The actual measuring
You need some kind of way to count the actual memory usage. This will probably be hard/near impossible to "implement":
Reference counting needed?
If you have a hostile thread, it might spawn an infinite amount of references to one object, no new keyword used. For each reference you'd have to count 32/64 bits.
What about built in types?
It might be "easy" to measure a byte[] included in your own type definition, but what about built in classes? If someone initializes a string with 100MB this might be an amount you need to keep track of.
... and many more ...
As you maybe noticed with previous samples, there is no easy definition of "RAM used by a thread". This is the reason there also is no easy to get the value of it.
In my opinion it's insanely complex to do such a thing and needs a lot of definition work to do on your side. It might be feasable with lots of effort but I'm not sure if that really is what you want. Even if you manage to - what will do you about it? Only killing the thread might not clean up the ressources.
Therefore I'd really think about having a OS managed, independent, process, that you can kill whenever you feel like it.
How big are your models? Even large models 100meg+ load pretty quickly off of fast/SSD storage. I would consider caching them on fast drives/SSDs, because pulling off of SQL Server is going to be much slower than raw disk. See if this helps your performance.
After reading this article https://developer.ibm.com/tutorials/l-memory-leaks/ I'm wondering is there a way to cancel thread execution and avoid memory leaks. Since my understanding is that the join functionality is releasing the allocated space. That should be possible to do also by other commands. The thing that interest me how does join releases the memory space and other functions cant? Is there a function that gives to witch thread a memory space is assigned? Can this be given out (the mapping)? I know one should not do crazy things with that since it represents an potential safety issue. But still are there ways to achieve that?
For example if I have a third party lib then I can identify its threads but I have the problem that I cannot identify allocated memory spaces in the lib, or I do not know how to do that (the lib is a binary).
If the library doesn't support that, you can't. Your understanding of the issue is slightly off. It doesn't matter who allocated the memory, it matters whether the memory still needs to be allocated or not. If the library provides some way to get to the point where the memory no longer needs to be allocated, that provided way would also provide a way to free the memory. If the library doesn't provide any way to get to the point where the memory no longer needs to be allocated, some way to free it would not be helpful.
Coding such stuff is a rabbit hole and should be done on the OS level.
Can't be done. The OS has no way to know when the code that allocated some chunk of memory still needs it and when it doesn't. Only the code that allocated the memory can possibly know that.
Posix allows canceling but not identifying the individual threads, and not all Posix functionality works on linux. Posix is just a layer over the stl stuff in the OS.
Right, so POSIX is not the place where this goes. It requires understanding of the application and so must be done at the application layer. If you need this functionality, code it. If you need it in other people's code and they don't supply it, talk to them. Presumably, if their code is decent and appropriate, it has some way to d what you need. If not, your complaint is with the code that doesn't do what you need.
My thoughts on that were that somewhere in Linux the system tracks what allocation on heap were made by the threads if some option is enabled since I know by default there is nothing.
That doesn't help. Which thread allocated memory tells you absolutely nothing about when it is no longer needed. Only the same code that decided it was needed can tell when it is no longer needed. So if this is needed in some code that allocates memory, that code must implement this. If the person who implemented that code did not provide this kind of facility, then that means they decided it wasn't needed. You may wish to ask them why they made that decision. Their answer may well surprise you.
But I see there is no answer to a serious question.
The answer is to code what you need. If it's someone else's code and they didn't code it, then they didn't think you would need it. They're most likely right. But if they're wrong, then don't use their code.
I've got an iMac whose VRAM appears to have gone on the fritz. On boot, things are mostly fine for a while, but eventually, as more and more windows are opened (i.e. textures are created on the GPU), I eventually hit the glitchy VRAM, and I get these bizarre "noisy" grid-like patterns of red and green in the windows.
I had an idea, but I'm mostly a newb when it comes to OpenGL and GPU programming in general, so I figured I'd ask here to see if it was plausible:
What if I wrote a little app, that ran on boot, and would allocate GPU textures (of some reasonable quantum -- I dunno, maybe 256K?) until it consumed all available VRAM (i.e. can't allocate any more textures). Then have it upload a specific pattern of data into each texture. Next it would readback the texture from the GPU and checksum the data against the original pattern. If it checks out, then release it (for the rest of the system to use). If it doesn't checksum, hang onto it (forever).
Flaws I can see: a user space app is not going to be able to definitively run through ALL the VRAM, since the system will have grabbed some, but really, I'm just trying to squeeze some extra life out of a dying machine here, so anything that helps in that regard is welcome. I'm also aware that reading back from VRAM is comparatively slow, but I'm not overly concerned with performance -- this is a practical endeavor, to be sure.
Does this sound plausible, or is there some fundamental truth about GPUs that I'm missing here?
Your approach is interesting, although I think there other ways that might be easier to implement if you're looking for a quick fix or work-around. If your VRAM is on the fritz then it's likely that there is a specific location the corruption is taking place. If you're able to determine consistently that it happens at a certain point (VRAM is consuming x amount of memory, etc.) then you can work with it.
It's quite easy to create a RAM disk, and another possibility would be to allocate regular memory for VRAM. I know both of these are very possible, because I've done it. If someone says something "won't work" (no offense Pavel), it shouldn't discourage you from at least trying. If you're interested in the techniques that I mentioned I'd be happy to provide more info, however, this is about your idea and I'd like to know if you can make it work.
If you are able to write an app that ran on boot even before an OS loaded, that would be in the bootloader - why wouldnt you just then do a self-test of memory at that time ?
Or did you mean an userland app after the OS boots into the login ? An userland app will not be able to do what you mentioned of cycling through every address simply because there is no mapping to userland directly for every page.
If you are sure that RAM is a problem, did you try replacing the RAM ?
I have been tasked with reducing memory footprint of a Windows CE 5.0 application. I came across Rob Tiffany's highly cited article which recommends using managed DLL to keep the code out of the process's slot. But there is something I don't understand.
The article says that
The JIT compiler is running in your slot and it pulls in IL from the 1
GB space as needed to compile the current call stack.
This means that all the code in the managed DLL can potentially eventually end up in the process's slot. While this will help other processes by not loading the code in common area how does it help this process? FWIW the article does mention that
It also reduces the amount of memory that has to be allocated inside your
My only thought is that just as the code is pulled into the slot it is also pushed/swapped out. But that is just a wild guess and probably completely false.
CF assemblies aren't loaded into the process slot like native DLLs are. They're actually accessed as memory-mapped files. This means that the size of the DLL is effectively irrelevant.
The managed heap also lies in shared memory, not your process slot, so object allocations are far less likely to cause process slot fragmentation or OOM's.
The JITter also doesn't just JIT and hold forever. It compiles what is necessary, and during a GC may very well pitch compiled code that is not being used, or that hasn't been used in a while. You're never going to see an entire assembly JITTed and pulled into the process slow (well if it's a small assembly maybe, but it's certainly not typical).
Obviously some process slot memory has to be used to create some pointers, stack storage, etc etc, but by and large managed code has way less impact on the process slot limitations than native code. Of course you can still hit the limit with large stacks, P/Invokes, native allocations and the like.
In my experience, the area people get into trouble most often with CF apps an memory is with GDI objects and drawing. Bitmaps take up a lot of memory. Even though it's largely in shared memory, creating lots of them (along with brushes, pens, etc) and not caching and reusing is what most often give a large managed app memory footprint.
For a bit more detail this MSDN webcast on Compact Framework Memory Management, while old, is still very relevant.
Cocoa provides for page-aligned memory areas that it calls Memory Zones, and provides a few memory management functions that take a zone as an argument.
Let's assume you need to allocate a block of memory (not for an object, but for arbitrary data). If you call malloc(size), the buffer will always be allocated in the default zone. However, somebody may have used allocWithZone: to allocate your object in another zone besides the default. In that case, it would seem better to use NSZoneMalloc([self zone], size), which keeps your buffer and owning object in the same area of memory.
Do you follow this practice? Have you ever made use of memory zones?
Update: I think there is a tendency on Stack Overflow to respond to questions about low-level topics with a lecture about premature optimization. I understand that zones probably mattered more in 1993 on NeXT hardware than they do today, and a Google search makes it pretty clear that virtually nobody is concerned with them. I am asking anyway, to see if somebody could describe a project where they made use of memory zones.
I've written software for NeXTStep, GNUstep on Linux and Cocoa on Mac OS X, and have never needed to use custom memory zones. The condition which would suggest it as a good improvement to the software has either never arisen, or never been detected as significant.
You're absolutely right in your entire question, but in practice, nobody really uses zones. As the page you link to puts it:
In most circumstances, using the default zone is faster and more efficient than creating a separate zone.
The benefit of making your own zone is:
If a page fault occurs when trying to access one of the objects, loading the page brings in all of the related objects, which could significantly reduce the number of future page faults.
If a page fault occurs, that means that the system was recently paging things out and is therefore slow anyway, and that either your app is not responsible or the solution is in the part of your app that allocated too much memory at once in the first place.
So, basically, the question is “can you prove that you really do need to create your own zone to fix a performance problem or make your app wicked fast”, and the answer is “no”.
If you find yourself doing this, you're probably operating at a lower level than you really ought to be. The subsystem pretty much ignores them; any calls to +alloc or such will get you objects in the default zone. malloc and NSAllocateCollectable are all you need to know.