Possible to keep bad VRAM "occupied"? (macOS)

I've got an iMac whose VRAM appears to have gone on the fritz. After boot, things are mostly fine for a while, but as more and more windows are opened (i.e. more textures are created on the GPU), I eventually hit the glitchy VRAM and get these bizarre "noisy" grid-like patterns of red and green in the windows.
I had an idea, but I'm mostly a newb when it comes to OpenGL and GPU programming in general, so I figured I'd ask here to see if it was plausible:
What if I wrote a little app that ran on boot and allocated GPU textures (in some reasonable quantum -- I dunno, maybe 256K?) until it consumed all available VRAM (i.e. it can't allocate any more textures). It would then upload a specific pattern of data into each texture, read the texture back from the GPU, and checksum the data against the original pattern. If a texture checks out, release it (for the rest of the system to use). If the checksum doesn't match, hang onto it (forever).
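Roughly, I'm picturing something like the sketch below (untested C, written against GLUT just to get a GL context; it assumes the driver reports GL_OUT_OF_MEMORY once VRAM is exhausted and that the readback actually comes from VRAM rather than from a system-memory copy the driver keeps -- both of which may be shaky assumptions):

    /* vram_probe.c -- hypothetical sketch, not a tested tool.
     * Assumes a GL context from GLUT, and that glTexImage2D reports
     * GL_OUT_OF_MEMORY once VRAM is full (a driver may instead fall back
     * to system memory, which would defeat the whole idea). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <GLUT/glut.h>          /* macOS; <GL/glut.h> elsewhere */

    #define TEX_SIDE   256                        /* 256 x 256 RGBA = 256 KB per texture */
    #define TEX_BYTES  (TEX_SIDE * TEX_SIDE * 4)
    #define MAX_PROBES 8192

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_RGBA);
        glutCreateWindow("vram probe");           /* only here to get a GL context */

        unsigned char *pattern  = malloc(TEX_BYTES);
        unsigned char *readback = malloc(TEX_BYTES);
        for (int i = 0; i < TEX_BYTES; i++)
            pattern[i] = (unsigned char)(i * 31);

        int held = 0;
        for (int n = 0; n < MAX_PROBES; n++) {
            GLuint tex;
            glGenTextures(1, &tex);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, TEX_SIDE, TEX_SIDE, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, pattern);

            if (glGetError() == GL_OUT_OF_MEMORY) {   /* no more room: stop probing */
                glDeleteTextures(1, &tex);
                break;
            }

            glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, readback);
            if (memcmp(pattern, readback, TEX_BYTES) != 0)
                held++;                               /* bad readback: keep this texture forever */
            else
                glDeleteTextures(1, &tex);            /* checks out: give it back to the system */
        }

        printf("holding %d suspect textures\n", held);
        for (;;)
            pause();                                  /* stay alive so the held textures stay allocated */
        return 0;
    }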
Flaws I can see: a user space app is not going to be able to definitively run through ALL the VRAM, since the system will have grabbed some, but really, I'm just trying to squeeze some extra life out of a dying machine here, so anything that helps in that regard is welcome. I'm also aware that reading back from VRAM is comparatively slow, but I'm not overly concerned with performance -- this is a practical endeavor, to be sure.
Does this sound plausible, or is there some fundamental truth about GPUs that I'm missing here?

Your approach is interesting, although I think there are other ways that might be easier to implement if you're looking for a quick fix or workaround. If your VRAM is on the fritz, it's likely that the corruption is happening at a specific location. If you can consistently determine where it happens (e.g. once x amount of VRAM is in use), then you can work with it.
It's quite easy to create a RAM disk, and another possibility would be to allocate regular memory for VRAM. I know both of these are very possible, because I've done them. If someone says something "won't work" (no offense Pavel), it shouldn't discourage you from at least trying. If you're interested in the techniques I mentioned I'd be happy to provide more info; however, this is about your idea and I'd like to know if you can make it work.

If you were able to write an app that ran on boot, even before the OS loaded, it would have to live in the bootloader - so why not just do a memory self-test at that point?
Or did you mean a userland app, after the OS boots to the login screen? A userland app will not be able to cycle through every address the way you describe, simply because not every page is mapped directly into userland.
If you are sure the RAM is the problem, have you tried replacing it?

Related

Python 2.7 memory management with pygame

I am new to Python and am writing something with pygame that is very bitmap-intensive. Here are some (current) facts about it:
All graphics files have the potential to be reused at any point in a program instance.
It can take up 1GB+ memory if I pre-load everything in the beginning, even when there are no duplicates.
It is not hard to load the images just before they are needed; the file sizes are very small compared to the in-memory usage, and it is easy to predict what will come next.
There are many suggestions not to use del, and I do not know if that applies to my case. I have thought about leaning on the garbage collection mechanism by implementing a resource manager that holds the only reference to any loaded image and juggles the different images, roughly by dropping the reference to one while re-loading another.
However, I am not sure whether this actually frees any memory at any point, and I don't know how to make the GC keep memory usage down consistently, since explicit gc calls seem quite expensive (and the automatic ones, by default, too infrequent).
So in summary, I would like to know whether the method outlined above is worth a try, and if not I hope someone could teach me other ways such as properly using del, and whether that fits pygame. Any help will be appreciated.
Try this and see if it's good enough: http://www.pygame.org/wiki/LazyImageLoading?parent=CookBook
When you first reference an item in an ImageController instance, it is loaded and returned. While a reference is kept to the image, it remains available in the ImageController. When the image no longer has any active references, it is removed from memory, and will be reloaded next time it is referenced.
Keep your initial texture manager design as simple as possible. Afterwards, if profiling says you need more performance, then optimize.

Drastic performance improvement in .NET CF after app gets moved out of the foreground. Why?

I have a large (500K lines) .NET CF (C#) program, running on CE6/.NET CF 3.5 (v3.5.10181.0). It runs on a Freescale i.MX31 (ARM) @ 400 MHz with 128 MB RAM, ~80 MB of which is available to applications. My app is the only significant one running (this is a dedicated, embedded system). Managed memory in use (as reported by GC.Collect) is about 18 MB.
To give a better idea of the app's size, here are some stats culled from .NET CF Remote Performance Monitor after starting up the application:
GC:
Garbage Collections: 131
Bytes Collected by GC: 97,919,260
Managed Bytes in use after GC: 17,774,992
Total Bytes in use after GC: 24,117,424
GC Compactions: 41
JIT:
Native Bytes Jitted: 10,274,820
Loader:
Classes Loaded: 7,393
Methods Loaded: 27,691
Recently, I have been trying to track down a performance problem. I found that, depending on which of two startup configurations I used, my benchmark would run in approximately 2 seconds (slow case) vs. 1 second (fast case). In the slow case, the benchmark time could vary randomly from one EXE run to the next, anywhere from 1.1 to 2 seconds, but within any given EXE run it would not change for the life of the application. In other words, you could re-run the benchmark and the time stays the same until you restart the EXE, at which point a new time is established and remains consistent.
I could not explain the 1.1 to 2x slowdown via any conventional mechanism, or by narrowing the slowdown to any particular part of the benchmark code. It appeared that the overall process was just running slower, almost like a thread was spinning and taking away some of "my" CPU.
Then, I randomly discovered that just by switching away from my app (the GUI loses the foreground) to another app, my performance issue disappears. It stays gone even after returning my app to the foreground. I now have a tentative workaround where my app after startup launches an auxiliary app with a 1x1 size window that kills itself after 5ms. Thus the aux app takes the foreground, then relinquishes it.
The question is, why does this speed up my application?
I know that code gets pitched when a .NET CF app loses the foreground. I also notice that when performing a "GC Heap" capture with .NET CF Remote Performance Monitor, a Code Pitch is logged -- and this also triggers the performance improvement in my app. So I suspect that code pitching is somehow related to, or even responsible for, the fix. But I'm at a loss as to how to determine whether that is really the case, or how to explain why pitching code could help in this way. Does pitching out lots of code significantly improve locality of reference for the code pages (which are re-JITted, presumably near each other in memory), enough to account for a difference this large? (My benchmark spans probably 3 dozen routines and hundreds of lines of code.)
Most importantly, what can I do in my app to reliably avoid this slower condition. Any pointers to relevant .NET CF / JIT / Code pitching information would be greatly appreciated.
Your app going to the background auto-triggers a GC.Collect, which collects, may compact the GC heap, and may pitch code. Have you checked whether a manual GC.Collect, without going to the background, gives the same behavior? It might not be the pitching that's giving the perf gain; it might be the collection or compaction. If a significant number of dead roots are swept up, walking the root tree may be getting faster. I can't say I've specifically seen this issue, so this is all conjecture.
Send your app a WM_HIBERNATE message before your performance loop; it will clean things up.
We have a similar issue with our .NET CF application.
Over time, our application progressively slows down, eventually grinding to a halt, which I suspect is due to high CPU load, or, as @wil-s says, as if a thread is spinning and consuming CPU. The only conclusion I've reached so far is that either we have a rogue thread in our code, or there's an under-the-covers issue in .NET CF, maybe with the JITter.
Closing the application and re-launching returns our application to normal expected performance.
I have yet to implement the code change to test issuing WM_HIBERNATE, or to launch a dummy app which quits itself (as above) to force a code pitch, but based on the above comments I am fairly sure this will resolve our issue (so many thanks for that).
However, I'm also interested to know whether a root cause was ever found for this specific issue.
Incidentally and seemingly off topic (but bear with me), we're using a Freescale i.MX28 processor and are experiencing unpredictable FlashDisk corruption. Seeing 2K blocks of 0xFFs (erased blocks) in random files located on NAND Flash.
I'm mentioning this as I now believe the CPU and FlashDisk corruption issues are linked, due to this article as well as this one:
https://electronics.stackexchange.com/questions/26720/flash-memory-corruption-due-to-electricals
In the article, @jwygralak67 comments:
I recently worked through a flash corruption issue on a WinCE system as part of a development team. We would sporadically find 2K blocks of flash that were erased (all bytes 0xFF). For about 6 months we tested everything from ESD, to dirty power, to EMI and RFI interference; we bought brand new devices and tracked usage to make sure we weren't exceeding the erase cycle limit and burning out blocks; we went through our (application level) software with a fine-toothed comb.
In the end it turned out to be an obscure bug in the very low level flash driver code, which only occurred under periods of heavy CPU load. The driver came from a 3rd party. We informed them of the issue we found, but I don't know if they ever released a patch.
Unfortunately, we have yet to make contact with him.
With all of this in mind, if we can work around the high CPU load, perhaps the corruption will no longer appear. Another case of conjecture!
On that assumption, however, we still don't have a firm root cause for either problem, which is what I'm desperately seeking!
Any knowledge or insight, however small, would be very gratefully received.
@ctacke - we've spoken before regarding OpenNETCF via email, so I'm pleased to see your name!

Dynamically adapting caches to available memory

Is there a way to implement dynamically adapting caches in userspace? I would like my programs to allocate caches that use some fair share of the available physical memory. If the system is running out of physical memory, caches should be dropped as chosen by the program, and in no case should they be swapped out. It is preferable that no special privilege be needed, so it is not necessary to actually lock the memory; the program should just be told that pages have been swapped out, so it stops using them. All in all, it should work something like the caches and buffers implemented in the kernel. Can you point out general ideas and APIs for how that can be done? The platforms I am interested in are Linux and Windows.
Why do you think there is any reasonable way to define "fair share"? It's not really a great UX when the application tries to know too much: far better would be to find a sensible, minimal default, and offer the user a config option to adjust it. Even better is to provide the user with stats to show how well the current-sized cache is doing - bigger isn't always better.
There is no "cooperative memory management" API in Linux - no way for the kernel to tell user-space to use less memory. The closest I can think of is that the (relatively new) memory cgroup controller can provide a "notifier" when a memory limit is reached (rather than OOM-killing the allocating process.) That's not exactly nice to use, but then again, any such interface is going to flirt with being race/deadlock-prone. Polling with mincore might work in somewhat contrived/constrained situations, but given that the app has no way to understand the changing system-wide demand for memory, it's not going to work well.

Have you ever used NSZoneMalloc() instead of malloc()?

Cocoa provides for page-aligned memory areas that it calls Memory Zones, and provides a few memory management functions that take a zone as an argument.
Let's assume you need to allocate a block of memory (not for an object, but for arbitrary data). If you call malloc(size), the buffer will always be allocated in the default zone. However, somebody may have used allocWithZone: to allocate your object in another zone besides the default. In that case, it would seem better to use NSZoneMalloc([self zone], size), which keeps your buffer and owning object in the same area of memory.
Do you follow this practice? Have you ever made use of memory zones?
Update: I think there is a tendency on Stack Overflow to respond to questions about low-level topics with a lecture about premature optimization. I understand that zones probably mattered more in 1993 on NeXT hardware than they do today, and a Google search makes it pretty clear that virtually nobody is concerned with them. I am asking anyway, to see if somebody could describe a project where they made use of memory zones.
I've written software for NeXTStep, GNUstep on Linux and Cocoa on Mac OS X, and have never needed to use custom memory zones. The condition which would suggest it as a good improvement to the software has either never arisen, or never been detected as significant.
You're absolutely right in your entire question, but in practice, nobody really uses zones. As the page you link to puts it:
In most circumstances, using the default zone is faster and more efficient than creating a separate zone.
The benefit of making your own zone is:
If a page fault occurs when trying to access one of the objects, loading the page brings in all of the related objects, which could significantly reduce the number of future page faults.
If a page fault occurs, that means that the system was recently paging things out and is therefore slow anyway, and that either your app is not responsible or the solution is in the part of your app that allocated too much memory at once in the first place.
So, basically, the question is “can you prove that you really do need to create your own zone to fix a performance problem or make your app wicked fast”, and the answer is “no”.
If you find yourself doing this, you're probably operating at a lower level than you really ought to be. The frameworks pretty much ignore zones these days; any call to +alloc or the like will get you an object in the default zone. malloc and NSAllocateCollectable are all you need to know.

Windows Stalls When My Program Uses Swapfile

I am running a user-mode program at normal priority. My program is searching an NP problem's solution space and, as a result, uses a lot of memory, which eventually ends up in the swap file.
Then my mouse freezes up, and it takes forever for task manager to open up and let me end the process.
What I want to know is how I can stop Windows from completely locking up because of this, even though only 1 of my 2 cores is being used.
Edit:
Thanks for the replies.
I know that making it use less memory will help, but it just doesn't make sense to me that the whole OS should lock up.
The obvious answer is "use less memory". When your app uses up all the available memory, the OS has to page the task manager (etc.) out to make room for your app. When you switch programs, the OS has to page the other programs back in (as they are needed).
Disk reads are slower than memory reads, so everything appears to be going slower.
If you want to avoid this, have your app manage its own memory, or use a better algorithm than brute force. (There are genetic algorithms, simulated annealing, etc.)
The problem is that when another program (e.g. explorer.exe) goes to execute, all of its code and data have been swapped out. To make room for the other program, Windows first has to write data that your program is using out to disk, then load the other program's memory back in. Every new page of code executed in the other program requires disk access, causing it to run slowly.
I don't know the access pattern of your program, but I'm guessing it touches all of its memory pages a lot in a random fashion, which makes the problem worse because as soon as Windows evicts a memory page from your program, suddenly you need it again and Windows has to find some other page to give the same treatment.
To give other processes more RAM to live in, you can use SetProcessWorkingSetSize to reduce the maximum amount of RAM that your program may use. Of course this will make your program run more slowly because it has to do more swapping.
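For illustration, here is a minimal Win32 sketch of capping the working set. The 8 MB / 32 MB figures are arbitrary examples, not recommendations, and the maximum behaves as a soft target by default:

    /* cap_working_set.c -- illustrative Win32 sketch.
     * Caps this process's working set so Windows trims it back toward the
     * limit, leaving more physical RAM for everything else. The trade-off is
     * that the capped process will page (and therefore run) more. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T min_ws = 8  * 1024 * 1024;   /* keep at least 8 MB resident          */
        SIZE_T max_ws = 32 * 1024 * 1024;   /* trim above 32 MB (soft limit)        */

        if (!SetProcessWorkingSetSize(GetCurrentProcess(), min_ws, max_ws)) {
            fprintf(stderr, "SetProcessWorkingSetSize failed: %lu\n", GetLastError());
            return 1;
        }

        /* ... run the memory-hungry search here ... */

        printf("working set capped at %lu MB\n", (unsigned long)(max_ws >> 20));
        return 0;
    }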
Another alternative you could try is to add more drives to the system, and distribute the swap files over those. You may have a dual-core CPU, but you have only a single drive. Distributing the swap file over multiple drives allows Windows to balance work across them (although I don't have first-hand experience of how well it does this).
I don't think there's a programming answer to this question, aside from "restructure your app to use less memory." The swapfile problem is most likely due to the bottleneck in accessing the disk, especially if you're using an IDE HDD or a highly fragmented swapfile.
It's a bit extreme, but you could always minimise your swap file so you don't have all the disk thrashing, and your program isn't allowed to allocate much virtual memory. Under Control Panel / Advanced / Advanced tab / Performance / Virtual memory, set the page file to a custom size and enter a value of 2 MB (the smallest allowed on XP). When an allocation fails, you should get an exception and be able to exit gracefully. It doesn't quite fix your problem, it just speeds it up ;)
Another thing worth considering: if you are on a 32-bit platform, port to a 64-bit system and get a box with much more addressable RAM.
