Is there any way to force JavaFX to release video memory?

I'm writing an application leveraging JavaFX that scrolls a large amount of image content on and off of the screen every 20-30 seconds. It's meant to be able to run for multiple hours, pulling in completely new content and discarding old content every couple of minutes. I have 512MB of graphics memory on my system, and after several minutes all of that memory has been consumed by JavaFX; no matter what I do with my JavaFX scene, none of it is released. I've been very careful to discard nodes when they drop off of the scene, and at most I have 50-60 image nodes in memory at one time. I really need to be able to do a hard release of the graphics memory that was backing these images, but I haven't been able to figure out how to accomplish that, as the Image interface in JavaFX seems to be very high level. JavaFX will continue to run fine, but other graphics-heavy applications will fail to load due to limited resources.
I'm looking for something like the flush() method on java.awt.image.Image:
http://docs.oracle.com/javase/7/docs/api/java/awt/Image.html#flush()
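For illustration, this is roughly the AWT pattern I'd like an equivalent of (the file name below is made up):

import java.awt.Image;
import java.awt.Toolkit;

// The AWT behaviour I'd like an equivalent of in JavaFX (file name is made up):
public class AwtFlushExample {
    public static void main(String[] args) {
        Image awtImage = Toolkit.getDefaultToolkit().getImage("frame-0001.png");
        // ... draw the image somewhere ...
        awtImage.flush(); // releases decoded pixel data and cached native resources
    }
}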
I'm running Java 7u13 on Linux.
EDIT:
I managed to work out a potential workaround (see below), but have also entered a JavaFX JIRA ticket to request the functionality described above:
RT-28661
Add explicit access to a native resource cleanup function on nodes.

The best workaround that I could come up with was to set my JVM's max heap to half of the available limit of my graphics card. (I have 512MB of graphics memory, so I set this to -Xmx256m.) This forces the GC to be more proactive in cleaning up my discarded javafx.scene.image.Image objects, which in turn seems to trigger graphics memory cleanup on the part of JavaFX.
Previously my heap space was set to 512MB (I have 4GB of system memory, so this is a very manageable limit). The problem with that seems to be that the JVM was being very lazy about cleaning up my images until it started approaching this 512MB limit. Since all of my image data was copied into graphics memory, this meant I had most likely exhausted my graphics memory before the JVM really started caring about cleanup.
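For reference, a quick way I sanity-check that the lower ceiling took effect (this snippet is just illustrative and not part of my app; the jar name below is made up):

// Launched with something like: java -Xmx256m -jar app.jar (jar name is made up)
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB"); // expect roughly 256 with -Xmx256m
    }
}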
I did try some of the suggestions by jewelsea:
I am calling setCache(false), so this may be having a positive effect, but I didn't notice an improvement until I dropped my max heap size.
I tried running with Java 8, with some positive results. It did seem to manage graphics memory better, but it still ate up all of my graphics memory and didn't seem to start cleaning it up until I was almost out. If reducing your application's heap limit is not feasible, then evaluating the Java 8 pre-release may be worthwhile.
I will be posting some feature requests to the JavaFX project and will provide links to the JIRA tickets.

Perhaps you are encountering behaviour related to the root cause of the following issue:
RT-16011 Need mechanism for PG nodes to know when they are no longer part of a scene graph
From the issue description:
Some PG nodes contain handles to non-heap resources, such as GPU textures, which we would want to aggressively reclaim when the node is no longer part of a scene graph. Unfortunately, there is no mechanism to report this state change to them so that they can release their resources so we must rely on a combination of GC, Ref queues, and sometimes finalization to reclaim the resources. Lazy reclamation of some of these resources can result in exceptions when garbage collection gets behind and we run out of these limited resources.
There are numerous other related issues you can see when you look at the issue page I linked (signup is required to view the issue, but anybody can sign up).
A sample related issue is:
RT-15516 image data associated with cached nodes that are removed from a scene are not aggressively released
On which a user commented:
I found a workaround for my app: just setting cache to false for all frequently used nodes. 2 days working without any crashes.
So try calling setCache(false) on your nodes.
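For example, something along these lines might be the pattern to aim for (the helper class and method names here are just a sketch; only setCache, setImage and the child-list removal are actual JavaFX API):

import javafx.scene.image.Image;
import javafx.scene.image.ImageView;
import javafx.scene.layout.Pane;

// Sketch: disable caching on image nodes and drop every reference to the Image
// once the node scrolls off screen, so the GC can reclaim the heap object and,
// with it, the texture backing it.
public final class ImageNodeHelper {

    public static ImageView createUncached(Image image) {
        ImageView view = new ImageView(image);
        view.setCache(false); // avoid the cached-node texture path (see RT-15516)
        return view;
    }

    public static void discard(Pane container, ImageView view) {
        container.getChildren().remove(view); // detach from the scene graph
        view.setImage(null);                  // drop the node's reference to the Image
    }
}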
Also try using a Java 8 preview release where some of these issues have been fixed and see if it increases the stability of your application. Though currently, even in the Java 8 branch, there are still open issues such as the following:
RT-25323 Need a unified Texture resource management system for Prism
Currently texture resources are managed separately in at least 2 places depending on how they are used; one is a texture cache for images and the other is the ImagePool for RTTs. This approach is flawed in its design, i.e. the 2 caches are unaware of each other and it assumes the system has unlimited native resources.
Using a video card with more memory may either reduce or eliminate the issue.
You may also wish to put together a minimal executable example which demonstrates your issue and raise a bug request against the JavaFX Runtime project so that a JavaFX developer can investigate your scenario and see if it is new or a duplicate of a known issue.
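A rough skeleton for such a minimal example might look like the following (the image URL is a placeholder; point it at a large local file to stress graphics memory):

import javafx.animation.KeyFrame;
import javafx.animation.Timeline;
import javafx.application.Application;
import javafx.event.ActionEvent;
import javafx.event.EventHandler;
import javafx.scene.Scene;
import javafx.scene.image.Image;
import javafx.scene.image.ImageView;
import javafx.scene.layout.StackPane;
import javafx.stage.Stage;
import javafx.util.Duration;

// Repeatedly swaps a freshly decoded Image into the scene and discards the old one,
// so graphics memory usage can be watched while the content churns.
public class ImageChurnRepro extends Application {

    private static final String IMAGE_URL = "file:/path/to/large-image.jpg"; // placeholder

    @Override
    public void start(Stage stage) {
        final ImageView view = new ImageView();
        view.setCache(false);
        StackPane root = new StackPane();
        root.getChildren().add(view);

        Timeline churn = new Timeline(new KeyFrame(Duration.seconds(5),
                new EventHandler<ActionEvent>() {
                    @Override
                    public void handle(ActionEvent event) {
                        view.setImage(null);                 // discard the previous image
                        view.setImage(new Image(IMAGE_URL)); // decode a new one
                    }
                }));
        churn.setCycleCount(Timeline.INDEFINITE);
        churn.play();

        stage.setScene(new Scene(root, 800, 600));
        stage.show();
    }

    public static void main(String[] args) {
        launch(args);
    }
}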

Related

Is Xcode's debug navigator useless?

I am building an app in Xcode and am now deep into the memory management portion of the project. When I use Allocations and Leaks I seem to get entirely different results from what I see in Xcode's debug panel: in particular, the debug panel seems to show much higher memory usage than what I see in Allocations, and it also seems to highlight leaks that, as far as I can tell, (1) do not exist and (2) are confirmed not to exist by the Leaks tool. Is this thing useless, or even worse, misleading?
Here was a new one: today it told me I was using >1 GB of memory but its little memory meter read significantly <1 GB (and was still wrong if the Allocations data is accurate). Picture below.
UPDATE: I ran VM Tracker in a 38-minute session and it does appear that virtual memory accounts for the difference between Allocations/Leaks and the memory gauge. Picture below. I'm not entirely sure how to think about this yet. Our game uses a very large number of textures that are swapped. I imagine this is common in most games of our scale (11 boards, 330 levels; each board and map screen has unique artwork).
You are probably using the Memory Gauge while running in the Simulator using a Debug build configuration. Both of those will give you misleading memory results. The only reliable way to know how memory is being managed is to run on a device using a Release build. Instruments uses the Release build configuration, so it's already going to be better than just running and using the Memory Gauge.
Moreover, it is a known flaw that the Xcode built-in memory tools, such as the Memory Debugger, can generate false positives for leaks.
However, Instruments has its flaws as well. My experience is that, for example, it fails to catch leaks generated during app startup. Another problem is that people don't always understand how to read its output. For example, you say:
the debug panel seems to show much higher memory usage than what I see in Allocations
Yes, but Allocations is not the whole story. You are probably failing to look at the VM allocations. Those are shown separately and often constitute the reason for high memory use (because they include the backing stores for images and the view rendering tree). The Memory Gauge does include virtual memory, so this alone might account for the "difference" you think you're seeing.
So, the answer to your question is: No, the Memory Gauge is not useless. It gives a pretty good idea of when you might need to be alert to a memory issue. But you are then expected to switch to Instruments for a proper analysis.

Multiple contexts per application vs multiple applications per context

I was wondering whether it is a good idea to create a "system"-wide rendering server that is responsible for the rendering of all application elements. Currently, applications usually have their own context, meaning that whatever data might be identical across different applications will be duplicated in GPU memory, and the more frequent resource management calls only reduce the number of usable render calls. From what I understand, the OpenGL execution engine/server itself is sequential/single-threaded in design. So technically, everything that might be reused across applications, especially heavy stuff like bitmap or geometry caches for text and UI, is just clogging the server with unnecessary transfers and memory usage.
Are there any downsides to having a scenegraph shared across multiple applications? Naturally, assuming the correct handling of clients which accidentally freeze.
I was wondering whether it is a good idea to create a "system" wide rendering server that is responsible for the rendering of all application elements.
That depends on the task at hand. A small detour: take a web browser, for example, where JavaScript performs manipulations on the DOM, and CSS transforms and SVG elements define graphical elements. Each piece of JavaScript called in response to an event may run as a separate thread/lightweight process. In a sense the web browser is a rendering engine (heck, they're even called rendering engines internally) for a whole bunch of applications.
And for that it's a good idea.
And in general display servers are a very good thing. Just have a look at X11, which has an incredible track record. These days Wayland is all the hype, and a lot of people drank the Kool-Aid, but you actually do want the abstraction of a display server. However, not for the reasons you think. The main reason to have a display server is to avoid redundant code (not redundant data) and to have only a single entity deal with the dirty details (color spaces, device physical properties) and provide optimized higher-order drawing primitives.
But in regard with the direct use of OpenGL none of those considerations matter:
Currently, applications usually have their own context, meaning whatever data might be identical across different applications,
So? Memory is cheap. And you don't gain performance by coalescing duplicate data, because the only thing that matters for performance is the memory bandwidth required to process this data. That bandwidth doesn't change, because it depends only on the internal structure of the data, which coalescing leaves unchanged.
In fact, deduplication creates significant overhead: when one application makes changes that are not supposed to affect the other application, a copy-on-write operation has to be invoked. That is not free and usually means a full copy, and while the whole copy is being made, the memory bandwidth is consumed.
However, for a small, selective change in the data of one application, with each application having its own copy, the memory bus is blocked for a much shorter time.
it will be duplicated in GPU memory and the more frequent resource management calls only decrease the count of usable render calls.
Resource management and rendering normally do not interfere with each other. While the GPU is busy turning scalar values into points, lines and triangles, the driver on the CPU can do the housekeeping. In fact, a lot of performance is gained by having the CPU do non-rendering-related work while the GPU is busy rendering.
From what I understand, the OpenGL execution engine/server itself is sequential/single threaded in design
Where did you read that? There's no such constraint/requirement in the OpenGL specification, and real OpenGL implementations (i.e. drivers) are free to parallelize as much as they want.
just clogging the server with unnecessary transfers and memory usage.
Transfers happen only once, when the data gets loaded. Memory bandwidth consumption is unchanged by deduplication. And memory is so cheap these days that data deduplication simply isn't worth the effort.
Are there any downsides to having a scenegraph shared across multiple applications? Naturally, assuming the correct handling of clients which accidentally freeze.
I think you completely misunderstand the nature of OpenGL. OpenGL is not a scene graph. There's no scene and there are no models in OpenGL. Each application has its own layout of data, and eventually this data gets passed to OpenGL to draw pixels onto the screen.
To OpenGL however there are just drawing commands to turn arrays of scalar values into points, lines and triangles on the screen. There's nothing more to it.

What is Chrome doing with this "GC Event"?

I'm getting the following timeline from a very simple site I'm working on. Chrome tells me it's cleaning up 10MB via the GC but I don't know how it could be! Any insight into this would be appreciated.
I thought SO would let you expand the image but I guess not - here's the full size: http://i.imgur.com/FZJiGlH.png
Is this a site that we can check out or is it internal? I'd like to take a peek. I came across the excerpt below while googling on the Google Developers pages, Memory Analysis 101:
Object Sizes
Memory can be held by an object in two ways: directly by the object itself, and implicitly by holding references to other objects, and thus preventing them from being automatically disposed by a garbage collector (GC for short).
The size of memory that is held by the object itself is called shallow size. Typical JavaScript objects have some memory reserved for their description and for storing immediate values.
Usually, only arrays and strings can have significant shallow sizes. However, strings often have their main storage in renderer memory, exposing only a small wrapper object on the JavaScript heap.
Nevertheless, even a small object can hold a large amount of memory indirectly, by preventing other objects from being disposed by the automatic garbage collection process. The size of memory that will be freed when the object itself is deleted, and its dependent objects made unreachable from GC roots, is called retained size.
The last bit seems to be potentially your issue.
Investigating memory issues
If you're so inclined, you might want to enable this feature of Chrome, chrome --enable-memory-info, and take a peek behind the curtain to see what Chrome is potentially getting hung up on.
Once you have Chrome running with the memory profiling enabled you’ll have access to two new properties:
window.performance.memory.totalJSHeapSize; // total heap memory
window.performance.memory.usedJSHeapSize;  // currently used heap memory
This feature is covered in more detail here.

How does the size of managed code affect memory footprint?

I have been tasked with reducing the memory footprint of a Windows CE 5.0 application. I came across Rob Tiffany's highly cited article, which recommends using managed DLLs to keep the code out of the process's slot. But there is something I don't understand.
The article says that
The JIT compiler is running in your slot and it pulls in IL from the 1 GB space as needed to compile the current call stack.
This means that all the code in the managed DLL can potentially end up in the process's slot. While this will help other processes by not loading the code into the common area, how does it help this process? FWIW, the article does mention that
It also reduces the amount of memory that has to be allocated inside your
My only thought is that just as the code is pulled into the slot it is also pushed/swapped out. But that is just a wild guess and probably completely false.
CF assemblies aren't loaded into the process slot like native DLLs are. They're actually accessed as memory-mapped files. This means that the size of the DLL is effectively irrelevant.
The managed heap also lies in shared memory, not your process slot, so object allocations are far less likely to cause process slot fragmentation or OOM's.
The JITter also doesn't just JIT and hold forever. It compiles what is necessary, and during a GC it may very well pitch compiled code that is not being used, or that hasn't been used in a while. You're never going to see an entire assembly JITted and pulled into the process slot (well, if it's a small assembly, maybe, but it's certainly not typical).
Obviously some process slot memory has to be used for pointers, stack storage, etc., but by and large managed code has far less impact on the process slot limitations than native code. Of course you can still hit the limit with large stacks, P/Invokes, native allocations and the like.
In my experience, the area where people most often get into trouble with CF apps and memory is GDI objects and drawing. Bitmaps take up a lot of memory. Even though it's largely in shared memory, creating lots of them (along with brushes, pens, etc.) and not caching and reusing them is what most often gives a managed app a large memory footprint.
For a bit more detail this MSDN webcast on Compact Framework Memory Management, while old, is still very relevant.

Drastic performance improvement in .NET CF after app gets moved out of the foreground. Why?

I have a large (500K lines) .NET CF (C#) program, running on CE6/.NET CF 3.5 (v3.5.10181.0). This is running on a Freescale i.MX31 (ARM) @ 400MHz. It has 128MB RAM, with ~80MB available to applications. My app is the only significant one running (this is a dedicated, embedded system). Managed memory in use (as reported by GC.Collect) is about 18MB.
To give a better idea of the app size, here are some stats culled from .NET CF Remote Performance Monitor after starting up the application:
GC:
Garbage Collections 131
Bytes Collected by GC 97,919,260
Managed Bytes in use after GC 17,774,992
Total Bytes in use after GC 24,117,424
GC Compactions 41
JIT:
Native Bytes Jitted: 10,274,820
Loader:
Classes Loaded 7,393
Methods Loaded 27,691
Recently, I have been trying to track down a performance problem. I found that my benchmark after running the app in two different startup configurations would run in approximately 2 seconds (slow case) vs. 1 second (fast case). In the slow case, the time for the benchmark could change randomly from EXE run to EXE run from 1.1 to 2 seconds, but for any given EXE run, would not change for the life of the application. In other words, you could re-run the benchmark and the time for the test stays the same until you restart the EXE, at which point a new time is established and consistent.
I could not explain the 1.1 to 2x slowdown via any conventional mechanism, or by narrowing the slowdown to any particular part of the benchmark code. It appeared that the overall process was just running slower, almost like a thread was spinning and taking away some of "my" CPU.
Then, I randomly discovered that just by switching away from my app (the GUI loses the foreground) to another app, my performance issue disappears. It stays gone even after returning my app to the foreground. I now have a tentative workaround where my app after startup launches an auxiliary app with a 1x1 size window that kills itself after 5ms. Thus the aux app takes the foreground, then relinquishes it.
The question is, why does this speed up my application?
I know that code gets pitched when a .NET CF app loses the foreground. I also notice that when performing a "GC Heap" capture with .NET CF Remote Performance Monitor, a Code Pitch is logged, and this also triggers the performance improvement in my app. So I suspect that code pitching is somehow related to, or even responsible for, fixing the performance. But I'm at a loss as to how to determine whether that is really the case, or how to explain why pitching code could help in this way. Does pitching out lots of code somehow significantly help locality of reference of code pages (which are re-JITted, presumably near each other in memory) enough to help this much? (My benchmark spans probably 3 dozen routines and hundreds of lines of code.)
Most importantly, what can I do in my app to reliably avoid this slower condition? Any pointers to relevant .NET CF / JIT / code pitching information would be greatly appreciated.
Your app going to the background auto-triggers a GC.Collect, which collects, may compact the GC Heap and may pitch code. Have you checked to see if a manual GC.Collect without going to the background gives the same behavior? It might not be pitching that's giving the perf gain, it might be collection or compaction. If a significant number of dead roots are swept up, walking the root tree may be getting faster. Can't say I've specifically seen this issue, so this is all conjecture.
Send your app a WM_HIBERNATE message before your performance loop. It will clean things up.
We have a similar issue with our .NET CF application.
Over time, our application progressively slows down, eventually grinding to a halt, which I suspect is due to high CPU load, or, as @wil-s says, as if a thread is spinning and consuming CPU. The only assumption/conclusion I've come to so far is that either we have a rogue thread in our code, or there's an under-the-covers issue in .NET CF, maybe with the JITter.
Closing the application and re-launching returns our application to normal expected performance.
I have yet to implement a code change to test issuing WM_HIBERNATE or launching a dummy app which quits itself (as above) to force a code pitch, but I am fairly sure this will resolve our issue based on the above comments (so many thanks for that).
However, I'm subsequently interested to know whether a root cause was ever found for this specific issue?
Incidentally and seemingly off topic (but bear with me), we're using a Freescale i.MX28 processor and are experiencing unpredictable FlashDisk corruption. Seeing 2K blocks of 0xFFs (erased blocks) in random files located on NAND Flash.
I'm mentioning this as I now believe the CPU and FlashDisk corruption issues are linked, due to this article as well as this one:
https://electronics.stackexchange.com/questions/26720/flash-memory-corruption-due-to-electricals
In the article, @jwygralak67 comments:
I recently worked through a flash corruption issue, on a WinCE system, as part of a development team. We would sporadically find 2K blocks of flash that were erased (all bytes 0xFF). For about 6 months we tested everything from ESD, to dirty power, to EMI and RFI interference; we bought brand new devices and tracked usage to make sure we weren't exceeding the erase cycle limit and burning out blocks; we went through our (application-level) software with a fine-toothed comb.
In the end it turned out to be an obscure bug in the very low-level flash driver code, which only occurred under periods of heavy CPU load. The driver came from a 3rd party. We informed them of the issue we found, but I don't know if they ever released a patch.
Unfortunately, we've yet to make contact with him.
With all of this in mind, if we can work around the high CPU load, perhaps the corruption will no longer be present. Another case of conjecture!
Even on that assumption, however, this doesn't give a firm root cause for either situation, which is what I'm desperately seeking!
Any knowledge or insight, however small, would be very gratefully received.
@ctacke - we've spoken before regarding OpenNETCF via email, so I'm pleased to see your name!
