Metal managed buffers and didModifyRange on Intel Macs - macos

Just noticed that some managed buffers in my code are missing the didModifyRange call after their contents are updated on the CPU side, but they are still somehow updated on the GPU side: the rendered image reflects the latest changes. What's the reason behind that?
I'm concerned because the buffers in question are updated rarely and carefully, precisely to minimize CPU-to-GPU traffic. The total working set is less than 500 MB on a 4 GB GPU. In the worst case this could mean that non-dirty buffers are still being transferred to the GPU every frame.

Related

webgl bufferSubData call cost vs cost of transferring bytes

I have an array buffer. Parts of the buffer need to be changed, and parts do not. If the parts that need to change are contiguous, a bufferSubData call covering just that range is more efficient than updating the whole buffer, including the bytes that do not need to change. The problem is when the bytes that need changing are far apart within the buffer, with many unchanged bytes in between. Is it better to make two bufferSubData calls, one for each chunk that needs updating, or to make a single call that needlessly updates the bytes in between as well? How costly is a bufferSubData call compared with uploading one more byte of data?
Assuming bufferSubData is routed to the native glBufferSubData provided by the driver, it is best avoided altogether: it is known to be extremely slow in some mobile GPU drivers. See this page for reference (search for glBufferSubData).
I have run into extreme glBufferSubData slowness on quite recent mobile GPUs used in Android devices (Mali, PowerVR, Adreno). Interestingly, not on the PowerVR GPUs used with iOS, which clearly points to a software issue. The practical approach that seems to run well everywhere is to replace the whole buffer with glBufferData once per frame (or as few times as possible, combining the data for multiple draw calls).
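For concreteness, here is a rough sketch of the two approaches in terms of the native GL calls the answer refers to. It assumes a hypothetical vertex buffer with exactly two dirty ranges; the function names and parameters are illustrative, not taken from the question.

```c
/* Sketch of the two update strategies. Buffer id, data pointer, sizes and
 * the two dirty ranges are illustrative placeholders. */
#include <GLES2/gl2.h>

/* Option A: patch only the dirty ranges with glBufferSubData
 * (reported above to be very slow on some mobile drivers). */
void update_dirty_ranges(GLuint buf, const unsigned char *data,
                         GLintptr off1, GLsizeiptr len1,
                         GLintptr off2, GLsizeiptr len2)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferSubData(GL_ARRAY_BUFFER, off1, len1, data + off1);
    glBufferSubData(GL_ARRAY_BUFFER, off2, len2, data + off2);
}

/* Option B: re-specify the whole buffer once per frame with glBufferData,
 * which the answer above found to behave well across drivers (it also lets
 * the driver "orphan" the old storage instead of synchronising with the GPU). */
void replace_whole_buffer(GLuint buf, const unsigned char *data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
}
```

Which option wins is driver-dependent, which is why the advice above leans toward the full re-specification, especially on mobile.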

Xcode Memory Utilized

So in Xcode, the Debug Navigator shows CPU Usage and Memory Usage. When you click on Memory it says 'Memory Utilized'.
In my app I am using the latest RestKit (0.20.x), and every time I make a GET request using getObjectsAtPath (which doesn't even return a very large payload), the memory utilized increases by about 2 MB. So if I refresh my app 100 times, Memory Utilized will have grown by over 200 MB.
However, when I run the Leaks tool, Live Bytes remains fairly small and does not increase with each new request; it stays below 10 MB the whole time.
So do I have a memory issue or not? Memory Utilized grows like crazy, but Live Bytes suggests everything is okay.
You can use Heapshot Analysis to evaluate the situation. If that shows no heap growth, then the memory consumption may be virtual memory that (for example) resides in a cache or store supporting eviction and recreation, so you should also look for growth in virtual memory regions.
If you keep making requests (e.g. try 200 refreshes), the memory use will likely decrease at some point, or you will get memory warnings and ultimately allocation requests may fail. If memory does come down, determine how it is being reduced. Otherwise, you will need to determine where it is created and possibly still referenced.
Also, test on a device in this case. The simulator is able to utilise much more memory than a device simply because it has more to work with. Memory constraints are not simulated.

Why is data transfer from the GPU to the CPU slow?

Today I figured out something that really made me wonder. I have a Samsung Exynos 4412 SoC, with an ARM Cortex-A9 CPU and a quad-core Mali-400 GPU. I tried to get a texture from the GPU back to the CPU by every method I know of, and it is really slow. The same slow behaviour also shows up with modern CPUs and GPUs on the PC platform. What puzzles me is that the Exynos is an SoC, so the CPU and GPU share the same memory and I shouldn't have to care about a bus.
Why does that happen?
I have tried transferring the data from the GPU to the CPU in several ways: glReadPixels, glTexSubImage2D, glTexImage2D, and an FBO.
With any of those methods, the frame rate drops from 40 FPS to 7 FPS on a 1024x1024, 24-bit texture.
Possible answers taken from the OpenGL forums:
- Latency: it takes time for the read command to reach the hardware.
- OpenGL command buffering: reading the data requires the OpenGL driver to complete all outstanding commands.
- Hardware buffering: hardware must empty all GPU core pipelines before doing a readback.
Possible solution:
- Copy the data internally on the GPU to another location and read it back some number of frames after computing it. This should allow everything writing to that location to have completed before you attempt to read it.
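One common way to implement that delayed-readback idea, where the hardware supports it, is a pair of pixel buffer objects used in ping-pong fashion. The sketch below is illustrative only: it assumes desktop GL 3.x or OpenGL ES 3.0 (glMapBufferRange and GL_PIXEL_PACK_BUFFER are not available on ES 2.0-class GPUs such as the Mali-400 above), and the names and sizes are made up.

```c
/* Illustrative two-PBO asynchronous readback (desktop GL 3.x / OpenGL ES 3.0). */
#include <GLES3/gl3.h>
#include <string.h>

enum { W = 1024, H = 1024, BYTES = W * H * 4 };

static GLuint pbo[2];
static unsigned frame;

void readback_init(void)
{
    glGenBuffers(2, pbo);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, BYTES, NULL, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

/* Call once per frame after rendering: start this frame's readback into one
 * PBO and map the PBO filled last frame, so the CPU never stalls waiting for
 * the GPU to finish the current frame. (On the very first call the mapped PBO
 * simply contains its initial contents.) */
void readback_async(unsigned char *dst /* BYTES bytes, previous frame's image */)
{
    GLuint write_pbo = pbo[frame & 1];
    GLuint read_pbo  = pbo[(frame + 1) & 1];

    glBindBuffer(GL_PIXEL_PACK_BUFFER, write_pbo);
    glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, 0); /* async copy into the PBO */

    glBindBuffer(GL_PIXEL_PACK_BUFFER, read_pbo);
    void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, BYTES, GL_MAP_READ_BIT);
    if (src) {
        memcpy(dst, src, BYTES);  /* data from the previous frame */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}
```

The data you map is one frame old, which is exactly the trade-off described in the solution above: a frame of latency in exchange for not stalling the pipeline.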

Is there any way to force JavaFX to release video memory?

I'm writing an application leveraging JavaFX that scrolls a large amount of image content on and off of the screen every 20-30 seconds. It's meant to run for multiple hours, pulling in completely new content and discarding old content every couple of minutes. I have 512 MB of graphics memory on my system, and after several minutes all of that memory has been consumed by JavaFX; no matter what I do with my JavaFX scene, none of it is released. I've been very careful to discard nodes when they drop off the scene, and at most I have 50-60 image nodes in memory at one time. I really need to be able to do a hard release of the graphics memory that was backing these images, but I haven't been able to figure out how to accomplish that, as the Image interface in JavaFX seems to be very high level. JavaFX itself continues to run fine, but other graphics-heavy applications fail to load due to the limited resources.
I'm looking for something like the flush() method on java.awt.image.Image:
http://docs.oracle.com/javase/7/docs/api/java/awt/Image.html#flush()
I'm running Java 7u13 on Linux.
EDIT:
I managed to work out a potential workaround (see below), but have also filed a JavaFX JIRA ticket to request the functionality described above:
RT-28661
Add explicit access to a native resource cleanup function on nodes.
The best workaround I could come up with was to set my JVM's max heap to half of my graphics card's memory. (I have 512 MB of graphics memory, so I set this to -Xmx256m.) This forces the GC to be more proactive about cleaning up my discarded javafx.image.Image objects, which in turn seems to trigger graphics memory cleanup on the part of JavaFX.
Previously my heap space was set to 512 MB (I have 4 GB of system memory, so this is a very manageable limit). The problem with that seems to be that the JVM was very lazy about cleaning up my images until it started approaching the 512 MB heap limit. Since all of my image data was copied into graphics memory, this meant I had most likely exhausted my graphics memory before the JVM really started caring about cleanup.
I did try some of the suggestions by jewelsea:
I am calling setCache(false), so this may be having a positive effect, but I didn't notice an improvement until I dropped my max heap size.
I tried running with Java 8, with some positive results. It did seem to manage graphics memory better, but it still ate up all of my memory and didn't seem to start caring about graphics memory until I was almost out. If reducing your application's heap limit is not feasible, then evaluating the Java 8 pre-release may be worthwhile.
I will be posting some feature requests to the JavaFX project and will provide links to the JIRA tickets.
Perhaps you are encountering behaviour related to the root cause of the following issue:
RT-16011 Need mechanism for PG nodes to know when they are no longer part of a scene graph
From the issue description:
Some PG nodes contain handles to non-heap resources, such as GPU textures, which we would want to aggressively reclaim when the node is no longer part of a scene graph. Unfortunately, there is no mechanism to report this state change to them so that they can release their resources so we must rely on a combination of GC, Ref queues, and sometimes finalization to reclaim the resources. Lazy reclamation of some of these resources can result in exceptions when garbage collection gets behind and we run out of these limited resources.
There are numerous other related issues you can see when you look at the issue page I linked (signup is required to view the issue, but anybody can sign up).
A sample related issue is:
RT-15516 image data associated with cached nodes that are removed from a scene are not aggressively released
On which a user commented:
I found a workaround for my app: just setting cache to false for all frequently used nodes. Two days running without any crashes.
So try calling setCache(false) on your nodes.
Also try using a Java 8 preview release where some of these issues have been fixed and see if it increases the stability of your application. Though currently, even in the Java 8 branch, there are still open issues such as the following:
RT-25323 Need a unified Texture resource management system for Prism
Currently texture resources are managed separately in at least two places depending on how they are used: one is a texture cache for images and the other is the ImagePool for RTTs. This approach is flawed in its design, i.e. the two caches are unaware of each other and it assumes the system has unlimited native resources.
Using a video card with more memory may either reduce or eliminate the issue.
You may also wish to put together a minimal executable example which demonstrates your issue and raise a bug request against the JavaFX Runtime project so that a JavaFX developer can investigate your scenario and see if it is new or a duplicate of a known issue.

What is the overhead of constantly uploading new Textures to the GPU in OpenGL?

What is the overhead of continually uploading textures to the GPU (and replacing old ones)? I'm working on a new cross-platform 3D windowing system that uses OpenGL, and am planning on uploading a single Bitmap for each window (containing its UI elements). That Bitmap would be updated in sync with the GPU (using VSync). I was wondering whether this is a good idea, or whether constantly re-uploading bitmaps would incur too much of a performance overhead. Thanks!
Well, something like NVIDIA's GeForce 460M has about 60 GB/s of bandwidth to its local memory.
PCI Express 2.0 x16 can manage about 8 GB/s.
As such, if you try to transfer too many textures over the PCIe bus, you can expect to run into bandwidth problems. That budget works out to about 136 MB per frame at 60 Hz, and an uncompressed 24-bit 1920x1080 image is roughly 6 MB, so, suffice to say, you could upload a fair few frames' worth of texture data per displayed frame on an x16 graphics card.
Sure, it's not quite as simple as that: there is PCIe protocol overhead of around 20%, and all draw commands must be uploaded over that link too.
In general, though, you should be fine provided you don't overdo it. Bear in mind that it is sensible to upload a texture during one frame that you don't expect to use until the next frame (or even later); that way you don't create a bottleneck where rendering stalls while waiting for a PCIe upload to complete.
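As a sanity check on those numbers, here is a tiny back-of-the-envelope calculation; the figures are approximations (the small difference from the ~136 figure above comes down to the GB-versus-GiB convention).

```c
/* Back-of-the-envelope check of the bandwidth figures above (approximate). */
#include <stdio.h>

int main(void)
{
    const double pcie_bytes_per_sec = 8.0e9;                      /* PCIe 2.0 x16, ~8 GB/s */
    const double budget_per_frame   = pcie_bytes_per_sec / 60.0;  /* at 60 Hz */
    const double image_1080p_24bit  = 1920.0 * 1080.0 * 3.0;      /* uncompressed 24-bit */

    printf("per-frame upload budget: ~%.0f MB\n", budget_per_frame / 1e6);
    printf("one 24-bit 1080p image:  ~%.1f MB\n", image_1080p_24bit / 1e6);
    printf("images per frame budget: ~%.0f\n", budget_per_frame / image_1080p_24bit);
    return 0;
}
```

Running it gives roughly a 133 MB per-frame budget and about 21 uncompressed 1080p images per frame, before subtracting the protocol and command overhead mentioned above.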
Ultimately, the answer is going to come from profiling. However, one early optimization you can make is to avoid updating a texture if nothing in it has changed: depending on the size of the textures and the pixel format, constantly re-uploading them could easily become prohibitively expensive.
Profile a simpler situation that simulates the kind of usage you expect. I suspect that, without the optimization I mentioned at least, performance will be unusable with more than a handful of windows, depending on their size.
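To make the "skip the upload when nothing changed" optimization concrete, here is a minimal sketch. The Window struct, its fields, and the dirty flag are hypothetical bookkeeping invented for illustration, not part of any real API; only the GL calls are real.

```c
/* Rough sketch of skipping uploads for unchanged window bitmaps. */
#include <GLES2/gl2.h>

typedef struct {
    GLuint tex;                   /* existing GL texture for this window */
    int width, height;
    const unsigned char *pixels;  /* CPU-side RGBA bitmap of the window's UI */
    int dirty;                    /* set by the UI code whenever it redraws */
} Window;

void upload_window_texture(Window *w)
{
    if (!w->dirty)
        return;                   /* nothing changed: no PCIe traffic at all */

    glBindTexture(GL_TEXTURE_2D, w->tex);
    /* Replace only the texel data; the texture storage was allocated earlier
     * with glTexImage2D and is reused here. */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w->width, w->height,
                    GL_RGBA, GL_UNSIGNED_BYTE, w->pixels);
    w->dirty = 0;
}
```

If only a small region of a window changes, the same idea extends to uploading just that sub-rectangle through the offset and size parameters of glTexSubImage2D (taking the source row stride into account, e.g. via GL_UNPACK_ROW_LENGTH where available).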
