Why would LWJGL be so much slower than say the Unity implementation of OpenGL or even Ogre3D? I'll begin with some "benchmarks" (if you would even call them that) on what I've tested.
Hardware:
i5-3570K @ 4.3 GHz
GTX 780 @ 1150 MHz
First Test: Place 350,000 triangles on screen (modified Stanford Dragon)
Results:
GTX 780 Renders at 37 FPS (USING LWJGL)
GTX 780 Renders at ~300 FPS (USING UNITY3D)
GTX 780 Renders at ~280 FPS (USING OGRE3D)
Second Test: Render Crytek Sponza w/ Textures (I believe around 200,000 vertices?)
Results:
GTX 780 Renders at 2 FPS (USING LWJGL)
GTX 780 Renders at ~150 FPS (USING UNITY3D)
GTX 780 Renders at ~130 FPS (USING OGRE3D)
Normally I use either Ogre3D, Unity3D, or Panda3D to render my game projects, but the difference in frame rates is staggering. I know Unity has things like occlusion culling, so it's generally the quickest, but when making similar calls with Ogre3D I would expect results similar to LWJGL's. Ogre3D and LWJGL are both doing front-face-only culling, but LWJGL doesn't get any performance increase compared with rendering everything. One last thing: LWJGL tends to exceed 2.5 GB of RAM usage when rendering Sponza, but that doesn't explain the other results.
If anyone is having the same issue: I've realized the issue is NOT Java. Recording immediate-mode draw calls into display lists is deprecated and yields poor performance. You MUST use VBOs, not display lists. You can expect performance to increase by up to 600x, as it did on my laptop.
I have started using blender to make simple 3d animations, but my laptop will heat up too fast when rendering, and also it takes too much time to render even simple scenes.
I just want to quickly and easily output a low-res image of the scene, like the ones in the 3D View, but I haven't found an easy or fast way to do that.
Preferably I will be using cycles.
Can someone help me?
Making renders faster is almost a university degree by itself. Here is a forty seven minute video with some tips.
The simplest step with any render engine is to reduce the resolution. In the render dimensions panel set the percentage to 50% or 25%.
Using 2.79 you can do OpenGL renders, which is what you see in the viewport. You can enable the display of only render objects to hide all the viewport overlays.
For Cycles, set the samples low and use the denoising options. The daily builds, which will be 2.81 when released, also include the new denoise composite node that uses Intel's OpenImageDenoise for better results.
If you really want to speed up render times, use Blender 2.80 and the Eevee render engine; it is almost real-time and offers amazing results. With 2.80 almost all of the material nodes from Cycles will produce the same result in Eevee, and for 2.81 this has been improved even more.
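To put a number on the resolution tip above, here is a minimal sketch of how the pixel count shrinks with the render percentage. The 1920x1080 base size and the function name are assumptions chosen for illustration, and the rounding may differ slightly from what Blender itself does.

```python
def scaled_resolution(width, height, percentage):
    """Return (width, height, pixel_count) for a given render percentage.

    Hypothetical helper: it only illustrates the arithmetic and does not
    call Blender.
    """
    out_w = width * percentage // 100
    out_h = height * percentage // 100
    return out_w, out_h, out_w * out_h


# Assumed base resolution of 1920x1080.
_, _, full_pixels = scaled_resolution(1920, 1080, 100)
for pct in (100, 50, 25):
    w, h, pixels = scaled_resolution(1920, 1080, pct)
    print(f"{pct:3d}%: {w}x{h}, {pixels / full_pixels:.4f} of the full pixel count")
```

Because both dimensions are scaled, 50% leaves a quarter of the pixels to render and 25% leaves one sixteenth.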
We are attempting to display 40 windows, each containing almost the same UI, but displaying a different entity, with scene changes with animations happening every few seconds on each window. All of our tests have ended with the windows being drawn at 3 to 5 frames per second at between 10 and 20 windows open, on a reasonably old and low-powered discrete nVidia GPU and on Windows.
Things we have tried:
Disabling animations - performance improves, but not nearly enough
CPU profiling - shows over 90% of the CPU time being spent in the system DLLs and the nVidia driver; on other machines, the CPU usage is not significant, but the frame rate is still low.
QML profiling - the Qt Creator profiler shows the render loop executing its steps at 60 fps, taking a couple of milliseconds at most for each frame.
Images/textures are loaded once to the GPU and never reloaded
All the rendering backends - OpenGL, ANGLE and Qt Quick 2D Renderer - perform more or less the same
Is there something major we are missing?
So, I've been working on a project for a while now in DirectX 11, but some people have been suggesting that I should be doing it in Direct2D. So, I've been playing with the idea in my project. What I've ended up with is HORRIFIC performance. Is Direct2D intended for use with hundreds of thousands of vertices? Because that's what I'm using.
Direct2D gives you a simple to use API to draw high quality 2d graphics, but it comes at a cost in performance compared to some fine tuned dx11 rendering.
Here is a primitive cost per draw for reference
FillRoundedRectangle (1 pixel corner) : 96
DrawRoundedRectangle : 264
FillRectangle : 6
DrawRectangle : 190
FillEllipse : 204
DrawEllipse : 204
Line (whatever vertical/size...) : 46
Bezier : from 312 -> 620
Direct2D is built on a feature level 10 device and builds vertices in an internal buffer (the only case where it uses instancing is to draw text).
So if you need to batch primitives, an instanced drawer can yield a pretty hefty performance gain (as a personal example, my timeline keyframe rendering went down from 15 ms to 2 ms after swapping the draws from D2D to a custom DX11 instanced shader).
If you are on Windows 8, you can easily mix Direct2D and Direct3D in the same "viewport", so it can be worth a look (in my use case I use DX11 and structured-buffer-based instancing for all the heavy parts, and swap to a Direct2D context for text and other small bits).
If you need to draw custom geometry (especially with a reasonably high polygon count), it's best to stick to Direct3D 11, since it's designed for exactly that.
I'm developing a card game in Android using SurfaceView and canvas to draw the UI.
I've tried to optimize everything as much as possible but I still have two questions:
During the game I'll need to draw 40 bitmaps (the 40 cards in the Italian deck). Is it better to create all the bitmaps in the onCreate method of my customized SurfaceView (storing them in an array), or to create them as needed (every time the user gets a new card, for example)?
I'm able to get over 90 fps on an old Samsung I5500 (528 MHz, with a QVGA screen), 60 fps on an Optimus Life (800 MHz and HVGA screen) and 60 fps with a Nexus One/Motorola Razr (1 GHz and dual core 1GHz with WVGA and qHD screens) but when I run the game on an Android tablet (Motorola Xoom dual core 1 GHz and 1 GB of Ram) I get only 30/40 fps... how is that possible that a 528 MHz cpu with 256 MB of RAM can handle 90+ fps and a dual core processor can't handle 60 fps? I'm not seeing any kind of GC calling at runtime....
EDIT: Just to clarify, I've tried both ARGB_8888 and RGB_565 without any change in performance...
Any suggestions?
Thanks
Some points for you to consider:
It is recommended not to create new objects while your game is running, otherwise, you may get unexpected garbage collections.
Your FPS numbers don't sound right; you may have measurement errors. However, my guess is that you are resizing the images to fit the screen size, which affects the memory usage of your game and may cause slow rendering times on tablets.
You can use profiling tools to confirm: TraceView
OpenGL would be much faster
Last tip: don't draw overlapping cards if you can avoid it; draw only the visible ones.
Good Luck
Ok so it's better to create the bitmap in the onCreate method, that is what I'm doing right now...
They are OK. I believe the 60 fps on some devices is just a cap imposed by manufacturers, since there is no advantage in going above 60 fps. (I'm making this assumption because the rate doesn't change whether I render 1 card, 10 cards or no card: the OnDraw method is called 60 times per second, but if I add, for example, 50 or 100 cards it drops accordingly.) I don't resize any card because I use the proper folder (mdpi, hdpi, etc.) for each device, so I get the exact size of the image without resizing it...
I've tried to look at it, but from what I understand all of the app's execution time is spent drawing the bitmap, not resizing it or updating its position. Here it is:
I know, but it would add complexity to the developing and I believe that using a canvas for 7 cards on the screen should be just fine….
I don't draw every card of the deck.. I just swap bitmap as needed :)
UPDATE: I've tried to run the game on a Xoom 2, Galaxy Tab 7 plus and Asus Transformer Prime and it runs just fine with 60 fps…. could it be just a problem of Tegra 2 devices?
I have a considerable number (120-240) of 640x480 images that will be displayed as textured flat surfaces (4-vertex polygons) in a 3D environment. About 30-50% of them will be visible in a given frame. It is possible for them to cross over. Nothing else will be present in the environment.
The question is: will a modern and/or few-years-old GPU (let's say a Radeon 9550) cope with that, and what frame rate can I expect? I aim for 20 FPS, but 30-40 would be nice. Would changing the resolution to 320x240 make that more likely?
I do not have any previous experience with performance issues of 3D graphics on modern GPUs, and unfortunately I must make a design choice. I don't want to waste time on doing something that couldn't have worked :-)
Assuming you have RGB textures, that would be 640*480*3*120 bytes = 105 MB minimum of texture data, which should fit in the VRAM of more recent graphics cards without swapping, so this won't be an issue. However, texture lookups might get a bit problematic, but this is hard for me to judge without trying. Given that you only need to process 50% of 105 MB, that is about 50 MB per frame (a very rough estimate), and targeting 20 FPS means 20*50 MB/sec = about 1 GB/sec. That throughput should be possible even on older hardware.
Reading the specs of an older Radeon 9600 XT, it lists a peak fill rate of 2000 Mpixels/sec, and if I'm not mistaken you require far less than 100 Mpixels/sec. Peak memory bandwidth is specified as 9.6 GB/s, while you'd need about 1 GB/s (as explained above).
I would argue that this should be possible if done correctly; current hardware especially should have no problem at all.
Anyway, you should simply try it out: loading some 120 random textures and displaying them on 120 quads can be done in very few lines of code with hardly any effort.
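The back-of-the-envelope estimate in this answer can be reproduced with a few lines. This is only a sketch under the same assumptions as the answer (uncompressed 3-byte RGB texels, 120 textures, half of them touched in full every frame, 20 FPS); the function name is made up for illustration, and real drivers may pad or compress textures differently.

```python
MIB = 1024 * 1024  # the "105 MB" figure above is in units of 2**20 bytes


def texture_budget(width, height, bytes_per_pixel, count, visible_fraction, fps):
    """Return (total_bytes, bytes_per_frame, bytes_per_second).

    Hypothetical helper mirroring the estimate in the answer above.
    """
    total = width * height * bytes_per_pixel * count
    per_frame = total * visible_fraction
    per_second = per_frame * fps
    return total, per_frame, per_second


total, per_frame, per_second = texture_budget(640, 480, 3, 120, 0.5, 20)
print(f"all textures: {total / MIB:.1f} MiB")
print(f"per frame:    {per_frame / MIB:.1f} MiB")
print(f"per second:   {per_second / (1024 * MIB):.2f} GiB/s")
```

With these inputs the total comes to roughly 105 MiB and the per-second traffic to roughly 1 GiB/s, which matches the figures quoted above.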
First of all, you should realize that the dimensions of textures should normally be powers of two, so if you can change them, something like 512x256 (for example) would be a better starting point.
From that, you can create mipmaps of the original, which are simply versions of the original scaled down by powers of two. So if you started with 512x256, you'd then create versions at 256x128, 128x64, 64x32, 32x16, 16x8, 8x4, 4x2, 2x1 and 1x1. When you've done this, OpenGL can/will select the "right" one for the size it'll show up at in the final display. This generally reduces the work (and improves quality) in scaling the texture to the desired size.
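As a small illustration of the mipmap chain described above, the sketch below lists the level sizes for a given base texture. It only computes dimensions and does not touch OpenGL; the function name is an assumption made for this example.

```python
def mip_chain(width, height):
    """Return the sizes of the mipmap levels below the base level.

    Each level halves both dimensions, clamping at 1, until 1x1 is reached.
    """
    levels = []
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


for w, h in mip_chain(512, 256):
    print(f"{w}x{h}")
```

For a 512x256 base this prints the same nine sizes listed above, from 256x128 down to 1x1.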
The obvious sticking point with that would be running out of texture memory. If memory serves, in the 9550 timeframe you could probably expect 256 MB of on-board memory, which would be about sufficient, but chances are pretty good that some of the textures would be in system RAM. That overflow would probably be fairly small though, so it probably won't be terribly difficult to maintain the kind of framerate you're hoping for. If you were to add a lot more textures, however, it would eventually become a problem. In that case, reducing the original size by 2 in each dimension (for example) would reduce your memory requirement by a factor of 4, which would make fitting them into memory a lot easier.
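To see how close the upper end of the question's range gets to 256 MB, and what halving each dimension buys, here is a rough sketch. It assumes uncompressed 3-byte RGB texels and ignores mipmaps and any driver padding, so the real footprint would be somewhat larger; the function name is hypothetical.

```python
MIB = 1024 * 1024


def texture_set_bytes(width, height, count, bytes_per_pixel=3):
    """Raw size in bytes of `count` uncompressed textures (assumed RGB)."""
    return width * height * bytes_per_pixel * count


for count in (120, 240):
    full = texture_set_bytes(640, 480, count)
    half = texture_set_bytes(320, 240, count)
    print(f"{count} textures: {full / MIB:.1f} MiB at 640x480, "
          f"{half / MIB:.1f} MiB at 320x240 ({full // half}x smaller)")
```

Under these assumptions 240 full-size textures already take a bit over 210 MiB, which is why some of them may spill into system RAM on a 256 MB card, while the half-size set needs a quarter of that.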