Speed of large amount of animated bitmaps in EaselJS - performance

I'm having trouble using a large number of animated bitmaps (all based on the same spritesheet) in EaselJS. With a couple of them on the stage at once there is no problem at all, but with more of them running at the same time (starting at around 30 to 40) while moving them around, they start to "flicker" quite a bit, even at around 10 fps.
I'm not using any shadows or anything else along those lines. Just using animated bitmaps and moving them around.
Does anyone have any good advice around increasing this performance?

Without seeing your code it's hard to know exactly where the bottleneck is. But here are a few places to start looking (starting with the more trivial fixes):
Make sure you are using a modern browser. In the very least, check across a few other browsers/platforms to see if that has any significant change in performance. From what I understand, EaselJS performance is significantly worse on non-hardware accelerated canvas implementations.
If you can, use CreateJS's own TweenJS rather than other tweening libraries. TweenJS ties itself to EaselJS's Ticker class, which is more efficient.
Do not call stage.update() unless absolutely necessary. Since stage.update() is such an expensive call, you should be as stingy as possible. In fact, you shouldn't really call it at all if you are using the Ticker to regularly update the stage.
Cache wisely and aggressively. If you have complex static elements on the stage, caching them will save some cycles. However, there is an overhead to caching so save it for containers with a lot of static elements or complexly drawn shapes.
Lower the frequency that EaselJS checks for mouseovers. If you have enabled mouseover on the stage, pass in a lower frequency (see the documentation for stage.enableMouseOver()). If you don't need it (if you are only listening to clicks), don't enable it at all. Monitoring mouseovers is pretty dang expensive, especially if you have plenty of elements on stage.
Set stage.snapToPixelEnabled to true. This may or may not help. Theoretically, rendering bitmaps on whole pixels is much more efficient; however, this may cause some animations to become jagged, and I haven't played around with it enough to know what the other pros and cons are.
I was able to get decent performance with around 600-800 spritesheets at 30FPS and basic tweening using Chrome on a 4 year old iMac (just a quick test).
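The advice above about calling stage.update() sparingly can be sketched as a dirty flag: code that changes the scene marks it dirty, and one tick handler redraws only when something actually changed. This is a minimal sketch, not part of the EaselJS API. createRedrawGate, markDirty and onTick are hypothetical names, and stage is assumed to be any object with an update() method, such as an EaselJS Stage. For sprites that animate on every tick it saves little; it helps most when the scene is often idle.

```javascript
// Hypothetical helper: wraps any object exposing update() (e.g. an EaselJS Stage).
function createRedrawGate(stage) {
  let dirty = false;
  return {
    // Call whenever something visible has changed.
    markDirty() {
      dirty = true;
    },
    // Call once per tick; redraws at most once, and only if needed.
    onTick(event) {
      if (!dirty) return false;
      dirty = false;
      stage.update(event);
      return true;
    },
  };
}
```

With EaselJS this would presumably be wired to the Ticker's "tick" event, so that the gate's onTick is the only place stage.update() is ever called.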

Try using several Stage objects at the same time.

Related

Pharo: How to increase MouseMoveEvent Frequency?

In the Pharo book there is an example for a Paint Canvas.
The problem is that the frequency at which mouse move events are passed to the handler is rather low, so you cannot draw continuous paths if you move the mouse too quickly.
Is there some way to increase the update frequency for a morph? In Squeak, there is a SketchMorphEditor which does not have that problem, but I have not figured out why yet.
I am using Pharo 5.0.
As far as I know there is no way to increase the sampling rate. Even if it could be done, it would be a very bad idea for several reasons.
First, linear interpolation between consecutive samples yields fairly good results (which can be improved with techniques like anti-aliasing, if necessary).
Second, we cannot rely on the sampling rate to be the same on every machine and to have consistent results. And third, since I plan to use a gesture recognizer, algorithms like the $1 Recognizer do not rely on sampling rates and work surprisingly well.
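The linear interpolation mentioned above can be sketched as follows. The sketch is written in JavaScript rather than Pharo Smalltalk, purely for illustration. interpolateStroke and maxGap are hypothetical names; the idea is to fill the gap between two consecutive mouse samples with evenly spaced points so that a brush stamped at each point leaves no holes.

```javascript
// Returns points on the segment from p0 to p1, spaced at most `maxGap` apart.
// Both endpoints are included in the result.
function interpolateStroke(p0, p1, maxGap = 1) {
  const dx = p1.x - p0.x;
  const dy = p1.y - p0.y;
  // At least one step, so that identical samples still yield both endpoints.
  const steps = Math.max(1, Math.ceil(Math.hypot(dx, dy) / maxGap));
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    points.push({ x: p0.x + dx * t, y: p0.y + dy * t });
  }
  return points;
}
```

Because the spacing depends only on the distance between samples, the result looks the same regardless of how often the machine delivers mouse events.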

performance of layered canvases vs manual drawImage()

I've written a small graphics engine for my game that has multiple canvases in a tree (these basically represent layers). Whenever something in a layer changes, the engine marks the affected layers as "soiled", and in the render code the lowest affected layer is copied to its parent via drawImage(), which is then copied to its parent and so on up to the root layer (the onscreen canvas). This can result in multiple drawImage() calls per frame but also prevents rerendering anything below the affected layer. However, in frames where nothing changes no rendering or drawImage() calls take place, and in frames where only foreground objects move, rendering and drawImage() calls are minimal.
I'd like to compare this to using multiple onscreen canvases as layers, as described in this article:
http://www.ibm.com/developerworks/library/wa-canvashtml5layering/
In the onscreen canvas approach, we handle rendering on a per-layer basis and let the browser handle displaying the layers on screen properly. From the research I've done and everything I've read, this seems to be generally accepted as likely more efficient than handling it manually with drawImage(). So my question is, can the browser determine what needs to be re-rendered more efficiently than I can, despite my insider knowledge of exactly what has changed each frame?
I already know the answer to this question is "Do it both ways and benchmark." But in order to get accurate data I need real-world application, and that is months away. By then if I have an acceptable approach I will have bigger fish to fry. So I'm hoping someone has been down this road and can provide some insight into this.
The browser cannot determine anything when it comes to the canvas element and the rendering as it is a passive element - everything in it is user rendered by the means of JavaScript. The only thing the browser does is to pipe what's on the canvas to the display (and more annoyingly clear it from time to time when its bitmap needs to be re-allocated).
There is unfortunately no golden rule/answer to what is the best optimization as this will vary from case to case - there are many techniques that could be mentioned but they are merely tools you can use but you will still have to figure out what would be the right tool or the right combination of tools for your specific case. Perhaps layered is good in one case and perhaps it doesn't bring anything to another case.
Optimization in general is very much an in-depth analysis and breakdown of patterns specific to the scenario, which are then isolated and optimized. The process is often experiment, benchmark, re-adjust, experiment, benchmark, re-adjust, experiment, benchmark, re-adjust... Of course experience reduces this process to a minimum, but even with experience the specifics come in a variety of combinations that still require some fine-tuning from case to case (given they are not identical).
Even if you find a good recipe for your current project, there is no guarantee that it will work optimally with your next project. This is one reason no one can give an exact answer to this question.
However, when it comes to canvas, what you want to achieve is a minimum of clear operations and minimum areas to redraw (drawImage or shapes). The point of layers is to group elements together to enable this goal.
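One common way to keep the cleared and redrawn area small is to track a "dirty rectangle" per layer. The following is a minimal sketch under that assumption; unionRect and createDirtyTracker are hypothetical names, and rectangles are assumed to be plain objects with x, y, width and height.

```javascript
// Smallest rectangle that covers both a and b; null means "nothing dirty yet".
function unionRect(a, b) {
  if (a === null) return { ...b };
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// Collects the regions changed during one frame.
function createDirtyTracker() {
  let dirty = null;
  return {
    // Call with the old and new bounds of anything that moved or changed.
    mark(rect) {
      dirty = unionRect(dirty, rect);
    },
    // Returns the region to clear and redraw (or null), then resets.
    flush() {
      const region = dirty;
      dirty = null;
      return region;
    },
  };
}
```

At render time, a null result from flush() means the layer can be skipped entirely; otherwise only that region needs a clearRect() call and a redraw of the objects that intersect it. A single union rectangle is the simplest variant and can over-cover when two small changes are far apart.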

D3: What are the most expensive operations?

I was rewriting my code just now and it feels many magnitudes slower. Previously it was pretty much instant, now my animations take 4 seconds to react to mouse hovers.
I tried removing transitions and not having opacity changes but it's still really slow.
Though it is more readable. - -;
The only thing I did was split large functions into smaller more logical ones and reordered the grouping and used new selections. What could cause such a huge difference in speed? My dataset isn't large either...16kb.
edit: I also split up my monolithic huge chain.
edit2: I fudged around with my code a bit, and it seems that switching to nodeGroup.append("path") caused it to be much slower than svg.append("path"). The inelegant thing about this, though, is that I have to transform the drawn paths to the middle when using svg, while the entire group is already transformed. Can anyone shed some light on group.append vs svg.append?
edit3: Additionally, I was using opacity:0 to hide all my path lines before redrawing, which caused it to become slower and slower because these lines were never removed. Switched to remove();
Without data it is hard to work with your code or suggest a solution. You don't need to share private data, but it helps to generate some fake data with the same structure. It's also not clear where your performance hit comes from if we can't see how many DOM elements you are trying to make/interact with.
As for obvious things that stand out, you are not doing things in a data driven way for drawing your segments. Any time you see a for loop it is a hint that you are not using d3's selections when you could.
You should bind listEdges to your paths and draw them from within the selection; it's OK to transform them to the center from there. Also, you shouldn't do d3.select when you can do nodeGroup.select; this way you don't need to traverse the entire page when searching for your circles.

How to draw graphs using d3.js for a big dataset?

I tried creating 10 line charts, each with 3000 points, in 300*300 SVGs. It crashed my browser. I checked Task Manager, and the Google renderer process was going crazy with 1.2 GB of memory and 100% CPU utilization.
There's no easy solution for things like this. You can scrutinize your code and make it as efficient as possible, but no matter what, if your code needs to do hundreds of thousands of operations in one "thread" things will freeze up.
A general solution to avoid this freeze-up is to split the drawing process into smaller tasks, which you call asynchronously (i.e. from inside a setTimeout). This way the browser doesn't lock up for extended periods while it runs your JS code, and perhaps (I'm no expert on this) the garbage collector has a chance to clean things up midway too.
The result is not a faster overall draw time, but to a user it "feels" faster, because the browser doesn't freeze. And you can even add a progress bar then.
Some drawing operations can't be broken down into sub-tasks. For example, you can't split up d3.svg.line(), the d3 function that generates your graph's path definitions. However, you can split up the drawing code of the 10 charts such that it draws one chart at a time on every tick of a setTimeout. You can also similarly split the preparation of the data from the actual drawing.
I wrote an answer to a different scenario but a similar problem here: CSS transitions blocked by JavaScript
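The "one chart per setTimeout tick" idea described above can be sketched like this. It is a minimal sketch, not D3-specific: runInChunks is a hypothetical helper, each entry in tasks is assumed to be a function that draws one chart (or prepares one chunk of data), and the schedule parameter exists only so the timing can be swapped out.

```javascript
// Runs `tasks` (an array of functions) one per scheduler tick, so the browser
// can repaint and handle input in between instead of freezing.
function runInChunks(
  tasks,
  onProgress = () => {},
  schedule = (fn) => setTimeout(fn, 0)
) {
  let index = 0;
  function step() {
    if (index >= tasks.length) return;
    tasks[index](); // do one unit of work
    index++;
    onProgress(index, tasks.length); // e.g. update a progress bar
    if (index < tasks.length) schedule(step); // yield before the next unit
  }
  schedule(step);
}
```

For the ten charts in the question, tasks would hold ten functions, each drawing a single chart, and onProgress could drive the progress bar mentioned above. The total draw time is not reduced; the work is only spread out.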

Improving raytracer performance

I'm writing a comparatively straightforward raytracer/path tracer in D (http://dsource.org/projects/stacy), but even with full optimization it still needs several thousand processor cycles per ray. Is there anything else I can do to speed it up? More generally, do you know of good optimizations / faster approaches for ray tracing?
Edit: this is what I'm already doing.
Code is already running highly parallel
temporary data is structured in a cache-efficient fashion as well as aligned to 16b
Screen divided into 32x32-tiles
Destination array is arranged in such a way that all subsequent pixels in a tile are sequential in memory
Basic scene graph optimizations are performed
Common combinations of objects (plane-plane CSG as in boxes) are replaced with preoptimized objects
Vector struct capable of taking advantage of GDC's automatic vectorization support
Subsequent hits on a ray are found via lazy evaluation; this prevents needless calculations for CSG
Triangles neither supported nor priority. Plain primitives only, as well as CSG operations and basic material properties
Bounding is supported
The typical first order improvement of raytracer speed is some sort of spatial partitioning scheme. Based only on your project outline page, it seems you haven't done this.
Probably the most usual approach is an octree, but the best approach may well be a combination of methods (e.g. spatial partitioning trees and things like mailboxing). Bounding box/sphere tests are a quick cheap and nasty approach, but you should note two things: 1) they don't help much in many situations and 2) if your objects are already simple primitives, you aren't going to gain much (and might even lose). You can more easily (than octree) implement a regular grid for spatial partitioning, but it will only work really well for scenes that are somewhat uniformly distributed (in terms of surface locations)
A lot depends on the complexity of the objects you represent, your internal design (i.e. do you allow local transforms, referenced copies of objects, implicit surfaces, etc), as well as how accurate you're trying to be. If you are writing a global illumination algorithm with implicit surfaces the tradeoffs may be a bit different than if you are writing a basic raytracer for mesh objects or whatever. I haven't looked at your design in detail so I'm not sure what, if any, of the above you've already thought about.
As with any performance optimization process, you're going to have to measure first to find where you're actually spending the time, then improve things (algorithmically by preference, then code bumming by necessity).
One thing I learned with my ray tracer is that a lot of the old rules don't apply anymore. For example, many ray tracing algorithms do a lot of testing to get an "early out" of a computationally expensive calculation. In some cases, I found it was much better to eliminate the extra tests and always run the calculation to completion. Arithmetic is fast on a modern machine, but a missed branch prediction is expensive. I got something like a 30% speed-up on my ray-polygon intersection test by rewriting it with minimal conditional branches.
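To illustrate what an intersection test with few conditional branches can look like, here is a sketch of the well-known "slab" test for a ray against an axis-aligned box, written with min/max arithmetic and no early outs. It is not the ray-polygon test described above, and it is in JavaScript only for illustration. rayHitsBox is a hypothetical name, and the caller is assumed to pass the precomputed reciprocal of the ray direction (1 / d per axis, which is Infinity for a zero component).

```javascript
// Slab test: intersects the ray with the three pairs of parallel planes that
// bound the box and checks that the three parameter intervals overlap.
// origin, invDir, boxMin, boxMax are arrays of three numbers.
function rayHitsBox(origin, invDir, boxMin, boxMax) {
  let tNear = 0; // the ray starts at t = 0, so hits behind the origin are ignored
  let tFar = Infinity;
  for (let axis = 0; axis < 3; axis++) {
    const t1 = (boxMin[axis] - origin[axis]) * invDir[axis];
    const t2 = (boxMax[axis] - origin[axis]) * invDir[axis];
    tNear = Math.max(tNear, Math.min(t1, t2));
    tFar = Math.min(tFar, Math.max(t1, t2));
  }
  // Note: a ray lying exactly in a box face plane with a zero direction
  // component produces 0 * Infinity = NaN and is reported as a miss.
  return tNear <= tFar;
}
```

Whether this beats a version with early outs depends on the language, compiler and hardware, so it is worth measuring both, as the answer suggests.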
Sometimes the best approach is counter-intuitive. For example, I found that many scenes with a few large objects ran much faster when I broke them down into a large number of smaller objects. Depending on the scene geometry, this can allow your spatial subdivision algorithm to throw out a lot of intersection tests. And let's face it, intersection tests can be made only so fast. You have to eliminate them to get a significant speed-up.
Hierarchical bounding volumes help a lot, but I finally grokked the kd-tree, and got a HUGE increase in speed. Of course, building the tree has a cost that may make it prohibitive for real-time animation.
Watch for synchronization bottlenecks.
You've got to profile to be sure to focus your attention in the right place.
Is there anything else I can do to speed it up?
D, depending on the implementation and compiler, puts forth reasonably good performance. As you haven't explained what ray tracing methods and optimizations you're using already, then I can't give you much help there.
The next step, then, is to run a timing analysis on the program, and recode in assembly the most frequently used or slowest code, whichever impacts performance the most.
More generally, check out the resources in these questions:
Literature and Tutorials for Writing a Ray Tracer
Anyone know of a really good book about Ray Tracing?
Computer Graphics: Raytracing and Programming 3D Renders
raytracing with CUDA
I really like the idea of using a graphics card (a massively parallel computer) to do some of the work.
There are many other raytracing related resources on this site, some of which are listed in the sidebar of this question, most of which can be found in the raytracing tag.
I don't know D at all, so I'm not able to look at the code and find specific optimizations, but I can speak generally.
It really depends on your requirements. One of the simplest optimizations is just to reduce the number of reflections/refractions that any particular ray can follow, but then you start to lose out on the "perfect result".
Raytracing is also an "embarrassingly parallel" problem, so if you have the resources (such as a multi-core processor), you could look into calculating multiple pixels in parallel.
Beyond that, you'll probably just have to profile and figure out what exactly is taking so long, then try to optimize that. Is it the intersection detection? Then work on optimizing the code for that, and so on.
Some suggestions.
Use bounding objects to fail fast.
Project the scene as a first step (as common graphics cards do) and use raytracing only for light calculations.
Parallelize the code.
Raytrace every other pixel. Get the color in between by interpolation. If the colors vary greatly (you are on an edge of an object), raytrace the pixel in between. It is cheating, but on simple scenes it can almost double the performance while you sacrifice some image quality.
Render the scene on GPU, then load it back. This will give you the first ray/scene hit at GPU speeds. If you do not have many reflective surfaces in the scene, this would reduce most of your work to plain old rendering. Rendering CSG on GPU is unfortunately not completely straightforward.
Read source code of PovRay for inspiration. :)
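The "raytrace every other pixel" suggestion from the list above can be sketched for a single row of pixels. This is a minimal sketch under simplifying assumptions: renderRowSparse is a hypothetical name, trace(x) stands in for the expensive per-pixel ray trace, and colors are reduced to a single brightness number so that "vary greatly" becomes a simple threshold comparison.

```javascript
// Renders one row of `width` pixels, tracing as few of them as possible.
// Returns the row and the number of pixels that were actually traced.
function renderRowSparse(width, trace, threshold) {
  const row = new Array(width);
  let traced = 0;
  const shoot = (x) => {
    traced++;
    return trace(x);
  };
  // Pass 1: trace every other pixel.
  for (let x = 0; x < width; x += 2) row[x] = shoot(x);
  // Pass 2: fill the gaps, tracing only where the neighbours disagree
  // (likely an object edge) or where there is no right-hand neighbour.
  for (let x = 1; x < width; x += 2) {
    const left = row[x - 1];
    const hasRight = x + 1 < width;
    if (hasRight && Math.abs(left - row[x + 1]) <= threshold) {
      row[x] = (left + row[x + 1]) / 2;
    } else {
      row[x] = shoot(x);
    }
  }
  return { row, traced };
}
```

On a row with a single edge, only the pixel at the edge is traced in the second pass, so close to half of the rays are saved. Thin features that fall entirely between two traced pixels are missed, which is the quality loss the suggestion warns about.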
First you have to make sure that you use very fast algorithms (implementing them can be a real pain, so how far you go is a trade-off between the effort you want to invest and how fast it needs to be).
Some more hints from me:
- don't use mailboxing techniques; papers sometimes argue that they don't scale well on current architectures because of the counting overhead
- don't use BSP trees/octrees; they are relatively slow.
- don't use the GPU for raytracing; it is far too slow for advanced effects like reflection, shadows, refraction, photon mapping and so on (I use it only for shading, but that's my own business)
For a completely static scene kd-trees are unbeatable, and for dynamic scenes there are clever algorithms that scale very well on a quad-core (I am not sure about the performance beyond that).
And of course, for really good performance you need to use a lot of SSE code (without too many jumps, of course), but for not "that good" performance (I'm talking about maybe 10-15% here) compiler intrinsics are enough to implement your SSE stuff.
And some decent papers about some of the algorithms I was talking about:
"Fast Ray/Axis-Aligned Bounding Box - Overlap Tests using Ray Slopes"
(a very fast, very well parallelizable (SSE) AABB-ray hit test; note that the code in the paper is not the complete code, just google the title of the paper and you'll find it)
http://graphics.tu-bs.de/publications/Eisemann07RS.pdf
"Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies"
http://www.sci.utah.edu/~wald/Publications/2007///BVH/download//togbvh.pdf
If you know how the above algorithm works, then this is an even better one:
"The Use of Precomputed Triangle Clusters for Accelerated Ray Tracing in Dynamic Scenes"
http://garanzha.com/Documents/UPTC-ART-DS-8-600dpi.pdf
I'm also using the Plücker test to quickly determine (not that accurately, but well, you can't have everything) whether I hit a polygon; it works very nicely with SSE and above.
So my conclusion is that there are so many great papers out there on so many topics related to raytracing (how to build fast, efficient trees, how to shade (BRDF models), and so on) that it is a really amazing and interesting field for experimenting, but you also need a lot of spare time, because it is so damn complicated, but fun.
My first question is - are you trying to optimize the tracing of one single still screen,
or is this about optimizing the tracing of multiple screens in order to calculate an animation ?
Optimizing for a single shot is one thing, if you want to calculate successive frames in an animation there are lots of new things to think about / optimize.
You could
use an SAH-optimized bounding volume hierarchy...
...eventually using packet traversal,
introduce importance sampling,
access the tiles ordered by Morton code for better cache coherency, and
much more - but those were the suggestions I could immediately think of. In more words:
You can build an optimized hierarchy based on statistics in order to quickly identify candidate nodes when intersecting geometry. In your case you'll have to combine the automatic hierarchy with the modeling hierarchy, that is, either constrain the build or have it eventually clone modeling information.
"Packet traversal" means you use SIMD instructions to process 4 rays in parallel, one scalar lane per ray, while traversing the hierarchy (which is typically the hot spot), in order to squeeze the most performance out of the hardware.
You can perform some per-ray-statistics in order to control the sampling rate (number of secondary rays shot) based on the contribution to the resulting pixel color.
Using a space-filling curve over the tiles decreases the average spatial distance between consecutively processed pixels and thus increases the probability that you benefit from cache hits.
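The Morton-code ordering of tiles mentioned above can be sketched as follows, in JavaScript for illustration rather than D. spreadBits, mortonCode and tilesInMortonOrder are hypothetical names, and tile coordinates are assumed to fit in 16 bits each.

```javascript
// Spreads the low 16 bits of n so that a zero bit sits between each
// original bit (abcd becomes 0a0b0c0d).
function spreadBits(n) {
  n &= 0x0000ffff;
  n = (n | (n << 8)) & 0x00ff00ff;
  n = (n | (n << 4)) & 0x0f0f0f0f;
  n = (n | (n << 2)) & 0x33333333;
  n = (n | (n << 1)) & 0x55555555;
  return n;
}

// Interleaves the bits of x and y (x in the even positions, y in the odd ones).
// `>>> 0` converts the result back to an unsigned 32-bit number.
function mortonCode(x, y) {
  return (spreadBits(x) | (spreadBits(y) << 1)) >>> 0;
}

// Lists all tiles of a tilesX-by-tilesY grid in Morton (Z-curve) order.
function tilesInMortonOrder(tilesX, tilesY) {
  const tiles = [];
  for (let y = 0; y < tilesY; y++) {
    for (let x = 0; x < tilesX; x++) tiles.push({ x, y });
  }
  return tiles.sort((a, b) => mortonCode(a.x, a.y) - mortonCode(b.x, b.y));
}
```

Tiles that are close on screen tend to get nearby codes, so rendering them in this order tends to revisit the same parts of the scene data while they are still in cache.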

Resources