Game loop performance and component approach

I have an idea for organising a game loop, but I have some doubts about its performance. Maybe there are better ways of doing things.
Suppose you have an array of game components. They are all called to do some work at every iteration of the game loop. For example:
GameData data; // shared
app.registerComponent("AI", ComponentAI(data));
app.registerComponent("Logic", ComponentGameLogic(data));
app.registerComponent("2d", Component2d(data));
app.registerComponent("Menu", ComponentMenu(data))->setActive(false);
//...
while (ok)
{
    //...
    app.runAllComponents();
    //...
}
Benefits:
good component-based application, no dependencies, good modularity
we can activate/deactivate, register/unregister components dynamically
some components can be transparently removed or replaced and the system will keep working as if nothing had happened (e.g. changing 2d to 3d); this also helps team work: every programmer creates his/her own components and does not need the other components to compile the code
Doubts:
an inner loop inside the game loop, with a virtual call to Component::run() for every component
I would like Component::run() to return a bool and to check that value; if it returns false, the component must be deactivated, so the inner loop becomes a bit more expensive
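Roughly, I imagine runAllComponents() boiling down to something like this (only a sketch; the container, the active flag and the smart pointers are details I haven't settled on yet):
#include <memory>
#include <string>
#include <vector>

struct GameData {}; // shared state

class Component {
public:
    explicit Component(GameData& d) : data(d) {}
    virtual ~Component() = default;
    virtual bool run() = 0;   // returning false means "deactivate me"
    bool active = true;
protected:
    GameData& data;
};

class App {
public:
    Component* registerComponent(const std::string& name, std::unique_ptr<Component> c) {
        (void)name; // name-based lookup omitted in this sketch
        components.push_back(std::move(c));
        return components.back().get();
    }
    void runAllComponents() {
        for (auto& c : components)
            if (c->active && !c->run())   // virtual call plus the extra bool check
                c->active = false;
    }
private:
    std::vector<std::unique_ptr<Component>> components;
};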
Well, how good is this solution? Have you used it in real projects?

Some C++ programmers have way too many fears about the overhead of virtual functions. The overhead of the virtual call is usually negligible compared to whatever the function does. A boolean check is not very expensive either.
Do whatever results in the easiest-to-maintain code. Optimize later only if you need to do so. If you do find you need to optimize, eliminating virtual calls will probably not be the optimization you need.

In most "real" games, there are pretty strict requirements for interdependencies between components, and ordering does matter.
This may or may not affect you, but it's often important to have physics take effect before (or after) user-interaction processing, depending on your scenario. In this situation, you may need some extra machinery to get the ordering right.
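As a sketch of one simple mechanism (the priority field is illustrative, not something from the question's design):
#include <algorithm>
#include <vector>

struct Component {
    int priority = 0;   // lower runs earlier: e.g. physics before input
    virtual ~Component() = default;
    virtual void run() = 0;
};

// Sort on registration (or once per frame) so the update order is explicit
// and deterministic rather than an accident of registration order.
void runInOrder(std::vector<Component*>& components) {
    std::stable_sort(components.begin(), components.end(),
                     [](const Component* a, const Component* b) {
                         return a->priority < b->priority;
                     });
    for (Component* c : components)
        c->run();
}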
Also, since you're most likely going to have some form of scene graph or spatial partitioning, you'll want to make sure your "components" can take advantage of that, as well. This probably means that, given your current description, you'd be walking your tree too many times. Again, though, this could be worked around via design decisions. That being said, some components may only be interested in certain portions of the spatial partition, and again, you'd want to design appropriately.

I used a similar approach in a modular synthesized audio file generator.
I seem to recall noticing that after programming 100 different modules, there was an impact upon performance when coding new modules in.
On the whole, though, I felt it was a good approach.

Maybe I'm old school, but I really don't see the value in generic components, because I don't see them being swapped out at runtime.
struct GameObject
{
    Ai*         ai;
    Transform*  transform;
    Renderable* renderable;
    Collision*  collision;
    Health*     health;
};
This works for everything from the player to enemies to skyboxes and triggers; just leave the "components" that you don't need in your given object NULL. You want to put all of the AIs into a list? Then just do that at construction time. With polymorphism you can bolt all sorts of different behaviors in there (e.g. the player's "AI" is translating the controller input), and beyond this there's no need for a generic base class for everything. What would it do, anyway?
Your "update everything" would have to explicitly call out each of the lists, but that doesn't change the amount of typing you have to do, it just moves it. Instead of obfuscatorily setting up the set of sets that need global operations, you're explicitly enumerating the sets that need the operations at the time the operations are done.
IMHO, it's not that virtual calls are slow. It's that a game entity's "components" are not homogeneous. They all do vastly different things, so it makes sense to treat them differently. Indeed, there is no overlap between them, so again I ask: what's the point of a base class if you can't use a pointer to that base class in any meaningful way without casting it to something else?
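To make that concrete, here's a sketch of the "explicitly call out each of the lists" update (type names and members are illustrative):
#include <vector>

struct Ai         { void think() { /* decide the next action */ } };
struct Renderable { void draw()  { /* submit to the renderer  */ } };

// Per-type lists, filled at construction time; no common base class needed.
struct World {
    std::vector<Ai*>         ais;
    std::vector<Renderable*> renderables;

    void update() {
        for (Ai* a : ais)                 a->think(); // all AI, homogeneous work
        for (Renderable* r : renderables) r->draw();  // all rendering together
    }
};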


How small should functions be?

How small should I be making functions? For example, suppose I have a cake baking program:
bakeCake() {
    if (cakeType == "chocolate")
        fetchIngredients("chocolate")
    else if (cakeType == "plain")
        fetchIngredients("plain")
    else if (cakeType == "Red velvet")
        fetchIngredients("Red Velvet")
    //Rest of program
}
My question is: while this stuff is simple enough on its own, when I add much more stuff to the bakeCake function it becomes cluttered. But let's say that this program has to bake thousands of cakes per second. From what I've heard, it takes significantly longer (in computer time) to call another function than to just do the statements in the current function. So something simple like this should be very easy to read, and if efficiency is important, wouldn't I want to keep it inline?
Basically, at what point do I sacrifice readability for efficiency? And a quick bonus question: at what point does having too many functions decrease readability? Here's an example from Apple's Swift tutorial.
func isCandyAmountAcceptable(bandMemberCount: Int, candyCount: Int) -> Bool {
    return candyCount % bandMemberCount == 0
}
They said that because the function name isCandyAmountAcceptable was easier to read than candyCount % bandMemberCount == 0, it'd be good to make a function for it. From my perspective, it may take a few seconds to figure out what the second option is saying, but it's also more readable when it comes to knowing how it actually works.
Sorry about being all over the place and kinda asking 2 questions in one. Just to summarize my questions:
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out where the cutoff between readability and efficiency is?
How small and simple should I make functions? Obviously I'd make one if I ever have to repeat the logic, but what about one-time-use functions?
Thanks guys, sorry if these questions are ignorant or anything but I'd really appreciate an answer.
Does using functions extraneously make efficiency (speed) suffer? If it does, how can I figure out where the cutoff between readability and efficiency is?
For performance I would generally not factor in any overhead of direct function calls against any decent optimizer, since it can even make those come free of charge. When it doesn't, it's still a negligible overhead in, say, 99.9% of scenarios. That applies even for performance-critical fields. I work in areas like raytracing, mesh processing, and image processing and still the cost of a function call is typically on the bottom of the priority list as opposed to, say, locality of reference, efficient data structures, parallelization, and vectorization. Even when you're micro-optimizing, there are much bigger priorities than the cost of a direct function call, and even when you're micro-optimizing, you often want to leave a lot of the optimization for your optimizer to perform instead of trying to fight against it and do it all by hand (unless you're actually writing assembly code).
Of course with some compilers you might deal with ones that never inline function calls and have a bit of an overhead to every function call. But in that case I'd still say it's relatively negligible since you probably shouldn't be worrying about such micro-level optimizations when using those languages and interpreters/compilers. Even then it will probably often be bottom on the priority list, relatively speaking, as opposed to more impactful things like improving locality of reference and thread efficiency.
It's like if you're using a compiler with very simplistic register allocation that has a stack spill for every single variable you use, that doesn't mean you should be trying to use and reuse as few variables as possible to work around its tendencies. It means reach for a new compiler in those cases where that's a non-negligible overhead (ex: write some C code into a dylib and use that for the most performance-critical parts), or focus on higher-level optimizations like making everything run in parallel.
How small and simple should I make functions? Obviously I'd make one if I ever have to repeat the logic, but what about one-time-use functions?
This is where I'm going to go slightly off-kilter and actually suggest you consider avoiding the teeniest of functions for maintainability reasons. This is admittedly a controversial opinion although at least John Carmack seems to somewhat agree (specifically in respect to inlining code and avoiding excess function calls for cases where side effects occur to make the side effects easier to comprehend).
However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.
The reason I believe it can sometimes be good to err on the side of meatier functions is that understanding all the information necessary to make a change or fix a problem often requires comprehending more than one simple function.
Which is simpler to comprehend, a function whose logic consists of 80 lines of inlined code, or one distributed across a couple dozen functions and possibly ones that lead to disparate places throughout the codebase?
The answer is not so clear cut. Naturally if the teeny functions are used widely, like say sqrt or abs, then the reader can simply skim over the function call, knowing full well what it does like the back of his hand. But if there are a lot of teeny exotic functions that are only used one time, then the ability to comprehend the operation as a whole requires looking them up and understanding what they all individually do before you can get a proper comprehension of what's going on in terms of the big picture.
I actually disagree somewhat with that Apple Swift tutorial's one-liner function: while it is easier to understand than figuring out what the arithmetic and comparison are supposed to do, in exchange it might require looking up the function in scenarios where isCandyAmountAcceptable alone isn't enough information and you need to figure out exactly what makes an amount acceptable. Instead I would actually prefer a simple comment:
// Determine if candy amount is acceptable.
if (candyCount % bandMemberCount == 0)
...
... because then you don't have to jump to disparate places in the code (like a book referring its reader to other pages, making them constantly flip back and forth) to figure that out. Of course, the idea behind this isCandyAmountAcceptable kind of function is that you shouldn't have to be concerned with the details of what makes a candy amount acceptable; but too often in practice, we end up having to understand the details more than we optimally should, to debug the code or make changes to it. If the code never needs to be debugged or changed, then it doesn't really matter how it's written; it could even be written in binary code for all we care. But if it's written to be maintained, as in debugged and changed in the future, then sometimes it is helpful to avoid making readers jump through lots of hoops. The details do often matter in those scenarios.
So sometimes it doesn't help to understand the big picture by fragmenting it into the teeniest of puzzle pieces. It's a balancing act, but certain types of developers can err on the side of overly dicing up their systems into the most granular bits and pieces and finding maintenance problems that way. Those types are still often promising engineers -- they just have to find their balance. The other extreme is the one that writes 500-line functions and doesn't even consider refactoring -- those are kinda hopeless. But I think you fit in the former category, and for you, I'd actually suggest erring on the side of meatier functions ever-so-slightly just to keep the puzzle pieces a healthy size (not too small, not too big).
There's even a balancing act I see between code duplication and minimizing dependencies. An image library doesn't necessarily become easier to comprehend by shaving off a few dozen lines of duplicated math code if the exchange is a dependency to a complex math library with 800,000 lines of code and an epic manual on how to use it. In such cases, the image library might very well be easier to comprehend as well as use and deploy in new projects if it chooses instead to duplicate a few math functions here and there to avoid external dependencies, isolating its complexity instead of distributing it elsewhere.
Basically, at what point do I sacrifice readability for efficiency?
As stated above, I don't think readability of the small picture and comprehensibility of the big picture are synonymous. It can be really easy to read a two-line function and know what it does and still be miles away from understanding what you need to understand to make the necessary changes. Having many of those teeny one-shot two-liners can even delay the ability to comprehend the big picture.
But if I use "comprehensibility vs. efficiency" instead, I'd say upfront at the design-level for cases where you anticipate processing huge inputs. As an example, a video processing application with custom filters knows it's going to be looping over millions of pixels many times per frame. That knowledge should be utilized to come up with an efficient design for looping over millions of pixels repeatedly. But that's with respect to design -- towards the central aspects of the system that many other places will depend upon because big central design changes are too costly to apply late in hindsight.
That doesn't mean it has to start applying hard-to-understand SIMD code right off the bat. That's an implementation detail provided the design leaves enough breathing room to explore such an optimization in hindsight. Such a design would imply abstracting at the Image level, at the level of a million+ pixels, not at the level of a single IPixel. That's the worthy thing to take into consideration upfront.
Then later on, you can optimize hotspots and potentially use some difficult-to-understand algorithms and micro-optimizations here and there for those truly critical cases where there's a strong perceived business need for the operation to go faster, and hopefully with good tools (profilers, i.e.) in hand. The user cases guide you about what operations to optimize based on what the users do most often and find a strong desire to spend less time waiting. The profiler guides you about precisely what parts of the code involved in that operation need to be optimized.
Readability, performance and maintainability are three different things. Readability will make your code look simple and understandable, but it is not necessarily the best way to go. Performance is always going to be important, unless you are running the code in a non-production environment where the end result matters more than how it was achieved. Enter the world of enterprise applications, and maintainability suddenly gains a lot more importance. What you work on today will be handed over to somebody else after six months, and they will be fixing/changing your code. This is why standard design patterns suddenly become so important. In a way, readability is part of maintainability on a larger scale. If the cake baking program above is anything more complex than it looks, the first thing that stands out as a code smell is the if-else chain. It has got to be replaced with polymorphism. The same goes for switch-case constructs.
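For illustration, a sketch of the cake program's if-else chain replaced with polymorphism (the class names are made up):
struct Cake {
    virtual ~Cake() = default;
    virtual void fetchIngredients() = 0;
};

struct ChocolateCake : Cake { void fetchIngredients() override { /* cocoa, ... */ } };
struct PlainCake     : Cake { void fetchIngredients() override { /* flour, ... */ } };

// The type decision is made once, when the cake object is created;
// bakeCake() no longer branches on a string at every step.
void bakeCake(Cake& cake) {
    cake.fetchIngredients();
    // ...rest of the baking steps, all dispatched through the same interface
}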
At what point do you decide to sacrifice one for the other? That purely depends upon what your code is achieving for the business. Is it academic? Then it has got to be the perfect solution, even if it means 90% of devs struggle to figure out at first glance what the hell is happening. Is it a retail store's website maintained by a distributed team of 50 devs working from two or more geographic locations? Follow the conventional design patterns.
A rule of thumb I have seen followed in almost all situations is that if a function grows beyond half the screen, it's a candidate for refactoring. Do you have functions that give your editor a full-length scroll bar? Refactor!

Programming Chess in Functional programming

A year ago I programmed a chess AI using the alpha-beta pruning algorithm. This was relatively straightforward to do in C++. One of the main issues I considered while doing this was making my code efficient. I did this by having a data type I called a "game" that I passed around through the search tree made by the algorithm. To increase efficiency I never copied the "game" data type, but rather mutated it while keeping the information needed to return it to its previous states.
Recently I have been reading about functional programming, and the concept of using only pure functions that do not change the state of their parameters appeals to me. I am wondering how I could use the functional programming paradigm while still taking the efficiency of the program into account.
In OOP the solution seems quite straightforward (which is what I implemented), while in functional programming it seems that copying data types is necessary, which decreases efficiency. Is it possible to use functional programming without this loss of efficiency?
In functional programming, data structures are not always copied completely. In many cases, only the part that changes needs to be copied, while the old parts can be referenced (since no mutation is allowed, this is safe).
The article on persistent data structures describes this in more detail.
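As a minimal illustration of structural sharing, here is a sketch of an immutable singly linked list (in C++ only for comparison with the original engine; functional languages give you this for free):
#include <memory>

// An immutable list node; "adding" an element creates one new node that
// points at the old list, so the whole tail is shared, not copied.
struct Node {
    int value;
    std::shared_ptr<const Node> next;
};
using List = std::shared_ptr<const Node>;

List push(int value, List tail) {
    return std::make_shared<const Node>(Node{value, std::move(tail)});
}

int main() {
    List a = push(2, push(1, nullptr)); // [2, 1]
    List b = push(3, a);                // [3, 2, 1] -- shares [2, 1] with a
}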
Jephron's answer points out the important fact that only small parts of a persistent data structure need to get updated, thus the bigger part is shared between the old state and the new state.
To be honest, this would still be slower than a mutation in most cases.
But immutable, persistent data structures have other advantages. Let's assume you have already completed the playing engine. And now, you want to implement a history (for example to allow the player to undo earlier moves). This is dead simple: Just remember all states in a list. You'll find that you need to touch only a few functions to take a list of states instead of just the last state, and you're done. You don't need to worry about compromising your game engine --- there is no global variable or something you could destroy.
Another thing is taking advantage of the many CPU cores you probably have by employing parallelism. Needless to say, you can't let many tasks, threads, fibers or whatever operate on a single mutable data structure. That would just become a synchronization nightmare, and your code would probably even get slower. However, there simply are no synchronization problems with immutable data, which is read-only for all threads.
This could very well speed up your code in such a way that it dwarfs the C++ solution, even if "doing a move" on a functional data structure is much slower than on mutable data.
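As a sketch of that idea for the chess case (Board, Move, apply and evaluate are hypothetical stand-ins): since nothing ever mutates the shared board, each candidate move can be scored on its own thread without any locks:
#include <future>
#include <vector>

struct Move  { int delta = 0; };    // stand-in for a real move
struct Board { int material = 0; }; // stand-in for immutable game state

Board apply(const Board& b, Move m) { return Board{b.material + m.delta}; }
int   evaluate(const Board& b)      { return b.material; }

std::vector<int> scoreMoves(const Board& b, const std::vector<Move>& moves) {
    std::vector<std::future<int>> futures;
    for (Move m : moves)            // b is shared read-only: no locks needed
        futures.push_back(std::async(std::launch::async,
                                     [&b, m] { return evaluate(apply(b, m)); }));
    std::vector<int> scores;
    for (auto& f : futures)
        scores.push_back(f.get());
    return scores;
}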
Here is an example of changing a board game (TTT) from single-threaded to parallel: https://dierk.gitbooks.io/fregegoodness/content/src/docs/asciidoc/incremental_episode4.html

Computing Efficiency Question

I'm wondering about computational efficiency. I'm going to use Java in this example, but it is a general computing question. Let's say I have a string and I want to get the value of the first letter of the string, as a string. So I can do
String firstletter = String.valueOf(somestring.toCharArray()[0]);
Or I could do:
char[] stringaschar = somestring.toCharArray();
char firstchar = stringaschar[0];
String firstletter = String.valueOf(firstchar);
My question is: are the two ways essentially the same, computationally? I mean, in the second way I explicitly had to create two intermediate variables, stored in memory (the stack?) temporarily.
But with the first way, the computer still has to create the same intermediate values implicitly, right? And the number of operations doesn't change. My thinking is that the two ways are the same, but I'd like to know for sure.
In most cases the two ways should produce the same, or nearly the same, object code. Optimizing compilers usually detect that the intermediate variables in the second option are not necessary to get the correct result, and will collapse the call graph accordingly.
This all depends on how your Java interpreter decides to translate your code into an intermediary language for runtime execution. It may actually have optimizations which translate the two approaches to be the same exact byte code.
The two should be essentially the same. In both cases you make the same calls: converting the string to an array, finding the first character, and getting the value of the character. There may be minor differences in how the compiler handles these, but they should be insignificant.
The earlier answers agree with one another and are right, AFAIK.
However, I think there are a few additional and general considerations you should be aware of each time you wonder about the efficiency of any computational asset (code, for example).
First, if everything is under your strict control, you could in principle count clock cycles one by one from the assembly code, or from some more abstract reasoning derive the computational cost of an operation/algorithm.
So far so good. But don't forget to measure afterwards. You may find that measuring execution times is not so easy and straightforward, and sometimes elusive (how do you account for interrupts, for I/O waits, for network bottlenecks?). But it pays. You ask here for counsel, but YOUR compiler/interpreter/P-code generator/whatever could be set with just THAT switch in the third layer of your config scripts.
The other consideration, more to your current point, is the existence of Black Boxes. You are not alone in the world, and a Black Box is any piece used to run your code which is essentially out of your control. Compilers, operating systems, networks, storage systems, and the world in general fall into this category.
What we do with Black Boxes (they are black either because their code is not public, or because we just happen to spend our free time fishing instead of digging through library source code) is establish mental models to help us understand how they work. (BTW, this is an extraordinary book about how we humans forge our mental models.) But you should always beware that they are models, not the real thing. Models help us to explain things... to a certain extent. Classical Mechanics reigned until Relativity and Quantum Mechanics flourished. None of them is wrong; they all have limits, and so do all our models.
Even if you happen to be friends with your router's OS, or your Linux kernel, when confronting an efficiency problem: design a good experiment and measure.
HTH!
NB: By "design a good experiment" I mean beware of the tar pits. Examples: measuring your measurement code instead of the target of the experiment, being influenced by external factors, forgetting external factors that will influence the production code, testing with data whose cardinality, orthogonality, or whatever-ality is dissimilar to the real world's, wrongly mapping the production and testing client/server workhorses, etc., etc., etc.
So go and measure your code. Your results will be the most interesting thing on this page.
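As a trivial starting point, a measurement harness can be as small as this (a C++ sketch; the loop is a placeholder workload, and all the tar-pit caveats above still apply):
#include <chrono>
#include <iostream>

int main() {
    volatile long sink = 0; // volatile: keeps the optimizer from deleting the loop
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < 100000000L; ++i)
        sink = sink + i;    // placeholder workload: measure YOUR code here
    const auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
}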

Data-oriented design in practice?

There has been one more question on what data-oriented design is, and there's one article which is often referred to (and I've read it like 5 or 6 times already). I understand the general concept of this, especially when dealing with, for example, 3d models, where you'd like to keep all vertexes together, and not pollute your faces with normals, etc.
However, I do have a hard time visualizing how data-oriented design might work for anything but the most trivial cases (3d models, particles, BSP-trees, and so on). Are there any good examples out there which really embraces data-oriented design and shows how this might work in practice? I can plow through large code-bases if needed.
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they? Same goes for the 'balls'-example in the accepted answer to the question above (I actually asked this in a comment to that answer, but haven't gotten a reply yet). Is it simply that the rendering will only need the positions, and not the velocities, whereas the game simulation will need both, but not the materials? Or am I missing something? Perhaps I've already understood it and it's a far more straightforward concept than I thought.
Any pointers would be much appreciated!
So, what is DOD all about? Obviously, it's about performance, but it's not just that. It's also about well-designed code that is readable, easy to understand and even reusable.
Now Object Oriented design is all about designing code and data to fit into encapsulated virtual "objects". Each object is a separate entity with variables for properties that object might have and methods to take action on itself or other objects in the world. The advantage of OO design is that it's easy to mentally model your code into objects, because the whole (real) world around us seems to work in the same way: objects with properties that can interact with each other.
Now the problem is that the CPU in your computer works in a completely different way. It works best when you let it do the same things again and again. Why is that? Because of a little thing called cache. Accessing RAM on a modern computer can take 100 or 200 CPU cycles (and the CPU has to wait all that time!), which is way too long. So there's this small portion of memory on the CPU that can be accessed really quickly: cache memory. Problem is, it's only a few MB tops. So every time you need data that isn't in cache, you still need to go the long way to RAM. That's not just the case for data; the same goes for code. Trying to execute a function that's not in the instruction cache will cause a stall while the code is loaded from RAM.
Back to OO programming. Objects are big, but most functions need only a small portion of that data, so we're wasting cache by loading unnecessary data. Methods call other methods which call other methods, thrashing your instruction cache. Still, we often do a lot of the same stuff over and over again. Let's take a bullet from a game, for example. In a naive implementation each bullet could be a separate object. There might be a bullet manager class. It calls the first bullet's update function. It updates the 3D position using the direction/velocity. This causes a lot of other data from the object to be loaded into the cache. Next, we call the World Manager class to check for a collision with other objects. This loads lots of other stuff into the cache, and maybe it even causes code from the original bullet manager class to get dropped from the instruction cache. Now we return to the bullet update; there was no collision, so we return to the bullet manager. It might need to load some code again. Next up, bullet #2's update. This loads lots of data into the cache, calls the world, etc. So in this hypothetical situation, we've got 2 stalls for loading code and, let's say, 2 stalls for loading data. That's at least 400 cycles wasted, for 1 bullet, and we haven't taken bullets that hit something into account. Now a CPU runs at 3+ GHz, so we're not going to notice one bullet, but what if we've got 100 bullets? Or even more?
So this is the "where there's one, there's many" story. Yes, there are some cases where you've only got one object: your manager classes, file access, etc. But more often, there are a lot of similar cases. Naive, or even not-so-naive, object-oriented design will lead to lots of problems. So enter data-oriented design. The key of DOD is to model your code around your data, not the other way around as with OO design. This starts at the first stages of design. You do not first design your OO code and then optimize it. You start by listing and examining your data and thinking out how you want to modify it (I'll get to a practical example in a moment). Once you know how your code is going to modify the data, you can lay it out in a way that makes it as efficient as possible to process. Now you may think this can only lead to a horrible soup of code and data everywhere, but that is only the case if you design it badly (bad design is just as easy with OO programming). If you design it well, code and data can be neatly designed around specific functionality, leading to very readable and even very reusable code.
So back to our bullets. Instead of creating a class for each bullet, we only keep the bullet manager. Each bullet has a position and a velocity. Each bullet's position needs to be updated. Each bullet has to have a collision check, and all bullets that have hit something need to take some action accordingly. Just by taking a look at this description I can design this whole system in a much better way. Let's put the positions of all bullets in one array/vector, and the velocities of all bullets in another. Now let's start by iterating along those two arrays and updating each position value with its corresponding velocity. Now all data loaded into the data cache is data we're going to use. We can even issue a smart prefetch instruction to pre-load some array data in advance, so the data is in cache when we get to it. Next, collision checking. I'm not going into detail here, but you can imagine how updating all bullets one after another helps. Also note that if there's a collision, we're not going to call a new function or do anything; we just keep a vector of all bullets that had a collision, and when collision checking is done, we can update all of those one after another. See how we just went from lots of memory accesses to almost none by laying our data out differently? Did you also notice how our code and data, even though no longer designed in an OO way, are still easy to understand and easy to reuse?
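In code, the position update described above might look something like this (a sketch; the names are made up):
#include <cstddef>
#include <vector>

// Structure-of-arrays layout: all positions together, all velocities together.
struct Bullets {
    std::vector<float> posX, posY;
    std::vector<float> velX, velY;
};

void updatePositions(Bullets& b, float dt) {
    // One tight loop over contiguous arrays: every cache line we load is
    // full of data we actually use, and the compiler can vectorize this.
    for (std::size_t i = 0; i < b.posX.size(); ++i) {
        b.posX[i] += b.velX[i] * dt;
        b.posY[i] += b.velY[i] * dt;
    }
}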
So to get back to "where there's one, there's many": when designing OO code you think about one object, the prototype/class. A bullet has a velocity, a bullet has a position, a bullet will move each frame by its velocity, a bullet can hit something, etc. When you think about that, you will think about a class with velocity, position, and an update function which moves the bullet and checks for collision. However, when you have multiple objects you need to think about all of them. Bullets have positions and velocities. Some bullets may have a collision. Do you see how we're no longer thinking about an individual object? We're thinking about all of them, and are designing code a lot differently now.
I hope this helps answer your second part of the question. It's not about whether you need to update each enemy or not, it's about the most efficient way to update them. And while designing only your enemies using DOD may not help gain much performance, designing the entire game around these principles (only where applicable!) may lead to a lot of performance gains!
So on to the first part of the question, that is, other examples of DOD. I'm sorry, but I don't have that much there. There is one really good example, though, that I came across some time ago: a series on the data-oriented design of a behavior tree by Bjoern Knafla: http://bjoernknafla.com/data-oriented-behavior-tree-overview You probably want to start with the first one in the series of 4; links are in the article itself.
Hope this still helps, despite the question's age. Or maybe some other SO users will come across this question and get some use out of this answer.
I read the question you linked to and the article.
I've read one book on the subject of data driven design.
I'm pretty much in the same boat as you.
The way I understand Noel's article is that you design your game in the typical object oriented way. You have classes and methods that work on the classes.
After you've done your design, you ask yourself the following question:
How can I arrange all of the data I've designed in one huge blob?
Think of it in terms of writing your entire design as one functional method, with lots of subordinate methods. It reminds me of the massive 500,000 line Cobol programs of my youth.
Now, you probably won't write the entire game as one huge functional method. Really, in the article, Noel is talking about the rendering portion of a game. Think of it as a game engine (the one huge functional method) and the code to drive the game engine (the OOP code).
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they?
You're thinking in terms of objects. Try thinking in terms of functionality.
Each enemy update is an iteration of a loop.
What's important is that the enemy data is structured to be in one memory location, rather than spread across enemy object instantiations.

Does coding towards an interface rather than an implementation imply a performance hit?

In day to day programs I wouldn't even bother thinking about the possible performance hit for coding against interfaces rather than implementations. The advantages largely outweigh the cost. So please no generic advice on good OOP.
Nevertheless, in this post the designer of the XNA (game) platform gives, as his main argument for not designing his framework's core classes against an interface, that it would imply a performance hit. Seeing as it is in the context of game development, where every fps possibly counts, I think it is a valid question to ask yourself.
Does anybody have any stats on that? I don't see a good way to test/measure this, as I don't know what implications I should bear in mind with such a game (graphics) object.
Coding to an interface is always going to be easier, simply because interfaces, if done right, are much simpler. It's palpably easier to write a correct program using an interface.
And as the old maxim goes, it's easier to make a correct program run fast than to make a fast program run correctly.
So program to the interface, get everything working and then do some profiling to help you meet whatever performance requirements you may have.
What Things Cost in Managed Code
"There does not appear to be a significant difference in the raw cost of a static call, instance call, virtual call, or interface call."
It depends on how much of your code gets inlined or not at compile time, which can increase performance ~5x.
It also takes longer to code to interfaces, because you have to code the contract (interface) and then the concrete implementation.
But doing things the right way always takes longer.
First, I'd say the common conception is that programmers' time is usually more important, and working against an implementation will probably force much more work when the implementation changes.
Second, with a proper compiler/JIT, I would assume that working with an interface adds a ridiculously small amount of extra time compared to working against the implementation itself.
Moreover, techniques like templates can compile the interface away entirely.
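For example (a sketch): with a template parameter as the "interface", the call is resolved at compile time and can be inlined, so no virtual dispatch remains:
#include <cstddef>

struct Doubler {
    int apply(int x) const { return 2 * x; }
};

// Op is an implicit compile-time interface: any type with apply(int) works,
// and the call below can be inlined because the concrete type is known.
template <typename Op>
int sumApplied(const Op& op, const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += op.apply(data[i]);
    return total;
}

int main() {
    int data[] = {1, 2, 3};
    return sumApplied(Doubler{}, data, 3); // returns 12
}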
Third, to quote Knuth: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
So I'd suggest coding well first, and only if you are sure that there is a problem with the interface would I consider changing anything.
Also, I would assume that if this performance hit were real, most games wouldn't use an OOP approach in C++; but this is not the case, and this article elaborates a bit on it.
It's hard to talk about tests in a general form; naturally, a bad program may spend a lot of time on bad interfaces, but I doubt this is true for all programs, so you really should look at each particular program.
Interfaces generally imply a few hits to performance (this however may change depending on the language/runtime used):
Interface methods are usually implemented via a virtual call by the compiler. As another user points out, these cannot be inlined by the compiler, so you lose that potential gain. Additionally, they add a few instructions (jumps and memory accesses) at a minimum to get to the proper PC in the code segment.
Interfaces, in a number of languages, also imply a graph and require a DAG (directed acyclic graph) to properly manage memory. In various languages/runtimes you can actually get a memory 'leak' in the managed environment by having a cyclic graph. This imposes great stress (obviously) on the garbage collector/memory in the system. Watch out for cyclic graphs!
Some languages use a COM style interface as their underlying interface, automatically calling AddRef/Release whenever the interface is assigned to a local, or passed by value to a function (used for life cycle management). These AddRef/Release calls can add up and be quite costly. Some languages have accounted for this and may allow you to pass an interface as 'const' which will not generate the AddRef/Release pair automatically cutting down on these calls.
Here is a small example of a cyclic graph where 2 interfaces reference each other and neither will automatically be collected, as their refcounts will never drop to zero.
interface Parent {
    Child c;
}

interface Child {
    Parent p;
}

function createGraph() {
    ...
    Parent p = ParentFactory::CreateParent();
    Child c = ChildFactory::CreateChild();
    p.c = c;
    c.p = p;
    ... // do stuff here

    // p has a reference to c and c has a reference to p.
    // When the function goes out of scope and attempts to clean up the locals,
    // it will note that p has a refcount of 1 and c has a refcount of 1, so neither
    // can be cleaned up (of course, this depends on the language/runtime and
    // whether DAGs are enforced for interfaces). If you were to set c.p = null or
    // p.c = null, then the two interfaces would be released when the scope is cleaned up.
}
I think object lifetime and the number of instances you're creating will provide a coarse-grain answer.
If you're talking about something which will have thousands of instances, with short lifetimes, I would guess that's probably better done with a struct rather than a class, let alone a class implementing an interface.
For something more component-like, with low numbers of instances and moderate-to-long lifetime, I can't imagine it's going to make much difference.
IMO yes, but for a fundamental design reason far more subtle and complex than virtual dispatch or COM-like interface queries or object metadata required for runtime type information or anything like that. There is overhead associated with all of that but it depends a lot on the language and compiler(s) used, and also depends on whether the optimizer can eliminate such overhead at compile-time or link-time. Yet in my opinion there's a broader conceptual reason why coding to an interface implies (not guarantees) a performance hit:
Coding to an interface implies that there is a barrier between you and the concrete data/memory you want to access and transform.
This is the primary reason I see. As a very simple example, let's say you have an abstract image interface. It fully abstracts away concrete details like its pixel format. The problem here is that often the most efficient image operations need those concrete details. We can't implement our custom image filter with efficient SIMD instructions, for example, if we have to getPixel one pixel at a time and setPixel one pixel at a time while oblivious to the underlying pixel format.
Of course the abstract image could try to provide all these operations, and those operations could be implemented very efficiently since they have access to the private, internal details of the concrete image which implements that interface, but that only holds up as long as the image interface provides everything the client would ever want to do with an image.
Often at some point an interface cannot hope to provide every function imaginable to the entire world, and so such interfaces, when faced with performance-critical concerns while simultaneously needing to fulfill a wide range of needs, will often leak their concrete details. The abstract image might still provide, say, a pointer to its underlying pixels through a pixels() method which largely defeats a lot of the purpose of coding to an interface, but often becomes a necessity in the most performance-critical areas.
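A sketch of that leak (the interface and method names are illustrative): per-pixel virtual calls block vectorization, while the leaked pixels() pointer allows a tight loop at the cost of the abstraction:
#include <cstddef>
#include <cstdint>

class Image {
public:
    virtual ~Image() = default;
    virtual std::uint8_t getPixel(std::size_t i) const = 0;
    virtual void setPixel(std::size_t i, std::uint8_t v) = 0;
    virtual std::size_t size() const = 0;
    virtual std::uint8_t* pixels() = 0; // the leaked concrete detail
};

// Through the abstraction: two virtual calls per pixel, nothing vectorizes.
void brightenAbstract(Image& img, std::uint8_t amount) {
    for (std::size_t i = 0; i < img.size(); ++i)
        img.setPixel(i, static_cast<std::uint8_t>(img.getPixel(i) + amount));
}

// Against the leaked buffer: one tight loop the optimizer can apply SIMD to.
void brightenRaw(Image& img, std::uint8_t amount) {
    std::uint8_t* p = img.pixels();
    for (std::size_t i = 0, n = img.size(); i < n; ++i)
        p[i] = static_cast<std::uint8_t>(p[i] + amount);
}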
Just in general a lot of the most efficient code often has to be written against very concrete details at some level, like code written specifically for single-precision floating-point, code written specifically for 32-bit RGBA images, code written specifically for GPU, specifically for AVX-512, specifically for mobile hardware, etc. So there's a fundamental barrier, at least with the tools we have so far, where we cannot abstract that all away and just code to an interface without an implied penalty.
Of course our lives would become so much easier if we could just write code, oblivious to all such concrete details like whether we're dealing with 32-bit SPFP or 64-bit DPFP, whether we're writing shaders on a limited mobile device or a high-end desktop, and have all of it be the most competitively efficient code out there. But we're far from that stage. Our current tools still often require us to write our performance-critical code against concrete details.
And lastly this is kind of an issue of granularity. Naturally if we have to work with things on a pixel-by-pixel basis, then any attempts to abstract away concrete details of a pixel could lead to a major performance penalty. But if we're expressing things at the image level like, "alpha blend these two images together", that could be a very negligible cost even if there's virtual dispatch overhead and so forth. So as we work towards higher-level code, often any implied performance penalty of coding to an interface diminishes to a point of becoming completely trivial. But there's always that need for the low-level code which does do things like process things on a pixel-by-pixel basis, looping through millions of them many times per frame, and there the cost of coding to an interface can carry a pretty substantial penalty, if only because it's hiding the concrete details necessary to write the most efficient implementation.
In my personal opinion, all the really heavy lifting when it comes to graphics is passed on to the GPU anyway. This frees up your CPU to do other things like program flow and logic. I am not sure if there is a performance hit when programming to an interface, but thinking about the nature of games, they are not something that needs to be extensible. Maybe certain classes, but on the whole I wouldn't think that a game needs to be programmed with extensibility in mind. So go ahead, code the implementation.
it would imply a performance hit
The designer should be able to prove his opinion.
