I work on a project that leans on an app-level provide for global state management. At first, the provided object was small, but it has grown with the project.
How much does app performance suffer from injecting this object wherever its data is needed? Short of implementing a global state management tool like Pinia, is it worthwhile to decompose the large object we're providing into individual objects, so we provide "chunks" of data where they're needed?
Since it's global, I'd guess it's quite big memory-wise. Take a few hours to set up Pinia; it will be faster/easier since it will pull in only the parts you need.
We can't give you an exact number for your situation, since it depends on a lot of things.
PS: the slowdown could also be totally unrelated, coming from another part of your code, tbh.
I'm gaining my first experience with microservices and need help with an important decision.
Example: We have
CustomerService (contains CustomerDTO)
InvoiceService (contains CustomerDTO, InvoiceDTO)
PrintService (contains CustomerDTO, InvoiceDTO)
As you can see, I have heavy code duplication here: each time I modify CustomerDTO, I need to do it in three different microservices.
My potential solution is to extract the duplicated classes into a library and share this library between the microservices. But in my opinion this would break the microservice approach.
So, what is the right way to handle my problem?
As with most things, there is no single right way to solve this problem. There are pros and cons to each approach, depending on your circumstances.
These are probably your main options:
Create a new common service:
If your logic is sufficiently complex, many services need it, and you have time on your hands, then look at creating a common service that fulfills this need. This may not be a small undertaking, as you now have one more service to manage/deploy/scale.
Just accept the duplication:
This may seem counter-intuitive; however, if the logic is small enough, it may be better to just duplicate it rather than couple your micro-services with a library.
Create the library:
Real life is different from the textbooks. We are often constrained by time and budget, among other things. If your micro-services are small enough, you know this logic will not change much, or you just need to get things out the door and de-duplicating would take more time, then take this approach. There is nothing to say you can't address it later when you have the time/budget. Countless blog articles will scream at you for doing this; however, they aren't you and don't know the circumstances of your project.
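For illustration, here is a minimal sketch of what the shared-library option could look like. The module name com.example.customerapi and the DTO fields are invented for this example; the idea is simply that the DTO lives in one artifact that all three services depend on.

```java
// Hypothetical shared module "customer-api", published as its own artifact
// (e.g. a Maven/Gradle dependency) and versioned independently of the services.
package com.example.customerapi;

import java.io.Serializable;

public class CustomerDTO implements Serializable {
    private static final long serialVersionUID = 1L;

    private long id;
    private String name;
    private String email;

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}
```

CustomerService, InvoiceService and PrintService would each declare a dependency on that one artifact. The trade-off is that a breaking change to the DTO now forces a coordinated upgrade of every consumer, which is exactly the coupling the question is worried about.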
We are planning a Java 8 migration for our application from Java 7. As part of this migration, the most important thing we want to achieve is to recompile our source code with JDK 8 and benefit from the performance improvements in the JVM, the garbage collection model, etc. Besides this, we also want to set the stage to take advantage of the new features added in Java 8.
My question to this group is to get some advice on how should we plan our testing. What are the key areas that we should be watching out for? What are some of the challenges that others have faced?
Note: Our application is intended for low latency use.
A few things...
Some things that compile under JDK 7 might not under JDK 8. This is because lots of compiler bugs were fixed and the compiler now follows the JLS much more closely (this mostly concerns generics, but it might affect other areas as well).
If you use external libraries, be aware that not all of them are compatible with JDK 8.
HashMap internals have changed. If you rely on some iteration order (I've seen that), it might now fail; otherwise the internal changes will only make your HashMap usages faster.
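To make the iteration-order point concrete, here is a small sketch (the map contents are made up). Code that relies on HashMap order is fragile across JDK versions; if order actually matters, say so explicitly with LinkedHashMap (or TreeMap for sorted order).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        // HashMap makes no ordering guarantee; the order observed on JDK 7
        // may silently change on JDK 8 because the internal layout changed.
        Map<String, Integer> unordered = new HashMap<>();
        unordered.put("alpha", 1);
        unordered.put("beta", 2);
        unordered.put("gamma", 3);
        System.out.println(unordered.keySet()); // order is an implementation detail

        // If a test or report depends on a stable order, make it explicit.
        Map<String, Integer> insertionOrdered = new LinkedHashMap<>();
        insertionOrdered.put("alpha", 1);
        insertionOrdered.put("beta", 2);
        insertionOrdered.put("gamma", 3);
        System.out.println(insertionOrdered.keySet()); // always [alpha, beta, gamma]
    }
}
```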
You say that your app is intended for low latency. Be aware that Stream operations are slower and require more resources than simple loops. BUT unless you actually measure this to be an impact (it wasn't in my case when migrating), there's nothing to worry about.
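For concreteness, a small sketch of the kind of code this is about (the numbers are arbitrary). The stream version is more expressive, while the plain loop usually has less overhead; on a genuinely hot, low-latency path, measure both with a proper benchmark harness (e.g. JMH) before deciding anything.

```java
import java.util.stream.LongStream;

public class SumComparison {

    // Stream version: concise, but involves extra lambda/pipeline machinery.
    static long sumWithStream(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Plain loop: typically at least as fast, which can matter on a hot path.
    static long sumWithLoop(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(sumWithStream(n)); // 500000500000
        System.out.println(sumWithLoop(n));   // 500000500000
    }
}
```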
This is a great example of where having your test cases in place would help a lot: they would catch all the major problems (if any) at that point.
I'd say that the biggest challenge for me was not the migration itself, but what came after it. Lots of people (including me) made plenty of mistakes in basic stuff, since lambdas and Streams were quite new. My personal advice here: do not be afraid to ask. Better safe than sorry.
P.S. As noted in comment, you should also check this guide.
There has been one more question on what data-oriented design is, and there's one article which is often referred to (and I've read it like 5 or 6 times already). I understand the general concept of this, especially when dealing with, for example, 3d models, where you'd like to keep all vertexes together, and not pollute your faces with normals, etc.
However, I do have a hard time visualizing how data-oriented design might work for anything but the most trivial cases (3d models, particles, BSP-trees, and so on). Are there any good examples out there that really embrace data-oriented design and show how it might work in practice? I can plow through large code-bases if needed.
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they? Same goes for the 'balls'-example in the accepted answer to the question above (I actually asked this in a comment to that answer, but haven't gotten a reply yet). Is it simply that the rendering will only need the positions, and not the velocities, whereas the game simulation will need both, but not the materials? Or am I missing something? Perhaps I've already understood it and it's a far more straightforward concept than I thought.
Any pointers would be much appreciated!
So, what is DOD all about? Obviously, it's about performance, but it's not just that. It's also about well-designed code that is readable, easy to understand and even reusable.
Now object-oriented design is all about designing code and data to fit into encapsulated virtual "objects". Each object is a separate entity with variables for the properties that object might have, and methods to take action on itself or other objects in the world. The advantage of OO design is that it's easy to mentally model your code as objects, because the whole (real) world around us seems to work the same way: objects with properties that can interact with each other.
Now the problem is that the CPU in your computer works in a completely different way. It works best when you let it do the same things again and again. Why is that? Because of a little thing called cache. Accessing RAM on a modern computer can take 100 or 200 CPU cycles (and the CPU has to wait all that time!), which is way too long. So there's a small portion of memory on the CPU that can be accessed really quickly: cache memory. The problem is that it's only a few MB, tops. So every time you need data that isn't in cache, you still have to go the long way to RAM. And that's not just true for data; the same goes for code. Trying to execute a function that's not in the instruction cache will cause a stall while the code is loaded from RAM.
Back to OO programming. Objects are big, but most functions need only a small portion of that data, so we're wasting cache by loading unnecessary data. Methods call other methods which call other methods, thrashing your instruction cache. Still, we often do a lot of the same stuff over and over again. Let's take a bullet from a game as an example. In a naive implementation each bullet could be a separate object. There might be a bullet manager class. It calls the first bullet's update function. It updates the 3D position using the direction/velocity. This causes a lot of other data from the object to be loaded into the cache. Next, we call the world manager class to check for a collision with other objects. This loads lots of other stuff into the cache, and maybe it even causes code from the original bullet manager class to get dropped from the instruction cache. Now we return to the bullet update; there was no collision, so we return to the bullet manager. It might need to load some code again. Next up, bullet #2's update. This loads lots of data into the cache, calls the world manager... etc. So in this hypothetical situation, we've got 2 stalls for loading code and, let's say, 2 stalls for loading data. That's at least 400 cycles wasted for 1 bullet, and we haven't taken bullets that hit something into account. Now a CPU runs at 3+ GHz, so we're not going to notice one bullet, but what if we've got 100 bullets? Or even more?
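A rough sketch of that naive one-object-per-bullet layout (class and field names are invented; the original discussion is language-agnostic, so Java stands in here). Each update drags a whole Bullet object through the cache and then bounces off into other managers' code and data:

```java
import java.util.ArrayList;
import java.util.List;

// Naive object-per-bullet design: every update touches the whole object,
// then jumps into unrelated code and data via the world manager.
class Bullet {
    float x, y, z;        // position
    float vx, vy, vz;     // velocity
    Object material;      // "cold" data the update never needs, but loaded anyway
    Object owner;

    void update(float dt, World world) {
        x += vx * dt;
        y += vy * dt;
        z += vz * dt;
        world.checkCollision(this);  // pulls other objects and code into the cache
    }
}

class World {
    void checkCollision(Bullet b) { /* touches lots of other objects */ }
}

class BulletManager {
    List<Bullet> bullets = new ArrayList<>();

    void update(float dt, World world) {
        for (Bullet b : bullets) {
            b.update(dt, world);     // objects are scattered across the heap
        }
    }
}
```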
So this is the "where there's one there's many" story. Yes, there are some cases where you've only got one object: your manager classes, file access, etc. But more often, there are a lot of similar cases. Naive, or even not-so-naive, object-oriented design will lead to lots of problems. So enter data-oriented design. The key to DOD is to model your code around your data, not the other way around as with OO design. This starts at the first stages of design. You do not first design your OO code and then optimize it. You start by listing and examining your data and thinking out how you want to modify it (I'll get to a practical example in a moment). Once you know how your code is going to modify the data, you can lay it out in a way that makes it as efficient as possible to process. Now you may think this can only lead to a horrible soup of code and data everywhere, but that is only the case if you design it badly (bad design is just as easy with OO programming). If you design it well, code and data can be neatly designed around specific functionality, leading to very readable and even very reusable code.
So back to our bullets. Instead of creating a class for each bullet, we only keep the bullet manager. Each bullet has a position and a velocity. Each bullet's position needs to be updated. Each bullet has to have a collision check, and all bullets that have hit something need to take some action accordingly. So just by looking at this description I can design this whole system in a much better way. Let's put the positions of all bullets in an array/vector. Let's put the velocities of all bullets in an array/vector. Now let's start by iterating along those two arrays and updating each position value with its corresponding velocity. Now all data loaded into the data cache is data we're going to use. We can even issue a smart pre-loading command to pre-fetch some array data in advance, so the data is in cache when we get to it. Next, the collision check. I'm not going into detail here, but you can imagine how updating all bullets one after another can help. Also note that if there's a collision, we're not going to call a new function or do anything. We just keep a vector of all bullets that had a collision, and when collision checking is done, we can update all of those one after another. See how we just went from lots of memory accesses to almost none by laying our data out differently? Did you also notice how our code and data, even though not designed in an OO way any more, are still easy to understand and easy to reuse?
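A minimal sketch of that array-based layout, in the same invented Java setting (flat arrays of primitives stand in for the contiguous memory layout described above):

```java
// Data-oriented layout: positions and velocities live in flat primitive arrays,
// so the position update streams through contiguous memory.
class BulletSystem {
    int count;
    final float[] posX, posY, posZ;
    final float[] velX, velY, velZ;
    final int[] collided;            // indices of bullets that hit something this frame
    int collidedCount;

    BulletSystem(int capacity) {
        posX = new float[capacity]; posY = new float[capacity]; posZ = new float[capacity];
        velX = new float[capacity]; velY = new float[capacity]; velZ = new float[capacity];
        collided = new int[capacity];
    }

    void integrate(float dt) {
        for (int i = 0; i < count; i++) {
            posX[i] += velX[i] * dt;
            posY[i] += velY[i] * dt;
            posZ[i] += velZ[i] * dt;
        }
    }

    void resolveCollisions() {
        // First pass: fill `collided` with the indices of bullets that hit something.
        // Second pass: process all of them together, instead of branching off into
        // other systems in the middle of the update loop.
    }
}
```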
So to get back to "where there's one there's many": when designing OO code you think about one object, the prototype/class. A bullet has a velocity, a bullet has a position, a bullet will move each frame by its velocity, a bullet can hit something, etc. When you think about that, you will think about a class with velocity, position, and an update function which moves the bullet and checks for collision. However, when you have multiple objects you need to think about all of them. Bullets have positions and velocities. Some bullets may have a collision. Do you see how we're not thinking about an individual object any longer? We're thinking about all of them, and designing code a lot differently now.
I hope this helps answer your second part of the question. It's not about whether you need to update each enemy or not, it's about the most efficient way to update them. And while designing only your enemies using DOD may not help gain much performance, designing the entire game around these principles (only where applicable!) may lead to a lot of performance gains!
So on to the first part of the question, that is, other examples of DOD. I'm sorry, but I don't have that many. There is one really good example, though, that I came across some time ago: a series on the data-oriented design of a behavior tree by Bjoern Knafla: http://bjoernknafla.com/data-oriented-behavior-tree-overview You probably want to start with the first one in the series of four; links are in the article itself.
Hope this still helps, despite the old question. Or maybe some other SO user will come across this question and get some use out of this answer.
I read the question you linked to and the article.
I've read one book on the subject of data driven design.
I'm pretty much in the same boat as you.
The way I understand Noel's article is that you design your game in the typical object oriented way. You have classes and methods that work on the classes.
After you've done your design, you ask yourself the following question:
How can I arrange all of the data I've designed in one huge blob?
Think of it in terms of writing your entire design as one functional method, with lots of subordinate methods. It reminds me of the massive 500,000 line Cobol programs of my youth.
Now, you probably won't write the entire game as one huge functional method. Really, in the article, Noel is talking about the rendering portion of a game. Think of it as a game engine (the one huge functional method) and the code to drive the game engine (the OOP code).
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they?
You're thinking in terms of objects. Try thinking in terms of functionality.
Each enemy update is an iteration of a loop.
What's important is that the enemy data is structured to be in one memory location, rather than spread across enemy object instantiations.
The topic of algorithms class today was reimplementing data structures, specifically ArrayList in Java. The fact that you can customize a structure in various ways definitely got me interested, particularly with variations of the add() & iterator.remove() methods.
But is reimplementing and customizing a data structure something that is of more interest to the academics vs the real-world programmers? Has anyone reimplemented their own version of a data structure in a commercial application/program, and why did you pick that route over your particular language's implementation?
Knowing how data structures are implemented, and can be implemented, is definitely of interest to everyone, not just academics. While you will most likely not reimplement a data structure if the language already provides an implementation with suitable functions and performance characteristics, it is very possible that you will have to create your own data structure by composing other data structures... or you may need to implement a data structure with slightly different behavior than a well-known data structure. In that case, you certainly will need to know how the original data structure is implemented. Alternatively, you may end up needing a data structure that does not exist, or one which provides behavior similar to an existing data structure but is used in a way that requires it to be optimized for a different set of operations. Again, such a situation would require you to know how to implement (and alter) the data structure, so yes, it is of interest.
Edit
I am not advocating that you reimplement existing data structures! Don't do that. What I'm saying is that the knowledge does have practical application. For example, you may need to create a bidirectional map data structure (which you can implement by composing two unidirectional map data structures), or you may need to create a stack that keeps track of a variety of statistics (such as min, max, mean) by using an existing stack data structure with an element type that contains the value as well as those statistics. These are some trivial examples of things that you might need to implement in the real world.
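A hedged sketch of the second example: a stack that can also answer "what is the current minimum?" in O(1), built purely by composing the standard ArrayDeque rather than writing a stack from scratch (the class name is made up; min only, for brevity):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A stack of ints that reports its current minimum in O(1),
// built on top of the standard Deque instead of a hand-rolled structure.
public class MinTrackingStack {

    private static final class Entry {
        final int value;
        final int minSoFar;  // minimum of everything on the stack at push time
        Entry(int value, int minSoFar) { this.value = value; this.minSoFar = minSoFar; }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    public void push(int value) {
        int min = stack.isEmpty() ? value : Math.min(value, stack.peek().minSoFar);
        stack.push(new Entry(value, min));
    }

    public int pop() {
        return stack.pop().value;        // throws if the stack is empty, as Deque.pop does
    }

    public int min() {
        return stack.element().minSoFar; // throws if the stack is empty
    }
}
```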
I have re-implemented some of a language's built-in data structures, functions, and classes on a number of occasions. As an embedded developer, the main reason I would do that is for speed or efficiency. The standard libraries and types were designed to be useful in a variety of situations, but there are many instances where I can create a more specialized version that is custom-tailored to take advantage of the features and limitations of my current platform. If the language doesn't provide a way to open up and modify existing classes (like you can in Ruby, for instance), then re-implementing the class/function/structure can be the only way to go.
For example, one system I worked on used a MIPS CPU that was speedy when working with 32-bit numbers but slower when working with smaller ones. I re-wrote several data structures and functions to use 32-bit integers instead of 16-bit integers, and also specified that the fields be aligned to 32-bit boundaries. The result was a noticeable speed boost in a section of code that was bottlenecking other parts of the software.
That being said, it was not a trivial process. I ended up having to modify every function that used that structure and I ended up having to re-write several standard library functions as well. In this particular instance, the benefits outweighed the work. In the general case, however, it's usually not worth the trouble. There's a big potential for hard-to-debug problems, and it's almost always more work than it looks like. Unless you have specific requirements or restrictions that the existing structures/classes don't meet, I would recommend against re-implementing them.
As Michael mentions, it is indeed useful to know how to re-implement structures even if you never do so. You may find a problem in the future that can be solved by applying the principles and techniques used in existing data structures.
My Question:
Performance tests are generally done after an application is integrated with various modules and ready for deploy.
Is there any way to identify performance bottlenecks during the development phase? Does code analysis give any hints about performance?
It all depends on the rules that you run during code analysis, but I don't think you can prevent performance bottlenecks with code analysis alone.
From my experience, performance problems are usually quite complicated, and to find the real problems you have to run performance tests.
No, except in very minor cases (e.g., for Java, use StringBuilder in a loop rather than repeated string concatenation; see the sketch at the end of this answer).
The reason is that you won't know how a particular piece of code will affect the application as a whole, until you're running the whole application with relevant dataset.
For example: changing bubblesort to quicksort wouldn't significantly affect your application if you're consistently sorting lists of a half-dozen elements. Or if you're running the sort once, in the middle of the night, and it doesn't delay other processing.
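For concreteness, a sketch of the minor Java case mentioned above. Repeated += on a String copies the whole intermediate string every iteration, while StringBuilder appends into a growing buffer:

```java
public class ConcatDemo {

    // O(n^2) overall: each += allocates and copies the string built so far.
    static String slowJoin(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // O(n) overall: StringBuilder appends into one growing buffer.
    static String fastJoin(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```

That said, as the rest of this answer argues, even this only matters if the loop runs over enough data on a path anyone actually waits on.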
If we are talking .NET, then yes and no... FxCop (or built-in code analysis) has a number of rules in it that deal with performance concerns. However, this list is fairly short and limited in nature.
Having said that, there is no reason that FxCop could not be extended with a lot more rules (heuristic or otherwise) that catch potential problem areas and flag them. It's simply a fact that nobody (that I know of) has put significant work into this (yet).
Generally, no, although from experience I can look at a system I've never seen before and recognize some design approaches that are prone to performance problems:
How big is it, in terms of lines of code, or number of classes? This correlates strongly with performance problems caused by over-design.
How many layers of abstraction are there? Each layer is a chance to spend more cycles than necessary, and this effect compounds, especially if each operation is perceived as being "pretty efficient".
Are there separate data structures that need to be kept in agreement? If so, how is this done? If there is an attempt, through notifications, to keep the data structures tightly in sync, that is a red flag.
Of the categories of input information to the system, does some of it change at low frequency? If so, chances are it should be "compiled" rather than "interpreted". This can be a huge win both in performance and ease of development (a small sketch appears at the end of this answer).
A common motif is this: Programmer A creates functions that wrap complex operations, like DB access to collect a good chunk of information. Programmer A considers this very useful to other programmers, and expects these functions to be used with a certain respect, not casually. Programmer B appreciates these powerful functions and uses them a lot because they get so much done with only a single line of code. (Programmers B and A can be the same person.) You can see how this causes performance problems, especially if distributed over multiple layers.
Those are the first things that come to mind.
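To illustrate the "compiled rather than interpreted" point from the list above, a hedged sketch (the rule format and names are invented): configuration that changes rarely is translated once, at load time, into a fast lookup structure, instead of being re-parsed on every request.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// "Interpreted" would mean scanning the textual rules on every lookup.
// "Compiled" means translating them once into a HashMap and doing O(1) lookups.
public class RoutingTable {
    private final Map<String, String> compiled = new HashMap<>();

    // Called once, when the (rarely changing) configuration is loaded.
    public RoutingTable(List<String> rules) {
        for (String rule : rules) {              // e.g. "orders=OrderService"
            String[] parts = rule.split("=", 2);
            compiled.put(parts[0].trim(), parts[1].trim());
        }
    }

    // Called on every request; no parsing happens here.
    public String routeFor(String key) {
        return compiled.get(key);
    }
}
```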