Data-oriented design in practice? - data-oriented-design

There has been one more question on what data-oriented design is, and there's one article which is often referred to (and I've read it like 5 or 6 times already). I understand the general concept of this, especially when dealing with, for example, 3d models, where you'd like to keep all vertexes together, and not pollute your faces with normals, etc.
However, I do have a hard time visualizing how data-oriented design might work for anything but the most trivial cases (3d models, particles, BSP-trees, and so on). Are there any good examples out there which really embraces data-oriented design and shows how this might work in practice? I can plow through large code-bases if needed.
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they? Same goes for the 'balls'-example in the accepted answer to the question above (I actually asked this in a comment to that answer, but haven't gotten a reply yet). Is it simply that the rendering will only need the positions, and not the velocities, whereas the game simulation will need both, but not the materials? Or am I missing something? Perhaps I've already understood it and it's a far more straightforward concept than I thought.
Any pointers would be much appreciated!

So, what is DOD all about? Obviously, it's about performance, but it's not just that. It's also about well-designed code that is readable, easy to understand and even reusable.
Now Object Oriented design is all about designing code and data to fit into encapsulated virtual "objects". Each object is a seperate entity with variables for properties that object might have and methods to take action on itself or other objects in the world. The advantage of OO design is that it's easy to mentally model your code into objects because the whole (real) world around us seems to work in the same way. Objects with properties that can interact with each other.
Now the problem is that the cpu in your computer works in a completely different way. It works best when you let it do the same things again and again. Why is that? Because of a little thing called cache. Accessing RAM on a modern computer can take 100 or 200 CPU cycles (and the CPU has to wait all that time!), which is way too long. So there's this small portion of memory on the CPU that can be accessed really quickly, cache memory. Problem is it's only a few MB tops. So every time you need data that wasn't in cache, you still need to go the long way to RAM. That's not just that way for data, the same goes for code. Trying to execute a function that's not in instruction cache will cause a stall while the code is loaded from RAM.
Back to OO programming. Objects are big, but most functions need only a small portion of that data, so we're wasting cache by loading unnecessary data. Methods call other methods which call other methods, thrashing your instruction cache. Still, we often do a lot of the same stuff over and over again. Let's take a bullet from a game for example. In a naive implementation each bullet could be a separate object. There might be a bullet manager class. It calls the first bullet's update function. It updates the 3D position using the direction/velocity. This causes a lot of other data from the object to be loaded into the cache. Next, we call the World Manager class to check for a collision with other objects. This loads lots of other stuff into the cache, maybe it even causes code from the original bullet manager class to get dropped from instruction cache. Now we return to the bullet update, there was no collision, so we return to bullet manager. It might need to load some code again. Next up, bullet #2 update. This loads lots of data into the cache, calls world... etc. So in this hypthetical situation, we've got 2 stalls for loading code and let's say 2 stalls for loading data. That's at least 400 cycles wasted, for 1 bullet, and we haven't taken bullets that hit something else into account. Now a CPU runs at 3+ GHz so we're not going to notice one bullet, but what if we've got 100 bullets? Or even more?
So this is the where there's one there's many story. Yes, there are some cases where you've only got on object, your manager classes, file access, etc. But more often, there's a lot of similar cases. Naive, or even not-naive object oriented design will lead to lots of problems. So enter data oriented design. The key of DOD is to model your code around your data, not the other way around as with OO-design. This starts at the first stages of design. You do not first design your OO code and then optimize it. You start by listing and examining your data and thinking out how you want to modify it(I'll get to a practical example in a moment). Once you know how your code is going to modify the data you can lay it out in a way that makes it as efficient as possible to process it. Now you may think this can only lead to a horrible soup of code and data everywhere but that is only the case if you design it badly (bad design is just as easy with OO programming). If you design it well, code and data can be neatly designed around specific functionality, leading to very readable and even very reusable code.
So back to our bullets. Instead of creating a class for each bullet, we only keep the bullet manager. Each bullet has a position and a velocity. Each bullet's position needs to be updated. Each bullet has to have a collision check and all bullets that have hit something need to take some action accordingly. So just by taking a look at this description I can design this whole system in a much better way. Let's put the positions of all bullets in an array/vector. Let's put the velocity of all bullets in an array/vector. Now let's start by iterating allong those two arrays and updating each position value with it's corresponding velocity. Now, all data loaded into the data cache is data we're going to use. We can even put a smart pre-loading command to already pre-load some array data in advance so the data's in cache when we get to it. Next, collision check. I'm not going into detail here, but you can imagine how updating all bullets after each other can help. Also note that if there's a collision, we're not going to call a new function or do anything. We just keep a vector with all bullets that had collision and when collision checking is done, we can update all those after each other. See how we just went from lots of memory access to almost none by laying our data out differently? Did you also notice how our code and data, even though not designed in an OO way any more, are still easy to understand and easy to reuse?
So to get back to the "where there's one there's many". When designing OO code you think about one object, the prototype/class. A bullet has a velocity, a bullet has a position, a bullet will move each frame by it's velocity, a bullet can hit something, etc. When you think about that, you will think about a class, with velocity, position, and an update function which moves the bullet and checks for collision. However, when you have multiple objects you need to think about all of them. Bullets have positions, velocity. Some bullets may have collision. Do you see how we're not thinking about an individual object any longer? We're thinking about all of them and are designing code a lot differently now.
I hope this helps answer your second part of the question. It's not about whether you need to update each enemy or not, it's about the most efficient way to update them. And while designing only your enemies using DOD may not help gain much performance, designing the entire game around these principles (only where applicable!) may lead to a lot of performance gains!
So onto the first part of the question, that is other examples of DOD. I'm sorry but I don't have that much there. There is one really good example though, I came across this some time ago, a series on data oriented design of a behavior tree by Bjoern Knafla: http://bjoernknafla.com/data-oriented-behavior-tree-overview You probably want to start at the first one in the series of 4, links are in the article itself.
Hope this still helps, despite the old question. Or maybe some other SO user come across this question and have some use from this answer.

I read the question you linked to and the article.
I've read one book on the subject of data driven design.
I'm pretty much in the same boat as you.
The way I understand Noel's article is that you design your game in the typical object oriented way. You have classes and methods that work on the classes.
After you've done your design, you ask yourself the following question:
How can I arrange all of the data I've designed in one huge blob?
Think of it in terms of writing your entire design as one functional method, with lots of subordinate methods. It reminds me of the massive 500,000 line Cobol programs of my youth.
Now, you probably won't write the entire game as one huge functional method. Really, in the article, Noel is talking about the rendering portion of a game. Think of it as a game engine (the one huge functional method) and the code to drive the game engine (the OOP code).
What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they?
You're thinking in terms of objects. Try thinking in terms of functionality.
Each enemy update is an iteration of a loop.
What's important is that the enemy data is structured to be in one memory location, rather than spread across enemy object instantiations.

Related

How small should functions be? [duplicate]

This question already has answers here:
When is a function too long? [closed]
(24 answers)
Closed 5 years ago.
How small should I be making functions? For example, if I have a cake baking program.
bakeCake(){
if(cakeType == "chocolate")
fetchIngredients("chocolate")
else
if(cakeType == "plain")
fetchIngredients("plain")
else
if(cakeType == "Red velvet")
fetchIngredients("Red Velvet")
//Rest of program
My question is, while this stuff is simple enough on its own, when I add much more stuff to the bakeCake function it becomes cluttered. But lets say that this program has to bake thousands of cakes per second. From what I've heard, it takes significantly longer (relative to computer time) to use another function compared to just doing the statements in the current function. So something that's similar like this should be very easy to read, and if efficiency is important wouldn't I want to keep it in there?
Basically, at what point do I sacrifice readability for efficiency. And a quick bonus question, at what point does having too many functions decrease readability? Here's an example of Apple's swift tutorial.
func isCandyAmountAcceptable(bandMemberCount: Int, candyCount: Int) -> Bool {
return candyCount % bandMemberCount == 0
They said that because the function name isCandyAmountAcceptable was easier to read than candyCount % bandMemberCount == 0 that it'd be good to make a function for that. But from my perspective it may take a few seconds to figure out what the second option is saying, but it's also more readable when ti comes to knowing how it works.
Sorry about being all over the place and kinda asking 2 questions in one. Just to summarize my questions:
Does using functions extraneously make efficiency (speed) suffer? If it does how can I figure out what the cutoff between readability and efficiency is?
How small and simple should I make functions for? Obviously I'd make them if I ever have to repeat the function, but what about one time use functions?
Thanks guys, sorry if these questions are ignorant or anything but I'd really appreciate an answer.
Does using functions extraneously make efficiency (speed) suffer? If
it does how can I figure out what the cutoff between readability and
efficiency is?
For performance I would generally not factor in any overhead of direct function calls against any decent optimizer, since it can even make those come free of charge. When it doesn't, it's still a negligible overhead in, say, 99.9% of scenarios. That applies even for performance-critical fields. I work in areas like raytracing, mesh processing, and image processing and still the cost of a function call is typically on the bottom of the priority list as opposed to, say, locality of reference, efficient data structures, parallelization, and vectorization. Even when you're micro-optimizing, there are much bigger priorities than the cost of a direct function call, and even when you're micro-optimizing, you often want to leave a lot of the optimization for your optimizer to perform instead of trying to fight against it and do it all by hand (unless you're actually writing assembly code).
Of course with some compilers you might deal with ones that never inline function calls and have a bit of an overhead to every function call. But in that case I'd still say it's relatively negligible since you probably shouldn't be worrying about such micro-level optimizations when using those languages and interpreters/compilers. Even then it will probably often be bottom on the priority list, relatively speaking, as opposed to more impactful things like improving locality of reference and thread efficiency.
It's like if you're using a compiler with very simplistic register allocation that has a stack spill for every single variable you use, that doesn't mean you should be trying to use and reuse as few variables as possible to work around its tendencies. It means reach for a new compiler in those cases where that's a non-negligible overhead (ex: write some C code into a dylib and use that for the most performance-critical parts), or focus on higher-level optimizations like making everything run in parallel.
How small and simple should I make functions for? Obviously I'd make
them if I ever have to repeat the function, but what about one time
use functions?
This is where I'm going to go slightly off-kilter and actually suggest you consider avoiding the teeniest of functions for maintainability reasons. This is admittedly a controversial opinion although at least John Carmack seems to somewhat agree (specifically in respect to inlining code and avoiding excess function calls for cases where side effects occur to make the side effects easier to comprehend).
However, if you are going to make a lot of state changes, having them
all happen inline does have advantages; you should be made constantly
aware of the full horror of what you are doing.
The reason I believe it can sometimes be good to err on the side of meatier functions is because there's often more to comprehend than that of a simple function to understand all the information necessary to make a change or fix a problem.
Which is simpler to comprehend, a function whose logic consists of 80 lines of inlined code, or one distributed across a couple dozen functions and possibly ones that lead to disparate places throughout the codebase?
The answer is not so clear cut. Naturally if the teeny functions are used widely, like say sqrt or abs, then the reader can simply skim over the function call, knowing full well what it does like the back of his hand. But if there are a lot of teeny exotic functions that are only used one time, then the ability to comprehend the operation as a whole requires looking them up and understanding what they all individually do before you can get a proper comprehension of what's going on in terms of the big picture.
I actually disagree with that Apple Swift tutorial somewhat with that one-liner function because while it is easier to understand than figuring out what the arithmetic and comparison are supposed to do, in exchange it might require looking it up to see what it does in scenarios where you can't just say, isCandyAmountAcceptable is enough information for me and need to figure out exactly what makes an amount acceptable. Instead I would actually prefer a simple comment:
// Determine if candy amount is acceptable.
if (candyCount % bandMemberCount == 0)
...
... because then you don't have to jump to disparate places in code (the analogy of a book referring its reader to other pages in the book causing the readers to constantly have to flip back and forth between pages) to figure that out. Of course the idea behind this isCandyAmountAcceptable kind of function is that you shouldn't have to be concerned with such details about what makes a candy amount of acceptable, but too often in practice, we do end up having to understand the details more often than we optimally should to debug the code or make changes to it. If the code never needs to be debugged or changed, then it doesn't really matter how it's written. It could even be written in binary code for all we care. But if it's written to be maintained, as in debugged and changed in the future, then sometimes it is helpful to avoid making the readers have to jump through lots of hoops. The details do often matter in those scenarios.
So sometimes it doesn't help to understand the big picture by fragmenting it into the teeniest of puzzle pieces. It's a balancing act, but certain types of developers can err on the side of overly dicing up their systems into the most granular bits and pieces and finding maintenance problems that way. Those types are still often promising engineers -- they just have to find their balance. The other extreme is the one that writes 500-line functions and doesn't even consider refactoring -- those are kinda hopeless. But I think you fit in the former category, and for you, I'd actually suggest erring on the side of meatier functions ever-so-slightly just to keep the puzzle pieces a healthy size (not too small, not too big).
There's even a balancing act I see between code duplication and minimizing dependencies. An image library doesn't necessarily become easier to comprehend by shaving off a few dozen lines of duplicated math code if the exchange is a dependency to a complex math library with 800,000 lines of code and an epic manual on how to use it. In such cases, the image library might very well be easier to comprehend as well as use and deploy in new projects if it chooses instead to duplicate a few math functions here and there to avoid external dependencies, isolating its complexity instead of distributing it elsewhere.
Basically, at what point do I sacrifice readability for efficiency.
As stated above, I don't think readability of the small picture and comprehensibility of the big picture are synonymous. It can be really easy to read a two-line function and know what it does and still be miles away from understanding what you need to understand to make the necessary changes. Having many of those teeny one-shot two-liners can even delay the ability to comprehend the big picture.
But if I use "comprehensibility vs. efficiency" instead, I'd say upfront at the design-level for cases where you anticipate processing huge inputs. As an example, a video processing application with custom filters knows it's going to be looping over millions of pixels many times per frame. That knowledge should be utilized to come up with an efficient design for looping over millions of pixels repeatedly. But that's with respect to design -- towards the central aspects of the system that many other places will depend upon because big central design changes are too costly to apply late in hindsight.
That doesn't mean it has to start applying hard-to-understand SIMD code right off the bat. That's an implementation detail provided the design leaves enough breathing room to explore such an optimization in hindsight. Such a design would imply abstracting at the Image level, at the level of a million+ pixels, not at the level of a single IPixel. That's the worthy thing to take into consideration upfront.
Then later on, you can optimize hotspots and potentially use some difficult-to-understand algorithms and micro-optimizations here and there for those truly critical cases where there's a strong perceived business need for the operation to go faster, and hopefully with good tools (profilers, i.e.) in hand. The user cases guide you about what operations to optimize based on what the users do most often and find a strong desire to spend less time waiting. The profiler guides you about precisely what parts of the code involved in that operation need to be optimized.
Readability, performance and maintainability are three different things. Readability will make your code look simple and understandable, not necessarily best way to go. Performance is always going to be important, unless you are running this code in non-production environment where end result is more important than how it was achieved. Enter the world of enterprise applications, maintainability suddenly gains lot more importance. What you work on today will be handed over to somebody else after 6 months and they will be fixing/changing your code. This is why suddenly standard design patterns become so important. In a way, the readability is part of maintainability on larger scale. If the cake baking program above is something more complex than what its looking like, first thing stands out as a code smell is existence if if-else. Its gotta get replaced with polymorphism. Same goes with switch case kind of construct.
At what point do you decide to sacrifice one for other? That purely depends upon what business your code is achieving. Is it academic? Its got to be the perfect solution even if it means 90% devs struggle to figure out at first glance what the hell is happening. Is it a website belonging to retail store being maintained by distributed team of 50 devs working from 2 or more different geographic locations? Follow the conventional design patterns.
A rule of thumb I have always seen being followed in almost all situations is that if a function is growing beyond half the screen, its a candidate for refactoring. Do you have functions that end up you having your editor long length scroll bars? Refactor!!!

Refactor or Rewrite UI Layer from Scratch

In most cases it is better to refactor than rewrite a full codebase. We have quite interesting situation. In our application business layer is pretty good. With unit tests, separation of concerns, etc. It does have some problems, but it can be refactored.
However UI layer is outdated. It is ASP.NET + some AJAX, but we want to migrate to pure AJAX application (ExtJS + REST). The application is quite large and has about 100 separate screens. What would you advise?
I'm very familiar with your app Michael. I've also been in that exact situation twice in the past few years. There is ENORMOUS benefit in rewriting from scratch. You can't incrementally improve where you are. See this from Kathy Sierra.
Bite the bullet and redesign from the ground up. You have an opportunity to blow away the competition, but it's never going to happen with incremental improvements.
I don't see prospect of a truly helpful, generic, answer. Given complete freedom I'd agree with Glen and bit the bullet and go for the whole thing. However commercial realities may prevent that, if the application is big enough then the up front investment may just be too big. Then question is whether you can possibly get to where you need to be in increments.
For example, suppose that your app follows a common pattern, there are selection screens where the items to be worked on are picked by a user and they lead to data entry screens. Then, can you rework the selection screen with nice ajax stuff but retain the data entry screens. Address those on the next iteration.

Game framework architecture - view components or MVC?

I'm trying to build a very light re-usable framework for my games, rather than starting from scratch each time I start a game. I have a component driven architecture - e.g. Entity composes a Position component and a Health component and Ai component etc.
My big question is whether my model composes view components to allow for more than one view of the model, or whether to use a truer MVC where the model does not know about its views, and they are managed externally.
I have tried both methods but if anyone knows the pros and cons of each approach and which is the industry standard, it would be great to know.
depends on your audience, game devs, myself included aren't very used to the MVC model, although most know it, it's not as easy to keep it clean cut, because of development casualties (not any serious technical reasons). So from experience, I've seen dozens of game frameworks start as MVC, but only a pair were able to maintain it until the end. My theory is MVC adds too much complexity and little benefits for small throwaway games (with normally a few devs), and it's to hard to keep really cleanly separate most game objects into these layers for large/complex games. And since games have a release date, they many times sacrifice code clarity and reusability for performance and quick adhoc solutions (that will get rewritten if necessarry in the sequel (if there is one)).
However, with the caveat above, it's better to aim high, because if you succeed it's better :) and if you fail, well to bad. So you should probably try the MVC, but don't worry if it fails, profesional game devs have all failed at the task many times :)
I’d certainly vote for the model to know nothing about its views. Loose coupling is good: Simpler model code, easier testing, more choices.
I know this question might be outdated, but I need to reply on it.
Actually, I started programming a game in Lua (with LÖVE) and I started programming a MVC - Framework for it.
At first, to use MVC really depends on what you want.
I know my problems with game programming, when the program becomes bigger, and mostly the structure becoms too complex to maintain.
Next thing is, I know that I will change all the graphics when I find an artist who is willing to work for it. But until then, I'm gonna use my own dummy graphics.
I want the artist to feel free to do what ever he wants, without beeing dependend on any resolution or color restriction.
That means, I might have to change the whole (!) presentation code. Maybe even the way objects interact (collision detection, f.e.).
The game logic is captured in the models, so I can concentrate on that. And I think game logic is the most important part of making a game. Isn't it?
Hope you see my point.
But, if you have everything together: all the graphics, sounds, the whole thing; then you can code straight forward.
My MVC is a configuration-over-convention-ass, that slows down prototyping a bit.
BUT(!) iterations of development can be made much more easily. Testing, especially Unit-Tests are done much more faster.
I would say MVC turns you development-speed-curve (which is normally an anti-exponential curve) into an exponential curve. Slow at the beginning, but more and more fast at the end.
MVC works really well for games, at least for my games which are designed for cross-platform.
It really depends on how you implement it in order to get the benefit.

Dealing with god objects

I work in a medium sized team and I run into these painfully large class files on a regular basis. My first tendency is to go at them with a knife, but that usually just makes matters worse and puts me into a bad state of mind.
For example, imagine you were just given a windows service to work on. Now there is a bug in this service and you need to figure out what the service does before you can have any hope of fixing it. You open the service up and see that someone decided to just use one file for everything. Start method is in there, Stop method, Timers, all the handling and functionality. I am talking thousands of lines of code. Methods under a hundred lines of code are rare.
Now assuming you cannot rewrite the entire class and these god classes are just going to keep popping up, what is the best way to deal with them? Where do you start? What do you try to accomplish first? How do you deal with this kind of thing and not just want to get all stabby.
If you have some strategy just to keep your temper in check, that is welcome as well.
Tips Thus Far:
Establish test coverage
Code folding
Reorganize existing methods
Document behavior as discovered
Aim for incremental improvement
Edit:
Charles Conway recommend a podcast which turned out to be very helpful. link
Michael Feathers (guy in the podcast) begins with the premise that were are too afraid to simply take a project out of source control and just play with it directly and then throw away the changes. I can say that I am guilty of this.
He essentially said to take the item you want to learn more about and just start pulling it apart. Discover it's dependencies and then break them. Follow it through everywhere it goes.
Great Tip
Take the large class that is used elsewhere and have it implement an emtpy interface. Then take the code using the class and have it instantiate the interface instead. This will give you a complete list of all the dependencies to that large class in your code.
Ouch! Sounds like the place I use to work.
Take a look at Working effectivly with legacy code. It has some gems on how to deal with atrocious code.
DotNetRocks recently did a show on working with legacy code. There is no magic pill that is going to make it work.
The best advice I've heard is start incrementally wrapping the code in tests.
That reminds me of my current job and when I first joined. They didn't let me re-write anything because I had the same argument, "These classes are so big and poorly written! no one could possibly understand them let alone add new functionality to them."
So the first thing I would do is to make sure there are comprehensive testing behind the areas that you're looking to change. And at least then you will have a chance of changing the code and not having (too many) arguments (hopefully). And by tests, I mean testing the components functionally with integration or acceptance tests and making sure it is 100% covered. If the tests are good, then you should be able to confidently change the code by splitting up the big class into smaller ones, getting rid of duplication etc etc
Even if you cannot refactor the file, try to reorganize it. Move methods/functions so that they are at least organized within the file logically. Then put in lots of comments explaining each section. No, you haven't rewritten the program, but at least now you can read it properly, and the next time you have to work on the file, you'll have lots of comments, written by you (which hopefully means that you will be able to understand them) which will help you deal with the program.
Code Folding can help.
If you can move stuff around within the giant class and organize it in a somewhat logical way, then you can put folds around various blocks.
Hide everthing, and you're back to a C paradigm, except with folds rather than separate files.
I've come across this situation as well.
Personally I print out (yeah, it can be a lot of pages) the code first. Then I draw a box around sections of code that are not part of any "main-loop" or are just helper functions and make sure I understand these things first. The reason is they are probably referred to many times within the main body of the class and it's good to know what they do
Second, I identify the main algorithm(s) and decompose them into their parts using a numbering system that alternates between numbers and letters (it's ugly but works well for me). For example you could be looking at part of an algorithm 4 "levels" deep and the numbering would be 1.b.3.e or some other god awful thing. Note that when I say levels, I am not referring directly to control blocks or scope necessarily, but where I have identified steps and sub-steps of an algorithm.
Then it's a matter of just reading and re-reading the algorithm. When you start out it sounds like a lot of time, but I find that doing this develops a natural ability to comprehend a great deal of logic all at once. Also, if you discover an error attributed to this code, having visually broken it down on paper ahead of time helps you "navigate" the code later, since you have a sort of map of it in your head already.
If your bosses don't think you understand something until you have some form of UML describing it, a UML sequence diagram could help here if you pretend the sub-step levels are different "classes" represented horizontally, and start-to-finish is represented vertically from top-to-bottom.
I feel your pain. I tackled something like this once for a hobby project involving processing digital TV data on my computer. A fellow on a hardware forum had written an amazing tool for recording shows, seeing everything that was on, and more. Plus, he had done incredibly vital work of working around bugs in real broadcast signals that were in violation of the standard. He'd done amazing work with thread scheduling to be sure that no matter what, you wouldn't lose those real-time packets: on an old Pentium, he could record four streams simultaneously while also playing Doom and never lose a package. In short, this code incorporated a ton of great knowledge. I was hoping to take some pieces and incorporate them into my own project.
I got the source code. One file, 22,000 lines of C, no abstraction. I spent hours reading it; there was all this great work, but it was all done badly. I was not able to reuse a single line or even a single idea.
I'm not sure what the moral of the story is, but if I had been forced to use this stuff at work, I would have begged permission to chip pieces off it one at a time, build unit tests for each piece, and eventually grow a new, sensible thing out of the pieces. This approach is a bit different than trying to refactor and maintain a large brick in place, but I would rather have left the legacy code untouched and tried to bring up a new system in parallel.
The first thing I would do is write some unit tests to box the current behavior, assuming that there are none already. Then I'd start in the area where I need to make the change and try to get that method cleaned up -- i.e. refactor working code before introducing changes. Use common refactoring techniques to extract and reuse methods from existing long methods to make them more understandable. When you extract a method, look for other places in the code where similar code exists, box that area, and reuse the method you've just extracted.
Look for groups of methods that "hang together" that can be broken out into their own classes. Write some tests for how those classes should work, build the classes using the existing code as a template if need be, then substitute the new classes into the existing code, removing the methods that they replace. Again, using your tests to make sure that you're not breaking anything.
Make enough improvement to the existing code so that you feel you can implement your new feature/fix in a clean way. Then write the tests for the new feature/fix and implement to pass the tests. Don't feel that you have to fix everything the first time. Aim for gradual improvement, but always leave the code better than you found it.

Design & Coding - top to bottom or bottom to top? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
When coding, what in your experience is a better approach?
Break the problem down into small enough pieces and then implement each piece.
Break the problem down, but then implement using a top-down approach.
Any other?
I tend to design top-down and implement bottom-up.
For implementation, building the smallest functional pieces and assembling them into the higher-level structures seems to be what works best for me. But, for design, I need to start from the overall picture and break it down to determine what those pieces will be.
Here's what I do:
Understand the domain first. Understand the problem to be solved. Make sure you and the customer (even if that customer is you!) are on the same page as to what problem is to be solved.
Then a high level solution is proposed to the problem and from that, the design will turn into bubbles or bullets on a page or whatever, but the point is that it will shake out into components that can be designed.
At that point, I write tests for the yet-to-be written classes and then flesh out the classes to pass those tests.
I use a test-first approach and build working, tested components. That is what works for me. When the component interfaces are known and the 'rules' are known for how they talk to each other and provide services to each other, then it becomes generally a straightforward 'hook everything together' exercise.
That's how I do it, and it has worked well for me.
You might want to look over the Agile Manifesto. Top down and bottom up are predicated on Built It All At Once design and construction.
The "Working software over comprehensive documentation" means the first thing you build is the smallest useful thing you can get running. Top? Bottom? Neither.
When I was younger, I worked on projects that were -- by contract -- strictly top down. This doesn't work. Indeed, it can't work. You get mountains of redundant design and code as a result. It was not a sound approach when applied mindlessly.
What I've noticed is that the Agile approach -- small pieces that work -- tends to break the problem down to parts that can be grasped all at once. The top-down/bottom-up no longer matters as much. Indeed, it may not matter at all.
Which leads do: "How do you decompose for Agile development?" The trick is to avoid creating A Big Thing that you must then decompose. If you analyze a problem, you find actors trying to accomplish use cases and failing because they don't have all the information, or they don't have it in time, or they can't execute their decisions, or something like that.
Often, these aren't Big Things that need decomposition. When they are, you need to work through the problem in the Goals Backward direction. From Goals to things that enable you to make that goal to things that enable the enablers, etc. Since goals are often Big Things, this tends to be Top Down -- from general business goal to detailed business process and step.
At some point, we overview these various steps that lead to the goals. We've done the analysis part (breaking things down). Now comes the synthesis part: we reassemble what we have into things we can actually build. Synthesis is Bottom Up. However, let's not get carried away. We have several points of view, each of which is different.
We have a model. This is often built from details into a larger conceptual model. Then, sometimes decomposed again into a model normalized for OLTP. Or decomposed into a star schema normalized for OLAP. Then we work back up to create a ORM mapping from the normalized model. Up - Down - Up.
We have processing. This is often built from summaries of the business processes down into details of processing steps. Then software is designed around the steps. Then the software is broken into classes and methods. Down - Up - Down.
[Digression. With enlightened users, this decomposition defines new job titles and ways of working. With unenlightened users, the old jobs stay and we write mountains of documentation to map old jobs onto new software.]
We have components. We often look at the pieces, look at what we know about available components, and do a kind of matching. This is the randomest process; it's akin to the way crystals form -- there are centers of nucleation and the design kind of solidifies around those centers. Web Services. Database. Transaction Management. Performance. Volume. Different features that somehow help us pick components that implement some or all of our solution. Often feels bottom-up (from feature to product), but sometimes top-down ("I'm holding a hammer, call everything a nail" == use the RDBMS for everything.)
Eventually we have to code. This is bottom up. Kind of. You have to define a package structure. You have to define classes as a whole. That part was top down. You have to write methods within the classes. I often do this bottom-up -- rough out the method, write a unit test, finish the method. Rough out the next method, write a unit test, finish the method.
The driving principle is Agile -- build something that works. The details are all over the map -- up, down, front, back, data, process, actor, subject area, business value.
Yes. Do all of those things.
It may seem sarcastic (sorry, I revert to form), but this really is a case where there is no right answer.
Also in the agile way, write your test(s) first!
Then all software is a continual cycle of
Red - the code fails the test
Green - the code passes the test
Refactor - code improvements that are intention-preserving.
defects, new features, changes. It all follows the same pattern.
Your 2nd option is a reasonable way to go. If you break the problem down into understandable chunks, the top down approach will reveal any major design flaws before you implement all the little details. You can write stubs for lower level functionality to keep everything hanging together.
I think there's more to consider than top- verses bottom-down design. You obviously need to break the design up into manageable units of work but you also need to consider prioritisation etc. And in an iterative development project, you will often redefine the problem for the next iteration once you've delivered the solution for the previous one.
When designing, I like to do middle-out. I like to model the domain, then design out the classes, move to the database and UI from there. If there are specific features that are UI-based or database-based, I may design those up front as well.
When coding, I generally like to do bottom-up (database first, then business entities, then UI) if at all possible. I find it is a lot easier to keep things straight with this method.
I believe that with good software designers (and in my opinion all software developers should also be software designers at some level), the magic is in being able to do top-down and bottom-up simultaneously.
What I was "schooled" to do by my mentors is start by very brief top-down to understand the entities involved, then move to bottom-up to figure out the basic elements I want to create, then to back up and see how I can go one level down, knowing what I know about the results of my bottom up, and so forth until "they meet in the middle".
Hope that helps.
Outside-in design.
You start with what you're trying to achieve at the top end, and you know what you've got to work with at the bottom end. Keep working both ends until they meet in the middle.
I sort of agree with all of the people saying "neither", but everyone falls somewhere on the spectrum.
I'm more of a top-down kind of guy. I pick one high level feature/point/whatever and implement it as a complete program. This lets me sketch out a basic plan and structure within the confines of the problem domain.
Then I start with another feature and refactor out everything from the original that can be used by the second into new, shared entities. Lather, rinse, repeat until application is complete.
However, I know a lot of people who are bottom up guys, who hear a problem and start thinking about all the support subsystems that they could need to build the application on top of it.
I don't believe either approach is wrong or right. They both can achieve results. I even try and find bottom up guys to work with, as we can attack the problem from two different perspectives.
Both are valid approaches. Sometimes one just "feels" more natural than the other. However, there is one big problem: some mainstream languages and especially their frameworks and libraries really heavily on IDE support, such as syntax highlighting, background type checking, background compilation, intelligent code completion, IntelliSense and so on.
However, this doesn't work with top-down coding! In top-down coding, you constantly use variables, fields, constants, functions, procedures, methods, classes, modules, traits, mixins, aspects, packages and types that you haven't implemented yet! So, the IDE will constantly yell at you because of compile errors, there will be red squiggly lines everywhere, you will get no code completion and so on. So, the IDE pretty much prohibits you from doing top-down coding.
I do a variant of top-down. I tend to try and do the interface first - I then use that as my list of features. What's good about this version is, it still works with IDE that would otherwise complain. Just comment out the one function call to what's not yet been implemented.

Resources