what is boiler-plate code? why should it be avoided? - refactoring

What is boiler-plate code, and why it is called like that?
Example for android:
onCreate(Bundle saveInstance){
setcontentView(R.layout.m);
findViewById(R.id.f1);
findViewById(R.id.f2);
findViewById(R.id.f3);
findViewById(R.id.f4);
}
What other examples are there? Why should we avoid boiler-plate code?

Boilerplate code is repetitious code that needs to be included in many places. The origins are well explained by the wikipedia article on the subject:
Interestingly, the term arose from the newspaper business. Columns and other pieces that were syndicated were sent out to subscribing newspapers in the form of a mat (i.e. a matrix). Once received, boiling lead was poured into this mat to create the plate used to print the piece, hence the name boilerplate. As the article printed on a boilerplate could not be altered, the term came to be used by attorneys to refer to the portions of a contract which did not change through repeated uses in different applications, and finally to language in general which did not change in any document that was used repeatedly for different occasions.
There are several problems with boilerplate code:
It's error prone. Your example may be straight-forward, but not all boilerplate instances are. In cases where the code includes a bit more logic to it there's more room to make mistakes (=bugs). Especially if these blocks just get copy-pasted from place to place, but require some alterations to work.
Moreover - if you ever have to change this logic, it's much harder to do, as you need to go over several places to do it.
It takes up screen realestate and attention. More code to read means there's more things you need to process when trying to understand a piece of code you're reading. Boilerplate code just adds another distraction.
It takes up actual space in the final (usually compiled) product. What would you rather deliver? A 1MB JAR file or a 10MB one?

Related

Advice on working with legacy code

I need some advice on how to work with legacy code.
A while ago, I was given the task to add a few reports to a reporting app. written in Struts 1, back in 2005. No big deal, but the code is quite messy. No usage of Action forms, and basically the code is one huge action, and a lot of if-else statements inside. Also, no one here has functional knowledge on this. We just happened to have it in our contract.
I'm quite unhappy about this, and not sure how to proceed. This application is invisible: Few people (but all very important) use it, so they don't care whether my eyes bleed while reading the code, standards, etc.
However, I feel that a technical debt is to be paid. How should I proceed on this? Continue down the if-else road, or try to do this requirement the right way, ignoring the rest of the project? Starting a huge refactor, risking my deadline?
Legacy code is a big issue, and I'm sure people will not agree!
I would say that starting a big re-factor could be a mistake.
A big re-factor means doing a lot of work to make it function exactly the way that it does now. If you choose to take this on on your own, there won't be a lot of visibility of what you are doing. If it works, no one will know the hours of work you put it. If it does NOT work, and you end up with tidy code, but add some bugs (and who has ever written code without adding some bugs) then you will get 'why did this change' type questions.
I have currently nearly completed a project working on a 10 year old code base. We have done quite a few bits of re-factoring along the way. But for each re-factor we have made we can justify 'this specific change will make the actual task we are doing now easier'. Rather than 'this is now cleaner for future work'. We have found that as we worked on the code, fixing the issues that we actually come up against one at a time, we have cleaned up a lot of it, without breaking it (much).
And I would say before you can re-factor much, you will need automated tests, so you can be fairly happy that you have put it back together right!
Most re-factoring is done to 'make maintenance and future development easier'. Your project sounds like there is not a lot of future development coming. That limits the advantage a re-factor will give the company.
Rule #1: If it ain't broke, don't fix it.
Rule #2: When in doubt, reread rule #1.
Unfortunately, legacy code can very rarely be described as "it ain't broke." Therefor we must tweak the existing code to correct a newly found bug, tweak the existing code to modify behavior that was previously acceptable, or tweak the existing code to add new functionality.
My experience has taught me that any refactoring must be done in 'infinitesimally' small increments. If you must break rule #2, I suggest that you start your search with the inner-most nested loop or IF structure and expand outward until you find a clean, logical separation point and create a new function/method/subroutine that contains only the guts of that loop or structure. This won't make anything more efficient but it should give you a clearer view of the underlying logic and structure. Once you have several new, smaller functions/methods/subroutines you can refactor and consolidate those into something more manageable.
Rule #3: Ignore my previous paragraph and reread the first two rules.
I agree with other comments. If you don't have to, then don't do it. It usually cost far more then it's worth if the code base is more or less dead any way.
On the other hand, if you feel that you cannot get your head around the code then a refactor is probably unavoidable. If this is the case then, since it's a web application, can you create a solid suite of functional tests using selenium? If so this is the fastest and most rewarding test approach for such code and will catch most bugs for the buck.
Second, start with the extract method refactoring to create compose methods of the big difficult methods. Every time you think to your self "This should have a comment to explain what it does" you should extract it to a method with a name that replaces the comment.
Once this has been done, if you still can't add the functionality required, you can go for more advanced refactorings, and perhaps even adding some unit tests. But I usually find that I can add what is required/fix the bug in legacy code by just creating self documenting code.
In a few words: before make any modifications to legacy code its good idea to start from automated unit tests.
This will give developer understanding about key things: dependencies this piece of code has, input data, output results, boundary conditions so on.
When it’s done most likely you will better understand what this code does and how it works.
After that its make sense (but not must) clean code a bit giving more accurate names to local variables, moving some functionality (repetitive code, if any) in to functions with clear human friendly names.
A simple clean up could make code more readable and at the same time save developer from regression issues with unit tests help.
Refactoring – make small changes, step by step, when you have a time and understanding of the requirements and functionality, regularly unit testing the code.
But do not start from refactoring

Web site performance - what is it about? Readability and ignorance Vs. Performance

Let me cut to the chase...
On one hand, many of the programming advices given (here and on other places) emphasize the notion that code should always be as readable and as clear as possible, at (almost?!) any pefromance cost.
On the other hand there are SO many slow web sites (at least one of whom, I know from personal experience).
Obviously round trips and DB access, are issues a web developer should always keep in mind. But the trade-off between readability and what not to do because it slows things down, for me is very unclear.
Question are- 1.What else? 2.Is there a rule (preferably simple, but probably quite general) one should adhere to in order to make sure his code does not slow things down too much?
General best practices as well as specific advices would be much appreciated. Advices based on experience would be especially appreciated.
Thanks.
Edit: A little clarification: General peformance advices aren't hard to find. That's not what I'm looking for. I'm asking about two things- 1. While trying to make my code as readable as possible, when should I stop and say: "Now I'm hurting performance too much". 2. Little, less known things like- is selecting just one column faster than selecting all (Thanks Otávio)... Thanks again!
See the Stack Overflow discussion here:
What is the most important effect on performance in a database-backed web application?
The top voted answer was, "write it clean, and use a profiler to identify real problems and address them."
In my experience, the biggest mistake (using C#/asp.net/linq) is over-querying due to LINQ's ease-of-use. One huge query is usually much faster than 10000 small ones.
The other ASP.NET gotcha I see a lot is when the viewstate gets extremely fat and bloated. EnableViewState=false is your best friend, start every new project with it!
For web applications that have a database back end, it is extremely important that:
indexing is done properly
retrieval is done for what is needed (avoid select * when selecting specific fields will do - even more so if they are part of a covered index)
Also, whenever possible an appropriate caching strategy can help performance
Optimizing your code.
While making your code as readable as possible is very important. Optimizing it is equally as important. I've listed some items that will hopefully get you in the right direction.
For example in regards to Databases:
When you define the schema of your database, you should make sure that it is normalized and the indexes of fields are defined properly.
When running a query, specifically SELECT, only select the fields you need.
You should only make one connection to the database per page load.
Re-factor. This is probably the most important factor in producing clean, optimized code. Always go back and look at your code and see what can be done to improve it.
PHP Code:
Always test your work with a tool like PHPUnit.
echo is faster than print.
Wrap your string in single quotes (‘) instead of double quotes (“) is faster because PHP searches for variables inside “…” and not in ‘…’, use this when you’re not using variables you need evaluating in your string.
Use echo’s multiple parameters (or stacked) instead of string concatenation.
Unset or null your variables to free memory, especially large arrays.
Use strict code, avoid suppressing errors, notices and warnings thus resulting in cleaner code and less overheads. Consider having error_reporting(E_ALL) always on.
Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
Methods in derived classes run faster than ones defined in the base class.
Error suppression with # is very slow.
Website Optimization
A good place to start is here (http://developer.yahoo.com/performance/rules.html)
Performance is a huge topic and there are a lot of things that you can do to help improve the performance of your website. It's something that takes time and experience.
Best,
Richard Castera
Scott and Rcastera did a good job covering DB and querying optimization. To address your question from a HTML / CSS / JavaScript standpoint:
CSS:
Readability is key. CSS is rendered so fast that you should never feel it is necessary to sacrifice readability for performance. As such, focus on adding in as many comments as necessary to document the code, why certain rules (like hacks) are there, and whatever else floats your comment boat. In CSS there are a few obvious rules to follow: 1) Use external stylsheets. 2) Limit external stylesheets to limit GET requests.
HTML: Like CSS, HTML is read so fast by the browser you should really only focus on writing clean code. Use whitespace, indentation, and comments to properly document your work. Only major things in HTML to remember are: 1) declare the <meta charset /> early within the head section. 2) Follow this guys advice to minimize browser reflows. *this rule actually applies to CSS as well.
JavaScript: Most optimizations for JavaScript are really well known by now so these'll seem obvious, like initializing variables outside of loops, pushing javascript to bottom of body so DOM loads before scripts start tying up all of the resources, avoiding costly statements like eval() or with(). Not to sound like a broken record, but keeping a well commented and easily readable script should still be a priority when developing JavaScript code. Especially since you can just minimize and compress away all the excess when you deploy it.

How "defensive" should my code be?

I was having a discussion with one of my colleagues about how defensive your code should be. I am all pro defensive programming but you have to know where to stop. We are working on a project that will be maintained by others, but this doesn't mean we have to check for ALL the crazy things a developer could do. Of course, you could do that but this will add a very big overhead to your code.
How do you know where to draw the line?
Anything a user enters directly or indirectly, you should always sanity-check. Beyond that, a few asserts here and there won't hurt, but you can't really do much about crazy programmers editing and breaking your code, anyway!-)
I tend to change the amount of defense I put in my code based on the language. Today I'm primarily working in C++ so my thoughts are drifting in that direction.
When working in C++ there cannot be enough defensive programming. I treat my code as if I'm guarding nuclear secrets and every other programmer is out to get them. Asserts, throws, compiler time error template hacks, argument validation, eliminating pointers, in depth code reviews and general paranoia are all fair game. C++ is an evil wonderful language that I both love and severely mistrust.
I'm not a fan of the term "defensive programming". To me it suggests code like this:
void MakePayment( Account * a, const Payment * p ) {
if ( a == 0 || p == 0 ) {
return;
}
// payment logic here
}
This is wrong, wrong, wrong, but I must have seen it hundreds of times. The function should never have been called with null pointers in the first place, and it is utterly wrong to quietly accept them.
The correct approach here is debatable, but a minimal solution is to fail noisily, either by using an assert or by throwing an exception.
Edit: I disagree with some other answers and comments here - I do not think that all functions should check their parameters (for many functions this is simply impossible). Instead, I believe that all functions should document the values that are acceptable and state that other values will result in undefined behaviour. This is the approach taken by the most succesful and widely used libraries ever written - the C and C++ standard libraries.
And now let the downvotes begin...
I don't know that there's really any way to answer this. It's just something that you learn from experience. You just need to ask yourself how common a potential problem is likely to be and make a judgement call. Also consider that you don't necessarily have to always code defensively. Sometimes it's acceptable just to note any potential problems in your code's documentation.
Ultimately though, I think this is just something that a person has to follow their intuition on. There's no right or wrong way to do it.
If you're working on public APIs of a component then its worth doing a good amount of parameter validation. This led me to have a habit of doing validation everywhere. Thats a mistake. All that validation code never gets tested and potentially makes the system more complicated than it needs to be.
Now I prefer to validate by unit testing. Validation definitely happens for data coming from external sources, but not for calls from non-external developers.
I always Debug.Assert my assumptions.
My personal ideology: the defensiveness of a program should be proportional to the maximum naivety/ignorance of the potential user base.
Being defensive against developers consuming your API code is not that different from being defensive against regular users.
Check the parameters to make sure they are within appropriate bounds and of expected types
Verify that the number of API calls which could be made are within your Terms of Service. Generally called throttling it usually only applies to web services and password checking functions.
Beyond that there's not much else to do except make sure your app recovers well in the event of a problem and that you always give ample information to the developer so that they understand what's going on.
Defensive programming is only one way of hounouring a contract in a design-by-contract manner of coding.
The other two are
total programming and
nominal programming.
Of course you shouldnt defend yourself against every crazy thing a developer could do, but then you should state in wich context it will do what is expected to using preconditions.
//precondition : par is so and so and so
function doSth(par)
{
debug.assert(par is so and so and so )
//dostuf with par
return result
}
I think you have to bring in the question of whether you're creating tests as well. You should be defensive in your coding, but as pointed out by JaredPar -- I also believe it depends on the language you're using. If it's unmanaged code, then you should be extremely defensive. If it's managed, I believe you have a little bit of wiggleroom.
If you have tests, and some other developer tries to decimate your code, the tests will fail. But then again, it depends on test coverage on your code (if there is any).
I try to write code that is more than defensive, but down right hostile. If something goes wrong and I can fix it, I will. if not, throw or pass on the exception and make it someone elses problem. Anything that interacts with a physical device - file system, database connection, network connection should be considered unereliable and prone to failure. anticipating these failures and trapping them is critical
Once you have this mindset, the key is to be consistent in your approach. do you expect to hand back status codes to comminicate problems in the call chain or do you like exceptions. mixed models will kill you or at least drive you to drink. heavily. if you are using someone elses api, then isolate these things into mechanisms that trap/report in terms you use. use these wrapping interfaces.
If the discussion here is how to code defensively against future (possibly malevolent or incompetent) maintainers, there is a limit to what you can do. Enforcing contracts through test coverage and liberal use of asserting your assumptions is probably the best you can do, and it should be done in a way that ideally doesn't clutter the code and make the job harder for the future non-evil maintainers of the code. Asserts are easy to read and understand and make it clear what the assumptions of a given piece of code is, so they're usually a great idea.
Coding defensively against user actions is another issue entirely, and the approach that I use is to think that the user is out to get me. Every input is examined as carefully as I can manage, and I make every effort to have my code fail safe - try not to persist any state that isn't rigorously vetted, correct where you can, exit gracefully if you cannot, etc. If you just think about all the bozo things that could be perpetrated on your code by outside agents, it gets you in the right mindset.
Coding defensively against other code, such as your platform or other modules, is exactly the same as users: they're out to get you. The OS is always going to swap out your thread at an inopportune time, networks are always going to go away at the wrong time, and in general, evil abounds around every corner. You don't need to code against every potential problem out there - the cost in maintenance might not be worth the increase in safety - but it sure doesn't hurt to think about it. And it usually doesn't hurt to explicitly comment in the code if there's a scenario you thought of but regard as unimportant for some reason.
Systems should have well designed boundaries where defensive checking happens. There should be a decision about where user input is validated (at what boundary) and where other potential defensive issues require checking (for example, third party integration points, publicly available APIs, rules engine interaction, or different units coded by different teams of programmers). More defensive checking than that violates DRY in many cases, and just adds maintenance cost for very little benifit.
That being said, there are certain points where you cannot be too paranoid. Potential for buffer overflows, data corruption and similar issues should be very rigorously defended against.
I recently had scenario, in which user input data was propagated through remote facade interface, then local facade interface, then some other class, to finally get to the method where it was actually used. I was asking my self a question: When should be the value validated? I added validation code only to the final class, where the value was actually used. Adding other validation code snippets in classes laying on the propagation path would be too defensive programming for me. One exception could be the remote facade, but I skipped it too.
Good question, I've flip flopped between doing sanity checks and not doing them. Its a 50/50
situation, I'd probably take a middle ground where I would only "Bullet Proof" any routines that are:
(a) Called from more than one place in the project
(b) has logic that is LIKELY to change
(c) You can not use default values
(d) the routine can not be 'failed' gracefully
Darknight

What can you do to a legacy codebase that will have the greatest impact on improving the quality?

As you work in a legacy codebase what will have the greatest impact over time that will improve the quality of the codebase?
Remove unused code
Remove duplicated code
Add unit tests to improve test coverage where coverage is low
Create consistent formatting across files
Update 3rd party software
Reduce warnings generated by static analysis tools (i.e.Findbugs)
The codebase has been written by many developers with varying levels of expertise over many years, with a lot of areas untested and some untestable without spending a significant time on writing tests.
Read Michael Feather's book "Working effectively with Legacy Code"
This is a GREAT book.
If you don't like that answer, then the best advice I can give would be:
First, stop making new legacy code[1]
[1]: Legacy code = code without unit tests and therefore an unknown
Changing legacy code without an automated test suite in place is dangerous and irresponsible. Without good unit test coverage, you can't possibly know what affect those changes will have. Feathers recommends a "stranglehold" approach where you isolate areas of code you need to change, write some basic tests to verify basic assumptions, make small changes backed by unit tests, and work out from there.
NOTE: I'm not saying you need to stop everything and spend weeks writing tests for everything. Quite the contrary, just test around the areas you need to test and work out from there.
Jimmy Bogard and Ray Houston did an interesting screen cast on a subject very similar to this:
http://www.lostechies.com/blogs/jimmy_bogard/archive/2008/05/06/pablotv-eliminating-static-dependencies-screencast.aspx
I work with a legacy 1M LOC application written and modified by about 50 programmers.
* Remove unused code
Almost useless... just ignore it. You wont get a big Return On Investment (ROI) from that one.
* Remove duplicated code
Actually, when I fix something I always search for duplicate. If I found some I put a generic function or comment all code occurrence for duplication (sometime, the effort for putting a generic function doesn't worth it). The main idea, is that I hate doing the same action more than once. Another reason is because there's always someone (could be me) that forget to check for other occurrence...
* Add unit tests to improve test coverage where coverage is low
Automated unit tests is wonderful... but if you have a big backlog, the task itself is hard to promote unless you have stability issue. Go with the part you are working on and hope that in a few year you have decent coverage.
* Create consistent formatting across files
IMO the difference in formatting is part of the legacy. It give you an hint about who or when the code was written. This can gave you some clue about how to behave in that part of the code. Doing the job of reformatting, isn't fun and it doesn't give any value for your customer.
* Update 3rd party software
Do it only if there's new really nice feature's or the version you have is not supported by the new operating system.
* Reduce warnings generated by static analysis tools
It can worth it. Sometime warning can hide a potential bug.
I'd say 'remove duplicated code' pretty much means you have to pull code out and abstract it so it can be used in multiple places - this, in theory, makes bugs easier to fix because you only have to fix one piece of code, as opposed to many pieces of code, to fix a bug in it.
Add unit tests to improve test coverage. Having good test coverage will allow you to refactor and improve functionality without fear.
There is a good book on this written by the author of CPPUnit, Working Effectively with Legacy Code.
Adding tests to legacy code is certianly more challenging than creating them from scratch. The most useful concept I've taken away from the book is the notion of "seams", which Feathers defines as
"a place where you can alter behavior in your program without editing in that place."
Sometimes its worth refactoring to create seams that will make future testing easier (or possible in the first place.) The google testing blog has several interesting posts on the subject, mostly revolving around the process of Dependency Injection.
I can relate to this question as I currently have in my lap one of 'those' old school codebase. Its not really legacy but its certainly not followed the trend of the years.
I'll tell you the things I would love to fix in it as they bug me every day:
Document the input and output variables
Refactor the variable names so they actually mean something other and some hungarian notation prefix followed by an acronym of three letters with some obscure meaning. CammelCase is the way to go.
I'm scared to death of changing any code as it will affect hundreds of clients that use the software and someone WILL notice even the most obscure side effect. Any repeatable regression tests would be a blessing since there are zero now.
The rest is really peanuts. These are the main problems with a legacy codebase, they really eat up tons of time.
I'd say it largely depends on what you want to do with the legacy code...
If it will indefinitely remain in maintenance mode and it's working fine, doing nothing at all is your best bet. "If it ain't broke, don't fix it."
If it's not working fine, removing the unused code and refactoring the duplicate code will make debugging a lot easier. However, I would only make these changes on the erring code.
If you plan on version 2.0, add unit tests and clean up the code you will bring forward
Good documentation. As someone who has to maintain and extend legacy code, that is the number one problem. It's difficult, if not downright dangerous to change code you don't understand. Even if you're lucky enough to be handed documented code, how sure are you that the documentation is right? That it covers all of the implicit knowledge of the original author? That it speaks to all of the "tricks" and edge cases?
Good documentation is what allows those other than the original author to understand, fix, and extend even bad code. I'll take hacked yet well-documented code that I can understand over perfect yet inscrutable code any day of the week.
The single biggest thing that I've done to the legacy code that I have to work with is to build a real API around it. It's a 1970's style COBOL API that I've built a .NET object model around, so that all the unsafe code is in one place, all of the translation between the API's native data types and .NET data types is in one place, the primary methods return and accept DataSets, and so on.
This was immensely difficult to do right, and there are still some defects in it that I know about. It's not terrifically efficient either, with all the marshalling that goes on. But on the other hand, I can build a DataGridView that round-trips data to a 15-year-old application which persists its data in Btrieve (!) in about half an hour, and it works. When customers come to me with projects, my estimates are in days and weeks rather than months and years.
As a parallel to what Josh Segall said, I would say comment the hell out of it. I've worked on several very large legacy systems that got dumped in my lap, and I found the biggest problem was keeping track of what I already learned about a particular section of code. Once I started placing notes as I go, including "To Do" notes, I stopped re-figuring out what I already figured out. Then I could focus on how those code segments flow and interact.
I would say just leave it alone for the most part. If it's not broken then don't fix it. If it is broken then go ahead and fix and improve the portion of the code that is broken and its immediately surrounding code. You can use the pain of the bug or sorely missing feature to justify the effort and expense of improving that part.
I would not recommend any wholesale kind of rewrite, refactor, reformat, or putting in of unit tests that is not guided by actual business or end-user need.
If you do get the opportunity to fix something, then do it right (the chance of doing it right the first time might have already passed, but since you are touching that part again might as well do it right time around) and this includes all the items you mentioned.
So in summary, there's no single or just a few things that you should do. You should do it all but in small portions and in an opportunistic manner.
Late to the party, but the following may be worth doing where a function/method is used or referenced often:
Local variables often tend to be poorly named in legacy code (often owing to their scope expanding when a method is modified, and not being updated to reflect this). Renaming these in line with their actual purpose can help clarify legacy code.
Even just laying out the method slightly differently can work wonders - for instance, putting all the clauses of an if on one line.
There might be stale/confusing code comments there already. Remove them if they're not needed, or amend them if you absolutely have to. (Of course, I'm not advocating removal of useful comments, just those that are a hindrance.)
These might not have the massive headline impact you're looking for, but they are low risk, particularly if the code can't be unit tested.

TDD ...how?

I'm about to start out my first TDD (test-driven development) program, and I (naturally) have a TDD mental block..so I was wondering if someone could help guide me on where I should start a bit.
I'm creating a function that will read binary data from socket and parses its data into a class object.
As far as I see, there are 3 parts:
1) Logic to parse data
2) socket class
3) class object
What are the steps that I should take so that I could incrementally TDD? I definitely plan to first write the test before even implementing the function.
The issue in TDD is "design for testability"
First, you must have an interface against which to write tests.
To get there, you must have a rough idea of what your testable units are.
Some class, which is built by a function.
Some function, which reads from a socket and emits a class.
Second, given this rough interface, you formalize it into actual non-working class and function definitions.
Third, you start to write your tests -- knowing they'll compile but fail.
Part-way through this, you may start head-scratching about your function. How do you set up a socket for your function? That's a pain in the neck.
However, the interface you roughed out above isn't the law, it's just a good idea. What if your function took an array of bytes and created a class object? This is much, much easier to test.
So, revisit the steps, change the interface, write the non-working class and function, now write the tests.
Now you can fill in the class and the function until all your tests pass.
When you're done with this bit of testing, all you have to do is hook in a real socket. Do you trust the socket libraries? (Hint: you should) Not much to test here. If you don't trust the socket libraries, now you've got to provide a source for the data that you can run in a controlled fashion. That's a large pain.
Your split sounds reasonable. I would consider the two dependencies to be the input and output. Can you make them less dependent on concrete production code? For instance, can you make it read from a general stream of data instead of a socket? That would make it easier to pass in test data.
The creation of the return value could be harder to mock out, and may not be a problem anyway - is the logic used for the actual population of the resulting object reasonably straightforward (after the parsing)? For instance, is it basically just setting trivial properties? If so, I wouldn't bother trying to introduce a factory etc there - just feed in some test data and check the results.
First, start thinking "the testS", plural, rather than "the test", singular. You should expect to write more than one.
Second, if you have mental block, consider starting with a simpler challenge. Lower the difficulty until it's real easy to do, then move on to more substantial work.
For instance, assume you already have a byte array with the binary data, so you don't even need to think about sockets. All you need to write is something that takes in a byte[] and return an instance of your object. Can you write a test for that ?
If you still have mental block, lower it yet another notch. Assume your byte array is only going to contain default values anyway. So you don't even have to worry about the parsing, just about being able to return an instance of your object that has all values set to the defaults. Can you write a test for that ?
I imagine something like:
public void testFooReaderCanParseDefaultFoo() {
FooReader fr = new FooReader();
Foo myFoo = fr.buildFoo();
assertEquals(0, myFoo.bar());
}
That's rock bottom, right ? You're only testing the Foo constructor. But you can then move up to the next level:
public void testFooReaderGivenBytesBuildsFoo() {
FooReader fr = new FooReader();
byte[] fooData = {1};
fr.consumeBytes(fooData);
Foo myFoo = fr.buildFoo();
assertEquals(1, myFoo.bar());
}
And so on...
'The best testing framework is the application itself'
I believe that a common misconception amongst developers is, they mistakenly make a strong association between testing frameworks and TDD principles. I would advise re-reading the official docs on TDD; bearing in mind that, there is no real relationship between testing frameworks and TDD. After all, TDD is a paradigm not a framework.
Upon reading the wiki on TDD (https://en.wikipedia.org/wiki/Test-driven_development), I've come to realise that to an extent things are a little bit open to interpretation.
There are various personal styles of TDD mainly due to the fact that TDD principles are open to interpretation.
I'm not here to say anyone is wrong, but I would like to share my techniques with you and explain how they have served me well. Bear in mind that I have been programming for 36 years; making my programming habits very well evolved.
Code reuse is over rated. Reuse code too much and you'll end up with bad abstraction and it will become very difficult to fix or change something without it affecting something else. The obvious advantage being less code to manage.
Repeating too much code leads to code management problems and oversized code bases. However it does have the advantage of good separation of concerns (the ability to tweak, change and fix things without affecting other parts of the app).
Don't repeat/refactor too much, don't reuse too much. Code needs to be maintainable. It’s important to understand and respect the balance between code reuse and abstraction/separation of concerns.
When deciding whether to reuse code I base the decision on: .... Will the nature of this code change in context throughout the app codebase? If the answer is no, then I reuse it. If the answer is yes or I'm not sure, I repeat/refactor it. I will however revise my codebases from time to time and see if any of my repeated code can be merged without compromising separation of concerns/abstraction.
As far as my basic programming habits are concerned, I like to write the conditions (if then else switch case etc) first; test them, then fill the conditions with the code and test again. Keep in mind there's no rule that you have to do this in a unit test. I refer to this as the low level stuff.
Once my low level stuff is done, I'll either reuse the code or refactor it into another part of the app, but not after testing it very thoroughly. Problem with repeating/refactoring badly tested code is that, if it’s broken, you have to fix it in multiple places.
BDD To me is a natural follow on from TDD. Once my code base is well tested I can easily tweak behaviours by moving entire blocks of code around. Cool thing is about my programming habits is that sometimes I move code around and discover useful behaviours that I didn’t even intend. It can sometimes even be useful for rebranding stuff to seem like a completely different code base.
To this end my code bases tend to start out a bit slow and pick up momentum because as I advance toward the end of development I have more and more code to refactor from or reuse.
The advantages for me in the way that I code is that, I am able to take on very high levels of complexity as this is promoted by good separation of concerns. It’s also awesome for writing highly optimised code. However the well optimised code tends to be a bit bloated, but to my knowledge there is no way to write optimized code without a bit of bloating. If the app doesn't need high processor efficiency, there's nothing stopping me from de-bloating my code. I'm of the opinion that server side code should be optimised and most client side code normally doesn't require it.
Going back to the topic of testing frameworks, I use them to just save a bit of compiler time.
As far as following story boards is concerned, that comes naturally to me without actually considering it. I've noticed most devs develop in the natural order of story boards even when they are not available.
As a general separation of concerns strategy, in most apps I separate concerns based on UI forms. For example I’ll reuse code within a form and repeat/refactor across forms. This is only a generalistic rule. There are times when I have to think outside the box. Sometimes repeating code can serve well for making code processor efficient.
As a little addendum to my TDD habits; I do optimizations and fault tolerance last. I will try to avoid using try catch blocks as much as possible and write my code in such a way as to not need them. For example rather than catch a null, I will check for null, or rather than catch an index out of bounds, I will scrutinise my code so that it never happens. I find that error trapping too early in app development, leads to semantic errors (behavioural errors that don't crash the app). Semantic errors can be very hard to trace or even notice.
Well that’s my 10 cents. Hope it helps.
Test Driven Development ?
So, this means you should start with writing a test first.
Write a test which contains the code like 'how you want to use your class'. This class or method that you are going to test with this test, is not even there yet.
For instance, you could write a test first like this:
[Test]
public void CanReadDataFromSocket()
{
SocketReader r = new SocketReader( ... ); // create a socketreader instance which takes a socket or a mock-socket in its constructor
byte[] data = r.Read();
Assert.IsTrue (data.Length > 0);
}
For instance; I'm just making up an example here.
Next, once you're able to read data from a socket, you can start thinking on how you'll parse it, and write a test in where you use the 'Parser' class which takes the data that you've read, and outputs an instance of your data class.
etc...
Knowing where to start writing tests and when to stop writing tests while using TDD, is a common problem when starting out.
I have found that it can sometimes help to write an integration test first. Doing so will help create some of the common objects you will be using. It will also allow you to focus your thoughts and tests, since you will need to start writing tests to make the integration test pass.
When I was starting with TDD, I read these 3 rules by Uncle Bob that really helped me out:
You are not allowed to write any production code unless it is to
make a failing unit test pass.
You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
In a shorter version it would be:
Write only enough of a unit test to fail.
Write only enough production code to make the failing unit test pass.
as you can see, this is very simple.

Resources