Understanding Dijkstra's Mozart programming style - coding-style

I came across this article about programming styles, seen by Edsger Dijsktra. To quickly paraphrase, the main difference is Mozart, when the analogy is made to programming, fully understood (debatable) the problem before writing anything, while Beethoven made his decisions as he wrote the notes out on paper, creating many revisions along the way. With Mozart programming, version 1.0 would be the only version for software that should aim to work with no errors and maximum efficiency. Also, Dijkstra says software not at that level of refinement and stability should not be released to the public.
Based on his views, two questions. Is Mozart programming even possible? Would the software we write today really benefit if we adopted the Mozart style instead?
My thoughts. It seems, to address the increasing complexity of software, we've moved on from this method to things like agile development, public beta testing, and constant revisions, methods that define web development, where speed matters most. But when I think of all the revisions web software can go through, especially during maintenance, when often patches are applied over patches, to then be refined through a tedious refactoring process—the Mozart way seems very attractive. It would at least lessen those annoying software updates, e.g. Digsby, Windows, iTunes, etc., many the result of unforeseen vulnerabilities that require a new and immediate release.
Edit: Refer to the response below for a more accurate explanation of Dijsktra's views.

The Mozart programming style is a complete myth (everybody has to edit and modify their initial efforts), and although "Mozart" is essentially a metaphor in this example, it's worth noting that Mozart was substantially a myth himself.
Mozart was a supposed magical child prodigy who composed his first sonata at 4 (he was actually 6, and it sucked - you won't ever hear it performed anywhere). It's rarely mentioned, of course, that his father was considered Europe's greatest music teacher, and that he forced all of his children to practice playing and composition for hours each day as soon as they could pick up an instrument or a pen.
Mozart himself was careful to perpetuate the illusion that his music emerged whole from his mind by destroying most of his drafts, although enough survive to show that he was an editor like everyone else. Beethoven was just more honest about the process (maybe because he was deaf and couldn't tell if anyone was sneaking up on him anyway).
I won't even mention the theory that Mozart got his melodies from listening to songbirds. Or the fact that he created a system that used dice to randomly generate music (which is actually pretty cool, but might also explain how much of Mozart's music appeared to come from nowhere).
The moral of the story is: don't believe the hype. Programming is work, followed by more work to fix the mistakes you made the first time around, followed by more work to fix the mistakes you made the second time around, and so on and so forth until you die.

It doesn't scale.
I can figure out a line of code in my head, a routine, and even a small program. But a medium program? There are probably some guys that can do it, but how many, and how much do they cost? And should they really write the next payroll program? That's like wasting Mozart on muzak.
Now, try to imagine a team of Mozarts. Just for a few seconds.
Still it is a powerful instrument. If you can figure out a whole line in your head, do it. If you can figure out a small routine with all its funny cases, do it.
On the surface, it avoids going back to the drawing board because you didn't think of one edge case that requires a completely different interface altogether.
The deeper meaning (head fake?) can be explained by learning another human language. For a long time you thinking which words represent your thoughts, and how to order them into a valid sentence - that transcription costs a lot of foreground cycles.
One day you will notice the liberating feeling that you just talk. It may feel like "thinking in a foregin language", or as if "the words come naturally". You will sometimes stumble, looking for a particular word or idiom, but most of the time translation runs in the vast ressources of the "subconcious CPU".
The "high goal" is developing a mental model of the solution that is (mostly) independent of the implementation language, to separate solution of a problem from transcribing the problem. Transcription is easy, repetetive and easily trained, and abstract solutions can be reused.
I have no idea how this could be taught, but "figuring out as much as possible before you start to write it" sounds like good programming practice towards that goal.

A classic story from Usenet, about a true programming Mozart.
Real Programmers write in Fortran.
Maybe they do now, in this decadent
era of Lite beer, hand calculators and
"user-friendly" software but back in
the Good Old Days, when the term
"software" sounded funny and Real
Computers were made out of drums and
vacuum tubes, Real Programmers wrote
in machine code. Not Fortran. Not
RATFOR. Not, even, assembly language.
Machine Code. Raw, unadorned,
inscrutable hexadecimal numbers.
Directly.
Lest a whole new generation of
programmers grow up in ignorance of
this glorious past, I feel duty-bound
to describe, as best I can through the
generation gap, how a Real Programmer
wrote code. I'll call him Mel, because
that was his name.
I first met Mel when I went to work
for Royal McBee Computer Corp., a
now-defunct subsidiary of the
typewriter company. The firm
manufactured the LGP-30, a small,
cheap (by the standards of the day)
drum-memory computer, and had just
started to manufacture the RPC-4000, a
much-improved, bigger, better, faster
-- drum-memory computer. Cores cost too much, and weren't here to stay,
anyway. (That's why you haven't heard
of the company, or the computer.)
I had been hired to write a Fortran
compiler for this new marvel and Mel
was my guide to its wonders. Mel
didn't approve of compilers.
"If a program can't rewrite its own
code," he asked, "what good is it?"
Mel had written, in hexadecimal, the
most popular computer program the
company owned. It ran on the LGP-30
and played blackjack with potential
customers at computer shows. Its
effect was always dramatic. The LGP-30
booth was packed at every show, and
the IBM salesmen stood around talking
to each other. Whether or not this
actually sold computers was a question
we never discussed.
Mel's job was to re-write the
blackjack program for the RPC-4000.
(Port? What does that mean?) The new
computer had a one-plus-one addressing
scheme, in which each machine
instruction, in addition to the
operation code and the address of the
needed operand, had a second address
that indicated where, on the revolving
drum, the next instruction was
located. In modern parlance, every
single instruction was followed by a
GO TO! Put that in Pascal's pipe and
smoke it.
Mel loved the RPC-4000 because he
could optimize his code: that is,
locate instructions on the drum so
that just as one finished its job, the
next would be just arriving at the
"read head" and available for
immediate execution. There was a
program to do that job, an "optimizing
assembler", but Mel refused to use it.
"You never know where it's going to
put things", he explained, "so you'd
have to use separate constants".
It was a long time before I understood
that remark. Since Mel knew the
numerical value of every operation
code, and assigned his own drum
addresses, every instruction he wrote
could also be considered a numerical
constant. He could pick up an earlier
"add" instruction, say, and multiply
by it, if it had the right numeric
value. His code was not easy for
someone else to modify.
I compared Mel's hand-optimized
programs with the same code massaged
by the optimizing assembler program,
and Mel's always ran faster. That was
because the "top-down" method of
program design hadn't been invented
yet, and Mel wouldn't have used it
anyway. He wrote the innermost parts
of his program loops first, so they
would get first choice of the optimum
address locations on the drum. The
optimizing assembler wasn't smart
enough to do it that way.
Mel never wrote time-delay loops,
either, even when the balky
Flexowriter required a delay between
output characters to work right. He
just located instructions on the drum
so each successive one was just past
the read head when it was needed; the
drum had to execute another complete
revolution to find the next
instruction. He coined an
unforgettable term for this procedure.
Although "optimum" is an absolute
term, like "unique", it became common
verbal practice to make it relative:
"not quite optimum" or "less optimum"
or "not very optimum". Mel called the
maximum time-delay locations the "most
pessimum".
After he finished the blackjack
program and got it to run, ("Even the
initializer is optimized", he said
proudly) he got a Change Request from
the sales department. The program used
an elegant (optimized) random number
generator to shuffle the "cards" and
deal from the "deck", and some of the
salesmen felt it was too fair, since
sometimes the customers lost. They
wanted Mel to modify the program so,
at the setting of a sense switch on
the console, they could change the
odds and let the customer win.
Mel balked. He felt this was patently
dishonest, which it was, and that it
impinged on his personal integrity as
a programmer, which it did, so he
refused to do it. The Head Salesman
talked to Mel, as did the Big Boss
and, at the boss's urging, a few
Fellow Programmers. Mel finally gave
in and wrote the code, but he got the
test backwards, and, when the sense
switch was turned on, the program
would cheat, winning every time. Mel
was delighted with this, claiming his
subconscious was uncontrollably
ethical, and adamantly refused to fix
it.
After Mel had left the company for
greener pa$ture$, the Big Boss asked
me to look at the code and see if I
could find the test and reverse it.
Somewhat reluctantly, I agreed to
look. Tracking Mel's code was a real
adventure.
I have often felt that programming is
an art form, whose real value can only
be appreciated by another versed in
the same arcane art; there are lovely
gems and brilliant coups hidden from
human view and admiration, sometimes
forever, by the very nature of the
process. You can learn a lot about an
individual just by reading through his
code, even in hexadecimal. Mel was, I
think, an unsung genius.
Perhaps my greatest shock came when I
found an innocent loop that had no
test in it. No test. None. Common
sense said it had to be a closed loop,
where the program would circle,
forever, endlessly. Program control
passed right through it, however, and
safely out the other side. It took me
two weeks to figure it out.
The RPC-4000 computer had a really
modern facility called an index
register. It allowed the programmer to
write a program loop that used an
indexed instruction inside; each time
through, the number in the index
register was added to the address of
that instruction, so it would refer to
the next datum in a series. He had
only to increment the index register
each time through. Mel never used it.
Instead, he would pull the instruction
into a machine register, add one to
its address, and store it back. He
would then execute the modified
instruction right from the register.
The loop was written so this
additional execution time was taken
into account -- just as this
instruction finished, the next one was
right under the drum's read head,
ready to go. But the loop had no test
in it.
The vital clue came when I noticed the
index register bit, the bit that lay
between the address and the operation
code in the instruction word, was
turned on-- yet Mel never used the
index register, leaving it zero all
the time. When the light went on it
nearly blinded me.
He had located the data he was working
on near the top of memory -- the
largest locations the instructions
could address -- so, after the last
datum was handled, incrementing the
instruction address would make it
overflow. The carry would add one to
the operation code, changing it to the
next one in the instruction set: a
jump instruction. Sure enough, the
next program instruction was in
address location zero, and the program
went happily on its way.
I haven't kept in touch with Mel, so I
don't know if he ever gave in to the
flood of change that has washed over
programming techniques since those
long-gone days. I like to think he
didn't. In any event, I was impressed
enough that I quit looking for the
offending test, telling the Big Boss I
couldn't find it. He didn't seem
surprised.
When I left the company, the blackjack
program would still cheat if you
turned on the right sense switch, and
I think that's how it should be. I
didn't feel comfortable hacking up the
code of a Real Programmer.

Edsger Dijkstra discusses his views on Mozart vs Beethoven programming in this YouTube video entitled "Discipline in Thought".
People in this thread have pretty much discussed how Dikstra's views are impractical. I'm going to try and defend him some.
Dijkstra is against companies
essentially "testing" their software
on their customers. Releasing
version 1.0 and then immediately
patch 1.1. He felt that the program
should be polished to a degree that
"hotfix" patches are borderline
unethical.
He did not think that software should be written in one fell swoop or that changes would never need to be made. He often discusses his design ideals, one of them being modularity and ease of change. He often thought that individual algorithms should be written in this way however, after you have completely understood the problem. That was part of his discipline.
He found after all his extensive experience with programmers, that programmers aren't happy unless they are pushing the limits of their knowledge. He said that programmers didn't want to program something they completely and 100% understood because there was no challenge in it. Programmers always wanted to be on the brink of their knowledge. While he understood why programmers are like that he stated that it wasn't representative of low-error tolerance programming.
There are some industries or applications of programming that I believe Dijkstra's "discipline" are warranted as well. NASA Rovers, Health Industry embedded devices (ie dispense medication, etc), certain Financial software that transfer our money. These areas don't have the luxuries of incremental change after release and a more "Mozart Approach" is necessary.

I think the Mozart story confuses what gets shipped versus how it is developed. Beethoven did not beta-test his symphonies on the public. (It would be interesting to see how much he changed any of the scores after the first public performance.)
I also don't think that Dijkstra was insisting that it all be done in your head. After all, he wrote books on disciplined programming that involved working it out on paper, and to the same extent that he wanted to see mathematical-quality discipline, have you noticed how much paper and chalk board mathematicians may consume while working on a problem?
I favor Simucal's response, but I think the Mozart-Beethoven metaphor should be discarded. That shoe-horns Dijkstra's insistence on discipline and understanding into a corner where it really doesn't belong.
Additional Remarks:
The TV popularization is not so hot, and it confuses some things about musical composition and what a composer is doing and what a programmer is doing. In Dijkstra's own words, from his 1972 Turing Award Lecture: "We must not forget that it is not our business to make programs; it is our business to design classes of computations that will display a desired behavior." A composer may be out to discover the desired behavior.
Also, in Dijkstra's notion that version 1.0 should be the final version, we too easily confuse how desired behavior and functionality evolve over time. I believe he oversimplifies in thinking that all future versions are because the first one was not thought out and done rigorously and reliably.
Even without time-to-market urgency, I think we now understand much better that important kinds of software evolve along with the users experience with it and the utilitarian purpose they have for it. Obvious counter-examples are games (also consider how theatrical motion pictures are developed). Do you think Beethoven could have written Symphony No. 9 without all of his preceding experience and exploration? Do you think the audience could have heard it for what it was? Should he have waited until he had the perfect Sonata? I'm sure Dijkstra doesn't propose this, but I do think he goes too far with Mozart-Beethoven to make his point.
In addition, consider chess-playing software. The new versions are not because the previous ones didn't play correctly. It is about exploiting advances in chess-playing heuristics and the available computer power. For this and many other situations, the idea that version 1.0 be the final version is off base. I understand that he is rightfully objecting to the release of known-to-be unreliable and maybe impaired software with deficiencies to be made up in maintenance and future releases. But the Mozartian counter-argument doesn't hold up for me.
So, did Dijkstra continue to drive the first automobile he purchased, or clones of exactly that automobile? Maybe there is planned obsolescence, but a lot of it has to do with improvements and reliability that could not have possibly been available or even considered in previous generations of automotive technology.
I am a big Dijkstra fan, but I think the Mozart-Beethoven thing is way too simplistic as well as inappropriate. I am a big Beethoven fan too.

I think it's possible to appear to employ Mozart programming. I know of one company, Blizzard, that doesn't release a software product until they're good and ready. This doesn't mean that Diablo 3 will spring whole and complete from someone's mind in one session of dazzlingly brilliant coding. It does mean that that's how it will appear to the rest of us. Blizzard will test the heck out of their game internally, not showing it to the rest of the world until they've got all the kinks worked out. Most companies don't take this approach, preferring instead to release software when it's good enough to solve a problem, then fix bugs and add features as they come up. This approach works (to varying degrees) for most companies.

Well, we can't all be as good as Mozart, can we? Perhaps Beethoven programming is easier.

If Apple adopted "Mozart" programming, there would be no Mac OS X or iTunes today.
If Google adopted "Mozart" programming, there would be no Gmail or Google Reader.
If SO developers adopted "Mozart" programming, there would be no SO today.
If Microsoft adopted "Mozart" programming, there would be no Windows today (well, I think that would be good).
So the answer is simply NO. Nothing is perfect, and nothing is ever meant to be perfect, and that includes software.

I think the idea is to plan ahead. You need to at least have some kind of outline of what you are trying to do and how you plan to get there. If you just sit down at the keyboard and hope "the muse" will lead you to where your program needs to go, the results are liable to be rather uneven, and it will take you much longer to get there.
This is true with any kind of writing. Very few authors just sit down at a typewriter with no ideas and start banging away until a bestselling novel is produced. Heck, my father-in-law (a high school English teacher) actually writes outlines for his letters.

Progress in computing is worth a sacrifice in glory or genius here and there.

Related

How to draw a tiger with just 3 lines?

Background:
An art teacher once gave me a design problem to draw a tiger using only 3 lines. The idea being that I study a tiger and learn the 3 lines to draw for people to still be able to tell it is a tiger.
The solution for this problem is to start with a full drawing of a tiger and remove elements until you get to the three parts that are most recognizable as a tiger.
I love this problem as it can be applied in multiple disciplines like software development, especially in removing complexity.
At work I deal with maintaining a large software system that has been hacked to death and is to the point of becoming unmaintainable. It is my job to remove the burdensome complexity that was caused by past developers.
Question:
Is there a set process for removing complexity in software systems - a kind of reduction process template to be applied to the problem?
Check out the book Refactoring by Martin Fowler, and his http://www.refactoring.com/ website.
Robert C. Martin's Clean Code is another good resource for reducing code complexity.
Unfortunately, the analogy with the tiger drawing may not work very well. With only three lines, a viewer can imagine the rest. In a software system, all the detail has to actually be there. You generally can't take much away without removing something essential.
Check out the book Anti-Patterns for a well-written book on the whole subject of moving from bad (or maladaptive) design to better. It provides ways to recover from a whole host of problems typically found in software systems. I would then add support to Kristopher's recommendation of Refactoring as an important second step.
Checkout the book, Working Effectively with Legacy Code
The topics covered include
Understanding the mechanics of software change: adding features, fixing bugs, improving design, optimizing performance
Getting legacy code into a test harness
Writing tests that protect you against introducing new problems
Techniques that can be used with any language or platform—with examples in Java, C++, C, and C#
Accurately identifying where code changes need to be made
Coping with legacy systems that aren't object-oriented
Handling applications that don't seem to have any structure
This book also includes a catalog of twenty-four dependency-breaking techniques that help you work with program elements in isolation and make safer changes.
While intellectually stimulating, the concept of detail removal doesn't carry very well (at least as-is) to software programs. The reason being that the drawing is re-evaluated by a human with it ability to accept fuzzy input, whereby the program is re-evaluated by a CPU which is very poor at "filling the blanks". Another more subtle reason is that programs convey a spaciotemporal narrative, whereas the drawing is essentially spacial.
Consequently with software there is much less room for approximation, and for outright removal of particular sections of the code. Never the less, refactoring is the operational keyword and is sometimes applicable even for them most awkward legacy pieces. This discipline is however part art part science and doesn't have very many "quick tricks" that I know of.
Edit: One isn't however completely helpless against legacy code. See for example the excellent book references provided in Alex Baranosky and Kristopher Johnson's answers. These books provide many useful techniques, but on the whole I remain strong in my assertion that refactoring non-trivial legacy code is an iterative process that requires both art and science (and patience and ruthlessness and gentleness ;-) ).
This is a loaded question :-)
First, how do we measure "complexity"? Without any metric decided apriori, it may be hard to justify any "reduction" project.
Second, is the choice entirely yours? If we may take an example, assume that, in some code base, the hammer of "inheritance" is used to solve every other problem. While using inheritance is perfectly right for some cases, it may not be right for all cases. What do you in such cases?
Third, can it be proved that behavior/functionality of the program did not change due to refactoring? (This gets more complex when the code is part of a shipping product.)
Fourth, you can start with start with simpler things like: (a) avoid global variables, (b) avoid macros, (c) use const pointers and const references as much as possible, (d) use const qualified methods wherever it is the logical thing to do. I know these are not refactoring techniques, but I think they might help you proceed towards your goal.
Finally, in my humble opinion, I think any such refactoring project is more of people issue than technology issue. All programmers want to write good code, but the perception of good vs. bad is very subjective and varies across members in the same team. I would suggest to establish a "design convention" for the project (Something like C++ Coding Standards). If you can achieve that, you are mostly done. The remaining part is modify the parts of code which does not follow the design convention. (I know, this is very easy to say, but much difficult to do. Good wishes to you.)

IT evaluating quality of coding - how do we know what's good? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
Coming from an IT background, I've been involved with software projects but I'm not a programmer. One of my biggest challenges is that having a lot of experience in IT, people often turn to me to manage projects that include software development. The projects are usually outsourced and there isnt a budget for a full time architect or PM, which leaves me in a position to evaluate the work being performed.
Where I've managed to get through this in the past, I'm (with good reason) uneasy about accepting these responsibilities.
My question is, from a perspective of being technically experienced but not in programming, how can I evaluate whether coding is written well besides just determining if it works or not? Are there methodologies, tips, tricks of the trade, flags, signs, anything that would say - hey this is junk or hey this is pretty damn good?
Great question. Should get some good responses.
Code cleanliness (indented well, file organization, folder structure)
Well commented (not just inline comments, but variables that say what they are, functions that say what they do, etc.)
Small understandable functions/methods (no crazy 300 line methods that do all sorts of things with nested if logic all over the place)
Follows SOLID principles
Is the amount of unit test code similar in size and quality as the code base of the project
Is the interface code separate from the business logic code which in turn should be separate from the infrastructure access code (email, database, web services, file system, etc.)
What does a performance analysis tool think of the code (NDepend, NDoc, NCover, etc.)
There is a lot more to this...but this gets your started.
Code has 2 primary audiences:
The people who use it
The people who develop it
So you neeed 2 simple tests:
Run the code. Can you get it to do the job it is supposed to do?
Read the code. Can you understand the general intentions of the developer?
If you can answer yes to both of these, it is great code.
When reading the code, don't worry that you are not a programmer. If code is well written / documented, even a non-programmer should be able to see guess much of what it is intended to achieve.
BTW: Great question! I wish more non-programmers cared about code quality.
First, set ground rules (that all programmers sign up to) that say what's 'good' and what isn't. Automate tests for those that you can measure (e.g. functions less than a number of lines, McCabe complexity, idioms that your coders find confusing). Then accept that 'good coding' is something you know when you see rather than something you can actually pin down with a set of rules, and allow people to deviate from the standard provided they get agreement from someone with more experience. Similarly, such standards have to be living documents, adapted in the face of feedback.
Code reviews also work well, since not all such 'good style' rules can be automatically determined. Experienced programmers can say what they don't like about inexperienced programmers' code - and you have to get the original authors to change it so that they learn from their mistakes - and inexperienced programmers can say what they find hard to understand about other people's code - and, by being forced to read other people's code, they'll also learn new tricks. Again, this will give you feedback on your standard.
On some of your specific points, complexity and function size work well, as does code coverage during repeatable (unit) testing, but that last point comes with a caveat: unless you're working on something where high quality standards are a necessity (embedded code, as an example, or safety-critical code) 100% code coverage means you're testing the 10% of code paths that are worthwhile to test and the 90% that almost never get coded wrong in the first place. Worthwhile tests are the ones that find bugs and improve maintainability.
I think it's great you're trying to evaluate something that typically isn't evaluated. There have been some good answers above already. You've already shown yourself to be more mature in dealing with software by accepting that since you don't practice development personally, you can't assume that writing software is easy.
Do you know a developer whose work you trust? Perhaps have that person be a part of the evaluation process.
how can I evaluate whether coding is written well
There are various ways/metrics to define 'well'or 'good', for example:
Delivered on time
Delivered quickly
No bugs after delivery
Easy to install
Well documented
Runs quickly
Uses cheap hardware
Uses cheap software
Didn't cost much to write
Easy to administer
Easy to use
Easy to alter (i.e. add new features)
Easy to port to new hardware
...etc...
Of these, programmers tend to value "easy to alter": because, their job is to alter existing software.
Its a difficult one and could be where your non-functional requirements will help you
specify your performance requirements: transactions per second, response time, expected DB records over time,
require the delivery to include outcome from a performance analysis tool
specify the machine the application will be running on, you should not have to upgrade your hardware to run the app
For eyeballing the code and working out whether or not its well written its tougher, the answers from #Andrew & #Chris cover it pretty much... you want code that looks good, is easy to maintain and is performant.
Summary
Use Joel Test.
Why?
Thanks for tough question. I was about to write a long answer on merits of direct and indirect code evaluation, understanding your organisational context, perspective, figuring out a process and setting a criteria for code to be good enough, and then the difference between the code being perfect and just good enough which still might mean “very impressive”. I was about to refer to Steve McConnell’s Code Complete and even suggest delegating code audit to someone impartial you can trust, who is savvy enough business and programming-wise to get a grasp of the context, perspective, apply the criteria sensibly and report results neatly back to you. I was going to recommend looking at parts of UI that are normally out of end-user reach in the same way as one would be judging quality of cleaning by checking for dirt in hard-to-reach places.
Well, and then it struck me: what is the end goal? In most, but very few edge cowboy-coding scenarios, as a result of the audit you’re likely to discover that the code is better than junk, but certainly not damn good, maybe just slightly below the good enough mark. And then what is next? There are probably going to be a few choices:
Changing the supplier.
Insisting on the code being re-factored.
Leaving things as they are and from that point on demanding better code.
Unfortunately, none of the options is ideal or very good either. Having made an investment changing supplier is costly and quite risky: part of the software conceptual integrity will be lost, your company will have to, albeit indirectly, swallow the inevitable cost of the new supplier taking over the development and going through the learning curve (exactly opposite to that most suppliers are going to tell you to try and get their foot in the door). And there is going to be a big risk of missing the original deadlines.
The option of insisting on code re-factoring isn’t perfect either. There is going to be a question of cost and it’s very likely that for various contractual and historical reasons you won’t find yourself in a good negotiation position. In any case re-writing software is likely to affect deadlines and the organisation what couldn’t do the job right the first time is very unlikely to produce much better code on the second attempt. The latter is pertinent to the third option I would be dubious of any company producing a better code without some, often significant, organisational change. Leaving things as they are not good either: a piece of rotten code unless totally isolated is going to eventually poison the rest of the source.
This brings me to the actual conclusion, or in fact two:
Concentrate on picking the right software company in a first place, since going forward options are going to be somewhat constrained.
Make use of IT and management knowledge to pick a company that is focused on attracting and retaining good developers, that creates a working environment and culture fit for production of good quality code instead of relying on the post factum analysis.
It’s needless to expand on the importance of choosing the right company in the first place as opposed to summative evaluation of delivered project; hopefully the point is already made.
Well, how do we know the software company is right? Here I fully subscribe to the philosophy evangelised by Joel Spolsky: quality of software directly depends on quality of people involved which as it has been indicated by several studies can vary by an order of magnitude. And through the workings of free markets developers end up clustered in companies based on how much a particular company cares about attracting and retaining them.
As a general rule of life, best programmers end up working with the best, good with good, average with average and cowboy coders with other cowboy coders. However, there is a caveat. Most companies would have at least one or two very good developers they care about and try their hardest to retain. These devs are always put on a frontline: to fire fight, to lure a customer, to prove the organisation potential and competence. Working amongst more than average colleagues, overstretched between multiple projects, and being treated as royalty, sadly, these star programmers very often loose touch with the reality and become prima donnas who won’t “dirty” their hands with any actual programming work.
Unfortunately, programming talent doesn’t scale and it’s unlikely that the prima donna is going to work on your project past the initial phase designed to lure and lock in you as a customer. At the end the code is going to be produced by a less talented colleague and as a result you’ll get what you’ll get.
The solution is to look for a company there developer talents are more consistent and everyone is at least good enough to produce the right quality of code. And when it comes to choosing such an organisation that’s where Joel Test comes mighty handy. I believe it’s especially suitable for application by someone who has no programming experience but good understanding of IT and management.
The more points company scores on the Joel Test the more it’s likely to attract and retain good developers and most importantly provide them with the conditions to produce quality code. And since most great devs are actually in love with programming all the need is to be teamed up, given good and supportive work environment, a credible goal (or even better incredible) and they’ll start chucking out high quality code. It’s that simple.
Well, the only thing is that company that scores full twelve points on the Joel’s Test is likely to charge more than a sweatshop that scores a mere 3 or 5 (a self-estimated industry average). However, the benefits of having the synergy of efficient operations and bespoke trouble-free software that leverage strategic organisational goals will undoubtedly produce exceptional return on investment and overcome any hurdle rates by far outweighing any project costs. I mean, at the end of the day the company's work will likely be worth the money, every penny of it.
Also hope that someone will find this longish answer worthwhile.

Software projects and development in a research environment [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
What are useful strategies to adopt when you or the project does not have a clear idea of what the final (if any) product is going to be?
Let us take "research" to mean an exploration into an area where many things are not known or implemented and where a formal set of deliverables cannot be specified at the start of the project. This is common in STEM (science (physics, chemistry, biology, materials, etc.), technology engineering, medicine) and many areas of informatics and computer science. Software is created either as an end in itself (e.g. a new algorithm), a means of managing data (often experimental) and simulation (e.g. materials, reactions, etc.). It is usually created by small groups or individuals (I omit large science such as telescopes and hadron colliders where much emphasis is put of software engineering.)
Research software is characterised by (at least):
unknown outcome
unknown timescale
little formal project management
limited budgets (in academia at least)
unpredictability of third-party tools and libraries
changes in the outside world during the project (e.g. new discoveries which can be positive - save effort - or negative - getting scooped
Projects can be anything from days ("see if this is a worthwhile direction to go") to years ("this is my PhD topic") or longer. Frequently the people are not hired as software people but find they need to write code to get the research done or get infected by writing software. There is generally little credit for good software engineering - the "product" is a conference or journal publication.
However some of these projects turn out to be highly valuable - the most obvious area is genomics where in the early days scientists showed that dynamic programming was a revolutionary tool to help thinking about protein and nucleic structure - now this is a multi-billion industry (or more). The same is true for quantum mechanics codes to predict properties of substances.
The downside is that much code gets thrown away and it is difficult to build on. To try to overcome this we have build up libraries which are shared in the group and through the world as Open Source (but here again there is very little credit given). Many researchers reinvent the wheel ("head-down" programming where colleagues are not consulted and "hero" programming where someone tries to do the whole lot themself).
Too much formality at the start of a project often puts people off and innovation is lost (no-one will spend 2 months writing formal specs and unit tests). Too little and bad habits are developed and promulgated. Programming courses help but again it's difficult to get people doing them especially when you rely on their goodwill. Mentoring is extremely valuable but not always successful.
Are there online resources which can help to persuade people into good software habits?
EDIT: I'm grateful for dmckee (below) for pointing out a similar discussion. It's all good stuff and I particularly agree with version control as being one of the most important things that we can offer scientists (we offered this to our colleagues and got very good takeup). I also like the approach of the Software Carpentry course mentioned there.
It's extremely difficult. The environment both you and Stefano Borini describe is very accurate. I think there are three key factors which propagate the situation.
Short-term thinking
Lack of formal training and experience
Continuous turnover of grad students/postdocs to shoulder the brunt of new development
Short-term thinking. There are a few reasons that short-term thinking is the norm, most of them already well explained by Stefano. As well as the awful pressure to publish and the lack of recognition for software creation, I would emphasise the number of short-term contracts. There is simply very little advantage for more junior academics (PhD students and postdocs) to spend any time planning long-term software strategies, since contracts are 2-3 years. In the case of longer-term projects e.g. those based around the simulation code of a permanent member of staff, I have seen some applications of basic software engineering, things like simple version control, standard test cases, etc. However even in these cases, project management is extremely primitive.
Lack of formal training and experience. This is a serious handicap. In astronomy and astrophysics, programming is an essential tool, but understanding of the costs of development, particularly maintenance overheads, is extremely poor. Because scientists are normally smart people, there is a feeling that software engineering practices don't really apply to them, and that they can 'just make it work'. With more experience, most programmers realise that writing code that mostly works isn't the hard part; maintaining and extending it efficiently and safely is. Some scientific code is throwaway, and in these cases the quick and dirty approach is adequate. But all too often, the code will be used and reused for years to come, bringing consequent grief to all involved with it.
Continuous turnover of grad students/postdocs for new development. I think this is the key feature that allows the academic approach to software to continue to survive. If the code is horrendous and takes days to understand and debug, who pays that price? In general, it's not the original author (who has probably moved on). Nor is it the permanent member of staff, who is often only peripherally involved with new development. It is normally the graduate student who is implementing new algorithms, producing novel approaches, trying to extend the code in some way. Sometimes it will be a postdoc, hired specifically to work on adding some feature to an existing code, and contractually obliged to work on this area for some fraction of their time.
This model is hugely inefficient. I know a PhD student in astrophysics who spent over a year trying to implement a relatively basic piece of mathematics, only a few hundred lines of code, in an existing n-body code. Why did it take so long? Because she literally spent weeks trying to understand the existing, horribly written code, and how to add her calculation to it, and months more ineffectively debugging her problems due to the monolithic code structure, coupled with her own lack of experience. Note that there was almost no science involved in this process; just wasting time grappling with code. Who ultimately paid that price? Only her. She was the one who had to put more hours in to try and get enough results to make a PhD. Her supervisor will get another grad student after she's gone - and so the cycle continues.
The point I'm trying to make is that the problem with the software creation process in academia is endemic within the system itself, a function of the resources available and the type of work that is rewarded. The culture is deeply embedded throughout academia. I don't see any easy way of changing that culture through external resources or training. It's the system itself that needs to change, to reward people for writing substantial code, to place increased scrutiny on the correctness of results produced using scientific code, to recognise the importance of training and process in code, and to hold supervisors jointly responsible for wasting the time of the members of their research group.
I'll tell you my experience.
It is undoubt that a lot of software gets created and wasted in the academia. Fact is that it's difficult to adapt research software, purposely created for a specific research objective, to a more general environment. Also, the product of academia are scientific papers, not software. The value of software in academia is zero. The data you produce with that software is evaluated, once you write a paper on it (which takes a lot of editorial time).
In most cases, however, research groups have recognized frequent patterns, which can be polished, tested and archived as internal knowledge. This is what I do with my personal toolkit. I grow it according to my research needs, only with those features that are "cross-project". Developing a personal toolkit is almost a requirement, as your scientific needs are most likely unique for some verse (otherwise you would not be doing research) and you want to have as low amount of external dependencies as possible (since if something evolves and breaks your stuff, you will not have the time to fix it).
Everything else, however, is too specific for a given project to be crystallized. I therefore tend not to encapsulate something that is clearly a one-time solver. I do, however, go back and improve it if, later on, other projects require the same piece of code.
Short project span, and the heat of research (e.g. the publish or perish vision so central today), requires agile, quick languages, and in general, languages that can be grasped quickly. Ph.Ds in genomics and quantum chemistry don't have formal programming background. In some cases, they don't even like it. So the language must be quick, easy, clean, flexible, and easy to understand later on. The latter point is capital, as there's no time to produce documentation, and it's guaranteed that in academia, everyone will leave sooner or later, you burn the group experience to zero every three years or so. Academia is a high risk industry that periodically fires all their hard-formed executors, keeping only some managers. Having a code that is maintainable and can be easily grasped by someone else is therefore capital. Also, never underestimate the power of a google search to solve your problems. With a well deployed language you are more likely to find answers to gotchas and issues you can stumble on.
Management is a problem as well. Waterfall is out of discussion. There is no time for paperwork programming (requirements, specs, design). Spiral is quite ok, but as low paperwork as possible is clearly recommended. Fact is that anything that does not give you an article in academia is wasted time. If you spend one month writing specs, it's a month wasted, and your contract expires in 11 months. Moreover, that fatty document counts zero or close to zero for your career (as many other things: administration and teaching are two examples). Of course, Agile methods are also out of discussion. Most development is made by groups that are far, and in general have a bunch of other things to do as well. Coding concentration comes in brief bursts during "spare time" between articles, and before or after meetings. The bazaar is the most likely, but the bazaar carries a lot of issues as well.
So, to answer your question, the best strategy is "slow accumulation" of known good software, development in small bursts with a quick and agile method and language. Good coding practices need to be taught during lectures, as good laboratory practices are taught during practical courses (eg. never put water in sulphuric acid, always the opposite)
The hardest part is the transition between "this is just for a paper" and "we're really going to use this."
If you know that the code will only be for a paper, fine, take short cuts. Hardcode everything you can. Don't waste time on extensive validation if the programmer is the only one who will ever run the code. Etc. The problem is when someone says "Great! Now let's use this for real" or "Now let's use it for this entirely different scenario than what it was developed and tested for."
A related challenge is having to explain why the software isn't ready for prime time even though it obviously works, i.e. it's prototype quality and not production quality. What do you mean you need to rewrite it?
I would recommend that you/they read "Clean Code"
http://www.amazon.co.uk/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/ref=sr_1_1?ie=UTF8&s=books&qid=1251633753&sr=8-1
The basic idea of this book is that if you do not keep the code "clean", eventually the mess will stop you from making any progress.
The kind of big science I do (particle physics) has a small number of large, long-running projects (ROOT and Geant4, for instance). These are developed mostly by actual programming professionals. Using processes that would be recognized by anyone else in the industry.
Then, each collaboration has a number of project-wide programs which are developed collaboratively under the direction of a small number of senior programming scientists. These use at least the basic tools (always version control, often some kind of bug tracking or automated builds).
Finally almost every scientist works on their own programs. Use of process on these programs is very spotty, and they often suffer from all the ills that others have discussed (short lifetimes, poor coding skills, no review, lots of serial maintainers, Not Invented Here Syndrome, etc. etc.). The only advantage that is available here compared to small group science, is that they work with the tools I talked about above, so there is something that you can point to and say "That is what you want to achieve.".
Don't really have that much more to add to what has already been said. It's a difficult balance to strike because our priorities are different - academia is all about discovering new things, software engineering is more about getting things done according to specifications.
The most important thing I can think of is to try and extricate yourself from the culture of in-house development that goes on in academia and try to maintain a disciplined approach to development, difficult as that may be in many cases owing to time restraints, lack of experience etc. This control-freakery sucks away at individual responsibility and decision-making and leaves it in the hands of a few who do not necessarily know best
Get a good book on software development, Code Complete already mention is excellent, as well as any respected book on algorithms and data structures. Read up on how you will need to manage your data eg do you need fast lookup / hash-tables / binary trees. Don't reinvent the wheel - use the libraries and things like STL otherwise you are likely to be wasting time. There is a vast amount on the web including this very fine blog.
Many academics, besides sometimes being primadonna-ish and precious about any approach seen as businesslike, tend to be quite vague in their objectives. To put it mildly. For this reason alone it is vital to build up your own software arsenal of helper functions and recipes, eventually, hopefully ending up with a kind of flexible experimental framework that enables you to try out any combination of things without being to restricted to any particular problem area. Strongly resist the temptation to just dive into the problem at hand.

What are some good strategies to fix bugs as code becomes more complex?

I'm "just" a hobbyist programmer, but I find that as my programs get longer and longer the bugs get more annoying--and harder to track. Just when everything seems to be running smoothly, some new problem will appear, seemingly spontaneously. It may take me a long time to figure out what caused the problem. Other times I'll add a line of code, and it'll break something in another unit. This can get kind of frustrating if I thought everything was working well.
Is this common to everyone, or is it more of a newbie kind of thing? I hear about "unit testing," "design frameworks," and various other concepts that sound like they would decrease bugginess, make my apps "robust," and everything easy to understand at a glance :)
So, how big a deal are bugs to people with professional training?
Thanks -- Al C.
The problem of "make a fix, cause a problem elsewhere" is very well known, and is indeed one of the primary motivations behind unit testing.
The idea is that if you write exhaustive tests for each small part of your system independently, and run them on the entire system every time you make a change anywhere, you will see the problem immediately. The main benefit, however, is that in the process of building these tests you'll also be improving your code to have less dependencies.
The typical solution to these sort of problems is to reduce coupling; make different parts less dependent on one another. More experienced developers sometimes have habits or design skills to build systems in this manner. For example, we use interfaces and implementations rather than classes; we use model-view-controller for user interfaces, etc. In addition, we can use tools that help further reduce dependencies, like "Dependency injection" and aspect oriented programming.
All programmers make mistakes. Good and experienced programmers build their programs so that it is easier to find the mistakes and restrict their effects.
And it is a big deal for everyone. Most companies spend more time on maintenance than on writing new code.
Are you automating your tests? If you do not, you're signing up creating bugs without finding them.
Are you adding tests for bugs as you fix them? If you do not, you are signing up for creating the same bugs over and over.
Are you writing unit tests? If not, you are signing up for long debugging sessions when a test fails.
Are you writing your unit tests first? If not, your unit tests will be hard to write when your units are tightly coupled.
Are you refactoring mercilessly? If not, every edit will become more difficult and more likely to introduce bugs. (But make sure you have good tests, first.)
When you fix a bug, are you fixing the entire class? Don't just fix the bug; don't just fix similar bugs throughout your code; change the game so you can never create that kind of bug again.
Bugs are a big deal to everyone. I've always found that the more I program, the more I learn about programming in general. I cringe at the code I wrote a few years back!! I started out as a hobbyist and liked it so much that I went to engineering college to get a Computer Science Engineering major (I am in my final semester). These are the things that I have learned :
I take time to actually design what I am going to write and document the design. It really eliminates a lot of problems down the line. Whether the design is as simple as writing down a few points on what I am going to write or full blown UML modeling (:( ) doesn't matter. Its the clarity of thought and purpose and having material to look back at when I come back to the code after a while that matter the most.
No matter what language I write in, keeping my code simple and readable is important. I think that it is extremely important not to over complicate the code and at the same time not to over simplify it. (Hard learned lesson!!)
Efficiency optimizations and fancy tricks should be applied at the end, only when necessary and only if they are needed. Another thing is that I apply them only If I really know what I am doing and I always test my code!
Learning language dependant details helps me keep my code bug free. For instance I learned that scanf() is evil in C!
Others have already commented on the zen of writing tests. I would like to add that you should always do regression tests. (i.e. Write new code, test all parts of your code to see if it breaks)
Keeping a mental picture of code is hard at times, so I always document my code.
I use methods to make sure that there is a bare minimum dependence between different parts of my code. Interfaces, class hierarchies etc. (Decoupled design)
Thinking before I code and being disciplined in whatever I write is another crucial skill. I know people who don't format their code so its readable (Shudder!).
Reading other peoples source to learn best practices is good. Making my own list is better!. When working in a team, there must be a common set of them.
Don't be paralyzed by analysis. Write tests, then code, then execute and test. Rinse wash repeat!
Learning to read over my own code and combing it for mistakes is important. Improving my arsenal of debugging skills was a great investment. I keep them sharp by helping my classmates fix bugs regularly.
When there is a bug in my code, I assume its my mistake, not the computers and work from there. That is a state of mind that really helps me.
A fresh pair of eyes aids in debugging. Programmers tend to miss even the most obvious errors in their own code when exhausted. Having someone to show your code to is great.
having someone to throw ideas at and not be judged is important. I talk to my mom (who is not a programmer) , throw ideas at her and find solutions. She helps me bounce my ideas back and forth and refine them. If she is unavailable, I talk to my pet cat.
I am not so be discouraged by bugs anymore. I've learned to love removing bugs almost as much as programming.
Using version control has really helped me manage different ideas I get while coding. That helps reduce errors. I recommend using git or any other version control system you might like.
As Jay Bazzuzi said - Refactor code. I just added this point after reading his answer, to keep my list complete. All credit goes to him.
Try to write reusable code. Reuse code, both yours and from libraries. Using libraries which are bug free to do some common tasks really reduces bugs (sometimes).
I think the following quote says it best - "If debugging is the art of removing bugs, programming must be the art of putting them in."
No offense to anyone who disagrees. I hope this answer helps.
Note
As others Peter has pointed out, use Object Oriented Programming if you are writing a large amount of code. There is a limit to code length after which it becomes harder and harder to manage if written procedurally. I like procedural for smaller stuff, like playing with algorithms.
There are two ways to write error-free programs; only the third one works. ~Alan J. Perlis
The only way for errors to occur in a program is by being put there by the author. No other mechanisms are known. Programs can't acquire bugs by sitting around with other buggy programs. ~Harlan Mills
Obviously, bugs are a big deal to any programmer. Just look through the list of questions on Stack Overflow to see this illustrated.
The difference between a hobbyist and an experienced professional is that the pro will be able to use his experience to code in a more "defensive" way, avoiding many types of bugs in the first place.
All the other answers are great. I'll add two things.
Source control is mandatory. I'm assuming you're on windows here. VisualSVN Server is free and maybe 4 clicks to install. TortoiseSVN is also free and it integrates into Windows Explorer, getting around the VS Express limitations of no add-ins. If you create too many bugs, you can revert your code and start over. Without source control, this is next to impossible. Plus you can sync your code if you have a laptop and a desktop.
People are going to recommend many techniques like unit testing, mocking, Inversion of Control, Test Driven Development, etc. These are great practices, but don't try to cram it all into your head too quickly. You have to write code to get better at writing code, so work these techniques slowly into your code writing. You have to crawl before you walk and walk before you can run.
Best of luck in your coding adventures!
This is a common newbie thing. As you get more experience, of course, you'll still have bugs, but they'll be easier to find and fix because you'll learn how to make your code more modular (so that changing one thing doesn't have ripple effects everywhere else), how to test it, and how to structure it to fail fast, close to the source of the problem, rather than in some arbitrary place. One very basic but useful thing that doesn't require complex infrastructure to implement is to check the inputs to all functions that have non-trivial precondtions with asserts. This has saved me several times in cases where I would have otherwise gotten weird segfaults and arbitrary behavior that would have been near impossible to debug.
If bugs weren't a problem then I'd be able to write a 100,000 line program in 10 minutes!
Your question is like, "As an amateur doctor, I worry about my patients' health: sometimes when I'm not careful enough, they sicken. Is patients' health a problem for you professional doctors too?"
Yes: it's the central problem, even the only problem (for any sufficiently all-inclusive definition of 'bug').
Bugs are common to everyone -- professional or not.
The larger and more distributed the project, the more careful one must be. One look at any open source bug database (ex: https://bugzilla.mozilla.org/ ) will confirm this for you.
The software industry has evolved various programming styles and standards, which when used right, make wrong code easier to spot or limited in its impact.
Therefore, training has a very positive on code quality... But at the end of the day, bugs still sneak through.
If you're just a hobbyist programmer, learning full bore TDD and OOP may involve more time than you're willing to put in. So, going on the assumption that you don't want to put in the time on them, a few easily digestible suggestions to cut down on bugs are:
Keep each function doing one thing. Be suspect of a function more than, say, 10 lines long. If you think you can break it into two functions, you probably should. Something that will help you control this is naming your functions according to exactly what they are doing. If you find that your names are long and unwieldy then you function is probably doing too many things.
Turn magic strings into constants. That is, instead of using:
people["mom"]
use instead
var mom = "mom";
people[mom]
Design your functions to either do something (command) or get something (query), but not both.
An extremely short and digestible take on OOP is here http://www.holub.com/publications/notes_and_slides/Everything.You.Know.is.Wrong.pdf. If you get this, you've got the gist of OOP and are quite frankly ahead of a lot of professional programmers.
The prevailing wisdom seems to be that the average programmer creates 12 bugs per 1000 lines of code - depends on who you ask for the exact number, but it's always per lines of code - so, the bigger the program, the more the bugs.
Subpar programmers tend to create way more bugs.
Newbies are often trapped by idiosyncrasies of the language, and lacking experience tends towards more bugs too. As you go on, you will get better, but never will you create bug-free code... well I still have bugs, even after 30 years, but that could be just me.
Nasty bugs happen to everyone from pros to hobbyists. Really good programmers get asked to track down really nasty bugs. It's part of the job. You'll know you've made it as a software developer when you stare at a nasty bug for two days and in frustration you shout, "Who wrote this crap!?!?" ... only to realize it was you. :-)
Part of the skill of a software developer is the ability to keep a large set of interrelated items straight in his/her head. It sounds like you're discovering what happens when your mental model of the system breaks down. With practice you will learn to design software that doesn't feel so brittle. There are tons of books, blogs, etc. out there on the subject of software design. And Stack Overflow of course for specific questions.
All that said, here's a couple of things you can do:
A good debugger is invaluable. Often you have to step through your code line by line to figure out what went wrong.
Use a garbage-collected language such as Python or Java if it makes sense for your project. GC will help you focus on making things work instead of getting bogged down by maddening memory errors.
If you write C++, learn to love RAII.
Write LOTS of code. Software is somewhat of an art form. Lots of practice will make you better at it.
Welcome to Stack Overflow!
What really changed my odds against code complexity and bugs was using a coding standart - how to place brackets an so on. It may seem like just boring and useless thing but it really unifies all the code and makes it much easier to read and maintain. So do you use a coding standart?
If you're not well organized, your codebase will become your very own Zebra Puzzle. Adding more code is like adding more people/animals/houses to your puzzle, and soon you have 150 various animals, people, houses and cigarette brands in your puzzle and you realize that it just took you a week to add 3 lines of code because everything is so inter-related that it takes forever to make sure the code still executes how you want it to.
The most popular organizational paradigm seems to be Object Oriented Programming, if you can break your logic down into small units which can be constructed and used independently of each other, then you will find bugs far less painful when they occur.

When is it good (if ever) to scrap production code and start over? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was asked to do a code review and report on the feasibility of adding a new feature to one of our new products, one that I haven't personally worked on until now. I know it's easy to nitpick someone else's code, but I'd say it's in bad shape (while trying to be as objective as possible). Some highlights from my code review:
Abuse of threads: QueueUserWorkItem and threads in general are used a lot, and Thread-pool delegates have uninformative names such as PoolStart and PoolStart2. There is also a lack of proper synchronization between threads, in particular accessing UI objects on threads other than the UI thread.
Magic numbers and magic strings: Some Const's and Enum's are defined in the code, but much of the code relies on literal values.
Global variables: Many variables are declared global and may or may not be initialized depending on what code paths get followed and what order things occur in. This gets very confusing when the code is also jumping around between threads.
Compiler warnings: The main solution file contains 500+ warnings, and the total number is unknown to me. I got a warning from Visual Studio that it couldn't display any more warnings.
Half-finished classes: The code was worked on and added to here and there, and I think this led to people forgetting what they had done before, so there are a few seemingly half-finished classes and empty stubs.
Not Invented Here: The product duplicates functionality that already exists in common libraries used by other products, such as data access helpers, error logging helpers, and user interface helpers.
Separation of concerns: I think someone was holding the book upside down when they read about the typical "UI -> business layer -> data access layer" 3-tier architecture. In this codebase, the UI layer directly accesses the database, because the business layer is partially implemented but mostly ignored due to not being fleshed out fully enough, and the data access layer controls the UI layer. Most of the low-level database and network methods operate on a global reference to the main form, and directly show, hide, and modify the form. Where the rather thin business layer is actually used, it also tends to control the UI directly. Most of this lower-level code also uses MessageBox.Show to display error messages when an exception occurs, and most swallow the original exception. This of course makes it a bit more complicated to start writing units tests to verify the functionality of the program before attempting to refactor it.
I'm just scratching the surface here, but my question is simple enough: Would it make more sense to take the time to refactor the existing codebase, focusing on one issue at a time, or would you consider rewriting the entire thing from scratch?
EDIT: To clarify a bit, we do have the original requirements for the project, which is why starting over could be an option. Another way to phrase my question is: Can code ever reach a point where the cost of maintaining it would become greater than the cost of dumping it and starting over?
Without any offense intended, the decision to rewrite a codebase from scratch is a common, and serious management mistake newbie software developers make.
There are many disadvantages to be wary of.
Rewrites stop new features from being developed cold for months/years. Few, if any companies can afford to stand-still for this long.
Most development schedules are difficult to nail. This rewrite will be no exception. Amplify the previous point by, now, a delay in development.
Bugs that were fixed in the existing codebase through painful experience will be re-introduced. Joel Spolsky has more examples in this article.
Danger of falling victim to the Second-system effect -- in summary, ``People who have designed something only once before try to do all the things they "didn't get to do last time", loading the project up with all the things they put off while making version one, even if most of them should be put off in version two as well.''
Once this expensive, burdensome rewrite is completed, the very next team to inherit the new codebase is likely to use the same excuses for doing another rewrite. Programmers hate learning someone else's code. No one writes perfect code because perfection is so subjective. Find me any real-world application and I can give you a damning indictment and rationale for doing a from-scratch rewrite.
Whether you ultimately rewrite from scratch or not, beginning a refactoring phase now is a good way to both really sit down and understand the problem so that the rewrite will go more smoothly if truly called for, as well as giving the existing codebase an honest look to really see if a rewrite's needed.
To actually scrap and start over?
When the current code doesn't do what you would like it to do, and would be cost prohibitive to change.
I'm sure someone will now link Joel's article about Netscape throwing their code away and how it's oh-so-terrible and a huge mistake. I don't want to talk about it in detail, but if you do link that article, before you do so, consider this: the IE engine, the engine that allowed MS to release IE 4, 5, 5.5, and 6 in quick succession, the IE engine that totally destroyed Netscape... it was new. Trident was a new engine after they threw away the IE 3 engine because it didn't provide a suitable basis for their future development work. MS did that which Joel says you must never do, and it is because MS did so that they had a browser that allowed them to completely eclipse Netscape. So please... just meditate on that thought for a moment before you link Joel and say "oh you should never do it, it's a terrible idea".
A rule of thumb I've found useful is that if given a code base, if I have to re-write more than 25% of the code to make it work or modify it based upon new requirements, you may as well re-write it from scratch.
The reasoning is that you can only patch a body of code so far; beyond a certain point, it's quicker to do over.
There's an underlying assumption that you have a mechanism (such as thorough unit and/or system tests) that will tell you whether your re-written version is functionally equivalent (where it needs to be) as the original.
If it requires more time to read and understand the code (if that is even possible)
than it would to rewrite the entire application, I say scrap it and start over.
Be very carefull with this:
Are you sure you aren't just being lazy and not bothering to read the code
Are you being arrogant about the great code you will write compared to the rubbish anyone else produced.
Remember tested-working code is worth a lot more than imaginary yet-to-be-written code
In the words of our estemed host and overlord, Joel - things you should never do,
it's not always wrong to abandon working code - but you have to be sure about the reason.
I saw an application re-architected within 2 years of its introduction into production, and others rewritten in different technologies (one was C++ - now Java). Both efforts were were not, to my mind, successful.
I prefer a more evolutionary approach to bad software. If you can "componentize" your old app such that you can introduce your new requirements and interface with the old code, you can ease yourself into the new environment without having to "sell" the zero-value (from a biz perspective) investment in rewriting.
Suggested approach - write unit tests for the functionality with which you wish to interface to 1) ensure the code behaves as you expect and 2) provide a safety net for any refactoring that you may wish to do on the old base.
Bad code is the norm. I think IT gets a bad rap from business for favoring rewrites/rearchitecting/etc. They pay the money and "trust" us (as an industry) to deliver solid, extensible code. Sadly, business pressures frequently result in shortcuts that make the code unmaintainable. Sometimes it's bad programmers... sometimes bad situations.
To answer your rephrased question... can code maintenance costs ever exceed rewriting costs... the answer is clearly yes. I don't see anything in your examples, however, that lead me to believe this is your case. I think those issues can be addressed with tests and refactoring.
In terms of business value, I would think it's extremely rare that a real case can be made for a rewrite due solely to the internal state of the code. If the product's customer-facing and is currently live and bringing in money (i.e. is not a mothballed or unreleased product), then consider that:
You already have customers using it. They're familiar with it, and might have built some of their own assets around it. (Other systems that interface to it; products based on it; processes they'd have to change; staff they'd maybe have to retrain). All of this costs the customer money.
Re-writing it might cost less in the long term than making difficult changes and fixes. But you can't quantify that yet, unless your app is no more complex than Hello World. And a re-write means a re-test and a redeploy, and probably an upgrade path for your customers.
Who says the re-write will be any better? Can you honestly say your firm is writing sparkly code now? Have the practices that turned the original code to spaghetti been corrected? (Even if the main culprit was a single developer, where were his peers and management, ensuring quality through reviews, testing, etc.?)
In terms of technical reasons, I'd suggest it could be time for a major rewrite if the original has some technical dependencies that have become problematic. e.g. a third party dependency that's now out of support, etc.
In general though, I think the most sensible move is to refactor piece by piece (very small pieces if it's really that bad), and improve the internal architecture incrementally rather than in one big drop.
Two threads of thought on this one: Do you have the original requirements? Do you have confidence that the original requirements are accurate? What about test plans or unit tests? If you have those things in place it might be easier.
Putting on my customer hat, does the system work or is it unstable? If you've got something that's unstable you've got an argument to change; otherwise you're best of refactoring it bit by bit.
I think the line in the sand is when basic maintenance is taking 25% - 50% longer than it should. There comes a time when maintaining legacy code becomes too costly. A number of factors contribute to the final decision. Time and cost being the most important factors I think.
If there are clean interfaces and you can cleanly delineate module boundaries, then it might be worth refactoring it module by module or layer by layer in order to allow you to migrate existing customers forward into cleaner more stable codebases, and over time, after you've refactored every module, you will have rewritten everything.
But, based on the codereview, doesn't sound like there would be any clean boundaries.
I wonder if the people who vote for scrapping and starting over have ever successfully refactored a large project, or at least seen a large project in poor condition that they think could use a refactoring?
If anything, I err on the opposite side: I've seen 4 large projects that were a mess, that I advocated refactoring as opposed to rewriting. On a couple, there was barely a single line of original code that remained, and major interfaces changed in significant ways, but the process never involved the entire project failing to function as well as it originally did, for any more than a week. (And top-of-trunk was never broken).
Perhaps a project exists that is so severely broken that to attempt to refactor it would be doomed to failure, or perhaps one of the previous projects I refactored would have been better served by a "clean re-write", but I'm not sure I'd know how to recognize it.
I agree with Martin. You really need to weigh the effort that will be involved in writing the app from scratch against the current state of the app and how many people use it, do they like it, etc. Often we may want to completely start from scratch, but the cost far outweighs the benefit. I come across bits of ugly looking code all the time, but I soon realize that some of these 'ugly' areas are really bug fixes and make the program work correctly.
I would try to consider the architecture of the system and see whether it is possible to scrap and rewrite specific well defined components without starting everything from scratch.
What would usually happen is that you can either do that (and then sell that to the customer/management), or that you find out that the code is such a horrible and tangled mess that you become even more convinced that you need a rewrite and have more convincing arguments for it (including: "if we engineer it right, we would never need to scrap the whole thing and do a third rewrite).
Slow maintenance would eventually cause that architectural drift that would make a rewrite more expensive later.
Scrap old code early and often. When in doubt, throw it out. The hard part is convincing non-technical folks of the cost-to-maintain.
So long as the value derived appears to be greater than the cost to operate and maintain, there's still positive value flowing from the software. The question surrounding a rewrite this: "will we get even more value from a rewrite?" Or alternatively "How much more value will we get from a rewrite?" How many person-hours of maintenance will you save?
Remember, the rewrite investment is once only. The return on the rewrite investment lasts forever. Forever.
Focus the value question down to specific issues. You listed a bunch of them above. Stick with that.
"Will we get more value by reducing cost through
dropping the junk that we don't use
but still have to wade through?"
"Will we get more value from dropping the junk that's unreliable and breaks?"
"Will we get more value if we understand it -- not by documenting, but by replacing with something we built as a team?"
Do you homework. You'll have to confront the following show-stoppers.
These will originate somewhere in your executive foodchain from someone who'll respond as follows:
"Is it broken?" And when you say "It's not crashed as such," They'll say "It's not broke - don't fix it."
"You've done the code analysis, you understand it, you no longer need to fix it."
What's your answer to them?
That's only the first hurdle. Here's the worst possible situation. This doesn't always happen, but it does happen with alarming frequency.
Someone in your executive foodchain will have this thought:
"A rewrite doesn't create enough value. Rather than simply rewrite, let's expand it." The justification is that by creating enough value, users are more likely to buy in to the rewrite.
A project where scope is expanded -- artificially -- to add value is usually doomed.
Instead, do the smallest rewrite you can to replace the darn thing. Then expand to fit real needs and add value.
You can only give a definite yes to rewriting in case if you know completely how your application works (and by completely I mean it, not just having a general idea of how it should work) and you know more or less exactly how to make it better. Any other cases and it's a shot in the dark, it depends on too much things. Perhaps gradual refactoring would be safer if it is possible.
If possible, I typically would prefer to rewrite smaller portions of the code over time when I need to refactor a baseline. There are typically many smaller issues such as magic number, poor commenting, etc. that tend to make the code look worse than it actually is. So, unless the baseline is just awful, keep the code and just make improvements at the same time you are maintaining the code.
If refactoring requires a lot of work, I recommend laying out a small re-design plan/todo list that gives you a list of things to work on in order so that you can bring the baseline to a better state. Starting from scratch is always a risky move and you are not guaranteed that the code will be better when you are finished. Using this technique, you will always have a working system that improves over time.
Code with excessively high cyclomatic complexity (like over 100 in a large number of modules) is a good clue. Also, how many bugs does it have / KLOC? How critical are the bugs? How often are bugs introduced when bug fixes are made. If your answer is a lot (I cant remember norms right now), then a rewrite is warranted.
As early as possible. Whenever you get a premonition that your code is slowly turning into an ugly beast that is very likely to consume your soul and give you headaches, and you know the problem is in the underlying structure of the code (so any fix would be a hack, e.g. introduce a global variable), then it's time to start over.
For some reasons people don't like throwing away precious code, but if you feel your better off starting over, you are probably right. Trust your instinct and remember that it wasn't a waste of time, it taught you one more way of NOT approaching the problem. You could (should) always use a version control system so your baby is never really lost.
I do not have any experience with using metrics for this myself, but the
article
"Software Maintainability Metrics Models in Practice" discusses
more or less the same question asked here for two case studies they did.
It starts with the following editor's note:
In the past, when a maintainer
received new code to maintain, the
rule-of-thumb was "If you have to
change more than 40 percent of someone
else's code, you throw it out and
start over." The Maintainability Index
[MI] addressed here gives a much more
quantifiable method to determine when
to "throw it out and start over." This
work was sponsored by the U.S. Air
Force Information Warfare Center and
the U.S. Department of Energy [DOE],
Idaho Field Office, DOE Contract No.
DE-AC07-94ID13223.)
I think the rule was...
The first version is always a throw away
So, if you learned your lesson(s), or his/her lessons, then you can go ahead and write it fresh now that you understand your problem domain better.
Not that there aren't parts that can/should be kept. Tested code is the most valuable code, so if it isn't deficient in any real way other than style, no reason to toss it all out.
When is it good (if ever) to scrap production code and start over?
Never had to do this, but logic would dictate (to me, anyway) that once you pass the inflection point where you're spending more time reworking and fixing bugs in the existing code base than you are adding new functionality, it's time to trash the old stuff and get a fresh start.
If it requires more time to read and understand the code (if that is even possible) than it would to rewrite the entire application, I say scrap it and start over.
I have never completely thrown out code. Even when going from a foxpro system to a c# system.
If the old system worked then why just throw it out?
I have come across a few really bad system. Threads being used where not needed. Horrible inheritance and abuse of interfaces.
It is best to understand what the old code is doing and why it is doing it. Then change it so that it is not confusing.
Of course if the old code doesn't work. I mean can't even compile. Then you might be justified in just starting over. But how often does that actually happen?
Yes, it totally can happen. I've seen money be saved by doing it.
This is not a tech decision, it's a business decision. Code rewrites are long term gains, while "if it ain't totally broke..." is a short term gain. If you are in a first year startup that is focused on getting a product out the door, the answer is usually to just live with it. If you're in an established company, or the errors with the current systems are causing more workload, therefor more company money.. then they might go for it.
Present the problem as best as you can to your GM, use dollar values where you can. "I don't like dealing with it" means nothing. "It'll take twice the time to do everything until this is fixed" means a lot.
I think there are a number of issues here that depend largely on where you are at.
Is the software working well from a customer perspective? (If yes be very careful about changes). I would think there would be little point re-witting unless you were expanding the feature set if the system was working. And are you planning to expand the features and customer base of the software? If so then you have much more reason to change.
As much as anything just trying to understand some else's code even if well written can be difficult, when badly written I would imagine almost impossible. What you describe sounds like something that would be very difficult to expand.
I would take into consideration if the application does what it is intended to do, is required for you to ever make modifications, and are you confident that the app has been thoroughly tested in all scenarios that it will be used in.
Do not invest the time if the app does not need alterations. However, if it doesn't function as you need and you need to control the hours and time invested to make corrections, scrap it and re-write to the standards that your team can support. There's nothing worse than terrible code that you have to support / decipher but still have to live with. Remember, Murphy's Law says it will 10 at night when you'll have to make things work, and that is never productive.
Production code always has some value. The only case where I would truly throw it all out and start again is if we determine the intellectual property is irrevocably contaminated. For example if someone brought large amounts of code from a previous employer, or a large percentage of the code was ripped from a GPLd codebase.
I'm going to post this book every time I see a discussion on Refactoring. Everyone should read "Working Effectively with Legacy Code" by Michael Feathers. I found it to be an excellent book - if nothing else, it's a fun read, and motivational.
When the code has reached a point that is not maintainable or extensible anymore. Is full of short-term hacky fixes. It has lots of coupling. It has long (100+lines) methods. It has database access in the UI. It generates a lot of random, impossible to debug errors.
Bottom line: When maintaining it is more expensive (i.e. takes longer) than rewriting it.
I used to believe in just re-write from scratch, but it is wrong.
http://www.joelonsoftware.com/articles/fog0000000069.html
Changed my mind.
What I would suggested is figuring out a way to properly refactor the code. Keep all existing functionality and test as you go. We have all seen horrible code bases, but it is important to keep the knowledge over time you application has.

Resources