Metrics for programmer self evaluation - metrics

I'm programming from home and I want to know whether I'm more or less productive programming at 10 AM than I am when I'm programming at 8PM.
What metrics should I use to determine an answer to the question?

Ignoring the debate in the question's comments, a bunch of arbitrary productivity-ish metrics you could measure...
lines of code written
user stories/tasks completed
bugs fixed
tests written
tests passing first time
bugs found
code churn vs new code (i.e. "right first time" vs "rewritten repeatedly")
%age of time in IDE vs debugging
%age of time in IDE vs non-work applications
code quality (using another similarly arbitrary measure like FxCop compliance or cyclic complexity)
code performance (against some arbitrary or customer-specified benchmark)
The best metrics tend to be combinations - say, "average of bugs found per line of code written" - rather than a single measure. Still, these are all subjective and innacurate.
I'd suggest the best thing to do is decide what your goal is when you're programming. Is it to produce high-quality code, or super-performant realtime code, or mission-critical-must-be-bug-free code, or do you just need to ship something that works in the shortest time? Until you've defined "productive", it's hard to suggest what would be a meaningful measurement.

I don't know if there is some established method for measuring productivity in programmers but assuming alertness and focus has a direct impact on productivity, I suppose you could set yourself some kind of mental arithmetic test with randomised questions and answers and take it at regular intervals.
It's a tricky one because you can't measure by lines, or problems solved (because they vary in scale and difficulty.) In fact, this article suggests that when attempting to measure programmer productivity, there is almost no correllation between the time it takes to complete a task and the quality of the finished product.

Related

What is Best for Defect Rate Tracking? Defects per KLOC?

I'm trying to create some internal metrics to demonstrate (determine?) how well TDD improves defect rates in code.
Is there a better way than defects/KLOC? What about a language's 'functional density'?
Any comments or suggestions would be helpful.
Thanks - Jonathan
You may also consider mapping defect discovery rate and defect resolution rates... how long does it take to find bugs, and once they're found, how long do they take to fix? To my knowledge, TDD is supposed to improve on fix times because it makes defects known earlier... right?
Any measure is an arbitrary comparison of defects to code size; so long as the comparison is similar, it should work. E.g., defects/kloc in C to defects/kloc in C. If you changed languages, it would affect the metric in any case, since the same program in another language might be less defect-prone.
Measuring defects isn't an easy thing. One would like to account for the complexity of the code, but that is incredibly messy and unpleasant. When measuring code quality I recommend:
Measure the current state (what is your defect rate now)
Make a change (peer reviews, training, code guidelines, etc)
Measure the new defect rate (Have things improved?)
Goto 2
If you are going to compare coders make sure you compare coders doing similar work in the same language. Don't compare the coder who works in the deep internals of your most complex calculation engine to the coder who writes the code that stores stuff in the database.
I try to make sure that coders know that the process is being measured not the coders. This helps to improve the quality of the metrics.
I suggest to use the ratio between the times :
the time spend fixing bugs
the time spend writing other codes
This seem valid across languages...
It also works if you only have a rough estimation of some big code base. You can still compare it to the new code you are writing, to impress you management ;-)
I'm skeptical of all LOC-related measurements, not just because of different relative expressiveness of languages, but because individual programmers will vary enough in the expressiveness of their code as to make this metric "fuzzy" at best.
The things I would measure in the interests of project management are:
Number of open defects on the project. There's no single scalar that can tell you where the project is and how close it is to a releasable state, but this is still a handy number to have on hand and watch over time.
Defect detection rate. This is not the rate of introduction of new defects into the system, but it's probably the closest proxy you'll find.
Defect resolution rate. If this is less than the detection rate, you're falling behind - if it's greater, you're getting ahead.
All of these numbers are more useful if you combine them with severity information. A product with 20 minor bugs may well closer to release than one with 2 crashing bugs. If you're clearing the minor bugs but not the severe ones, you have to get the developers to refocus their attention.
I would track these numbers per project and per developer. The reason for doing them per project should be clear. The per-developer numbers are certainly not the whole picture of an individual contributor's skill or productivity, but can point you to people who might need training or remediation.
You may also wish to tag all the tickets in your defect tracking system by project module as well (especially for larger projects), so that you can tell when critical modules are in a fragile state.
Why dont you consider defects per use case ? or defects per requirement. We have faced practical issues in arriving at the KLOC.

Evaluating software estimates: sure signs of unrealistic figures?

Whilst answering “Dealing with awful estimates” posted by Ash I shared a few tips that I learned and personally use to spot weak estimates. But I am certain there must be many more!
What heuristics to use in the scenario when one needs to make a quick evaluation of software project estimate that has been compiled by a third-party (a colleague, a business partner or an external company)?
What are the obvious and not so obvious signs of weak software estimates that can be spotted without much detailed knowledge of task at hand?
A single person having done the estimates, rather than having used consensus based estimation (to fully understand the implied scope of requirements) such as Wideband Delphi.
Especially true if the person doing the estimation is not the person doing the implementation!! - I once worked on a project estimated by someone else as 60 days before any requirements had even been given. Lets just say I was not a happy bunny
No time for documentation.
No time for ramp-up (in terms of learning, and team size).
No list of risks, and their impact to the timescale.
No buffer for the unexpected - in terms of late breaking requirements, and risks.
No one has said it, so I will. The obvious answer is that if you have software schedule estimates then that is a sure sign of unrealistic figures. Yes, there are many methods for estimating software but none of them are accurate in any way, shape or form. What usually happens is that deadlines are set. If the task is over-estimated then extra time is spent making the result better. If the task is under-estimated then something is sacrificed to meet the delivery (like testing and features).
I know this answer isn’t what people want to believe, but estimating is always a guess. More often than not, a developer can’t even predict how much they will accomplish by the end of the day. You are expecting them to guess things months/years down the road on something that they aren’t even sure what is really involved yet.
The only practical answer to your question that isn’t prone to giving unrealistic results would be using a worksheet that comes up with guesses based on previous history at your company. Unfortunately, that will not account for tasks the estimator missed. At least this may give ballpark numbers.
Unless you develop knock offs of the same exact system over and over again, then anyone who thinks they have figured this out is fooling themselves. There are way too many variables involved.
There are two types of estimates: task estimates and project estimates. You can view these as the big and small pictures.
Project estimates are necessarily high level (granularity no smaller than days typically) and must include things like:
High level architecture;
Time for testing;
Ramp up times;
Defect processes;
Time for documentation;
Relevant training;
Assumptions;
Dependencies (eg team A can't do what they need to until team B delivers phase 1);
Critical path (which pieces are likely to determine if the project slips and by how much); and
Risks.
The more of those things that are missing, the more unrealistic (or risky) the estimates.
The second kind of a task estimate, which is typically much lower level. For this kind of estimate it should be simply a task breakdown (with no task being larger than say 5 days).
These don't tend to address the above items but some of them might be relevant, such as assumptions regarding decisions not made yet (eg production hardware). It may also be worth identifying who can and can't do the tasks due to relevant experience, background knowledge or skills (as that person or those persons may end up overcommitted).
Other posts have mentioned the testing time should equal or exceed dev time. I strongly disagree with this. I've seen 8 hour dev tasks result in 100+ hours test time and 80 hour dev tasks result in less than 2 hours of testing. In both cases the testing time was entirely reasonable. The is no absolute correlation between the two. At best, there is a loose connection.
Is the estimate what the management
wanted to be told?
Does the
estimate nicely fit in with the
planned shipment date for the next
release?
Does the management reward
people that give good news more then
people the give bad news?
Was the
estimate prepared before knowing who
would be working on the project?
Did
someone that wanted that bit of
functionally implemented prepare the
estimate?
Is there a history of
software being late?
Is it normal for
developers to be moved onto other
tasks partway though a project?
Have
some or all developers given up on
commenting on bad estimates as a
waste of time?
Count up the number of questions you get “yes” or “maybe” answers.…
If you get mostly “no” answers to the above questions, then it may be worth looking at the estimate in detail to see if it includes the tasks that other people of listed in this thread.
Wow... I really like toolkit's answer.
And I agree that any estimate at all is flawed, because it assumes that the estimator has way more of a clue for how to solve the problem than any estimator actually does when a project gets estimated. However, I think you still need to at least estimate the size of the mountain before you start. Some thought as to whether it's worth trying to do it should precede any endeavor and that's what the essence of an estimate should be.
I did come up with a few more indicators of a dangerous estimate:
No cross-reference - If the estimate can't be validated at least two different ways, it's likely to be unreliable. For example, the last estimates I've done I've been able to break down the work into small feature chunks, and consider how long it's taken our team to do similarly scoped features. Then I was able to look at the sum of these costs and see how big the scope was relative to other projects I've worked on. That's two ways to validate.
The background of the estimator - if this is a software estimate done by a hardware guy who's never written code - be very afraid. More subtle - the closer the estimator's background is to the technology and problem domain of the project, the better.
Detail - as said a few different ways in a few different posts - I like to see detail for both individual features, as well as the tasks needed to complete each feature. Most estimates I see don't show the detail in the formal presentation, but if you ask the person who did the estimate, they should have a file somewhere. Hopefully it's not the back of a paper napkin stained with beer and ketchup. :)
Documented Assumptions - any estimator will have had to make some set of assumptions about the task. These should be documented somewhere, perferably in the formal paperwork. I always get a little worried when I see a short proposal with not many documented assumptions. Either they were thought through and not communicated to the customer, or they were not thought through. I'm not honestly sure which one is worse. It goes without saying that the assumptions should also be reasonable.
Balanced Lifecycle - However the task is broken down, what's the ratio of design, code and test? How about documentation, integration with external systems and post release support? How about those extra things that are so vital (system admins, CM gurus, management effort)?
Slack - I'm sure the corporate daemons of cheapness will come and flay me, but a schedule and a cost should have some slack. If the estimate looks ambitious and agressive to an experienced eye, it is likely to be too low. Estimates are almost never too high.
One good heuristic is to see if test time is roughly the same a development time. That's a good sign for the estimate.
If they can't give you a breakdown of the estimate then that's a bad thing. Usually a sign of lots of little things that may have been forgotten. They don't need to provide the complete original breakdown, just a breakdown like:
requirements
development
testing
packaging and deployment
etc.
They should be using a standard template to calculate their estimate. They don't need a number in every column, but they do the template to list all possible tasks. That way the template can be used to jog peoples's minds when working on the estimate.
If the estimate is overly precise, e.g. 0.25 hour increments, then that, for me, is a bad smell.
If there are things missing like requirements capture, testing, deployment, and handover to any Ops group? If any of those are missing, that's the sort of thing that will come back and bite you.
Edit: One other thing to watch for is the old "perpetually 90% complete" tasks. You get progress update after progress update listing a task as "90% complete". That's not good!
HTH
cheers
Is the compiler of the
estimate available and willing to
discuss it with other senior project
members? If not, that is often a
concern.
Was the estimate sent to the
customer before the experience and
skills of the development staff are
known. Two point estimates may help
but only to some extent.
Before even getting a chance to look at the estimate, you are told that you are committed to delivering all of the functionality described by a specific date.
(Thanks for responding to my question, by the way.)
If you see one or more of these, you may have a bad estimate:
Single point estimates: an estimate should be associated with a range of possible dates or a confidence value
Insufficient granularity of tasks: a large task bucket usually indicates that the functionality is not well understood (which is especially a problem since poorly understood problems are usually under-estimated)
No expression of assumptions and/or risks
Inadequate effort allocated for commonly skipped or underestimated items (e.g. build scripts, documentation, deployment, etc.)
I agree with sateesh, I really like Software Estimation: Demystifying the Black Art by Steve McConnell. He has several checklists which are useful when reviewing and/or preparing estimates.
I totally agree with Dunk, the first sign of bad estimates is the mere presence of a large detailed upfront schedule. Estimates are exactly that, an approximation, otherwise we would call them exactimates. So they should never be used alone in the management of a project.
The most important point to consider is not the accuracy of estimates but the consistency. If a third party were doing estimates for you, then ask to see a history of their successes or failures, speak with their past clients and determine their reliability.
That all being said, from an Agile standpoint, some of the ways we attempt to gain more consistent estimates during a project are;
Use a relative sizing standard (S,M,L,XL) rather than time based (ideal days).
focus on complexity not time
Always use group estimates not single person estimates
Gather estimates frequently throughout the project, generally at the start of each sprint
use feedback from previous sprints in determination of story complexity
track velocity to give meaning to the relative sizing
frequent and early story retrospection to examine/understand any thrashing
If you are dealing with companies that use these estimation methods then, chances are you are going to receive consistent and therefore better results.
Estimates of the form 3, 6, or 12 months (basically any round numbers) reek of guessing. Usually when you guess you pick some round number bigger than you think it will take -- quarters, half a year, etc. -- are the usual suspects. I much prefer estimates in terms of actual development iterations (whatever their size).
What are the obvious and not so
obvious signs of weak software
estimates that can be spotted without
much detailed knowledge of task at
hand?
Estimates which are given without much detailed knowledge of the task at hand are generally not good.
Perhaps a general approach you could take is to check that items in the requirements correspond to those in the estimate. If you want to be very quick check the relative sizes, if there is a 100 word estimate given to a 100,000 word brief it stands no chance of being right.
Also (as others have said) check that analysis, coding, debugging, testing, integration, contingency etc are mentioned. It shows some thought has gone into it.
Having success and sign off criteria at various stages is a great sign. If they have a defined point which is 10% done at least if the estimate is wrong you know early and have a chance to adapt. If there are no checkpoints until “finish” you may not know that you are behind until that date is hit.
How familar is the person giving the estimate with the people doing the work?
I have often seen estimates where there is a generic person doing the work, even though the team is made up of individuals with very different backgrounds. Most likely the tasks and the specialities don't line up perfectly and you get a c++ serverside programmer who ends up doing either your gui or your database... Sometimes the manager of the team doesn't really appreciate the team member's strengths, so if he has been asked to come up with the estimate on his own because his team is busy on the previous project you will find that the work in question is really only suitable for part of the team (not motivating, lack of skills etc)
One other helpful way to evaluate the estimates is to compare it with the actual effort that was spent on previous projects of similar kind. The best data for the comparison would be the effort data of the previous projects that the organization has done. If there is no organizational historical data you can try to benchmark the estimate against industry wide benchmarks.
I would also say if the estimate is presented as single absolute number (say 180 days) then it is not a good sign. A single absolute number would indicate that the estimate is that the task will be finished with 100% probability on the given data. The estimates presented in a range (say 130 to 180 days) would indicate that the range in which task could be completed.
Much of what I have written above I attribute it to the book :
Software Estimation: Demystifying the Black Art
by Steve McConnell
I check the estimates against the man-power. Although not a very accurate heuristic, it's clear if some massive work has just one or two devs assigned to it, that the task was not estimated correctly
A good estimate will have a good breakdown, involving all phases of the project.
It will almost certainly not finish at a convenient date for the business.
It will include risks of various sorts.
It will be presented in terms of confidence intervals, either implicitly (10-12 months) or by using large units (about four quarters).
It will be made by somebody with responsibility for getting the project done, preferably more than one such person.
If there are delays at the start, there will be delays at the end (expressed as 10-12 months from start, or about 1Q2010 if we start now, not something like January 2010 when the project hasn't started yet).
Assumptions and dependencies will be clearly and prominently listed.
Edit: Part of this depends on the stage the project is in. An early but precise estimate is a warning sign, particularly if there is no confidence interval assigned. That reeks of a Procrustean estimate.
Also, watch for other development methodologies. A timeboxed project can have a rigid and arbitrary schedule, but the feature set will be flexible.
Any of the following:
It is one big project and there isn't a short high level strategy described
There isn't a clear, short and concise vision of what wants to be achieve with the project
The project isn't structured around business value being delivered gradually
The team is trying to give "accurate" estimates for a big project, going into (or was done with) a long analysis phase? (changes will come, and will usually affect those estimates in way that can't be easily quantified without yet more big efforts)
There are "detailed" estimates for the whole project (related to previous)
There aren't detailed estimates for the first phase, or there is something wrong in those.
"Four to six weeks", when not accompanied with a breakdown into shorter tasks...

How to create an accurate hour estimate? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
What are your experiences regarding
project planning and creating hour
estimates for new projects?
What is the approach you are using,
and why has or has it not worked
for you?
Are there any best practices to take
into account?
Estimation Tasks
The principles that I try to use (I don't always get the opportunity) are:
Step-wise refinement
3 point estimates
Risk analysis
Step-wise refinement
When estimating it's important to estimate at the right granularity and to continually break down and add tasks until you're confident in the estimates. Quite often, estimating highlights a lengthy, critical path task that may need more refinement and risk analysis.
Risk analysis
Trying to work out where risks lie in each task (are there lead times for something? is there a lack of knowledge? could a competitor beat you to it? etc. etc.) helps to determine your confidence in the estimates, which allows you to determine how to treat those estimates. Risk analysis also helps to determine if further step-wise refinement is required.
3-point estimates
Specifying the best, probable, and worst case estimates for each task (including design, development, testing, and bug-fixing) helps with risk analysis and planning. The estimates can be used to calculate the most likely duration to hit particular percentage success of that task. Together with information on other related tasks, and risk analysis, a project manager can factor the risk, and other known elements like system testing into the estimates to get a more reliable estimate.
Of course, the granularity of estimates is also important. There's no point in estimating in hours for most tasks. In software, days is usually best, but sometimes it could be weeks or months (such as if you are out-sourcing blocks of work). Choose a time granularity that makes sense for all the tasks in a project (I usually use days for the requirements capture and functional specification phases, and half-days thereafter as I learn more about the tasks and their sub-tasks).
Conclusion
All three of these items feed into each other, so quite often you have to refine each step a number of times. For example, you might have a stab at the requirments stage, then again during functional specification, and again during design specification.
Estimation is a learned skill; the more you do, the better you get. Risk analysis improves as you learn more about what you don't know, 3-point estimates improve as you learn more about what you do know, and stepwise refinement improves as you go through each step of a design process.
If you have the time, revisit your original estimates after you've completed a task and see how the actual time stacks up against your 3-point estimates and your project plan. If it differs, see where the time was lost or gained and try to learn what you can take from that for future projects.
Estimation should not be a daunting task - I always feel like I know more about my work after estimation than before.
I highly recommend the book "Software Estimation: Demystifying the Black Art" by Steve McConnell. It really covers this question well.
There is some excellent information about this in The Pragmatic Programmer. They advise that you use appropriate units of time rather than estimating 130 days estimate 6months. They also advise to concentrate tasks that are most crucial. And avoid making estimates based on sub estimates.
Personally I find it is useful to break the task down to understandable chunks to properly estimate them. If the task is large there are too many nooks & crannies that can hide unthought-of problems. By concentrating on the details of smaller chunks, you can evaluate the potential problems more successfully.
Your question is an NP-Complete problem:) There are many algorithms used to come up with an estimate but they are always just guesses, are never accurate, and many take a long time to execute. Forget hour estimates, use scrum or some other agile framework. Making estimates for a project in hours at its start is simply lying to people.
Don't make hour-based estimates until right before you build the feature and update those estimates continually as you progress on the feature.
Don't forget to include time for testing in your estimates.
Practice, practice, practice. To be safe, overestimate as you refine your estimating abilities. Of course if you're a consultant, this can cost you business. If you're afraid of losing business, under estimate, but be aware you'll be making up the extra hours out of your free time/bottom line.
RE:
If you're afraid of losing business, under estimate, but be aware you'll be making up the extra hours out of your free time/bottom line.
you are better off reducing your hourly rate rather then messing with the hours you present to the client. at least this way, you present the appearance of added value to your client.
LM
Log the time spent in your actual projects and that will help you plan to for the next one, PSP/TSP offer a way to do it

One could use a profiler, but why not just halt the program? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
If something is making a single-thread program take, say, 10 times as long as it should, you could run a profiler on it. You could also just halt it with a "pause" button, and you'll see exactly what it's doing.
Even if it's only 10% slower than it should be, if you halt it more times, before long you'll see it repeatedly doing the unnecessary thing. Usually the problem is a function call somewhere in the middle of the stack that isn't really needed. This doesn't measure the problem, but it sure does find it.
Edit: The objections mostly assume that you only take 1 sample. If you're serious, take 10. Any line of code causing some percentage of wastage, like 40%, will appear on the stack on that fraction of samples, on average. Bottlenecks (in single-thread code) can't hide from it.
EDIT: To show what I mean, many objections are of the form "there aren't enough samples, so what you see could be entirely spurious" - vague ideas about chance. But if something of any recognizable description, not just being in a routine or the routine being active, is in effect for 30% of the time, then the probability of seeing it on any given sample is 30%.
Then suppose only 10 samples are taken. The number of times the problem will be seen in 10 samples follows a binomial distribution, and the probability of seeing it 0 times is .028. The probability of seeing it 1 time is .121. For 2 times, the probability is .233, and for 3 times it is .267, after which it falls off. Since the probability of seeing it less than two times is .028 + .121 = .139, that means the probability of seeing it two or more times is 1 - .139 = .861. The general rule is if you see something you could fix on two or more samples, it is worth fixing.
In this case, the chance of seeing it in 10 samples is 86%. If you're in the 14% who don't see it, just take more samples until you do. (If the number of samples is increased to 20, the chance of seeing it two or more times increases to more than 99%.) So it hasn't been precisely measured, but it has been precisely found, and it's important to understand that it could easily be something that a profiler could not actually find, such as something involving the state of the data, not the program counter.
On Java servers it's always been a neat trick to do 2-3 quick Ctrl-Breakss in a row and get 2-3 threaddumps of all running threads. Simply looking at where all the threads "are" may extremely quickly pinpoint where your performance problems are.
This technique can reveal more performance problems in 2 minutes than any other technique I know of.
Because sometimes it works, and sometimes it gives you completely wrong answers. A profiler has a far better record of finding the right answer, and it usually gets there faster.
Doing this manually can't really be called "quick" or "effective", but there are several profiling tools which do this automatically; also known as statistical profiling.
Callstack sampling is a very useful technique for profiling, especially when looking at a large, complicated codebase that could be spending its time in any number of places. It has the advantage of measuring the CPU's usage by wall-clock time, which is what matters for interactivity, and getting callstacks with each sample lets you see why a function is being called. I use it a lot, but I use automated tools for it, such as Luke Stackwalker and OProfile and various hardware-vendor-supplied things.
The reason I prefer automated tools over manual sampling for the work I do is statistical power. Grabbing ten samples by hand is fine when you've got one function taking up 40% of runtime, because on average you'll get four samples in it, and always at least one. But you need more samples when you have a flat profile, with hundreds of leaf functions, none taking more than 1.5% of the runtime.
Say you have a lake with many different kinds of fish. If 40% of the fish in the lake are salmon (and 60% "everything else"), then you only need to catch ten fish to know there's a lot of salmon in the lake. But if you have hundreds of different species of fish, and each species is individually no more than 1%, you'll need to catch a lot more than ten fish to be able to say "this lake is 0.8% salmon and 0.6% trout."
Similarly in the games I work on, there are several major systems each of which call dozens of functions in hundreds of different entities, and all of this happens 60 times a second. Some of those functions' time funnels into common operations (like malloc), but most of it doesn't, and in any case there's no single leaf that occupies more than 1000 μs per frame.
I can look at the trunk functions and see, "we're spending 10% of our time on collision", but that's not very helpful: I need to know exactly where in collision, so I know which functions to squeeze. Just "do less collision" only gets you so far, especially when it means throwing out features. I'd rather know "we're spending an average 600 μs/frame on cache misses in the narrow phase of the octree because the magic missile moves so fast and touches lots of cells," because then I can track down the exact fix: either a better tree, or slower missiles.
Manual sampling would be fine if there were a big 20% lump in, say, stricmp, but with our profiles that's not the case. Instead I have hundreds of functions that I need to get from, say, 0.6% of frame to 0.4% of frame. I need to shave 10 μs off every 50 μs function that is called 300 times per second. To get that kind of precision, I need more samples.
But at heart what Luke Stackwalker does is what you describe: every millisecond or so, it halts the program and records the callstack (including the precise instruction and line number of the IP). Some programs just need tens of thousands of samples to be usefully profiled.
(We've talked about this before, of course, but I figured this was a good place to summarize the debate.)
There's a difference between things that programmers actually do, and things that they recommend others do.
I know of lots of programmers (myself included) that actually use this method. It only really helps to find the most obvious of performance problems, but it's quick and dirty and it works.
But I wouldn't really tell other programmers to do it, because it would take me too long to explain all the caveats. It's far too easy to make an inaccurate conclusion based on this method, and there are many areas where it just doesn't work at all. (for example, that method doesn't reveal any code that is triggered by user input).
So just like using lie detectors in court, or the "goto" statement, we just don't recommend that you do it, even though they all have their uses.
I'm surprised by the religous tone on both sides.
Profiling is great, and certainly is a more refined and precise when you can do it. Sometimes you can't, and it's nice to have a trusty back-up. The pause technique is like the manual screwdriver you use when your power tool is too far away or the bateries have run-down.
Here is a short true story. An application (kind of a batch proccessing task) had been running fine in production for six months, suddenly the operators are calling developers because it is going "too slow". They aren't going to let us attach a sampling profiler in production! You have to work with the tools already installed. Without stopping the production process, just using Process Explorer, (which operators had already installed on the machine) we could see a snapshot of a thread's stack. You can glance at the top of the stack, dismiss it with the enter key and get another snapshot with another mouse click. You can easily get a sample every second or so.
It doesn't take long to see if the top of the stack is most often in the database client library DLL (waiting on the database), or in another system DLL (waiting for a system operation), or actually in some method of the application itself. In this case, if I remember right, we quickly noticed that 8 times out of 10 the application was in a system DLL file call reading or writing a network file. Sure enough recent "upgrades" had changed the performance characteristics of a file share. Without a quick and dirty and (system administrator sanctioned) approach to see what the application was doing in production, we would have spent far more time trying to measure the issue, than correcting the issue.
On the other hand, when performance requirements move beyond "good enough" to really pushing the envelope, a profiler becomes essential so that you can try to shave cycles from all of your closely-tied top-ten or twenty hot spots. Even if you are just trying to hold to a moderate performance requirement durring a project, when you can get the right tools lined-up to help you measure and test, and even get them integrated into your automated test process it can be fantasticly helpful.
But when the power is out (so to speak) and the batteries are dead, it's nice know how to use that manual screwdriver.
So the direct answer: Know what you can learn from halting the program, but don't be afraid of precision tools either. Most importantly know which jobs call for which tools.
Hitting the pause button during the execution of a program in "debug" mode might not provide the right data to perform any performance optimizations. To put it bluntly, it is a crude form of profiling.
If you must avoid using a profiler, a better bet is to use a logger, and then apply a slowdown factor to "guesstimate" where the real problem is. Profilers however, are better tools for guesstimating.
The reason why hitting the pause button in debug mode, may not give a real picture of application behavior is because debuggers introduce additional executable code that can slowdown certain parts of the application. One can refer to Mike Stall's blog post on possible reasons for application slowdown in a debugging environment. The post sheds light on certain reasons like too many breakpoints,creation of exception objects, unoptimized code etc. The part about unoptimized code is important - the "debug" mode will result in a lot of optimizations (usually code in-lining and re-ordering) being thrown out of the window, to enable the debug host (the process running your code) and the IDE to synchronize code execution. Therefore, hitting pause repeatedly in "debug" mode might be a bad idea.
If we take the question "Why isn't it better known?" then the answer is going to be subjective. Presumably the reason why it is not better known is because profiling provides a long term solution rather than a current problem solution. It isn't effective for multi-threaded applications and isn't effective for applications like games which spend a significant portion of its time rendering.
Furthermore, in single threaded applications if you have a method that you expect to consume the most run time, and you want to reduce the run-time of all other methods then it is going to be harder to determine which secondary methods to focus your efforts upon first.
Your process for profiling is an acceptable method that can and does work, but profiling provides you with more information and has the benefit of showing you more detailed performance improvements and regressions.
If you have well instrumented code then you can examine more than just the how long a particular method; you can see all the methods.
With profiling:
You can then rerun your scenario after each change to determine the degree of performance improvement/regression.
You can profile the code on different hardware configurations to determine if your production hardware is going to be sufficient.
You can profile the code under load and stress testing scenarios to determine how the volume of information impacts performance
You can make it easier for junior developers to visualise the impacts of their changes to your code because they can re-profile the code in six months time while you're off at the beach or the pub, or both. Beach-pub, ftw.
Profiling is given more weight because enterprise code should always have some degree of profiling because of the benefits it gives to the organisation of an extended period of time. The more important the code the more profiling and testing you do.
Your approach is valid and is another item is the toolbox of the developer. It just gets outweighed by profiling.
Sampling profilers are only useful when
You are monitoring a runtime with a small number of threads. Preferably one.
The call stack depth of each thread is relatively small (to reduce the incredible overhead in collecting a sample).
You are only concerned about wall clock time and not other meters or resource bottlenecks.
You have not instrumented the code for management and monitoring purposes (hence the stack dump requests)
You mistakenly believe removing a stack frame is an effective performance improvement strategy whether the inherent costs (excluding callees) are practically zero or not
You can't be bothered to learn how to apply software performance engineering day-to-day in your job
....
Stack trace snapshots only allow you to see stroboscopic x-rays of your application. You may require more accumulated knowledge which a profiler may give you.
The trick is knowing your tools well and choose the best for the job at hand.
These must be some trivial examples that you are working with to get useful results with your method. I can't think of a project where profiling was useful (by whatever method) that would have gotten decent results with your "quick and effective" method. The time it takes to start and stop some applications already puts your assertion of "quick" in question.
Again, with non-trivial programs the method you advocate is useless.
EDIT:
Regarding "why isn't it better known"?
In my experience code reviews avoid poor quality code and algorithms, and profiling would find these as well. If you wish to continue with your method that is great - but I think for most of the professional community this is so far down on the list of things to try that it will never get positive reinforcement as a good use of time.
It appears to be quite inaccurate with small sample sets and to get large sample sets would take lots of time that would have been better spent with other useful activities.
What if the program is in production and being used at the same time by paying clients or colleagues. A profiler allows you to observe without interferring (as much, because of course it will have a little hit too as per the Heisenberg principle).
Profiling can also give you much richer and more detailed accurate reports. This will be quicker in the long run.
EDIT 2008/11/25: OK, Vineet's response has finally made me see what the issue is here. Better late than never.
Somehow the idea got loose in the land that performance problems are found by measuring performance. That is confusing means with ends. Somehow I avoided this by single-stepping entire programs long ago. I did not berate myself for slowing it down to human speed. I was trying to see if it was doing wrong or unnecessary things. That's how to make software fast - find and remove unnecessary operations.
Nobody has the patience for single-stepping these days, but the next best thing is to pick a number of cycles at random and ask what their reasons are. (That's what the call stack can often tell you.) If a good percentage of them don't have good reasons, you can do something about it.
It's harder these days, what with threading and asynchrony, but that's how I tune software - by finding unnecessary cycles. Not by seeing how fast it is - I do that at the end.
Here's why sampling the call stack cannot give a wrong answer, and why not many samples are needed.
During the interval of interest, when the program is taking more time than you would like, the call stack exists continuously, even when you're not sampling it.
If an instruction I is on the call stack for fraction P(I) of that time, removing it from the program, if you could, would save exactly that much. If this isn't obvious, give it a bit of thought.
If the instruction shows up on M = 2 or more samples, out of N, its P(I) is approximately M/N, and is definitely significant.
The only way you can fail to see the instruction is to magically time all your samples for when the instruction is not on the call stack. The simple fact that it is present for a fraction of the time is what exposes it to your probes.
So the process of performance tuning is a simple matter of picking off instructions (mostly function call instructions) that raise their heads by turning up on multiple samples of the call stack. Those are the tall trees in the forest.
Notice that we don't have to care about the call graph, or how long functions take, or how many times they are called, or recursion.
I'm against obfuscation, not against profilers. They give you lots of statistics, but most don't give P(I), and most users don't realize that that's what matters.
You can talk about forests and trees, but for any performance problem that you can fix by modifying code, you need to modify instructions, specifically instructions with high P(I). So you need to know where those are, preferably without playing Sherlock Holmes. Stack sampling tells you exactly where they are.
This technique is harder to employ in multi-thread, event-driven, or systems in production. That's where profilers, if they would report P(I), could really help.
Stepping through code is great for seeing the nitty-gritty details and troubleshooting algorithms. It's like looking at a tree really up close and following each vein of bark and branch individually.
Profiling lets you see the big picture, and quickly identify trouble points -- like taking a step backwards and looking at the whole forest and noticing the tallest trees. By sorting your function calls by length of execution time, you can quickly identify the areas that are the trouble points.
I used this method for Commodore 64 BASIC many years ago. It is surprising how well it works.
I've typically used it on real-time programs that were overrunning their timeslice. You can't manually stop and restart code that has to run 60 times every second.
I've also used it to track down the bottleneck in a compiler I had written. You wouldn't want to try to break such a program manually, because you really have no way of knowing if you are breaking at the spot where the bottlenck is, or just at the spot after the bottleneck when the OS is allowed back in to stop it. Also, what if the major bottleneck is something you can't do anything about, but you'd like to get rid of all the other largeish bottlenecks in the system? How to you prioritize which bottlenecks to attack first, when you don't have good data on where they all are, and what their relative impact each is?
The larger your program gets, the more useful a profiler will be. If you need to optimize a program which contains thousands of conditional branches, a profiler can be indispensible. Feed in your largest sample of test data, and when it's done import the profiling data into Excel. Then you check your assumptions about likely hot spots against the actual data. There are always surprises.

How Much Time Should be Allotted for Testing & Bug Fixing [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
Every time I have to estimate time for a project (or review someone else's estimate), time is allotted for testing/bug fixing that will be done between the alpha and production releases. I know very well that estimating so far into the future regarding a problem-set of unknown size is not a good recipe for a successful estimate. However for a variety of reasons, a defined number of hours invariably gets assigned at the outset to this segment of work. And the farther off this initial estimate is from the real, final value, the more grief those involved with the debugging will have to take later on when they go "over" the estimate.
So my question is: what is the best strategy you have seen with regards to making estimates like this? A flat percentage of the overall dev estimate? Set number of hours (with the expectation that it will go up)? Something else?
Something else to consider: how would you answer this differently if the client is responsible for testing (as opposed to internal QA) and you have to assign an amount of time for responding to the bugs that they may or may not find (so you need to figure out time estimates for bug fixing but not for testing)
It really depends on a lot of factors. To mention but a few: the development methodology you are using, the amount of testing resource you have, the number of developers available at this stage in the project (many project managers will move people onto something new at the end).
As Rob Rolnick says 1:1 is a good rule of thumb- however in cases where a specification is bad the client may push for "bugs" which are actually badly specified features. I was recently involved in a project which used many releases but more time was spent on bug fixing than actual development due to the terrible specification.
Ensure a good specification/design and your testing/bug fixing time will be reduced because it will be easier for testers to see what and how to test and any clients will have less lee-way to push for extra features.
Maybe I just write buggy code, but I like having a 1:1 ratio between devs and tests. I don't wait until alpha to test, but rather do it throughout the whole project. The logic? Depending on your release schedule, there could be a good deal of time between when development starts and when your alpha, beta, and ship dates are. Furthermore, the earlier you catch bugs, the easier (and cheaper) they are to fix.
A good tester, who find bugs soon after each check-in, is invaluable. (Or, better yet, before a check-in from a PR or DPK) Simply put, I am still extremely familiar with my code, so most bug fixes become super simple. With this approach, I tend to leave roughly 15% of my dev time to bug fixing. At least when I do estimates. So in a 16 week run I'd leave around 2-3 weeks.
Only a good amount of accumulated statistics from previous projects can help you to give precise estimates. If you have a well defined set of requirements, you can make a rough calculation of how many use cases you have. As I said you need to have some statistics for your team. You need to know average bugs-per-loc number to estimate total bugs count. If you don't have such numbers for your team, you can use industry average numbers. After you have estimated LOC (number of use cases * NLOC) and average bugs-per-lines, you can give more or less accurate estimation on time required to release project.
From my practical experience, time spent on bug-fixing is equal to or more (in 99% cases :) ) than time spent on original implementation.
From the testing Bible:
Testing Computer Software
p. 31: "Testing [...] accounts for 45% of initial development of a product." A good rule of thumb is thus to allocate about half of your total effort to testing during initial development.
Use a language with Design-by-Contract or "Code-contracts" (preconditions, check assertions, post-conditions, class-invariants, etc) to get "testing" as close to your classes and class features (methods and properties) as possible. Then use TDD to test your code with its contracts.
Use as much self-built code-generation as you possibly can. Generated code is proven, predictable, easier to debug, and easier/faster to fix than all-hand-coded code. Why write what you can generate? However, do not use OPG (other-peoples-generators)! Code YOU generate is code you control and know.
You can expect to spend an inverting ratio over the course of your project--that is--you will write lots of hand-code and contracts in the start (1:1) of your project. As you see patterns, teach a code generator YOU WRITE to generate the code for you and reuse it. The more you generate, the less you design, write, debug, and test. By the end of the project, you will find that your equation has inverted: You're writing less of your core-code, and your focus shifts to your "leaf-code" (last-mile) or specialized (vs generalized and generated) code.
Finally--get a code analyzer. A good, automated code analysis rule system and engine will save you oodles of time finding "stupid-bugs" because there are well-known gotchas in how people write code in particular languages. In Eiffel, we now have Eiffel Inspector, where we not only use the 90+ rules coming with it, but are learning to write our own rules for our own discovered "gotchas". Such analyzers not only save you in terms of bugs, but enhance your design--even GREEN programmers "get it" rather quickly and stop making rookie mistakes earlier and learn faster!
The rule of thumb for rewriting existing systems is this: "If it took 10 years to write, it will take 10 years to re-write." In our case, using Eiffel, Design-by-Contract, Code Analysis, and Code Generation, we have re-written a 14 year system in 4 years and will fully deliver in 4 1/2. The new system is about 4x to 5x more complex than the old system, so this is saying a lot!

Resources