I am reading through "The Mythical Man-Month", and near the end, in the updates for the 20th anniversary edition, it talks a bit about Boehm's model and the optimal time to delivery based upon the expected effort in man-months of a project.
His statement, when discussing Boehm's model, is:
His results solidly confirm The MM-M’s assertion that the trade-off between men and months is far from linear, that the man-month is indeed mythical as a measure of productivity. In particular, he finds: [16]
• There is a cost-optimum schedule time to first shipment, T = 2.5 · (MM)^(1/3). That is, the optimum time in months goes as the cube root of the expected effort in man-months, a figure derived from the size estimate and other factors in his model. An optimum staffing curve is a corollary.
• The cost curve rises slowly as the planned schedule gets longer than the optimum. People with more time take more time.
• The cost curve rises sharply as the planned schedule gets shorter than the optimum.
• Hardly any projects succeed in less than 3/4 of the calculated optimum schedule, regardless of the number of people applied! This quotable result gives the software manager solid ammunition when higher management is demanding impossible schedule commitments.
I am having a bit of difficulty applying this statement practically; does anyone have any insight into how it would inform software estimates? I am particularly trying to interpret the estimate formula, as plotted here:
cost-optimum schedule time plot
This seems to indicate that, for a project with 1 man-month of work, there is a cost-optimal time-to-delivery of 2.5 months. This makes sense; however, if you then assume a project with 5 man-months of work, the plot suggests that the cost-optimal time-to-delivery is roughly 4 months!
Does this suggest that more man-power should be allocated to deliver within this time frame, or that the estimates are too large?
Also, how can you estimate optimum staffing levels from this model?
Thanks
Essentially, all models are wrong, but some are useful. -- George E. P. Box
I don't have any references handy, but I think this model was derived from data on large waterfall-style projects. For small projects of 1 or 5 man-months, the model may not apply well. Models give you wrong results if you extrapolate them too far outside their validity range.
Though it's also true that, especially in a small project, it's not always possible to do work that contributes to deliverables, for example when waiting on external dependencies before you can proceed.
I have used models like these to sanity check project offers within the same size range and with similar process characteristics. Not mechanistically, but as indicators to see if there are areas in the plan/offer that require closer attention.
Also, how can you estimate optimum staffing levels from this model?
If you have an optimum duration of T months and an effort of MM person-months, you allocate staff to complete MM of work in T time. Your average staffing level is MM/T persons.
Of course, in practice a steady MM/T staffing level is not optimal. Start with a small team to get the high-level architectural issues settled, then grow the team only once there is something useful for new people to do.
As with any model there is no need to take it on blind faith, especially when the model is really easy to test:
Effort (man-months)    Optimal duration (months)    Avg. team size (persons)
1 2.5 0.4
2 3.1 0.6
3 3.6 0.8
4 4.0 1.0
5 4.3 1.2
6 4.5 1.3
7 4.8 1.5
8 5.0 1.6
9 5.2 1.7
10 5.4 1.9
20 6.8 2.9
30 7.8 3.9
40 8.5 4.7
50 9.2 5.4
60 9.8 6.1
70 10.3 6.8
80 10.8 7.4
90 11.2 8.0
100 11.6 8.6
200 14.6 13.7
300 16.7 17.9
400 18.4 21.7
500 19.8 25.2
600 21.1 28.5
700 22.2 31.5
800 23.2 34.5
900 24.1 37.3
1000 25.0 40.0
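For reference, the table can be reproduced in a few lines of Python (a minimal sketch of the quoted formula T = 2.5 · (MM)^(1/3); the variable names are my own):

for mm in (1, 2, 5, 10, 50, 100, 500, 1000):
    t_opt = 2.5 * mm ** (1.0 / 3.0)   # cost-optimum schedule time, in months
    avg_team = mm / t_opt             # average staffing level, in persons
    print(f"{mm:>5} MM -> {t_opt:5.1f} months, avg. team of {avg_team:5.1f}")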
As far as I can see, for the software development projects of up to 10 man-months that currently prevail in business environments (in-house projects run within non-software companies), the optimal figures produced by the model do not reflect typical durations and team sizes.
The figures for projects over 20 man-months become much more believable, especially where efforts are tightly coupled.
As a result, I'd avoid using the formula for anything but a quick order-of-magnitude guesstimate, and only for projects of over 20 man-months. For anything smaller, a quick planning session will give you a more accurate and trustworthy result.
I am comparing a random forest (RF) with a feed-forward neural network (NN) to predict species richness. In both models I used the same (60) predictors. The issue is that the R² and root-mean-squared errors are very similar, but when I plot predicted vs. test, the NN looks much, much better. Is that so, or is it just a perception issue?
These are the results for the NN:
RMSE: 7 ±0.5
Relative RMSE: 0.26 ±0.04
r2: 0.36 ±0.1
Predicted vs test NN
And for RF:
RMSE: 7 ±1.1
Relative RMSE: 0.25 ±0.06
r2: 0.36 ±0.1
Predicted vs test RF
The results are an average for the 5 folds, and the plots show all the accumulated predictions of the 5 folds vs the true values. Both models were built in python (keras for NN and sklearn for RF).
To wrap it up: if I trust the numbers, both models perform the same, but the NN has the better fit visually. Is there another validation metric that could tell which model performs better?
You can take a closer look at the distribution of the error, rather than just the raw mean and standard deviation.
This all depends on your target: what is the cost of a mistake? If your model tends to be accurate but is occasionally completely wrong, is that OK, or would you prefer a more stable model with higher error margins?
For example, you can use this code to check your MSE distribution (disclaimer: I'm one of the maintainers of the package used):
from deepchecks.checks import RegressionErrorDistribution
RegressionErrorDistribution().run(test_data, model)
Another, more advanced option is to look at the error analysis, which shows how predictable your error is, i.e. in which segments of your data the model tends to be more wrong. This too can help you decide which model is better, depending on which segments of the data matter most to you:
from deepchecks.checks import ModelErrorAnalysis
ModelErrorAnalysis(min_error_model_score=0.3).run(train_data, test_data, model)
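If you'd rather not add a dependency, a minimal sketch along the same lines (assuming you have the pooled out-of-fold predictions nn_pred and rf_pred and the true values y_true as NumPy arrays; these names are mine, not from the question) compares the two residual distributions directly:

import numpy as np
import matplotlib.pyplot as plt

nn_err = y_true - nn_pred   # residuals of the neural network
rf_err = y_true - rf_pred   # residuals of the random forest

# Overlay the two error distributions; skew or long tails that an
# averaged RMSE hides will show up here.
plt.hist(nn_err, bins=30, alpha=0.5, label="NN residuals")
plt.hist(rf_err, bins=30, alpha=0.5, label="RF residuals")
plt.xlabel("true - predicted")
plt.legend()
plt.show()

# Quantiles of the absolute error describe worst-case behaviour.
for name, err in (("NN", nn_err), ("RF", rf_err)):
    print(name, np.percentile(np.abs(err), [50, 90, 99]))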
I have been using agile approaches (XP and Scrum) for my projects for several years with great results. But in all cases, all members of the dev team were committed 100% to the project.
Now I am faced with doing this when the team is not stable. For instance, one iteration there may be four people working, the next maybe only two or three.
I realize this makes it hard (or impossible) to estimate using the normal velocity approach, since velocity will fluctuate too much and not be stable. What follows is that one cannot really expect to be able to release at the end of each iteration.
Maybe another approach is needed here: just grab stuff from the backlog, muddle through, and release whenever possible. I really don't like that, though...
Any thoughts?
From the question I assume you have some developers (probably 2) 100% committed to the project and some others (another 2-3) who only participate at times.
One thing you can do is set up different processes for the core developers who are 100% committed and for everyone else. Use your normal agile process for the core people and release their work on the normal iteration cycle. For the non-core people, do little planning and assume their (and your) estimates will be way off at times. Ideally their changes should be isolated and merged into a stable branch of the code by core members, but not every project's architecture and team roles allow this.
The point is to separate and isolate the source of chaos and leave the heart of the project and the team unaffected.
Maybe instead of agile approaches, you can slow things down with other iterative and incremental approaches. Instead of having iterations measured in weeks, having longer iterations (perhaps measured in months) would be better if you keep adding and dropping people from the team.
This doesn't mean you can't still use some Agile techniques. I would still maintain your backlogs and burndown charts, with the realization that instead of having a release every 2 weeks, you'll release every 6 weeks (~2 months). If you have new developers joining more experienced developers, use pair programming, assign the new developers to bug fixes, or assign the new developers to maintaining unit tests to help them learn the code base.
Velocity is only an estimation.
Naively, if you have a given velocity v with a team of 4 developers, then schedule your iteration with a velocity of (v/4)*number_of_developers
You can fudge this value if the members you are losing are particularly stronger or weaker than the average.
This is basically what PivotalTracker does with its team strength metric.
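A minimal sketch of that adjustment (the numbers and per-developer strength factors below are made-up illustrations, not anything PivotalTracker prescribes):

baseline_velocity = 30                   # points per iteration with the full team
baseline_team_size = 4

# Strength factors let you fudge for stronger or weaker members (1.0 = average).
next_iteration_team = [1.0, 1.0, 0.8]    # three developers available, one of them junior

per_dev_velocity = baseline_velocity / baseline_team_size
planned_velocity = per_dev_velocity * sum(next_iteration_team)
print(planned_velocity)                  # 21.0 points for the reduced team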
So, you have a project with a continually changing team size and your boss wants you to give him an accurate estimate of how long it will take? You can do this, as long as you keep in mind the difference between accurate and precise. Your precision will depend largely on the number of items and how granular (decomposed) each item is; the more items you have the more the Law of Large Numbers works for you, averaging out over- and underestimates.
Your accuracy is a function of confidence. Note that estimates aren't single-point values; they're ranges with an associated percentage of confidence. For instance, a proper estimate wouldn't be "2 weeks", it would be "50% confidence of 2 weeks, 80% confidence of 4 weeks."
If I were the person assigned the unenviable task of providing an estimate to completion for a project being managed as arbitrarily as in the original post, I'd work out a range based on the minimum number of people assigned (e.g., "48 to 66 weeks given 2 developers [50% to 80% confident]") and a range based on the average number of people assigned (e.g., "25 to 45 weeks with 5 developers [50% to 80% confident]"), then combine the low figure from the average-staffing range with the high figure from the minimum-staffing range (e.g., "25 to 66 weeks given anywhere from 2 to 5 developers [50% to 80% confident]"). Even then I'd put a disclaimer on it ("plus 10% for the time lost to context switching").
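As a throwaway sketch of that combination (the week figures are the same illustrative ones used above):

min_staff_range = (48, 66)      # 50%..80% confidence with 2 developers
avg_staff_range = (25, 45)      # 50%..80% confidence with 5 developers

low = avg_staff_range[0]        # optimistic end of the average-staffing range
high = min_staff_range[1]       # pessimistic end of the minimum-staffing range
context_switch_overhead = 1.10  # plus 10% for time lost to context switching

print(f"{low} to {round(high * context_switch_overhead)} weeks [50% to 80% confident]")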
Better yet, I'd explain exactly why this arrangement was, to be polite, sub-optimal, and why multi-tasking is a primary signpost on the road to project Hell.
As someone else suggested, changing the workflow from iteration-based to flow-based (Kanban) might well be a good strategy. With Kanban you handle changing project priorities by changing the priority of items in the backlog; once an item has been grabbed by the team it is generally finished (it flows all of the way through the workflow, and stakeholders aren't allowed to disrupt the team by screwing around with work-in-progress). I've used Kanban for sustained engineering projects and it worked very well.
As for how it would help with estimates, the key to continuous flow is to try to have each work item be roughly the same size (1x, 2x, 3x, not 10x, 20x, 100x). You should track the movement of items through the workflow by recording the dates of process state changes, e.g., Queue 1/15, Design 1/22, Dev 1/24, Test 2/4, Integrate 2/7, etc., and then generate a cumulative flow diagram regularly to evaluate the time-in-state durations over time. Working out how long the project should take, given that you know the size of each item and the time through the workflow for items, is a trivial computational exercise left to the reader.
(The more interesting question is how to spot constraints... and then how to remove them. Hint: look for long times in states, because work piles up in front of constraints.)
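A minimal sketch of that bookkeeping (the state names mirror the example above; the dates and the final "Done" transition are invented for illustration): compute the days spent in each state from the recorded transition dates, and the states with consistently long durations are where the constraints sit.

from datetime import date

# Recorded state-change dates for one work item (hypothetical example).
transitions = [
    ("Queue",     date(2024, 1, 15)),
    ("Design",    date(2024, 1, 22)),
    ("Dev",       date(2024, 1, 24)),
    ("Test",      date(2024, 2, 4)),
    ("Integrate", date(2024, 2, 7)),
    ("Done",      date(2024, 2, 9)),
]

# Days spent in each state; work piles up in front of constraints,
# so consistently long times in a state point at the bottleneck.
for (state, entered), (_, left) in zip(transitions, transitions[1:]):
    print(f"{state:<10} {(left - entered).days} days")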
Let the individual developer that will be working on the story estimate the effort required to complete the story. You can take into account historical variances in that developer's estimations, but the idea is that you can take their estimates and then figure out how many stories you'll be able to finish in that sprint.
Don't forget that average velocity is largely used for lookahead release planning; the team is responsible for selecting in each iteration how many backlog items to take on (although knowing historical velocity can assist them).
If your team size (and hence velocity) is fluctuating from iteration to iteration, you can still do useful release planning by using average velocities over the past N sprints, assuming the team fluctuations will continue and hence their long-term average velocity will actually be stable.
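For example, a rough release-planning sketch under that assumption (all numbers are illustrative):

recent_velocities = [18, 9, 14, 11, 16]   # last 5 sprints, fluctuating team size
avg_velocity = sum(recent_velocities) / len(recent_velocities)

remaining_backlog = 120                   # story points left in the release
sprints_left = remaining_backlog / avg_velocity
print(f"~{sprints_left:.1f} sprints to go at an average velocity of {avg_velocity:.1f}")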
Your main problem here is that the team will find it hard to give predictable estimates and deliveries since the team is changing from sprint to sprint. This can also hurt the team commitment and continuous improvement.
This case might actually be well suited to a Kanban approach. Check out Henrik Kniberg's introduction to Kanban for a quick overview.
Good luck!
Is it better to describe improvements using percentages or just the differences in the numbers? For example, if you improved the performance of a critical ETL SQL query from 4000 ms to 312 ms, how would you present it as an 'Accomplishment' on a performance review?
In currency. Money is the most effective medium for communicating value, which is what you're trying to use the performance review to demonstrate.
Person hours saved, (very roughly) estimated value of $NEW_THING_THE_COMPANY_CAN_DO_AS_RESULT, future hardware upgrades averted, etc.
You get the nice bonus of showing that you're sensitive to the company's financial position: a geek who can align himself with what the company is really about.
Take potato
Drench Potato in Lighter Fluid
Light potato on fire
Hand potato to boss
Make boss hold it for 4 seconds.
Ask boss how long those 4 seconds felt
Ask boss how much better half a second would have been
Bask in glory
It is always better to measure relative improvement.
So, if you brought it down to 312 ms from 4000 ms, that is an improvement of 3688 ms, which is 92.2% of the original runtime. In other words, you reduced the runtime by 92.2%, bringing it down to only 7.8% of what it was originally.
Absolute numbers, on the other hand, usually are not that good since they are not comparable. (If your original runtime was 4,000,000ms then an improvement of 3688ms isn't that great.)
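A quick sketch of the arithmetic, using the figures from the question:

old_ms, new_ms = 4000, 312

reduction = old_ms - new_ms                  # 3688 ms saved
pct_reduction = 100 * reduction / old_ms     # ~92.2% less runtime
speedup = old_ms / new_ms                    # ~12.8x faster

print(f"{pct_reduction:.1f}% reduction, {speedup:.1f}x speedup")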
See this link for some nice chart suggestions.
Comparison to Requirements
If I have requirements (response time, throughput), I like to color code the absolute numbers like so:
Green: <= 80% of the requirement (response time); >= 120% of the requirement (throughput)
No formatting: Meets the requirement.
Red: Does not meet the requirement.
Comparisons are interesting, but only if we have enough of them to see trends over time: is our performance steadily improving or degrading? Ultimately, the business only cares whether we're meeting the requirement. It's only when we don't that they ask for comparisons to previous releases.
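A small sketch of that color coding for a response-time requirement (the thresholds mirror the rule above, the function name is mine; for throughput the comparisons flip, since higher is better):

def rate_response_time(measured_ms, required_ms):
    """Color-code a response-time measurement against its requirement."""
    if measured_ms <= 0.8 * required_ms:
        return "green"          # comfortably inside the requirement
    if measured_ms <= required_ms:
        return "no formatting"  # meets the requirement
    return "red"                # does not meet the requirement

print(rate_response_time(312, 500))   # green
print(rate_response_time(480, 500))   # no formatting
print(rate_response_time(650, 500))   # red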
Comparison of Benchmarks
If I'm comparing benchmarks to some baseline, then I like to use percentages, but only if the benchmark is a statistically significant change from the baseline.
Hardware Sizing
If I'm doing hardware sizing or capacity planning, then I like to express the performance as the absolute number plus the cost per transaction. For example:
System A: 1,000 transactions/second, $0.02/transaction
System B: 1,500 transactions/second, $0.04/transaction
Use whichever appears most impressive given the change. According to one method of calculation, that change sped up the query by 1,300%, which looks more impressive than 13x improvement, or
============= <-- old query
= <-- new query
Although the graph isn't a bad method.
If you can calculate the improvement in money, then go for that. One piece of software I wrote many years ago saved a few engineers a little bit of time each day. After figuring in the cost of salary, benefits, and overhead, it turned out to be a savings of more than $12k per year for a small company.
-Adam
Rule of thumb: whichever sounds more impressive.
If you went from 10 tasks done in a period to 12, you could say you improved performance by 20%.
Saying you did two more tasks doesn't seem that impressive.
In your case, both numbers sound good, but try different representations and see what you get!
Sometimes graphics help a lot if the improvement is spread across a number of factors but the combined number somehow does not look that cool.
Example: You have 5 params A, B, C, D, E. You could make a bar chart with those 5 params and "before and after" values side by side for each param. That sure will look impressive.
God, I'm starting to sound like my friend from marketing!
runs away screaming
You can make numbers and graphs say anything you want; the important thing is to make them say something meaningful and relevant to the audience you're presenting them to. If it's end users, you can show them differences in screen refreshes (something they understand); for managers, perhaps the reduced number of servers they'll need to support the application ($ savings); for financial people, it's all about the $: how much did it save them? A general rule: the less technical the group, the more graphical and dramatic you need to be.
During our iteration planning, we frequently find ourselves in the same position as this guy - How to estimate a programming task if you have no experience in it
I definitely agree with prototyping before you can give a reasonable estimate. The same applies to anything that needs a bit of architecture and design, but I'm not that comfortable doing all of this outside the scope of a sprint.
The basic idea is that you identify as many tasks as you can that you're confident of, and estimate these as normal. For those areas that you're unsure of then there should be two 'types' of task identified: Investigation & Implementation.
Investigation tasks are brief descriptions of work that you're just unsure of, for example "Investigate how to bind Control X to data". An estimate is provided for these.
The Implementation task is a traditional rough guess, probably based on the story points assigned, of how long you think it would take to implement the feature.
During the sprint, when the investigation tasks have been completed, the developer should then be at a stage where they have a much better idea what is going on. 'Proper' Tasks can then be identified, which take the place of the Implementation placeholder. In addition, further Investigation tasks may be identified at this stage, and the cycle continues.
In the above example, we start with an Investigation task estimated at 7 hours and an Implementation task estimated at 14. Once the first Investigation has been completed, Tasks 1, 2 and 3 are identified and estimated with some degree of certainty, where Task 3 is another Investigation task from which Tasks 4 and 5 will be identified at a later stage. As you can see, the first Implementation estimate had the feature delivered within 14 hours, but in reality it took at least 4 + 7 + 3 + 4 + 2 = 20 hours, over 40% more than the initial estimate.
(Diagram: http://www.duncangunn.me.uk/myweb/images/estimate.png)
All thoughts are welcome - my gut instinct is this will fly - am I right or am I the Wrong Brothers?
Cheers!
What we do.
Some features involve new technology. We can't accurately estimate them. Period.
We make up a number. Based on a couple of things. How hard does it "feel"? Can we get by with some kind of "partial" or "just-enough" implementation?
If it's hard, then it's hard. It will be expensive.
If there's a lot of parts, with a kernel of goodness and some bonus stuff layered on, we have a possibility of putting just the kernel into a release, and setting other stuff aside for later. A very few things are "all or nothing" where a partial release isn't possible. In that case, we have to provide enough time for "all", and that gets expensive.
Our standard approach is to get stuff that works, and possibly defer things to a later sprint if we ran out of time because of unexpected complexities.
What you're calling "investigation", we call technical spike sprints. For stuff that's new, we make up an estimate to placate managers who feel it necessary to overplan things. Then we spike the technology. Once it's spiked, we can revise the estimates based on what we now know.
Actually, the implementation of the feature took 27 hours - you forgot the first investigation of 7 hours, so in reality the actual implementation took almost twice as long as the estimate.
There are two ways you can go on this:
Just make the estimate as best you can and potentially experience a blowout in your sprint and a decline in project velocity (you should only do this if the feature is both urgent and critical); or
Schedule the investigation for this sprint and leave the implementation for another sprint - without an idea of how long the task will take, the Product Owner does not have enough information to make a decision about in which sprint to schedule it or even whether to do it at all. Only tasks that have been estimated should be included in your sprint.
The first choice means your sprint and project estimates are somewhat arbitrary. The second choice gives much more predictability to your sprints.
In your example, the initial investigation may be scheduled for Sprint 1 but without knowledge of how long the task will take the Product Owner can't decide how to schedule it. If you came back with an estimate of 200 hours the Product Owner may decide not to do that feature at all, or to delay it until Release 2 of the product. The estimate comes in and the Product Owner schedules Task 1, Task 2 and the investigation of Task 3 for Sprint 2. After estimating Task 3, Tasks 4 and 5 can be scheduled in Sprint 3 or later.
Estimating features is usually a complex task. After some time your estimations will get better. A good approach can be to estimate features with story points. A story point is an abstract value (with a meaning agreed upon within the team) that expresses the complexity of the problem.
You should assign the same complexity (the same number of story points) to features of similar complexity. Then later on it is enough to estimate (or look at historical data for) only a smaller set of features, and you should be able to work out how much time you need.
Features of similar complexity need a similar time effort for implementation.
Observing one year of estimations during a project, I found some strange things that make me wonder whether evidence-based scheduling would work well here:
individual programmers seem to have favorite numbers (e.g. 2, 4, 8, 16, 30 hours)
the big tasks seem to be underestimated by a fixed factor (about 2), but the standard deviation is low here
the small tasks (1 or 2 hours) are very widely distributed. On average they have the same underestimation factor of 2, but the standard deviation is high:
some 5-minute spelling issues are estimated at 1 hour
other bugfixes are estimated at 1 hour too, but take a day
So, is it really a good idea to let the programmers break the 30-hour tasks down into 4- or 2-hour steps during estimation? Won't this raise the standard deviation? (OK, let them break it down, but perhaps after the estimation?!)
Yes, your observations are exactly the sort of problems EBS is designed to solve.
Yes, it's important to break bigger tasks down. Shoot for 1-2 day tasks, more or less.
If you have things estimated at under 2 hrs, see if it makes sense to group them. (It might not -- that's ok!)
If you have tasks that are estimated at 3+ days, see if there might be a way to break them up into pieces. There should be. If the estimator says there is not, make them defend that assertion. If it turns out that the task really just takes 3 days, fine, but the more of these you have, the more you should be looking hard in the mirror and seeing if folks aren't gaming the system.
Count 4 & 5 day estimates as 2x and 4x as bad as 3 day ones. Anyone who says something is going to take longer than 5 days and it can't be broken down, tell them you want them to spend 4 hrs thinking about the problem, and how it can be broken down. Remember, that's a task, btw.
As you and your team practice this, you will get better at estimating.
...You will also start to recognize patterns of failure, and solutions will present themselves.
The point of Evidence based scheduling is to use Evidence as the basis for your schedule, not a collection of wild-assed guesses. It's A Good Thing...!
I think it is a good idea. When people break tasks down, they figure out the specifics of the task. You may get small deviations here and there, one way or the other; they may compensate or not, but you get a feeling of what is happening.
If you have a huge task of 30 hours, it can take all of 100. This is the worst that could happen.
Manage the risk: split tasks down. You have already figured out these small deviations; you know what to do with them.
So make sure developers also know what they do and say :)
"So, is it really a good idea to let the programmers break down the 30 hours task down to 4 or 2 hours steps during estimations? Won't this raise the standard deviation? (Ok, let them break it down - but perhaps after the estimations?!)"
I certainly don't get this question at all.
What it sounds like you're saying (you may not be saying this, but it sure sounds like it)
The programmers can't estimate at all -- the numbers are always rounded to "magic" values and off by 2x.
I can't trust them to both define the work and estimate the time it takes to do the work.
Only I know the correct estimate for the time required to do the task. It's not a round 1/2 day multiple. It's an exact number of minutes.
Here's my follow-up questions:
What are you saying? What can't you do? What problem are you having? Why do you think the programmers estimate badly? Why can't they be trusted to estimate?
From your statements, nothing is broken. You're able to plan and execute to that plan. I'd say you were totally successful and doing a great job at it.
OK, I have the answer. Yes, it is right, AND the observations I made (see question) are absolutely understandable. To be sure, I made a small Excel simulation to confirm what I was guessing.
If you add up multiple small tasks with a high standard deviation into bigger tasks, the result has a lower relative deviation, because the small tasks partially compensate for each other's uncertainty.
So the answer is: yes, it will work if you break your tasks down so that they are about the same length, because the simulation will do the compensation for the bigger tasks automatically. I do not need to worry about a higher standard deviation in the smaller tasks.
But I am sure you must not mix up low-estimate tasks with high-estimate tasks, because they simply do not have the same variance.
Hence, it's always better to break them down. :)
The Excel simulation I made:
create 50 rows with these columns:
first - a fixed value 2 (the very homogeneous estimation)
20 columns with some random function (e.g. "=rand()*rand()*20")
make sums for each column
add "=VARIANCE(..)" for each random column
and add a variance calculation for the sums
The variance for each column in my simulation was about 2-3 and the variance of the sums below 1.
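For what it's worth, here is a rough NumPy re-creation of the same experiment (my own translation of the Excel steps above, not the original spreadsheet). It shows the key effect: the relative spread of the column sums is much smaller than the relative spread of the individual task estimates.

import numpy as np

rng = np.random.default_rng(0)

# 50 "tasks" x 20 simulated estimates, mimicking "=rand()*rand()*20".
tasks = rng.random((50, 20)) * rng.random((50, 20)) * 20

per_task_spread = tasks.std(axis=1).mean() / tasks.mean()   # relative spread of single estimates
column_sums = tasks.sum(axis=0)                             # one total per simulated column
sum_spread = column_sums.std() / column_sums.mean()         # relative spread of the totals

print(f"relative spread per task:  {per_task_spread:.2f}")
print(f"relative spread of totals: {sum_spread:.2f}")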