I want to manually analyze bug reports from three large software projects. The three projects have 10,000, 12,000, and 8,000 bug reports in total. I need to examine bug reports, comments, and bug-fixing files. Manually analyzing all bug reports is a time-consuming and difficult task. For these reasons, I would like to take a sample of bug reports from each project. Could you please suggest how many bug reports from each project I should analyze to obtain a representative sample?
It depends on the following two things:
Confidence level: It tells you how sure you can be. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.
Confidence interval (margin of error): It is the plus-or-minus figure that is an acceptable deviation from the actual result. Most researchers use the 5% confidence interval.
Therefore, you can use a 95% confidence level and 5% confidence interval to generate your sample size.
For example,
Population size of project A = 10,000
Confidence level = 95%
Confidence interval = 5%
So the representative sample size = 370 (that means you should analyze 370 bug reports for project A).
I usually use this sample size calculator to calculate the sample size: https://www.surveysystem.com/sscalc.htm#one
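If you would rather compute the number yourself than use the calculator, here is a minimal Python sketch. It assumes Cochran's formula with a finite population correction and the most conservative proportion p = 0.5; those assumptions are mine, not something the calculator documents:

import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    # Cochran's formula: z is the score for the confidence level (1.96 ~ 95%),
    # margin is the confidence interval (0.05 = 5%), p = 0.5 gives the largest sample.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction, since each project has a known number of reports.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

for project, total_reports in [("A", 10_000), ("B", 12_000), ("C", 8_000)]:
    print(project, sample_size(total_reports))  # project A -> 370, matching the calculator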
I am comparing a random forest (RF) with a feed-forward neural network (NN) to predict species richness. In both models, I used the same 60 predictors. The issue is that the r2 and root mean squared errors are very similar, but when I plot predicted vs test, the NN looks much better. Is that so, or is it just a perception issue?
These are the results for the NN:
RMSE: 7 ±0.5
Relative RMSE: 0.26 ±0.04
r2: 0.36 ±0.1
Predicted vs test NN
And for RF:
RMSE: 7 ±1.1
Relative RMSE: 0.25 ±0.06
r2: 0.36 ±0.1
Predicted vs test RF
The results are an average for the 5 folds, and the plots show all the accumulated predictions of the 5 folds vs the true values. Both models were built in python (keras for NN and sklearn for RF).
To wrap it up, if I trust the numbers, both models perform the same, but NN has the best fit visually. Is there another validation metric that could tell which model performs better?
You can take a better look at the distribution of the error, rather than just the raw mean and standard deviation.
This all depends on your target: what is the cost of a mistake? If your model tends to be accurate but is completely wrong a small percentage of the time, is that OK, or would you prefer a more stable model with higher error margins?
For example, you can use this code to check your MSE distribution (disclaimer: I'm one of the maintainers of the used package):
from deepchecks.checks import RegressionErrorDistribution

# test_data is the test set wrapped as a deepchecks Dataset (features + label)
RegressionErrorDistribution().run(test_data, model)
Another, more advanced option is to look at the error analysis, which shows you how predictable your error is, meaning in which segments of your data your model tends to be more wrong. This too can help you decide which model is better, depending on which segments of the data are more important to you:
from deepchecks.checks import ModelErrorAnalysis

# train_data and test_data are deepchecks Datasets; the trained model is passed last
ModelErrorAnalysis(min_error_model_score=0.3).run(train_data, test_data, model)
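If you want a quick, framework-free look at the same idea, here is a minimal sketch. The data below are synthetic stand-ins, so swap in your own accumulated out-of-fold true values and the two models' predictions:

import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins so the sketch runs; replace with your own arrays.
rng = np.random.default_rng(0)
y_true = rng.poisson(25, size=250).astype(float)
pred_nn = y_true + rng.normal(0, 7, size=250)
pred_rf = y_true + rng.normal(0, 6, size=250) + rng.binomial(1, 0.05, size=250) * 25

residuals = {"NN": y_true - pred_nn, "RF": y_true - pred_rf}

# Similar RMSEs can hide very different tails, so overlay the error distributions.
for name, res in residuals.items():
    plt.hist(res, bins=30, alpha=0.5, label=name)
plt.xlabel("prediction error (true - predicted)")
plt.legend()
plt.show()

# Fraction of predictions that are badly wrong (the threshold here is arbitrary).
for name, res in residuals.items():
    print(name, "large-error rate:", np.mean(np.abs(res) > 15))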
I'm working in a team that's been working in an agile approach consistently and fairly successfully, and this has worked great for our initial work on the current project as we incrementally build the product.
We're now moving into the next phase of this though, and the management are keen for us to set some specific deadlines ourselves, for when we'll be in a position to demo and sell this to real customers, on the order of months.
We have a fairly well organised large backlog for each of the elements of functionality we'd like to include, and a good sense of the prioritisation of these individual bits of functionality.
The naive solution is to get the minimum list of stories that would provide a demo-able product, estimate all of those individually, and add them up and combine with our velocity to get a date, and announce we'll be demoing from then. That leaves no leeway though, and seems likely to result in a mad crunch as we get up to deadline time, which I desperately want to avoid.
As an improvement, I'd like to add in some ratio of more optional stories to act as either contingency or bonus improvements, depending on how we progress, but we don't have any idea what ratio would be sensible, or whether this is the standard approach.
I'm also concerned by having to estimate the whole of our backlog all in one go up-front, as that seems very time consuming, and it seems likely that we'll discover more information in the months before we get to that story, which will affect our estimates.
Are there recommended approaches to dealing with setting deadlines to allow for an agile development process? Most of the information I've seen seems to be around handling the situation once you've got a fixed deadline to hit instead. I'd also be interested in any relevant literature or interesting blog posts that cover this issue.
Regarding literature: the best book I know on estimation in software is "Software Estimation: Demystifying the Black Art" by Steve McConnell. It covers your case. Plus, it describes the difference between an estimate and a commitment (a set deadline, in other words) and explains how to derive the second from the first reliably.
"The naive solution is to get the minimum list of stories that would provide a demo-able product, estimate all of those individually, and add them up and combine with our velocity to get a date, and announce we'll be demoing from then. That leaves no leeway though, and seems likely to result in a mad crunch as we get up to deadline time, which I desperately want to avoid."
This is the solution I have used in the past. Your initial estimate is going to be off a bit, so add some slack via a couple of additional sprints before setting your release date. If you get behind, you can make it up in the slack; if not, your product backlog gives you additional features that you can include in the release if you so choose. This depends on your velocity metric for the team, though, so adjust your slack based on how accurate you feel this metric is for the current team. Once you have a target release, you can circle back to see if you have any known resource constraints that might affect that release.
The approach you describe is likely to be correct. You may want to estimate all the desirable features and prioritise the UI elements (because investors and customers basically just see the shiny UI); your deadline will then be that estimated completion date, plus some slack in the form of scaled estimates. Use the ratio between your current productivity and your worst period to create a pessimistic estimate, and use that same ratio to scale shorter estimates (e.g. the estimate for the minimum feature set).
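To make the arithmetic concrete, here is a rough sketch of that projection; the story points, velocity, and slack below are invented, not taken from your project:

import math
import datetime

must_have_points = 180    # sum of estimates for the minimum demo-able feature set
optional_points = 60      # contingency / bonus stories to pull in if you're ahead
velocity = 25             # points per sprint, from your historical average
sprint_length_days = 14
slack_sprints = 2         # extra sprints to absorb estimation error

sprints = math.ceil(must_have_points / velocity) + slack_sprints
demo_date = datetime.date.today() + datetime.timedelta(days=sprints * sprint_length_days)
print(f"{sprints} sprints -> demo around {demo_date}")
# If the slack isn't needed, spend it on optional_points instead.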
What I'm looking for is very simple: I want a tool that computes the calculated shipping date (as opposed to one estimated from confidence intervals) given a list of tasks, each with a total estimate and current progress, without introducing further uncertainty, as I want to handle that externally.
I want it to take workday durations, user-entered holidays, and so on into account.
I know FogBugz's Evidence-Based Scheduling does something very close to that, but I would like it without the statistical aspect and the associated confidence intervals. I'm aware that's a drastic simplification and that statistical estimation is the essence of EBS, but I'm not looking for a subjective discussion here; I just want to be able to access this simple piece of information (the supposedly exact shipping date) at any given time during the project without having to calculate it myself.
So I'm looking for one of three things: 1) a way to customize FogBugz (6.0) to show me the information I want besides the confidence intervals, 2) a way to customize FogBugz to set estimate uncertainty to 0, or 3) another (free) tool that does exactly what I want.
EDIT: By "supposedly exact" or "calculated", I don't mean with respect to what is actually going to happen; that would indeed be trying to predict the future. I mean with respect to the information that was input, together with its obvious uncertainty. In that case, I guess estimates for individual tasks should be seen more as spending limits or upper bounds. The information I would like to be able to compute is really very simple: if everything goes exactly as specified, where does that take us? Then, with information about how the estimates were made, such as the ability of each individual developer to make good estimates, I can derive the confidence interval. EBS does this automatically and, undoubtedly, very well, which is why I use it. What I would like to obtain is one more little piece of information, i.e. the same starting point EBS uses, so I can play with my own assumptions as to how the statistical estimation should be made.
FogBugz will show you the sum of estimates at the bottom of the LIST page, labelled "Total estimated time remaining". This is the raw sum of estimates, without any EBS calculations.
You can't predict the future. So any calculated shipping date can only be a guess. That guess depends on the confidence intervals around each individual number that went into it. This is a matter of definition -- even though you may not like it.
You may want to have a "100% confident" date, but such a thing (by definition) cannot exist. You cannot have an uncertainty of zero unless you want a date infinitely far in the future. It's the nature of statistics: the distribution is actually infinite, but data is considerably more likely to cluster around the mean.
The only thing you can do is pick a really big confidence interval (99.7%). You are free to ignore the supporting statistical facts about the confidence interval and pretend it has zero uncertainty. For all practical purposes 0.3% uncertainty is small enough that you're not going to be unhappy with that date.
However, all statistically-based predictions of the future must have uncertainty. It's a law.
Is it better to describe improvements using percentages or just the differences in the numbers? For example, if you improved the performance of a critical ETL SQL query from 4000 ms to 312 ms, how would you present it as an 'Accomplishment' on a performance review?
In currency. Money is the most effective medium for communicating value, which is what you're trying to use the performance review to demonstrate.
Person hours saved, (very roughly) estimated value of $NEW_THING_THE_COMPANY_CAN_DO_AS_RESULT, future hardware upgrades averted, etc.
You get the nice bonus that you show that you're sensitive to the company's financial position; a geek who can align himself with what the company is really about.
Take potato
Drench Potato in Lighter Fluid
Light potato on fire
Hand potato to boss
Make boss hold it for 4 seconds.
Ask boss how long those 4 seconds felt
Ask boss how much better half a second would have been
Bask in glory
It is always better to measure relative improvement.
So, if you brought it down to 312 ms from 4000 ms, that is an improvement of 3688 ms, which is 92.2% of the original runtime. In other words, you reduced the runtime by 92.2%, bringing it down to only 7.8% of what it was originally.
Absolute numbers, on the other hand, usually are not that good since they are not comparable. (If your original runtime was 4,000,000ms then an improvement of 3688ms isn't that great.)
See this link for some nice chart suggestions.
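If you want to sanity-check that arithmetic, here is a tiny sketch using the timings from the question:

old_ms, new_ms = 4000, 312

reduction = (old_ms - new_ms) / old_ms   # fraction of the original runtime removed
remaining = new_ms / old_ms              # what is left of the original runtime
speedup = old_ms / new_ms                # "times faster" factor

print(f"runtime reduced by {reduction:.1%}")              # ~92.2%
print(f"new runtime is {remaining:.1%} of the original")  # ~7.8%
print(f"speedup: {speedup:.1f}x")                         # ~12.8x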
Comparison to Requirements
If I have requirements (response time, throughput), I like to color code the absolute numbers like so:
Green: <= 80% of the requirement (response time); >= 120% of the requirement (throughput)
No formatting: Meets the requirement.
Red: Does not meet the requirement.
Comparisons are interesting, but only if we have enough to see trends over time; Is our performance steadily improving or degrading? Ultimately, the business only cares if we're meeting the requirement. It's only when we don't that they ask for comparisons to previous releases.
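As an illustration (not a standard), here is a small sketch of that color coding; the function name and the 80%/120% thresholds simply mirror the rules above:

def rate(measured, required, kind):
    # kind="response_time": lower is better (green at <= 80% of the requirement).
    # kind="throughput":    higher is better (green at >= 120% of the requirement).
    if kind == "response_time":
        if measured <= 0.8 * required:
            return "green"
        return "ok" if measured <= required else "red"
    else:  # throughput
        if measured >= 1.2 * required:
            return "green"
        return "ok" if measured >= required else "red"

print(rate(312, 1000, "response_time"))  # green: well under a 1000 ms requirement
print(rate(900, 1000, "throughput"))     # red: misses a 1000 tx/s requirement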
Comparison of Benchmarks
If I'm comparing benchmarks to some baseline, then I like to use percentages, but only if the benchmark is a statistically significant change from the baseline.
Hardware Sizing
If I'm doing hardware sizing or capacity planning, then I like to express the performance as the absolute number plus the cost per transaction. For example:
System A: 1,000 transactions/second, $0.02/transaction
System B: 1,500 transactions/second, $0.04/transaction
Use whichever appears most impressive given the change. According to one method of calculation, that change sped up the query by 1,300% (i.e. it now runs at roughly 13x its original speed), which looks more impressive than saying "13x improvement", or
============= <-- old query
= <-- new query
Although the graph isn't a bad method.
If you can calculate the improvement in money, then go for that. One piece of software I wrote many years ago saved a few engineers a little bit of time each day. Figuring in the cost of salary, benefits, and overhead, it turned into savings of more than $12k per year for a small company.
-Adam
Rule of thumb: whichever sounds more impressive.
If you went from 10 tasks done in a period to 12, you could say you improved performance by 20%.
Saying you did two more tasks doesn't seem that impressive.
In your case, both numbers sound good, but try different representations and see what you get!
Sometimes graphics help a lot when the improvement is spread over a number of factors, but the combined number somehow does not look that impressive.
Example: You have 5 params A, B, C, D, E. You could make a bar chart with those 5 params and "before and after" values side by side for each param. That sure will look impressive.
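A minimal sketch of such a before/after bar chart; the parameter names and values here are made up:

import numpy as np
import matplotlib.pyplot as plt

params = ["A", "B", "C", "D", "E"]
before = [4.0, 3.2, 5.1, 2.4, 6.0]  # made-up "before" values
after = [1.1, 2.0, 1.8, 1.9, 2.5]   # made-up "after" values

x = np.arange(len(params))
width = 0.35
plt.bar(x - width / 2, before, width, label="before")
plt.bar(x + width / 2, after, width, label="after")
plt.xticks(x, params)
plt.ylabel("value (lower is better)")
plt.legend()
plt.show()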
God, I'm starting to sound like my friend from marketing!
runs away screaming
You can make numbers and graphs say anything you want - the important thing is to make them say something meaningful and relevant to the audience you're presenting them to. If it's end users, you can show them differences in the screen refreshes (something they understand); for managers, perhaps the reduced number of servers they'll need in order to support the application ($ savings); for finance, it's all about the $ - how much did it save them? A general rule: the less technical the group, the more graphical and dramatic you need to be.
Some people have suggested that when doing an estimate one should give a lower and an upper bound on the expected time to delivery. The few project tools I have seen seem to demand one fixed date. Are there any tools that support this concept of an estimation range?
Joel touts Evidence-Based Scheduling in his FogBugz 6.0 software.
There's also the classic method of providing a best, worst, and expected case estimate for each item and then computing a result:
computed_result = (b + 4e + w)/6
You can use that to demonstrate how you derived your estimates.
HOWEVER, if you provide a range of time, all the client/sponsor/stakeholder is going to see is the lowest value, no matter what you say. So keep the range secret, and advertise the computed result.
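A minimal sketch of that three-point calculation over a small task list; the task names and numbers below are invented:

def pert(best, expected, worst):
    # Classic three-point estimate: (b + 4e + w) / 6
    return (best + 4 * expected + worst) / 6

tasks = {
    "login screen": (2, 4, 9),    # (best, expected, worst) in days
    "report export": (1, 3, 6),
    "data import": (3, 5, 12),
}

total = sum(pert(*estimate) for estimate in tasks.values())
print(f"computed result: {total:.1f} days")  # advertise this, not the raw range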
I've used Merlin2 which is a project management product for the Mac. When you are starting a new project it asks you the start date and end date - which look fixed, but when you look at the project plan inspector you see that there is actually an "Earliest Date" and "Latest Date" for both the Start and End dates which can be edited. By default it adds the start date into "Earliest Start Date" and the end date to "Latest End Date" - and you can tweak as necessary.
"Some people have suggested that when doing an estimate one should make a lower and upper range on the expected time to delivery."
But what do your project stakeholders want? Will a range help them decide to fund your project?
Ranges don't really mean very much. Further, most people ignore the range and either see the low or the high number. Optimists have "happy-eyes", see the low number, and complain when you don't hit it, even if you're under the high number. Pessimists see the high number, say it's too big, and demand you replan the project to make the number smaller.
How -- precisely -- will a range help you? Who needs the range? What information will the range help them with? What decision do they have to make that requires a range?
I suggest that you plan each piece realistically.
Further, prioritize your project. After prioritizing, you'll see that there's some essential stuff, some important-but-not-essential stuff, and some optional stuff. This is your range. Cost to do the essential stuff is low. Cost to do important-but-not-essential is in the middle. Cost to do the optional stuff is high.
When someone asks you to "replan", you trim optional stuff.
It isn't a simplistic range. It's a realistic view of what you'll get done and what value it has.