How far should you go with estimating your tasks? [closed] - project-management

What is a good resolution of time to use for estimating the duration of your tasks?
Is it something like 0.5, 1, 2, 5 days, or should you go down to hours (0.5, 1, 2, 4) and then continue up into days?
Should a change to a label text be a task at all? (ETA < 1 min)
Suggestions?

Scrum typically uses the values 0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40 and 100. The unit is hours when you plan in finer detail (breaking feature requests down into technical tickets) and days when you plan the bigger picture (large features).
In general, if you estimate a ticket at more than 20 hours (or 20 days), it should be split into smaller pieces.

Well, it really depends. Personally, I like tasks to be smaller (estimated in hours, usually using something close to the Fibonacci series: 0.5, 1, 2, 3, 5, 8, ...).
Small tasks should usually be estimated too. Even minor changes require some work: writing tests, checking that none of the other ones broke, deploying to the server, etc. You could add a 15-minute value to the series for cases like these :)

It's really hard to predict the future.
The units (minutes, hours, days, weeks, fortnights) don't matter.
Pick a unit that makes your manager happy.
Just be clear that an estimate of 30 minutes, .5 hour or .0625 days is only a guess, not a fact.
An estimate of 0.0625 days or 30 minutes looks really precise because it has a lot of decimal places. However, any ambiguity about the requirements, the architecture, the language, the libraries, the unit tests, or anything else will make this number incorrect.
The very best you can hope for is that the average of all your estimates is reasonably close to the actual facts as they unfold. This means that half your estimates will be too low and half will be too high. It also means that some fraction of your estimates will be really, really far from your manager's hoped-for accuracy.

Planning and estimating time are never the goal of your project, so the units you choose must serve some other purpose.
A good rule is this: split the task into smaller chunks until you know exactly what you should do next (and that next thing is not more planning). "Knowing exactly what to do" is a little subjective, but tasks longer than 2 days rarely fit into this category.

I guess it depends on how accurate you want to be.
I personally use minutes, as "days" or "months" can be misleading time periods. For example, if you say something will take 1 day, does that mean 24 hours of solid work? Or 8 hours? Or the average 3-4 hours of productivity within a working day?
All tasks should be listed, but if they are small you can often group them. Remember that even if it's only changing a label text, there is more time involved than just changing the label: you have to find it, open the file, make the change, commit it, update any documentation, test it, etc. So tasks very rarely take less than a minute.
Also try to put an upper limit on task time. If you're adding tasks that take over 3-4 hours, break them down into smaller sub-tasks and list those. This will give you a much more accurate estimate.

I have found myself sorting tasks into these categories:
Half a day
Whole day
Whole week
Several weeks
In my work environment where we do a lot of second-level support, it doesn't make sense to have a more granular system. It is also good to have some slack in the planning, so that you can take an hour to make small improvements to your work environment.

I tend to use units of hours where appropriate (1h, 4.5h, 6h, etc). Once days creep into the equation, I drop hours and keep days as the only unit (3d, 7d, 10d). Overestimating slightly gives room to accommodate those undesired but expected complications you will hit along the way.

There's a difference between estimated work effort (how long something should take to complete) and estimated duration (when that something is expected to be finished, given the other tasks around it).
You can never predict the future, and any estimate is just that: an estimate. The likelihood of being able to predict a week out in half-hour increments isn't great; if the first 6 tasks each take 5 minutes longer than planned, you're half an hour off after only 3 hours of work.
I would suggest creating a work breakdown structure (WBS) to determine effort, and then grouping the tasks into chunks no smaller than a day and no larger than a week (depending on the overall duration of the project; you never want a single increment to represent more than 2-3% of the overall work effort). This allows programmers to switch between tasks (normal working conditions) without the pressure of having to hit a specific half-hour delivery.

Related

Algorithm development and optimization

I have this problem:
You need to develop a meal regime based on data entered by the user. We have a database of meals and their prices (meals are also marked as breakfast or not, because breakfast is usually different from lunch and dinner). The input is an amount of money (the currency is not important) and a number of days. The output must be a meal regime for the given number of days. Conditions:
The final price does not differ from the given one by more than 3%.
Meals mustn't repeat more than once every 5 days.
I came up with this inefficient solution: compute the average price per day = amount of money / number of days. Then, until we reach the given number of days, we iterate through each breakfast, then lunch and dinner (3 for loops, 2 of them nested), and if the price is not too different, we end the search and add this day to the result list. So the design now looks like this:
while (daysCounter < days) {
    for (...) {
        for (...) {
            for (...) {
            }
        }
    }
}
It looks scary, although there is not a lot of data (about 150 meals). I have a feeling there is a more efficient solution. I have also thought about dynamic programming, but so far I have no idea how to implement it.
Dynamic programming won't work because a necessary part of your state is the meals from the last 5 days. And the number of possibilities for that are astronomical.
However it is likely that there are many solutions, not just a few. And that it is easy to find a working solution by being greedy. Also that an existing solution can be improved fairly easily.
So I'd solve it like this. Replace days with an array of meals. Take the total you want to spend. Divide it up among your meals, in proportion to the median price of the options for that meal. (Breakfast is generally cheaper than dinner.) And now add up that per meal cost to get running totals.
And now for each meal choose the meal you have not had in the last 5 days that brings the running total of what has been spent as close as possible to the ideal total. Choose all of your meals, one at a time.
This is the greedy approach. Normally, but not always, it will come fairly close to the target.
And now, to a maximum of n tries or until you have the target within 3%, pick a random meal, consider all options that are not eaten within the last or next 5 days, and randomly pick one (if any such options exist) that brings the overall amount spent closer to the target.
(For a meal plan that is likely to vary more over a long period, I'd suggest trying simulated annealing. That will produce more interesting results, but it will also take longer.)
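For what it's worth, here is a minimal Python sketch of that greedy-plus-random-improvement idea (the function signature and the dict-of-prices data layout are my own assumptions, not part of the answer above):

import random
from statistics import median

def plan_meals(budget, days, breakfasts, others, gap=5, tries=10000, tol=0.03):
    """Greedy meal plan with random improvement.

    breakfasts / others: dicts mapping meal name -> price.
    A meal may not be reused within `gap` days of its previous use.
    Returns a list of (breakfast, lunch, dinner) name tuples, or None if stuck.
    """
    # One slot per meal: day 0 breakfast, day 0 lunch, day 0 dinner, day 1 breakfast, ...
    slot_options = []
    for _ in range(days):
        slot_options += [breakfasts, others, others]

    # Spread the budget over the slots in proportion to the median price of each slot's options.
    medians = [median(opts.values()) for opts in slot_options]
    scale = budget / sum(medians)
    ideal, running = [], 0.0
    for m in medians:
        running += m * scale
        ideal.append(running)           # ideal cumulative spend after each slot

    last_used = {}                      # meal name -> last day it was served
    plan, spent = [], 0.0
    for i, opts in enumerate(slot_options):
        day = i // 3
        allowed = [(n, p) for n, p in opts.items()
                   if n not in last_used or day - last_used[n] >= gap]
        if not allowed:
            return None
        # Greedy: pick the meal that keeps the cumulative spend closest to the ideal curve.
        name, price = min(allowed, key=lambda np: abs(spent + np[1] - ideal[i]))
        plan.append(name)
        spent += price
        last_used[name] = day

    # Random improvement: swap single meals while that moves the total toward the budget.
    prices = {**breakfasts, **others}
    for _ in range(tries):
        if abs(spent - budget) <= tol * budget:
            break
        i = random.randrange(len(plan))
        day = i // 3
        nearby = {plan[j] for j in range(len(plan)) if j != i and abs(j // 3 - day) < gap}
        better = [n for n in slot_options[i] if n not in nearby and
                  abs(spent - prices[plan[i]] + prices[n] - budget) < abs(spent - budget)]
        if better:
            pick = random.choice(better)
            spent += prices[pick] - prices[plan[i]]
            plan[i] = pick

    return [tuple(plan[d * 3:d * 3 + 3]) for d in range(days)]

Calling plan_meals(1000, 7, breakfasts, others) would return a week's plan whose total lands within 3% of 1000, or None if the 5-day constraint cannot be satisfied with the given menu.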

How to Schedule Tasks

There is a carwash that can only service one customer at a time. The goal of the carwash is to have as many happy customers as possible by making them wait the least amount of time in the queue. If customers are serviced within 15 minutes they are joyful, within an hour happy, between 1 and 3 hours neutral, and between 3 and 8 hours angry. (The goal is to minimize angry people and maximize happy people.) The only caveat is that each car takes a different amount of time to wash and service, so we cannot always serve on a first-come, first-served basis given the goal of maximizing customer utility. So it may look like this:
Customer Line :
Customer1) Task:6 Minutes (1st arrival)
Customer2) Task:3 Minutes (2nd arrival)
Customer3) Task:9 Minutes (3rd)
Customer4) Task:4 Minutes (4th)
Service Line:
Serve Customer 2, Serve Customer 1, Serve Customer 4, Serve Customer 3.
In this way, no one waited in line for more than 15 minutes before being served. I know I should use some form of priority queue to implement this, but I honestly don't know how I should give priority to different customers. I cannot give priority to customers with the least amount of work needed, since they may be the last to have arrived (the others would be very angry), and I cannot just compare based on arrival time, since the first guy might have a task that takes the whole day. So how would I compare these customers to each other with the goal of maximizing happiness?
Cheers
This is similar to ordering calls to a call center. It becomes more interesting when you have gold and silver customers. Anyway:
Every car has a readyTime (the earliest time it can be washed, so its arrival time, which may differ per car) and a dueTime (the moment the customer becomes angry, so 3 hours after readyTime).
Then use a constraint solver (like OptaPlanner) to determine the order of the cars (*). The order of the cars (which is a genuine planning variable) implies the startWashingTime of each car (which is a shadow variable), because in your example, if customer 1 is ordered after customer 2 and if we start at 08:00, we can deduce that customer 1's startWashingTime is 08:03.
Then the endWashingTime is startWashingTime + washingDuration.
Then it's just a matter of adding 2 constraints and let the solver solve() it:
endWashingTime must be lower than dueTime; this is a hard constraint, so that there are no angry customers.
endWashingTime must be lower than readyTime plus 15 minutes; this is a soft constraint, to maximize joyful customers.
(*) This problem is NP-complete or NP-hard because you can relax it to a knapsack problem. In practice this means: you can't write an algorithm for it that scales out and finds the optimal solution in reasonable time. But a constraint solver can get you close.
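The solver does the searching, but the scoring described above is easy to write down by hand. Here is a rough Python sketch (not OptaPlanner code; the names are mine) that derives each car's start and end washing times from a candidate order and counts the hard and soft violations:

from dataclasses import dataclass

@dataclass
class Car:
    name: str
    ready_time: int        # minutes after opening when the car arrives
    wash_minutes: int

def score(order, joyful_limit=15, due_limit=180):
    """Return (hard, soft) violation counts for one candidate ordering.

    hard = customers finished after ready_time + due_limit (angry),
    soft = customers finished after ready_time + joyful_limit (not joyful).
    Lower is better on both counts.
    """
    now = 0
    hard = soft = 0
    for car in order:
        start = max(now, car.ready_time)        # can't start before the car arrives
        end = start + car.wash_minutes
        if end > car.ready_time + due_limit:
            hard += 1
        if end > car.ready_time + joyful_limit:
            soft += 1
        now = end
    return hard, soft

# The example from the question, all cars arriving at time 0:
cars = [Car("c1", 0, 6), Car("c2", 0, 3), Car("c3", 0, 9), Car("c4", 0, 4)]
print(score([cars[1], cars[0], cars[3], cars[2]]))   # (0, 1): nobody angry, three joyful

A solver (or even brute force over all orderings when there are only a handful of cars) would then search for the order that minimizes hard violations first and soft violations second.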

slot machine payout calculation

There's this question but it has nothing close to help me out here.
I tried to find information about it on the internet, but this subject is so swamped with articles on "how to win" and other unrelated stuff that I could barely find anything worth posting here.
My question is how would I assure a payout of 95% over a year?
Theoretically, of course.
So far I can think of three obvious variables to consider within the calculation: Machine payout term (year in my case), total paid and total received in that term.
Now I could simply pick a random number within the paid/received gap and fix the slot results shown to the player, but I'm not sure this is how it's done.
That method sounds reasonable, although it involves building the slot results backwards.
I could also make a huge list of all possibilities, save them in a database in randomized order, and simply poll one of them each time.
This has many flaws; the biggest one is the huge list I'd end up with (millions or billions of records).
I certainly hope this question will be marked with an "Answer" (:
You have to make reel strips instead of a huge database. Here is a brief example for a very basic 3-reel game with 3 symbols:
Paytable:
3xA = 5
3xB = 10
3xC = 20
A reel strip is the sequence of symbols on a reel. For the calculations you only need the quantity of each symbol on each reel:
A = 3, 1, 1 (3 symbols on 1st reel, 1 symbol on 2nd, 1 symbol on 3rd reel)
B = 1, 1, 2
C = 1, 1, 1
Full cycle (total number of all possible combinations) is 5 * 3 * 4 = 60
Now you can calculate probability of each combination:
3xA = 3 * 1 * 1 / full cycle = 0.05
3xB = 1 * 1 * 2 / full cycle = 0.0333
3xC = 1 * 1 * 1 / full cycle = 0.0166
Then you can calculate the return for each combination:
3xA = 5 * 0.05 = 0.25 (25% from AAA)
3xB = 10 * 0.0333 = 0.333 (33.3% from BBB)
3xC = 20 * 0.0166 = 0.333 (33.3% from CCC)
Total return = 91.66%
Finally, you can shuffle the symbols on each reel to get the reel strips, e.g. "ABACA" for the 1st reel. Then pick a random number between 1 and the length of the strip, e.g. 1 to 5 for the 1st reel. This number gives the middle symbol; the upper and lower ones are its neighbours on the strip. If you picked an edge position, wrap around to the other end of the strip (it's a virtual reel). Then score the result.
In real life you might want to have Wild-symbols, free spins and bonuses. They all are pretty complicated to describe in this answer.
In this sample the hit frequency is 10% (total combinations = 60, prize combinations = 6). Most people use Excel to calculate this stuff; however, you may find some good tools for slot math.
Proper keywords for Google: PAR-sheet, "slot math can be fun" book.
For sweepstakes or Class II machines you can't use this approach. You have to pick a combination to display for a given prize instead. That is a rather different task, so you may want to prepare a database of combinations sorted by prize amount.
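If you would rather check this arithmetic in code than in Excel, here is a small Python sketch that reproduces the example above (the symbol counts and paytable are the ones from this answer; the rest is my own scaffolding):

from math import prod

reels = {"A": (3, 1, 1), "B": (1, 1, 2), "C": (1, 1, 1)}   # symbol counts per reel
paytable = {"A": 5, "B": 10, "C": 20}                      # prize for three of a kind

reel_lengths = [sum(counts[i] for counts in reels.values()) for i in range(3)]
full_cycle = prod(reel_lengths)                            # 5 * 3 * 4 = 60

rtp = 0.0
prize_combos = 0
for symbol, counts in reels.items():
    combos = prod(counts)                                  # ways to line up 3 of this symbol
    prize_combos += combos
    rtp += paytable[symbol] * combos / full_cycle

print("full cycle    =", full_cycle)                           # 60
print("total return  =", round(rtp, 4))                        # 0.9167 (91.66%)
print("hit frequency =", round(prize_combos / full_cycle, 4))  # 0.1 (10%)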
Well, the first problem is the keyword assure: if you are dealing with randomness, you cannot assure anything unless you change the logic of the slot machine.
Consider the following algorithm, though. I think this style of thinking is more reliable than plotting graphs of averages to achieve 95%:
if (customer_able_to_win())
{
    calculate_how_to_win();
}
else
{
    no_win();
}
customer_able_to_win() consults your data log of how much you have taken in versus how much you have paid out; if you are under 95%, it returns true. In that case, calculate_how_to_win() works out how much the customer could win based on your percentage. Let's choose a sampling period of 24 hours: if over the last 24 hours I've paid out 90% of the money I've taken in, then I can pay out up to another 5%. Let's say that 5% is $100. So calculate_how_to_win() says I can pay out up to $100, I find a set of reels that would pay out $100 or less, and that user could win. You could add a little randomness to it, but to ensure your 95% you'll need some other rules, such as a forced maximum payout if you drop below, say, 80%, and so on.
If you change the algorithm a little by adding randomness to the mix, you will need more of these caveats. So to make it APPEAR random to the user, you could do:
if (customer_able_to_win() && payout_percent() < 90%)
{
    calculate_how_to_win(); // up to 5% payout
}
else
{
    no_win();
}
With something like that, it will go on a losing streak after you hit 95% until you reach 90%, then it will go on a winning streak of random increments until you reach 95%.
This isn't a full algorithm answer, but more of a direction on how to think about how the slot machine works.
I've always envisioned this as the way slot machines work, especially video poker: the no_win() function would calculate how to lose, but make the result appear to be one card off, teasing you into thinking you were about to win, instead of running a 'fair' game where the randomness just happens to look like that.
Think of the entire process as: first decide whether the player is going to win and, if so, how; if they're not going to win, decide how they are going to lose, instead of letting random number generators determine whether they win or not.
I worked many years ago for an internet casino in Australia, this one being the only one in the world that was regulated completely by a government body. The algorithms you speak of that produce "structured randomness" are obviously extremely complex especially when you are talking multiple lines in all directions, double up, pick the suit, multiple progressive jackpots and the like.
Our poker machine laws for our state demand a payout of 97% of what goes in. For the regulator to be satisfied that our machine did this, they made us run 10 million mock turns of the machine and then wanted to see that our game paid out at what the law states, with the tiniest margin of error (we had many, many machines running a script that auto-played, simulating the clicks, for about a week before we hit the 10 million).
Anyhow, the algorithms you speak of are EXPENSIVE! They range from maybe $500k to several million per machine, so as you can understand, no one is going to hand them over for free, that's for sure. If you wanted a single-line machine it would be easy enough to do: just work out your symbols/cards and what pay structure you want for each. Then you could distribute those payouts amongst non-payouts until you got your respective figure. Obviously, the more options there are, the longer it will take to pay out at that respective rate; it may even pay out more early in the piece. Hit frequency and prize size are also factors you may want to consider.
A simple way to do it, if you assume that people win a constant number of times per time period:
Create a collection of all possible tumbler combinations with how much each one pays out.
The first time someone plays, in that time period, you can offer all combinations at equal probability.
If they win, take that amount off the total left for the time period, and remove from the available options any combination that would payout more than you have left.
Repeat with the reduced combinations until all the money is gone for that time period.
Reset and start again for the next time period.
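A toy Python sketch of that last approach, assuming you already have the list of combinations with their payouts (the names and numbers below are made up for illustration):

import random

def spin(combos, remaining):
    """Pick a combination uniformly from those the remaining budget can still cover.

    combos: list of (combination, payout) pairs; remaining: money left this period.
    """
    affordable = [(c, p) for c, p in combos if p <= remaining]
    combo, payout = random.choice(affordable)
    return combo, payout, remaining - payout

# Toy period: a 25-unit budget and a few combinations (payout 0 = losing spin).
combos = [("AAA", 5), ("BBB", 10), ("CCC", 20), ("mixed", 0)]
remaining = 25
for _ in range(5):
    combo, payout, remaining = spin(combos, remaining)
    print(combo, payout, "left:", remaining)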

How to notice unusual news activity

Suppose you were able to keep track of the news mentions of different entities, like say "Steve Jobs" and "Steve Ballmer".
What are some ways you could tell whether the number of mentions per entity in a given time period was unusual relative to its normal frequency of appearance?
I imagine that for a more popular person like Steve Jobs an increase of, say, 50% might be unusual (an increase from 1000 to 1500), while for a relatively unknown CEO an increase of 1000% for a given day could be possible (an increase from 2 to 200). If you didn't have a way of scaling that, your unusualness index could be dominated by unheard-ofs getting their 15 minutes of fame.
update: To make it clearer, it's assumed that you are already able to get a continuous news stream and identify entities in each news item and store all of this in a relational data store.
You could use a rolling average. This is how a lot of stock trackers work: by tracking the last n data points, you can see whether a change is substantial relative to the usual variance.
You could also try some normalization. One very simple scheme: each entity has a total number of mentions (m), a percent change from the last time period (δ), and a normalized value (z) where z = m * δ. Let's look at the table below (m0 is the previous value of m):
Name            m      m0     δ      z
Steve Jobs      4950   4500   0.10   495
Steve Ballmer    400    300   0.33   132
Larry Ellison     50     10   4.00   200
Andy Nobody       50     40   0.25   12.5
Here, a 400% change for the relatively unknown Larry Ellison results in a z value of 200, a 10% change for the much better known Steve Jobs gives 495, and my 25% spike is still a low 12.5. You could tweak this algorithm depending on what you feel are good weights, or use standard deviation or the rolling average to find whether a value is far away from the "expected" results.
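In code, that normalization is a one-liner per entity. A quick Python sketch using the table's numbers (the helper name is mine):

def z_score(m, m_prev):
    """Normalized 'unusualness': percent change weighted by current volume (z = m * delta)."""
    delta = (m - m_prev) / m_prev          # fractional change from the previous period
    return m * delta

mentions = {                                # name: (current period, previous period)
    "Steve Jobs":    (4950, 4500),
    "Steve Ballmer": (400, 300),
    "Larry Ellison": (50, 10),
    "Andy Nobody":   (50, 40),
}
for name, (m, m_prev) in mentions.items():
    print(f"{name:14s} z = {z_score(m, m_prev):6.1f}")    # 495.0, 133.3, 200.0, 12.5

The 133.3 for Ballmer versus 132 in the table is only because the table rounds δ to 0.33.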
Create a database and keep a history of stories with timestamps. You then have a history of stories over time for each category of news item you're monitoring.
Periodically calculate the number of stories per unit of time (you choose the unit).
Test whether the current value is more than X standard deviations away from the historical data.
Some data will be more volatile than others, so you may need to adjust X appropriately. X = 1 is a reasonable starting point.
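A minimal Python sketch of that test, assuming you already have the per-period counts for one entity (everything except the X-standard-deviations rule is my own scaffolding):

from statistics import mean, stdev

def is_unusual(history, current, x=1.0):
    """True if `current` lies more than x standard deviations above the historical mean.

    history: story counts per unit of time for one entity (needs at least two points).
    """
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and current > mu + x * sigma

# A normally quiet CEO suddenly gets 200 mentions in one day:
print(is_unusual([2, 3, 1, 4, 2, 3], 200, x=1.0))   # True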
Way oversimplified:
Store people's names and the number of articles created in the past 24 hours mentioning them. Compare to historical data.
Real life:
If you're trying to dynamically pick out people's names, how would you go about doing that? Searching through articles how do you grab names? Once you grab a new name, do you search for all articles for him? How do you separate out Steve Jobs from Apple from Steve Jobs the new star running back that is generating a lot of articles?
If you're looking for simplicity, create a table with 50 people's names that you actually insert. Every day at midnight, have your program run a quick google query for past 24 hours and store the number of results. There are a lot of variables in this though that we're not accounting for.
The method you use is going to depend on the distribution of the counts for each person. My hunch is that they are not going to be normally distributed, which means that some of the standard approaches to longitudinal data might not be appropriate - especially for the small-fry, unknown CEOs you mention, who will have data that are very much non-continuous.
I'm really not well-versed enough in longitudinal methods to give you a solid answer here, but here's what I'd probably do if you locked me in a room to implement this right now:
Dig up a bunch of past data. Hard to say how much you'd need, but I would basically go until it gets computationally insane or the timeline gets unrealistic (not expecting Steve Jobs references from the 1930s).
In preparation for creating a simulated "probability distribution" of sorts (I'm using terms loosely here), more recent data needs to be weighted more than past data - e.g., a thousand years from now, hearing one mention of (this) Steve Jobs might be considered a noteworthy event, so you wouldn't want to be using expected counts from today (Andy's rolling mean is using this same principle). For each count (day) in your database, create a sampling probability that decays over time. Yesterday is the most relevant datum and should be sampled frequently; 30 years ago should not.
Sample out of that dataset using the weights and with replacement (i.e., same datum can be sampled more than once). How many draws you make depends on the data, how many people you're tracking, how good your hardware is, etc. More is better.
Compare your actual count of stories for the day in question to that distribution. What percent of the simulated counts lie above your real count? That's roughly (god don't let any economists look at this) the probability of your real count or a larger one happening on that day. Now you decide what's relevant - 5% is the norm, but it's an arbitrary, stupid norm. Just browse your results for a while and see what seems relevant to you. The end.
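As a rough illustration of the weight-sample-compare procedure just described, here is a Python sketch (the exponential decay and all parameter values are my own choices, not the answer's):

import random

def unusualness(history, current, draws=10000, decay=0.99):
    """Sample past daily counts with recency-decaying weights (with replacement)
    and return the fraction of sampled counts at or above today's count.
    A small value means today's count would rarely arise from the weighted past."""
    n = len(history)
    weights = [decay ** (n - 1 - i) for i in range(n)]     # yesterday weighs the most
    sample = random.choices(history, weights=weights, k=draws)
    return sum(1 for s in sample if s >= current) / draws

# A historically quiet name spikes to 40 mentions today:
past = [2, 3, 1, 0, 2, 5, 3, 2, 4, 1] * 30                 # ~300 days of counts
print(unusualness(past, 40))                                # ~0.0, i.e. very unusual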
Here's what sucks about this method: there's no trend in it. If Steve Jobs had 15,000 a week ago, 2000 three days ago, and 300 yesterday, there's a clear downward trend. But the method outlined above can only account for that by reducing the weights for the older data; it has no way to project that trend forward. It assumes that the process is basically stationary - that there's no real change going on over time, just more and less probable events from the same random process.
Anyway, if you have the patience and willpower, check into some real statistics. You could look into multilevel models (each day is a repeated measure nested within an individual), for example. Just beware of your parametric assumptions... mention counts, especially on the small end, are not going to be normal. If they fit a parametric distribution at all, it would be in the Poisson family: the Poisson itself (good luck), the overdispersed Poisson (aka negative binomial), or the zero-inflated Poisson (quite likely for your small-fry, no chance for Steve).
Awesome question, at any rate. Lend your support to the statistics StackExchange site, and once it's up you'll be able to get a much better answer than this.

How do I estimate task size for an open source project?

The scale of an open source project is completely different from the projects I do at the office. Work is done in spare time, it's volunteer work that may not materialize, the development resources are personal rather than corporate, etc.
Clearly the chestnut "do the smallest thing that works" applies, but beyond that, are there any more formal methods of estimating the appropriate size for an open source project, for example number of tables, number of web pages, or (heaven forbid) function point counting?
What estimation tools would work best for these sorts of projects?
I was recently asked to estimate how long it would take to build an enormous system just by looking at screenshot mockups. Management was asking for a gut feel in under an hour, without my asking any questions.
I listed out all the modules (pages, reports, big queries, etc.) that I could see and started giving them relative estimates. e.g.:
Task 1: 8 units
Task 2: 16 units
Task 3: 4 units
Then I added a bunch of modules we had already done for this customer along with the relative number of units and actual number of hours/days. This told me what my ratio of units to hours was so I could guess (more than estimate) how long the unknown tasks should take. For example, if I found that an 8 unit task took us 16 hours in the past (2 hours/unit), I'd estimate that the above tasks might take:
Task 1: 8 units * 2 hours/unit = 16 hours
Task 2: 16 units * 2 hours/unit = 32 hours
Task 3: 4 units * 2 hours/unit = 8 hours
This approach enabled me to methodically consider the work to be done and apply some structure around guessing how long it would take to implement.
Of course I delivered my +/- guess with a generous disclaimer.
Then, if you want a calendar schedule from this, estimate how many hours per week you will work on the project and see what you come up with.
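The arithmetic is trivial, but as a Python sketch (the function names are mine), the whole technique boils down to a ratio taken from completed work and applied to new relative estimates:

def hours_per_unit(completed):
    """completed: list of (relative units, actual hours) for modules already delivered."""
    total_units = sum(u for u, _ in completed)
    total_hours = sum(h for _, h in completed)
    return total_hours / total_units

def estimate(tasks, ratio):
    """tasks: dict of task name -> relative units; returns estimated hours per task."""
    return {name: units * ratio for name, units in tasks.items()}

# Worked example from the answer: a past 8-unit module took 16 hours -> 2 hours/unit.
ratio = hours_per_unit([(8, 16)])
print(estimate({"Task 1": 8, "Task 2": 16, "Task 3": 4}, ratio))
# {'Task 1': 16.0, 'Task 2': 32.0, 'Task 3': 8.0}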
