How to manage transactions, debt, interest and penalty? - algorithm

I am making a BI system for a bank-like institution. This system should manage credit contracts, invoices, payments, penalties and interest.
Now, I need to make a method that builds an invoice. I have to calculate how much the customer has to pay right now. He has a debt, which he has to pay for. He also has to pay for the interest. If he was ever late with due payment, penalties are applied for each day he's late.
I thought there were 2 ways of doing this:
By having only 1 original state - the contract's original state. And each time to compute the monthly payment which the customer has to make, consider the actual, made payments.
By constantly making intermediary states, going from the last intermediary state, and considering only the events that took place between the time of these 2 intermediary states. This means having a job that performs periodically (daily, monthly), that takes the last saved state, apply the changes (due payments, actual payments, changes in global constans like the penalty rate which is controlled by the Central Bank), and save the resulting state.
The benefits of the first variant:
Always actual. If changes were made with a date from the past (a guy came with a paid invoice 5 days after he made the payment to the bank), they will be correctly reflected in the results.
The flaws of the first variant:
Takes long to compute
Documents printed with the current results may differ if the correct data changes due to operations entered with a back date.
The benefits of the second variant:
Works fast, and aggregated data is always available for search and reports.
Simpler to compute
The flaws of the second variant:
Vulnerable to failed jobs.
Errors in the past propagate until the end, to the final results.
An intermediary result cannot be changed if new data from past transactions arrives (it can, but it's hard, and with many implications, so I'd rather mark it as Tabu)
Jobs cannot be performed successfully and without problems if an unfinished transaction exists (an issued invoice that wasn't yet paid)
Is there any other way? Can I combine the benefits from these two? Which one is used in other similar systems you've encountered? Please share any experience.

Problems of this nature are always more complicated than they first appear. This
is a consequence of what I like to call the Rumsfeldian problem of the unknown unknown.
Basically, whatever you do now, be prepared to make adjustments for arbitrary future rules.
This is a tough proposition. some future possibilities that may have a significant impact on
your calculation model are back dated payments, adjustments and charges.
Forgiven interest periods may also become an issue (particularly if back dated). Requirements
to provide various point-in-time (PIT) calculations based on either what was "known" at
that PIT (past view of the past) or taking into account transactions occurring after the reference PIT that
were back dated to a PIT before the reference (current view of the past). Calculations of this nature can be
a real pain in the head.
My advice would be to calculate from "scratch" (ie. first variant). Implement optimizations (eg. second variant) only
when necessary to meet performance constraints. Doing calculations from the beginning is a compute intensive
model but is generally more flexible with respect to accommodating unexpected left turns.
If performance is a problem but the frequency of complicating factors (eg. back dated transactions)
is relatively low you could explore a hybrid model employing the best of both variants. Here you store the
current state and calculate forward
using only those transactions that posted since the last stored state to create a new current state. If you hit a
"complication" re-do the entire account from the
beginning to reestablish the current state.
Being able to accommodate the unexpected without triggering a re-write is probably more important in the long run
than shaving calculation time right now. Do not place restrictions on your computation model until you have to. Saving
current state often brings with it a number of built in assumptions and restrictions that reduce wiggle room for
accommodating future requirements.

Related

Parallel Solving with PDPTW in OptaPlanner

I'm trying to increase OptaPlanner performance using parallel methods, but I'm not sure of the best strategy.
I have PDPTW:
vehicle routing
time-windowed (1 hr windows)
pickup and delivery
When a new customer wants to add a delivery, I'm trying to figure out a fast way (less than a second) to show them what time slots are available in a day (8am, 9am, 10am, etc). Each time slot has different score outcomes. Some are very efficient and some aren't bookable depending on the time/situation with increased drive times.
For performance, I don't want to try each of the hour times in sequence as it's too slow.
How can I try the customer's delivery across all the time slots in parallel? It would make sense to run the solver first before adding the customer's potential delivery window and then share that solved original state with all the different added delivery's time slots being solved independently.
Is there an intuitive way to do this? Eg:
Reuse some of the original solving computation (the state before adding the new delivery). Maybe this can even be cached ahead of time?
Perhaps run all the time slot solving instances on separate servers (or at least multiple threads).
What is the recommended setup for something like this? It would be great to return an HTTP response within a second. This is for roughly 100-200 deliveries and 10-20 trucks.
Thanks!
A) If you optimize the assignment of 1 customer to 1 index in 1 of the vehicles, while pinning all other already assigned customers, then you forgoing all optimization benefits. It's not NP-hard.
You can still use OptaPlanner <constructionHeuristic/> for this (<localSearch/> won't improve the score), with or without moveThreadCount to spread it across cores, even though the main benefit will just the the incremental score calculation, not the AI algoritms.
B) Optimize assignment of all customers to an index of a vehicle. The real business benefits - like 25% less driving time - come when adding a new customer allows moving existing customer assignments too. The problem is that those existing customers already received a time window they blocked out in their agenda. But that doesn't need to be a problem if those time windows are wide enough: those are just hard constraints. Wider time windows = more driving time optimization opportunities (= more $$$, less CO² emissions).
What about the response within one minute?
At that point, you don't need to publish (= share info with the customer) which vehicle will come at which time in which order. You only need to publish whether or not you accept the time window. There's two ways to accomplish this:
C) Decision table based (a relaxation): no more than 5 customers per vehicle per day.
Pitfall: if it gets 5 customers in the 5 corners of the country/state, then it might still be infeasible. Factor in the average eucledean distance between any 2 customer location pairs to influence the decision.
D) By running optaplanner until termination feasible=true, starting from a warm start of the previous schedule. If no such feasible solution is found within 1000ms, reject the time window proposal.
Pitfall with D): if 2 requests come in at the same time, and you run them in parallel, so neither takes into account the other one, they could be feasible individually but infeasible together.

Calculating results in a scalable way based on transaction data in web app?

Possible duplicate:
Database design: Calculating the Account Balance
I work with a web app which stores transaction data (e.g. like "amount x on date y", but more complicated) and provides calculation results based on details of all relevant transactions[1]. We are investing a lot of time into ensuring that these calculations perform efficiently, as they are an interactive part of the application: i.e. a user clicks a button and waits to see the result. We are confident, that for the current levels of data, we can optimise the database fetching and calculation to complete in an acceptable amount of time. However, I am concerned that the time taken will still grow linearly as the number of transactions grow[2]. I'd like to be able to say that we could handle an order of magnitude more transactions without excessive performance degradation.
I am looking for effective techniques, technologies, patterns or algorithms which can improve the scalability of calculations based on transaction data.
There are however, real and significant constraints for any suggestion:
We currently have to support two highly incompatible database implementations, MySQL and Oracle. Thus, for example, using database specific stored procedures have roughly twice the maintenance cost.
The actual transactions are more complex than the example transaction given, and the business logic involved in the calculation is complicated, and regularly changing. Thus having the calculations stored directly in SQL are not something we can easily maintain.
Any of the transactions previously saved can be modified at any time (e.g. the date of a transaction can be moved a year forward or back) and calculations are expected to be updated instantly. This has a knock-on effect for caching strategies.
Users can query across a large space, in several dimensions. To explain, consider being able to calculate a result as it would stand at any given date, for any particular transaction type, where transactions are filtered by several arbitrary conditions. This makes it difficult to pre-calculate the results a user would want to see.
One instance of our application is hosted on a client's corporate network, on their hardware. Thus we can't easily throw money at the problem in terms of CPUs and memory (even if those are actually the bottleneck).
I realise this is very open ended and general, however...
Are there any suggestions for achieving a scalable solution?
[1] Where 'relevant' can be: the date queried for; the type of transaction; the type of user; formula selection; etc.
[2] Admittedly, this is an improvement over the previous performance, where an ORM's n+1 problems saw time taken grow either exponentially, or at least a much steeper gradient.
I have worked against similar requirements, and have some suggestions. Alot of this depends on what is possible with your data. It is difficult to make every case imaginable quick, but you can optimize for the common cases and have enough hardware grunt available for the others.
Summarise
We create summaries on a daily, weekly and monthly basis. For us, most of the transactions happen in the current day. Old transactions can also change. We keep a batch and under this the individual transaction records. Each batch has a status to indicate if the transaction summary (in table batch_summary) can be used. If an old transaction in a summarised batch changes, as part of this transaction the batch is flagged to indicate that the summary is not to be trusted. A background job will re-calculate the summary later.
Our software then uses the summary when possible and falls back to the individual transactions where there is no summary.
We played around with Oracle's materialized views, but ended up rolling our own summary process.
Limit the Requirements
Your requirements sound very wide. There can be a temptation to put all the query fields on a web page and let the users choose any combination of fields and output results. This makes it very difficult to optimize. I would suggest digging deeper into what they actually need to do, or have done in the past. It may not make sense to query on very unselective dimensions.
In our application for certain queries it is to limit the date range to not more than 1 month. We have in aligned some features to the date-based summaries. e.g. you can get results for the whole of Jan 2011, but not 5-20 Jan 2011.
Provide User Interface Feedback for Slow Operations
On occasions we have found it difficult to optimize some things to be shorter than a few minutes. We shirt a job off to a background server rather than have a very slow loading web page. The user can fire off a request and go about their business while we get the answer.
I would suggest using Materialized Views. Materialized Views allow you to store a View as you would a table. Thus all of the complex queries you need to have done are pre-calculated before the user queries them.
The tricky part is of course updating the Materialized View when the tables it is based on change. There's a nice article about it here: Update materialized view when urderlying tables change.
Materialized Views are not (yet) available without plugins in MySQL and are horribly complicated to implement otherwise. However, since you have Oracle I would suggest checking out the link above for how to add a Materialized View in Oracle.

Scrum in a fixed cost project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I have read the agile manifesto and spend a nice day surfing the web in search for this elusive answer. But sadly I did not get an answer that would cover all the bases.
When watching all the blog posts and newscasts of Agile preachers, you just hear about open scope or open "time" projects. How do you apply this to a fix cost project?
From what I found out the biggest problem is scope management. How do you determine if something is not inside the projected scope and how do you formulate arguments for your decision? Because of the agile way you are implementing your software there is no detailed design to argue upon. In most cases you only have a vague wish-list that the customer hands to you. And is so general that you can interpret any feature into it.
And with the rising percentage of fixed-cost projects this seams to me to be a real issue.
So the questions would be:
How do you manage scope in a fix cost project?
How do you determine if the features wished for, are outside the original scope?
To me, the short answer about Agile and fixed price is that you can't do it, at least not with a fixed scope.
I know some people will say "that's not true, we are doing it" but, with all due respect, I don't think they are really doing Agile and I'll explain why. Actually the explanation is quite simple: fixed price implies fixed scope and is based on predictability where Agile is all about variable scope, scope management and adaptivity. So fixed price with fixed scope is basically the opposite of Agile.
With an Agile approach, fixed price gives you a number of iterations for a given team size. During these iterations, the customer will be able to have the team build the most valuable features first and thus to maximize the generated business value. The whole idea is then to stop iterating when the cost of an iteration is greater than the generated value. This is how Agile works.
So when people says they do fixed price with fixed scope in an agile way, they actually introduce some constraints that are not really compatible with the Agile theory - like doing an up-front estimation of a given set of features and freezing these features and estimations - and they loose important advantages of Agile (unless they have a perfect knowledge of the technologies and of the business domain and master them enough to predict everything but I know few projects that are like this).
Here is anyway a good compilation of various Agile contracts: 10 Contracts for your next Agile Software Project that might be helpful. But I think they all require some education of customers, especially the one that are used to fixed price with fixed scope (and late deliveries).
Scrum does not replace having proper requirements, or even having occasional major releases or milestones. Rather, it gives you a means to keep your team productive and focused, and avoids the time-wasting side-effects of a waterfall process.
In fact, one of the biggest advantages of an agile process like Scrum is that it causes you to "fail quickly and loudly" on problematic areas of your project. If, after a couple of sprints, your team still can't effectively estimate the time and resources needed to implement a particular feature, it may be worth pushing back on the requirements in that area -- they may need to be clarified, simplified, or scrapped altogether. In a traditional waterfall process, however, those "problem features" can often be pushed back to the last possible minute, resulting in the usual deathmarch and under-delivery into which most projects devolve.
However, the role of the Product Owner is even more critical in teams using Scrum who have a large set of requirements. Left to their own devices, most development teams will focus on the most interesting/fun/geeky features (service APIs, caching, search) first, and leave the "messy" stuff like payment process, UX design, and i18n until the last minute. A strong user voice is essential to making sure those features critical to the end user receive their fair share of attention.
Okay, this will not be the ideal answer you are looking for, but may help non-the-less.
For your first point:
With agile, and Scrum in particular, the style is suited toward changing specifications and unfixed deadlines using iteration patterns. To be able to manage this in a fixed scope project will be a nightmare. What one would normally do is set a budget for the specified scope, and any addendum to this would produce billable hours above and beyond the scoped budget. To do this in Scrum would be pointless, as the product backlog will be continually filled by the stakeholders. If there is no "punishment" for scope changes in a fixed budget, there will be nothing holding people back from just loading on to you.
The alternative here is to have fixed scope sprint successions, so for instance:
5x Sprints = x Cost with minimal scope change.
For your second point:
The use of Analysis and Design is an invaluable tool. By using use cases, event tables, sequence diagrams, state machines and the like; you will be saving yourselves oceans of tears in the long run. Basically, once the planning has been done, any addendum to this that requires additional (please note additional, not things that have been overlooked) use cases and large code changes will be out of scope. In fact, anything that was not overlooked in the planning and is not in your specification, is out of scope.
In closing, you will need to have very well planned documentation as well as very solid agreements with your clients to be able to pull this off 100%.
I hope this helps.
I worked in a environment where we had fixed cost and fixed time projects. We has switched to a Scrum-esque methology from a Waterfall/VModel methology. Scrum can work very well in fixed cost/time projects as the concept is that the customer is put in control, however for this to work you have to be able to somewhat accuratly determine what work is required and what it will cost (time, money, resource). And this is a situtation where Scrum in an ideal candidate.
You break down the wishy-washy wish list/requirements/screenshots into tagiable deliverables. E.g. a customer may say "I want ecommerce, with Paypal", you need to break this down into actual deliverables e.g. "1. Customer Registration and Login, 2. Product Catalogue, 3. Shopping Bag, 4. Payment, 5. Order Acknowlegment". At this stage, it's still impossible to determine how long it will take, and ofc we need to deliver all of the above in order to complete the project (i.e. you can't have Ecommerce without Payment). So break them down again, and again, until you have granular deliverables, genreally delverable within hours, maybe days, but certainly not weeks e.g.
1 Catalogue
1a View all Items
1ai View all items on 1 page with an image and item name underneath in a grid, 4 items per row
1aii View 10 items per page with paging
1aiii View a user slected number of items per page, with paging
1aiiii View all items on 1 page with an image and item name, descriptioon and price on the same line, 1 item per row
1b View by Category
...
1c Search
...
1d Attribute Filter
...
And so on, it can be done very quickly, and you can now probably guesstimate how long it would take todo x (ofc, I might break the above down even further, add more descriptive text to describe the work required, such as what persistant data stuctures Ill might need, the data in those structures, how data will be added, going further you might even desribe the required the begin and exit states).
Once you've go this, you'll notice that some features and depenant on others, e..g you can't have paging feature on a catalogue unless you have a catalogue to start witj, and the catagloge will require the CMS screesn to add and edit items etc etc. Highlight these 'can't live without feature' in whatever tool you using and this forms the core project, and within a day or two you have a bunch of features that can be developed somewhat standalone, with costs, which when added up make the cost of the project. And now the customer is in charge, they decide thay want to added a feature and increase the cost, cool, its up to them afterall.
All the above is obviously only a small portion of what scrum or any agile process is.
I don't think a fixed price contract with scope creep and a Scrum process are incompatible. You just need to agree up front with your customer how it will work. If you create your initial backlog with your customer, estimating as you go, you can use that as your basis for the fixed price cost and schedule. You can even agree to a rate of "X" story points equals "Y" cost and "Z" schedule at the beginning.
You then do the normal scrum thing, having the customer allocate stories to the current iteration, etc.
As the customer engages in scope creep, you work with them to add the "creep" as user stories to the backlog. Each time you add a new story, point out that for each X points added to the backlog, they will have to increase cost by Y and schedule by Z, or, they will have to give up story points of equal value. Since they are picking what you work each iteration, the points they give up (if that's the choice) will be the least valuable features. When your schedule runs out, you will be left with a backlog of the least important features that they can choose to drop or give you a new contract to finish.
The trick, of course, is to be good at estimating cost and schedule for each story/task ;-)
The project could be broken down into smaller parts and fixed rates could be attached to those. The other phases of the project could then be adjusted.
You have to be able to sell the agile process against your competitors. If a client has a history of fixed bid projects that were delivered on time, spec and cost, why would they waste their time taking bids from other developers?
Fixed Cost does not mean single sprint. Scope gets transfered to the Product Backlog, and as Sprints progress, scope is adjusted, negotiated and delivered. Scrum allows for rapid value delivery, and provides quick validation, and the opportunity to identify potential gold plating.
Scope change may result in the addition of backlog items, and the deletion of others. Its a balance of ROI vs the fixed budget provided.
If the scope does increase (and add value), and the cost is fixed, then the triple constraint (cost, time and scope) must be managed accordingly.
Remember that fixed cost does not mean fixed length.

What is better: set up underestimated or overestimated deadlines? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Suppose you are a project manager. You can estimate an effort in days for specific task for specific developer. After performing estimation you obtain some min and max values.
After this you delegate a task to developer. Actually you also set up deadline.
Which estimation is better to use when set up deadline: min or max?
As I see min estimation can result in stress for developer, max estimation can result in using all the time which is allocated to developer even if task can be complete faster (so called Student syndrome).
Which other pros and cons of two approaches?
EDIT:
Small clarification: I speak about setting up deadlines for subordinates when delegating the task, NOT for reporting to my boss.
EDIT:
To add one more clarification: I can keep in mind my real estimation, provide to boss slightly larger estimation, to subordinates - slightly smaller.
And this questions touches the following thing: is it good idea to provide to developer underestimation to make him working harder?
You should use the best guess which is a function of the min and max estimates* - not just the simple average -
best_guess = (min * min_weighting + max * max_weighting) / divisor*
* Tom Neyland suggests it should be (min_weighting + max_weighting). Actually I'm not sure whether that is correct, but it's probably more correct than my original divisor of 2.0.
The weighting you give to the min and max values will depend on the complexity of the task, the risks associated with the task, the likelihood of the risks occuring, the skill of the developer, etc. and will vary from organisation to organisation and from project to project. If you keep a record of your previous estimates and the actual time each took you'll be able to refine these estimates over time.
You should also use these values, plus a confidence value, when talking to senior management and customers. While giving the max and delivering early is not the same as giving the min and delivering late, it still shows that you don't have control over your development.
Giving the confidence value and an idea of the risks will also help manage expectations so if there are problems they're not unexpected.
* These min and max estimates will be got by various means - asking the developers, past experience etc. If polling developers then the actual min and max values should be treated as outliers and either discarded or modified in some way. What I mean here are the values you get from phrases like "it'll take 2 weeks if all goes well or a month if we hit some snags". So the values you plug into the formula are not the raw numbers.
Use neither min nor max but something in between.
Erring on the side of overestimation is better. It has much nicer cost behavior in the long term.
To overcome the stress due to underestimation, people may take shortcuts that are not beneficial in the long term. For example, taking extra technical debt thast has to be paid back eventually, and it comes back with an interest. The costs grow exponentially.
The extra cost from inefficiency due to student's syndrome behaves linearly.
Estimates and targets are different. You (or your managers and customers) set the targets you need to achieve. Estimates tell you how likely you are to meet those targets. Deadline is one sort of target. The deadline you choose depends on what kind of confidence level (risk of not meeting the deadline) you are willing to accept. P50 (0.5 probability of meeting the deadline) is commonplace. Sometimes you may want to schedule with P80 or some other confidence level. Note that the probability curve is a long-tailed one and the more confidence you want, the longer you will need to allocate time for the project.
Overall, I wouldn't spend too much time tracking individual tasks. With P50 targets half of them will be late in any case. What matters most is how the aggregate behaves. When composing individual tasks estimates into an aggregate, neither min or max is sensible. It's extremely unlikely that either all tasks complete with minimum time (most likely something like P10 time) or maximum time (e.g. P90 time): for n P10/P90 tasks the probability is 0.1^n.
PERT has some techniques for coming up with reasonable task duration probability distributions and aggregating them to larger wholes. I won't go into the math here. Here's some pointer for further reading:
Steve McConnell: Software Estimation - Demystifying the Black Art. It's quite readable and pragmatic but at least the 1st edition I have has some quirks in its math and otherwise.
Richard D. Stutzke: Estimating Software-Intensive Systems - Projects, Products and Processes. It's a little more academic, harder read but for example explains the math better.
Ask for best, likely and worst case scenario estimates instead. Then use Program Evaluation and Review Technique. However you may want to take a look at some PERT critique first.
For individual tasks or tasks making up the critical path it’s simply not prudent to go for the best case estimates. It’s like saying that the project is absolutely free of any risk and uncertainty. If the actual job turns out to be anything but the best case scenario you’ll end up blowing the schedule. It’s better to end up with some extra time on your hands and fill the time by implementing some nice-to-haves as opposed to having to work nights and weekends.
On the other hand if managers mostly went for the worst case estimates and in software world they can easily be an order of magnitude greater than the best case figures most projects would never make it past the feasibility and planning stage. Not all of the risks going to materialise.
Going for the best case estimate won't help fighting student syndrome. Include interim milestones and deliverables instead, beside being helpful at combating the student syndrome they're pre-requisite for having a trustworthy data on the project progress and uncovering early any potential issues.
If the difference between min and max are big rather than using some black magic formula I think it the best thing to do would be to go back to the developers and ask them to do a finer breakdown and prototyping, which will lead to better estimates where the gap between min and max is not that big.
Note to the question: In my opinion, the estimates should be done by the developers/architects since they have the best technical knowledge to be able to break down into tasks and estimate those tasks.
If you are estimating for a specific developer, and you know your estimates are generally accurate for that developer, then the min value is the logical deadline (initially). In the course of the project you will adjust deadlines according to circumstance.
If you have little experience with a specific developer, one of my fondly regarded previous managers would ask the developer himself to do the estimate and set the initial deadline a third of the distance between that developer's min and max, challenging the developer to beat it.
Something which has been missing in many of these answers (perhaps because it's slightly off-topic) is frequent updates. With younger/newer developers this is even more important - read the code they commit, and/or check in daily to ask them for specific, detailed reports.
This also allows you to set tight deadlines for developers without giving them too much stress, because they will know you're around to help adjust deadlines when needed.
Frequent updates give you the most important tool in setting customer/management expectations - early warning of issues which might delay things, and I prefer having that over any formula.
Is the developer going back into a cave to develop this or is there a good chance of changing requirements over the course of the project? I would think most projects will have a good chance that something won't go smoothly and thus it may be better to try to get the prototype up sooner rather than later.
As for the initial question, I think I'd break this out into a few different outcomes and consider each:
Gross underestimation -> This leads to the problem that there is still a lot of work to do and the manager appears unable to make reasonable estimations.
Minor underestimation -> In this case, either there is an extension, scope gets cut or some bugs are in the release, but this is better than the previous case.
Made the deadline, on time and on budget with quality -> While this may seem optimal as everything worked out, I don't think this is the best result possible.
Minor overestimation -> In this case, there is some breathing room that means either things finish early or some extra work is added. A point here is that this may seem to deliver a slightly better result than the previous case like how some companies will try to beat the earnings estimate by a small amount to do better than expected.
Gross overestimation -> I think this would be the worst case outcome though it is similar to the first in terms of someone being way out of their league in being able to provide a reasonable estimation.
That's just my opinion on each and others may have a different take on it than me.
If you're trying to hold developers to their minimum estimate, that's foolish. No one, in any industry, consistently hits their minimum time estimate for getting something done. Eventually, they'll just learn to pad their minimum estimates significantly, and then they'll never hit the old minimums, because all estimates will be above that.
In Agile/Scrum, you don't set firm deadlines, but set "how many hours left on this task". Every day, you update the amount of time left. You do not track hours spent, but do track estimated hours remaining, and you try and stay honest about it.
If you have lazy developers, this is bad, because they can easily game that system. If you have developers that are worth their salt, this is great. They get better at estimation pretty quickly, and you - as a project manager - learn how reliable their estimates are, and you'll have a much better feel for what estimates to pass up the chain based on the individual developer estimates.
Go slightly towards Agile, fire the bad developers as you discover which are which, reward the good developers for actually giving a damn, and have a more productive, happier team while being able to report more accurate expectations to your superiors.
If in doubt under promise and over deliver: you want to be the person who is delivering more than they were expecting, not less. Based on this always go with the higher of any estimate.
Slightly more complex:
For a given potential delivery, if you plot the delivery times against the chances of them being happening, you're going to get a curve which is a variation of a normal distribution, and you can assume that a developers minimum estimates are going to be somewhere towards the left of the curve and their maximum towards the right.
The area under the curve to the left of the single number you select as your estimate represents the probability of you successfully delivering on or before that estimate. So if you give a number at the very left hand side your chance of hitting is effectively zero, if you give a number at the very right hand side your chance is effectively 100%.
What is less commonly realised is if you give the mean value (assuming your min and max averaged out give something approximating the actual mean) you'll only hit that deadline 50% of the time. Effectively if you use the mean you're going to miss the deadline half the time. I don't know about you but I don't like being seen as the guy whose misses half his deadlines.
So you want a number which is going to give you something you hit, say, 90% of the time. Conveniently 95% represents the mean + two standard deviations but if you can't be arsed to calculate that (and most of us probably don't have the data) my experience says that:
(3 x max + 1 x min) / 4
gives a reasonable result.
Incidentally, what you tell the developer is the deadline is another question entirely. Personally I'd give him somewhere around ((2 x max + 1 x min) / 3) and have the rest as contingency.
What are you using the estimates for? Specifically, why will the developer feel stressed if you normally underestimate?
If you're trying to schedule how long something is likely to take, you go for an intermediate value. Probably on the long side, since people normally underestimate. In any case, you shouldn't be using these estimates as firm objectives for developers, and so they shouldn't be overly stressful.
If you're using these estimates to set up commitments, you need to err on the side of overestimating. Giving developers insufficient time leads to burnout, unmaintainable buggy code that doesn't do quite what the user wants, and low morale and high turnover. Set the commitments to be reachable, and encourage the developers to finish early.
This depends on project.
Some projects may require fast development and there's no alternatives if deadline is already set and there's no good chance to prolong development. Typical issue: marketing campaign resulting in new service. Such deadline can be enough for normal development, but in some organizations it is so close, that developers work in stress and make many errors that are fixed during production stage. That's a kind of project when developers have to work with topmost effectiveness and they'd better get good reward on success.
Some projects are accurately planned and here you can use all analytics you have: history data, some developer's time metrics on subtasks, calculating risks, etc.
But anyway MAX time shouldn't be used: its the most inaccurate measure that usually leads to even more time taken. And here's a simple reason: when developer just gives away this MAX, he almost doesn't measure. He just gives away his intuition that has very little info at the time. But if he'll spend at least half an hour he'll understand specifics of his tasks, he even may split it into subtask and increase his accuracy. So you can give developer some bias like "hey, guys, just think in what time you would provide stable code here" but send him measure himself. It is good for a job, it is good for a programmer himself.
The first mistake most estimators use when setting the deadline is assuming that the dev will be full-time every day on that task which is a disastrous mistake. This can result in not meeting the deadline even when you use the over estimate to figure out the deadline. Being under the hours but past the deadline you told the client is a big problem. People take leave, get sick, have jury duty, have to go to required meetings on some new HR policy, get called over to help on another project when someone is stuck, have to load software on a new computer when their old one breaks, have to research a production problem on code they recently deployed, etc. If you are estimating more than 6 hours a day on the project per person, you are already in trouble on the deadline before the project starts. When I did manpower studies, we used a figure that equated to just slightly more than 6 hours a day of direct work when calculating out how many people were needed for any job. And we did a lot of statistical analysis as the basis for the figure we used.
I think you have to decide which of these to use on a case by case basis. We have some projects that we know the max estimate is still probably a little low (usually when someone in management couldn't face the client with the real estimate), we have others where we are doing something new where we know the estimates are more likely to be off, in these kinds of cases go with the max. But for work you've done before that is well-defined and you know the dev assigned won't be learning new skills, then go closer to the min (but never actually use the min, there are alawys unexpected bumps in the road). ALso the shorter the project, the more likely you will be able to meet the min, it is far easier to get a good estimate for a week-long project than a year-long one.
More importantly is changing the estimate and deadline every time the circumstances change. If the client adds work, the extend the deadline and estimate, don't just do it. If your best dev quits and you have to put someone new on the project, extend the deadline becasue that person has to have time to get up to speed (you may have to eat the hours though, the client may not agree to pay for that time. Critical to this is telling the client right away. They tend to be better about moving a deadline (although not happy) than they are about missing one or making the dealine but the product doesn't work as they expect it to. Too many project managers just like to wish a problem is going away and the won't have to face that conversation with the client. But usually when they do finally have to tell him it is a much worse conversation than the difficult one they tried to avoid.

How to detect anomalous resource consumption reliably?

This question is about a whole class of similar problems, but I'll ask it as a concrete example.
I have a server with a file system whose contents fluctuate. I need to monitor the available space on this file system to ensure that it doesn't fill up. For the sake of argument, let's suppose that if it fills up, the server goes down.
It doesn't really matter what it is -- it might, for example, be a queue of "work".
During "normal" operation, the available space varies within "normal" limits, but there may be pathologies:
Some other (possibly external)
component that adds work may run out
of control
Some component that removes work seizes up, but remains undetected
The statistical characteristics of the process are basically unknown.
What I'm looking for is an algorithm that takes, as input, timed periodic measurements of the available space (alternative suggestions for input are welcome), and produces as output, an alarm when things are "abnormal" and the file system is "likely to fill up". It is obviously important to avoid false negatives, but almost as important to avoid false positives, to avoid numbing the brain of the sysadmin who gets the alarm.
I appreciate that there are alternative solutions like throwing more storage space at the underlying problem, but I have actually experienced instances where 1000 times wasn't enough.
Algorithms which consider stored historical measurements are fine, although on-the-fly algorithms which minimise the amount of historic data are preferred.
I have accepted Frank's answer, and am now going back to the drawing-board to study his references in depth.
There are three cases, I think, of interest, not in order:
The "Harrods' Sale has just started" scenario: a peak of activity that at one-second resolution is "off the dial", but doesn't represent a real danger of resource depletion;
The "Global Warming" scenario: needing to plan for (relatively) stable growth; and
The "Google is sending me an unsolicited copy of The Index" scenario: this will deplete all my resources in relatively short order unless I do something to stop it.
It's the last one that's (I think) most interesting, and challenging, from a sysadmin's point of view..
If it is actually related to a queue of work, then queueing theory may be the best route to an answer.
For the general case you could perhaps attempt a (multiple?) linear regression on the historical data, to detect if there is a statistically significant rising trend in the resource usage that is likely to lead to problems if it continues (you may also be able to predict how long it must continue to lead to problems with this technique - just set a threshold for 'problem' and use the slope of the trend to determine how long it will take). You would have to play around with this and with the variables you collect though, to see if there is any statistically significant relationship that you can discover in the first place.
Although it covers a completely different topic (global warming), I've found tamino's blog (tamino.wordpress.com) to be a very good resource on statistical analysis of data that is full of knowns and unknowns. For example, see this post.
edit: as per my comment I think the problem is somewhat analogous to the GW problem. You have short term bursts of activity which average out to zero, and long term trends superimposed that you are interested in. Also there is probably more than one long term trend, and it changes from time to time. Tamino describes a technique which may be suitable for this, but unfortunately I cannot find the post I'm thinking of. It involves sliding regressions along the data (imagine multiple lines fitted to noisy data), and letting the data pick the inflection points. If you could do this then you could perhaps identify a significant change in the trend. Unfortunately it may only be identifiable after the fact, as you may need to accumulate a lot of data to get significance. But it might still be in time to head off resource depletion. At least it may give you a robust way to determine what kind of safety margin and resources in reserve you need in future.

Resources