Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
As a lead developer I often get handed specifications for a new project, and get asked how long it'll take to complete the programming side of the work involved, in terms of hours.
I was just wondering how other developers calculate these times accurately?
Thanks!
Oh and I hope this isn't considered as a argumentitive question, I'm just interested in finding the best technique!
Estimation is often considered a black art, but it's actually much more manageable than people think.
At Inntec, we do contract software development, most if which involves working against a fixed cost. If our estimates were constantly way off, we would be out of business in no time.
But we've been in business for 15 years and we're profitable, so clearly this whole estimation thing is solvable.
Getting Started
Most people who insist that estimation is impossible are making wild guesses. That can work sporadically for the smallest projects, but it definitely does not scale. To get consistent accuracy, you need a systematic approach.
Years ago, my mentor told me what worked for him. It's a lot like Joel Spolsky's old estimation method, which you can read about here: Joel on Estimation. This is a simple, low-tech approach, and it works great for small teams. It may break down or require modification for larger teams, where communication and process overhead start to take up a significant percent of each developer's time.
In a nutshell, I use a spreadsheet to break the project down into small (less than 8 hour) chunks, taking into account everything from testing to communication to documentation. At the end I add in a 20% multiplier for unexpected items and bugs (which we have to fix for free for 30 days).
It is very hard to hold someone to an estimate that they had no part in devising. Some people like to have the whole team estimate each item and go with the highest number. I would say that at the very least, you should make pessimistic estimates and give your team a chance to speak up if they think you're off.
Learning and Improving
You need feedback to improve. That means tracking the actual hours you spend so that you can make a comparison and tune your estimation sense.
Right now at Inntec, before we start work on a big project, the spreadsheet line items become sticky notes on our kanban board, and the project manager tracks progress on them every day. Any time we go over or have an item we didn't consider, that goes up as a tiny red sticky, and it also goes into our burn-down report. Those two tools together provide invaluable feedback to the whole team.
Here's a pic of a typical kanban board, about halfway through a small project.
You might not be able to read the column headers, but they say Backlog, Brian, Keith, and Done. The backlog is broken down by groups (admin area, etc), and the developers have a column that shows the item(s) they're working on.
If you could look closely, all those notes have the estimated number of hours on them, and the ones in my column, if you were to add them up, should equal around 8, since that's how many hours are in my work day. It's unusual to have four in one day. Keith's column is empty, so he was probably out on this day.
If you have no idea what I'm talking about re: stand-up meetings, scrum, burn-down reports, etc then take a look at the scrum methodology. We don't follow it to the letter, but it has some great ideas not only for doing estimations, but for learning how to predict when your project will ship as new work is added and estimates are missed or met or exceeded (it does happen). You can look at this awesome tool called a burn-down report and say: we can indeed ship in one month, and let's look at our burn-down report to decide which features we're cutting.
FogBugz has something called Evidence-Based Scheduling which might be an easier, more automated way of getting the benefits I described above. Right now I am trying it out on a small project that starts in a few weeks. It has a built-in burn down report and it adapts to your scheduling inaccuracies, so that could be quite powerful.
Update: Just a quick note. A few years have passed, but so far I think everything in this post still holds up today. I updated it to use the word kanban, since the image above is actually a kanban board.
There is no general technique. You will have to rely on your (and your developers') experience. You will have to take into account all the environment and development process variables as well. And even if cope with this, there is a big chance you will miss something out.
I do not see any point in estimating the programming times only. The development process is so interconnected, that estimation of one side of it along won't produce any valuable result. The whole thing should be estimated, including programming, testing, deploying, developing architecture, writing doc (tech docs and user manual), creating and managing tickets in an issue tracker, meetings, vacations, sick leaves (sometime it is batter to wait for the guy, then assigning task to another one), planning sessions, coffee breaks.
Here is an example: it takes only 3 minutes for the egg to become roasted once you drop it on the frying pen. But if you say that it takes 3 minutes to make a fried egg, you are wrong. You missed out:
taking the frying pen (do you have one ready? Do you have to go and by one? Do you have to wait in the queue for this frying pen?)
making the fire (do you have an oven? will you have to get logs to build a fireplace?)
getting the oil (have any? have to buy some?)
getting an egg
frying it
serving it to the plate (have any ready? clean? wash it? buy it? wait for the dishwasher to finish?)
cleaning up after cooking (you won't the dirty frying pen, will you?)
Here is a good starting book on project estimation:
http://www.amazon.com/Software-Estimation-Demystifying-Practices-Microsoft/dp/0735605351
It has a good description on several estimation techniques. It get get you up in couple of hours of reading.
Good estimation comes with experience, or sometimes not at all.
At my current job, my 2 co-workers (that apparently had a lot more experience than me) usually underestimated times by 8 (yes, EIGHT) times. I, OTOH, have only once in the last 18 months gone over an original estimate.
Why does it happen? Neither of them, appeared to not actually know what they are doing, codewise, so they were literally thumb sucking.
Bottom line:
Never underestimate, it is much safer to overestimate. Given the latter you can always 'speed up' development, if needed. If you are already on a tight time-line, there is not much you can do.
If you're using FogBugz, then its Evidence Based Scheduling makes estimating a completion date very easy.
You may not be, but perhaps you could apply the principles of EBS to come up with an estimation.
This is probably one of the big misteries in the IT business. Countless failed software projects have shown that there is no perfect solution to this yet, but the closest thing to solving this I have found so far is to use the adaptive estimation mechanism built in to FogBugz.
Basically you are breaking your milestones into small tasks and guess how long it will take you to complete them. No task should be longer than about 8 hours. Then you enter all these tasks as planned features into Fogbugz. When completing the tasks, you track your time with FogBugz.
Fogbugz then evaluates your past estimates and actual time comsumption and uses that information to predict a window (with probabilities) in which you will have fulfilled your next few milestones.
Overestimation is rather better than underestimation. That's because we don't know the "unknown" and (in most cases) specifications do change during the software development lifecycle.
In my experience, we use iterative steps (just like in Agile Methodologies) to determine our timeline. We break down projects into components and overestimate on those components. The sum of these estimations used and add extra time for regression testing and deployment, and all the good work....
I think that you have to go back from your past projects, and learn from those mistakes to see how you can estimate wisely.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I work on a software project and would like to estimate the percentage out of the total contribution that I have put in the development of the software. Is there some tool doing this? Such a tool can be useful for appraisals or negotiations, for example. After all, we work for money (yes, not only money, put the point remains). I think there is enough hand-waving for the most important things.
The estimation is very subjective (at least to me now) but I do not know of any tool that provides even a subjective estimate. I know of Sloccount that spells out the total effort using the lines of code but not on per-developer basis.
My idea of an ideal tool for this purpose would:
measure the complexity of the code (more complex is more effort, but more effort is not necessarily more contribution)
measure the decomposibility/flexibility of the software (more decomposable is better)
how much library code is used -- using library code speeds up the development process, increases the associated risk and requires the developer to know from before or learn about the library.
be intelligent enough to differentiate between "who wrote the code", "who copied the code" and "who indented the code".
It is difficult to differentiate between the complexity in the implementation and the intrinsic complexity of the problem. Perhaps a comparison can be made with an equivalent open source counterpart if there is, or for each submodule separately.
If there is no such tool, is there no merit in having such a tool? Or do you believe in "I do work, I do not measure"? It takes time after all. Perhaps the project manager should do this estimation continuously, say, weekly. Are there any standards? Yes, standardization is difficult because every project has different goals, but perhaps that should mean there should be multiple standards, not no standards at all. This looks similar to the how a company is valued in the market.
Update: after seeing a few initial answers: It does not make sense to imagine a tool that just outputs the percentages. Are there tools that can help humans (particularly managers) in making better decisions? Or what is the sufficient statistic for making better decisions? Are these statistics available?
I really doubt there is any reliable trustworthy way of measuring individual's contribution to the solution. Sometimes rewriting some complicated legacy code that results in less lines of code, less complicated solution (smaller cyclomatic complexity etc.) can be seen as a quite significant contribution, while in other cases deleting valuable code covering edge cases that results in the same statistics (less lines of code, smaller CC etc.) is definitely something bad. It all comes down to people, trust and cooperation, individualism in the team is almost always wrong and I would rather avoid it and especially not use it as a motivation factor.
This is a research topic on its own. There are several tools that have tried to define metrics like code ownership. There are other approaches which tackle other aspect of collaborative development, for instance the trustability we can have in the code.
There has been also several studies that tried to use the information from bug trackers. For instance, to identify the developer that is the more likely to introduce bugs. But it's hard to be objective (A brilliant developer that is assigned the most critical part of the system, will still be more likely to introduce critical bugs).
It's actually hard to monetize the development tasks. What is the cost of a bug? What is the gain of refactoring? That would be however one way to estimate the contribution of a developer.
The last cool tool I saw of this kind was the Game Plugin for Hudson continuous integration system. A score is assigned to each developer according their actions
-10 if they break the build
-1 for breaking a test
+1 for fixing a test
etc.
That's again a way to somehow assess the contribution of the developer.
All in all, I do feel like what you are asking for exist, but is still very immature.
I don't think you can get a tool to evaluate your share of the project. Measuring lines of source is all very well, but what of the quality of that source? You wouldn't want someone taking the credit for 200 lines of source if the job could have been easiy done in 20...
Also, thinking of my employer for a moment, a lot of people contribute to the project in ways other than code. Immediate examples I can think of would be Project Managers and Testers - both of whom are essential, both of whom rightly deserve some credit.
Martin
The only thing that I could imagine would be a voting system. I have absolutely no idea, if that would work in your team or anywhere - but I'm sure, that you will need humans for any realistic estimation of code quality.
In Stroustrup's Book on C++ I've read once "Don't try to solve social problems with technical means".
Thinking progmatically, the attitude and the ability of a programmer could be very quickly estimated by making a code-review together and having a talk on relevant topics.
Thinking as an IT-enthusiast and as a control-freak, this shouldn't be very hard, to implement a teachable machine-learning software, which uses version-cotrol, bug-database, etc and greates real-time performanced data for each contributor. E.g. R, KNIME or WEKA could be used for this.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I am final year college student. I am trying to figure out how much time i should spend on coding and learning.
while (true) {
learn;
code;
}
I have two friends in my university, both studying media informatics, and both were absolute beginners in programming.
The first one reads a lot at home if he has to learn new languages for a project but has never had a private programming project.
The second one reads a bit but has his own python project. A web application for his friends, where you can bet on soccer results.
Both in comparison:The first guy is slow in programming and always stumble upon simple things and his code can be optimized (in line numbers and comments) at least by 5. And in two days he will stumble upon the same issue again...
The second guy is much faster, can easily read foreign code and languages and stumble upon one problem at most two times, the third time he used what he has learned....
So imho, doing your own project, where you code because you love it, where you work until morning to fix a bug or to finish an implementation, is the very best way to learn!
Surely you've realized that putting a finite measurement on the amount of time you should spend coding, is futile and hugely irrelevant.
Do what you want, but always try and keep up to date.
When I first started programming, it seems that I learn new things by leaps and bounds. Functions, classes, inheritance and etc. But after a while, I realize that you learn by coding. I load myself up with tonnes of reading material - Effective C++, Modern C++, but nothing beats them when I actually sat down and code.
Of course, writing your code the same way over and over again does not make you a better programmer. You have to learn to think - how do I make it reusable? less error prone? portable? immune to changes in other areas of the application? easier to maintain? Is there a better way to do this?
Eventually, the learning peaks, and what you learn are what I like to term as multipliers. It's like knowing that dirname(__FILE__) in PHP returns the current directory which an include file is in. It's like finding out what's a ORM and how by abstracting away the DB you can focus more on the code logic rather than an endless routines of writing INSERTS and UPDATEs SQL statements. It's like learning smart pointers and effective use of STL in C++, using Firebug effectively when doing JavaScript/CSS/HTML...and lots more.
So code; once you get frustrated about something ("There must be a better way to do this than now!"), search for a better way - this is how I learn, anyway.
When I was young:
Monday to Friday, 10am to 7pm, programming in office
Saturday afternoon, reading in Chapters
Monday to Saturday, 9pm to 1am, programming at home
Sunday, drive to downtown and pick up a few books from the bookstore
those were the days when Google was know as nntp
These days:
Monday to Friday, 10am to 7pm, coding in office (too bad I am on web now ;-)
9pm to 1am, coding on my MacBook Air on a few iPhone projects
Saturday and Sunday, coding for another 16 hours
too bad, Google interrupts me too much and I cannot count how many hours are spent on reading blog and pdf books ...
You have to decide that yourself. If you're constantly feeling like you should spend more time coding, then you're probably right. You should never force yourself to the point where the sight of a curly bracket makes you want to puke either. If you're sufficiently interested in programming then the amount of time you spend naturally without slacking off/burning out, will be just fine. (And if you're not, you should cut your losses as soon as possible.)
Be sure that this kind of approach does not make you less valuable of a programmer than the angry nerd in your class who spends every waking hour coding as a part of his masterplan to get back at the world.
the simple answer: do not create some sort of a schedule
why?
you never can know ahead what situation you're in during a certain time, so let's say you set it at everyday at 10am, then suddenly your dog died today at 10am, your family called you to mourn over poor Snuffel...for hours; schedule's all ruined
so what do you do?
code up; if you get tired grab a book or read an article (articles today are really juicy), if you get tired of reading and coding, play games that rattle your brain (yet entertaining, something like Civilizations IV). if you're all rested, fire up your IDE and apply what you just read about. Don't worry if you get it all messy the first time (unless you're a mad genius who certainly will kill himself if he doesn't get something right on his first try).
Note: you should probably set a time for how long you play the game, though:)
My suggestion would be to discover your strengths and if learning is among them, then you may enjoy spending a lot of time learning so do what you want here. Of course one shouldn't go so overboard that things like hygiene get sacrificed, so do try to maintain some basic standard of existing which includes the basics of cleaning your place, yourself, and that kind of thing.
For myself, I'd say that I'm almost always trying to learn something, somewhere. Maybe it is learning about how much patience I have in traffic or how well can I handle this curveball that life has thrown me by my having to do things like income taxes and discover what has changed in the software or tax laws from the previous year. If you look at life as a series of opportunities, you may learn a lot in the world.
In my humble opinion most of the time you are programming. While you program you are learning from experience. This is one type of learning. Another type of learning comes from reading books and other resources (courses, internet, Development conventions). I use books to keep up with technology and to better understand what I am doing. I read almost every day from 0.5-1.0 hour. It depends on your free time and the type of person you are.
Please take into account that there are more ways to learn: code reviews, reading other people's code and I am sure that there are more that I didn't mention here.
Anyway, good luck.
I am guessing the 'learning' here means, acquiring new tips and tricks, grapsing new technology in the market and stay up to date with the trends in technology.
From my experience it is taking around 20% time for learning, and it is mainly because I work on all latest technologies from Microsoft like WPF/Silverlight/Surface. But this % of time will really depends on your personal interest/ organizational interest and the type of career growth you are looking forward to.
And if your job is merely converting domain/business logic to the code which doesn't involve critical technology roadblocks then it might be close to 0% time you need to spend on learning.
Since you didn't present any constraints or conditions in your question then the simplest answer I can give is:
Spend as much as you want.
The very fact you have to ask, might mean you are not ideally matched to writing code. First and foremost you should love coding, and finding out how things work.
This is not a profession where you ever stand still. Totally agree with another poster, in that you should always be looking for a better way, and recognising when there is no better way.
The best software workers - the rockstars if you will - are always on. Any situation can be a teaching. For instance, consider Gregor Hohpe's article Starbucks Does Not Use Two-Phase Commit, in which he analyses how the coffee vendor uses asynchronous processing to maximize throughput of customers' orders.
Coding == Learning
In my opinion.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
This book was written in the era of time sharing systems, procedural programming, and about 30 fewer years in software engineering experience. With the improvement of things such as existing libraries, higher level languages, IDES, and the amount of documentation and examples available on the internet how much of the book still holds true?
While I can believe that adding new people to a project may initially slow it down I would think things such as unit testing, separation of concerns, and other forms of automation and design improvements would allow new members of a team to become productive faster then assumed in the book, assuming the project had good design documentation and processes in place.
I don’t have experience on large projects or with large teams so am interested to hear what those of you who do have experiences with them think.
edit:
I was wondering if new communication tools such as Wikis, instant messaging, and the internet in general decreased the time spent communicating. Based on everyones answers I would say that any increase in communications efficiency has been offset by increased complexity.
It is still as true today as the day it was written. This is because it is fundamentally about communication between people working on the same project, and none of the advances of the past 30 years have substantially changed that.
Of course, we have learned a lot in those 30 years, but all improvements in our tools and undertanding have been incremental, in accordance with Brooks' "no silver bullet" prediction.
Isn't this kind of like asking if Sun Tzu's Art of War is still applicable to warfare since we have modern equipment?
The book still has things to tell us, and I for one have experienced the problems in communication that increased team sizes bring. You should be aware that unit tests, separation of concerns etc. are not new concepts.
However, some things have not stood the test of time. I don't believe that writing ASCII flow charts in your code is a good idea, and the "surgical team" approach suggested has been tried by several people (Charles Simony at MS, most famously) and found not to work too well.
The idea isn't that "large teams don't work", it's that "throwing people/money at the problem isn't the answer". Things like unit testing, separation of concerns, etc. are doing other things rather than just throwing people at the problem. These other things allow you to carefully add more people in the right place to speed things up. If anything, the points you make support the ideas of the book.
Both of the famous Brooks writings, "No Silver Bullet" and "The Mythical Man-Month" are explanations of fundamental limitations, in programming languages and project management respectively.
While true that some of the chapters a bit farther than halfway through TMMM deal too heavily with the technology of the time, the remaining chapters are still as relevant today as they were when written.
In TMMM, Brooks follows a technique of "outline the problem", "show some false starts", and "propose my own solution". Some commentators above has pointed out that his own solutions may be considered outdated at this point, but his concise description of the problems inherent in large projects make the book worth reading.
One theme he keeps coming back to is communications overhead as a limiting factor for large teams. As a thought experiment, consider the effect of the Internet as a communications medium for programmers, and the catalyst that has been to large open source projects.
Personally, I would read the book just for the "The Joys of the Craft" section. I've never read anything that so elegantly describes what programming at it's best feels like.
(If you're curious, it's on page 7, and viewable in the amazon.com "Look Inside!" feature)
I certainly think things like "No Silver Bullet" are just as applicable today as they were decades ago, especially as we see more and more young people come into industry and think x is the latest and greatest killer language/technology and all the other technologies will die because of it.
Granted, the references to Ada or sharing computers are antiquated, but the concept of accidental and essential difficulties, buy vs. build, how code is complex by definition because we don't repeat parts, and all the other theoretical topics are still completely accurate and relevant.
The other argument for why TMMM is relevant is that, it's not really about software itself but about how programmers get things done. In this way, it's hard for it to become obsolete.
The two that stick out in my mind: "version 2" still applies and so does "adding more people not necessarily faster".
Spolsky discusses "version 2" in his own way. I do not recall if he specifically links to MMM but it is very similar in concept.
Communication has become a lot more efficient than when MMM was identified, however, I think it's all proportional. It takes a lot more to make software production ready than it did when MMM was written.
Someone said that everything in computer science was discovered in 1960 something and everything since then has been derivative.
Read TMM as a book outlining a problem, perhaps THE problem, in software engineering: its not the technology, its the people! All the improvements you mention spring from that core realization. They are all in place to solve the problems Brooks laid out. This is the book that I'm sure Kent Beck and Ward Cunningham and Alister Cockburn and Martin Fowler all read, took to heart and then began to craft their silver bullets.
I consider this to be one of the "must read" books for anyone who wants to understand the software development process.
The demand for development workforce has been growing rapidly last 40 years, and this need will not stop. Since the rate of clever (vide Joel’s “smart and get things done”) people in the population remains generally the same, educating more and more developers each year means that the average smartness of a developer is getting lower.
40 years ago, it was demigods who became developers; 20 years ago, it was a job for clever people, while now when I look at young CS students at my Alma Mater it seems that they are taking everyone who knows what a computer is.
This does not mean a disaster is coming – western world keeps importing smart people (or outsourcing work) from emerging markets or third world countries. The new development tools make it easier to develop good applications. Those factors seem to neutralize each other, making MM-M eternal true.
The Mythical Man Month is a very dated read, but the core truths still apply. Sure Brooks discusses the need for a secretary which is clearly not true today and his concept of a surgical team doesn't work well, but most of the book is still accurate. His insight that communication requirements increase along with the size of the team is still true. His observation that adding people to a late project makes it later has been born out by a lot of projects. Unit tests help a little, but they don't stop one from misunderstanding the code or asking a lot of questions. No Silver Bullet also continues to stand the test of time.
All of it. The simple fact is that software projects are nontrivial; we build our own domain knowledge, really, directly into our solutions. Domain knowledge transfer is costly, both for the transferrer and for the transferee; this has not changed. And I, for one, believe it never will, no matter what the practices and tools used. Things may get marginally better, but the simple fact is that teaching and learning are both expensive and difficult things, and there's just no way to avoid them.
The social factors are still present, because humans are still essentially the same beasts we were 50 years ago.
The technical examples are almost completely obsolete, and only make sense when you think about the 0.034 MIPS System/360 of 1964. When you've only got 8 KB of memory, suggesting that the user should be responsible for handling leap years, instead of wasting 26 bytes of system memory (as Brooks did) made sense, but today it seems downright silly. I don't know any system that small today -- your telephone is thousands of times more powerful than the most powerful OS/360 system. Today we know a lot more about usability and human-computer interaction, and making the user responsible for that category of thing is just crazy.
One programmer can now write more code/build more software than one programmer could back then, but adding the second developer is not going to produce twice as much.
If/when I get on a project with good design documentation and processes in place, I'll let you know if that improves anything.
If you think of a large project that is late, it is then most likely in crisis/panic mode, as with most things in this mode, the best answer is a half decent plan, simply throwing more resources at a crisis doesn't solve anything other than waste resources and compund the problem.
There is no substitute for scheduling (pronounced shed-uling), planning and decent management.
As with most of these "one liners" or "golden rules", consider them more guidelines (with context) than set in stone.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I lead a small group of programmers in a university setting, having just moved into this position last year. While the majority of our team are full time employees, we have a couple of people who are traditionally graduate assistants.
The competition for these assistantships is fairly intense, as they get free graduate school tuition on top of their salary while they have the job. We require that they sign up for at least a year, though we consider ourselves lucky if they stay for two. After that, they get their master's degree and move on to bigger and better things.
As you can imagine, hiring and re-training these positions is time- and resource-intensive. To make matters worse, up to now they have typically been the sole developer working on their respective projects, with me acting in an advisory and supervisory role, so wrangling the projects themselves to fight the entropy as we switch from developer to developer is a task unto itself.
I'm tempted to bring up to the administrators the possibility of hiring a full- (and long-haul) developer to replace these two positions, but for a school in a budget crisis, paying for two half-time graduate assistants is far cheaper (in terms of salary and benefits) than paying for one full-time developer. Also, since I'm new to this position, I'd like to avoid seeming as though I'm not able to deal with what I signed up for. For the forseeable future, I don't think the practice of hiring short-term graduate assistants is going to change.
My question: What can I do to create an effective training program considering that the employees may be gone after as little as a year on the job?
How much time should I invest in training them, and how much would simply be a waste of time?
How much time should they take simply getting acclamated to our process and the project?
Are there any specific training practices or techniques that can help with this kind of situation?
Has anyone dealt with a similar situation before?
Do I worry too much, or not enough?
By the way, and for the record, we do the vast majority of our development in Perl. It's hard to find grad students who know Perl, while on the other hand everybody seems to have at least an academic understanding of Java. Hence this question which I asked a while back.
Why don't you ask the students what they find difficult and make cheat sheets, lectures, etc. for the parts of the job that they have trouble with? Maybe you need to create some introductory Perl lectures or purchase some dead trees. How about a Safari subscription at O'Reilly? I'd ask the students how they prefer to learn, though, before embarking on a training project. Everyone has different learning styles.
I'd also spend some time and capital creating a culture of professional software development at work. It'll be tough since academic programmers are often neophytes and used to kludging up solutions (I'm an academic programmer, btw) but the students will thank you in the long run. Maybe you can all go out to lunch once a week to discuss programming and other topics. You might also want to take some time to do code reviews so people can learn from each other.
With high turnover you definitely need to ensure that knowledge transfer occurs. Make sure you are using source code control and that your students understand proper commenting. I'd also make the students create brief documentation for posterity. If they are getting credit, make them turn in a writeup of their progress once a semester. You can put this in a directory in the project's repository for anyone who inherits it. As mentioned in other posts, a group wiki can really help with knowledge transfer. We use Mediawiki in our group and like it a lot.
One last thing I should add is that I find it helps to keep a list of projects for new developers that relatively easy and can be completed in a month or so. They are a great way for new people to get acclimated to your development environment.
This is a relative question, and should be taken on a case-by-case basis. If the new hire already knows Perl, you don't need to go over this piece of training (yes, you could put Perl as a mandatory prerequisite, but that would significantly limit your applicant pool), and their first bit of training should be something like fixing a bug in an existing application or walking them through an application they will maintain. Though, given that the developers are only there for a year makes me think the development styles are going to vary some (if not a lot).
Getting the new person up to speed with your process is very important, as long as your process works. In this high turnover environment, you should put a strong emphasis on documentation in your process. A Wiki is a great thing to have for this documentation, since it's centralized and any of the developers can access it. Having them try to figure out how a project works by themselves (with little to no documentation) is a waste of both their time and your time.
Perhaps I'm reading too much into the question, but if your university teaches java, why are you using Perl? Wouldn't it make more sense to use the tools that your students already know? This alone would cut the learning curve significantly. [once you eliminate the legacy code of course]
other than that, try:
break the projects up into month-sized bites
overlap the internships by at least 2 months, if not 6, so the new guy can work with and be trained by the old guy(s)
document whatever repeatable processes you have (as was suggested by Mark Nold)
if the grad students are cheaper than full-time pros, quit whinin' ;-) If not, go for the pros.
Have you considered making a "three ring binder" like Macdonalds and many other high turn over industries have? Have one folder which you can print out and hand to the new hire which shows the new hire some basics of getting up and running with Perl in your environment. This should be a "hello world", plus some basic regex and array manipulation. Lastly your manual should go on to show examples of the 5 things you find yourself doing all the time.
The example code may be authenticating users against an external security system, walking through recordsets or using ghostscript to create PDFs. Whatever they are, they should cover the basics of what you meet 80% of the time. More importantly the examples should show users how you expect the code to be written for clarity (eg: naming and approach), and give them some insight into servers and software in use and other practicalities which a generic book won't show them.
You won't get the binder right first time, but since you have a high staff turn over, you'll have plenty of time to test and improve it.
On top of this i would pick a single Perl programming book and give the new user their own copy of your three ring binder, plus "Programming Perl" to keep on their first day. At a cost of $50 per hire i'm sure it's a lot cheaper than the alternatives and you'll have them flipping burgers.... i mean cutting code in no time.
My initial couple of thoughts are that you should:
hire for the position, i.e. it's Perl-centric so make that a big part of the pre-requisites. That way you don't need that piece of training as well.
invest time in the on-boarding process, maybe use a wiki so that you can easily update it to help bringing them on-board.
Edit: Some extra points:
maybe have a chat and see if Perl can be introduced into the curriculum? If not, then make it known six months before the ads go up that applicants need to know Perl. This way you'll get people who have Perl experience and who have actively demonstrated their motivation.
can you open up some small projects so that they could be done by potential candidates during this six months?
approach the design of your large-scale projects so that they can be done in a piece meal manner. This is how The ACE Components have been done iirc.
allow a specific period for documentation and review of the work done by the departing grad student.
allow an overlap period of at least a couple of weeks where the new grad student can work with the departing grad student. They can learn the development environment and they can be guinea pigs for any updates to your wiki.
Still more to come...
HTH
cheers
That is a pickle, but it is not as uncommon in the commercial sector as you would think. I heard a statistic once that the average tenure of a programmer industry-wide is about 18-24 months. Normally I would suggest getting more experienced programmers who would require less ramp-up time and only need to be trained on the problem domain/technology updates and not the basics
I think your best bet is just to ask for about 30-50% more grad students than it will take to actually perform the job to account for the learning and ramp-up time and invest in some additional resources for testing as this environment is a recipe for mistakes since everyone is learning on the job. Also, this is probably difficult given the academic schedule, but try to stagger the start dates as much as possible to maximize the overlap between employees. Pair-programming teams of new-hires/old-hires might also help increase consistency and supplement the training without sacrificing too much productivity.
How much time should I invest in training them, and how much would simply be a waste of time?
Answer: It's not the amount of time or the amount of waste, but perhaps the approach. Would it be possible to video train - video yourself training one person and provide it as training for subsequent students/developers. You can add over time, but it does reduce your time needed to go through the same process.
How much time should they take simply getting acclamated to our process and the project?
Answer: This is all dependent on the person...min. of a day max of 2-3 weeks I would guess on avg.
Are there any specific training practices or techniques that can help with this kind of situation?
Answer: Video training (home made), having the current student/developer create/update a wiki of needed info, points of interest, etc.
Has anyone dealt with a similar situation before?
Answer: Use to be a 12-18 month avg. turnover - I would imagine that it's changed now (longer), but every company has turn-over, but perhaps not forced like yours due to the resource being students.
Do I worry too much, or not enough?
Answer: Knowledge lost through transition is a key risk area...
Is the application something you could consider open-sourcing?