Why is speech recognition difficult? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Why is speech recognition so difficult? What are the specific challenges involved? I've read through a question on speech recognition, which did partially answer some of my questions, but the answers were largely anecdotal rather than technical. It also still didn't really answer why we still can't just throw more hardware at the problem.
I've seen tools that perform automated noise reduction using neural nets and ambient FFT analysis with excellent results, so I can't see a reason why we're still struggling with noise except in difficult scenarios like ludicrously loud background noise or multiple speech sources.
Beyond this, isn't it just a case of using very large, complex, well-trained neural nets to do the processing, then throwing hardware at it to make it work fast enough?
I understand that strong accents are a problem and that we all have our colloquialisms, but these recognition engines still get basic things wrong when the person is speaking in a slow and clear American or British accent.
So, what's the deal? What technical problems are there that make it still so difficult for a computer to understand me?

Some technical reasons:
You need lots of tagged training data, which can be difficult to acquire once you take into account all the different accents, sounds etc.
Neural networks and similar gradient descent algorithms don't scale that well - just making them bigger (more layers, more nodes, more connections) doesn't guarantee that they will learn to solve your problem in a reasonable time. Scaling up machine learning to solve complex tasks is still a hard, unsolved problem.
Many machine learning approaches require normalised data (e.g. a defined start point, a standard pitch, a standard speed). They don't work well once you move outside these parameters. There are techniques such as convolutional neural networks etc. to tackle these problems, but they all add complexity and require a lot of expert fine-tuning.
Data size for speech can be quite large - the size of the data makes the engineering problems and computational requirements a little more challenging.
Speech data usually needs to be interpreted in context for full understanding - the human brain is remarkably good at "filling in the blanks" based on understood context. Missing information and competing interpretations are resolved with the help of other modalities (like vision). Current algorithms don't "understand" context so they can't use this to help interpret the speech data. This is particularly problematic because many sounds / words are ambiguous unless taken in context.
Overall, speech recognition is a complex task. Not unsolvably hard, but hard enough that you shouldn't expect any sudden miracles, and it will certainly keep many researchers busy for many more years.
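To illustrate the point above about normalised data, here is a minimal sketch of per-utterance mean and variance normalisation. The function name and the toy feature values are assumptions made up for illustration, not taken from any particular toolkit. It only removes differences in gain and offset; differences in speed or start point would need other techniques.

```python
import statistics

def normalise(features):
    """Shift to zero mean and scale to unit variance (per utterance)."""
    mean = statistics.fmean(features)
    std = statistics.pstdev(features)
    if std == 0:
        # A constant signal carries no shape information; return zeros.
        return [0.0 for _ in features]
    return [(x - mean) / std for x in features]

# Hypothetical feature values: the same "shape" recorded at two volumes.
quiet = [0.1, 0.2, 0.1, 0.4]
loud = [x * 10 for x in quiet]

quiet_n = normalise(quiet)
loud_n = normalise(loud)
```

After normalisation the quiet and loud versions are numerically the same sequence, which is what a model trained on normalised input expects.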

Humans use more than their ears when listening; they use the knowledge they
have about the speaker and the subject. Words are not arbitrarily sequenced
together; there is a grammatical structure and redundancy that humans use
to predict words not yet spoken. Furthermore, idioms and how we 'usually'
say things make prediction even easier.
In Speech Recognition we only have the speech signal. We can of course construct a
model for the grammatical structure and use some kind of statistical model
to improve prediction, but there is still the problem of how to model world
knowledge, the knowledge of the speaker and encyclopedic knowledge. We
can, of course, not model world knowledge exhaustively, but an interesting
question is how much we actually need in the ASR to measure up to human
comprehension.
Speech is uttered in an environment of sounds, a clock ticking, a computer
humming, a radio playing somewhere down the corridor, another human
speaker in the background etc. This is usually called noise, i.e., unwanted
information in the speech signal. In Speech Recognition we have to identify and filter out
these noises from the speech signal.
Spoken language != Written language
1: Continuous speech
2: Channel variability
3: Speaker variability
4: Speaking style
5: Speed of speech
6: Ambiguity
All these points have to be considered when building a speech recognition system. That's why it is quite difficult.
Referenced from http://www.speech.kth.se/~rolf/gslt_papers/MarkusForsberg.pdf

I suspect you are interested in 'continuous' speech recognition, where the speaker speaks sentences (not single words) at normal speed.
The problem is not simply one of signal analysis, but there is a large natural language component as well. Most of us understand spoken language not by analyzing every single thing that we hear, as that would never work because each person speaks differently, phonemes are suppressed, pronunciations are different, etc. We just interpret a portion of what we hear and the rest is 'interpolated' by our brain once the context of what is being said is established. When you have no context, it is difficult to understand spoken language.

Lots of major problems in speech recognition are not directly related to the language itself:
different people (women, men, children, elders etc.) have different voices
sometimes the same person sounds different for example when the person has a cold
different background noises
everyday speech sometimes contains words from other languages (like the German word Kindergarten in US English)
some people are not from the country itself and learned the language later (they usually sound different)
some persons speak faster, others speak slower
quality of the microphone
etc.
Solving these things is always pretty hard... on top of that you have the language/pronunciation to take care of...
For reference see the Wikipedia article http://en.wikipedia.org/wiki/Speech_recognition - it has a good overview including some links and book references which are a good starting point...
From the technical POV the "audio preprocessing" is just one step in a long process... let's say the audio is "crystal clear", then several of the above-mentioned aspects (like having a cold, having a mixup in languages etc.) still need to be solved.
All this means that for good speech recognition you need to have a model of the language(s) that is thorough enough to account for slight differences (like "ate" versus "eight") which usually involves some context-analysis (i.e. semantic and fact/world knowledge, see http://en.wikipedia.org/wiki/Semantic%5Fgap) etc.
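As a rough sketch of the simplest kind of context analysis, the example below picks between the homophones "ate" and "eight" by counting which one followed the previous word more often in a tiny corpus. The corpus, the `pick` helper and the bigram approach are all assumptions chosen for illustration; a real recognizer would use a far larger language model and combine it with acoustic scores.

```python
from collections import Counter

# Tiny hand-written corpus; a real system would use millions of sentences.
corpus = [
    "i ate an apple",
    "she ate the cake",
    "we ate at eight",
    "the train leaves at eight",
    "meet me at eight",
]

# Count how often each word follows each other word.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

def pick(previous_word, candidates):
    """Choose the candidate seen most often after previous_word."""
    return max(candidates, key=lambda w: bigrams[(previous_word, w)])

homophones = ["ate", "eight"]
print(pick("she", homophones))  # context: "she ___"
print(pick("at", homophones))   # context: "at ___"
```

Even this toy model only works for contexts it has seen; for an unseen previous word both counts are zero and the choice is arbitrary, which hints at why so much training data is needed.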
Since almost all relevant languages have evolved and were not designed as mathematical models you basically need to "reverse engineer" the available implicit and explicit knowledge about a language into a model which is a big challenge IMHO.
Having worked myself with neural nets I can assure you that while they provide good results in some cases they are not "magical tools"... almost always a good neural net has been carefully designed and optimized for the specific requirement... in this context it needs both extensive experience/knowledge of languages and neural nets PLUS extensive training to achieve usable results...

It's been a decade since I took a language class in college, but from what I recall language can be broken up into phonemes. Language processors do their best to identify these phonemes, but they are unique to every individual. Even once they are broken up they must then be reassembled into a meaningful construct.
Take this example, humans are quite capable of reading with no punctuation and no capital letters and no spaces. It takes a second, but we can do it quite readily. This is kind of what a computer has to look at when it gets a block of phonemes. However, computers are not nearly as good at parsing this data out. One of the reasons is it is difficult for computers to have context. Humans can even understand babies despite the fact that their phonemes can be completely wrong.
Even if you have all the phonemes correct, then arranging them into an order that makes sense is also difficult.
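The "no spaces" analogy above can be sketched in a few lines. The example below lists every way to split an unspaced string into dictionary words; the word list and function names are made up for illustration. Real recognizers work on phoneme hypotheses with probabilities rather than letters, so treat this only as a picture of the segmentation problem.

```python
from functools import lru_cache

# Hypothetical mini-dictionary.
WORDS = {"human", "humans", "can", "scan", "read", "this"}

def segmentations(text):
    """Return every way to split text into words from WORDS."""
    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [[]]  # one way to split the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in WORDS:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results
    return [" ".join(words) for words in split_from(0)]

for candidate in segmentations("humanscanreadthis"):
    print(candidate)
```

Both "human scan read this" and "humans can read this" are valid splits, so even with every symbol correct the program still needs context to choose between them.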

Related

How does a computer reproduce the SIFT paper method on its own in deep learning

Let me begin by saying that I am struggling to understand what is going on in deep learning. From what I gather, it is an approach to try to have a computer engineer different layers of representations and features to enable it learn stuff on its own. SIFT seems to be a common way to sort of detect things by tagging and hunting for scale invariant things in some representation. Again I am completely stupid and in awe and wonder about how this magic is achieved. How does one have a computer do this by itself? I have looked at this paper https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf and I must say at this point I think it is magic. Can somebody help me distill the main points of how this works and why a computer can do it on its own?

SIFT and CNN are both methods to extract features from images in different ways and outputs.
SIFT/SURF/ORB or any similar feature extraction algorithms are "Hand-made" feature extraction algorithms. It means, independent from the real world cases, they are aiming to extract some meaningful features. This approach has some advantages and disadvantages.
Advantages :
You don't have to care about input image conditions and probably you don't need any pre-processing step to extract those features.
You can directly get SIFT implementation and integrate it to your application
With GPU based implementations (i.e. GPU-SIFT), you can achieve high inference speed.
Disadvantages:
It is limited in where it can find features. You will have trouble getting features on quite plain surfaces.
SIFT/SURF/ORB cannot solve all problems that require feature classification / matching. Think of the face recognition problem. Do you think that extracting & classifying SIFT features over a face will be enough to recognize people?
Those are hand-made feature extraction techniques, they cannot be improved over time (of course unless a better technique is being introduced)
Developing such a feature extraction technique requires a lot of research work
On the other hand, in deep learning, you can start analyzing much more complex features which are impossible for a human to recognize. CNNs are currently the go-to approach for analyzing hierarchical filter responses and the much more complex features created by combining those filter responses (going deeper).
The main power of CNNs comes from not extracting features by hand. We only define "how" the PC has to look for features. Of course this method has some pros and cons too.
Advantages :
More data, better success! It is all depending on data. If you have enough data to explain your case, DL outperforms hand-made feature extraction techniques.
As soon as you extract the features from image, you can use it for many purposes like to segment image, to create description words, to detect objects inside image, to recognize them. The better part is, all of them can be obtained in one shot, rather than complex sequential processes.
Disadvantages:
You need data. Probably a lot.
These days it is better to use supervised or reinforcement learning methods, as unsupervised learning is still not good enough.
It takes time and resources to train a good neural net. A complex hierarchy like Google Inception took 2 weeks to train on an 8-GPU server rack. Of course not all networks are so hard to train.
It has some learning curve. You don't have to know how SIFT is working to use it for your application but you have to know how CNNs are working to use them in your custom purposes.
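To make the hand-made versus learned distinction concrete, here is a small sketch of a hand-designed filter: a 3x3 Sobel kernel slid over a toy image. This is not SIFT itself, just the kind of fixed building block that hand-made methods rely on; the image and function name are assumptions for illustration. A CNN layer performs the same sliding-window operation, except that the nine kernel values are learned from data instead of being written down by a person.

```python
# Hand-designed 3x3 Sobel kernel that responds to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def filter2d(image, kernel):
    """Slide kernel over image (valid region only) and return the responses.

    This is cross-correlation (no kernel flip), which is what CNN
    "convolution" layers compute in practice.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Toy 4x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

response = filter2d(image, SOBEL_X)
for row in response:
    print(row)
```

The response is non-zero only in the columns where the window straddles the dark-to-bright boundary, and zero over the flat regions, which also shows why such filters find nothing on plain surfaces.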

Artificial Intelligence/Rules to guess user taste in Apparel/Clothing

Are there standard rules engine/algorithms around AI that would predict the user taste on a particular kind of product like clothes.
I know it's one thing all e-commerce website will kill for. But I am looking out for theoretical patterns defined out there which would help make that prediction in a better way, if not accurately.

Two books that cover recommender systems:
Programming Collective Intelligence: Python, does a good job explaining the algorithm, but doesn't provide enough help IMO in terms of understanding how to scale.
Algorithms of the Intelligent Web: Java, harder to follow, but also covers using persistence, in this case MySQL, to facilitate scaling and identifies areas in example code that will not scale as-is.
Basically two ways of approaching the problem, user or item based. Netflix appears to use the former, while Amazon the latter. Typically user based requires more time and/or processing power to generate recommendations because you tend to have more users than items to consider.
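A minimal sketch of the item-based variant, assuming a made-up ratings table and cosine similarity as the measure (the books above cover several alternatives). Items are compared by the ratings users gave them; the user-based variant is the same computation with the roles of users and items swapped.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann":  {"jeans": 5, "tshirt": 4, "scarf": 1},
    "bob":  {"jeans": 4, "tshirt": 5},
    "cara": {"scarf": 5, "hat": 4},
    "dan":  {"jeans": 5, "tshirt": 5, "hat": 1},
}

def item_vector(item):
    """Ratings for one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_items(item):
    """Other items ranked from most to least similar to the given item."""
    items = {i for r in ratings.values() for i in r} - {item}
    target = item_vector(item)
    return sorted(items, key=lambda i: cosine(target, item_vector(i)), reverse=True)

print(similar_items("jeans"))
```

Someone who liked "jeans" would be shown the top of this ranking. The scaling concern mentioned above shows up in `similar_items`, which compares against every other item on each call; a production system would precompute and store these similarities.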

Not sure how to answer this, as this question is overly broad. What you are describing is a Machine Learning kind of task, and thus would fall under that (very broad) umbrella. There are a number of different algorithms that can be used for something like this, but most texts would tell you that the definition of the problem is the important part.
What parts of fashion are important? What parts are not? How are you going to gather the data? How noisy is the data? All of these are important considerations to the problem space. Pandora does a similar type of thing with music, with their big benefit being that their users tell them initially what they like and don't like.
To categorize their music, they actually have trained musicians listening to the music to identify all sorts of stuff. See the article on Ars Technica here for more information about that. Based on what I know about fashion tastes, I would say that it is a similar problem space, and would probably require experts to "codify" the information before you could attempt to draw parallels.
Sorry for the vague answer - if you want more specifics, I would recommend asking a more specific question, about specific algorithms or data sets, etc.

Why isn't speech recognition advancing? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
What's so difficult about the subject that algorithm designers are having a hard time tackling it?
Is it really that complex?
I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example as to why this is the case?

Auditory processing is a very complex task. Human evolution has produced a system so good that we don't realize how good it is. If three people are talking to you at the same time you will be able to focus on one signal and discard the others, even if they are louder. Noise is discarded very well too. In fact, if you hear a human voice played backwards, the first stages of the auditory system will send this signal to a different processing area than if it were a real speech signal, because the system will regard it as "no-voice". This is an example of the outstanding abilities humans have.
Speech recognition advanced quickly from the 70s because researchers were studying the production of voice. This is a simpler system: vocal cords excited or not, resonance of the vocal tract... it is a mechanical system that is easy to understand. The main product of this approach is cepstral analysis. This led automatic speech recognition (ASR) to achieve acceptable results. But this is a sub-optimal approach. Noise separation is quite bad; even when it works more or less in clean environments, it is not going to work with loud music in the background, not the way humans do.
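For readers who have not met cepstral analysis, here is a minimal sketch of the real cepstrum: the inverse Fourier transform of the log magnitude spectrum. It uses a naive DFT so that it needs only the standard library, and the toy signal (an impulse plus a half-strength echo 8 samples later) is an assumption for illustration. Periodic structure in the spectrum, such as an echo or the pitch of a voice, shows up as a peak at the corresponding delay.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform; fine for short toy signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def real_cepstrum(signal):
    """Inverse DFT of the log magnitude spectrum of a real signal."""
    n = len(signal)
    # Small constant guards against log(0) for spectra with exact zeros.
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(signal)]
    # The log magnitude spectrum of a real signal is real and even, so the
    # forward and inverse DFT differ only by the 1/n factor.
    return [v.real / n for v in dft(log_mag)]

# Toy signal: an impulse followed by a half-strength echo 8 samples later.
n, delay = 64, 8
signal = [0.0] * n
signal[0] = 1.0
signal[delay] = 0.5

cepstrum = real_cepstrum(signal)
peak = max(range(1, n // 2), key=lambda q: cepstrum[q])
print(peak)
```

The largest value in the first half of the cepstrum sits at the echo delay. Speech front ends typically build on this idea with extra steps (windowing, mel-scaled filter banks) that are left out here.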
The optimal approach depends on understanding the auditory system: its first stages in the cochlea, the inferior colliculus... but the brain is involved as well, and we don't know that much about it. It is proving to be a difficult change of paradigm.
In a paper, Professor Hynek Hermansky compared the current state of the research to the time when humans first wanted to fly. We didn't know what the secret was (the feathers? the flapping wings?) until we discovered Bernoulli's principle.

Because if people find it hard to understand other people with a strong accent why do you think computers will be any better at it?

I remember reading that Microsoft had a team working on speech recognition, and they called themselves the "Wreck a Nice Beach" team (a name given to them by their own software).
To actually turn speech into words, it's not as simple as mapping discrete sounds, there has to be an understanding of the context as well. The software would need to have a lifetime of human experience encoded in it.

This kind of problem is more general than only speech recognition.
It exists also in vision processing, natural language processing, artificial intelligence, ...
Speech recognition is affected by the semantic gap problem :
The semantic gap characterizes the
difference between two descriptions of
an object by different linguistic
representations, for instance
languages or symbols. In computer
science, the concept is relevant
whenever ordinary human activities,
observations, and tasks are
transferred into a computational
representation
Between an audio wave form and a textual word, the gap is big,
Between the word and its meaning, it is even bigger...

beecos iyfe peepl find it hard to arnerstand uvver peepl wif e strang acsent wie doo yoo fink compootrs wyll bee ani bettre ayt it?
I bet that took you half a second to work out what the hell I was typing, and all I was doing was repeating Simon's answer in a different 'accent'. The processing power just isn't there yet but it's getting there.

The variety in language would be the predominant factor, making it difficult. Dialects and accents would make this more complicated. Also, context. The book was read. The book was red. How do you determine the difference? The extra effort needed for this would make it easier to just type the thing in the first place.
Now, there would probably be more effort devoted to this if it was more necessary, but advances in other forms of data input have come along so quickly that it is not deemed that necessary.
Of course, there are areas where it would be great, even extremely useful or helpful. Situations where you have your hands full or can't look at a screen for input. Helping the disabled etc. But most of these are niche markets which have their own solutions. Maybe some of these are working more towards this, but most environments where computers are used are not good candidates for speech recognition. I prefer my working environment to be quiet. And endless chatter to computers would make crosstalk a realistic problem.
On top of this, unless you are dictating prose to the computer, any other type of input is easier and quicker using keyboard, mouse or touch. I did once try coding using voice input. The whole thing was painful from beginning to end.

Because Lernout&Hauspie went bust :)
(sorry, as a Belgian I couldn't resist)

The basic problem is that human language is ambiguous. Therefore, in order to understand speech, the computer (or human) needs to understand the context of what is being spoken. That context is actually the physical world the speaker and listener inhabit. And no AI program has yet demonstrated having a deep understanding of the physical world.

Speech synthesis is very complex by itself - many parameters are combined to form the resulting speech. Breaking it apart is hard even for people - sometimes you mishear one word for another.

Most of the time we humans understand based on context, so that a particular sentence is in harmony with the whole conversation. Unfortunately, computers have a big handicap in this sense. They just try to capture the words, not what's between them.
We would understand a foreigner whose English accent is very poor, maybe by guessing what he is trying to say instead of what he is actually saying.

To recognize speech well, you need to know what people mean - and computers aren't there yet at all.

You said it yourself, algorithm designers are working on it... but language and speech are not algorithmic constructs. They are the peak of the development of the highly complex human system involving concepts, meta-concepts, syntax, exceptions, grammar, tonality, emotions, neuronal as well as hormonal activity, etc.
Language needs a highly heuristic approach and that's why progress is slow and prospects maybe not too optimistic.

I once asked my instructor a similar question; I asked him something like what challenge there is in making a speech-to-text converter. Among the answers he gave, he asked me to pronounce 'p' and 'b'. Then he said that they differ for a very small time in the beginning, and then they sound similar. My point is that it is hard even to recognize what sound is made; recognizing voice would be even harder. Also, note that once you record people's voices, it is just numbers that you store. Imagine trying to find metrics like accent, frequency, and other parameters useful for identifying voice from nothing but input such as matrices of numbers. Computers are good at numerical processing etc, but voice is not really 'numbers'. You need to encode voice in numbers and then do all computation on them.

I would expect some advances from Google in the future because of their voice data collection through 1-800-GOOG411

It's not my field, but I do believe it is advancing, just slowly.
And I believe Simon's answer is somewhat correct in a way: part of the problem is that no two people speak alike in terms of the patterns that a computer is programmed to recognize. Thus, it is difficult to analyze speech.

Computers are not even very good at natural language processing to start with. They are great at matching but when it comes to inferring, it gets hairy.
Then add trying to figure out the same word from hundreds of different accents/inflections, and it suddenly doesn't seem so simple.

Well I have got Google Voice Search on my G1 and it works amazingly well. The answer is, the field is advancing, but you just haven't noticed!

If speech recognition was possible with substantially less MIPS than the human brain, we really could talk to the animals.
Evolution wouldn't spend all those calories on grey matter if they weren't required to do the job.

Spoken language is context sensitive, ambiguous. Computers don't deal well with ambiguous commands.

I don't agree with the assumption in the question - I have recently been introduced to Microsoft's speech recognition and am impressed. It can learn my voice after a few minutes and usually identifies common words correctly. It also allows new words to be added. It is certainly usable for my purposes (understanding chemistry).
Differentiate between recognising the (word) tokens and understanding the meaning of them.
I don't yet know about other languages or operating systems.

The problem is that there are two types of speech recognition engines. Speaker-trained ones such as Dragon are good for dictation. They can recognize almost any spoken text with fairly good accuracy, but require (a) training by the user, and (b) a good microphone.
Speaker-independent speech rec engines are most often used in telephony. They require no "training" by the user, but must know ahead of time exactly what words are expected. The application development effort to create these grammars (and deal with errors) is huge. Telephony is limited to a 4 kHz bandwidth due to historical limits in our public phone network. This limited audio quality greatly hampers the speech rec engines' ability to "hear" what people are saying. Digits such as "six" or "seven" contain an "ssss" sound that is particularly hard for the engines to distinguish. This means that recognizing strings of digits, one of the most basic recognition tasks, is problematic. Add in regional accents, where "nine" is pronounced "nan" in some places, and accuracy really suffers.
The best hope is interfaces that combine graphics and speech rec. Think of an iPhone application that you can control with your voice.

Software projects and development in a research environment [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
What are useful strategies to adopt when you or the project does not have a clear idea of what the final (if any) product is going to be?
Let us take "research" to mean an exploration into an area where many things are not known or implemented and where a formal set of deliverables cannot be specified at the start of the project. This is common in STEM (science (physics, chemistry, biology, materials, etc.), technology, engineering, medicine) and many areas of informatics and computer science. Software is created either as an end in itself (e.g. a new algorithm), or as a means of managing data (often experimental) and simulation (e.g. materials, reactions, etc.). It is usually created by small groups or individuals (I omit large science such as telescopes and hadron colliders where much emphasis is put on software engineering).
Research software is characterised by (at least):
unknown outcome
unknown timescale
little formal project management
limited budgets (in academia at least)
unpredictability of third-party tools and libraries
changes in the outside world during the project (e.g. new discoveries which can be positive - save effort - or negative - getting scooped)
Projects can be anything from days ("see if this is a worthwhile direction to go") to years ("this is my PhD topic") or longer. Frequently the people are not hired as software people but find they need to write code to get the research done or get infected by writing software. There is generally little credit for good software engineering - the "product" is a conference or journal publication.
However some of these projects turn out to be highly valuable - the most obvious area is genomics where in the early days scientists showed that dynamic programming was a revolutionary tool to help thinking about protein and nucleic structure - now this is a multi-billion industry (or more). The same is true for quantum mechanics codes to predict properties of substances.
The downside is that much code gets thrown away and it is difficult to build on. To try to overcome this we have built up libraries which are shared in the group and through the world as Open Source (but here again there is very little credit given). Many researchers reinvent the wheel ("head-down" programming where colleagues are not consulted and "hero" programming where someone tries to do the whole lot themselves).
Too much formality at the start of a project often puts people off and innovation is lost (no-one will spend 2 months writing formal specs and unit tests). Too little and bad habits are developed and promulgated. Programming courses help but again it's difficult to get people doing them especially when you rely on their goodwill. Mentoring is extremely valuable but not always successful.
Are there online resources which can help to persuade people into good software habits?
EDIT: I'm grateful to dmckee (below) for pointing out a similar discussion. It's all good stuff and I particularly agree with version control as being one of the most important things that we can offer scientists (we offered this to our colleagues and got very good takeup). I also like the approach of the Software Carpentry course mentioned there.

It's extremely difficult. The environment both you and Stefano Borini describe is very accurate. I think there are three key factors which propagate the situation.
Short-term thinking
Lack of formal training and experience
Continuous turnover of grad students/postdocs to shoulder the brunt of new development
Short-term thinking. There are a few reasons that short-term thinking is the norm, most of them already well explained by Stefano. As well as the awful pressure to publish and the lack of recognition for software creation, I would emphasise the number of short-term contracts. There is simply very little advantage for more junior academics (PhD students and postdocs) to spend any time planning long-term software strategies, since contracts are 2-3 years. In the case of longer-term projects e.g. those based around the simulation code of a permanent member of staff, I have seen some applications of basic software engineering, things like simple version control, standard test cases, etc. However even in these cases, project management is extremely primitive.
Lack of formal training and experience. This is a serious handicap. In astronomy and astrophysics, programming is an essential tool, but understanding of the costs of development, particularly maintenance overheads, is extremely poor. Because scientists are normally smart people, there is a feeling that software engineering practices don't really apply to them, and that they can 'just make it work'. With more experience, most programmers realise that writing code that mostly works isn't the hard part; maintaining and extending it efficiently and safely is. Some scientific code is throwaway, and in these cases the quick and dirty approach is adequate. But all too often, the code will be used and reused for years to come, bringing consequent grief to all involved with it.
Continuous turnover of grad students/postdocs for new development. I think this is the key feature that allows the academic approach to software to continue to survive. If the code is horrendous and takes days to understand and debug, who pays that price? In general, it's not the original author (who has probably moved on). Nor is it the permanent member of staff, who is often only peripherally involved with new development. It is normally the graduate student who is implementing new algorithms, producing novel approaches, trying to extend the code in some way. Sometimes it will be a postdoc, hired specifically to work on adding some feature to an existing code, and contractually obliged to work on this area for some fraction of their time.
This model is hugely inefficient. I know a PhD student in astrophysics who spent over a year trying to implement a relatively basic piece of mathematics, only a few hundred lines of code, in an existing n-body code. Why did it take so long? Because she literally spent weeks trying to understand the existing, horribly written code, and how to add her calculation to it, and months more ineffectively debugging her problems due to the monolithic code structure, coupled with her own lack of experience. Note that there was almost no science involved in this process; just wasting time grappling with code. Who ultimately paid that price? Only her. She was the one who had to put more hours in to try and get enough results to make a PhD. Her supervisor will get another grad student after she's gone - and so the cycle continues.
The point I'm trying to make is that the problem with the software creation process in academia is endemic within the system itself, a function of the resources available and the type of work that is rewarded. The culture is deeply embedded throughout academia. I don't see any easy way of changing that culture through external resources or training. It's the system itself that needs to change, to reward people for writing substantial code, to place increased scrutiny on the correctness of results produced using scientific code, to recognise the importance of training and process in code, and to hold supervisors jointly responsible for wasting the time of the members of their research group.

I'll tell you my experience.
There is no doubt that a lot of software gets created and wasted in academia. The fact is that it's difficult to adapt research software, purposely created for a specific research objective, to a more general environment. Also, the product of academia is scientific papers, not software. The value of software in academia is zero. The data you produce with that software is evaluated once you write a paper on it (which takes a lot of editorial time).
In most cases, however, research groups have recognized frequent patterns, which can be polished, tested and archived as internal knowledge. This is what I do with my personal toolkit. I grow it according to my research needs, only with those features that are "cross-project". Developing a personal toolkit is almost a requirement, as your scientific needs are most likely unique in some respect (otherwise you would not be doing research) and you want to have as few external dependencies as possible (since if something evolves and breaks your stuff, you will not have the time to fix it).
Everything else, however, is too specific for a given project to be crystallized. I therefore tend not to encapsulate something that is clearly a one-time solver. I do, however, go back and improve it if, later on, other projects require the same piece of code.
Short project spans and the heat of research (e.g. the publish-or-perish mindset so central today) require agile, quick languages, and in general languages that can be grasped quickly. PhDs in genomics and quantum chemistry don't have a formal programming background. In some cases, they don't even like programming. So the language must be quick, easy, clean, flexible, and easy to understand later on. The last point is crucial, as there's no time to produce documentation, and it's guaranteed that in academia everyone will leave sooner or later: you burn the group's experience to zero every three years or so. Academia is a high-risk industry that periodically fires all its hard-trained executors, keeping only some managers. Having code that is maintainable and can be easily grasped by someone else is therefore crucial. Also, never underestimate the power of a Google search to solve your problems. With a widely deployed language you are more likely to find answers to the gotchas and issues you stumble on.
Management is a problem as well. Waterfall is out of the question. There is no time for paperwork programming (requirements, specs, design). Spiral is quite OK, but as little paperwork as possible is clearly recommended. The fact is that anything that does not give you an article in academia is wasted time. If you spend one month writing specs, it's a month wasted, and your contract expires in 11 months. Moreover, that fat document counts for zero or close to zero in your career (like many other things: administration and teaching are two examples). Of course, Agile methods are also out of the question. Most development is done by groups that are far apart, and that in general have a bunch of other things to do as well. Coding concentration comes in brief bursts during "spare time" between articles, and before or after meetings. The bazaar is the most likely model, but the bazaar carries a lot of issues as well.
So, to answer your question, the best strategy is "slow accumulation" of known good software, developed in small bursts with a quick and agile method and language. Good coding practices need to be taught during lectures, just as good laboratory practices are taught during practical courses (e.g. never put water in sulphuric acid, always the opposite).

The hardest part is the transition between "this is just for a paper" and "we're really going to use this."
If you know that the code will only be for a paper, fine, take short cuts. Hardcode everything you can. Don't waste time on extensive validation if the programmer is the only one who will ever run the code. Etc. The problem is when someone says "Great! Now let's use this for real" or "Now let's use it for this entirely different scenario than what it was developed and tested for."
A related challenge is having to explain why the software isn't ready for prime time even though it obviously works, i.e. it's prototype quality and not production quality. What do you mean you need to rewrite it?

I would recommend that you/they read "Clean Code"
http://www.amazon.co.uk/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/ref=sr_1_1?ie=UTF8&s=books&qid=1251633753&sr=8-1
The basic idea of this book is that if you do not keep the code "clean", eventually the mess will stop you from making any progress.

The kind of big science I do (particle physics) has a small number of large, long-running projects (ROOT and Geant4, for instance). These are developed mostly by actual programming professionals, using processes that would be recognized by anyone else in the industry.
Then, each collaboration has a number of project-wide programs which are developed collaboratively under the direction of a small number of senior programming scientists. These use at least the basic tools (always version control, often some kind of bug tracking or automated builds).
Finally, almost every scientist works on their own programs. Use of process on these programs is very spotty, and they often suffer from all the ills that others have discussed (short lifetimes, poor coding skills, no review, lots of serial maintainers, Not Invented Here Syndrome, etc.). The only advantage available here compared to small-group science is that they work with the tools I talked about above, so there is something that you can point to and say "That is what you want to achieve."

Don't really have that much more to add to what has already been said. It's a difficult balance to strike because our priorities are different - academia is all about discovering new things, software engineering is more about getting things done according to specifications.
The most important thing I can think of is to try and extricate yourself from the culture of in-house development that goes on in academia and try to maintain a disciplined approach to development, difficult as that may be in many cases owing to time constraints, lack of experience etc. This control-freakery sucks away at individual responsibility and decision-making and leaves it in the hands of a few who do not necessarily know best.
Get a good book on software development (Code Complete, already mentioned, is excellent) as well as any respected book on algorithms and data structures. Read up on how you will need to manage your data, e.g. do you need fast lookup, hash tables or binary trees? Don't reinvent the wheel: use the libraries and things like the STL, otherwise you are likely to be wasting time. There is a vast amount on the web, including this very fine blog.
Many academics, besides sometimes being prima-donna-ish and precious about any approach seen as businesslike, tend to be quite vague in their objectives, to put it mildly. For this reason alone it is vital to build up your own software arsenal of helper functions and recipes, eventually, hopefully, ending up with a kind of flexible experimental framework that enables you to try out any combination of things without being too restricted to any particular problem area. Strongly resist the temptation to just dive into the problem at hand.

What metrics would be usable to determine expertise level in a particular programming language

I am interested in the raw (or composite) metrics used to get a handle on how well a person can program in a particular language.
Scenario: George knows a few programming languages and wants to learn "foobar", but he would like to know when he has a reasonable amount of experience in "foobar".
I am really interested in something broader than just the LOC (lines of code) metric.
My hope for this question is to understand how engineers quantify the programming language experiences of others and if this can be mechanically measured.
Thanks in Advance!

In reply to the previous two posters, I'd guess that there is a way to get a handle on how well a person can program in a particular language: you can test how well someone knows English, or Maths, or Music, or Medicine, or Fine Art, so what's so special about a programming language?
In reply to the OP, I guess the tests must assess:
How well you can program
How well you can use the programming language
Therefore the metrics might be:
What's the goodness of the person's programming (and there are various dimensions of goodness such as bug-free, maintainable, quick/cheap to write, runs quickly, meets user requirements, etc.)?
Does the person use appropriate/idiomatic features of the programming language in question in order to do that good programming?
It would be difficult to make the test 'mechanical', though: most exams that I know of are graded by a human examiner. In the case of programming, part of the test could be graded mechanically (i.e. "does it run?") but part of it ("is it understandable and idiomatic?") is intended to benefit, and is better judged by, other human programmers.

The best indicator of your expertise in a particular language, in my opinion, is how productive you are in it.
Productivity is not just how fast you can work but, importantly, how few bugs you create and how little refactoring/rework is required later on.
For example, if you took two languages you have a similar level of experience with, and were (in parallel universes) to build the same system with both, I would say the language you build the system with faster, and with fewer defects and design flaws, is the language you have more expertise in.
Sorry it's not a "hard" metric for you, it's a more practical approach.

I don't believe that this can be "mechanically measured". I've thought about this a lot though.

Hang on...
Even the "LOC" of a program is a heavily disputed topic!
(Are we talking about the output of cat *.{h,c} | wc -l or some other mechanism, for instance? What about blank lines? Comments? Are comments important? Is good code documented?)
Until you've realised how pointless a LOC comparison is, you've no hope of realising how pointless other metrics are.
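To make the dispute concrete, here is a minimal sketch (not any standard tool) that counts the same snippet under three different definitions of "a line of code". The function name count_lines and the rule that only whole-line // comments count as comments are assumptions made purely for illustration.

```python
def count_lines(source: str) -> dict:
    """Count lines of a C-like source string under three definitions."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    # Assumption: only whole-line // comments are treated as comments;
    # block comments and trailing comments are deliberately ignored here.
    code_only = [ln for ln in non_blank if not ln.strip().startswith("//")]
    return {
        "physical": len(lines),       # roughly what `wc -l` reports
        "non_blank": len(non_blank),  # blank lines dropped
        "code_only": len(code_only),  # blank and whole-line comment lines dropped
    }


sample = """// add two numbers
int add(int a, int b)
{

    return a + b;  // sum
}
"""

counts = count_lines(sample)
print(counts)
```

The same six-line file gives three different answers depending on the definition, before even asking whether a lone brace should count.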

It's a rather qualitative thing that is rarely measured with any great accuracy. It's like asking "how smart was Einstein?". Certification is one (and a reasonably thorough) quantitative indicator, but even it falls drastically short of identifying "good programmers" as many recruiters discover.
What are you ultimately trying to achieve? General programming aptitude can be more important than language expertise in some situations.
If you are language-focussed, taking on a challenge like Project Euler using that language may be a way to track progress.

How proficient they are in debugging complex problems in that language.
Ask them about projects they have worked on in the past, difficult problems they encountered and how they solved them. Ask them about debugging techniques they have used - you'll be surprised at what you'll hear, and you might even learn something new ;-)
A lot of places have a person or two who is a superstar in their field - the person everyone else goes to when they can't figure out what is wrong with their program. I'm guessing that's the person you are looking for :-)

Facility with a programming language is not enough. What is required is facility with a programming language in the context of a particular suite of libraries on a particular platform:
C++ on winapi on Windows 32bit
C++ on KDE on Linux
C++ on Symbian on a Nokia S60 phone
C# on MS .NET on Windows
C# on Mono on Linux
Within such a context, the measures of competence using the target language on the target platform are as follows:
The ability to express common patterns succinctly and robustly.
The ability to debug common but subtle bugs like race conditions.
It would be possible to develop a suite of benchmark exercises for a programmer. One might also, once significant samples were available, determine the bell curve for ability. Preparing these things would take literally years and they would rapidly become obsolete. This (and general tightness) is why organisations don't bother.
It would also be necessary to grade people in both "tool maker" and "tool user" modes. Tool makers are very different people with a much higher level of competence but they are often unsuited to monkey work, for which you really want a tool user.

John
There are a couple of ways to approach your question:
1) If you are interviewing candidates for a particular position requiring a particular language, then the only measure to compare candidates is 'how long has this person been writing in this language.' It's not perfect - it's not even very good - but it's reality. Unless you want to give the candidate a problem, a computer, and a compiler to test them on the spot there's no other measure. And then most programmer-types don't do well in "someone's watching you" scenarios.
2) I interpret your question to be more of 'when can I call MYSELF proficient in a language?' For this I would refer to levels of learning a non-native language: the first level is where you need to look up words/phrases in a dictionary (book) in order to say or understand anything; the second level is where you can understand hearing the language (or reading code) with only the occasional lookup in your trusted and now well-worn dictionary; at the third level you can speak (or write code) with only the occasional lookup; the fourth level is where you dream in the language; and the final level is where you fool native speakers into thinking that you're a native speaker too (in programming, other experts would think that you may have helped develop the language syntax).
Note that this doesn't help determine how good of a programmer you are - just like knowing English without having to look up words in the dictionary doesn't show "how gooder you is at writin' stuff" - that's subjective and has nothing to do with a particular language as people that are good at programming are good in any language you give them.

The phrase "a reasonable amount of experience" is dependent upon the language being considered and what that language can be used for.
A metric is the result of a measurement. Stevens (see Wikipedia: Level of Measurement) proposed that measurements use four different scale types: nominal (assigning a label), ordinal (assigning a ranking), interval (equal spacing between values, but an arbitrary zero point) and ratio (having a non-arbitrary zero starting point). LOC is a ratio measurement. Although far from perfect, I think LOC is a relevant, objective number indicating how much experience you have in a language, and it can be compared to quantifiable values in the software industry. But this raises the question: where do these industry values come from?
Personally, I would say that "George" will know that he has a reasonable amount of experience when he has designed, implemented and tested a project, maybe of his own choosing on his personal time on his home computer if need be. For example: database, business application, web page, GUI test tool, etc.
From the hiring manager's point of view, I would start off by asking the programmer how good s/he is in the language, but this is not a metric. I have always thought that the best way to measure a person's ability to write programs is to give the programmer several small programming problems that are thought out in advance and are to be solved in a given amount of time, say, 5 minutes each. I have never objected to this being done to me in job interviews. Several metrics are available: Was the programmer able to solve the problem (yes or no - nominal)? How much time did it take (number of minutes - ratio)? How effective was their approach to solving the problem (good, fair, poor - ordinal)? You learn not only the person's ability to write code, but can observe several subjective things as well, such as their behaviour as they go about solving the problem, the questions s/he asks while solving the problem, the ability to work under pressure, etc. From a "quality" perspective though, remember that people do not like being measured.
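The three interview metrics above sit on different scales, so they should not all be summarised the same way. The sketch below is one hypothetical way to record them; the names Attempt, APPROACH_RANK and summarise are invented for illustration. It uses the mode for the nominal value, the median for the ordinal one, and the mean for the ratio one.

```python
from dataclasses import dataclass
from statistics import mean, median_low, mode

# Ordinal scale: the labels are ordered, but the gaps between them mean nothing.
APPROACH_RANK = {"poor": 0, "fair": 1, "good": 2}


@dataclass
class Attempt:
    solved: bool    # nominal: yes / no
    minutes: float  # ratio: true zero, so averages and ratios make sense
    approach: str   # ordinal: "poor" < "fair" < "good"


def summarise(attempts):
    """Summarise each metric with a statistic suited to its scale type."""
    ranks = [APPROACH_RANK[a.approach] for a in attempts]
    label_of = {rank: label for label, rank in APPROACH_RANK.items()}
    return {
        "solved_mode": mode([a.solved for a in attempts]),
        # median_low returns an actual data point, so it maps back to a label.
        "approach_median": label_of[median_low(ranks)],
        "minutes_mean": mean([a.minutes for a in attempts]),
    }


attempts = [
    Attempt(solved=True, minutes=4.0, approach="good"),
    Attempt(solved=True, minutes=5.0, approach="fair"),
    Attempt(solved=False, minutes=6.0, approach="poor"),
]
summary = summarise(attempts)
print(summary)
```

Averaging the "good/fair/poor" ranks would quietly treat an ordinal scale as if it were an interval one, which is why the median is used for that field.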

Still, I believe there are some good metrics like the McCabe Cyclomatic Metric for cyclomatic complexity or the amount of useful commentary per block of code or even the average amount of code written between two consecutive tests.
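As a rough illustration of the first of these, the sketch below approximates McCabe's cyclomatic complexity for Python source as one plus the number of decision points found in the syntax tree. It is a simplification, not a reference implementation: the function name cyclomatic_complexity and the choice of which node types count as decisions are assumptions, and constructs such as comprehension filters are ignored.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths.
            complexity += len(node.values) - 1
    return complexity


example = '''
def classify(n):
    if n < 0 and n % 2 == 0:
        return "negative even"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

score = cyclomatic_complexity(example)
print(score)
```

Note that this measures a property of the code, not of the person: it can flag tangled functions, but a low score alone says little about whether the author is an expert.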

I know of no such thing. I don't believe there's consensus on how to quantify experience or what "reasonable" means. Maybe I'll learn something too, but if I do it'll be a great surprise.
This may be pertinent.

I find that testing the ability to debug is a more accurate gauge of programming skill than any test aimed at straightforward programming problems that I have encountered. Given the source for a reasonably sized class or function with a stated (or unstated, in some cases) misbehavior, can the testee locate the problem?

Well, they try that in job interviews. There's no metric, but you can assess a person's abilities through questioning and quizzing.

WTF/s * LOC, smaller is best.

There are none; expertise can only be judged subjectively relative to others, or tested on specifics (which has its own level of inaccuracy).
See "What is the fascination with code metrics?" for more information.
