Is it possible to write a program that can extract a melody/beat/rhythm provided by a specific instrument in a wave (or other music format) file made up of multiple instruments?
Which algorithms could be used for this and what programming language would be best suited to it?
This is a fascinating area. The basic mathematical tool here is the Fourier Transform. To get an idea of how it works, and how challenging it can be, take a look at the analysis of the opening chord to A Hard Day's Night.
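To make the Fourier idea concrete, here is a minimal sketch (my own illustration in Python/NumPy, not taken from the Hard Day's Night analysis): mix two known tones and recover their frequencies from the FFT's magnitude peaks.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                      # one second of time points

# Mix two tones, e.g. A4 (440 Hz) and roughly E5 (659 Hz)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 659 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two largest magnitude peaks land on the two tones
peaks = sorted(round(float(f)) for f in freqs[np.argsort(spectrum)[-2:]])
print(peaks)   # -> [440, 659]
```

A real recording is far messier than two clean sines, of course, which is exactly why the opening-chord analysis was such a challenge.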
An instrument produces a sound signature, just the same way our voices do. There are algorithms out there that can pick a single voice out of a crowd and identify that voice from its signature in a database, a technique used in forensics. In exactly the same way, the sound signature of a single instrument can be picked out of a soundscape (such as your mixed wave) and be used to pick out a beat, or to make a copy of that instrument on its own track.
Obviously, if you're thinking about making copies of tracks, i.e. breaking down the mixed wave into a single track per instrument, you're going to be looking at a lot of work. My understanding is that because of the frequency overlaps of instruments, this isn't going to be straightforward by any means... not impossible, though, as you've already been told.
There's quite an interesting blog post by Comparisonics about sound matching technologies which might be useful as a start for your quest for information: http://www.comparisonics.com/SearchingForSounds.html
To extract the beat or rhythm, you might not need perfect isolation of the instrument you're targeting. A general solution may be hard, but if you're trying to solve it for a particular piece, it may be possible. Try implementing a band-pass filter and see if you can tune it to select the instrument you're after.
Also, I just found this Mac product called PhotoSounder. They have a blog showing different ways it can be used, including isolating an individual instrument (with manual intervention).
Look into Karaoke machine algorithms. If they can remove voice from a song, I'm sure the same principles can be applied to extract a single instrument.
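For what it's worth, the classic cheap karaoke trick is center-channel cancellation: lead vocals are usually mixed equally into both stereo channels, so subtracting one channel from the other cancels them while anything panned off-center survives. A toy sketch (my own, in Python/NumPy, with synthetic signals standing in for vocal and guitar):

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
vocal = np.sin(2 * np.pi * 220 * t)       # mixed to the center: same in L and R
guitar = np.sin(2 * np.pi * 330 * t)      # panned hard left

left = vocal + guitar
right = vocal.copy()

instrumental = left - right               # the centered vocal cancels out
print(np.max(np.abs(instrumental - guitar)))   # tiny: only the guitar remains
```

The catch for instrument extraction is that this only removes whatever sits dead center; an instrument panned to one side, or mixed differently across channels, needs a different attack.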
Most instruments make sound within certain frequency ranges.
If you write a tunable bandpass filter - a filter that only lets a certain frequency range through - it'll be about as close as you're likely to get. It will not be anywhere near perfect; you're asking for black magic. The only way to perfectly extract a single instrument from a track is to have an audio sample of the track without that instrument, and do a difference of the two waveforms.
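A hedged sketch of such a tunable band-pass filter, using SciPy's Butterworth design (my choice of library, order, and cutoffs, nothing prescribed above):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)      # the "instrument" we want
other = np.sin(2 * np.pi * 2000 * t)      # everything else in the mix
mix = target + other

# 4th-order Butterworth pass band tuned around the target's range
b, a = butter(4, [300, 600], btype="bandpass", fs=fs)
isolated = filtfilt(b, a, mix)

spec = np.abs(np.fft.rfft(isolated))
print(spec[440] / spec[2000])             # large ratio: 2000 Hz is strongly attenuated
```

With real instruments the pass band you "tune" will inevitably let through harmonics of other instruments that overlap it, which is the frequency-overlap problem mentioned above.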
C, C++, Java, C#, Python, Perl should all be able to do all of this with the right libraries. Which one is "best" depends on what you already know.
It's possible in principle, but very difficult - an open area of research, even. You may be interested in the project paper for Dancing Monkeys, a step generation program for StepMania. It does some fairly sophisticated beat detection and music analysis, which is detailed in the paper (linked near the bottom of that page).
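Real systems like Dancing Monkeys are far more sophisticated, but the simplest beat-detection idea, flagging analysis windows whose energy jumps well above the average, can be sketched in a few lines (a toy illustration with a synthetic click track, not the paper's algorithm):

```python
import numpy as np

fs = 8000
rng = np.random.default_rng(0)

audio = np.zeros(fs * 2)                        # two seconds of silence
beat_times = [0.25, 0.75, 1.25, 1.75]           # a 120 BPM click track
for bt in beat_times:
    i = int(bt * fs)
    audio[i:i + 200] = rng.uniform(-1, 1, 200)  # noise-burst "click"

win = 400                                       # 50 ms analysis windows
energies = np.array([np.sum(audio[i:i + win] ** 2)
                     for i in range(0, len(audio), win)])
onsets = [float(i) * win / fs
          for i in np.nonzero(energies > 4 * energies.mean())[0]]
print(onsets)   # -> [0.25, 0.75, 1.25, 1.75]
```

On real music the energy floor is never silent, so practical detectors compare against a local moving average, work per frequency band, and then fit a tempo to the onset times.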
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
What's so difficult about the subject that algorithm designers are having a hard time tackling it?
Is it really that complex?
I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example as to why this is the case?
Auditory processing is a very complex task. Human evolution has produced a system so good that we don't realize how good it is. If three people are talking to you at the same time, you will be able to focus on one signal and discard the others, even if they are louder. Noise is discarded very well too. In fact, if you hear human voice played backwards, the first stages of the auditory system will send this signal to a different processing area than if it were a real speech signal, because the system will regard it as "no-voice". This is an example of the outstanding abilities humans have.
Speech recognition advanced quickly from the 70s onward because researchers were studying the production of voice. This is a simpler system: vocal cords excited or not, resonance of the vocal tract... it is a mechanical system that is easy to understand. The main product of this approach is cepstral analysis. This led automatic speech recognition (ASR) to achieve acceptable results. But it is a sub-optimal approach. Noise separation is quite poor; even though it works more or less in clean environments, it is not going to work with loud music in the background, not the way humans do.
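For the curious, the real cepstrum mentioned above is just the inverse FFT of the log-magnitude spectrum. A toy sketch (my own illustration, nothing like a full ASR front end): for a harmonic "voiced" signal, the cepstrum shows a peak at the pitch period.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
f0 = 200                                  # the pitch we will try to recover
# A "voiced" sound: fundamental plus harmonics, Hann-windowed
voiced = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 16)) * np.hanning(fs)

log_spectrum = np.log(np.abs(np.fft.rfft(voiced)) + 1e-12)
cepstrum = np.fft.irfft(log_spectrum)

# The pitch appears as a cepstral peak at quefrency 1/f0 seconds.
# Search a plausible pitch range (150-400 Hz):
lo, hi = fs // 400, fs // 150
period = lo + np.argmax(cepstrum[lo:hi])
print(fs / period)   # pitch estimate in Hz, close to 200
```

ASR front ends go further (mel filter banks, MFCCs), but this is the core trick: the log separates the excitation (pitch) from the vocal-tract envelope, and the inverse transform puts them at different quefrencies.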
The optimal approach depends on understanding the auditory system: its first stages in the cochlea, the inferior colliculus... but the brain is involved too. And we don't know that much about it. It is proving to be a difficult paradigm shift.
In a paper, Professor Hynek Hermansky compared the current state of the research with the time when humans wanted to fly. We didn't know what the secret was (the feathers? the flapping of the wings?) until we discovered Bernoulli's principle.
Because if people find it hard to understand other people with a strong accent, why do you think computers will be any better at it?
I remember reading that Microsoft had a team working on speech recognition, and they called themselves the "Wreck a Nice Beach" team (a name given to them by their own software).
To actually turn speech into words, it's not as simple as mapping discrete sounds; there has to be an understanding of the context as well. The software would need to have a lifetime of human experience encoded in it.
This kind of problem is more general than only speech recognition.
It exists also in vision processing, natural language processing, artificial intelligence, ...
Speech recognition is affected by the semantic gap problem:

The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.
Between an audio waveform and a textual word, the gap is big. Between the word and its meaning, it is even bigger...
beecos iyfe peepl find it hard to arnerstand uvver peepl wif e strang acsent wie doo yoo fink compootrs wyll bee ani bettre ayt it?
I bet that took you half a second to work out what the hell I was typing, and all I was doing was repeating Simon's answer in a different 'accent'. The processing power just isn't there yet, but it's getting there.
The variety in language would be the predominant factor, making it difficult. Dialects and accents would make this more complicated. Also, context. The book was read. The book was red. How do you determine the difference? The extra effort needed for this would make it easier to just type the thing in the first place.
Now, there would probably be more effort devoted to this if it was more necessary, but advances in other forms of data input have come along so quickly that it is not deemed that necessary.
Of course, there are areas where it would be great, even extremely useful or helpful. Situations where you have your hands full or can't look at a screen for input. Helping the disabled etc. But most of these are niche markets which have their own solutions. Maybe some of these are working more towards this, but most environments where computers are used are not good candidates for speech recognition. I prefer my working environment to be quiet. And endless chatter to computers would make crosstalk a realistic problem.
On top of this, unless you are dictating prose to the computer, any other type of input is easier and quicker using keyboard, mouse or touch. I did once try coding using voice input. The whole thing was painful from beginning to end.
Because Lernout&Hauspie went bust :)
(sorry, as a Belgian I couldn't resist)
The basic problem is that human language is ambiguous. Therefore, in order to understand speech, the computer (or human) needs to understand the context of what is being spoken. That context is actually the physical world the speaker and listener inhabit. And no AI program has yet demonstrated having a deep understanding of the physical world.
Speech synthesis is very complex by itself - many parameters are combined to form the resulting speech. Breaking it apart is hard even for people - sometimes you mishear one word for another.
Most of the time, we humans understand based on context, so that a particular sentence is in harmony with the whole conversation. Unfortunately, computers have a big handicap in this sense: they just try to capture the words, not what's between them.
We can understand a foreigner whose English accent is very poor, perhaps by guessing what he is trying to say instead of what he is actually saying.
To recognize speech well, you need to know what people mean - and computers aren't there yet at all.
You said it yourself: algorithm designers are working on it... but language and speech are not algorithmic constructs. They are the peak of the development of the highly complex human system, involving concepts, meta-concepts, syntax, exceptions, grammar, tonality, emotions, neuronal as well as hormonal activity, etc. etc.
Language needs a highly heuristic approach and that's why progress is slow and prospects maybe not too optimistic.
I once asked a similar question of my instructor; I asked him what challenge there is in making a speech-to-text converter. Among the answers he gave, he asked me to pronounce 'p' and 'b'. Then he said that they differ for a very small time in the beginning, and after that they sound similar. My point is that it is hard even to recognize which sound was made; recognizing voice would be even harder. Also, note that once you record people's voices, what you store is just numbers. Imagine trying to find metrics like accent, frequency, and other parameters useful for identifying voice from nothing but input such as matrices of numbers. Computers are good at numerical processing, but voice is not really 'numbers'. You need to encode voice in numbers and then do all the computation on them.
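His "matrices of numbers" point is easy to see with the standard library alone (a toy sketch; the file name and the 440 Hz tone are made up for the example):

```python
import wave
import numpy as np

fs = 8000
t = np.arange(int(0.1 * fs)) / fs
samples = (16383 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Write a 0.1 s, 16-bit mono WAV...
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(fs)
    w.writeframes(samples.tobytes())

# ...and read it back: all a recognizer ever sees is this array of ints.
with wave.open("tone.wav", "rb") as w:
    data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
print(data[:5])   # just integers, nothing inherently "voice" about them
```

Everything the recognizer knows about accent, pitch, or phonemes has to be computed from that flat array of integers.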
I would expect some advances from Google in the future because of their voice data collection through 1-800-GOOG411
It's not my field, but I do believe it is advancing, just slowly.
And I believe Simon's answer is somewhat correct in a way: part of the problem is that no two people speak alike in terms of the patterns that a computer is programmed to recognize. Thus, it is difficult to analyze speech.
Computers are not even very good at natural language processing to start with. They are great at matching but when it comes to inferring, it gets hairy.
Then try to figure out the same word from hundreds of different accents/inflections, and it suddenly doesn't seem so simple.
Well I have got Google Voice Search on my G1 and it works amazingly well. The answer is, the field is advancing, but you just haven't noticed!
If speech recognition was possible with substantially less MIPS than the human brain, we really could talk to the animals.
Evolution wouldn't spend all those calories on grey matter if it weren't required to do the job.
Spoken language is context sensitive, ambiguous. Computers don't deal well with ambiguous commands.
I don't agree with the assumption in the question - I have recently been introduced to Microsoft's speech recognition and am impressed. It can learn my voice after a few minutes and usually identifies common words correctly. It also allows new words to be added. It is certainly usable for my purposes (understanding chemistry).
Differentiate between recognising the (word) tokens and understanding the meaning of them.
I don't yet know about other languages or operating systems.
The problem is that there are two types of speech recognition engines. Speaker-trained ones such as Dragon are good for dictation. They can recognize almost any spoken text with fairly good accuracy, but require (a) training by the user, and (b) a good microphone.
Speaker-independent speech rec engines are most often used in telephony. They require no "training" by the user, but must know ahead of time exactly what words are expected. The application development effort to create these grammars (and deal with errors) is huge. Telephony is limited to a 4 kHz bandwidth due to historical limits in our public phone network. This limited audio quality greatly hampers the speech rec engines' ability to "hear" what people are saying. Digits such as "six" or "seven" contain an ssss sound that is particularly hard for the engines to distinguish. This means that recognizing strings of digits, one of the most basic recognition tasks, is problematic. Add in regional accents, where "nine" is pronounced "nan" in some places, and accuracy really suffers.
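The band-limiting point can be simulated (a rough sketch; the 6th-order low-pass filter is my stand-in for the phone channel, and the pure tones are crude stand-ins for vowel and sibilant energy):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 16000
t = np.arange(fs) / fs
vowel = np.sin(2 * np.pi * 500 * t)       # vowel-like low-frequency energy
sibilant = np.sin(2 * np.pi * 6000 * t)   # /s/-like high-frequency energy

b, a = butter(6, 3400, btype="low", fs=fs)   # crude model of the phone channel
phone_vowel = filtfilt(b, a, vowel)
phone_sibilant = filtfilt(b, a, sibilant)

print(np.sum(phone_vowel ** 2) / np.sum(vowel ** 2))        # near 1: vowel survives
print(np.sum(phone_sibilant ** 2) / np.sum(sibilant ** 2))  # near 0: /s/ is destroyed
```

Real /s/ energy is broadband noise rather than a pure tone, but the conclusion is the same: much of what distinguishes "six" from "seven" never makes it through the channel.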
The best hope is interfaces that combine graphics and speech rec. Think of an iPhone application that you can control with your voice.
Speed and learnability do not directly fight each other, but it seems easy enough to design such a GUI that lacks either (or both) of them. GUI designers seem to prefer 'easy to learn' most of the time even when 'fast to apply' would be wiser.
There are only a few UI concepts or programs that are weighted toward maximizing the peak efficiency of whatever you are doing with the program. Most of them haven't become common.
Normal people prefer gedit instead of vim. For normal people there are already good-enough GUIs because there was tons of research on them two decades ago.
I'd like to get some advice on designing UIs that trade away 'easy to learn' rather than 'fast to apply'.
We have a product in our lineup that has won numerous awards based largely on its ability to provide more power with an easier interface than any of our competitors. I designed the interface a few years after holding a position in one of Bell Labs' human interface research groups so I had a pretty clear idea of what constituted "success" when I approached it. I have four pieces of design advice for creating easy but powerful interfaces.
First, select a metaphor that makes sense in their environment and do your best to stick to it. This doesn't have to be a physical metaphor although that can help if working with people who are not tech savvy. This was popular in the early days of Windows but its value remains. We used a "folder and page" metaphor that permitted us to organize a wide range of tasks while not crimping power users' style.
Second, offer a consistent layout relationship between data display and tasks. In our interface, each "page" displays a set of action buttons in the exact same position and, wherever possible, uses the same actual buttons. Thus, once one page is learned, the user has a head start on learning the rest. One of these buttons, always placed in a distinctive position, is a "Help" button...which brings me to point #3. The more general rule is: find ways of leveraging learning in one area to assist in learning others.
Third, offer context-sensitive help and make sure that it addresses the user's primary question (which is usually "what do I do now"?) How often have you seen technical help that simply shows you the Inheritance tree, constructor syntax and an alphabetical list of methods? That isn't help, it is abuse. We focused all of our help on walking people through sample tasks. In particularly tough areas, we also offered multimedia tutorials.
Fourth, offer users the ability to customize the interface. Our users often had no use for specific "pages" (analysis types) in their work. Thus, we made it very simple to turn them off so that the user would see an interface that was no more complicated than it had to be. Our app was usually installed by a power user and then used by multiple staff members so this was more of a win for us because we could usually count on the power user to understand what to shut off. However, I think it is good advice in general.
Good luck!
Autocad has a console mode. As you do things using the mouse and toolbars, the text-equivalent of those commands is written to the console. You can type commands directly in there. This provides a great way to learn the power-user names for commands (they are very short, like unix commands) which aids greatly the process of moving from beginner to productive power-user. Generally speaking, one primary focus has to be in minimising movement between the mouse and keyboard, so put lots of functionality into the mouse, or into the keyboard, because when you have to move your hands like that, there is a real delay in trying to find the right place to put them.
Beyond avoiding an angry fruit salad, just try to make it as intuitive as possible. Typically, programs with a very frustrating UI share one common problem: the developers didn't define a clear scope of what the program would actually do prior to marrying a UI design.
It's not so much a question of 'easy': some people jump right into the UI and begin writing stuff to back the interface, rather than writing the core of a planned program and then planning an interface to use it.
This goes for web apps, desktop apps .. or even command line programs. A good design means writing the user interface after (and only after) you are sure that 'scope creep' is no longer a possibility.
Sure, you need some interface to test your program, but be prepared to trash it and do something better prior to releasing the program. Otherwise, there's a good chance that the UI is only going to make sense to you.
Rant (or, Stuff I think you should keep in mind):
Speed and learnability do directly fight each other. A menu item tells you what it does so that you don't have to remember. But it's much slower than a keyboard shortcut that you have to memorize to benefit from. The general technique for resolving this conflict seems to be allowing more than one way of doing things. While one way of doing something usually cannot be both fast and easy to learn, you can often provide two ways to accomplish the same task: one that's fast, and one that's obvious.
There are different kinds of people. The learning gap is a result of interest, motivation, intellectual capacity, etc. There is a class of person that will never bother to even learn which menu provides the action they want, and they'll scrub the menubar every time. There is also a (minority) class of person that thinks vim (or emacs) is the best thing since sliced bread. Most people probably fall somewhere in between these extremes.
My answer to the actual question:
I think you are asking how to strive for a fast UI. Your question wasn't particularly clear (to me).
First of all, be consistent. This helps both speed and learnability. Self consistency is the most important, but consistency with your environment may also be important.
For real speed, require as little attention and motion as possible. Keyboard shortcuts are fast because experienced users know where they are (they don't have to look), and their hands are already on the keyboard. Especially avoid forcing the user to change their position in front of the computer (e.g., moving one hand between the mouse and keyboard).
The keyboard is almost always faster than the mouse.
Customization (especially the ability to write custom scripts) will let power users make the interface work the way that is fastest for their specific habits.
Make it possible to get by without the most powerful features. All you need to know in order to survive in vim is "i, ESC, :wq, :q!". With that, you can use vi about the same way a lot of people use notepad. But once you start learning "h,j,k,l,w,b,e,d,c" (and so on) you get much more efficient. So there is a steep learning curve, but you can get by until you surmount it.
Keep in mind that if you focus on interface efficiency, you will probably limit your user base. Vim is popular among programmers, but lots of programmers use other tools, and it's virtually unknown among non-programmers. Most people want easy, not fast. Some want a balance. A very few just want fast.
I would like to point you towards Kathy Sierra's old blog for thoughts on 'easy to learn' and 'fast to apply' — I don't necessarily agree there needs to be a tradeoff between the two.
Three posts to get you started:
How much control should users have? This post ponders on whether 'fast to apply' is the ideal we should strive for.
The hi-res user experience talks about what you say about "normal people" vs. others. It's not so much that there are different kinds of people, but there are different levels of learning/expertise/involvement. Some are satisfied with less, some need more. How you get from less to more is arguably pretty much the same for everyone.
Finally, Featuritis vs. the Happy User Peak talks about the scope creep pointed out by #tinkertim.
Have you seen Gimp shortcuts?
Use nice visual controls and show the keyboard shortcuts for them on hover; that will help users learn the fast mode. If your software copies some behavior from other programs, copy the shortcut mappings from them too (such as Copy/Paste/New Tab/Close Window/etc.), but allow users to dynamically re-map them, as Gimp does. For repeated operations you could implement an action recorder. But it depends on the type of software.
The main thing to be careful of is putting UI elements where they are most commonly located for other applications in that environment. For example, if you're going to make use of a menu system, people are accustomed to it being along top of the window by default for a desktop application. If you're in a web browser a menu system on a webpage seems out of place because it's not a consistent feature. If you're going to have an options/preferences configuration window, people are used to finding it under the Tools menu option, occasionally under the Edit menu. The main thing with keeping a UI "easy to learn" is that your UI elements shouldn't break the mold too much of how they're used in other applications.
If you haven't had the opportunity to see Mark Miller's presentation on The Science of Great User Experience, I'd recommend you watch the DNR TV episodes Part 1 and Part 2.
While I've been writing my own UI, I've come to understand a couple of things myself.
I imitated vim, but at the same time realized why it's so fast to use for text editing. It is because it acknowledges a thing: People prefer doing one thing at a time (inserting text, navigating around, selecting text), but they may switch the task often.
This means that you can pack different activities into different modes if you keep the mode switching schemes simple. It gives space for more commands. The user also gets better chances at learning the full interface because they are sensibly grouped already.
Vim is practically stuffed full of commands; every letter on the keyboard does something in vim, depending on the mode. Still, I can remember most of them. And it's all because of modes.
I know a bunch of projects that sneer at mode-dependent behavior. The main argument is the uncertainty about which mode you are in. In vim, I'm never uncertain about which mode I'm in. Therefore I say the interface design is a failure if a trained user fails to recognize which mode the interface is operating in at the moment.
I came across this article about programming styles, as seen by Edsger Dijkstra. To quickly paraphrase, the main difference is that Mozart, when the analogy is made to programming, fully understood (debatably) the problem before writing anything, while Beethoven made his decisions as he wrote the notes out on paper, creating many revisions along the way. With Mozart programming, version 1.0 would be the only version, for software that should aim to work with no errors and maximum efficiency. Also, Dijkstra says software not at that level of refinement and stability should not be released to the public.
Based on his views, two questions. Is Mozart programming even possible? Would the software we write today really benefit if we adopted the Mozart style instead?
My thoughts: it seems that, to address the increasing complexity of software, we've moved on from this method to things like agile development, public beta testing, and constant revisions, methods that define web development, where speed matters most. But when I think of all the revisions web software can go through, especially during maintenance, when often patches are applied over patches, to then be refined through a tedious refactoring process, the Mozart way seems very attractive. It would at least lessen those annoying software updates, e.g. Digsby, Windows, iTunes, etc., many of them the result of unforeseen vulnerabilities that require a new and immediate release.
Edit: Refer to the response below for a more accurate explanation of Dijkstra's views.
The Mozart programming style is a complete myth (everybody has to edit and modify their initial efforts), and although "Mozart" is essentially a metaphor in this example, it's worth noting that Mozart was substantially a myth himself.
Mozart was a supposed magical child prodigy who composed his first sonata at 4 (he was actually 6, and it sucked - you won't ever hear it performed anywhere). It's rarely mentioned, of course, that his father was considered Europe's greatest music teacher, and that he forced all of his children to practice playing and composition for hours each day as soon as they could pick up an instrument or a pen.
Mozart himself was careful to perpetuate the illusion that his music emerged whole from his mind by destroying most of his drafts, although enough survive to show that he was an editor like everyone else. Beethoven was just more honest about the process (maybe because he was deaf and couldn't tell if anyone was sneaking up on him anyway).
I won't even mention the theory that Mozart got his melodies from listening to songbirds. Or the fact that he created a system that used dice to randomly generate music (which is actually pretty cool, but might also explain how much of Mozart's music appeared to come from nowhere).
The moral of the story is: don't believe the hype. Programming is work, followed by more work to fix the mistakes you made the first time around, followed by more work to fix the mistakes you made the second time around, and so on and so forth until you die.
It doesn't scale.
I can figure out a line of code in my head, a routine, and even a small program. But a medium program? There are probably some guys that can do it, but how many, and how much do they cost? And should they really write the next payroll program? That's like wasting Mozart on muzak.
Now, try to imagine a team of Mozarts. Just for a few seconds.
Still it is a powerful instrument. If you can figure out a whole line in your head, do it. If you can figure out a small routine with all its funny cases, do it.
On the surface, it avoids going back to the drawing board because you didn't think of one edge case that requires a completely different interface altogether.
The deeper meaning (head fake?) can be explained by learning another human language. For a long time you keep thinking about which words represent your thoughts, and how to order them into a valid sentence; that transcription costs a lot of foreground cycles.
One day you will notice the liberating feeling that you just talk. It may feel like "thinking in a foreign language", or as if "the words come naturally". You will sometimes stumble, looking for a particular word or idiom, but most of the time translation runs in the vast resources of the "subconscious CPU".
The "high goal" is developing a mental model of the solution that is (mostly) independent of the implementation language, to separate the solution of a problem from its transcription. Transcription is easy, repetitive, and easily trained, and abstract solutions can be reused.
I have no idea how this could be taught, but "figuring out as much as possible before you start to write it" sounds like good programming practice towards that goal.
A classic story from Usenet, about a true programming Mozart.
Real Programmers write in Fortran.

Maybe they do now, in this decadent era of Lite beer, hand calculators and "user-friendly" software but back in the Good Old Days, when the term "software" sounded funny and Real Computers were made out of drums and vacuum tubes, Real Programmers wrote in machine code. Not Fortran. Not RATFOR. Not, even, assembly language. Machine Code. Raw, unadorned, inscrutable hexadecimal numbers. Directly.

Lest a whole new generation of programmers grow up in ignorance of this glorious past, I feel duty-bound to describe, as best I can through the generation gap, how a Real Programmer wrote code. I'll call him Mel, because that was his name.

I first met Mel when I went to work for Royal McBee Computer Corp., a now-defunct subsidiary of the typewriter company. The firm manufactured the LGP-30, a small, cheap (by the standards of the day) drum-memory computer, and had just started to manufacture the RPC-4000, a much-improved, bigger, better, faster -- drum-memory computer. Cores cost too much, and weren't here to stay, anyway. (That's why you haven't heard of the company, or the computer.)

I had been hired to write a Fortran compiler for this new marvel and Mel was my guide to its wonders. Mel didn't approve of compilers.

"If a program can't rewrite its own code," he asked, "what good is it?"

Mel had written, in hexadecimal, the most popular computer program the company owned. It ran on the LGP-30 and played blackjack with potential customers at computer shows. Its effect was always dramatic. The LGP-30 booth was packed at every show, and the IBM salesmen stood around talking to each other. Whether or not this actually sold computers was a question we never discussed.

Mel's job was to re-write the blackjack program for the RPC-4000. (Port? What does that mean?) The new computer had a one-plus-one addressing scheme, in which each machine instruction, in addition to the operation code and the address of the needed operand, had a second address that indicated where, on the revolving drum, the next instruction was located. In modern parlance, every single instruction was followed by a GO TO! Put that in Pascal's pipe and smoke it.

Mel loved the RPC-4000 because he could optimize his code: that is, locate instructions on the drum so that just as one finished its job, the next would be just arriving at the "read head" and available for immediate execution. There was a program to do that job, an "optimizing assembler", but Mel refused to use it.

"You never know where it's going to put things", he explained, "so you'd have to use separate constants".

It was a long time before I understood that remark. Since Mel knew the numerical value of every operation code, and assigned his own drum addresses, every instruction he wrote could also be considered a numerical constant. He could pick up an earlier "add" instruction, say, and multiply by it, if it had the right numeric value. His code was not easy for someone else to modify.

I compared Mel's hand-optimized programs with the same code massaged by the optimizing assembler program, and Mel's always ran faster. That was because the "top-down" method of program design hadn't been invented yet, and Mel wouldn't have used it anyway. He wrote the innermost parts of his program loops first, so they would get first choice of the optimum address locations on the drum. The optimizing assembler wasn't smart enough to do it that way.

Mel never wrote time-delay loops, either, even when the balky Flexowriter required a delay between output characters to work right. He just located instructions on the drum so each successive one was just past the read head when it was needed; the drum had to execute another complete revolution to find the next instruction. He coined an unforgettable term for this procedure. Although "optimum" is an absolute term, like "unique", it became common verbal practice to make it relative: "not quite optimum" or "less optimum" or "not very optimum". Mel called the maximum time-delay locations the "most pessimum".

After he finished the blackjack program and got it to run, ("Even the initializer is optimized", he said proudly) he got a Change Request from the sales department. The program used an elegant (optimized) random number generator to shuffle the "cards" and deal from the "deck", and some of the salesmen felt it was too fair, since sometimes the customers lost. They wanted Mel to modify the program so, at the setting of a sense switch on the console, they could change the odds and let the customer win.

Mel balked. He felt this was patently dishonest, which it was, and that it impinged on his personal integrity as a programmer, which it did, so he refused to do it. The Head Salesman talked to Mel, as did the Big Boss and, at the boss's urging, a few Fellow Programmers. Mel finally gave in and wrote the code, but he got the test backwards, and, when the sense switch was turned on, the program would cheat, winning every time. Mel was delighted with this, claiming his subconscious was uncontrollably ethical, and adamantly refused to fix it.

After Mel had left the company for greener pa$ture$, the Big Boss asked me to look at the code and see if I could find the test and reverse it. Somewhat reluctantly, I agreed to look. Tracking Mel's code was a real adventure.

I have often felt that programming is an art form, whose real value can only be appreciated by another versed in the same arcane art; there are lovely gems and brilliant coups hidden from human view and admiration, sometimes forever, by the very nature of the process. You can learn a lot about an individual just by reading through his code, even in hexadecimal. Mel was, I think, an unsung genius.
Perhaps my greatest shock came when I
found an innocent loop that had no
test in it. No test. None. Common
sense said it had to be a closed loop,
where the program would circle,
forever, endlessly. Program control
passed right through it, however, and
safely out the other side. It took me
two weeks to figure it out.
The RPC-4000 computer had a really
modern facility called an index
register. It allowed the programmer to
write a program loop that used an
indexed instruction inside; each time
through, the number in the index
register was added to the address of
that instruction, so it would refer to
the next datum in a series. He had
only to increment the index register
each time through. Mel never used it.
Instead, he would pull the instruction
into a machine register, add one to
its address, and store it back. He
would then execute the modified
instruction right from the register.
The loop was written so this
additional execution time was taken
into account -- just as this
instruction finished, the next one was
right under the drum's read head,
ready to go. But the loop had no test
in it.
The vital clue came when I noticed the
index register bit, the bit that lay
between the address and the operation
code in the instruction word, was
turned on -- yet Mel never used the
index register, leaving it zero all
the time. When the light went on it
nearly blinded me.
He had located the data he was working
on near the top of memory -- the
largest locations the instructions
could address -- so, after the last
datum was handled, incrementing the
instruction address would make it
overflow. The carry would add one to
the operation code, changing it to the
next one in the instruction set: a
jump instruction. Sure enough, the
next program instruction was in
address location zero, and the program
went happily on its way.
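The overflow trick is easy to simulate. Here is a minimal sketch in Python, assuming an invented instruction-word layout (a 12-bit address field with the opcode in the bits above it, and "jump" as the opcode numerically following "add"); the real RPC-4000 word format differed:

```python
# Toy simulation of Mel's overflow trick. The word layout and
# opcode values below are invented for illustration; the real
# RPC-4000 format differed.

ADDR_BITS = 12
ADDR_MASK = (1 << ADDR_BITS) - 1

OP_ADD = 0x1   # hypothetical opcodes: assume "jump" is the
OP_JUMP = 0x2  # opcode numerically following "add"

def make_word(opcode, addr):
    return (opcode << ADDR_BITS) | addr

def fields(word):
    return word >> ADDR_BITS, word & ADDR_MASK

# Start with an "add" pointing at the top word of memory.
word = make_word(OP_ADD, ADDR_MASK)

# Mel's loop body: pull the instruction into a register and add 1.
# Incrementing past the last address carries into the opcode
# field, turning "add" into "jump" with address zero.
word += 1

op, addr = fields(word)
print(op == OP_JUMP, addr)  # -> True 0
```

The whole point is the carry: adding 1 to an instruction whose address field is already at its maximum ripples into the opcode field, manufacturing the jump Mel relied on.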
I haven't kept in touch with Mel, so I
don't know if he ever gave in to the
flood of change that has washed over
programming techniques since those
long-gone days. I like to think he
didn't. In any event, I was impressed
enough that I quit looking for the
offending test, telling the Big Boss I
couldn't find it. He didn't seem
surprised.
When I left the company, the blackjack
program would still cheat if you
turned on the right sense switch, and
I think that's how it should be. I
didn't feel comfortable hacking up the
code of a Real Programmer.
Edsger Dijkstra discusses his views on Mozart vs Beethoven programming in this YouTube video entitled "Discipline in Thought".
People in this thread have pretty much discussed how Dijkstra's views are impractical. I'm going to try to defend him somewhat.
Dijkstra is against companies
essentially "testing" their software
on their customers. Releasing
version 1.0 and then immediately
patch 1.1. He felt that the program
should be polished to a degree that
"hotfix" patches are borderline
unethical.
He did not think that software should be written in one fell swoop, or that changes would never need to be made. He often discussed his design ideals, one of them being modularity and ease of change. He thought, however, that individual algorithms should be written that way only after you had completely understood the problem. That was part of his discipline.
He found, after all his extensive experience with programmers, that programmers aren't happy unless they are pushing the limits of their knowledge. He said that programmers didn't want to program something they completely, 100% understood, because there was no challenge in it. Programmers always wanted to be on the brink of their knowledge. While he understood why programmers are like that, he stated that it wasn't conducive to low-error-tolerance programming.
There are also some industries or applications of programming where I believe Dijkstra's "discipline" is warranted: NASA rovers, embedded devices in the health industry (e.g., ones that dispense medication), certain financial software that transfers our money. These areas don't have the luxury of incremental change after release, and a more "Mozart approach" is necessary.
I think the Mozart story confuses what gets shipped versus how it is developed. Beethoven did not beta-test his symphonies on the public. (It would be interesting to see how much he changed any of the scores after the first public performance.)
I also don't think that Dijkstra was insisting that it all be done in your head. After all, he wrote books on disciplined programming that involved working it out on paper, and to the same extent that he wanted to see mathematical-quality discipline, have you noticed how much paper and chalk board mathematicians may consume while working on a problem?
I favor Simucal's response, but I think the Mozart-Beethoven metaphor should be discarded. That shoe-horns Dijkstra's insistence on discipline and understanding into a corner where it really doesn't belong.
Additional Remarks:
The TV popularization is not so hot, and it confuses some things about musical composition, what a composer is doing, and what a programmer is doing. In Dijkstra's own words, from his 1972 Turing Award Lecture: "We must not forget that it is not our business to make programs; it is our business to design classes of computations that will display a desired behavior." A composer may be out to discover the desired behavior.
Also, with Dijkstra's notion that version 1.0 should be the final version, we too easily confuse how desired behavior and functionality evolve over time. I believe he oversimplifies in thinking that all future versions exist because the first one was not thought out and done rigorously and reliably.
Even without time-to-market urgency, I think we now understand much better that important kinds of software evolve along with the users' experience with it and the utilitarian purpose they have for it. Obvious counter-examples are games (also consider how theatrical motion pictures are developed). Do you think Beethoven could have written Symphony No. 9 without all of his preceding experience and exploration? Do you think the audience could have heard it for what it was? Should he have waited until he had the perfect sonata? I'm sure Dijkstra doesn't propose this, but I do think he goes too far with Mozart-Beethoven to make his point.
In addition, consider chess-playing software. The new versions are not because the previous ones didn't play correctly. It is about exploiting advances in chess-playing heuristics and the available computer power. For this and many other situations, the idea that version 1.0 be the final version is off base. I understand that he is rightfully objecting to the release of known-to-be unreliable and maybe impaired software with deficiencies to be made up in maintenance and future releases. But the Mozartian counter-argument doesn't hold up for me.
So, did Dijkstra continue to drive the first automobile he purchased, or clones of exactly that automobile? Maybe there is planned obsolescence, but a lot of it has to do with improvements and reliability that could not have possibly been available or even considered in previous generations of automotive technology.
I am a big Dijkstra fan, but I think the Mozart-Beethoven thing is way too simplistic as well as inappropriate. I am a big Beethoven fan too.
I think it's possible to appear to employ Mozart programming. I know of one company, Blizzard, that doesn't release a software product until they're good and ready. This doesn't mean that Diablo 3 will spring whole and complete from someone's mind in one session of dazzlingly brilliant coding. It does mean that that's how it will appear to the rest of us. Blizzard will test the heck out of their game internally, not showing it to the rest of the world until they've got all the kinks worked out. Most companies don't take this approach, preferring instead to release software when it's good enough to solve a problem, then fix bugs and add features as they come up. This approach works (to varying degrees) for most companies.
Well, we can't all be as good as Mozart, can we? Perhaps Beethoven programming is easier.
If Apple adopted "Mozart" programming, there would be no Mac OS X or iTunes today.
If Google adopted "Mozart" programming, there would be no Gmail or Google Reader.
If SO developers adopted "Mozart" programming, there would be no SO today.
If Microsoft adopted "Mozart" programming, there would be no Windows today (well, I think that would be good).
So the answer is simply NO. Nothing is perfect, and nothing is ever meant to be perfect, and that includes software.
I think the idea is to plan ahead. You need to at least have some kind of outline of what you are trying to do and how you plan to get there. If you just sit down at the keyboard and hope "the muse" will lead you to where your program needs to go, the results are liable to be rather uneven, and it will take you much longer to get there.
This is true with any kind of writing. Very few authors just sit down at a typewriter with no ideas and start banging away until a bestselling novel is produced. Heck, my father-in-law (a high school English teacher) actually writes outlines for his letters.
Progress in computing is worth a sacrifice in glory or genius here and there.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am planning to give a Technical presentation for a product we are building.
The intended audience is technical developers, so most of the time I will be debugging through the code in Visual Studio, doing performance analysis, some architecture review, etc.
I have read a couple of blogs on font sizes to use, templates to use in Visual Studio, presentation tools, and other very useful tips.
What I am looking for specifically is how to keep the session interesting without making it a dry code walkthrough. How do you avoid making people fall asleep? It would be great to hear some stories...
Update 1: Nice YouTube clip on ZoomIt: Glue Audience To Your Presentation With Zoomit.
Update2: New post from Scott Hanselman after his PDC talk - Tips for Preparing for a Technical Presentation
Put interesting comments in the code.
// This better not fail during my next presentation, stupid ##$##%$ code.
Don't talk about them, let them be found by the audience.
-Adam
FYI, that Hanselman article has an update (your link is from 2003).
Use stories. Even with code examples, have a backstory: here's why someone is doing this. To increase audience participation, ask for examples of X where X is something you know you can demo, then phrase the walk-through in those terms.
Or maybe you have war stories about how it was different or how it normally takes longer or whatever. I find people identify with such things, then as you give your examples they're mentally tracking it back to their own experience.
I recommend Scott Hanselman's post (previously mentioned). I've written up a post with some tips, mostly for selfish reasons - I review it every time before I give a technical presentation:
Tips for a Technical Presentation
If you're using a console prompt, make sure the font is readable and that your paths are preset when possible.
Take 15 minutes to install and learn to use ZoomIt, so your audience can clearly see what you're showing off. If you have to ask if they can see something, you've already failed.
Probably most important is to have separate Visual Studio settings pre-configured with big, readable fonts.
One of the best pieces of advice I ever got for doing demos is to just plain record them in advance and play back the video, narrating live. Then the unexpected stuff happens in private and you get as many stabs at it as you need.
You still usually need some environment to use as a reference for questions, but for the presentation bit, recording it in advance (and rehearsing your narration over the video) pretty much guarantees you can be at the top of your game.
I also like to put small jokes into the slides and that recorded video that make it seem like the person who made the slides is commenting on the live proceedings or that someone else is actually running the slides. Often, I make absolutely no reference at all to the joke in the slide.
For instance, in my most recent demo presentation, I had a slide with the text "ASP.NET MVC" centered that I was talking over about how I was using the framework. In a smaller font, I had the text "Catchy name, huh?". When I did that demo live, that slide got a chuckle. It's not stand-up worthy by any stretch of the imagination, but we're often presenting some pretty dry stuff and every little bit helps.
Similarly, I've included slides that are just plain snarky comments from the offscreen guy about what I'm planning to say. So, I'll say, "The codebase for this project needed a little help", while the slide behind me reads "It was a pile of spaghetti with 3 meatballs, actually" over a slide background of a plate of spaghetti. Again, making no comment myself and just moving on to the next slide as though I didn't even see it actually made it funnier.
That can also be a help if you don't have the best comedic timing by taking the pressure off while still adding some levity.
Anyway, what it really comes down to is that I've been doing most of my demo/presentation work just like I would if it was a screencast and then substituting the live version of me (pausing the video as appropriate if things go off the rails) for the audio when I give it in front of an audience.
Of course, you can then easily make the real presentation available afterward for those who want it.
For the slides, I generally go out of my way not to say the exact words that are on the screen.
If you are showing code that was prepared for you then make sure you can get it to work. I know this is an obvious one but I was just at a conference where 4 out of 5 speakers had code issues. Telling me it is 'cool' or even 'really cool' when it doesn't work is a tough sell.
You should read Mark Jason Dominus's excellent presentation on public speaking:
Conference Presentation Judo
The #1 rule for me is: Don't try to show too much.
It's easy to live with a chunk of code for a couple of weeks and think, "Damn, when I show 'em this they are gonna freak out!" Even during your private rehearsals you feel good about things. But once in front of an audience, the complexity of your code is multiplied by the square of the number of audience members. (It gets quadratically harder to explain code for each audience member added!)
What seemed so simple and direct privately quickly turns into a giant bowl of spaghetti that under pressure even you don't understand. Don't try to show production code (well factored and well partitioned), make simple inline examples that convey your core message.
My rule #1 could be construed, by the cynical, as "don't overestimate your audience". As an optimist, I see it as "don't overestimate your ability to explain your code"!
rp
Since it sounds like you are doing a live presentation, where you will be working with real systems and not just charts (PPT, Impress, whatever), make sure it is all working just before you start. It never fails: if I don't try it just before I start talking, it doesn't work how I expected it to. Especially with demos. (I'm doing one on Tuesday, so I can relate.)
The other thing that helps is simply to practice, practice, practice. Especially if you can do it in the exact environment you will be presenting in. That way you get a feel for where you need to be so as not to block the view for your listeners as well as any other technical gotchas there might be with regards to the room setup or systems.
This is something that was explained to me, and I think it is very useful: consider not going too slide-heavy at the beginning. You want to show your listeners something (obviously, probably not the code) up front that will keep them on the edge of their seats, wanting to learn how to do what you just showed them.
I've recently started to use Mind Mapping tools for presentations and found that it goes over very well.
http://en.wikipedia.org/wiki/Mind_map
Basically, I find people just zone out the second you start to go into details in a presentation. Conveying the information with a mind map (at least in my experience) provides a much easier way for it to be absorbed and tied together.
The key is presenting the information in stages (i.e., your high-level ideas first, then more detail, one piece at a time). The mind-mapping tools basically let you expand your map as the audience watches and you present more and more detailed information. Doing it this way lets your audience gradually absorb the data in smaller stages, which tends to aid retention.
Check out FreeMind for a free tool to play with. Mind Manager is a paid product, but is much more polished and fluent.
Keep your "visual representation" simple and standard.
If you're on Vista, hide your desktop icons and use one of the default wallpapers. Keep your Visual Studio settings (especially toolbars) as standard and "out of the box" as possible. The more customizations you show in your environment, the more likely people are to focus on those rather than your content.
Keep the content on your slides as concise as possible. Remember, you're speaking to (and, in the best scenario, with) your audience, so the slides should serve as discussion points. If you want to include more details, put them in the slide notes. This is especially good if you make the slide decks available afterwards.
If someone asks you a question and you don't know the answer, don't be afraid to say you don't know. It's always better than trying to guess at what you think the answer should be.
Also, if you are using Vista be sure to put it in "presentation mode". PowerPoint also has a similar mode, so be sure to use it as well - you have the slide show on one monitor (the projector) and a smaller view of the slide, plus notes and a timer on your laptop monitor.
Have you heard of Pecha-Kucha?
The idea behind Pecha Kucha is to keep
presentations concise, the interest
level up and to have many presenters
sharing their ideas within the course
of one night. Therefore the 20x20
Pecha Kucha format was created: each
presenter is allowed a slideshow of 20
images, each shown for 20 seconds.
This results in a total presentation
time of 6 minutes 40 seconds on a
stage before the next presenter is up.
Now, I am not sure whether that short duration would be OK for a product demonstration, but you can take some nice ideas from the concept, such as being concise, keeping to the point, and managing time and space effectively.
Besides software like Mind Manager to show your architecture, you may find a screen recorder useful as a presentation tool to illustrate your technical tasks. DemoCreator is a nice way to make a video of your on-screen activity, and you can add callouts to make the process easier to understand.
If you use slides at all, follow Guy Kawasaki's 10/20/30 rule:
No more than 10 slides
No more than 20 minutes spent on slides
No less than 30 point type on slides
-Adam