Create a sound from scratch in DirectSound - playback

Is there a way to create a sound from scratch using DirectSound, e.g. play the notes A, C, D, E, F, G, etc.? However, the sound must be realistic, and sound at least a little like a proper sound.
Thanks. I have tried to be as concise yet as detailed as I can.

This is what an audio synthesizer does. There are many techniques - you probably need to narrow down what you want to do before you can get useful advice.
However, the simplest technique that will produce a tone is to write a whole number of periods of a sine wave of the correct frequency into a buffer, and play that as a looping buffer in DirectSound.
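A minimal sketch of the buffer-filling step, in Python for brevity. The function name, the 16-bit sample format, and the defaults (44100 Hz, 100 periods) are my own assumptions, and handing the samples to a looping DirectSound buffer is not shown. Because the buffer length has to be a whole number of samples, the frequency is nudged very slightly so that the chosen number of periods fits exactly and the loop point does not click.

```python
import math

def sine_loop_buffer(frequency_hz, sample_rate=44100, periods=100, amplitude=0.8):
    """Return signed 16-bit samples holding a whole number of sine periods."""
    # Buffer length in samples, rounded to the nearest whole sample.
    n_samples = round(periods * sample_rate / frequency_hz)
    # Phase step chosen so that exactly `periods` cycles fit in n_samples.
    # The resulting pitch differs from frequency_hz by a tiny amount.
    step = 2.0 * math.pi * periods / n_samples
    return [int(round(amplitude * 32767 * math.sin(step * i)))
            for i in range(n_samples)]

# Example: the A above middle C (440 Hz).
buf = sine_loop_buffer(440.0)
```

A pure sine will not sound like a real instrument; it is only the simplest starting point.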


Using AVFoundation / CoreMedia / Other Frameworks to Detect Beats / Onsets

(Preface: This is my first audio-related question on Stack Overflow, so I'll try to word this as best as I possibly can. Edits welcome.)
I'm creating an application that'll allow users to loop music. At the moment our prototypes allow these "loop markers" (implemented as UISliders) to snap at every second, specifying the beginning and end of a loop. Obviously, when looping music, seconds are a very crude manner to handle this, so I would like to use beats instead.
I don't want to do anything other than mark beats for the UISliders to snap to:
Feed our loadMusic method an audio file.
Run it through a library to detect beats or the intervals between them (maybe).
Feed that value into the slider's setNumberOfTickMarks: method.
Unfortunately, most of the results I've run into via Google and SO have yielded much more advanced beat detection libraries like those that remixers would use. Overkill in my case.
Is this something that CoreMedia, AVFoundation or AudioToolbox can handle? If not, are there other libraries that can handle this? My research into Apple's documentation has only yielded relevant results... for MIDI files. But Apple's own software has features like this, such as iMovie's snap-to-beats functionality.
Any guidance, code or abstracts would be immensely helpful at this point.
EDIT: After doing a bit more digging around, it seems the correct terminology for what I'm looking for is onset detection.
Onset detection algorithms come in many flavors, from looking at the raw music signal to using frequency-domain techniques.
If you want a quick and easy way to determine where beats are:
Chop up the music signal into small segments (20-50 ms chunks)
Compute the mean of the squared samples in each chunk: Sum(Xn^2) / N (where N is the number of samples per 20-50 ms chunk)
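The two steps above can be sketched roughly as follows. How to turn the per-chunk energies into beat positions is not spelled out above, so the rule used here (flag a chunk whose energy exceeds `ratio` times the average of the preceding chunks, and keep only the first chunk of each loud run) is an assumption of mine, as are all the function and parameter names.

```python
def chunk_energies(samples, sample_rate, chunk_ms=20):
    """Mean of the squared samples for each chunk of chunk_ms milliseconds."""
    n = max(1, int(sample_rate * chunk_ms / 1000))
    energies = []
    for start in range(0, len(samples) - n + 1, n):
        chunk = samples[start:start + n]
        energies.append(sum(x * x for x in chunk) / n)
    return energies

def detect_onsets(samples, sample_rate, chunk_ms=20, history=43, ratio=1.5):
    """Return onset times in seconds (the threshold rule is an assumption)."""
    n = max(1, int(sample_rate * chunk_ms / 1000))
    energies = chunk_energies(samples, sample_rate, chunk_ms)
    onsets = []
    was_loud = False
    for i, energy in enumerate(energies):
        recent = energies[max(0, i - history):i]
        if not recent:
            continue  # nothing to compare the first chunk against
        average = sum(recent) / len(recent)
        is_loud = energy > ratio * average
        if is_loud and not was_loud:
            onsets.append(i * n / sample_rate)  # start time of this chunk
        was_loud = is_loud
    return onsets

# Synthetic check: 1 s of silence at 1 kHz with two 40 ms bursts.
demo = [0.0] * 1000
for burst_start in (200, 600):
    for i in range(burst_start, burst_start + 40):
        demo[i] = 1.0
demo_onsets = detect_onsets(demo, 1000)
```

The resulting times could then be mapped to slider tick positions; `history` and `ratio` would need tuning on real music.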
If you want more sophisticated techniques look into:
or for hardcore treatment of it:

Polyphony with AudioRenderCallback and AudioUnitRenderFlag

I am getting my bearings in Core Audio / Audio Units, so please forgive me if this should be self-evident.
If I want to allow for five voices, I need 5 buses on the mixer and at least 5 buffers if I want 5 different sounds. I have figured out a basic way to schedule a note by checking the time and using a start time. I think I should use ioActionFlags to indicate when I am rendering silence, but I don't quite get how.
I ask because, with 2 buses, I get buzzing when one is silent for a while but the other one plays.
If you are getting buzzing, it is probably what is known as the "satan saw", which is a sawtooth-sounding noise created by an uncleared buffer playing over and over again out of a channel. Any sound other than silence repeated in this manner will sound a bit like a sawtooth wave.
When you are rendering silence, you should simply set all of the samples in your output buffer to 0.0f for the given voice. I don't think that there is a way to stop the callback from trying to fetch your samples, and anyway, this is a much easier (and more portable) solution than fiddling around with the system's rendering setup.
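To illustrate the idea only: the sketch below is plain Python, not Core Audio code, and `render_voice` and the `voice` dictionary are hypothetical stand-ins for a render callback and its per-voice state. The point is that a silent voice still overwrites every sample it was handed, so no stale data is left to be replayed.

```python
def render_voice(voice, out_buffer):
    """Fill out_buffer in place (hypothetical stand-in for a render callback)."""
    n = len(out_buffer)
    if not voice["active"]:
        # Silent voice: overwrite every sample, otherwise whatever was left
        # in the buffer from an earlier render is played again and again.
        for i in range(n):
            out_buffer[i] = 0.0
        return
    source = voice["samples"]
    for i in range(n):
        out_buffer[i] = source[(voice["pos"] + i) % len(source)]
    voice["pos"] = (voice["pos"] + n) % len(source)

# A buffer that still holds data from a previous render.
stale = [0.5, -0.5, 0.25, -0.25]
render_voice({"active": False, "samples": [1.0], "pos": 0}, stale)
```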

Suggestions For Syncing Audio With Audio Queue Services?

I'm decoding a video format that has an accompanying audio track in a separate file. Per the specs, I render a frame of video every 1/75th second. And the length of the video file is the same as the length of the audio track.
I'm playing the audio with Audio Queue Services (which I chose because I figured there would be situations where I needed precise timing control -- just the sort of situation I'm encountering!). It's a big API and I haven't progressed much past the sample code in Apple's programming guide (though I have wrapped things up in a nicer ObjC API).
In ideal situations, things work fine with the basic playback setup. The video and audio stay synced and both end at the same time (within my own ability to tell the difference). However, if performance hiccups (or I attach the Leaks Instrument or something), they quickly get out of sync.
This is the first time I've ever written something of this nature: I have no prior experience with sound or video. I certainly have no experience with Audio Queue Services. So I'm not sure where to go from here.
Have you done something like this? Do you have some advice or tips or tricks to offer? Is there some fundamental piece of documentation I need to read? Any help would be greatly appreciated.
First off, I've never actually coded anything like this so I'm shooting from the hip. Also, I've done a decent amount of programming with the HAL and AUHAL but never with AudioQueue so my approach might not be the best way to use AQ.
Obviously the first thing to decide is whether to sync the audio to the video or the video to the audio. From the question it seems you've decided the video will be the master and the audio should sync to it.
I would approach this by keeping track of the number of frames of video rendered, along with the frame rate. Then, when enqueuing your audio buffers, rather than passing a monotonically increasing value for the startTime, adjust the buffer's start time to match any discontinuities observed in the video. This is a bit vague because I don't know exactly where your audio is coming from or how you are enqueuing it, but hopefully the principle is clear.
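The bookkeeping might look something like this. It is only the arithmetic, in Python, with no Audio Queue calls; the function names and the 44100 Hz sample rate are assumptions, while the 75 frames per second figure comes from the question.

```python
VIDEO_FPS = 75  # frame rate stated in the question

def audio_start_sample(video_frames_rendered, sample_rate=44100, fps=VIDEO_FPS):
    """Audio sample position that lines up with the next video frame."""
    return video_frames_rendered * sample_rate // fps

def drift_seconds(video_frames_rendered, audio_samples_played,
                  sample_rate=44100, fps=VIDEO_FPS):
    """Positive when the audio is ahead of the video, negative when behind."""
    return audio_samples_played / sample_rate - video_frames_rendered / fps

# After a hiccup: one second of audio has played but only 60 frames were shown.
drift = drift_seconds(60, 44100)
next_start = audio_start_sample(60)
```

When the drift exceeds some tolerance, the next buffer would be enqueued with a start time derived from `audio_start_sample` rather than from the running audio position.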

extracting a specific melody/beat/rhythm from a specific instrument from a mixed wave (or other music format) file

Is it possible to write a program that can extract a melody/beat/rhythm provided by a specific instrument in a wave (or other music format) file made up of multiple instruments?
Which algorithms could be used for this and what programming language would be best suited to it?
This is a fascinating area. The basic mathematical tool here is the Fourier Transform. To get an idea of how it works, and how challenging it can be, take a look at the analysis of the opening chord to A Hard Day's Night.
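As a rough illustration of what the Fourier Transform gives you, here is a naive discrete Fourier transform in Python that reports the strongest frequency in a short signal. The function names are my own, and a real program would use an FFT library rather than this O(n^2) loop.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return k * sample_rate / len(samples)

# Two "instruments": a quiet 100 Hz tone and a louder 220 Hz tone.
rate = 800
mix = [0.3 * math.sin(2 * math.pi * 100 * t / rate)
       + 1.0 * math.sin(2 * math.pi * 220 * t / rate)
       for t in range(200)]
strongest = dominant_frequency(mix, rate)
```

Both tones show up as separate peaks in the spectrum, which is the starting point for telling sources apart. Real instruments overlap far more than this toy example.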
An instrument produces a sound signature, just the same way our voices do. There are algorithms out there that can pick a single voice out of a crowd and identify that voice from its signature in a database which is used in forensics. In the exact same way, the sound signature of a single instrument can be picked out of a soundscape (such as your mixed wave) and be used to pick out a beat, or make a copy of that instrument on its own track.
Obviously, if you're thinking about making copies of tracks, i.e. breaking down the mixed wave into a single track per instrument, you're going to be looking at a lot of work. My understanding is that because of the frequency overlaps of instruments, this isn't going to be straightforward by any means... not impossible though, as you've already been told.
There's quite an interesting blog post by Comparisonics about sound matching technologies which might be useful as a start for your quest for information:
To extract the beat or rhythm, you might not need perfect isolation of the instrument you're targeting. A general solution may be hard, but if you're trying to solve it for a particular piece, it may be possible. Try implementing a band-pass filter and see if you can tune it to select the instrument you're after.
Also, I just found this Mac product called PhotoSounder. They have a blog showing different ways it can be used, including isolating an individual instrument (with manual intervention).
Look into Karaoke machine algorithms. If they can remove voice from a song, I'm sure the same principles can be applied to extract a single instrument.
Most instruments make sound within certain frequency ranges.
If you write a tunable bandpass filter - a filter that only lets a certain frequency range through - it'll be about as close as you're likely to get. It will not be anywhere near perfect; you're asking for black magic. The only way to perfectly extract a single instrument from a track is to have an audio sample of the track without that instrument, and do a difference of the two waveforms.
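One possible shape for such a tunable filter is a second-order (biquad) band-pass using the widely circulated Audio EQ Cookbook coefficients. This is a sketch in Python; the function names and the default Q are assumptions, and picking the center frequency and Q for a given instrument is the manual tuning part.

```python
import math

def bandpass(samples, sample_rate, center_hz, q=5.0):
    """Biquad band-pass with unity gain at center_hz; higher q is narrower."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised coefficients (the b1 term is zero for this filter type).
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def tone(freq_hz, sample_rate, n):
    """Helper for the demo: n samples of a unit-amplitude sine."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# Filter tuned to 440 Hz: a 440 Hz tone passes, a 2000 Hz tone is attenuated.
rate = 8000
in_band = bandpass(tone(440.0, rate, 4000), rate, 440.0)
out_band = bandpass(tone(2000.0, rate, 4000), rate, 440.0)
in_peak = max(abs(v) for v in in_band[-1000:])
out_peak = max(abs(v) for v in out_band[-1000:])
```

As said above, this only narrows things down: anything else playing in the same frequency range comes through too.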
C, C++, Java, C#, Python, Perl should all be able to do all of this with the right libraries. Which one is "best" depends on what you already know.
It's possible in principle, but very difficult - an open area of research, even. You may be interested in the project paper for Dancing Monkeys, a step generation program for StepMania. It does some fairly sophisticated beat detection and music analysis, which is detailed in the paper (linked near the bottom of that page).

Voice Alteration Algorithm

Could somebody point me to a voice alteration algorithm? Preferably in Java or C? Something that I could use to change a stream of recorded vocals into something that sounds like Optimus Prime. (FYI: Optimus Prime is the lead Autobot from Transformers with a very distinctive-sounding voice... not everybody may know this.) Is there an open source solution?
You can't just change the sample rate. The human voice has formants. What you want to do is move the formants. That should be your line of research.
Read about vocoders and filter banks.
Could you provide a link as example? I haven't seen the film so I'm just speculating.
Audacity is an open-source wave editor which includes effect filters - since it's open source you could see what algorithms they use.
Not knowing what it sounded like, I figured it would be a vocoder, but after listening to a few samples, it's definitely not a vocoder (or if it is, it's pretty low in the mix.) It sounds like maybe there's a short, fast delay effect on it, along with some heavy EQ to make it sound kind of like a tiny AM radio, and maybe a little ring modulator. I think that a LOT of the voice actor's voice is coming through relatively intact, so a big part of the sound is just getting your own voice to sound right, and no effects will do that part for you.
All the above info is just me guessing based on having messed around with a lot of guitar pedals over the years, and done some amateur recording and sound-effects making, so I could be way off.
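For reference, a ring modulator is one of the simpler effects mentioned above to try: it multiplies the signal by a sine-wave carrier. The sketch below is Python with made-up parameter names and defaults, and the carrier frequency and mix level are guesses to experiment with, not settings known to reproduce that voice.

```python
import math

def ring_modulate(samples, sample_rate, carrier_hz=30.0, mix=0.5):
    """Blend the dry signal with the signal multiplied by a sine carrier."""
    out = []
    for i, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * i / sample_rate)
        out.append((1.0 - mix) * x + mix * x * carrier)
    return out

# With a constant input and mix=1.0 the output is just the carrier itself.
demo = ring_modulate([1.0, 1.0, 1.0, 1.0], sample_rate=4, carrier_hz=1.0, mix=1.0)
```

The delay and EQ stages would be separate steps chained before or after this one.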
