Suggestions For Syncing Audio With Audio Queue Services? - cocoa

I'm decoding a video format that has an accompanying audio track in a separate file. Per the specs, I render a frame of video every 1/75th second. And the length of the video file is the same as the length of the audio track.
I'm playing the audio with Audio Queue Services (which I chose because I figured there would be situations where I needed precise timing control -- just the sort of situation I'm encountering!). It's a big API and I haven't progressed much past the sample code in Apple's programming guide (though I have wrapped things up in a nicer ObjC API).
In ideal situations, things work fine with the basic playback setup. The video and audio stay synced and both end at the same time (within my own ability to tell the difference). However, if performance hiccups occur (or I attach the Leaks instrument, or something similar), they quickly get out of sync.
This is the first time I've ever written something of this nature: I have no prior experience with sound or video. I certainly have no experience with Audio Queue Services. So I'm not sure where to go from here.
Have you done something like this? Do you have some advice or tips or tricks to offer? Is there some fundamental piece of documentation I need to read? Any help would be greatly appreciated.

First off, I've never actually coded anything like this so I'm shooting from the hip. Also, I've done a decent amount of programming with the HAL and AUHAL but never with AudioQueue so my approach might not be the best way to use AQ.
Obviously the first thing to decide is whether to sync the audio to the video or the video to the audio. From the question it seems you've decided the video will be the master and the audio should sync to it.
I would approach this by keeping track of the number of frames of video rendered, along with the frame rate. Then, when enqueuing your audio buffers, rather than passing a monotonically increasing value for the startTime, adjust the buffer's start time to match any discontinuities observed in the video. This is a bit vague because I don't know exactly where your audio is coming from or how you are enqueuing it, but hopefully the principle is clear.
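To make that concrete, here's a rough sketch in C of what I mean, assuming a hypothetical frame counter coming from your video decoder and a queue and buffer you've already set up; it derives the buffer's start time from the video clock and hands it to AudioQueueEnqueueBufferWithParameters rather than letting buffers simply play back to back:

    // Rough sketch: enqueue an audio buffer so its start time is derived from
    // the video clock instead of a monotonically increasing counter.
    // videoFramesRendered, videoFrameRate and audioSampleRate are placeholders
    // for whatever your decoder actually tracks.
    #include <AudioToolbox/AudioToolbox.h>

    static OSStatus EnqueueBufferSyncedToVideo(AudioQueueRef queue,
                                               AudioQueueBufferRef buffer,
                                               UInt64 videoFramesRendered,
                                               double videoFrameRate,    // e.g. 75.0 (one frame per 1/75 s)
                                               double audioSampleRate)   // e.g. 44100.0
    {
        // Where the video clock says playback should be, in audio samples.
        double expectedSampleTime =
            ((double)videoFramesRendered / videoFrameRate) * audioSampleRate;

        AudioTimeStamp startTime = {0};
        startTime.mSampleTime = expectedSampleTime;
        startTime.mFlags      = kAudioTimeStampSampleTimeValid;

        // Enqueue with an explicit start time instead of letting the queue
        // play this buffer immediately after the previous one.
        return AudioQueueEnqueueBufferWithParameters(queue,
                                                     buffer,
                                                     0, NULL,   // no packet descriptions (PCM)
                                                     0, 0,      // no trimming
                                                     0, NULL,   // no parameter changes
                                                     &startTime,
                                                     NULL);     // actual start time not needed here
    }

I haven't verified exactly how the queue behaves around an explicit start time when buffers arrive late, so treat this as a starting point for experimentation rather than a drop-in fix.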

Related

Is there any way to lower the fps on a mac?

So I play this game on Safari (Mac), Freeriderhd.com to be exact. I want to know if there is any way to lower the fps on that specific game while I am playing it in Safari, because it would make it a lot easier. If there is no way to do this, can I use a macro to press the spacebar repeatedly with x amount of time between each press? If someone can help with any of these questions, that would be great. Thanks.
I don't really know - sorry - but this seems unlikely. FPS is usually controlled internally by the game software. I have seen very few games that allow the user to change the frame rate, or at least the requested frame rate, and that is usually for the purpose of raising the FPS, not lowering it. Usually the game has an optimum frame rate that it strives for, especially browser-based games (which I've written some of).
Your alternative sounds like a means to "bog down the browser" which might peg your processor; not a thing I would tempt the Fates with, personally.

What is the fastest way to combine audio files on a web server?

Disclaimer: Forgive my ignorance of audio/sound processing, my background is web and mobile development and this is a bespoke requirement for one of my clients!
I have a requirement to concatenate 4 audio files, with a background track playing behind all 4 audio files. The source audio files can be created in any format, or have any treatment applied to them, to improve the processing time, but the output quality is still important. For clarity, the input files could be named as follows (.wav is only an example format):
background.wav
segment-a.wav
segment-b.wav
segment-c.wav
segment-d.wav
And would need to be structured something like this:
[------------------------------background.wav------------------------------]
[--segment-a.wav--][--segment-b.wav--][--segment-c.wav--][--segment-d.wav--]
I have managed to use the SoX tool to achieve the concatenation portion of the above using MP3 files, but on a reasonably fast computer I am getting roughly an hour's worth of concatenated audio per minute of processing, which isn't fast enough for my requirements, and I haven't applied the background sound or any 'nice to haves' such as trimming/fading yet.
My questions are:
Is SoX the best/only tool for this kind of operation?
Is there any way to make the process faster without sacrificing (too much) quality?
Would changing the input file format result in improved performance? If so, which format is best?
Any suggestions from this excellent community would be much appreciated!
SoX may not be the best tool, but I doubt you will find anything much better without hand-coding.
I would venture to guess that you are doing pretty well to process that much audio in that time. You might do better, but you'll have to experiment. You are right that probably the main way to improve speed is to change the file format.
MP3 and Ogg will probably give you similar performance, so first find out how MP3 compares to uncompressed audio such as WAV or AIFF. If MP3/Ogg is faster, try different compression ratios and sample rates to see which goes fastest. With WAV files, you can try lowering the sample rate (you can do this with MP3/Ogg as well). If this is speech, you can probably go as low as 8 kHz, which should speed things up considerably. For music I would say 32 kHz, but it depends on the requirements. Also, try mono instead of stereo, which should speed things up further.
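For what it's worth, the experiments above only take a couple of SoX invocations to try. This is just a sketch using the file names from the question and recent SoX effect names; the 8 kHz mono settings are a starting point to benchmark, not a recommendation:

    # Concatenate the four segments, converting to 8 kHz mono on the way out
    sox segment-a.wav segment-b.wav segment-c.wav segment-d.wav segments.wav rate 8000 channels 1

    # Mix the background track underneath the concatenated segments
    # (background.wav needs to match the segments' rate and channel count,
    #  or be converted first)
    sox -m background.wav segments.wav output.wav

Time each variant on a representative batch of files before committing to a format.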

multi channel sound with winmm, many WaveOutOpen?

I am trying to play sounds on Windows XP in a multi-channel (parallel) manner.
I had read somewhere that playing parallel sounds with WinMM may not be possible,
but here is what I observe:
When I call waveOutOpen() once and then call waveOutWrite() many times,
the sounds are not parallel - they are queued.
But when I call waveOutOpen() say nine times (and store the nine handles),
and then call waveOutWrite() nine times with nine different sounds, they are
played in parallel (multi-channel) - that is, they are mixed.
It seems to work, but I am not sure whether it is okay, because I can't find this
stated clearly in any tutorial or documentation.
Is it okay to play sound in such a 'many waveOutOpen' way?
When I call waveOutOpen() once and then call waveOutWrite() many times,
the sounds are not parallel - they are queued.
That's exactly what is supposed to happen. waveOutWrite queues the next buffer; it allows you to send the audio you want to play in small chunks.
But when I call waveOutOpen() say nine times (and store the nine handles),
and then call waveOutWrite() nine times with nine different sounds, they are
played in parallel (multi-channel) - that is, they are mixed.
Again, this is correct and expected. This is the simplest way to play back many simultaneous sounds. If you want sample-accurate mixing, however, you should mix the audio samples yourself into one stream of samples and play that through a single waveOut device.
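Here's a bare-bones sketch of the 'one handle per sound' approach, with error checking omitted and the PCM format made up for brevity:

    // Rough sketch: open one waveOut handle per sound and write one prepared
    // buffer to each; the system mixes the open handles for you.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    void PlayTwoSoundsInParallel(char *pcmA, DWORD lenA, char *pcmB, DWORD lenB)
    {
        WAVEFORMATEX fmt = {0};
        fmt.wFormatTag      = WAVE_FORMAT_PCM;
        fmt.nChannels       = 1;
        fmt.nSamplesPerSec  = 44100;
        fmt.wBitsPerSample  = 16;
        fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
        fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

        HWAVEOUT hA, hB;
        waveOutOpen(&hA, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL);
        waveOutOpen(&hB, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL);

        WAVEHDR hdrA = {0}, hdrB = {0};
        hdrA.lpData = pcmA; hdrA.dwBufferLength = lenA;
        hdrB.lpData = pcmB; hdrB.dwBufferLength = lenB;
        waveOutPrepareHeader(hA, &hdrA, sizeof(hdrA));
        waveOutPrepareHeader(hB, &hdrB, sizeof(hdrB));

        // Both buffers start at (roughly) the same time and are mixed.
        waveOutWrite(hA, &hdrA, sizeof(hdrA));
        waveOutWrite(hB, &hdrB, sizeof(hdrB));

        // Crude wait until both buffers have finished playing.
        while (!(hdrA.dwFlags & WHDR_DONE) || !(hdrB.dwFlags & WHDR_DONE))
            Sleep(10);

        waveOutUnprepareHeader(hA, &hdrA, sizeof(hdrA));
        waveOutUnprepareHeader(hB, &hdrB, sizeof(hdrB));
        waveOutClose(hA);
        waveOutClose(hB);
    }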
I stand corrected about the ability of the waveOut* API to play sounds simultaneously and mixed.
Here is test code for the curious: http://www.alax.info/trac/public/browser/trunk/Utilities/WaveOutMultiPlay An application started with the arguments abc plays sounds at 1, 5 and 15 kHz respectively on separate threads, and they mix well.
At the same time, the DirectShow Audio Renderer (WaveOut) Filter, built on top of the same API, is unable to play anything more than a single stream, for no visible reason.
FYI, the waveOutOpen API was retired long ago and is currently a wrapper on top of newer APIs. waveOutOpen assumes that the audio output device is opened for exclusive use, so there is no guarantee that multiple simultaneously opened devices would produce mixed audio output. To achieve such behavior you would be better off with a newer audio API: DirectSound, DirectShow on top of DirectSound, or WASAPI.
I suggest going with DirectSound if your product is for consumers.
From DirectX 8 onwards, the API is at the point where it is actually quite painless, and most consumer machines will have it installed.

Polyphony with AudioRenderCallback and AudioUnitRenderFlag

I am getting my bearings in Core Audio / Audio Units, so please forgive me if this should be self-evident.
If I want to allow for five voices, I need five buses on the mixer and at least five buffers if I want five different sounds. I have figured out a basic way to schedule a note by checking the time and using a start time. I think I should use ioActionFlags to indicate when I am rendering silence, but I don't quite get how.
I ask because, with two buses, I get buzzing when one is silent for a while but the other one plays.
If you are getting buzzing, it is probably what is known as the "satan saw", which is a sawtooth-sounding noise created by an uncleared buffer playing over and over again out of a channel. Any sound other than silence repeated in this manner will sound a bit like a sawtooth wave.
When you are rendering silence, you should simply clear out all of the samples in your output buffer to 0.0f for the given voice. I don't think that there is a way to stop the callback from trying to fetch your samples, and anyways, this is a much easier (and more portable) solution than fiddling around with the system's rendering setup.
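Sketched from memory, something along these lines; VoiceIsActiveOnBus is a made-up placeholder for however you track which notes are currently scheduled:

    // Rough sketch of a per-voice render callback that zeroes its output and
    // raises the silence flag when the voice has nothing to play.
    #include <AudioUnit/AudioUnit.h>
    #include <string.h>

    static OSStatus RenderVoice(void *inRefCon,
                                AudioUnitRenderActionFlags *ioActionFlags,
                                const AudioTimeStamp *inTimeStamp,
                                UInt32 inBusNumber,
                                UInt32 inNumberFrames,
                                AudioBufferList *ioData)
    {
        // Placeholder: however your app decides whether this voice/bus is sounding now.
        Boolean active = VoiceIsActiveOnBus(inRefCon, inBusNumber, inTimeStamp);

        if (!active) {
            // Clear every channel so stale samples don't repeat (the "satan saw").
            for (UInt32 i = 0; i < ioData->mNumberBuffers; i++)
                memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);

            // Optional hint to downstream units that this render pass is silent.
            *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
            return noErr;
        }

        // ... otherwise fill ioData with this voice's samples for inNumberFrames ...
        return noErr;
    }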

Voice Alteration Algorithm

Could somebody point me to a voice alteration algorithm? Preferably in Java or C? Something that I could use to change a stream of recorded vocals into something that sounds like Optimus Prime. (FYI: Optimus Prime is the lead Autobot from Transformers, with a very distinctive-sounding voice... not everybody may know this.) Is there an open source solution?
You can't just change the sample rate. The human voice has formants. What you want to do is move the formants. That should be your line of research.
Read about vocoders and filter banks.
Could you provide a link as an example? I haven't seen the film, so I'm just speculating.
Audacity is an open-source wave editor which includes effect filters - since it's open source you could see what algorithms they use.
Not knowing what it sounded like, I figured it would be a vocoder, but after listening to a few samples, it's definitely not a vocoder (or if it is, it's pretty low in the mix.) It sounds like maybe there's a short, fast delay effect on it, along with some heavy EQ to make it sound kind of like a tiny AM radio, and maybe a little ring modulator. I think that a LOT of the voice actor's voice is coming through relatively intact, so a big part of the sound is just getting your own voice to sound right, and no effects will do that part for you.
All the above info is just me guessing based on having messed around with a lot of guitar pedals over the years, and done some amateur recording and sound-effects making, so I could be way off.
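If you want to play with the ring-modulator part of that guess, it's only a few lines of DSP. The 30 Hz carrier below is just a number to start experimenting with, not a known setting for that voice:

    // Rough sketch of classic ring modulation: multiply the voice by a sine
    // carrier. Input and output are mono float samples in [-1, 1].
    #include <math.h>

    void ring_modulate(const float *in, float *out, int numSamples,
                       float sampleRate, float carrierHz)
    {
        const float twoPi = 6.28318530718f;
        for (int i = 0; i < numSamples; i++) {
            float carrier = sinf(twoPi * carrierHz * (float)i / sampleRate);
            out[i] = in[i] * carrier;
        }
    }

    // Example: ring_modulate(voice, robotVoice, n, 44100.0f, 30.0f);

Layer that under the EQ and delay guesses above and you'll at least be in the right neighborhood to start tweaking by ear.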
