I am getting my bearings in Core Audio / Audio Units, so please forgive me if this should be self-evident.
If I want to allow for five voices, I need 5 buses on the mixer and at least 5 buffers if I want 5 different sounds. I have figured out a basic way to schedule a note by checking the current time against a start time. I think I should use ioActionFlags to indicate when I am rendering silence, but I don't quite get how.
I ask because, with 2 buses, I get buzzing when one is silent for a while but the other one plays.
If you are getting buzzing, it is probably what is known as the "satan saw", which is a sawtooth-sounding noise created by an uncleared buffer playing over and over again out of a channel. Any sound other than silence repeated in this manner will sound a bit like a sawtooth wave.
When you are rendering silence, you should simply clear all of the samples in your output buffer to 0.0f for the given voice. I don't think there is a way to stop the callback from trying to fetch your samples, and anyway, this is a much easier (and more portable) solution than fiddling around with the system's rendering setup.
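Here is a rough sketch of what that looks like in a render callback, assuming a per-voice state struct of your own design (the VoiceState type and isActive flag below are placeholders, not part of any Apple API). This is also where ioActionFlags comes in: you can set the output-is-silence flag as a hint, but you still have to zero the buffers yourself.

```c
// Minimal sketch of a render callback that outputs silence for an idle voice.
// "VoiceState" and "isActive" are hypothetical; adapt to your own note-scheduling code.
#include <AudioUnit/AudioUnit.h>
#include <string.h>

typedef struct {
    Boolean isActive;   // set when a note is scheduled and still sounding
    // ... phase, frequency, start time, etc.
} VoiceState;

static OSStatus VoiceRenderCallback(void *inRefCon,
                                    AudioUnitRenderActionFlags *ioActionFlags,
                                    const AudioTimeStamp *inTimeStamp,
                                    UInt32 inBusNumber,
                                    UInt32 inNumberFrames,
                                    AudioBufferList *ioData)
{
    VoiceState *voice = (VoiceState *)inRefCon;

    if (!voice->isActive) {
        // Zero every channel so the mixer never re-reads stale samples,
        // and hint to the host that this render pass is silent.
        for (UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
            memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
        }
        *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
        return noErr;
    }

    // ... otherwise synthesize this voice's samples into ioData as usual ...
    return noErr;
}
```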
(Preface: This is my first audio-related question on Stack Overflow, so I'll try to word this as best as I possibly can. Edits welcome.)
I'm creating an application that'll allow users to loop music. At the moment our prototypes allow these "loop markers" (implemented as UISliders) to snap at every second, specifying the beginning and end of a loop. Obviously, when looping music, seconds are a very crude manner to handle this, so I would like to use beats instead.
I don't want to do anything other than mark beats for the UISliders to snap to:
Feed our loadMusic method an audio file.
Run it through a library to detect beats or the intervals between them (maybe).
Feed that value into the slider's setNumberOfTickMarks: method.
Profit!
Unfortunately, most of the results I've run into via Google and SO have yielded much more advanced beat detection libraries like those that remixers would use. Overkill in my case.
Is this something that CoreMedia, AVFoundation or AudioToolbox can handle? If not, are there other libraries that can handle this? My research into Apple's documentation has only yielded relevant results... for MIDI files. But Apple's own software has features like this, such as iMovie's snap-to-beats functionality.
Any guidance, code or abstracts would be immensely helpful at this point.
EDIT: After doing a bit more digging around, it seems the correct terminology for what I'm looking for is onset detection.
Onset Detection algorithms come in many flavors from looking at the raw music signal to using frequency domain techniques.
If you want a quick and easy way to determine where beats are:
Chop up the music signal into small segments (20-50ms chunks)
Compute the mean squared energy of each chunk: Sum(Xn^2) / N (where N is the number of samples per 20-50 ms chunk); sudden jumps in this value are beat candidates (see the sketch after this list).
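A minimal sketch of that idea in C, assuming mono float samples that have already been decoded; the function names, the running-average smoothing, and the threshold factor are my own choices, not from any particular library. At 44.1 kHz, a 20-50 ms chunk is roughly 882-2205 samples.

```c
// Rough sketch of the energy-per-chunk idea above, assuming mono float samples.
// A beat candidate is any chunk whose energy jumps well above a running average.
#include <stddef.h>

// Returns the mean squared energy of one chunk: Sum(x[n]^2) / N.
static float chunk_energy(const float *samples, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i] * samples[i];
    }
    return (n > 0) ? sum / (float)n : 0.0f;
}

// Walks the whole signal in ~20-50 ms chunks and flags energy peaks.
// "threshold" is a tuning factor (e.g. 1.5-2.0) above the running average.
static void find_energy_peaks(const float *signal, size_t total,
                              size_t chunkSize, float threshold,
                              void (*onBeat)(size_t sampleOffset))
{
    float runningAvg = 0.0f;
    const float smoothing = 0.9f;   // simple exponential moving average

    for (size_t offset = 0; offset + chunkSize <= total; offset += chunkSize) {
        float e = chunk_energy(signal + offset, chunkSize);
        if (runningAvg > 0.0f && e > threshold * runningAvg) {
            onBeat(offset);          // report the chunk start as a beat candidate
        }
        runningAvg = smoothing * runningAvg + (1.0f - smoothing) * e;
    }
}
```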
If you want more sophisticated techniques look into:
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
or for hardcore treatment of it:
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=PMHXcoAAAAAJ&citation_for_view=PMHXcoAAAAAJ:uJ-U7cs_P_0C
I am trying to play sounds on Windows XP in a multi-channel (parallel) manner. I had read somewhere that playing parallel sounds with WinMM may not be possible, but here is what I observe:
When I call WaveOutOpen() once and then call WaveOutWrite() many times, the sounds are not parallel - they are queued.
But when I call WaveOutOpen(), say, nine times (and store the nine handles), and then call WaveOutWrite() nine times with nine different sounds, they are played in parallel (multi-channel) - that is, they are mixed.
It seems to work, but I am not sure it is okay because I don't find it stated clearly in any tutorial or documentation. Is it okay to play sounds in such a 'many WaveOutOpen' way?
When I call WaveOutOpen() once and then call WaveOutWrite() many times, the sounds are not parallel - they are queued.
That's exactly what is supposed to happen. WaveOutWrite queues the next buffer. It allows you to send the audio you want to play in small chunks.
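For what it's worth, a stripped-down sketch of that queued pattern looks roughly like this (error handling and the wait-for-WHDR_DONE loop are omitted; the helper name and parameters are mine):

```c
// Sketch of the "queued" behavior: one waveOutOpen, several waveOutWrite calls.
// Each buffer plays back-to-back on the same device; nothing is mixed.
#include <windows.h>
#include <mmsystem.h>
#include <stdlib.h>
#pragma comment(lib, "winmm.lib")

void play_chunks_sequentially(WAVEFORMATEX *wfx,
                              char *chunks[], DWORD chunkBytes[], int count)
{
    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, wfx, 0, 0, CALLBACK_NULL);

    WAVEHDR *hdrs = (WAVEHDR *)calloc(count, sizeof(WAVEHDR));
    for (int i = 0; i < count; i++) {
        hdrs[i].lpData = chunks[i];
        hdrs[i].dwBufferLength = chunkBytes[i];
        waveOutPrepareHeader(hwo, &hdrs[i], sizeof(WAVEHDR));
        waveOutWrite(hwo, &hdrs[i], sizeof(WAVEHDR));  // appended to the queue, not mixed
    }
    // ... wait until every header has WHDR_DONE set, then unprepare, free and close ...
}
```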
But when I call WaveOutOpen(), say, nine times (and store the nine handles), and then call WaveOutWrite() nine times with nine different sounds, they are played in parallel (multi-channel) - that is, they are mixed.
Again, this is correct and expected. This is the simplest way to play back many simultaneous sounds. If you want sample accurate mixing however, you should mix the audio samples yourself into one stream of samples and play that through a single WaveOut device.
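A naive sketch of that manual mixing step, assuming 16-bit PCM voices of equal length (the function name is mine; a real mixer would scale or limit rather than hard-clip):

```c
// Naive sketch of mixing several 16-bit PCM voices into one stream by summing
// and clamping; the mixed buffer would then be written to a single waveOut device.
#include <stdint.h>
#include <stddef.h>

static void mix_voices_s16(const int16_t **voices, int voiceCount,
                           int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t acc = 0;
        for (int v = 0; v < voiceCount; v++) {
            acc += voices[v][i];              // sum in a wider type to avoid overflow
        }
        if (acc > INT16_MAX) acc = INT16_MAX; // hard clip; scaling or a limiter
        if (acc < INT16_MIN) acc = INT16_MIN; // would sound nicer
        out[i] = (int16_t)acc;
    }
}
```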
I stand corrected about the ability of the waveOut* API to play sounds simultaneously and have them mixed.
Here is test code for the curious: http://www.alax.info/trac/public/browser/trunk/Utilities/WaveOutMultiPlay An application started with the arguments abc plays sounds at 1, 5 and 15 kHz on separate threads, and they mix well.
At the same time, the DirectShow Audio Renderer (WaveOut) Filter built on top of the same API is unable to play anything more than a single stream, for no visible reason.
FYI, the waveOutOpen API was retired long ago and is currently a wrapper on top of newer APIs. waveOutOpen assumes that the audio output device is opened for exclusive use, so there is no guarantee that multiple opened devices will simultaneously produce mixed audio output. To achieve such behavior reliably, you would be better off with a newer audio API: DirectSound, DirectShow on top of DirectSound, or WASAPI.
I suggest going with DirectSound if your product is for consumers.
From DirectX 8 onwards the API is at a point where it is actually quite painless to use, and most consumer machines will have it installed.
Note: This might seem like a Super User question at first, but please read it completely -- it's a programming question.
So they removed GDI acceleration from Windows 7, and now the classic theme animations look horrible. And because it was a fundamental design change, there's no way to fix it, right?
Wrong!
I was really surprised today when I switched to Classic view (which turned off Aero) when VLC media player was running. Normally, the maximize/minimize animations look horrible (they barely even show up), but while VLC was running, the animations were perfect, just like on XP! As soon as I closed VLC, they became horrible again. (They were better when media was playing than when the player was idle.)
I'd reproduced this sometime before when I'd fired up a 3D game and noticed that the animations had improved, but I'd assumed that it had been a DirectX-related issue. I'd tried to figure out which function calls had caused the improvement, but with no luck. So I was really surprised today when I noticed the same behavior with VLC, because it was not playing video, only audio (not even visualizations!) -- and yet playing audio improved my GDI graphics performance, making me think that maybe, just maybe, Windows 7 does have some sort of GDI acceleration after all. (?)
In case this makes a difference, my graphics card is an NVIDIA GT 330M, and PowerMizer is off. I've controlled for every variable I can think of except whether or not VLC was running, so I can pretty much rule out anything related to features of the graphics card.
So, now for my question:
Does anyone have any idea which API call(s) might be causing this improvement, and whether they are actually related to graphics or not?
I've tried making a program that calls IDirectDraw::CreateSurface and simply runs in the background (hoping that it would do the same thing my 3D game did), but no; there wasn't any difference. I'm not even sure it's a graphics-related API call that might be causing this, since, like I said, VLC was playing music, not video. It's a mystery to me why the performance would improve when a multimedia app is running, so any insight into what's going on here would be appreciated. :)
It could just be a factor of the system clock tick period. Running VLC probably changes the clock tick to every 1ms, causing the animations to run more smoothly.
Use Clockres to check the system timer resolution with and without VLC running to see if it's making a difference.
See the timeBeginPeriod function for how to set the time period yourself. Keep in mind that the shorter the period, the less time your CPU will be able to sleep between ticks, and the hotter it will run.
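If you want to test the theory directly, here is a minimal sketch (the helper name is mine; timeGetDevCaps, timeBeginPeriod and timeEndPeriod are the real WinMM calls):

```c
// Sketch of requesting a 1 ms timer resolution for the lifetime of a task,
// which is likely the side effect VLC has while it is playing.
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

void run_with_fine_timer(void (*task)(void))
{
    TIMECAPS tc;
    timeGetDevCaps(&tc, sizeof(tc));   // ask the driver for its minimum period
    UINT period = tc.wPeriodMin;       // typically 1 ms

    timeBeginPeriod(period);           // raise the global tick rate
    task();
    timeEndPeriod(period);             // always pair with timeEndPeriod
}
```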
I'm decoding a video format that has an accompanying audio track in a separate file. Per the specs, I render a frame of video every 1/75th second. And the length of the video file is the same as the length of the audio track.
I'm playing the audio with Audio Queue Services (which I chose because I figured there would be situations where I needed precise timing control -- just the sort of situation I'm encountering!). It's a big API and I haven't progressed much past the sample code in Apple's programming guide (though I have wrapped things up in a nicer ObjC API).
In ideal situations, things work fine with the basic playback setup. The video and audio stay synced and both end at the same time (within my own ability to tell the difference). However, if performance hiccups occur (or I attach the Leaks instrument or something), they quickly get out of sync.
This is the first time I've ever written something of this nature: I have no prior experience with sound or video. I certainly have no experience with Audio Queue Services. So I'm not sure where to go from here.
Have you done something like this? Do you have some advice or tips or tricks to offer? Is there some fundamental piece of documentation I need to read? Any help would be greatly appreciated.
First off, I've never actually coded anything like this so I'm shooting from the hip. Also, I've done a decent amount of programming with the HAL and AUHAL but never with AudioQueue so my approach might not be the best way to use AQ.
Obviously the first thing to decide is whether to sync the audio to the video or the video to the audio. From the question it seems you've decided the video will be the master and the audio should sync to it.
I would approach this by keeping track of the number of video frames rendered, along with the frame rate. Then, when enqueuing your audio buffers, rather than passing a monotonically increasing value for the startTime, adjust the buffer's start time to match any discontinuities observed in the video. This is a bit vague because I don't know exactly where your audio is coming from or how you are enqueuing it, but hopefully the principle is clear.
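To make the principle concrete, here is a hedged sketch using AudioQueueEnqueueBufferWithParameters, which accepts an explicit start time. The framesRendered / videoFrameRate bookkeeping is assumed to be yours, and the sketch assumes the queue's sample timeline starts at zero together with the video.

```c
// Sketch of enqueueing a buffer at an explicit position derived from the video
// clock; "framesRendered" and "videoFrameRate" are the caller's own bookkeeping,
// not part of any Apple API.
#include <AudioToolbox/AudioToolbox.h>

static OSStatus enqueue_synced_buffer(AudioQueueRef queue,
                                      AudioQueueBufferRef buffer,
                                      Float64 audioSampleRate,
                                      UInt64 framesRendered,   // video frames shown so far
                                      Float64 videoFrameRate)  // e.g. 75.0 (1/75 s per frame)
{
    // Where the audio *should* be, in seconds, given the video position.
    Float64 videoTimeSeconds = (Float64)framesRendered / videoFrameRate;

    AudioTimeStamp startTime = {0};
    startTime.mFlags = kAudioTimeStampSampleTimeValid;
    startTime.mSampleTime = videoTimeSeconds * audioSampleRate;

    // Enqueue at that position instead of "whenever the previous buffer ends",
    // so the audio re-aligns after a hiccup in video rendering.
    return AudioQueueEnqueueBufferWithParameters(queue, buffer,
                                                 0, NULL,   // no packet descriptions (PCM)
                                                 0, 0,      // no trimming
                                                 0, NULL,   // no parameter events
                                                 &startTime,
                                                 NULL);
}
```

Plain AudioQueueEnqueueBuffer, by contrast, just appends the buffer after whatever is already queued, which is how drift accumulates after a hiccup.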
Could somebody point me to a voice alteration algorithm? Preferably in Java or C? Something that I could use to change a stream of recorded vocals into something that sounds like Optimus Prime. (FYI: Optimus Prime is the lead Autobot from Transformers, with a very distinctive-sounding voice... not everybody may know this.) Is there an open source solution?
You can't just change the sample rate. The human voice has formants. What you want to do is move the formants. That should be your line of research.
Read about vocoders and filter banks.
Could you provide a link as an example? I haven't seen the film, so I'm just speculating.
Audacity is an open-source wave editor which includes effect filters - since it's open source you could see what algorithms they use.
Not knowing what it sounded like, I figured it would be a vocoder, but after listening to a few samples, it's definitely not a vocoder (or if it is, it's pretty low in the mix.) It sounds like maybe there's a short, fast delay effect on it, along with some heavy EQ to make it sound kind of like a tiny AM radio, and maybe a little ring modulator. I think that a LOT of the voice actor's voice is coming through relatively intact, so a big part of the sound is just getting your own voice to sound right, and no effects will do that part for you.
All the above info is just me guessing based on having messed around with a lot of guitar pedals over the years, and done some amateur recording and sound-effects making, so I could be way off.
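If you want to experiment with the ring-modulator ingredient guessed at above, here is a toy sketch (all tuning values are guesses, and the function name is mine):

```c
// Toy ring modulator: multiply the voice by a sine carrier in roughly the
// tens-to-hundreds of Hz range for a metallic/robotic character, then blend
// with the dry signal. Every constant here is a starting point to tweak by ear.
#include <math.h>
#include <stddef.h>

static void ring_modulate(float *samples, size_t count,
                          float sampleRate, float carrierHz, float mix)
{
    const double twoPi = 6.283185307179586;
    double phase = 0.0;
    double step = twoPi * carrierHz / sampleRate;

    for (size_t i = 0; i < count; i++) {
        float carrier = (float)sin(phase);
        // Blend dry and modulated signal; mix = 0 leaves the voice untouched.
        samples[i] = (1.0f - mix) * samples[i] + mix * (samples[i] * carrier);
        phase += step;
        if (phase >= twoPi) phase -= twoPi;
    }
}
```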