I am trying to play sounds on Windows XP in a multi-channel (parallel) manner.
I had read somewhere that playing parallel sounds with WinMM may not be possible,
but here is what I observe:
When I call WaveOutOpen() once and then call WaveOutWrite() many times,
the sounds are not parallel - they are queued.
But when I call WaveOutOpen() say nine times (and store the nine handles)
and then call WaveOutWrite() nine times with nine different sounds, they are
played in parallel (multi-channel) - that is, they are mixed.
It seems to work, but I am not sure whether it is okay, because I haven't found it stated clearly
in any tutorial or documentation.
Is it okay to play sound in this 'many WaveOutOpen' way?
When I call WaveOutOpen() once and then call WaveOutWrite() many times,
the sounds are not parallel - they are queued.
That's exactly what is supposed to happen. WaveOutWrite queues the next buffer. It allows you to send the audio you want to play in small chunks.
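To make the queuing concrete, here is a minimal sketch, assuming the WAVEFORMATEX and the two PCM chunks are prepared elsewhere (the function name PlayQueued and its parameters are placeholders of mine, and error handling is omitted):

```c
#include <windows.h>
#include <mmsystem.h>   /* link with winmm.lib */

/* Queue two chunks of PCM on a single device: the second chunk starts
   only after the first has finished playing.  'format', 'chunk1' and
   'chunk2' are assumed to be prepared by the caller. */
void PlayQueued(WAVEFORMATEX *format,
                char *chunk1, DWORD size1,
                char *chunk2, DWORD size2)
{
    HWAVEOUT device;
    WAVEHDR header1 = {0}, header2 = {0};

    waveOutOpen(&device, WAVE_MAPPER, format, 0, 0, CALLBACK_NULL);

    header1.lpData = chunk1;
    header1.dwBufferLength = size1;
    waveOutPrepareHeader(device, &header1, sizeof(header1));

    header2.lpData = chunk2;
    header2.dwBufferLength = size2;
    waveOutPrepareHeader(device, &header2, sizeof(header2));

    /* Both calls return immediately; the buffers play back to back. */
    waveOutWrite(device, &header1, sizeof(header1));
    waveOutWrite(device, &header2, sizeof(header2));

    /* ... wait for WHDR_DONE on both headers, then
       waveOutUnprepareHeader() and waveOutClose() ... */
}
```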
But when I call WaveOutOpen() say nine times (and store the nine
handles) and then call WaveOutWrite() nine times with nine different
sounds, they are played in parallel (multi-channel) - that is, they are mixed.
Again, this is correct and expected. This is the simplest way to play back many simultaneous sounds. If you want sample-accurate mixing, however, you should mix the audio samples yourself into one stream of samples and play that through a single WaveOut device.
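The mixing itself is nothing more than summing the corresponding samples of each voice and clamping the result. A rough sketch for 16-bit PCM; MixVoices and its buffer layout are my own illustration, not part of any API:

```c
#include <stdint.h>

/* Mix several voices of 16-bit PCM into one output buffer by summing
   corresponding samples and clamping to the 16-bit range. */
void MixVoices(const int16_t **voices, int voiceCount,
               int16_t *out, int sampleCount)
{
    for (int i = 0; i < sampleCount; ++i) {
        int32_t sum = 0;
        for (int v = 0; v < voiceCount; ++v)
            sum += voices[v][i];
        if (sum > 32767)  sum = 32767;    /* clip rather than wrap */
        if (sum < -32768) sum = -32768;
        out[i] = (int16_t)sum;
    }
}
```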
I stand corrected regarding the waveOut* API's ability to play sounds simultaneously and mix them.
Here is test code for the curious: http://www.alax.info/trac/public/browser/trunk/Utilities/WaveOutMultiPlay An application started with the arguments abc plays sounds at 1, 5 and 15 kHz on different threads, and they mix well.
At the same time, the DirectShow Audio Renderer (WaveOut) Filter, built on top of the same API, is unable to play anything more than a single stream, for no visible reason.
FYI, the waveOutOpen API was retired long ago and is currently a wrapper on top of newer APIs. waveOutOpen assumes that the audio output device is opened for exclusive use, so there is no guarantee that multiple devices opened simultaneously will produce mixed audio output. To achieve such behavior you would be better off with a newer audio API: DirectSound, DirectShow on top of DirectSound, or WASAPI.
I suggest going with DirectSound if your product is for consumers.
From DirectX 8 onwards the API is at the point where it is actually quite painless, and most consumer machines will have it installed.
Related
Is it possible to get a list of audio cards (not endpoints) in Win32?
This information would be really useful when constructing full-duplex audio streams, to be sure both input and output share the same hardware clock.
So far, I found PKEY_DeviceInterface_FriendlyName, which comes close, but it probably cannot be used when two identical audio cards are plugged in.
I also found Enumerating audio adapters in WinAPI, and while the WMI query in the accepted answer retrieves the results I'm looking for, I see no easy way to correlate those to a WASAPI endpoint device id.
Turns out my premise was wrong. Apparently, just because multiple endpoints reside on the same physical device does not mean they share the same clock (although it might be the case). See here: https://portaudio.music.columbia.narkive.com/0qYpAMkP/understanding-multiple-streams and here: https://audiophilestyle.com/forums/topic/19715-hq-player/page/584/, so that basically defeats the purpose of my question. Thanks for the help anyway, everyone.
I'm trying to learn how to use the Allegro 5 game programming library. I'm wondering how I can find out which library functions are threadsafe. I understand how to use mutexes to ensure safety in my own code, but the amount that I may need to use them when calling Allegro's own functions is unclear to me.
The Allegro FAQ says it is threadsafe and links to this thread. However, that thread isn't very helpful, because the "really good article" linked in the first comment is a dead link, and the conclusion of the commenters seems to be "Allegro is mostly threadsafe", with no indication of which parts may not be.
ALLEGRO relies internally on OpenGL (by default) for its graphics routines, so they are not guaranteed to be thread-safe. You can assume the same to be true for audio. All other functions, though, are indeed thread-safe:
Synchronization routines (mutex,cond...)
Timers
Filesystem and IO
What I do in my programs is make all graphics calls from a single thread and all audio calls from a single thread (not necessarily the same one). All other threads use ALLEGRO sync routines to synchronize with graphics and audio.
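As a rough illustration of that arrangement (worker_proc and shared_x are placeholder names of mine, and the actual drawing is stubbed out), a worker thread updates shared state under an Allegro mutex while only the main thread touches the display:

```c
#include <allegro5/allegro.h>

/* Shared state, protected by 'mutex'; only the main thread draws. */
static ALLEGRO_MUTEX *mutex;
static float shared_x = 0;

/* Worker thread: updates game state, never calls drawing routines. */
static void *worker_proc(ALLEGRO_THREAD *thread, void *arg)
{
    (void)arg;
    while (!al_get_thread_should_stop(thread)) {
        al_lock_mutex(mutex);
        shared_x += 1.0f;              /* pretend this is real game logic */
        al_unlock_mutex(mutex);
        al_rest(0.01);
    }
    return NULL;
}

int main(void)
{
    al_init();
    ALLEGRO_DISPLAY *display = al_create_display(640, 480);
    mutex = al_create_mutex();

    ALLEGRO_THREAD *worker = al_create_thread(worker_proc, NULL);
    al_start_thread(worker);

    for (int frame = 0; frame < 600; ++frame) {  /* main = graphics thread */
        al_lock_mutex(mutex);
        float x = shared_x;                      /* copy state under lock */
        al_unlock_mutex(mutex);

        al_clear_to_color(al_map_rgb(0, 0, 0));
        /* ... draw something at position x ... */
        (void)x;
        al_flip_display();
        al_rest(1.0 / 60);
    }

    al_set_thread_should_stop(worker);
    al_join_thread(worker, NULL);
    al_destroy_thread(worker);
    al_destroy_mutex(mutex);
    al_destroy_display(display);
    return 0;
}
```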
NOTE:
Just to clarify, I meant that you shouldn't DRAW from two threads simultaneously. It's alright to create, copy etc. DIFFERENT bitmaps from different threads simultaneously, so long as you don't draw them on the screen.
NOTE 2:
I feel like this is obvious, but in any programming language you shouldn't write to the same object from two different threads simultaneously without synchronization.
(Preface: This is my first audio-related question on Stack Overflow, so I'll try to word this as best as I possibly can. Edits welcome.)
I'm creating an application that'll allow users to loop music. At the moment our prototypes allow these "loop markers" (implemented as UISliders) to snap to every second, specifying the beginning and end of a loop. Obviously, when looping music, seconds are a very crude way to handle this, so I would like to use beats instead.
I don't want to do anything other than mark beats for the UISliders to snap to:
Feed our loadMusic method an audio file.
Run it through a library to detect beats or the intervals between them (maybe).
Feed that value into the slider's setNumberOfTickMarks: method.
Profit!
Unfortunately, most of the results I've run into via Google and SO have yielded much more advanced beat detection libraries like those that remixers would use. Overkill in my case.
Is this something that CoreMedia, AVFoundation or AudioToolbox can handle? If not, are there other libraries that can handle this? My research into Apple's documentation has only yielded relevant results... for MIDI files. But Apple's own software has features like this, such as iMovie's snap-to-beats functionality.
Any guidance, code or abstracts would be immensely helpful at this point.
EDIT: After doing a bit more digging around, it seems the correct terminology for what I'm looking for is onset detection.
Onset detection algorithms come in many flavors, from looking at the raw music signal to using frequency-domain techniques.
If you want a quick and easy way to determine where beats are:
Chop up the music signal into small segments (20-50 ms chunks)
Compute the squared-sum average (short-time energy) of each chunk: Sum(Xn^2) / N (where N is the number of samples per 20-50 ms chunk); peaks in this energy from one chunk to the next are your beat/onset candidates
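A minimal sketch of that energy computation, assuming mono float samples (ShortTimeEnergy is just an illustrative name; picking the actual onsets from the resulting curve, e.g. peaks above a moving average, is left out):

```c
#include <stdlib.h>

/* Short-time energy: Sum(Xn^2) / N for consecutive chunks of
   'chunkSize' samples (pick chunkSize to cover 20-50 ms at your
   sample rate).  Returns one energy value per chunk; caller frees. */
float *ShortTimeEnergy(const float *samples, int sampleCount,
                       int chunkSize, int *chunkCountOut)
{
    int chunkCount = sampleCount / chunkSize;
    float *energy = malloc(chunkCount * sizeof(float));

    for (int c = 0; c < chunkCount; ++c) {
        double sum = 0.0;
        for (int n = 0; n < chunkSize; ++n) {
            float x = samples[c * chunkSize + n];
            sum += (double)x * x;
        }
        energy[c] = (float)(sum / chunkSize);
    }

    *chunkCountOut = chunkCount;
    return energy;   /* peaks in this curve are onset candidates */
}
```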
If you want more sophisticated techniques look into:
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
or for hardcore treatment of it:
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=PMHXcoAAAAAJ&citation_for_view=PMHXcoAAAAAJ:uJ-U7cs_P_0C
I am getting my bearings in Core Audio / Audio Units, so please forgive me if this should be self-evident.
If I want to allow for five voices, I need 5 buses on the mixer and at least 5 buffers if I want 5 different sounds. I have figured out a basic way to schedule a note by checking the time and using a start time. I think I should use ioActionFlags to indicate when I am rendering silence, but I don't quite get how.
I ask because, with 2 buses, I get buzzing when one is silent for a while but the other one plays.
If you are getting buzzing, it is probably what is known as the "satan saw", which is a sawtooth-sounding noise created by an uncleared buffer playing over and over again out of a channel. Any sound other than silence repeated in this manner will sound a bit like a sawtooth wave.
When you are rendering silence, you should simply clear all of the samples in your output buffer to 0.0f for the given voice. I don't think there is a way to stop the callback from trying to fetch your samples, and anyway, this is a much easier (and more portable) solution than fiddling around with the system's rendering setup.
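Here is a sketch of what that looks like inside a render callback; VoiceState and RenderVoice are hypothetical names of mine, and the format is assumed to be float or signed-integer linear PCM, where all-zero bytes are silence. kAudioUnitRenderAction_OutputIsSilence is the flag you would set, but zeroing the buffer is the part that actually prevents the buzz:

```c
#include <AudioUnit/AudioUnit.h>
#include <string.h>

/* Hypothetical per-voice state; all that matters here is the flag. */
typedef struct { int isPlaying; /* ... oscillator state ... */ } VoiceState;

static OSStatus RenderVoice(void *inRefCon,
                            AudioUnitRenderActionFlags *ioActionFlags,
                            const AudioTimeStamp *inTimeStamp,
                            UInt32 inBusNumber,
                            UInt32 inNumberFrames,
                            AudioBufferList *ioData)
{
    VoiceState *voice = (VoiceState *)inRefCon;

    if (!voice->isPlaying) {
        /* Write zeros explicitly; leftover garbage in the buffer is
           what produces the repeating buzz. */
        for (UInt32 b = 0; b < ioData->mNumberBuffers; ++b)
            memset(ioData->mBuffers[b].mData, 0,
                   ioData->mBuffers[b].mDataByteSize);

        /* Hint to the host that this render produced only silence. */
        *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
        return noErr;
    }

    /* ... otherwise fill ioData with this voice's samples ... */
    return noErr;
}
```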
I'm decoding a video format that has an accompanying audio track in a separate file. Per the specs, I render a frame of video every 1/75th second. And the length of the video file is the same as the length of the audio track.
I'm playing the audio with Audio Queue Services (which I chose because I figured there would be situations where I needed precise timing control -- just the sort of situation I'm encountering!). It's a big API and I haven't progressed much past the sample code in Apple's programming guide (though I have wrapped things up in a nicer ObjC API).
In ideal situations, things work fine with the basic playback setup. The video and audio stay synced and both end at the same time (within my own ability to tell the difference). However, if performance hiccups (or I attach the Leaks Instrument or something), they quickly get out of sync.
This is the first time I've ever written something of this nature: I have no prior experience with sound or video. I certainly have no experience with Audio Queue Services. So I'm not sure where to go from here.
Have you done something like this? Do you have some advice or tips or tricks to offer? Is there some fundamental piece of documentation I need to read? Any help would be greatly appreciated.
First off, I've never actually coded anything like this so I'm shooting from the hip. Also, I've done a decent amount of programming with the HAL and AUHAL but never with AudioQueue so my approach might not be the best way to use AQ.
Obviously the first thing to decide is whether to sync the audio to the video or the video to the audio. From the question it seems you've decided the video will be the master and the audio should sync to it.
I would approach this by keeping track of the number of video frames rendered, along with the frame rate. Then, when enqueuing your audio buffers, rather than passing a monotonically increasing value for the startTime, adjust the buffer's start time to match any discontinuities observed in the video. This is a bit vague because I don't know exactly where your audio is coming from or how you are enqueuing it, but hopefully the principle is clear.
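One concrete hook for this in Audio Queue Services is AudioQueueEnqueueBufferWithParameters, which accepts an explicit start time. A sketch under the assumption that you compute desiredStartSample from the video frame count and frame rate (both the helper name and that variable are mine, not part of the API):

```c
#include <AudioToolbox/AudioToolbox.h>

/* Enqueue 'buffer' so it starts at a sample time derived from the video
   position rather than simply after the previous buffer. */
static OSStatus EnqueueSynced(AudioQueueRef queue,
                              AudioQueueBufferRef buffer,
                              Float64 desiredStartSample)
{
    AudioTimeStamp startTime = {0};
    startTime.mSampleTime = desiredStartSample;   /* in sample frames */
    startTime.mFlags = kAudioTimeStampSampleTimeValid;

    /* No packet descriptions, trimming, or parameter events; just an
       explicit start time.  outActualStartTime reports what the queue
       actually used, which is handy for logging drift. */
    AudioTimeStamp actualStartTime = {0};
    return AudioQueueEnqueueBufferWithParameters(queue, buffer,
                                                 0, NULL,  /* packet descs */
                                                 0, 0,     /* trim frames  */
                                                 0, NULL,  /* param events */
                                                 &startTime,
                                                 &actualStartTime);
}
```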