IAudioClient - get notified when playback ends? - winapi

I continuously send data to IAudioClient (GetBufferSize / GetCurrentPadding / GetBuffer / ReleaseBuffer), but I want to know when the audio device finishes playing the last data I sent. I do not want to assume the player stopped just because I sent the last chunk of data to the device: it might still be playing the buffered data.
I tried using IAudioClock / IAudioClock2 to check the hardware buffer position, but it stays the same from the moment I send the last chunk.
I also don't see anything relevant in the IMMNotificationClient and IAudioSessionNotification interfaces...
What am I missing?
Thanks!

IMMNotificationClient and IAudioSessionNotification are not going to help you; these are for detecting new devices and new application sessions, respectively. As far as I know, there is nothing in WASAPI that explicitly raises an event when the last sample is consumed by the device (exclusive mode) or the audio engine (shared mode). A trick I used in the past (albeit with DirectSound, but it should work equally well with WASAPI) is to continuously check the available space in the audio buffer (for WASAPI, using GetCurrentPadding). After you send the last sample, immediately record the current padding, let's say it is N frames. Then keep writing zeroes to the IAudioClient until N frames have been processed (as reported by IAudioClock(2), or just estimated using a timer), then stop the stream. Whether this works on an exclusive event-driven mode stream is a driver quality issue; the driver may choose to report the "real" playback position or just process it in chunks of the full buffer size.
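A minimal sketch of the drain arithmetic described above. The actual WASAPI calls (IAudioClient::GetCurrentPadding to record the padding, IAudioClock::GetPosition to watch the device position while silence is written via IAudioRenderClient with AUDCLNT_BUFFERFLAGS_SILENT) are reduced to plain values here; the struct and function names are illustrative, not part of any API.

```cpp
#include <cstdint>

// Record these at the moment the last real data is queued.
struct DrainState {
    uint64_t positionAtLastWrite;  // device position when last data was queued
    uint64_t paddingAtLastWrite;   // frames still buffered at that moment (N)
};

// The stream can be stopped once the device has consumed at least N frames
// beyond the position recorded when the last real data was written. Poll
// this while feeding silence so the position counter keeps advancing.
bool drainComplete(const DrainState& s, uint64_t currentDevicePosition) {
    return currentDevicePosition - s.positionAtLastWrite >= s.paddingAtLastWrite;
}
```

In a real stream you would call this each period (or on each buffer event) and call IAudioClient::Stop() once it returns true.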

Related

Control Chromecast buffering at start

Is there a way to control the amount of buffering CC devices do before they start playback?
My sender app sends real-time FLAC audio, and CC waits 10+ seconds before starting to play. I've built a custom receiver and tried to change the autoPauseDuration and autoResumeDuration, but it does not seem to matter. I assume those are only used when an underflow event happens, not at startup.
I realize that forcing a start at a low buffering level might end up in underflow, but that's a "risk" much preferable to always waiting such a long time before playback starts. And if it does happen, the autoPause/Resume hysteresis would allow a larger re-buffering to take place then.
If you are using the Media Player Library, take a look at player.getBufferDuration. The docs cover more details about how you can customize the player behavior: https://developers.google.com/cast/docs/player#frequently-asked-questions
Finally, it turned out to be a problem with the way I sent audio to the default receiver. I was streaming FLAC, and since it is a streamable format, I did not include any header (you should be able to start anywhere in the stream; it's just a matter of finding the sync point). But the FLAC decoder in the CC does not like that and was taking 10+ seconds to start. As soon as I added a STREAMINFO header, the problem went away.
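For reference, the fix above can be checked mechanically: per the FLAC format, a decodable stream must begin with the "fLaC" marker followed by a STREAMINFO metadata block (block type 0, 34-byte body, mandatory first block). A raw mid-stream dump of frames fails this check. The function name is illustrative.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the byte stream begins with a FLAC STREAMINFO header,
// i.e. the "fLaC" marker followed by a metadata block of type 0.
bool startsWithStreamInfo(const uint8_t* data, size_t len) {
    if (len < 8) return false;                     // marker (4) + block header (4)
    if (std::memcmp(data, "fLaC", 4) != 0) return false;
    uint8_t blockType = data[4] & 0x7F;            // top bit = "last metadata block" flag
    return blockType == 0;                         // 0 = STREAMINFO
}
```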

Best way to handle buffer under-runs?

I'm implementing the media handlers in Starboard, and I'm running into a situation where my client application in Cobalt doesn't buffer content aggressively enough. This results in it just idling with an empty buffer. What is the proper Starboard event to trigger when the platform's buffer is depleted? Should I be bubbling up an error somehow, or is there a signal I can give the client app to request more data?
When there is an underrun, the player implementation should handle it by pausing the video playback internally. To the end user the media playback is paused while the state of the media stack is still considered as "playing". This gives the player a chance to receive some video data before resuming playback again. In the reference implementation the PlayerWorker achieves this by pausing audio playback. As the media time and video playback are linked to the audio time, the whole player is paused.
When new data comes, the player should resume playback automatically. The player implementation may also choose to increase the amount of buffer required for preroll/resuming to avoid future underruns but this is usually not required.
That said, you mentioned that your app constantly runs into underruns. Even though underruns can be handled gracefully, it is worth fixing this for a better user experience.
The first thing I'd check is that the test environment has enough network bandwidth for the requested video quality. If the app targets a market with very poor networks, consider buffering more media data.
If the app underruns even when there is enough network bandwidth, it indicates that the media data is not being processed fast enough. A good check is whether kSbPlayerDecoderStateNeedsData is fired frequently enough and SbPlayerWriteSample() is called without much delay, as this is the only path that moves media data across the Starboard boundary.
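The pause-on-underrun behaviour described above can be sketched as a toy state machine. None of the real Starboard/Cobalt types appear here; the names and the resume threshold are illustrative. The point is that playback is paused internally on underrun while the externally reported state stays "playing", and playback resumes automatically once enough data arrives.

```cpp
#include <cstdint>
#include <cstring>  // used by the usage check below

struct PlayerModel {
    uint64_t bufferedFrames = 0;
    uint64_t resumeThreshold = 4800;  // frames required before resuming (illustrative)
    bool internallyPaused = false;

    void onUnderrun() { internallyPaused = true; }

    // Called when the app writes more media data across the boundary.
    void onDataWritten(uint64_t frames) {
        bufferedFrames += frames;
        if (internallyPaused && bufferedFrames >= resumeThreshold)
            internallyPaused = false;  // resume automatically
    }

    // Called by the audio sink; returns frames actually consumed.
    uint64_t consume(uint64_t frames) {
        if (internallyPaused) return 0;  // audio clock stalls; video follows it
        uint64_t n = frames < bufferedFrames ? frames : bufferedFrames;
        bufferedFrames -= n;
        if (bufferedFrames == 0) onUnderrun();
        return n;
    }

    // What the client app sees: still "playing" during an internal pause.
    const char* externalState() const { return "playing"; }
};
```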

Outputting Sound to Multiple Audio Devices Simultaneously

OK, the first issue. I am trying to write a virtual soundboard that will output to multiple devices at once. I would prefer OpenAL for this, but if I have to switch over to MS libs (I'm writing this initially on Windows 7) I will.
Anyway, the idea is that you have a bunch of sound files loaded up and ready to play. You're on Skype, and someone fails in a major way, so you hit the play button on the Price is Right fail ditty. Both you and your friends hear this sound at the same time, and have a good laugh about it.
I've gotten OAL to the point where I can play on the default device, and selecting a device at this point seems rather trivial. However, from what I understand, each OAL device needs its context to be current in order for the buffer to populate/propagate properly. Which means, in a standard program, the sound would play on one device, and then the device would be switched and the sound buffered then played on the second device.
Is this possible at all, with any audio library? Would threads be involved, and would those be safe?
Then, the next problem is, in order for it to integrate seamlessly with end-user setups, it would need to be able to either output to the default recording device, or intercept the recording device, mix it with the sound, and output it as another playback device. Is either of these possible, and if both are, which is more feasible? I think it would be preferable to be able to output to the recording device itself, as then the program wouldn't have to be running in order to have the microphone still work for calls.
If I understood correctly, there are mainly two questions here.
Is it possible to play a sound on two or more audio output devices simultaneously, and how to achieve this?
Is it possible to loop data back through an audio input (recording) device so that it is played on the respective monitor, i.e., for example, sent through the audio stream of Skype to your partner, in your case?
Answer to 1: This is absolutely feasible; all independent audio outputs of your system can play sounds simultaneously. For example, some professional audio interfaces (for music production) have 8, 16, or 64 independent outputs, all of which can play sound simultaneously. That means each output device maintains its own buffer that it consumes independently (apart from concurrency on eventual shared memory used to feed the buffers).
How?
Most audio frameworks / systems provide functions to get a "device handle", which requires you to pass a callback for feeding the buffer with samples (so does OpenAL, for example). This callback will be invoked independently and asynchronously by the framework / system (ultimately by the audio device driver(s)).
Since this all works asynchronously, you don't necessarily need multi-threading here. All you need to do, in principle, is maintain two (or more) audio output device handles, each with a separate buffer-consuming callback, to feed the two (or more) separate devices.
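The scheme above can be sketched as one sound shared by two devices, each with its own read cursor that the device's fill callback advances independently. No locking is needed as long as each callback only touches its own cursor. Names are illustrative; in a real program the fill function would be registered with the audio framework for each opened device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-device playback state: the sound data is shared, the cursor is not.
struct DeviceFeed {
    const std::vector<int16_t>* sound = nullptr;
    size_t cursor = 0;  // this device's playback position
};

// Conceptually called by the framework whenever a device needs samples.
// Returns frames delivered; 0 means the sound has finished on this device.
size_t fillCallback(DeviceFeed& feed, int16_t* out, size_t frames) {
    size_t available = feed.sound->size() - feed.cursor;
    size_t n = frames < available ? frames : available;
    for (size_t i = 0; i < n; ++i)
        out[i] = (*feed.sound)[feed.cursor + i];
    feed.cursor += n;
    return n;
}
```

Two DeviceFeed instances over the same sample vector give you simultaneous playback on two devices, each draining at its own pace.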
Note: you can also play several sounds on one single device. Most devices / systems allow this kind of resource sharing; in fact, mixing together all the sounds produced by the various programs (and thus taking that heavy burden off the CPU) is one purpose sound cards were made for. When you use one (physical) device to play several sounds, the concept is the same as with multiple devices: for each sound you get a logical device handle, only those handles refer to several "channels" of one physical device.
What should you use?
OpenAL seems a little like heavy artillery for this simple task, I would say (since you don't need that much portability, and probably don't plan to implement your own codecs and effects ;))
I would recommend you to use Qt here. It is highly portable (Win/Mac/Linux) and it has a very handy class that will do the job for you: http://qt-project.org/doc/qt-5.0/qtmultimedia/qaudiooutput.html
Check the example in the documentation to see how to play a WAV file with a couple of lines of code. To play several WAV files simultaneously, you simply open several QAudioOutput instances (basically, put the code from the example in a function and call it as often as you want). Note that you have to close / stop the QAudioOutput in order for the sound to stop playing.
Answer to 2: What you want to do is called a loopback. Only a very limited number of sound cards, i.e., audio devices, provide a so-called loopback input device, which permits recording what is currently output by the main output mix of the sound card, for example. However, even with such a device, it will not permit you to loop anything back into the microphone input device. The microphone input device only takes data from the microphone's A/D converter. This is deep in the hardware; you cannot mix anything in at that level.
This said, it will be very, very hard (IMHO practically impossible) to have Skype send your sound to your conversation partner with a standard setup. The only thing I can think of would be an audio device with loopback capabilities (or simply a physical cable connecting a monitor line-out to a recording line-in), with Skype then set up to use this looped-back device as its input. However, Skype will no longer pick up your microphone, hence you won't have a conversation ;)
Note: when saying "simultaneous" playback here, we mean synchronizing the playback of two sounds as far as real-time perception is concerned (within the range of 10-20 ms). We are not looking at actual synchronization at the sample level, with the related clock-jitter and phase-shifting issues that come into play when sending sound to two physical devices with two independent (free-running) clocks. Thus, when an application demands in-phase signal generation on independent devices, clock-recovery mechanisms are necessary, which may be provided by the drivers or the OS.
Note: Virtual audio device software such as Virtual Audio Cable will provide virtual devices to achieve loopback functionality on Windows. Frameworks such as Jack Audio may achieve the same in a UNIX environment.
There is a very easy way to output audio on two devices at the same time:
For Realtek devices you can use the Audio-mixer "trick" (but this will give you a delay / echo);
For everything else (and without echo) you can use Voicemeeter (which is totally free).
I have explained BOTH solutions in this video: https://youtu.be/lpvae_2WOSQ
Best Regards

Determining how Speex encoded audio differs from expected settings

I'm trying to integrate an application with another application that encodes audio using speex. However, when I decode audio sent from the first application to the second, I'm getting noise (not static, more like bleep-bloopy twangs).
I need to know where to look for the problem.
The first application can talk to other instances of itself. The second application can talk to other instances of itself. They just can't talk to each other.
The Speex settings are apparently mismatched, but I can't figure out which ones. I've compared the source line by line and it appears that they do the same setup. They both use narrow band mode. They both use the same parameters for enhancer (1), variable bit rate (0), quality (3), complexity (1), and sample rate (8000). The observed length of encoded frames matches, too.
In case it's any help, here's some sample audio data, covering 6 frames from the beginning of a call (hopefully the parameters I mentioned are enough to decode it):
1dde5c800039ce70001ce7207b60000a39242d95
e8bda0cf21b6ec4629ad0f3b04290474110e70fb
1bdd3a9dfc211845e0ed90dabde11451e191186c
0ba5de5bea933ed1d3675f786947444781407e17
1bd5549fefa91b63d4968b299bf603d7e533b98c
6351b7953f4470d63bbb2b8c49be650ee89488b5
// at this point I get:
// notification: "More than two wideband layers found. The stream is corrupted."
I'm at a bit of a loss. I don't know what to check next.
What are other reasons that audio data transferred from one computer to another, encoded with Speex, might end up being misinterpreted? I'm especially interested in the stupid reasons.
Self-answer: Check the entire data path from end to end, with logging at each point.
The issue we were having is that the audio was being encrypted with AES in CTR mode, but the apps were using different endianness for the counter. The first 32 bytes of audio made it through, making it seem like an encoding issue by producing some non-noise output, but the rest of the data was garbled.
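The failure mode above can be illustrated without AES itself: in CTR mode the 16-byte counter block is incremented once per block of keystream, and if one side increments it as a big-endian integer while the other treats it as little-endian, the counters agree only until the first increment, so only the opening bytes decrypt correctly. Only the counter handling is shown; function names are illustrative.

```cpp
#include <array>
#include <cstdint>

using CounterBlock = std::array<uint8_t, 16>;

// Increment the counter as a big-endian integer (carry toward byte 0).
void incrementBigEndian(CounterBlock& c) {
    for (int i = 15; i >= 0; --i)
        if (++c[i] != 0) break;
}

// Increment the counter as a little-endian integer (carry toward byte 15).
void incrementLittleEndian(CounterBlock& c) {
    for (int i = 0; i < 16; ++i)
        if (++c[i] != 0) break;
}
```

Starting from the same initial counter, the two conventions produce identical bytes for the first block and diverge from the first increment onward, matching the "first bytes decode, rest is garbage" symptom.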

AudioQueueOutputCallback not called at first

My question may be similar to this: Why might my AudioQueueOutputCallback not be called?
It seems that person was able to fix by running audio stuff on main thread. I cannot do that.
I enqueue buffers to prime audio Q, then start audio Q. Shouldn't those buffers complete immediately once I start my queue?
I am setting the data size correctly.
As a hack I just re-use buffers without waiting for them to be reported as done by the callback. If I do this, it runs like that for a couple of seconds, and then the buffer callback starts working from then on.
It's definitely not a good idea to hack your way around Core Audio. While it may be a quick fix, it will definitely hurt you in ambiguous ways in the long run.
Your problem isn't the same as the link you posted: their problem was assigning the callback on the wrong thread. In your case, your callback is on the right thread; it's just that the audio buffers you are feeding it initially are either empty, too small, or contain data not fit for audio playback.
Keep in mind that the purpose of the callback is to fire after each audio buffer supplied to the audio queue has been played (i.e., consumed). The fact that the callback isn't being fired after you start the queue means that there is nothing in the audio buffers for it to consume, or too little meaningful information.
When you do it manually, you see a lag because the audio queue is trying to process the empty/erroneous buffers you supplied; then you resupply the same buffers with valid data, which the queue eventually plays, and then it fires the callback.
Solution: compare the data you put in the buffers before starting the queue with the data you are supplying manually. I'm sure there is a difference. If that doesn't work, please show your code for further analysis.
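A small pre-enqueue sanity check in the spirit of the advice above. For an AudioQueue buffer, mAudioDataByteSize must be set to the number of valid bytes before AudioQueueEnqueueBuffer; a zero size or an all-zero payload in a priming buffer matches the "empty / too little meaningful information" failure mode described. This helper is illustrative, not part of the Core Audio API.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if a priming buffer looks playable: a valid byte count
// within the buffer's capacity, and at least one non-zero sample byte.
// (All-zero PCM is technically valid silence, but suspicious for priming.)
bool primingBufferLooksPlayable(const uint8_t* data, size_t byteSize, size_t capacity) {
    if (byteSize == 0 || byteSize > capacity) return false;
    for (size_t i = 0; i < byteSize; ++i)
        if (data[i] != 0) return true;
    return false;
}
```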
