How can I capture microphone data and route it to a virtual microphone device? (macOS)

Recently, I wanted to get my hands dirty with Core Audio, so I started working on a simple desktop app that applies effects (e.g. echo) to the microphone data in real time, so that the processed data can then be used in communication apps (e.g. Skype, Zoom, etc.).
To do that, I figured I have to create a virtual microphone in order to send the processed (effects applied) data to communication apps. For example, the user would select this new virtual microphone device as the Input Device in a Zoom call so that the other users in the call can hear her processed voice.
My main concern is that I need to find a way to "route" the voice data captured from the physical microphone (e.g. the built-in mic) to the virtual microphone. I've spent some time reading the book "Learning Core Audio" by Adamson and Avila, and in Chapter 8 the authors explain how to write an app that a) uses an AUHAL unit to capture data from the system's default input device and b) then sends that data to the system's default output using an AUGraph. So, following this example, I figured that I also need to create an app that captures the microphone data only while it's running.
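For reference, the AUHAL capture setup described in that chapter looks roughly like the sketch below. This is my own condensed Swift version, not the book's code; error handling is omitted.

```swift
import AudioToolbox

// Condensed sketch of a Chapter 8 style AUHAL capture setup (error checks omitted).
var desc = AudioComponentDescription(
    componentType: kAudioUnitType_Output,
    componentSubType: kAudioUnitSubType_HALOutput,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)

guard let component = AudioComponentFindNext(nil, &desc) else { fatalError("No AUHAL found") }
var auhal: AudioUnit?
AudioComponentInstanceNew(component, &auhal)

// Enable input on bus 1 and disable output on bus 0, so the unit only captures.
var enable: UInt32 = 1
var disable: UInt32 = 0
AudioUnitSetProperty(auhal!, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Input, 1, &enable, UInt32(MemoryLayout<UInt32>.size))
AudioUnitSetProperty(auhal!, kAudioOutputUnitProperty_EnableIO,
                     kAudioUnitScope_Output, 0, &disable, UInt32(MemoryLayout<UInt32>.size))

// Next steps (not shown): set kAudioOutputUnitProperty_CurrentDevice to the capture device,
// install an input callback via kAudioOutputUnitProperty_SetInputCallback,
// then AudioUnitInitialize and AudioOutputUnitStart.
```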
So, what I've done so far:
I've created the virtual microphone, for which I followed the NullAudio driver example from Apple.
I've created the app that captures the microphone data.
For both of the above "modules" I'm certain that they work as expected independently, since I've tested them in various ways. The only missing piece now is how to "connect" the physical mic to the virtual mic: I need to connect the output of the physical microphone to the input of the virtual microphone.
So, my questions are:
Is this something trivial that can be achieved using the AUGraph approach, as described in the book? Should I just find the correct way to configure the graph in order to achieve this connection between the two devices?
The only related thread I found is this, where the author states that the routing is done by
sending this audio data to the driver via a socket connection, so other apps that request audio from our virtual mic in fact get this audio from a user-space application that listens to the mic at the same time (so it should be active)
but I'm not quite sure how to even start implementing something like that.
The whole process I went through to capture data from the microphone seems quite long, and I was wondering whether there's a more optimal way to do it. The book seems to be from 2012, with some corrections made in 2014. Has Core Audio changed dramatically since then, so that this process can now be achieved more easily with just a few lines of code?

I think you'll get more results by searching for the term "play through" instead of "routing".
The Adamson/Avila book has an ideal play-through example that, unfortunately for you, only works when both input and output are handled by the same device (e.g. the built-in hardware on most Mac laptops and iPhone/iPad devices).
Note that there is another audio device concept called "playthru" (see kAudioDevicePropertyPlayThru and related properties) which seems to be a form of routing internal to a single device. I wish it were a property that let you set a forwarding device, but alas, no.
Some informal doco on this: https://lists.apple.com/archives/coreaudio-api/2005/Aug/msg00250.html
I've never tried it, but you should be able to connect input to output on an AUGraph like this. However, AUGraph is deprecated in favour of AVAudioEngine, which, last time I checked, did not handle non-default input/output devices well.
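For what it's worth, a minimal AVAudioEngine play-through looks roughly like this. It is my own sketch and it uses the default devices only, which is exactly the limitation mentioned above; routing to a specific (e.g. virtual) output device would need extra Core Audio work.

```swift
import AVFoundation

// Minimal play-through sketch: microphone -> main mixer -> default output.
// The app needs microphone permission; selecting a non-default output device
// (e.g. a virtual mic) is not handled here.
let engine = AVAudioEngine()
let input = engine.inputNode
let format = input.inputFormat(forBus: 0)

// Touching mainMixerNode implicitly connects it to the engine's output node.
engine.connect(input, to: engine.mainMixerNode, format: format)

do {
    try engine.start()
} catch {
    print("Could not start engine: \(error)")
}
```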
I instead manually copy buffers from the input device to the output device via a ring buffer (TPCircularBuffer works well). The devil is in the detail, and much of the work is deciding on what properties you want and their consequences. Some common and conflicting example properties:
minimal lag
minimal dropouts
no time distortion
In my case, if the output is lagging too far behind the input, I brutally dump everything bar 1 or 2 buffers (a simplified sketch of that policy follows below). There is some dated Apple sample code called CAPlayThrough which elegantly speeds up the output stream. You should definitely check it out.
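To make that catch-up policy concrete, here is a deliberately simplified sketch. It is a lock-based toy, not real-time safe; a real implementation would use a lock-free ring buffer such as TPCircularBuffer, and the `maxQueuedBuffers` threshold is just an assumed value you would tune.

```swift
import Foundation

// Simplified (NOT real-time-safe) illustration of the play-through buffering policy:
// the input side appends buffers, the output side pops them, and if the output lags
// too far behind we drop everything except the newest couple of buffers.
final class PlayThroughQueue {
    private var buffers: [[Float]] = []
    private let lock = NSLock()
    private let maxQueuedBuffers = 8   // assumed threshold; tune for your buffer size

    // Called from the input side with one captured buffer of samples.
    func push(_ samples: [Float]) {
        lock.lock(); defer { lock.unlock() }
        buffers.append(samples)
        if buffers.count > maxQueuedBuffers {
            // Output is lagging: keep only the 2 most recent buffers.
            buffers.removeFirst(buffers.count - 2)
        }
    }

    // Called from the output side; returns silence if nothing is queued (a dropout).
    func pop(frameCount: Int) -> [Float] {
        lock.lock(); defer { lock.unlock() }
        return buffers.isEmpty ? [Float](repeating: 0, count: frameCount)
                               : buffers.removeFirst()
    }
}
```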
And if you find a simpler way, please tell me!
Update
I found a simpler way:
create an AVCaptureSession that captures from your mic
add an AVCaptureAudioPreviewOutput that references your virtual device
When routing from microphone to headphones, it sounded like it had a few hundred milliseconds' lag, but if AVCaptureAudioPreviewOutput and your virtual device handle timestamps properly, that lag may not matter.
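A sketch of those two steps on macOS is below; the device UID string is a placeholder for your virtual device's UID, and microphone permission handling is omitted.

```swift
import AVFoundation

let session = AVCaptureSession()

// 1. Capture from the default microphone.
guard let mic = AVCaptureDevice.default(for: .audio),
      let micInput = try? AVCaptureDeviceInput(device: mic),
      session.canAddInput(micInput) else { fatalError("No usable microphone") }
session.addInput(micInput)

// 2. "Preview" (i.e. play) the captured audio on the virtual device.
let preview = AVCaptureAudioPreviewOutput()
preview.outputDeviceUniqueID = "YOUR-VIRTUAL-DEVICE-UID"   // placeholder UID
preview.volume = 1.0
if session.canAddOutput(preview) {
    session.addOutput(preview)
}

session.startRunning()
```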

Related

HAL plugin buffer size kAudioDevicePropertyBufferFrameSize

I'm working on a HAL virtual audio device.
I'm having problems getting the correct buffer size from the virtual audio device to my application...
How would I implement the properties kAudioDevicePropertyBufferFrameSize or kAudioDevicePropertyBufferFrameSizeRange in my virtual HAL device?
How would I implement them in Apple's NullAudio example found here: https://developer.apple.com/documentation/coreaudio/creating_an_audio_server_driver_plug-in
I tried to add them to my device the same way kAudioDevicePropertyNominalSampleRate is added in the NullAudio.c example, but with no success...
You have to set kAudioDevicePropertyBufferFrameSize in your client application (using AudioObjectSetPropertyData).
You can't control the kAudioDevicePropertyBufferFrameSize property from an AudioServerPlugin. It's only used by client processes to set the size of the IO buffers their IO procs receive.
When several clients use your device at the same time, CoreAudio lets them all use different IO buffer sizes (which might not be multiples/factors of each other), so your plug-in has to handle buffers of various sizes.
Source: https://lists.apple.com/archives/coreaudio-api/2013/Mar/msg00152.html
I'm not completely sure, but as far as I can tell, you can't control kAudioDevicePropertyBufferFrameSizeRange from an AudioServerPlugin either.
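For illustration, setting the property from the client side could look roughly like this sketch; `deviceID` is assumed to have been obtained elsewhere (e.g. from kAudioHardwarePropertyDefaultInputDevice).

```swift
import CoreAudio

// Sketch: the *client* application requests an IO buffer size on a device.
// Use kAudioObjectPropertyElementMaster instead of ...ElementMain on older SDKs.
func setBufferFrameSize(_ deviceID: AudioObjectID, frames: UInt32) -> OSStatus {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioDevicePropertyBufferFrameSize,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain)
    var value = frames
    return AudioObjectSetPropertyData(deviceID, &address, 0, nil,
                                      UInt32(MemoryLayout<UInt32>.size), &value)
}
```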

Sending Bluetooth Advertising Packets and Getting Some Answers

I want to build something with a Raspberry Pi Zero and write it in Go.
I've never tried Bluetooth before, and my goal is:
sending a dynamic packet that will change every second; an iOS app will expand this message, and with a button the client will send a message back without a connection.
Is Bluetooth advertising what I am looking for, and do you know of any Go library for it? Where should I start?
There are quite a lot of parts to your question. If you want to be connection-less, then the BLE roles are Broadcaster (beacon) and Observer (scanner). There are a number of "standard" beacon formats out there. They are summarized nicely on this cheat sheet.
Of course you can create your own format, as these all use either the Service Data or Manufacturer Data fields in a BLE advertisement.
On Linux (Raspberry Pi) the official Bluetooth stack is BlueZ, which documents the available APIs at: https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc
If you want to be connection-less, then each device is going to have to change its role regularly. This requires a bit of careful thought about how long each one listens and broadcasts, as you don't want them both talking at the same time or both listening at the same time.
You might find the following article of interest to get you started with BLE and Go:
https://towardsdatascience.com/spelunking-bluetooth-le-with-go-c2cff65a7aca

HAL virtual device: how to "proxy" microphone

I'm trying to create a "virtual microphone" that should work "in front of" the default input device/microphone. So when the user selects the "virtual microphone" as input (in Audacity, for example) and starts to record sound, Audacity will receive samples from the virtual device driver, which the driver in turn takes from the real/default microphone. So the "virtual microphone" is a kind of proxy device for the real (default/built-in/etc.) one. This is needed for later on-the-fly processing of the microphone input.
So far I've created a virtual HAL device (based on the NullAudio.c driver example from Apple), and I can generate procedural sound for Audacity, but I still can't figure out a way to read data from the real microphone (using its deviceID) from inside the driver.
Is it OK to use normal recording as in a usual app (via AudioUnits/AURemoteIO/AUHAL, etc.)? Or should something like IOServices be used?
Documentation states that
An AudioServerPlugIn operates in a limited environment. First and
foremost, an AudioServerPlugIn may not make any calls to the client
HAL API in the CoreAudio.framework. This will result in undefined
(but generally bad) behavior.
but it is not clear which APIs count as "client" APIs and which do not, with regard to reading microphone data.
What kind of API can/should be used from virtual device driver for accessing real microphone data in realtime?

Windows API to subscribe for VoIP activities like "Sound -> Communications" does?

Situation: in the Windows Control Panel, you can open the "Sound" applet and switch to the "Communications" tab. There, you can configure by what percentage the OS should reduce all other sounds when an incoming VoIP call is ringing (so you don't miss the call, indeed).
Question: is there any API that allows a developer to subscribe to and react to such events as well? (Say, auto-pause your game app, or set an automatic "do not disturb" status for the duration of the call in your messenger app, or any other smart thing you can do for a better user experience.)
Note: I'm looking for OS-wide API, not "SDK for VoIP app X only".
It turns out that the Microsoft term for this is Custom Ducking Behavior. The seemingly-odd name is explained by the Wikipedia page on ducking:
Ducking is an audio effect commonly used in radio and pop music,
especially dance music. In ducking, the level of one audio signal is
reduced by the presence of another signal. In radio this can typically
be achieved by lowering (ducking) the volume of a secondary audio
track when the primary track starts, and lifting the volume again when
the primary track is finished. A typical use of this effect in a daily
radio production routine is for creating a voice-over: a foreign
language original sound is dubbed (and ducked) by a professional
speaker reading the translation. Ducking becomes active as soon as the
translation starts.
From the MSDN, the APIs you need to implement custom ducking behavior are COM-based. In summary:
MMDevice API for multimedia device enumeration and selection.
WASAPI for accessing the communications capture and render device, stream management operations, and handling ducking events.
WAVE APIs for accessing the communications device and capturing audio input.
Code samples to implement the functionality you want are available at the respective MSDN pages.

Is it possible to capture the rendering audio session from another process?

I am taking my first dive into the WASAPI system of Windows, and I do not know if what I want is even possible with the Windows API.
I am attempting to write a program that will record the sound from various programs and break each into a separate recorded track/audio file. From the research I have done, I know the unit I need to record is the various audio sessions being rendered to an endpoint, and that the normal way of recording is by taking the render endpoint and performing a loopback. However, from what I have read so far in the MSDN, the only interaction with sessions I can have is through IAudioSessionControl, and that does not provide me with a way to get a copy of the stream for the session.
Am I missing something that would allow me to do this with WASAPI (or some other Windows API) and get the individual sessions (or individual streams) before they are mixed together to form the endpoint, or is this an impossible goal?
The mixing takes place inside the API (WASAPI), and you don't have access to the buffers of other audio clients, especially since they don't exist in the context of the current process in the first place. Perhaps your best option (not a good one, but there are no better alternatives) would be to hook the API calls and intercept the data on its way to WASAPI, if the task in question permits dirty tricks like this.
