How would it be possible to capture the audio programmatically? I am implementing an application that streams the desktop over the network in real time. The video part is done; now I need to implement the audio part. I need a way to get PCM data from the sound card to feed to my encoder (implemented using the Windows Media Format SDK).
I think the answer involves the mixerOpen() and waveInOpen() functions in the Win32 API, but I am not sure exactly what I should do.
How do I open the necessary channel, and how do I read PCM data from it?
Thanks in advance.
The new Windows Vista Core Audio APIs support this explicitly (it's called loopback recording), so if you can live with a Vista-only application, this is the way to go.
See the Loopback Recording article on MSDN for instructions on how to do this.
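For a sense of what's involved, here is a minimal sketch of WASAPI loopback capture: the default *render* endpoint is opened for capture with the AUDCLNT_STREAMFLAGS_LOOPBACK flag. All error handling is omitted; the MSDN article covers the details.

```cpp
// Minimal WASAPI loopback capture sketch (Vista+). A real implementation
// should check every HRESULT and release the COM interfaces.
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>

void CaptureLoopback()
{
    CoInitialize(nullptr);

    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    // Ask for the default *render* device, then open it in loopback mode
    // to capture whatever is being played on it.
    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    IAudioClient* client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

    WAVEFORMATEX* format = nullptr;
    client->GetMixFormat(&format);  // shared-mode mix format (usually float PCM)

    client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK,
                       10000000 /* 1 s buffer, in 100 ns units */, 0, format, nullptr);

    IAudioCaptureClient* capture = nullptr;
    client->GetService(__uuidof(IAudioCaptureClient), (void**)&capture);
    client->Start();

    for (;;)  // pull packets; feed them to your encoder instead of discarding
    {
        UINT32 packetFrames = 0;
        capture->GetNextPacketSize(&packetFrames);
        while (packetFrames != 0)
        {
            BYTE* data  = nullptr;
            DWORD flags = 0;
            capture->GetBuffer(&data, &packetFrames, &flags, nullptr, nullptr);
            // ... hand 'data' (packetFrames frames in the mix format) to the encoder ...
            capture->ReleaseBuffer(packetFrames);
            capture->GetNextPacketSize(&packetFrames);
        }
        Sleep(10);
    }
}
```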
I don't think there is a direct way to do this using the OS - it's a feature that may (or may not) be present on the sound card. Some sound cards have a loopback interface - Creative calls it "What U Hear". You simply select this as the input rather than the microphone, and record from it using the normal waveInOpen() that you already know about.
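In that case the capture itself is ordinary waveform-audio code. A minimal sketch, assuming the loopback input has been made the default recording device (otherwise this just records the microphone; error handling omitted):

```cpp
// Minimal waveInOpen() capture sketch recording from the default capture
// device. The loopback input ("What U Hear" / "Stereo Mix") must be
// selected as that default for this to capture the playback mix.
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

int main()
{
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 2;
    fmt.nSamplesPerSec  = 44100;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    HWAVEIN wi  = nullptr;
    waveInOpen(&wi, WAVE_MAPPER, &fmt, (DWORD_PTR)done, 0, CALLBACK_EVENT);

    static char buffer[44100 * 4];           // one second of stereo 16-bit PCM
    WAVEHDR hdr = {};
    hdr.lpData         = buffer;
    hdr.dwBufferLength = sizeof(buffer);
    waveInPrepareHeader(wi, &hdr, sizeof(hdr));
    waveInAddBuffer(wi, &hdr, sizeof(hdr));
    waveInStart(wi);

    while (!(hdr.dwFlags & WHDR_DONE))       // wait until the buffer is filled
        WaitForSingleObject(done, INFINITE);
    // hdr.lpData now holds hdr.dwBytesRecorded bytes of PCM - feed your encoder.

    waveInUnprepareHeader(wi, &hdr, sizeof(hdr));
    waveInClose(wi);
    return 0;
}
```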
If the sound card doesn't have this feature, then I think you're out of luck short of something drastic like writing your own driver. Or you could convince your users to run a cable from the speaker output to the line input :)
I have been working with Core Audio for the last couple of days and I'm able to access all the audio devices, read their properties, and receive notifications when something changes. However, I'm now struggling to "intercept" the audio (ideally before it is played) from any application on the Mac. I wasn't sure it was possible, but this app actually does it.
So I'm in search of some guidance. Thank you in advance.
Unfortunately macOS provides no facility to do this.
The only way to achieve this is to patch lower-level frameworks and/or drivers, like Audio Hijack does. It will be a messy process.
Depending on the use case, however, it might suffice to have a loopback virtual audio device, so that application A can send audio into it while application B receives the audio from it. This can be done by creating a Core Audio AudioServerPlugin, which is a user-space audio driver mechanism. For examples, please have a look at:
https://github.com/ExistentialAudio/BlackHole
https://github.com/kyleneideck/BackgroundMusic
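If you go this route, the receiving application records from the virtual device like any physical input. As a small illustration, an input AudioQueue can be pointed at the device by its UID before it is started. The UID string below is an assumption; in real code, query the installed driver for its actual UID (e.g. via kAudioDevicePropertyDeviceUID).

```cpp
// Sketch: bind an input AudioQueue to a virtual loopback device by UID, so
// this process records whatever another application plays into the device.
#include <AudioToolbox/AudioToolbox.h>

void BindQueueToVirtualDevice(AudioQueueRef queue)  // queue must not be running yet
{
    CFStringRef uid = CFSTR("BlackHole2ch_UID");    // hypothetical UID - query the real one
    AudioQueueSetProperty(queue, kAudioQueueProperty_CurrentDevice,
                          &uid, sizeof(uid));
}
```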
Regarding the limitations, you can read more about them here:
https://stackoverflow.com/a/18595698/2576876
From there you can go in a lot of directions by following the links provided.
What I really want to achieve is this:
Suppose I play an audio file (using my application) which can either be streamed from the internet or accessed directly from local storage.
Now I want to configure SAPI to listen to this source instead of the microphone and convert the speech in the audio to text, as it normally does.
I do not think SAPI supports this by itself.
There are some approaches you could use that are "external" to SAPI:
1. Get a male-to-male miniplug cable and plug your sound card's output into its line input.
2. Use Virtual Audio Cable, which basically achieves #1 with a virtual sound card instead of hardware. It can be very tricky at first to understand how Virtual Audio Cable works and how to use it, but it does work very well once you figure it out.
3. Some sound cards have a built-in loopback feature, which allows you to record what the sound card is playing instead of recording from e.g. a microphone. Here are some good info links: What U Hear and Stereo Mix. Also try Googling those terms for more info.
Only WAV input seems to be supported out of the box - see here.
Quoting:
The wav file input scenario is special because it uses controlled, reproducible audio input and requires a dedicated SR engine, without interference from other applications (e.g., a shared desktop microphone). The file input scenario should use a generic SAPI audio stream connected to the input wav file and an InProc SR engine.
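A rough sketch of that scenario in C++: an InProc recognizer whose input is a SAPI stream bound to the wav file. The file path is just an example and all error handling is omitted.

```cpp
// InProc SR engine fed from a wav file instead of the microphone.
#include <atlbase.h>
#include <sapi.h>
#include <sphelper.h>   // SPBindToFile, CSpEvent, CSpDynamicString
#include <cstdio>

void RecognizeFromWav()
{
    CoInitialize(nullptr);

    CComPtr<ISpRecognizer> recognizer;
    recognizer.CoCreateInstance(CLSID_SpInprocRecognizer);

    // Bind a SAPI audio stream to the wav file and use it as the engine input.
    CComPtr<ISpStream> stream;
    SPBindToFile(L"C:\\audio\\input.wav", SPFM_OPEN_READONLY, &stream);
    recognizer->SetInput(stream, TRUE);

    CComPtr<ISpRecoContext> context;
    recognizer->CreateRecoContext(&context);
    context->SetNotifyWin32Event();
    context->SetInterest(SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_END_SR_STREAM),
                         SPFEI(SPEI_RECOGNITION) | SPFEI(SPEI_END_SR_STREAM));

    CComPtr<ISpRecoGrammar> grammar;
    context->CreateGrammar(0, &grammar);
    grammar->LoadDictation(nullptr, SPLO_STATIC);
    grammar->SetDictationState(SPRS_ACTIVE);

    // Pull recognition events until the file stream ends.
    while (context->WaitForNotifyEvent(INFINITE) == S_OK)
    {
        CSpEvent evt;
        while (evt.GetFrom(context) == S_OK)
        {
            if (evt.eEventId == SPEI_RECOGNITION)
            {
                CSpDynamicString text;
                evt.RecoResult()->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                          TRUE, &text, nullptr);
                wprintf(L"%s\n", (LPCWSTR)text);
            }
            else if (evt.eEventId == SPEI_END_SR_STREAM)
                return;
        }
    }
}
```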
In Windows XP, you can configure your sound card's properties through the bundled Windows mixer. In the recording properties, if "Stereo Mix" or "Wave Out" (or something similar) is selected as the recording device, programs that can record audio ("Sound Recorder" in Windows, for example) will record a decent-quality wave file of the audio stream. I usually use GoldWave from download.com as an example of a third-party application that does the same.
Well, I've had trouble getting this to work on Windows Vista or later in the direct manner described above. It's not just Vista+, either - some sound cards don't have that option at all.
I was just wondering if there is a way to run a Windows-friendly program (VB?) that takes your audio output stream and converts it (in real time, obviously) to a WAV file at a standard sampling rate.
Ideally, it would be cool if it worked on any operating system. So, is it possible to write a web service that "listens" to your audio card like that, without making the computer think it's under a virus attack or something?
Possibly related question:
How to save web audio streaming to file ( c++ / java )
I'm only aware of one sound card manufacturer that enabled that option (Creative). However, Vista and beyond support a "loopback" mode which gives you effectively the same functionality. You need to use the low-level WASAPI stack, but it should work just fine.
https://github.com/rdp/virtual-audio-output-sniffer provides a DirectShow input device that captures the summed wave output on Vista+.
You could use low-level waveOut API injection and capture what it receives.
I have SkypeMXrecorder, a piece of software that does just that: it injects into any exe and "sniffs" what goes from it to the sound hardware. But it seems rather complicated to implement...
Does anyone know how to programmatically capture the sound that is being played (that is, everything coming from the sound card, not from input devices such as a microphone)?
Assuming that you are talking about Windows, there are essentially three ways to do this.
The first is to open the audio device's main output as a recording source. This is only possible when the driver supports it, although most do these days. Common names for the virtual device are "What You Hear" or "Wave Out". You will need to use a suitable API (see WaveIn or DirectSound in MSDN) to do the capturing.
The second way is to write a filter driver that can intercept the audio stream before it reaches the physical device. Again, this technique will only work for devices that have a suitable driver topology and it's certainly not for the faint-hearted.
This means that neither of the first two options is guaranteed to work on a PC with arbitrary hardware.
The last alternative is to use a virtual audio device, such as Virtual Audio Cable. If this device is set as the default playback device in Windows, then all well-behaved apps will play through it. You can then record from the same device to capture the summed output. As long as you control which device the application you want to record uses, this option will always work.
All of these techniques have their pros and cons - it's up to you to decide which would be the most suitable for your needs.
You can use the Waveform Audio interface; there is an MSDN article on how to access it via P/Invoke.
In order to capture the sound that is being played, you just need to open the playback device instead of the microphone. Open for input, of course, not for output ;-)
If you were using OS X, Audio Hijack Pro from Rogue Amoeba would probably be the easiest way to go.
Anyway, why not just loop your audio output back into your line in and record that? This is a very simple solution: just plug a cable into your audio output jack and your line-in jack and start recording.
You have to enable the "Stereo Mix" device. If you do this, DirectSound can find that device.
I'd like to pull a stream of PCM samples from a Mac's line-in or built-in mic and do a little live analysis (the exact nature doesn't pertain to this question, but it could be an FFT every so often, or some basic statistics on the sample levels, or what have you).
What's a good fit for this? Writing an AudioUnit that just passes the sound through and incidentally hands it off somewhere for analysis? Writing a JACK-aware app and figuring out how to get it to play with the JACK server? Ecasound?
This is a cheesy proof-of-concept hobby project, so simplicity of API is the driving factor (followed by reasonable choice of programming language).
The principal framework for audio development in Mac OS X is Core Audio; it's the basis for all audio I/O. There are layers on top of it like Audio Toolbox, Audio Queue Services, QuickTime, and QTKit that you can use if you want a simplified API for common tasks.
To just pull a stream of samples, you'd probably want to use Audio Queue Services; the AudioQueueNewInput function will set up recording of PCM data and pass it to a callback you supply.
On your Mac there's a set of Core Audio examples in /Developer/Examples/CoreAudio/SimpleSDK that includes a use (AQRecord in AudioQueueTools) of the Audio Queue Services recording APIs.
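For a sense of scale, a minimal recorder along the lines of AQRecord looks roughly like this (assuming 16-bit mono PCM; error checks omitted):

```cpp
// Minimal Audio Queue Services recorder: AudioQueueNewInput() delivers PCM
// buffers from the default input device to a callback.
#include <AudioToolbox/AudioToolbox.h>

static void HandleInput(void* user, AudioQueueRef queue, AudioQueueBufferRef buffer,
                        const AudioTimeStamp* startTime, UInt32 numPackets,
                        const AudioStreamPacketDescription* packetDescs)
{
    // buffer->mAudioData holds buffer->mAudioDataByteSize bytes of PCM;
    // run your FFT / statistics here, then hand the buffer back to the queue.
    AudioQueueEnqueueBuffer(queue, buffer, 0, nullptr);
}

int main()
{
    AudioStreamBasicDescription fmt = {};
    fmt.mSampleRate       = 44100.0;
    fmt.mFormatID         = kAudioFormatLinearPCM;
    fmt.mFormatFlags      = kLinearPCMFormatFlagIsSignedInteger
                          | kLinearPCMFormatFlagIsPacked;
    fmt.mChannelsPerFrame = 1;
    fmt.mBitsPerChannel   = 16;
    fmt.mBytesPerFrame    = 2;   // 1 channel * 16 bits
    fmt.mFramesPerPacket  = 1;
    fmt.mBytesPerPacket   = 2;

    AudioQueueRef queue = nullptr;
    AudioQueueNewInput(&fmt, HandleInput, nullptr,
                       CFRunLoopGetCurrent(), kCFRunLoopCommonModes, 0, &queue);

    // Give the queue a few buffers to fill (~0.1 s each at this format).
    for (int i = 0; i < 3; ++i) {
        AudioQueueBufferRef buffer;
        AudioQueueAllocateBuffer(queue, 8192, &buffer);
        AudioQueueEnqueueBuffer(queue, buffer, 0, nullptr);
    }

    AudioQueueStart(queue, nullptr);
    CFRunLoopRun();   // callbacks arrive on this run loop
    return 0;
}
```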
I think PortAudio is what you need.
Reading from the mic in a console app takes about ten lines of C (see the patests in the PortAudio distribution).
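Something along these lines, modeled on the examples shipped with the library (error checks omitted):

```cpp
// PortAudio: open the default input device and receive PCM in a callback.
#include <portaudio.h>

static int OnAudio(const void* input, void* output, unsigned long frameCount,
                   const PaStreamCallbackTimeInfo* timeInfo,
                   PaStreamCallbackFlags statusFlags, void* userData)
{
    const float* samples = static_cast<const float*>(input);
    // frameCount mono float samples are available here - analyze or store them.
    (void)samples;
    return paContinue;
}

int main()
{
    Pa_Initialize();

    PaStream* stream = nullptr;
    Pa_OpenDefaultStream(&stream,
                         1,          // one input channel
                         0,          // no output
                         paFloat32,  // sample format
                         44100,      // sample rate
                         256,        // frames per buffer
                         OnAudio, nullptr);
    Pa_StartStream(stream);
    Pa_Sleep(5000);                  // capture for five seconds
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}
```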
Apple provides sample code for reading and writing audio data. Additionally there is a lot of good information in the Audio section of the Apple Developer site.