Best backwards compatible way to programmatically capture sound going to speakers - windows

What would be the best approach for a backwards-compatible (Windows XP through Windows 7) way to capture the sound that is being sent to the speakers on a machine, even if the audio driver doesn't expose the "Stereo Mix" recording device?
For extra points, it would be great if this approach allowed for capturing sound from a single application only (excluding sounds from other apps and Windows itself).
Is writing a loopback audio driver the only solution?

AFAIK there is no reliable way of doing this across all of those versions. Even if a card does have something like Stereo Mix, it's a nightmare under XP and below due to the lack of coherent interfaces. Core Audio under Vista and 7 massively improves matters (WASAPI adds a loopback capture mode), but on XP, if you don't have Stereo Mix then you won't get far ...

Related

Audio for Windows in Assembly

I'm thinking of making a game in 8086 ASM using VGA for graphics, but before I proceed with anything I want to make sure that I can get audio into my project. I doubt PC Speaker will be sufficient.
I'm looking for a way to program music in 8086 for Windows. Is there some kind of standard in modern sound cards that I can access directly, or will I have to use the Windows API? I'm not really sure what to look for at this point, so any suggestions are welcome.
Unlike display adapters, which ultimately converged on (S)VGA, sound cards never reached the same level of compatibility. There were various Sound Blaster models, the Gravis Ultrasound and others. Modern hardware is often incompatible with those, and you cannot program it from DOS as if it were a Sound Blaster without a proper DOS driver or without knowing the supported memory regions, ports, formats and protocols.
I can only suggest writing such a program for a PC/DOS emulator like DOSBox, which emulates both Sound Blaster and (S)VGA. That should work.
Alternatively, you can just write a normal Windows program, using Win32 APIs for input, drawing and sound.

Can I programmatically save the data stream sent to the sound card as a WAV file?

In Windows XP, you can configure your sound card properties via the preloaded windows software. In the recording properties, if "stereo mix" or "wave out" (or something similar) is selected as the recording device, programs that can record audio ("Sound Recorder" in windows for example) record a decent quality wave file of the audio stream. I usually use Goldwave from download.com to do this as an example of a third-party application that functions the same.
Well, I've had trouble getting this scenario to happen on Windows Vista or later in a direct no-bullsh*t manner as described above. It's more than just Vista+, it's also that some sound cards don't have that option at all.
I was just wondering if there is a way to run a windows-friendly program (VB?) that takes your audio output stream and converts it (in realtime, obviously) to a WAV file with the default sampling rate as other WAV files have.
Ideally, it would be cool if it worked on any operating system, so is it possible to write a web service that "listens" to your audio card like that without making the computer think it's getting a virus attack or something?
Possibly related question:
How to save web audio streaming to file ( c++ / java )
I'm only aware of one manufacturer of sound cards that enabled that option (Creative). However Vista and beyond support a "loopback" mode which gives you effectively the same functionality. You need to use the low level WASAPI rendering stack but it should work just fine.
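Whichever capture path you end up with (loopback or a "stereo mix" input), the "save as WAV" half is the easy part. Here is a minimal sketch in Python using the standard `wave` module. The function name `write_pcm_to_wav` and the default format (16-bit stereo at 44.1 kHz) are my own assumptions for illustration; the real values have to match whatever format the capture API reports.

```python
import io
import wave


def write_pcm_to_wav(dest, pcm_chunks, channels=2, sample_width=2, sample_rate=44100):
    """Write raw PCM byte chunks to a WAV file (path or seekable file object).

    The defaults (16-bit stereo, 44.1 kHz) are assumed, not read from the
    device. Returns the number of frames written.
    """
    frame_size = channels * sample_width
    frames = 0
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            # Each chunk would normally be one buffer handed over by the capture API.
            wav.writeframes(chunk)
            frames += len(chunk) // frame_size
    return frames


if __name__ == "__main__":
    # Stand-in for captured audio: ten buffers of 441 silent stereo frames.
    silence = [b"\x00\x00" * 2 * 441] * 10
    buffer = io.BytesIO()
    print(write_pcm_to_wav(buffer, silence), "frames written")
```

Passing a file name instead of the `BytesIO` object writes straight to disk.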
https://github.com/rdp/virtual-audio-output-sniffer provides a DirectShow input device that captures the sum of wave out on Vista+.
You could use low level waveOut API injection and capture what it receives.
I have SkypeMXrecorder, a piece of software that does just that: it injects into any exe and 'sniffs' what's going out from it and into the sound hardware. But it seems rather complicated to implement...

How to capture PCM data from Wave Out

How would it be possible to capture the audio programmatically? I am implementing an application that streams the desktop over the network in real time. The video part is finished. I need to implement the audio part. I need a way to get PCM data from the sound card to feed to my encoder (implemented using Windows Media Format).
I think the answer is related to the mixerOpen() and waveInOpen() functions in the Win32 API, but I am not sure exactly what I should do.
How to open the necessary channel and how to read PCM data from it?
Thanks in advance.
The new Windows Vista Core Audio APIs have support for this explicitly (called Loopback Recording), so if you can live with a Vista only application this is the way to go.
See the Loopback Recording article on MSDN for instructions on how to do this.
I don't think there is a direct way to do this using the OS - it's a feature that may (or may not) be present on the sound card. Some sound cards have a loopback interface - Creative calls it "What U Hear". You simply select this as the input rather than the microphone, and record from it using the normal waveInOpen() that you already know about.
If the sound card doesn't have this feature then I think you're out of luck other than by doing something crazy like making your own driver. Or you could convince your users to run a cable from the speaker output to the line input :)
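To check whether a machine has such a loopback input at all, you can enumerate the waveIn devices and look at their names. Below is a rough sketch in Python via `ctypes`, calling the real `waveInGetNumDevs` and `waveInGetDevCapsW` functions from winmm. The helper names and the `LOOPBACK_HINTS` list are my own assumptions: the hints are just the names mentioned in these threads, and drivers (or localized Windows versions) may call the device something else entirely.

```python
import ctypes
import sys

MAXPNAMELEN = 32  # size of szPname in WAVEINCAPSW

# Assumed name fragments, taken from names quoted in this thread.
LOOPBACK_HINTS = ("stereo mix", "what u hear", "what you hear", "wave out")


class WAVEINCAPSW(ctypes.Structure):
    _fields_ = [
        ("wMid", ctypes.c_ushort),
        ("wPid", ctypes.c_ushort),
        ("vDriverVersion", ctypes.c_uint),
        ("szPname", ctypes.c_wchar * MAXPNAMELEN),
        ("dwFormats", ctypes.c_uint),
        ("wChannels", ctypes.c_ushort),
        ("wReserved1", ctypes.c_ushort),
    ]


def looks_like_loopback(name):
    """Heuristic only: does the device name contain a known loopback hint?"""
    lowered = name.lower()
    return any(hint in lowered for hint in LOOPBACK_HINTS)


def list_wavein_devices():
    """Return [(device_id, name)] for all waveIn devices (empty off Windows)."""
    if sys.platform != "win32":
        return []
    winmm = ctypes.windll.winmm
    winmm.waveInGetDevCapsW.argtypes = [
        ctypes.c_size_t, ctypes.POINTER(WAVEINCAPSW), ctypes.c_uint]
    devices = []
    for device_id in range(winmm.waveInGetNumDevs()):
        caps = WAVEINCAPSW()
        result = winmm.waveInGetDevCapsW(
            device_id, ctypes.byref(caps), ctypes.sizeof(caps))
        if result == 0:  # MMSYSERR_NOERROR
            devices.append((device_id, caps.szPname))
    return devices


def find_loopback_devices():
    """Return the waveIn devices whose names look like a loopback input."""
    return [(i, name) for i, name in list_wavein_devices()
            if looks_like_loopback(name)]


if __name__ == "__main__":
    print(find_loopback_devices() or "no loopback-style input found")
```

A device ID found this way is what you would then pass to waveInOpen(). An empty result means either the card has no such input or the driver has it hidden or disabled.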

Sound processing: Should I use DirectSound or directly Win32 APIs?

I'm making an application where I will:
Record from the microphone and do some realtime processing on the input
Play an MP3 file (a regular song), but manipulating the output in realtime
Every now and then I'll need to play additional sounds over this song too, but I guess I can do that by simply adding the buffers.
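For what it's worth, "simply adding the buffers" does work for 16-bit PCM as long as the sum is clamped, since otherwise loud passages wrap around and crackle. A small sketch in Python (the function name `mix_pcm16` and the optional gain parameter are just for illustration):

```python
import array


def mix_pcm16(main, overlay, overlay_gain=1.0):
    """Mix overlay into main, sample by sample, for signed 16-bit PCM.

    Both inputs are sequences of integer samples. The result has the
    length of main; a shorter overlay only affects the leading samples.
    """
    out = array.array("h", main)
    for i in range(min(len(out), len(overlay))):
        mixed = out[i] + int(overlay[i] * overlay_gain)
        # Clamp to the 16-bit range instead of letting the sum wrap around.
        out[i] = max(-32768, min(32767, mixed))
    return out


if __name__ == "__main__":
    song = [1000, 30000, -30000, 5]
    effect = [500, 10000, -10000]
    print(list(mix_pcm16(song, effect)))
```

Hard clamping is the crudest option; lowering the gain of both sources before summing avoids most clipping in the first place.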
In short, I need to have circular buffers for both recording and playing, and I need to be "feeding" the output buffer every 20 ms or so with the new data that is just about to be played.
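To make the buffering requirement concrete, here is a minimal byte-oriented circular buffer sketch in Python, plus the arithmetic for how much data a 20 ms period represents. The class and function names are hypothetical, the 44.1 kHz 16-bit stereo format is an assumption, and the class is not thread-safe, so a real capture or playback callback would need a lock around it.

```python
class RingBuffer:
    """Fixed-capacity FIFO of bytes that wraps around its storage."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._read = 0   # index of the oldest unread byte
        self._size = 0   # number of unread bytes

    def available(self):
        return self._size

    def write(self, data):
        """Store as many bytes as fit; return how many were accepted."""
        n = min(len(data), self._capacity - self._size)
        start = (self._read + self._size) % self._capacity
        first = min(n, self._capacity - start)
        self._buf[start:start + first] = data[:first]
        self._buf[0:n - first] = data[first:n]  # wrapped part, may be empty
        self._size += n
        return n

    def read(self, n):
        """Remove and return up to n bytes, oldest first."""
        n = min(n, self._size)
        first = min(n, self._capacity - self._read)
        out = bytes(self._buf[self._read:self._read + first])
        out += bytes(self._buf[0:n - first])
        self._read = (self._read + n) % self._capacity
        self._size -= n
        return out


def bytes_per_period(sample_rate=44100, channels=2, sample_width=2, period_ms=20):
    """Bytes of PCM needed to cover one period of playback."""
    frames = (sample_rate * period_ms) // 1000
    return frames * channels * sample_width


if __name__ == "__main__":
    period = bytes_per_period()
    ring = RingBuffer(period * 4)  # room for four periods
    ring.write(bytes(period))
    print(period, "bytes per 20 ms,", ring.available(), "bytes queued")
```

With the assumed format, one 20 ms period is 882 frames, so the output side has to be fed 3528 bytes per period.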
I've been looking at DirectSound, and it doesn't seem to help a lot. The reading and writing to the output buffers seem very similar to Win32, the only place where it seems it'd help is in playing the "additional sounds" over the main song.
Should I use DirectSound, or should I go straight to raw Windows APIs?
Is DirectSound going to do anything for me?
Thanks in Advance!
The DirectSound APIs give you better realtime control. They are also the supported way to use sound in Windows. I was under the impression that the Win32 APIs were deprecated, but I could be wrong on this.
This question is close to yours
https://stackoverflow.com/questions/314522/what-is-the-best-c-sound-api-for-windows
also
Is DirectSound the best audio abstraction layer for Windows?
Last but not least, this is what Microsoft has to say: http://msdn.microsoft.com/en-us/library/dd370784(VS.85).aspx
Neither? :)
The story is that DirectSound was the replacement for waveOut, but DirectSound joined DirectInput as a deprecated API in Vista and was replaced by WASAPI. In Vista, DirectSound and waveOut are implemented on top of the user-space WASAPI. On XP, waveOut and DirectSound feed into the same kernel-level mixer.
To consolidate all of these interfaces, take a look at something like OpenAL; it's a well-supported audio standard along the same lines as OpenGL.
It sounds like you're going to be quite sensitive to latency. It might pay to look at ASIO.
I found Harmony Central - Audio Programming. Also read the Wikipedia article on DirectSound:
"Windows Vista features a completely re-written audio stack based on the Universal Audio Architecture. Because of the architectural changes in the redesigned audio stack, a direct path from DirectSound to the audio drivers does not exist."
"Because of Xbox 360 and Microsoft Windows integration, Microsoft is actively pushing developers to migrate new applications to equivalent Xbox audio APIs such as XAudio and XACT."
OpenAL looks promising.

How do I capture the audio that is being played?

Does anyone know how to programmatically capture the sound that is being played (that is, everything that is coming from the sound card, not the input devices such as a microphone)?
Assuming that you are talking about Windows, there are essentially three ways to do this.
The first is to open the audio device's main output as a recording source. This is only possible when the driver supports it, although most do these days. Common names for the virtual device are "What You Hear" or "Wave Out". You will need to use a suitable API (see WaveIn or DirectSound in MSDN) to do the capturing.
The second way is to write a filter driver that can intercept the audio stream before it reaches the physical device. Again, this technique will only work for devices that have a suitable driver topology and it's certainly not for the faint-hearted.
This means that neither of these options will be guaranteed to work on a PC with arbitrary hardware.
The last alternative is to use a virtual audio device, such as Virtual Audio Cable. If this device is set as the default playback device in Windows, then all well-behaved apps will play through it. You can then record from the same device to capture the summed output. As long as you have control over which device the application you want to record uses, this option will always work.
All of these techniques have their pros and cons - it's up to you to decide which would be the most suitable for your needs.
You can use the Waveform Audio Interface; there is an MSDN article on how to access it via P/Invoke.
In order to capture the sound that is being played, you just need to open the playback device instead of the microphone. Open for input, of course, not for output ;-)
If you were using OSX, Audio Hijack Pro from Rogue Amoeba probably is the easiest way to go.
Anyway, why not just loop your audio back into your line in and record that? This is a very simple solution. Just run a cable from your audio output jack to your line-in jack and start recording.
You have to enable the Stereo Mix device. Once you do, DirectSound will find it.

Resources