How Does Windows Handle Non-Realtime USB Audio Sources?

I am currently researching the feasibility of making a device that outputs PCM audio through the USB Audio Class streaming interface. This device has its own clock, importantly does not generate samples at a multiple of 1 Hz as the USB spec can specify, and produces packets in asynchronous mode. How does Windows handle a USB audio stream that consistently delivers samples at a rate above or below what the USB descriptor indicates, and at what level of the OS is this handled?
Second (and depending on the answer to the first question this may already be answered): the entire purpose of this project is to capture this digital audio in its native format and sampling rate. Which Windows application-level APIs would provide the exact PCM input from the USB audio stream, with no interpolation or other alterations or artifacts?

I don't know the Windows specifics here, but Java on Windows would likely be set up to read the data as an AudioInputStream and output it through a SourceDataLine.
As far as timing issues go, PCM processed by the SourceDataLine is configured to a given sample rate and byte structure (configuration details are provided in an AudioFormat object). The code underlying a SourceDataLine employs a buffer and, I believe, some sort of BlockingQueue or a native-code implementation of something similar. I'm not entirely clear on this latter detail.
But the gist is that the SourceDataLine will suspend operation until it is able to fulfill its task. Thus, if the native code's DAC function is not ready, the SourceDataLine thread will suspend and wait until the output stage is ready to accept the next block of data for processing.
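The blocking behaviour described here can be sketched with a bounded buffer. This C++ analogue is purely illustrative (the class and method names are invented for the example); Java's actual SourceDataLine implementation is native and more involved:

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Illustrative analogue of SourceDataLine's blocking write: a bounded
// FIFO of PCM samples. write() blocks when the buffer is full, just as
// SourceDataLine.write() blocks until the output stage drains the line.
class BlockingPcmBuffer {
public:
    explicit BlockingPcmBuffer(size_t capacitySamples)
        : capacity_(capacitySamples) {}

    void write(const std::vector<int16_t>& samples) {
        std::unique_lock<std::mutex> lock(m_);
        for (int16_t s : samples) {
            // Suspend the producer until the "DAC side" makes room.
            notFull_.wait(lock, [&] { return q_.size() < capacity_; });
            q_.push_back(s);
            notEmpty_.notify_one();
        }
    }

    // Called by the output stage when it is ready for the next block.
    std::vector<int16_t> read(size_t n) {
        std::unique_lock<std::mutex> lock(m_);
        std::vector<int16_t> out;
        while (out.size() < n) {
            notEmpty_.wait(lock, [&] { return !q_.empty(); });
            out.push_back(q_.front());
            q_.pop_front();
            notFull_.notify_one();
        }
        return out;
    }

private:
    size_t capacity_;
    std::deque<int16_t> q_;
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
};
```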
There are multiple transmission layers on the incoming data, much of which I don't know enough about. But I presume that if you have a way of assembling the incoming packets into a stream (with whatever buffering is required), then you should be able to receive and play PCM. Surely there are structures in C that would provide the equivalent functions of the Java classes I cited.

Related

What API should be used for real-time audio in OSX?

I am looking to build an application that requires real-time (or as close as possible) control of audio output in OS X.
I need the ability to send samples of audio to the sound card with as much control as possible and no delays, since the timing of when the audio frames are sent will be closely tied to a timer event driven by the clock.
Is the Audio Queue what I am looking for?
Audio Units can be configured for the lowest latency (using short buffers of raw PCM samples) in OS X and iOS. The Audio Queue API is built on top of Audio Units and may add buffering overhead, thus increasing latency.
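To make the buffer/latency trade-off concrete, a rough sketch (the frame counts below are illustrative assumptions, not values fixed by either API):

```cpp
// Rough per-buffer latency contribution: frames per buffer divided by
// the sample rate. Each layer of buffering adds roughly this much.
double buffer_latency_ms(int framesPerBuffer, double sampleRateHz) {
    return 1000.0 * framesPerBuffer / sampleRateHz;
}
```

For example, a short 128-frame Audio Unit buffer at 44.1 kHz contributes about 2.9 ms, while a 4096-frame queue-style buffer contributes about 93 ms.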

(libusb) Confusion about continous isochronous USB streams

I am using a 32-bit AVR microcontroller (AT32UC3A3256) with High-Speed USB support. I want to stream data regularly from my PC to the device (without acknowledgement of the data), so exactly like a USB audio interface, except the data I want to send isn't audio. Such an interface is described here: http://www.edn.com/design/consumer/4376143/Fundamentals-of-USB-Audio.
I am a bit confused about USB isochronous transfers. I understand how a single transfer works, but how and when is the next subsequent transfer planned? I want a continuous stream of data that is calculated a little ahead of time, but streamed with minimum latency and without interruptions (except some occasional data loss). From my understanding, Windows is not a real-time OS, so I think the transfers should not be scheduled with a timer every x milliseconds, but rather using interrupts/events. Or maybe a buffer needs to be filled continuously with as much data as there is available?
I think my question is still about the concepts of USB and not code-related, but if anyone wants to see my code, I am testing and modifying the "USB Vendor Class" example in the ASF framework of Atmel Studio, which contains the firmware source for the AVR and the source for the Windows EXE as well. The Windows example program uses libusb with a supplied driver.
Stephen -
You say "exactly like USB Audio"; but beware! The USB Audio class is very, very complicated because it implements a closed-loop servo system to establish long-term synchronisation between the PC and the audio device. You probably don't need all of that in your application.
To explain a bit more about long-term synchronisation: The audio codec at one end (e.g. the USB headphones) may run at a nominal 48 kHz sampling rate, and the audio file at the other end (e.g. the PC) may be designed to offer 48 thousand samples per second, but the PC and the headphones are never going to run at exactly the same speed. Sooner or later there is going to be a buffer overrun or under-run. So the USB audio class implements a control pipe as well as the audio pipe(s). The control pipe is used to negotiate a slight speed-up or slow-down at one end, usually the Device end (e.g. headphones), to avoid data loss. That's why the USB descriptors for audio device class products are so incredibly complex.
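To put numbers on that: assuming two clocks that are both nominally 48 kHz but differ by 100 ppm (a typical figure for inexpensive crystals, not something the spec dictates), you can estimate how long a given amount of buffer slack lasts without any rate feedback:

```cpp
// Time until a free-running buffer over/under-runs, given the sample
// rate, the relative clock error in ppm (an assumed figure), and how
// many samples of slack the buffer holds.
double seconds_until_underrun(double sampleRateHz, double ppmError,
                              int bufferSlackSamples) {
    double driftSamplesPerSec = sampleRateHz * ppmError * 1e-6;
    return bufferSlackSamples / driftSamplesPerSec;
}
```

At 48 kHz and 100 ppm the two ends drift apart by 4.8 samples per second, so even 1024 samples of slack buys only about three and a half minutes before a glitch; that is why the closed-loop servo exists.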
If your application can tolerate a slight error in the speed at which data is delivered to the AVR from the PC, you can dispense with the closed-loop servo. That makes things much, much simpler.
You are absolutely right in assuming the need for long-term buffering when streaming data using isochronous pipes. A single isochronous transfer is pointless - you may as well use a bulk pipe for that. The whole reason for isochronous pipes is to handle data streaming. So a lot of look-ahead buffering has to be set up, just as you say.
I use LibUsbK for my iso transfers in product-specific applications which do not fit any preconceived USB classes. There is reasonably good documentation at libusbk for iso transfers. In short - you decide how many bytes per packet and how many packets per transfer. You decide how many buffers to pre-fill (I use five), and offer the libusbk driver the whole lot to start things going. Then you get callbacks as each of those buffers gets emptied by the driver, so you can fill them with new data. It works well for me, even though I have awkward sampling rates to deal with. In my case I set up a bunch of twenty-one packets where twenty of them carry 40 bytes and the twenty-first carries 44 bytes!
Hope that helps
- Tony

Method for audio playback with known output latency on Windows

I have a C++ application that receives a timestamped audio stream and attempts to play the audio samples as close as possible to the specified timestamp. To do so I need to know the delay (with reasonable accuracy) from when I place the audio samples in the output buffer until the audio is actually heard.
There are many discussions about audio output latency, but everything I have found is about minimizing it. That is irrelevant to me; all I need is a latency that is known (with reasonable accuracy) at run time.
On Linux I solve this with snd_pcm_delay() with very good results, but I'm looking for a decent solution for Windows.
I have looked at the following:
With OpenAL I have measured delays of 80 ms that are unaccounted for. I assume this isn't a hardcoded value, and I haven't found any API to read the latency. There are some extensions to OpenAL that claim to support this, but from what I can tell they are only implemented on Linux.
WASAPI has GetStreamLatency(), which sounds like the real deal, but this apparently only reflects some thread polling interval, so it's also useless; I still have 30 ms of unaccounted delay on my machine.
DirectSound has no API for getting the latency, as far as I can tell. But can I get close enough just by keeping track of my output buffers?
Edit in response to Brad's comment:
My impression of ASIO is that it is primarily targeted at professional audio applications and audio connoisseurs, that the user might have to install special sound-card drivers, and that I would have to deal with licensing. Feature-wise it seems like a good option, though.
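On the DirectSound question above, a hedged sketch of the "keep track of my output buffers" idea. The function and parameter names are invented for illustration; with DirectSound you would feed it the play cursor returned by IDirectSoundBuffer::GetCurrentPosition and your own write offset:

```cpp
// In a ring-buffer model the audio still queued (and therefore the
// output latency you control) is the distance from the play cursor to
// your write position, modulo the buffer size. This excludes any fixed
// driver/hardware latency downstream, which you would have to calibrate.
double queued_latency_ms(unsigned writePos, unsigned playCursor,
                         unsigned bufferBytes, unsigned bytesPerSecond) {
    unsigned queued = (writePos + bufferBytes - playCursor) % bufferBytes;
    return 1000.0 * queued / bytesPerSecond;
}
```

For 16-bit stereo at 48 kHz (192000 bytes/s), having written 19200 bytes ahead of the play cursor corresponds to 100 ms of queued audio.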

RTP: recommend strategy in order to achieve fluent audio stream

Let me explain what I mean when I say fluent audio stream.
I have a VoIP application which transfers PCMU-encoded audio wrapped in RTP packets over UDP. I have already implemented mechanisms which deal with packet loss (as suggested in RFC 3550).
The problem is that due to platform limitations (BlackBerry OS) I need to maintain a constant flow of data, i.e. I need to pass X bytes every S milliseconds.
Because of network delays, undelivered datagrams, etc., I can't guarantee that constant data flow, so I created a separate thread which compensates for the packets that were dropped or delivered late with fake ("silence") packets.
So my question is: can anyone suggest a good way to combine the fake packets and the real ones? I realize that adding a fake packet automatically increases the lag, and maybe I should discard a real RTP packet after that, but as I said this is because of platform limitations, and I am willing to compromise on audio quality and accept some additional speech loss.
You need to read up on:
Jitter Buffers
Packet Loss Concealment
These exist to handle exactly the sort of problems you're dealing with.

Accessing audio out in Windows

I am looking to write an Arduino script that uses whatever audio signal is going to the speakers to create a physical visualization.
The Arduino is connected to the Windows machine only through USB, so I need to use USB to find out what is being sent to the speakers. How would I access this information?
As of right now, the Arduino can only communicate with the computer via serial over USB. Things have changed with the new Arduino Uno, but the examples have not yet been released to show how to have the new Arduino act as other USB devices.
You would have to write something for the Windows box that monitors the system audio and sends info about it over serial to the Arduino, as long as you want it to connect only via USB.
There isn't a very good way to interface an audio signal to an Arduino without some external hardware.
One way to do it though would be to connect the audio line to a biased pin with a capacitor, then you could use the ADC directly. There will be pretty terrible dynamic range, but it only takes 3 passive parts. Running that through an opamp before going to the ADC pin could significantly improve dynamic range and provide a filtering opportunity (see below). Alternatively, you could switch on an on-chip voltage reference to use (typically 1-1.5 V) instead of the main supply.
It doesn't matter that much for a straight visualization, but the sample rate will not be good enough to capture the full spectral content of the audio (in addition to the poor dynamic-range resolution). The default Arduino sample rate is 10 kHz-ish (possibly asynchronous), so you will only get valid data if your signal is below 5 kHz; otherwise aliasing will muck it up. (If you write your own analog driver for the ATmega328P you could get up to a 76 kHz sample rate with 8-bit samples.)
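The folding described above is easy to compute. A small sketch (the function name is invented for illustration) assuming an ideal sampler with no anti-aliasing filter:

```cpp
#include <cmath>

// Frequency an input tone appears at after sampling at fsHz: any
// component above fsHz/2 folds (aliases) back into the 0..fsHz/2 band.
double aliased_frequency(double inputHz, double fsHz) {
    double f = std::fmod(inputHz, fsHz);
    return (f <= fsHz / 2.0) ? f : fsHz - f;
}
```

At a 10 kHz sample rate, a 6 kHz tone shows up indistinguishably as 4 kHz, which is why an analog low-pass filter (e.g. in the op-amp stage) matters before the ADC.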
Then, to actually communicate that data to a computer, you can fairly easily throw all those ADC values onto the UART for the computer to pick up and process as it sees fit. An ATmega will not have the power to compute FFTs on the fly (which is what you'd almost always do for a visualization anyway).
Or to skip all that, connect the audio signal to your computer's sound card (or USB sound card...they're pretty nice) and use some audio driver.
There is a Java library for processing called ESS that lets you access audio out.