Method for audio playback with known output latency on Windows

I have a C++ application that receives a timestamped audio stream and attempts to play the audio samples as close as possible to the specified timestamp. To do so I need to know the delay (with reasonable accuracy) from when I place the audio samples in the output buffer until the audio is actually heard.
There are many discussions about audio output latency, but everything I have found is about minimizing latency. That is irrelevant to me; all I need is a latency that is known at run time.
On Linux I solve this with snd_pcm_delay() with very good results, but I'm looking for a decent solution for Windows.
I have looked at the following:
With OpenAL I have measured delays of 80 ms that are unaccounted for. I assume this isn't a hardcoded value, and I haven't found any API to read the latency. There are some extensions to OpenAL that claim to support this, but from what I can tell they are only implemented on Linux.
WASAPI has GetStreamLatency(), which sounds like the real deal, but this is apparently only some thread polling interval or similar, so it's also useless; I still have 30 ms of unaccounted delay on my machine (see the sketch after this list).
DirectSound has no API for getting latency? But can I get close enough by just keeping track of my output buffers?
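For reference, a minimal sketch (not a verified answer to the 30 ms mystery) of how one might approximate snd_pcm_delay() with shared-mode WASAPI, combining IAudioClient::GetCurrentPadding with GetStreamLatency; whether the result accounts for the full hardware delay depends on the driver:

    // Hedged sketch: approximating snd_pcm_delay() on WASAPI (shared mode).
    // GetCurrentPadding() reports frames written but not yet played, which is
    // the closest analogue; GetStreamLatency() adds the engine's own delay.
    // 'client' must be an initialized IAudioClient; error handling omitted.
    #include <audioclient.h>
    #include <mmdeviceapi.h>

    double EstimateOutputDelaySeconds(IAudioClient* client, UINT32 sampleRate)
    {
        UINT32 paddingFrames = 0;              // frames queued, not yet played
        client->GetCurrentPadding(&paddingFrames);

        REFERENCE_TIME engineLatency = 0;      // in 100 ns units
        client->GetStreamLatency(&engineLatency);

        return (double)paddingFrames / sampleRate + engineLatency / 1e7;
    }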
Edit in response to Brad's comment:
My impression of ASIO is that it is primarily targeted at professional audio applications and audio connoisseurs, that the user might have to install special sound card drivers, and that I would have to deal with licensing. Feature-wise it seems like a good option, though.

Related

How to stream the video from one PC to another with an acceptable quality and synchronization?

I have the following task: to organize the broadcast of several gamers on the director's computer, which will switch the image, to put it simply, to whoever currently has the more interesting gameplay.
The obvious solution would be to set up an RTMP server and broadcast to it. We tried that. The image quality clearly correlates with the bitrate of the broadcast, but the streams aren't synchronized and there is no way to synchronize them; as far as I know, that is simply not built into the RTMP protocol.
We also tried streaming via the UDP, SRT and RTSP protocols. We got minimal delay but a very blurry image and artifacts from lost packets. It feels like all these protocols try to maintain a constant FPS at the expense of quality.
What we need:
A quality image.
Broken frames can be discarded (a non-constant FPS is okay).
Latency isn't important.
The streams should be synchronized within a second or two.
Our assumption is that broadcasting over UDP should be the solution, but that some kind of intermediate buffer is needed to provide the necessary broadcasting conditions. I don't know how to do that, though. I assume we need an intermediate ffmpeg instance that reads the incoming stream, buffers it, and publishes the result to some local port, from which the picture is then picked up by the director's OBS.
Is there any solution to achieve our goals?
NDI is perfect for this, and is in fact used a lot to broadcast games. Providing your network is in order, it offers great quality at very low latency and comes with a free utility to capture the screens and output them as NDI. There are several programmes supporting NDI intake and broadcasting (I developed one of them). With proper soft- and hardware you can quite easily handle a few dozen games. If you're limited to OBS, you'll have to check that it supports NDI, but I'd find that very likely. Not sure which programmes support synchronisation between streams, but there's at least one ;). See also ndi.newtek.com.

Quartz Composer 4k ProRes 422HQ glitches

For a project due to be performed soon enough, I have a problem. The task is to play 4K ProRes 422 files according to a sequence written in an XML file, while listening on an OSC port for CUE signals and giving some feedback to the operator. The player can also smooth the speed by ±15%, has a general fader, and sends a few pieces of data back to the OSC controller to update the performer.
The playback is now unpredictably un-smooth, and I don't know why.
I also tend to think it is not a hardware problem: the machine used is a Mac Pro 2014 with 10.9 (https://www.apple.com/mac-pro/specs/) with 64 GB of RAM, all data on SSD, and a hell of a graphics card. The un-smoothness in playback is rather unpredictable, with random frame drops in different places. I tried to use external time on the player and it is a bit better, but still not satisfactory. I am going to package it without the editor into an app, but in preliminary tests it is not much faster.
I also wonder what the best way is to examine the code for leaks...
Playback of the files in QuickTime Player uses 20% of the CPU; in Quartz Composer, over 90%.
I am stuck on this issue, having done all the obvious things I can think of, and would like at least to understand how to profile the performance of the patch to find what is wrong and where.
Suggestions are welcome and thanks for help!
If it's not interactive, you could try rendering it in QuickTime Player.

(libusb) Confusion about continuous isochronous USB streams

I am using a 32-bit AVR microcontroller (AT32UC3A3256) with High Speed USB support. I want to stream data regularly from my PC to the device (without acknowledgement of the data), so exactly like a USB audio interface, except the data I want to send isn't audio. Such an interface is described here: http://www.edn.com/design/consumer/4376143/Fundamentals-of-USB-Audio.
I am a bit confused about USB isochronous transfers. I understand how a single transfer works, but how and when is the next transfer planned? I want a continuous stream of data that is calculated a little ahead of time, but streamed with minimum latency and without interruptions (except for occasional data loss). From my understanding, Windows is not a real-time OS, so I think the transfers should not be planned with a timer every x milliseconds, but rather using interrupts/events? Or maybe a buffer needs to be filled continuously with as much data as is available?
I think my question is still about the concepts of USB and not code-related, but if anyone wants to see my code, I am testing and modifying the "USB Vendor Class" example in the ASF framework of Atmel Studio, which contains the firmware source for the AVR and the source for the Windows EXE as well. The Windows example program uses libusb with a supplied driver.
Stephen -
You say "exactly like USB Audio"; but beware! The USB Audio class is very, very complicated because it implements a closed-loop servo system to establish long-term synchronisation between the PC and the audio device. You probably don't need all of that in your application.
To explain a bit more about long-term synchronisation: The audio codec at one end (e.g. the USB headphones) may run at a nominal 48 kHz sampling rate, and the audio file at the other end (e.g. the PC) may be designed to offer 48 thousand samples per second, but the PC and the headphones are never going to run at exactly the same speed. Sooner or later there is going to be a buffer overrun or under-run. So the USB audio class implements a control pipe as well as the audio pipe(s). The control pipe is used to negotiate a slight speed-up or slow-down at one end, usually the device end (e.g. headphones), to avoid data loss. That's why the USB descriptors for audio device class products are so incredibly complex.
If your application can tolerate a slight error in the speed at which data is delivered to the AVR from the PC, you can dispense with the closed-loop servo. That makes things much, much simpler.
You are absolutely right in assuming the need for long-term buffering when streaming data using isochronous pipes. A single isochronous transfer is pointless - you may as well use a bulk pipe for that. The whole reason for isochronous pipes is to handle data streaming. So a lot of look-ahead buffering has to be set up, just as you say.
I use LibUsbK for my iso transfers in product-specific applications which do not fit any preconceived USB classes. There is reasonably good documentation at libusbk for iso transfers. In short - you decide how many bytes per packet and how many packets per transfer. You decide how many buffers to pre-fill (I use five), and offer the libusbk driver the whole lot to start things going. Then you get callbacks as each of those buffers gets emptied by the driver, so you can fill them with new data. It works well for me, even though I have awkward sampling rates to deal with. In my case I set up a bunch of twenty-one packets where twenty of them carry 40 bytes and the twenty-first carries 44 bytes!
Hope that helps
- Tony
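Since the question's example program already uses libusb, here is a rough sketch of the same pre-filled multi-buffer pattern using libusb's asynchronous isochronous API rather than libusbK (the endpoint address, packet counts and sizes below are placeholders, not values from the question):

    // Hedged sketch of the multi-buffer iso pattern with libusb's async API.
    // Endpoint, packet size and counts are hypothetical; error handling and
    // shutdown/cleanup are omitted.
    #include <libusb-1.0/libusb.h>

    static const int NUM_TRANSFERS = 5;       // transfers pre-filled up front
    static const int PKTS_PER_XFER = 21;      // packets per transfer
    static const int PKT_SIZE      = 44;      // bytes per packet
    static const unsigned char EP_OUT = 0x01; // iso OUT endpoint (hypothetical)

    static void fill_with_new_data(unsigned char* buf, int len)
    {
        (void)buf; (void)len; // application-specific data generation
    }

    // Called by libusb when the device has consumed one transfer:
    // refill the buffer and resubmit so the stream never starves.
    static void LIBUSB_CALL xfer_done(struct libusb_transfer* xfr)
    {
        if (xfr->status == LIBUSB_TRANSFER_COMPLETED) {
            fill_with_new_data(xfr->buffer, xfr->length);
            libusb_submit_transfer(xfr);
        }
    }

    void start_stream(libusb_device_handle* devh)
    {
        for (int i = 0; i < NUM_TRANSFERS; ++i) {
            int len = PKTS_PER_XFER * PKT_SIZE;
            unsigned char* buf = new unsigned char[len];
            fill_with_new_data(buf, len);

            libusb_transfer* xfr = libusb_alloc_transfer(PKTS_PER_XFER);
            libusb_fill_iso_transfer(xfr, devh, EP_OUT, buf, len,
                                     PKTS_PER_XFER, xfer_done, nullptr, 1000);
            libusb_set_iso_packet_lengths(xfr, PKT_SIZE);
            libusb_submit_transfer(xfr);  // pre-queue all transfers
        }
        // ...then pump events, e.g. libusb_handle_events(ctx) in a loop.
    }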

Looking for the fastest video encoder with the least lag to stream a webcam to an iPad

I'm looking for the fastest way to encode a webcam stream that will be viewable in an HTML5 video tag. I'm using a Pandaboard (http://www.digikey.com/product-highlights/us/en/texas-instruments-pandaboard/686#tabs-2) for the hardware. I can use gstreamer, cvlc, or ffmpeg. I'll be using it to drive a robot, so I need the least amount of lag in the video stream. Quality doesn't have to be great and it doesn't need audio. Also, this is only for one client, so bandwidth isn't an issue. The best solution so far is ffmpeg with mpjpeg, which gives me around 1 second of delay. Anything better?
I have been asked this many times so I will try and answer this a bit generically and not just for mjpeg. Getting very low delays in a system requires a bit of system engineering effort and also understanding of the components.
Some simple top level tweaks I can think of are:
Ensure the codec is configured for the lowest delay. Codecs (especially embedded system codecs) will have a low-delay configuration; enable it. If you are using H.264 this is most useful. Most people don't realize that, by standard requirements, H.264 decoders need to buffer frames before displaying them. This can be up to 16 frames for QCIF and up to 5 frames for 720p. That is a lot of delay in getting the first frame out. If you do not use H.264, still ensure you do not have B pictures enabled, as they add delay to getting the first picture out.
Since you are using mjpeg, I don't think this is applicable to you much.
Encoders will also have a rate-control delay (called init delay or VBV buffer size). Set it to the smallest value that gives you acceptable quality; that will also reduce the delay. Think of this as the bitstream buffer between encoder and decoder. If you are using x264, that would be the VBV buffer size.
Some other simple configurations: use as few I pictures as possible (a large intra period).
I pictures are huge and add delay when sent over the network. This may not be very visible in systems where the end-to-end delay is in the range of 1 second or more, but when you are designing systems that need an end-to-end delay of 100 ms or less, this and several other aspects come into play. Also ensure you are using a low-latency audio codec, AAC-LC (and not HE-AAC).
In your case, to get to lower latencies I would suggest moving away from mjpeg and using at least MPEG-4 without B pictures (Simple Profile), or best of all H.264 Baseline Profile (x264 has a zerolatency option; see the sketch below). The simple reason you will get lower latency is that you will get a lower bitrate after encoding, so there is less data to send out and you can go to full framerate. If you must stick to mjpeg, you are close to what you can get without more advanced feature support from the codec and system, using the open source components as-is.
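To make those settings concrete, here is a rough sketch of how the advice above maps onto libx264's parameter struct. This is one plausible configuration, not a tuned one; the bitrate and keyframe-interval values are arbitrary:

    // Hedged sketch: mapping the low-delay advice onto libx264 parameters.
    #include <stdint.h>
    #include <x264.h>

    x264_t* open_low_delay_encoder(int width, int height, int fps, int kbps)
    {
        x264_param_t p;
        // "zerolatency" disables B-frames, frame threading and lookahead.
        x264_param_default_preset(&p, "ultrafast", "zerolatency");

        p.i_width   = width;
        p.i_height  = height;
        p.i_fps_num = fps;
        p.i_fps_den = 1;
        p.i_keyint_max = fps * 10;          // large intra period: few I frames

        p.rc.i_rc_method = X264_RC_ABR;     // CBR-ish rate control
        p.rc.i_bitrate = kbps;
        p.rc.i_vbv_max_bitrate = kbps;      // cap the rate...
        p.rc.i_vbv_buffer_size = kbps / 10; // ...with a small VBV buffer
                                            // (~100 ms of rate-control delay)

        x264_param_apply_profile(&p, "baseline"); // no B pictures
        return x264_encoder_open(&p);
    }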
Another aspect is the transmission of the content to the display unit. If you can use UDP, it will reduce latency quite a lot compared to TCP, though it can be lossy at times depending on network conditions. You have mentioned HTML5 video; I am curious how you are doing live streaming to an HTML5 video tag.
There are other aspects that can also be tweaked, which I would put in the advanced category; they require the system engineer to try various things out:
What is the network buffering in the OS? The OS also buffers data before sending it out, for performance reasons. Tweak this to get a good balance between throughput and latency.
Are you using CBR or VBR encoding? While CBR is great for low jitter, you can also use capped VBR if the codec provides it.
Can your decoder start decoding partial frames? So you don't have to worry about framing the data before providing it to the decoder. Just keep pushing the data to the decoder as soon as possible.
Can you do field encoding? Compared to frame encoding, it halves the time before the first picture gets out.
Can you do sliced encoding with callbacks whenever a slice is available to send over the network immediately?
In the sub-100 ms latency systems I have worked on, all of the above are used. Some of the features may not be available in open source components, but if you really need them and are enthusiastic, you could go ahead and implement them.
EDIT:
I realize you cannot do a lot of the above for an iPad streaming solution, and HLS also puts limits on the latency you can achieve. But I hope it will prove useful in other cases where you need a low-latency system.
We had a similar problem; in our case it was necessary to time external events and sync them with the video stream. We tried several solutions, but the one described here solved the problem and has extremely low latency:
Github Link
It uses gstreamer to transcode to mjpeg, which is then sent to a small Python streaming server. This has the advantage that it uses the <img> tag instead of <video>, so it can be viewed by most modern browsers, including the iPhone.
As you want the <video> tag, a simple solution is to use http-launch. That had the lowest latency of all the solutions we tried, so it might work for you. Be warned that Ogg/Theora will not work on Safari or IE, so those wishing to target Mac or Windows will have to modify the pipe to use MP4 or WebM.
Another solution that looks promising is gst-streaming-server. We simply couldn't find enough documentation to make it worth pursuing. I'd be grateful if somebody could ask a Stack Overflow question about how it should be used!

What is the latency (or delay) time for callbacks from the waveOutWrite API method?

I'm having a debate with some developers on another forum about accurately generating MIDI events (Note On messages and so forth). The human ear is pretty sensitive to slight timing inaccuracies, and I think their main problem comes from their use of relatively low-resolution timers which quantize their events around 15 millisecond intervals (which is large enough to cause perceptible inaccuracies).
About 10 years ago, I wrote a sample application (Visual Basic 5 on Windows 95) that was a combined software synthesizer and MIDI player. The basic premise was a leapfrog-buffer playback system with each buffer being the duration of a sixteenth note (example: with 120 quarter-notes per minute, each quarter-note was 500 ms and thus each sixteenth-note was 125 ms, so each buffer is 5513 samples). Each buffer was played via the waveOutWrite method, and the callback function from this method was used to queue up the next buffer and also to send MIDI messages. This kept the WAV-based audio and the MIDI audio synchronized.
To my ear, this method worked perfectly - the MIDI notes did not sound even slightly out of step (whereas if you use an ordinary timer accurate to 15 ms to play MIDI notes, they will sound noticeably out of step).
In theory, this method would produce MIDI timing accurate to the sample, or 0.0227 milliseconds (since there are 44.1 samples per millisecond). I doubt that this is the true latency of this approach, since there is presumably some slight delay between when a buffer finishes and when the waveOutWrite callback is notified. Does anyone know how big this delay would actually be?
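A minimal sketch of the leapfrog pattern described above (the synth step and sizes are hypothetical; MSDN permits only a few calls, such as SetEvent, inside a waveOut callback, so the refill happens on a separate wait loop):

    // Hedged sketch of the leapfrog double-buffer pattern with waveOutWrite.
    // Format, sizes and the synth step are hypothetical; errors omitted.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    const int BUFFER_SAMPLES = 5513; // one sixteenth note at 120 BPM, 44.1 kHz
    HANDLE g_bufferDone;             // auto-reset event from CreateEvent()

    void synthesizeNextSixteenth(short* buf); // hypothetical: renders the next
                                              // buffer and fires its MIDI events

    // Keep the callback trivial: MSDN allows only a few calls (SetEvent among
    // them) from inside a waveOut callback, so no waveOut calls happen here.
    void CALLBACK waveCallback(HWAVEOUT, UINT msg, DWORD_PTR, DWORD_PTR, DWORD_PTR)
    {
        if (msg == WOM_DONE)
            SetEvent(g_bufferDone);
    }

    // Assumes both WAVEHDRs were prepared with waveOutPrepareHeader and both
    // buffers were filled and queued once, in order, before this loop starts.
    void playLoop(HWAVEOUT hwo, WAVEHDR hdr[2], short* pcm[2])
    {
        for (int cur = 0;; cur ^= 1) {
            WaitForSingleObject(g_bufferDone, INFINITE);   // a buffer drained
            synthesizeNextSixteenth(pcm[cur]);             // refill it...
            waveOutWrite(hwo, &hdr[cur], sizeof(WAVEHDR)); // ...and requeue it
        }
    }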
The Windows scheduler runs at either 10ms or 16ms intervals by default depending on the processor. If you use the timeBeginPeriod() API you can change this interval (at a fairly significant power consumption cost).
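For reference, the calls themselves are trivial; every timeBeginPeriod should be paired with a matching timeEndPeriod:

    // Raise the scheduler tick resolution to ~1 ms for time-sensitive work.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    void withHighTimerResolution()
    {
        timeBeginPeriod(1);  // request 1 ms timer granularity (costs power)
        // ... time-sensitive audio work ...
        timeEndPeriod(1);    // always match with the same period value
    }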
In Windows XP and Windows 7, the wave APIs run with a latency of about 30 ms; for Windows Vista the wave APIs have a latency of about 50 ms. You then need to add in the audio engine latency.
Unfortunately I don't have numbers for the engine latency in one direction, but we do have some round-trip numbers: we ran a test that played a tone looped back through a USB audio device and measured the round-trip latency (render to capture). On Vista the round-trip latency was about 80 ms with a variation of about 10 ms. On Win7 it was about 40 ms with a variation of about 5 ms. YMMV, however, since the amount of latency introduced by the audio hardware differs for each piece of hardware.
I have absolutely no idea what the latency was for the XP audio engine or the Win9x audio stack.
At the very basic level, Windows is a multithreaded OS, and it schedules threads with 100 ms time slices.
This means that, if there is no CPU contention, the delay between the end of the buffer and the waveOutWrite callback could be arbitrarily short. Or, if there are other busy threads, you may have to wait up to 100 ms per thread.
In the best case, however, CPU speeds now clock in at the GHz level, which puts an absolute lower bound on how fast the callback can be called at around the 0.0000000001-second (a tenth of a nanosecond) order of magnitude.
Unless you can figure out the maximum number of waveOutWrite callbacks you can process in a single second, which would imply the latency of each call, I think the latency is really going to be orders of magnitude below perception most of the time, unless there are too many busy threads, in which case it's going to go horribly, horribly wrong.
To add to the great answers above:
Your question is about a latency that Windows has neither promised nor cared about. As such, it might be quite different depending on the OS version, hardware and other factors. The waveOut API, and DirectSound too (not sure about WASAPI, but I guess it is also true of this latest Vista+ audio API), are all set up for buffered audio output. Specific callback accuracy is not required as long as you are on time queuing the next buffer while the current one is still playing.
When you start audio playback, you make a few assumptions, such as that there are no underflows during playback, that all output is continuous, and that the audio clock rate is exactly what you expect, e.g. precisely 44,100 Hz. Then you do simple math to schedule your wave output in time, converting time to samples and then to bytes, as in the sketch below.
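That "simple math" is just a conversion like the following (assuming 16-bit stereo at exactly 44,100 Hz; the values are illustrative):

    // Sketch of the time-to-samples-to-bytes scheduling math, assuming
    // 16-bit stereo PCM at exactly 44,100 Hz.
    #include <cstdint>

    uint64_t byteOffsetAt(double seconds)
    {
        const uint32_t rate = 44100, channels = 2, bytesPerSample = 2;
        uint64_t frames = (uint64_t)(seconds * rate + 0.5); // round to a frame
        return frames * channels * bytesPerSample;          // 4 bytes per frame
    }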
Sadly, the effective playback rate is not precise: imagine, for example, that the real hardware sampling rate is 44,100 Hz minus 3%; in the long run the time-to-byte math will let you down. There have been attempts to compensate for this effect, such as making the audio hardware the playback clock and synchronizing video to it (this is how media players work), and rate-matching techniques that match the incoming data rate to the actual playback rate of the hardware. Both of these make absolute time measurements, and the latency in question, rather speculative knowledge.
On top of this come the API latencies of 20 ms, 30 ms, 50 ms and so on. The waveOut API has long been a layer on top of other APIs. This means that some processing takes place before the data actually reaches the hardware, and this processing requires that you hand over the queued data well in advance, or the data won't reach the hardware in time. Say you attempt to queue your data in 10 ms buffers right before playback time: the API will accept this data, but it will itself be late passing the data downstream, and there will be silence or comfort noise on the speakers.
This is also related to the callbacks that you receive. You could say that you don't care about the latency of buffers and that what is important to you is precise callback timing. However, since the API is layered, you receive callbacks at the accuracy of the inner layers' synchronization: a second inner layer notifies about a free buffer, and the first inner layer updates its records and checks whether it can release your buffer too (hey, those buffers don't have to match either). This makes callback accuracy expectations really weak and unreliable.
Granted, I have not touched the waveOut API for quite some time, but if such a question of synchronization accuracy came up, I would probably first of all think of two things:
Windows provides access to the audio hardware clock (I am aware of the IReferenceClock interface available through DirectShow, and it probably comes from another lower-level thing which is also accessible), and having that available I would try to synchronize with it.
The latest audio API from Microsoft, WASAPI, provides special support for low-latency audio, with new cool stuff like better media thread scheduling, exclusive-mode streams and <10 ms latency for PCM; this is where better sync is to be looked for (see the sketch below).
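A rough sketch of opening such an exclusive-mode, event-driven WASAPI render stream, which is where the <10 ms figures come from. The period value is a hypothetical choice, 'device' and 'wfx' must be set up elsewhere, wfx must be a format the device supports in exclusive mode, and error handling is omitted:

    // Hedged sketch: WASAPI exclusive-mode, event-driven stream setup.
    #include <audioclient.h>
    #include <mmdeviceapi.h>

    IAudioClient* OpenExclusiveStream(IMMDevice* device, const WAVEFORMATEX* wfx,
                                      HANDLE samplesNeededEvent)
    {
        IAudioClient* client = nullptr;
        device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                         (void**)&client);

        // In exclusive + event-driven mode the buffer duration must equal
        // the device period; 30000 units of 100 ns = 3 ms (hypothetical).
        const REFERENCE_TIME period = 30000;
        client->Initialize(AUDCLNT_SHAREMODE_EXCLUSIVE,
                           AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                           period, period, wfx, nullptr);

        // The engine signals this event each time it wants another buffer.
        client->SetEventHandle(samplesNeededEvent);
        return client;
    }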
