Smooth playback of live network audio samples - winapi

I am writing a client/server app in which the server captures live audio samples from an external device (a microphone, for example) and sends them to the client, which then plays them. The app runs on a local network, so bandwidth is not a problem (my sound is 8 kHz, 8-bit stereo, while my network card is 1000 Mb). On the client I buffer the data for a short time, start playback, and then feed the samples to the sound card as they arrive from the server. This seems to work fine, but there is a problem:
when the buffer on the client side runs out, I experience gaps in the played sound.
I believe this is because of a difference between the sampling clocks of the server and the client: 8 kHz on the server is not exactly the same as 8 kHz on the client.
I could solve this by pausing the client's playback and buffering again, but my boss won't accept that: since I have plenty of bandwidth, I should be able to play the sound with no gaps or pauses.
So I decided to dynamically change the playback speed on the client, but I don't know how.
I am programming on Windows (native) and currently use waveOutXXX to play the sound. I can use any other native library (DirectX/DirectSound, JACK, or ...), as long as it provides smooth playback on the client.
I have programmed with waveOutXXX many times without any problems and I know it well, but I can't solve this dynamic resampling problem.

I would suggest that your problem isn't likely due to mismatched sample rates, but rather something to do with your buffering. You should be continuously dumping data to the sound card and continuously filling your buffer. Use a reasonable buffer size... 300ms should be enough for most applications.
Now, over long periods of time, it is possible for the clock on the recording side and the clock on the playback side to drift apart enough that the 300ms buffer is no longer sufficient. I would suggest that rather than resampling at such a small difference, which could introduce artifacts, simply add samples at the encoding end. You still record at 8kHz, but you might add a sample or two every second, to make that 8.001kHz or so. Simply doubling one of the existing samples for this (or even a simple average between one sample and the next) will not be audible. Adjust this as necessary for your application.
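As a rough illustration of that sample-stuffing idea (not code from the answer - the function name and the stride parameter are made up here), duplicating one frame of the 8-bit stereo stream every so often on the capture side might look like this:

```
// Sketch only: insert one extra frame every 'stride' frames so that the
// effective rate becomes slightly higher than 8 kHz (stride = 8000 gives
// roughly 8.001 kHz). Assumes interleaved unsigned 8-bit stereo as in the question.
#include <cstdint>
#include <vector>

std::vector<uint8_t> stuffFrames(const std::vector<uint8_t>& in, size_t stride)
{
    const size_t bytesPerFrame = 2;                  // 8-bit stereo: L + R
    std::vector<uint8_t> out;
    out.reserve(in.size() + in.size() / stride + bytesPerFrame);

    const size_t frameCount = in.size() / bytesPerFrame;
    for (size_t f = 0; f < frameCount; ++f) {
        const uint8_t* frame = &in[f * bytesPerFrame];
        out.insert(out.end(), frame, frame + bytesPerFrame);
        if ((f + 1) % stride == 0) {
            // Repeat the frame (or average it with the next one); at this rate
            // the duplicate is inaudible but keeps the client's buffer topped up.
            out.insert(out.end(), frame, frame + bytesPerFrame);
        }
    }
    return out;
}
```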

I had a similar problem in an application I worked on. It did not involve network, but it did involve source data being captured in real-time at a certain fixed sampling rate, a large amount of signal processing, and finally output to the sound card at a fixed rate. Like you, I had gaps in the playback at buffer boundaries.
It seemed to me like the problem was that the processing being done caused audio data to make it to the sound card in a very jerky manner. That is, it would get a large chunk, then it would be a long time before it got another chunk. The overall throughput was correct, but this latency caused the sound card to often be starved for data. I suppose you may have the same situation with the network piece in your system.
The way I solved it was to first make the audio buffer longer. Then, every time a new chunk of audio was received, I checked how full the buffer was. If it was less than 20% full, I would write some silence to make it around 60% full.
You may think that this goes against reducing the gaps in playback since it is actually adding a gap, but it actually helps. The problem that I was having was that even though I had a significantly large audio buffer, I was always right at the verge of it being empty. With the other latencies in the system, this resulted in playback gaps on almost every buffer.
Writing the silence when the buffer started to get empty, but before it actually did, ensured that the buffer always had some data to spare if the processing fell behind a little. Also, just a single small gap in playback is very hard to notice compared to many periodic gaps.
I don't know if this will work for you, but it should be easy to implement and try out.
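A minimal sketch of that check, assuming the client can query how many bytes are still queued to the sound card (queuedBytes, bufferCapacity and writeToSoundCard are placeholders for whatever your own playback code provides; 0x80 is silence for unsigned 8-bit PCM, use 0 for 16-bit samples):

```
#include <cstdint>
#include <vector>

void onChunkReceived(const std::vector<uint8_t>& chunk,
                     size_t bufferCapacity,   // total buffer size in bytes
                     size_t queuedBytes,      // bytes queued but not yet played
                     void (*writeToSoundCard)(const uint8_t*, size_t))
{
    // Below ~20% full? Pad back up to ~60% with silence *before* the buffer
    // actually empties, so one barely noticeable gap replaces many periodic ones.
    if (queuedBytes < bufferCapacity / 5) {
        const size_t target = bufferCapacity * 3 / 5;
        std::vector<uint8_t> silence(target - queuedBytes, 0x80);
        writeToSoundCard(silence.data(), silence.size());
    }
    writeToSoundCard(chunk.data(), chunk.size());
}
```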

Related

How to stream the video from one PC to another with an acceptable quality and synchronization?

I have the following task: to bring the streams of several gamers together on the director's computer, which will switch the picture to, simply put, whoever currently has the most interesting gameplay.
The obvious solution would be to set up an RTMP server and broadcast to it. We tried that. The image quality clearly correlates with the bitrate of the broadcast, but the streams aren't synchronized and there is no way to synchronize them. As far as I know, that's just not built into the RTMP protocol.
We also tried streaming via the UDP, SRT and RTSP protocols. We got minimal delay, but a very blurry image and artifacts from lost packets. It feels like all these formats try to achieve a constant FPS and sacrifice quality to do so.
What we need:
A quality image.
Broken frames can be discarded (it's okay if the FPS isn't constant).
Latency isn't important.
The streams should be synchronized within a second or two.
My assumption is that broadcasting over UDP should be the solution, but that some kind of intermediate buffer is needed to provide the necessary broadcasting conditions. I don't know how to do that, though. I assume we need an intermediate ffmpeg instance that reads the incoming stream, buffers it, and publishes the result to some local port, from which the director's OBS will then pick up the picture.
Is there any solution to achieve our goals?
NDI is perfect for this, and is in fact used a lot to broadcast games. Provided your network is in order, it offers great quality at very low latency and comes with a free utility to capture the screens and output them as NDI. There are several programs that support NDI intake and broadcasting (I developed one of them). With proper soft- and hardware you can quite easily handle a few dozen games. If you're limited to OBS then you'll have to check that it supports NDI, but I'd find that very likely. Not sure which programs support synchronisation between streams, but there's at least one ;). See also ndi.newtek.com.

How to synchronize HLS and/or MPEG-DASH videos on multiple clients using ExoPlayer?

I'm trying to guarantee synchronization between multiple clients using DASH and/or HLS. Synchronization between each client must fall within 40 milliseconds.
Live streaming seems to be an obvious choice. However, the only way to really get within a small time frame of synchronization would be to lower the segment times. Is this the only viable solution? Are there any tags that would help me keep clients within 40 milliseconds of the live time?
Currently, I'm using FFMPEG to encode video and audio to live content.
There are a couple of separate issues here:
'Live time' - assuming this is the real time at which the broadcast event actually happens, for example the actual moment a football is kicked in a game, then achieving full end-to-end delivery to an end screen within 40 milliseconds is pushing the boundaries of any possible delivery technology. Certainly HLS and DASH streams won't give you that.
Your target may instead be that each end user is no more than 40ms different from each other end user - e.g. every user receives the broadcast with a 10 second delay, but that delay is the same, plus or minus 40ms, for each user. This is still quite a tricky problem: unless you have some common clock that all the devices are synced to, you will be relying on some mechanism to signal each device's position in the stream to some central or distributed control mechanism, and again, 40ms is not a lot of time to allow for even small messages to travel back and forth, along with any processing required to calculate the time difference and adjust.
Synchronising internet-delivered media streams is not an easy problem, but there is at least some work you can look at to help you get some ideas - see here for some examples: https://stackoverflow.com/a/51819066/334402

BackgroundAudioPlayer- Buffering & MediaStreamSource

I have created a MediaStreamSource to decode a live internet audio stream and pass it to the BackgroundAudioPlayer. This now works very well on the device. However, I would now like to implement some form of buffering control. Currently everything works well over WLAN; however, I fear that in live situations over mobile operator networks the stream will cut in and out a lot.
What I would like to find out is if anybody has any advice on how best to implement buffering.
Does the background audio player itself build up some sort of buffer before it begins to play, and if so, can the size of this buffer be increased if necessary?
Is there something I can set while sampling to help with buffering, or do I simply need to implement a kind of storage buffer as I retrieve the stream from the network and build up a substantial reserve in it before sampling?
What approach have others taken to this problem?
Thanks,
Brian
One approach to this that I've seen is to have two processes managing the stream. The first gets the stream and writes it to a series of sequentially numbered files in Isolated Storage. The second reads the files and plays them.
Obviously that's a very simplified description but hopefully you get the idea.
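The original answer is about Windows Phone and Isolated Storage; purely as an illustration of the numbered-chunk-file pattern, with made-up names and ordinary file I/O standing in for IsolatedStorageFile, the two sides might look like this:

```
// Illustration only: a downloader writes sequentially numbered chunk files,
// a player consumes them in order. File names and directory are arbitrary.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

static std::string chunkName(unsigned index)
{
    char name[32];
    std::snprintf(name, sizeof(name), "chunk_%06u.bin", index);
    return name;
}

// Downloader side: write each received block as the next numbered file.
void writeChunk(unsigned index, const std::vector<uint8_t>& data)
{
    std::ofstream out(chunkName(index), std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
}

// Player side: read the next numbered file if it exists; returns false when
// the downloader hasn't produced it yet (i.e. the reserve has run dry).
bool readChunk(unsigned index, std::vector<uint8_t>& data)
{
    std::ifstream in(chunkName(index), std::ios::binary);
    if (!in) return false;
    data.assign(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
    return true;
}
```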
I don't know how using a MediaStreamSource might affect this, but from experience with a simple Background Audio Player agent streaming direct from remote MP3 files or MP3 live radio streams:
The player does build up a buffer of data received from the server before it will start playing your track.
You can't control the size of this buffer or how long it takes to fill it (I've seen it take over a minute of buffering in some cases).
Once playback starts, if you lose the connection or bandwidth drops so low that the buffer empties, the player doesn't try to rebuffer the audio, so you can lose the audio completely or it can cut in and out.
You can't control that either.
Implementing the suggestion in Matt's answer solves this by allowing you to take control of the buffering and separates download and playback neatly.

Why does the game I ported to Mac destroy all sound on Mac until reboot?

The setup
The game in question is using CoreAudio and single AudioGraph to play sounds.
The graph looks like this:
input callbacks -> 3DMixer -> DefaultOutputDevice
3DMixer's BusCount is set to 50 sounds max.
All samples are converted to the default output device's stream format before being fed to the input callbacks. Unused callbacks aren't set (NULL). Most sounds are 3D, so azimuth, pan, distance and gain are usually set for each mixer input rather than left alone; they're checked to make sure only valid values are set. A mixer input's playback rate is also sometimes modified slightly to simulate pitch, but for most sounds it's kept at the default setting.
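(Not the game's actual code, but for readers unfamiliar with this setup, a graph like the one described can be built roughly as follows with the legacy AUGraph/3DMixer APIs; error handling and per-bus stream-format setup are omitted.)

```
#include <AudioToolbox/AudioToolbox.h>
#include <AudioUnit/AudioUnit.h>

void buildGraph(AURenderCallback inputCallback, void* userData)
{
    AUGraph graph = nullptr;
    NewAUGraph(&graph);

    AudioComponentDescription mixerDesc = { kAudioUnitType_Mixer,
                                            kAudioUnitSubType_3DMixer,
                                            kAudioUnitManufacturer_Apple, 0, 0 };
    AudioComponentDescription outDesc   = { kAudioUnitType_Output,
                                            kAudioUnitSubType_DefaultOutput,
                                            kAudioUnitManufacturer_Apple, 0, 0 };
    AUNode mixerNode = 0, outNode = 0;
    AUGraphAddNode(graph, &mixerDesc, &mixerNode);
    AUGraphAddNode(graph, &outDesc, &outNode);
    AUGraphConnectNodeInput(graph, mixerNode, 0, outNode, 0);   // 3DMixer -> DefaultOutputDevice
    AUGraphOpen(graph);

    AudioUnit mixer = nullptr;
    AUGraphNodeInfo(graph, mixerNode, nullptr, &mixer);

    UInt32 busCount = 50;                                       // 50 sounds max, as in the post
    AudioUnitSetProperty(mixer, kAudioUnitProperty_ElementCount,
                         kAudioUnitScope_Input, 0, &busCount, sizeof(busCount));

    // One render callback per active bus; unused buses are simply left unset.
    AURenderCallbackStruct cb = { inputCallback, userData };
    AUGraphSetNodeInputCallback(graph, mixerNode, /*bus*/ 0, &cb);

    // Per-bus 3D parameters (azimuth/distance/gain) are set like this:
    AudioUnitSetParameter(mixer, k3DMixerParam_Azimuth,
                          kAudioUnitScope_Input, /*bus*/ 0, 45.0f, 0);

    AUGraphInitialize(graph);
    AUGraphStart(graph);
}
```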
The problem
Let's say I run the game and start a level populated with many sounds, a lot of action.
I'm running HALLab -> IO Cycle Telemetry window to see how much time it takes to process each sound cycle - it never takes more than 4ms out of the more than 10ms available in each cycle, and I can't spot a single peak that would push it over the allotted time.
At some point while playing the game, when many sounds are playing at the same time (fewer than 50, but no fewer than 20), I hear a poof, and from then on only silence. No sounds can be generated by any application on the Mac. The IO Telemetry window shows my audio ticks still running, still taking time, still providing samples to the output device.
This state persists even when fewer, and then no, sounds are playing in my game.
Even if I quit the game entirely, Mac sounds generated by other applications don't come back.
Putting Mac to sleep and waking it up doesn't help anything either.
Only a full reboot brings the sound back. After it's back, the first few sounds have crackling in them.
What can I do to avoid the problem? The game is big and complicated, and I can't modify what's playing - but it doesn't seem to overload the IO thread, so I'm not sure I should have to. The problem can't be caused by any specific sound data, because all the sound samples are played many times before the problem occurs. I'd have thought any sound going to the physical speakers would be screened to avoid overloading them physically, and the sound doesn't have to be loud at all to cause the bug.
I'm out of ideas.

What is the latency (or delay) time for callbacks from the waveOutWrite API method?

I'm having a debate with some developers on another forum about accurately generating MIDI events (Note On messages and so forth). The human ear is pretty sensitive to slight timing inaccuracies, and I think their main problem comes from their use of relatively low-resolution timers which quantize their events around 15 millisecond intervals (which is large enough to cause perceptible inaccuracies).
About 10 years ago, I wrote a sample application (Visual Basic 5 on Windows 95) that was a combined software synthesizer and MIDI player. The basic premise was a leapfrog-buffer playback system with each buffer being the duration of a sixteenth note (example: with 120 quarter-notes per minute, each quarter-note was 500 ms and thus each sixteenth-note was 125 ms, so each buffer is 5513 samples). Each buffer was played via the waveOutWrite method, and the callback function from this method was used to queue up the next buffer and also to send MIDI messages. This kept the WAV-based audio and the MIDI audio synchronized.
To my ear, this method worked perfectly - the MIDI notes did not sound even slightly out of step (whereas if you use an ordinary timer accurate to 15 ms to play MIDI notes, they will sound noticeably out of step).
In theory, this method would produce MIDI timing accurate to the sample, or 0.0227 milliseconds (since there are 44.1 samples per millisecond). I doubt that this is the true latency of this approach, since there is presumably some slight delay between when a buffer finishes and when the waveOutWrite callback is notified. Does anyone know how big this delay would actually be?
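For reference, a bare-bones sketch of the leapfrog-buffer scheme the question describes (assuming 44.1 kHz, 16-bit mono; fillNextBuffer is a placeholder for whatever renders the next sixteenth-note of audio and fires the MIDI messages). It uses CALLBACK_EVENT rather than CALLBACK_FUNCTION so the next buffer can be queued from an ordinary thread, since most waveOut calls are not allowed inside the driver callback:

```
#include <windows.h>
#include <mmsystem.h>
#include <vector>
#pragma comment(lib, "winmm.lib")

void playLeapfrog(void (*fillNextBuffer)(short*, size_t), size_t samplesPerBuffer)
{
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 1;
    fmt.nSamplesPerSec  = 44100;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = WORD(fmt.nChannels * fmt.wBitsPerSample / 8);
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HANDLE done = CreateEventW(nullptr, FALSE, FALSE, nullptr);
    HWAVEOUT hwo = nullptr;
    if (waveOutOpen(&hwo, WAVE_MAPPER, &fmt, (DWORD_PTR)done, 0, CALLBACK_EVENT) != MMSYSERR_NOERROR)
        return;

    std::vector<short> data[2] = { std::vector<short>(samplesPerBuffer),
                                   std::vector<short>(samplesPerBuffer) };
    WAVEHDR hdr[2] = {};
    for (int i = 0; i < 2; ++i) {
        hdr[i].lpData         = reinterpret_cast<LPSTR>(data[i].data());
        hdr[i].dwBufferLength = DWORD(samplesPerBuffer * sizeof(short));
        waveOutPrepareHeader(hwo, &hdr[i], sizeof(WAVEHDR));
        fillNextBuffer(data[i].data(), samplesPerBuffer);   // prime both buffers
        waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));        // queue buffer i
    }

    // Runs forever for the sake of the sketch; real code needs a stop condition
    // and cleanup (waveOutReset / waveOutUnprepareHeader / waveOutClose).
    for (;;) {
        WaitForSingleObject(done, INFINITE);                // a buffer finished
        for (int i = 0; i < 2; ++i) {
            if (hdr[i].dwFlags & WHDR_DONE) {
                // This is the moment the post used to send MIDI messages:
                // buffer i just finished while the other buffer is still playing.
                fillNextBuffer(data[i].data(), samplesPerBuffer);
                waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));
            }
        }
    }
}
```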
The Windows scheduler runs at either 10ms or 16ms intervals by default depending on the processor. If you use the timeBeginPeriod() API you can change this interval (at a fairly significant power consumption cost).
In Windows XP and Windows 7, the wave APIs run with a latency of about 30ms, for Windows Vista the wave APIs have a latency of about 50ms. You then need to add in the audio engine latency.
Unfortunately I don't have numbers for the engine latency in one direction, but we do have some numbers regarding engine latency - we ran a test that played a tone looped back through a USB audio device and measured the round-trip latency (render to capture). On Vista the round trip latency was about 80ms with a variation of about 10ms. On Win7 the round trip latency was about 40ms with a variation of about 5ms. YMMV however since the amount of latency introduced by the audio hardware is different for each piece of hardware.
I have absolutely no idea what the latency was for the XP audio engine or the Win9x audio stack.
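Since timeBeginPeriod() is mentioned above, a minimal usage sketch - the calls must be paired, and the finer period applies system-wide while it is held:

```
#include <windows.h>
#pragma comment(lib, "winmm.lib")

void runWithOneMsTimerResolution(void (*timingSensitiveWork)())
{
    timeBeginPeriod(1);       // request ~1 ms scheduler/timer granularity
    timingSensitiveWork();    // e.g. queue audio buffers / schedule MIDI events
    timeEndPeriod(1);         // restore the previous period when done
}
```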
At the very basic level, Windows is a multi-threaded OS, and it schedules threads with 100ms time slices.
Which means that, if there is no CPU contention, the delay between the end of the buffer and the waveOutWrite callback could be arbitrarily short. Or, if there are other busy threads, you may have to wait up to 100ms per thread.
In the best case, however... CPU speeds clock in at GHz now, which puts an absolute lower bound on how quickly the callback can be called on the order of 0.0000000001 seconds.
Unless you can figure out the maximum number of waveOutWrite callbacks you can process in a single second, which would imply the latency of each call, I think that really the latency is going to be orders of magnitude below perception most of the time, unless there are too many busy threads, in which case it's going to go horribly, horribly wrong.
To add to the great answers above:
Your question is about a latency that Windows has neither promised nor cares about. As such, it might be quite different depending on OS version, hardware and other factors. The waveOut API, and DirectSound too (I'm not sure about WASAPI, but I guess it is also true of this latest Vista+ audio API), are all designed for buffered audio output. Specific callback accuracy is not required, as long as you are on time queuing the next buffer while the current one is still being played.
When you start audio playback, you make a few assumptions, such as that there are no underflows during playback, that all output is continuous, and that the audio clock rate is exactly what you expect, e.g. 44,100 Hz precisely. Then you do simple math to schedule your wave output in time, converting time to samples and then to bytes.
Sadly, the effective playback rate is not precise: imagine that the real hardware sampling rate is 44,100 Hz minus 3%, so that in the long run the time-to-byte math lets you down. There have been attempts to compensate for this effect, such as making the audio hardware the playback clock and synchronizing video to it (this is how media players work), and rate-matching techniques that match the incoming data rate to the actual playback rate of the hardware. Both of these make absolute time measurements, and the latency in question, rather speculative.
On top of this there are the API latencies of 20 ms, 30 ms, 50 ms and so on. For a long time now the waveOut API has been a layer on top of other APIs. This means that some processing takes place before the data actually reaches the hardware, and this processing requires that you hand over the queued data well in advance, or the data won't reach the hardware in time. Say you attempt to queue your data in 10 ms buffers right before playback time: the API will accept the data, but it will itself be late passing it downstream, and there will be silence or comfort noise on the speakers.
This also relates to the callbacks you receive. You could say that you don't care about buffer latency and that what matters to you is precise callback timing. However, since the API is layered, you receive callbacks at the accuracy of the inner layers' synchronization: a second inner layer notifies on a free buffer, and the first inner layer updates its records and checks whether it can release your buffer too (and those buffers don't have to match, either). This makes expectations of callback accuracy really weak and unreliable.
Given that I have not touched the waveOut API for quite some time, if such a question of synchronization accuracy came up, I would probably think of two things first of all:
Windows provides access to the audio hardware clock (I am aware of the IReferenceClock interface available through DirectShow, which probably comes from another lower-level component that is also accessible), and with that available I would try to synchronize with it.
The latest audio API from Microsoft, WASAPI, provides special support for low-latency audio, with new cool stuff like better media-thread scheduling, exclusive-mode streams and <10 ms latency for PCM - this is where better synchronization is to be looked for.
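Staying with waveOut, the closest thing the API itself offers for treating the hardware as the clock is waveOutGetPosition; a hedged sketch, assuming hwo is an already-open output handle:

```
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Returns the number of samples the device has actually rendered so far,
// or ~0u on failure; this can be compared against the samples you have queued.
DWORD playedSamples(HWAVEOUT hwo)
{
    MMTIME mmt = {};
    mmt.wType = TIME_SAMPLES;                      // ask for a sample count
    if (waveOutGetPosition(hwo, &mmt, sizeof(mmt)) != MMSYSERR_NOERROR)
        return ~0u;
    // The driver may fall back to another time format; honour what it returned.
    return (mmt.wType == TIME_SAMPLES) ? mmt.u.sample : ~0u;
}
```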
