I have created a MediaStreamSource to decode a live internet audio stream and pass it to the BackgroundAudioPlayer. This now works very well on the device. However, I would now like to implement some form of buffering control. Currently everything works well over WLAN; however, I fear that in live situations over mobile operator networks the stream will cut in and out a lot.
What I would like to find out is if anybody has any advice on how best to implement buffering.
Does the background audio player itself build up some sort of buffer before it begins to play, and if so, can the size of this be increased if necessary?
Is there something I can set while sampling to help with buffering, or do I simply need to implement a kind of storage buffer as I retrieve the stream from the network and build up a substantial reserve in it before sampling?
What approach have others taken to this problem?
Thanks,
Brian
One approach to this that I've seen is to have two processes managing the stream. The first gets the stream and writes it to a series of sequentially numbered files in Isolated Storage. The second reads the files and plays them.
Obviously that's a very simplified description but hopefully you get the idea.
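To make the pattern concrete, here is a rough, untested sketch of the idea, written in Python purely for illustration (on the phone this would be C# writing to Isolated Storage; the stream URL, chunk size and feed_to_decoder are placeholders):

    import os
    import threading
    import time
    import urllib.request

    STREAM_URL = "http://example.com/live"      # placeholder stream URL
    CHUNK_BYTES = 64 * 1024                     # size of each numbered file

    def feed_to_decoder(data):
        pass                                    # placeholder: hand bytes to your decoder

    def downloader(directory):
        # Stage 1: pull the stream and write sequentially numbered files.
        index = 0
        with urllib.request.urlopen(STREAM_URL) as stream:
            while True:
                chunk = stream.read(CHUNK_BYTES)
                if not chunk:
                    break
                with open(os.path.join(directory, f"{index:08d}.part"), "wb") as f:
                    f.write(chunk)
                index += 1

    def player(directory, min_reserve=10):
        # Stage 2: wait until a reserve of files exists, then consume them in order.
        while len(os.listdir(directory)) < min_reserve:
            time.sleep(0.1)                     # initial buffering phase
        index = 0
        while True:
            path = os.path.join(directory, f"{index:08d}.part")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    feed_to_decoder(f.read())   # placeholder for the real playback call
                os.remove(path)
                index += 1
            else:
                time.sleep(0.1)                 # starved: wait for the downloader

    os.makedirs("buffer", exist_ok=True)
    threading.Thread(target=downloader, args=("buffer",), daemon=True).start()
    player("buffer")

The size of the reserve (min_reserve here) is what gives you control over how much network jitter you can ride out.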
I don't know how using a MediaStreamSource might affect this, but from experience with a simple Background Audio Player agent streaming directly from remote MP3 files or MP3 live radio streams:
The player does build up a buffer of data received from the server before it will start playing your track.
You can't control the size of this buffer or how long it takes to fill (I've seen it take over a minute of buffering in some cases).
Once playback starts, if you lose the connection or the bandwidth drops so low that the buffer empties, the player doesn't try to rebuffer the audio, so the audio can cut in and out or drop away completely.
You can't control that either.
Implementing the suggestion in Matt's answer solves this: it lets you take control of the buffering and separates download and playback neatly.
I have the following task: to organize the broadcast of several gamers to the director's computer, which will switch the image to, to put it simply, whoever currently has the most interesting gameplay.
The obvious solution would be to set up an RTMP server and broadcast to it. We tried that. The image quality clearly correlates with the bitrate of the broadcast, but the streams aren't synchronized and there is no way to synchronize them. As far as I know, that's simply not built into the RTMP protocol.
We also tried streaming via the UDP, SRT and RTSP protocols. We got minimal delay but a very blurry image and artifacts from lost packets. It feels like all these formats try to achieve a constant FPS and sacrifice quality to do so.
What we need:
A quality image.
Broken frames can be discarded (a constant FPS is not required).
Latency isn't important.
The streams should be synchronized within a second or two.
There is an assumption that broadcasting over UDP should be a solution, but some kind of intermediate buffer is needed to provide the necessary broadcasting conditions. But I don't know how to do that. I assume we need an intermediate ffmpeg instance that will read the incoming stream, buffer it, and publish the result to some local port, from which the director's OBS will then pick up the picture.
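To show what I mean by an intermediate buffer, here is an untested sketch (Python, hand-rolled only for illustration; the ports and delay value are placeholders) that holds incoming UDP packets for a fixed time before republishing them to a local port:

    import socket
    import time
    from collections import deque

    IN_PORT, OUT_PORT, DELAY = 5000, 5002, 1.5       # placeholders

    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("0.0.0.0", IN_PORT))
    rx.setblocking(False)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    queue = deque()                                   # (arrival_time, packet)
    while True:
        try:
            packet, _ = rx.recvfrom(65536)
            queue.append((time.monotonic(), packet))
        except BlockingIOError:
            time.sleep(0.001)
        # republish everything that has aged past the buffering delay
        while queue and time.monotonic() - queue[0][0] >= DELAY:
            tx.sendto(queue.popleft()[1], ("127.0.0.1", OUT_PORT))

In practice I would expect this role to be played by an ffmpeg (or similar) instance rather than custom code.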
Is there any solution to achieve our goals?
NDI is perfect for this, and in fact it is used a lot to broadcast games. Provided your network is in order, it offers great quality at very low latency and comes with a free utility to capture the screens and output them as NDI. There are several programmes that support NDI ingest and broadcasting (I developed one of them). With proper soft- and hardware you can quite easily handle a few dozen games. If you're limited to OBS then you'll have to check that it supports NDI, but I'd find that very likely. Not sure which programmes support synchronisation between streams, but there's at least one ;). See also ndi.newtek.com.
I am working on a project where we are doing a live performance with about 6 musicians placed away from each other in a big space. The audience will be wearing headphones, and as they move around we want them to hear different kinds of effects in different areas of the space. For calculating the position of users we are using Bluetooth beacons. We're expecting around 100 users and we can't have a latency of more than 2 seconds.
Is such kind of a setup possible?
The current way we're thinking of implementing this is that we'll divide the place into about 30 different sections.
On the server we'll take the input from all the musicians, mix a different stream for every section, and stream it over a local WLAN using the RTP protocol.
We'll have Android and iOS apps that will locate the users using Bluetooth beacons and switch the live streams accordingly.
Presonus Studio One music mixer - Can have multiple channels that can be output to devices. 30 channels.
Virtual Audio Cable - Used to create virtual devices that will get the output from the channels. 30 devices.
FFMpeg streaming - Used to create an RTP stream for each of the devices. 30 streams.
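To make that last step concrete, something like this untested sketch is what we have in mind (Python is used only as a launcher; the dshow device names, bitrate and addresses are placeholders):

    import subprocess

    BASE_PORT = 5004
    for i in range(30):
        device = f"audio=Virtual Cable {i + 1}"            # placeholder virtual device name
        subprocess.Popen([
            "ffmpeg",
            "-f", "dshow", "-i", device,                   # capture the virtual device (Windows)
            "-acodec", "libmp3lame", "-b:a", "128k",       # encode each section's mix
            "-f", "rtp", f"rtp://239.0.0.1:{BASE_PORT + 2 * i}",  # one RTP stream per section
        ])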
Is this a good idea? Are there other ways of doing this?
Any help will be appreciated.
Audio Capture and Mixing
First, you need to capture those six channels of audio into something you can use. I don't think your idea of virtual audio cables is sustainable. In my experience, once you get more than a few, they don't work so great. You need to be able to go from your mixer directly to what's doing the encoding for the stream, which means you need something like JACK audio.
There are two ways to do this. One is to use a digital mixer to create those 30 mixes for you, and send you the resulting stream. Another is to simply capture the 6 channels of audio and then do the mixing in software. Normally I think the flexibility of mixing externally is what you want, and typically I'd recommend the Behringer X32 series for you. I haven't tried it with JACK audio myself, but I've heard it can work and the price point is good. You can get just a rackmount package of it for cheap which has all the functionality without the surface for control (cheaper, and sufficient for what you need). However, the X32 only has 16 buses so you would need two of them to get the number of mixes you need. (You could get creative with the matrix mixes, but that only gets you 6 more, a total of 22.)
I think what you'll need to do is capture that audio and mix in software. You'll probably want to use Liquidsoap for this. It can programmatically mix audio streams pulled in via JACK, and create internet radio style streams on the output end.
Streaming
You're going to need a server. There are plenty of RTP/RTSP servers available, but I'd recommend Icecast. It's going to be easier to set up and clients are more compatible. (Rather than making an app, for example, you could easily play back these streams in HTML5 audio tags on a web page.) Liquidsoap can send streams directly to Icecast.
Latency
Keeping latency under 2 seconds is going to be a problem. You'll want to lower the buffers everywhere you can, particularly on your Icecast server. This is on the fringe of what is reasonably possible, so you'll want to test to ensure the latency meets your requirements.
Network
100 clients on the same spectrum is also problematic. What you need depends on the specifics of your space, but you're right on the line of what you can get away with using regular consumer access points. Given your latency and bandwidth requirements, I'd recommend getting some commercial access points with built-in sector antennas and multiple radios. There are many manufacturers of such gear.
Best of luck with this unique project! Please post some photos of your setup once you've done it.
I am getting a latency between AVFoundation and the simple [NSSound play] that seems to depend on the computer.
My program is playing one video track and 3 audio tracks arranged inside an AVPlayer. This is working nicely. Independently, the program generates a metronome for each beat of the measure, following information from the music score. The two metronome sounds are very short files that I load into an NSSound and play with [NSSound play]. I noticed that I had to shift the metronome playback by about 90 milliseconds so that it is perfectly synchronized. Part of it may be the exact moment when the impact of the metronome is located in the metronome file, but if that were the only reason, this delay would be the same on all Mac computers. However, on different Macs this delay must be adjusted. As it is a metronome beat synchronized with the music, it is quite critical, because a slight shift makes it sound off beat. Is there any way to calculate this delay directly from the AVFoundation API? Or to compensate for it, or to play the metronome in another way so that there is no delay between the AVPlayer and the NSSound playback? I would appreciate some link or idea about this.
Thanks!
Dominique
Arpege Music, Belgium
I suggest looking into a low-level audio library to manage and instantly play your music. BASS is a low-level library built upon Audio Units which allows refined, precise and fast control over your stream. By manipulating your buffer, and possibly creating a mixer stream (refer to the docs), you should be able to play the sound instantly on any device. Specifically, look into buffering the sound beforehand and keeping it in memory, since it's a short sound.
I'm looking for the fastest way to encode a webcam stream that will be viewable in an html5 video tag. I'm using a Pandaboard: http://www.digikey.com/product-highlights/us/en/texas-instruments-pandaboard/686#tabs-2 for the hardware. I can use gstreamer, cvlc or ffmpeg. I'll be using it to drive a robot, so I need the least amount of lag in the video stream. Quality doesn't have to be great and it doesn't need audio. Also, this is only for one client, so bandwidth isn't an issue. The best solution so far is ffmpeg with mpjpeg, which gives me around 1 sec of delay. Anything better?
I have been asked this many times so I will try and answer this a bit generically and not just for mjpeg. Getting very low delays in a system requires a bit of system engineering effort and also understanding of the components.
Some simple top level tweaks I can think of are:
Ensure the codec is configured for the lowest delay. Codecs (especially embedded-system codecs) will have a low-delay configuration. Enable it. If you are using H.264 this matters most: most people don't realize that by the standard's requirements H.264 decoders may need to buffer frames before displaying them. This can be up to 16 frames for QCIF and up to 5 frames for 720p. That is a lot of delay in getting the first frame out. Even if you are not using H.264, still ensure you do not have B pictures enabled, as they add delay to getting the first picture out.
Since you are using mjpeg, I don't think this is applicable to you much.
Encoders will also have a rate-control delay (called init delay or VBV buffer size). Set it to the smallest value that gives you acceptable quality; that will also reduce the delay. Think of this as the bitstream buffer between encoder and decoder. If you are using x264, that would be the VBV buffer size.
Some other simple configurations: use as few I pictures as possible (a large intra period).
I pictures are huge and add delay because they take longer to send over the network. This may not be very visible in systems where end-to-end delay is in the range of 1 second or more, but when you are designing systems that need an end-to-end delay of 100 ms or less, this and several other aspects come into play. Also ensure you are using a low-latency audio codec such as AAC-LC (and not HE-AAC).
In your case, to get to lower latencies I would suggest moving away from mjpeg and using at least MPEG-4 without B pictures (Simple profile), or better, H.264 baseline profile (x264 has a zerolatency option). The simple reason you will get lower latency is that the encoded bitrate is lower, so there is less data to send out, and you can go to full framerate. If you must stick to mjpeg, you are already close to what you can get without more advanced features from the codec and system, using the open-source components as-is.
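As a rough illustration of those suggestions combined (low-delay x264, no B pictures, a long intra period, a small VBV buffer), an ffmpeg invocation might look something like this sketch; the bitrates, GOP length and output address are placeholders and untested on the Pandaboard:

    import subprocess

    subprocess.run([
        "ffmpeg",
        "-f", "v4l2", "-i", "/dev/video0",          # webcam capture (Linux)
        "-c:v", "libx264",
        "-profile:v", "baseline",                   # baseline profile: no B pictures
        "-tune", "zerolatency",                     # x264 low-delay configuration
        "-preset", "ultrafast",
        "-g", "250",                                # large intra period = few I pictures
        "-maxrate", "1M", "-bufsize", "200k",       # small VBV buffer, low rate-control delay
        "-f", "mpegts", "udp://192.168.0.10:5000",  # UDP transport to the viewing client
    ])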
Another aspect is the transmission of the content to the display unit. If you can use UDP, it will reduce latency quite a lot compared to TCP, though it can be lossy at times depending on network conditions. You mentioned html5 video; I am curious how you are doing live streaming to an html5 video tag.
There are other aspects that can also be tweaked, which I would put in the advanced category; they require the system engineer to try various things out:
What is the network buffering in the OS? The OS also buffers data before sending it out, for performance reasons. Tweak this to get a good balance between throughput and latency (see the socket sketch after this list).
Are you using CBR or VBR encoding? While CBR is great for low jitter, you can also use capped VBR if the codec provides it.
Can your decoder start decoding partial frames? Then you don't have to worry about framing the data before providing it to the decoder; just keep pushing the data to the decoder as soon as possible.
Can you do field encoding? It halves the time, compared to frame encoding, before the first picture comes out.
Can you do sliced encoding, with callbacks whenever a slice is available so it can be sent over the network immediately?
In sub-100 ms latency systems that I have worked on, all of the above are used. Some of the features may not be available in open-source components, but if you really need them and are enthusiastic, you could go ahead and implement them.
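For example, the OS-level socket buffering mentioned above can be tuned per socket; a minimal sketch (the 16 KB value is an arbitrary placeholder, and the kernel may round it):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16 * 1024)   # shrink the send buffer
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))       # verify what was actually applied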
EDIT:
I realize you cannot do a lot of the above for an iPad streaming solution, and HLS also limits the latency you can achieve. But I hope it will prove useful in other cases when you need a low-latency system.
We had a similar problem; in our case it was necessary to time external events and sync them with the video stream. We tried several solutions, but the one described here solved the problem and has extremely low latency:
Github Link
It uses gstreamer to transcode to mjpeg, which is then sent to a small Python streaming server. This has the advantage that it uses the <img> tag instead of the <video> tag, so it can be viewed by most modern browsers, including the iPhone.
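If you are curious what such a small Python streaming server boils down to, here is a minimal sketch of an MJPEG-over-HTTP endpoint (not the code from the linked repo; get_jpeg_frame is a placeholder for whatever supplies JPEG frames, e.g. the gstreamer pipe):

    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    def get_jpeg_frame():
        # Placeholder: return one complete JPEG frame as bytes.
        raise NotImplementedError("hook this up to your JPEG source")

    class MJPEGHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type",
                             "multipart/x-mixed-replace; boundary=frame")
            self.end_headers()
            while True:
                frame = get_jpeg_frame()
                self.wfile.write(b"--frame\r\n")
                self.wfile.write(b"Content-Type: image/jpeg\r\n")
                self.wfile.write(b"Content-Length: %d\r\n\r\n" % len(frame))
                self.wfile.write(frame + b"\r\n")

    ThreadingHTTPServer(("0.0.0.0", 8080), MJPEGHandler).serve_forever()

The browser keeps the connection open and replaces the image in the <img> tag each time a new part arrives.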
As you want the <video> tag, a simple solution is to use http-launch. That had the lowest latency of all the solutions we tried, so it might work for you. Be warned that Ogg/Theora will not work on Safari or IE, so those wishing to target the Mac or Windows will have to modify the pipeline to use MP4 or WebM.
Another solution that looks promising is gst-streaming-server. We simply couldn't find enough documentation to make it worth pursuing. I'd be grateful if somebody could ask a Stack Overflow question about how it should be used!
I am writing a client/server app in which the server captures live audio samples from some external device (a mic, for example) and sends them to the client, which then plays those samples. My app will run on a local network, so I have no problem with bandwidth (my sound is 8 kHz, 8-bit stereo, while my network card is 1000 Mb). On the client I buffer the data for a short time and then start playback, and as data arrives from the server I send it to the sound card. This seems to work fine, but there is a problem:
when the buffer on the client side runs out, I experience gaps in the played sound.
I believe this is because of a difference in the sampling clocks of the server and the client: 8 kHz on the server is not exactly the same as 8 kHz on the client.
I could solve this by pausing the client's playback and buffering again, but my boss won't accept that, since we have plenty of bandwidth and should be able to play the sound with no gaps or pauses.
So I decided to dynamically change the playback speed on the client, but I don't know how.
I am programming natively on Windows and currently use waveOutXXX to play the sound. I can use any other native library (DirectX/DirectSound, Jack or ...), as long as it provides smooth playback on the client.
I have programmed with waveOutXXX many times without any problem and know it well, but I can't solve this dynamic resampling problem.
I would suggest that your problem isn't likely due to mismatched sample rates, but something to do with your buffering. You should be continuously dumping data to the sound card, and continuously filling your buffer. Use a reasonable buffer size... 300 ms should be enough for most applications.
Now, over long periods of time, it is possible for the clock on the recording side and the clock on the playback side to drift apart enough that the 300ms buffer is no longer sufficient. I would suggest that rather than resampling at such a small difference, which could introduce artifacts, simply add samples at the encoding end. You still record at 8kHz, but you might add a sample or two every second, to make that 8.001kHz or so. Simply doubling one of the existing samples for this (or even a simple average between one sample and the next) will not be audible. Adjust this as necessary for your application.
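A rough sketch of that padding idea (the numbers are illustrative; real code would work on the raw sample buffers as they are captured):

    def pad_stream(samples, insert_every=8000):
        # Duplicate one sample roughly every second of 8 kHz audio, stretching the
        # stream slightly so the client's marginally faster clock never drains its buffer.
        out = []
        for i, sample in enumerate(samples, start=1):
            out.append(sample)
            if i % insert_every == 0:
                out.append(sample)   # or average this sample with the next one
        return out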
I had a similar problem in an application I worked on. It did not involve network, but it did involve source data being captured in real-time at a certain fixed sampling rate, a large amount of signal processing, and finally output to the sound card at a fixed rate. Like you, I had gaps in the playback at buffer boundaries.
It seemed to me like the problem was that the processing being done caused audio data to make it to the sound card in a very jerky manner. That is, it would get a large chunk, then it would be a long time before it got another chunk. The overall throughput was correct, but this latency caused the sound card to often be starved for data. I suppose you may have the same situation with the network piece in your system.
The way I solved it was to first make the audio buffer longer. Then, every time a new chunk of audio was received, I checked how full the buffer was. If it was less than 20% full, I would write some silence to make it around 60% full.
You may think that this goes against reducing the gaps in playback since it is actually adding a gap, but it actually helps. The problem that I was having was that even though I had a significantly large audio buffer, I was always right at the verge of it being empty. With the other latencies in the system, this resulted in playback gaps on almost every buffer.
Writing the silence when the buffer started to get empty, but before it actually did, ensured that the buffer always had some data to spare if the processing fell behind a little. Also, just a single small gap in playback is very hard to notice compared to many periodic gaps.
I don't know if this will work for you, but it should be easy to implement and try out.
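Here is a rough sketch of the top-up logic (the thresholds, buffer size and the two buffer functions are placeholders for whatever audio API you use):

    BUFFER_CAPACITY = 48000          # total playback buffer capacity in samples (placeholder)

    def buffer_fill():
        return 0                     # placeholder: samples currently queued for playback

    def write_to_buffer(data):
        pass                         # placeholder: enqueue data to the playback buffer

    def on_chunk_received(chunk):
        fill = buffer_fill()
        if fill < 0.2 * BUFFER_CAPACITY:
            shortfall = int(0.6 * BUFFER_CAPACITY) - fill
            write_to_buffer(bytes([128] * shortfall))   # silence for 8-bit unsigned audio
        write_to_buffer(chunk)

Note that "silence" depends on the sample format: 0x80 for 8-bit unsigned audio, zeros for 16-bit signed.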