Real Time indoor streaming and music mixing - ffmpeg

I am working on a project where we are doing a live performance with about 6 musicians placed far apart in a big space. The audience will wear headphones, and as they move around we want them to hear different effects in different areas of the space. We are using Bluetooth beacons to calculate the users' positions. We're expecting around 100 users, and we can't have a latency of more than 2 seconds.
Is this kind of setup possible?
The current way we're thinking of implementing this is that we'll divide the place into about 30 different sections.
For the server we'll take the input from all the musicians and mix a different stream for every section and stream it on a local WLAN using the RTP protocol.
We'll have Android and iOS apps that will locate the users using Bluetooth beacons and switch the live streams accordingly.
Presonus Studio One music mixer - Can have multiple channels that can be output to devices. 30 channels.
Virtual Audio Cable - Used to create virtual devices that will get the output from the channels. 30 devices.
FFMpeg streaming - Used to create an RTP stream for each of the devices. 30 streams.
Is this a good idea? Are there other ways of doing this?
Any help will be appreciated.

Audio Capture and Mixing
First, you need to capture those six channels of audio into something you can use. I don't think your idea of virtual audio cables is sustainable. In my experience, once you get more than a few, they don't work so great. You need to be able to go from your mixer directly to what's doing the encoding for the stream, which means you need something like JACK audio.
There are two ways to do this. One is to use a digital mixer to create those 30 mixes for you, and send you the resulting stream. Another is to simply capture the 6 channels of audio and then do the mixing in software. Normally I think the flexibility of mixing externally is what you want, and typically I'd recommend the Behringer X32 series for you. I haven't tried it with JACK audio myself, but I've heard it can work and the price point is good. You can get just a rackmount package of it for cheap which has all the functionality without the surface for control (cheaper, and sufficient for what you need). However, the X32 only has 16 buses so you would need two of them to get the number of mixes you need. (You could get creative with the matrix mixes, but that only gets you 6 more, a total of 22.)
I think what you'll need to do is capture that audio and mix in software. You'll probably want to use Liquidsoap for this. It can programmatically mix audio streams pulled in via JACK, and create internet radio style streams on the output end.
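To make the software-mixing idea concrete: each section's mix is a weighted sum of the six inputs, so the 30 mixes amount to a 30 x 6 table of gains. The sketch below is plain Python rather than Liquidsoap, and the function name, gain values, and sample values are all made up for illustration. A real mixer would apply this per audio block inside the mixing engine.

```python
def mix_sections(frame, gains):
    """Mix one multichannel sample frame into one output sample per section.

    frame: list of input samples, one per musician (floats in -1.0..1.0).
    gains: one row per section, each row holding one gain per musician.
    """
    mixes = []
    for row in gains:
        if len(row) != len(frame):
            raise ValueError("each gain row needs one entry per input channel")
        sample = sum(g * s for g, s in zip(row, frame))
        # Hard-clip so a hot mix cannot exceed full scale.
        mixes.append(max(-1.0, min(1.0, sample)))
    return mixes


# Hypothetical example: 6 musicians, only 3 of the 30 sections shown.
frame = [0.5, -0.25, 0.0, 0.25, 0.5, -0.5]
gains = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # section right next to musician 1
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # halfway between musicians 1 and 2
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # everything at unity gain
]
section_samples = mix_sections(frame, gains)
```

Changing what a section sounds like then only means editing one row of the gain table, which is the flexibility you lose with a fixed number of hardware buses.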
Streaming
You're going to need a server. There are plenty of RTP/RTSP servers available, but I'd recommend Icecast. It's going to be easier to set up and clients are more compatible. (Rather than making an app for example, you could easily play back these streams in HTML5 audio tags on a web page.) Liquidsoap can send streams directly to Icecast.
Latency
Keeping latency under 2 seconds is going to be a problem. You'll want to lower the buffers everywhere you can, particularly on your Icecast server. This is on the fringe of what is reasonably possible, so you'll want to test to ensure the latency meets your requirements.
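One way to reason about the 2 second limit is to add up every buffer along the chain and see how much headroom is left. The sketch below does that arithmetic. The stage names and all the millisecond figures are placeholders I picked for illustration, not measurements, so each one needs to be replaced with a value measured on the real setup.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Latency added by an audio buffer holding `frames` sample frames."""
    return 1000.0 * frames / sample_rate_hz


def total_latency_ms(stages):
    """Sum per-stage latencies given as a dict of name -> milliseconds."""
    return sum(stages.values())


# All values below are assumed for illustration only.
stages = {
    "capture buffer (1024 frames at 48 kHz)": buffer_latency_ms(1024, 48000),
    "encoder": 50.0,
    "server buffering": 500.0,
    "network": 20.0,
    "player buffer": 1000.0,
}
budget_ms = 2000.0
headroom_ms = budget_ms - total_latency_ms(stages)
```

With numbers like these, the server and player buffers dominate the total, which is why lowering them matters far more than tuning the capture side.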
Network
100 clients on the same spectrum is also problematic. What you need depends on the specifics of your space, but you're right on the line of what you can get away with using regular consumer access points. Given your latency and bandwidth requirements, I'd recommend getting some commercial access points with built-in sector antennas and multiple radios. There are many manufacturers of such gear.
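For a first estimate of the network load, note that with a server like Icecast every listener gets their own unicast copy of a stream, so the load scales with the number of clients and not with the number of sections. The sketch below works that out. The 128 kbps bitrate, the 10% protocol overhead, and the 25 clients per access point are assumptions for the sake of the example, not recommendations.

```python
import math


def aggregate_mbps(clients, stream_kbps, overhead=0.10):
    """Total unicast throughput in Mbit/s, including a rough protocol overhead."""
    return clients * stream_kbps * (1.0 + overhead) / 1000.0


def access_points_needed(clients, clients_per_ap):
    """Access points required if each one should serve at most clients_per_ap."""
    return math.ceil(clients / clients_per_ap)


# Assumed figures: 100 listeners, 128 kbps streams, 25 listeners per AP.
total_mbps = aggregate_mbps(100, 128)
ap_count = access_points_needed(100, 25)
```

The raw throughput is usually modest. The harder limit tends to be airtime contention between many clients on one channel, which is the reason for spreading them over several radios.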
Best of luck with this unique project! Please post some photos of your setup once you've done it.

Related

How to stream the video from one PC to another with an acceptable quality and synchronization?

I have the following task: to bring the broadcasts of several gamers to a director's computer, where the director will switch the picture to whichever player currently has the most interesting gameplay.
The obvious solution would be to raise an RTMP server and broadcast to it. We tried that. The image quality clearly correlates with the bitrate of the broadcast, but the streams aren't synchronized and there is no way to synchronize them. As far as I know, it's just not built into the RTMP protocol.
We also tried streaming via UDP, SRT and RTSP protocols. We got minimal delay but a very blurry image and artifacts from lost packets. It feels like all these formats are trying to achieve constant FPS and sacrifice the quality.
What we need:
A quality image.
Broken frames can be discarded (a non-constant FPS is okay).
Latency isn't important.
The streams should be synchronized within a second or two.
There is an assumption that broadcasting on UDP should be a solution, but some kind of intermediate buffer is needed to provide the necessary broadcasting conditions. But I don't know how to do that. I assume that we need an intermediate ffmpeg instance, which will read the incoming stream, buffer it and publish the result to some local port, from which the picture will be already taken by the director's OBS.
Is there any solution to achieve our goals?
NDI is perfect for this, and is in fact used a lot to broadcast games. Provided your network is in order, it offers great quality at very low latency and comes with a free utility to capture the screens and output them in NDI. There are several programmes supporting the NDI intake and the broadcasting (I developed one of them). With proper soft- and hardware you can quite easily handle a few dozen games. If you're limited to OBS then you'll have to check that it supports NDI, but I'd find that very likely. Not sure which programmes support synchronisation between streams, but there's at least one ;). See also ndi.newtek.com.

How to synchronize HLS and/or MPEG-DASH videos on multiple clients using ExoPlayer?

I'm trying to guarantee synchronization between multiple clients using DASH and/or HLS. Synchronization between each client must fall within 40 milliseconds.
Live streaming seems to be an obvious choice. However, the only way to really get within a small time frame of synchronization would be to lower the segment times. Is this the only viable solution? Are there any tags that would help me keep clients within 40 milliseconds to the live time?
Currently, I'm using FFMPEG to encode video and audio to live content.
There are a couple of separate issues here:
'Live time' - assuming this is the real time at which the broadcast event actually happens, for example the actual time that a football is kicked in a game, then achieving full end-to-end delivery to an end screen within 40 milliseconds is pushing the boundaries of any possible delivery technology. Certainly HLS and DASH streams won't give you that.
Your target may instead be to have each end user be no more than 40 ms apart from every other end user, e.g. every user receives the broadcast with a 10 second delay, but that delay is the same, plus or minus 40 ms, for each user. This is still quite a tricky problem. Unless you have some common clock that all the devices are synced to, you will be relying on some mechanism to signal the position in the stream between each device and some central or distributed control mechanism. Again, 40 ms is not a lot of time to allow even small messages to travel back and forth, along with any processing required to calculate a time difference and adjust.
Synchronising internet delivered media streams is not an easy problem but there is at least some work you can look at to help you get some ideas - see here for some examples: https://stackoverflow.com/a/51819066/334402
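As an illustration of the "common clock" idea, here is the offset calculation used by NTP-style time exchanges, written as a small Python sketch. It is not something ExoPlayer provides. The function name and the example timestamps are my own, and the sketch assumes the client can exchange timestamped messages with a reference server.

```python
def estimate_offset(t0, t1, t2, t3):
    """Estimate clock offset and round-trip time from one request/response.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the response was sent
    t3: client clock when the response arrived
    Returns (offset, round_trip); offset is how far the server clock is
    ahead of the client clock, in the same units as the inputs.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    round_trip = (t3 - t0) - (t2 - t1)
    return offset, round_trip


# Made-up example in milliseconds: the server clock runs 100 ms ahead,
# each network leg takes 15 ms, and the server needs 2 ms to respond.
offset_ms, round_trip_ms = estimate_offset(1000, 1115, 1117, 1032)
```

The estimate is only exact when both network legs take the same time; the error can be as large as half the round trip, so a 40 ms target needs round trips well below that or averaging over many exchanges.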

(libusb) Confusion about continuous isochronous USB streams

I am using a 32-bit AVR microcontroller (AT32UC3A3256) with High speed USB support. I want to stream data regularly from my PC to the device (without acknowledge of data), so exactly like a USB audio interface, except the data I want to send isn't audio. Such an interface is described here: http://www.edn.com/design/consumer/4376143/Fundamentals-of-USB-Audio.
I am a bit confused about USB isochronous transfers. I understand how a single transfer works, but how and when is the next subsequent transfer planned? I want a continuous stream of data that is calculated a little ahead of time, but streamed with minimum latency and without interruptions (except some occasional data loss). From my understanding, Windows is not a realtime OS so I think the transfers should not be planned with a timer every x milliseconds, but rather using interrupts/events? Or maybe a buffer needs to be filled continuously with as much data as there is available?
I think my question is still about the concepts of USB and not code-related, but if anyone wants to see my code, I am testing and modifying the "USB Vendor Class" example in the ASF framework of Atmel Studio, which contains the firmware source for the AVR and the source for the Windows EXE as well. The Windows example program uses libusb with a supplied driver.
Stephen -
You say "exactly like USB Audio"; but beware! The USB Audio class is very, very complicated because it implements a closed-loop servo system to establish long-term synchronisation between the PC and the audio device. You probably don't need all of that in your application.
To explain a bit more about long-term synchronisation: The audio codec at one end (e.g. the USB headphones) may run at a nominal 48 kHz sampling rate, and the audio file at the other end (e.g. the PC) may be designed to offer 48 thousand samples per second, but the PC and the headphones are never going to run at exactly the same speed. Sooner or later there is going to be a buffer overrun or under-run. So the USB audio class implements a control pipe as well as the audio pipe(s). The control pipe is used to negotiate a slight speed-up or slow-down at one end, usually the Device end (e.g. headphones), to avoid data loss. That's why the USB descriptors for audio device class products are so incredibly complex.
If your application can tolerate a slight error in the speed at which data is delivered to the AVR from the PC, you can dispense with the closed-loop servo. That makes things much, much simpler.
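To get a feel for how quickly the two clocks drift apart without the servo, the arithmetic can be sketched as below. The 100 ppm mismatch and the 480 samples of buffer headroom are assumed values chosen for the example; real oscillators and buffer sizes will differ.

```python
def seconds_until_slip(nominal_rate_hz, mismatch_ppm, headroom_samples):
    """Time until a buffer over- or under-runs because of clock mismatch.

    mismatch_ppm: difference between producer and consumer clocks, in
    parts per million of the nominal rate.
    headroom_samples: how many samples the buffer can gain or lose.
    """
    drift_per_second = nominal_rate_hz * mismatch_ppm * 1e-6
    if drift_per_second == 0:
        return float("inf")
    return headroom_samples / abs(drift_per_second)


# Assumed figures: 48 kHz stream, 100 ppm mismatch, 480 samples of slack.
slip_after_s = seconds_until_slip(48000, 100, 480)
```

With these figures the buffer slips after roughly 100 seconds, so an application that drops the servo has to accept an occasional dropped or repeated sample at about that interval.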
You are absolutely right in assuming the need for long-term buffering when streaming data using isochronous pipes. A single isochronous transfer is pointless - you may as well use a bulk pipe for that. The whole reason for isochronous pipes is to handle data streaming. So a lot of look-ahead buffering has to be set up, just as you say.
I use LibUsbK for my iso transfers in product-specific applications which do not fit any preconceived USB classes. There is reasonably good documentation at libusbk for iso transfers. In short - you decide how many bytes per packet and how many packets per transfer. You decide how many buffers to pre-fill (I use five), and offer the libusbk driver the whole lot to start things going. Then you get callbacks as each of those buffers gets emptied by the driver, so you can fill them with new data. It works well for me, even though I have awkward sampling rates to deal with. In my case I set up a bunch of twenty-one packets where twenty of them carry 40 bytes and the twenty-first carries 44 bytes!
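The uneven packet pattern above can be generated rather than written out by hand. The following is a plain Python sketch of that arithmetic only; it does not call LibUsbK, and the 4-byte unit (for instance one sample frame) is my assumption, chosen because it reproduces the 40/44 byte split.

```python
def packet_sizes(total_bytes, packet_count, unit_bytes):
    """Split total_bytes over packet_count packets in whole units.

    Every packet gets the same base size; any leftover units are added
    one each to the last packets of the group.
    """
    if total_bytes % unit_bytes != 0:
        raise ValueError("total_bytes must be a multiple of unit_bytes")
    units = total_bytes // unit_bytes
    base, extra = divmod(units, packet_count)
    sizes = [base * unit_bytes] * packet_count
    for i in range(packet_count - extra, packet_count):
        sizes[i] += unit_bytes
    return sizes


# 844 bytes over 21 packets in 4-byte units (the unit size is assumed).
sizes = packet_sizes(844, 21, 4)  # twenty packets of 40 bytes, then one of 44
```

The resulting list would be used to set the per-packet lengths of each transfer before the pre-filled buffers are handed to the driver.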
Hope that helps
- Tony

BackgroundAudioPlayer- Buffering & MediaStreamSource

I have created a MediaStreamSource to decode a live internet audio stream and pass it to the BackgroundAudioPlayer. This now works very well on the device. However, I would now like to implement some form of buffering control. Currently all works well over WLAN, but I fear that in live situations over mobile operator networks there will be a lot of cutting in and out in the stream.
What I would like to find out is if anybody has any advice on how best to implement buffering.
Does the background audio player itself build up some sort of buffer before it begins to play, and if so, can the size of this be increased if necessary?
Is there something I can set whilst sampling to help with buffering, or do I simply need to implement a kind of storage buffer as I retrieve the stream from the network and build up a substantial reserve in it before sampling?
What approach have others taken to this problem?
Thanks,
Brian
One approach to this that I've seen is to have two processes managing the stream. The first gets the stream and writes it to a series of sequentially numbered files in Isolated Storage. The second reads the files and plays them.
Obviously that's a very simplified description but hopefully you get the idea.
I don't know how using a MediaStreamSource might affect this, but from experience with a simple Background Audio Player agent streaming direct from remote MP3 files or MP3 live radio streams:
The player does build up a buffer of data received from the server before it will start playing your track.
You can't control the size of this buffer or how long it takes to fill (I've seen it take over a minute of buffering in some cases).
Once playback starts, if you lose the connection or bandwidth drops so low that the buffer empties, the player doesn't try to rebuffer the audio, so you can lose the audio completely or it can cut in and out.
You can't control that either.
Implementing the suggestion in Matt's answer solves this by allowing you to take control of the buffering and separates download and playback neatly.
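The two-stage approach described above can be sketched as follows. This is a language-neutral illustration in Python, not Windows Phone code: it keeps the numbered chunks in memory instead of Isolated Storage, and the class and method names are hypothetical. The point is the control it gives you over the prebuffer size and over rebuffering after an underrun.

```python
from collections import deque


class PrebufferedStream:
    """Numbered chunk store shared by a downloader and a player."""

    def __init__(self, prebuffer_chunks):
        self.prebuffer_chunks = prebuffer_chunks  # reserve needed to (re)start
        self._chunks = deque()
        self._next_index = 0
        self._buffering = True

    def write(self, data):
        """Downloader side: store a chunk under the next sequence number."""
        self._chunks.append((self._next_index, data))
        self._next_index += 1
        if self._buffering and len(self._chunks) >= self.prebuffer_chunks:
            self._buffering = False

    def read(self):
        """Player side: return (index, data), or None while (re)buffering."""
        if self._buffering:
            return None
        if not self._chunks:
            # Underrun: wait for a full reserve again instead of stuttering.
            self._buffering = True
            return None
        return self._chunks.popleft()
```

A larger `prebuffer_chunks` value trades a longer start-up delay for fewer dropouts on a poor mobile connection.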

RTP: recommend strategy in order to achieve fluent audio stream

Let me explain what I mean when I say fluent audio stream.
I have a VoIP application which transfers PCMU-encoded audio wrapped in RTP packets over UDP. I have already implemented mechanisms which deal with packet loss (as suggested in RFC 3550).
The problem is that, due to platform limitations (BlackBerry OS), I need to maintain a constant flow of data, i.e. I need to pass X bytes every S milliseconds.
Because of network delays, undelivered datagrams, etc., I can't guarantee that constant data flow, so I created a separate thread which compensates for packets that were dropped or delivered late with fake packets ("silence").
So my question is: can anyone suggest a good way to combine the fake packets and the real ones? I realize that adding a fake packet automatically increases the lag, and maybe I should ignore a real RTP packet after that, but as I said this is because of platform limitations, and I am willing to compromise on the quality of the audio and accept some additional speech loss.
You need to read up on:
Jitter Buffers
Packet Loss Concealment
These exist to handle exactly the sort of problems you're dealing with.
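A minimal jitter buffer with the simplest form of concealment (substituting silence) might look like the sketch below. It is written in Python for illustration, not for BlackBerry OS, and the class name and the 160-byte frame size (20 ms of PCMU at 8 kHz) are assumptions. It also ignores RTP sequence number wraparound, which a real implementation has to handle.

```python
PCMU_SILENCE = b"\xff" * 160  # 20 ms of mu-law silence at 8 kHz


class JitterBuffer:
    """Fixed-depth jitter buffer that plays silence for missing packets."""

    def __init__(self, depth):
        self.depth = depth       # packets to collect before playout starts
        self._packets = {}       # sequence number -> payload
        self._next_seq = None    # next sequence number due for playout
        self._started = False

    def push(self, seq, payload):
        """Store an arriving packet; return False if it came too late."""
        if self._next_seq is None:
            self._next_seq = seq
        if seq < self._next_seq:
            return False  # its slot was already played, so drop it
        self._packets[seq] = payload
        return True

    def pop(self):
        """Called once per playout tick; always returns exactly one frame."""
        if self._next_seq is None:
            return PCMU_SILENCE
        if not self._started:
            if len(self._packets) < self.depth:
                return PCMU_SILENCE
            self._started = True
        payload = self._packets.pop(self._next_seq, PCMU_SILENCE)
        self._next_seq += 1
        return payload
```

Because a late packet is dropped rather than queued behind the silence that replaced it, the lag stays constant: each inserted silence frame costs a little speech but never adds delay.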

Resources