I have built an internet radio based on an ESP32 and a VS1053B MP3 decoder. The radio works fine with a number
of my favorite stations, but for some there is a "gap" (or stutter) during reception. The ESP32 app was
coded to forward 32 bytes at a time (the recommended value) to the VS1053B. I wonder if there is a way to buffer more bytes, in either hardware or software, on the ESP32 and/or the VS1053B so it could better handle the gaps. Searches on the subject pointed to the use of a circular (ring) buffer, but I didn't find a reliable article on how to implement one. Any suggestions will be greatly appreciated. Thank you.
There is a YouTube video that describes exactly what I am trying to accomplish:
#206 ESP32 Circular Buffer for Internet Radio - and ESP32 WiFi Woes
https://www.youtube.com/watch?v=6BK4fzRaFGY
by Ralph S. Bacon
The term to google for is "jitter buffer". In short, you create a fixed time delay between receiving a packet of audio data and sending that data to the decoder chip. When receiving the first audio packet from your radio station, store it in a FIFO buffer. Keep doing so until you have, say, 100 ms worth of audio buffered (the exact amount depends on how much jitter you're expecting). Then start feeding the earliest packets to the decoder. If there are problems getting new audio from the source, you have 100 ms of buffered audio to play back before running out.
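If it helps, here is a minimal sketch of that idea for the Arduino-ESP32 core. The buffer sizes are illustrative, and sendToVS1053() is a hypothetical stand-in for whatever routine you already use to wait on DREQ and clock 32 bytes out over SPI:

```cpp
#include <WiFi.h>

const size_t BUF_SIZE = 32 * 1024;  // room for ~2 s of a 128 kbps stream
const size_t CHUNK    = 32;         // the VS1053B takes 32 bytes per transfer
const size_t PREFILL  = 8 * 1024;   // don't start playback until this much is queued

uint8_t ring[BUF_SIZE];
size_t head = 0;                    // write index (filled from WiFi)
size_t tail = 0;                    // read index (drained to the decoder)

size_t buffered()  { return (head + BUF_SIZE - tail) % BUF_SIZE; }
size_t freeSpace() { return BUF_SIZE - 1 - buffered(); }

WiFiClient station;                 // assumed already connected to the stream URL
bool playing = false;

// Hypothetical placeholder for your existing decoder routine.
void sendToVS1053(const uint8_t *data, size_t len) {
  // wait for DREQ, then shift `len` bytes out over SPI
}

void pumpAudio() {
  // 1. Drain the socket into the ring buffer while there is room.
  //    (A real implementation would use block reads for speed.)
  while (station.available() > 0 && freeSpace() > 0) {
    ring[head] = station.read();
    head = (head + 1) % BUF_SIZE;
  }

  // 2. Hold playback until the pre-fill threshold is reached; after an
  //    underrun, drop back to buffering mode and fill up again.
  if (!playing && buffered() >= PREFILL) playing = true;
  if (playing && buffered() == 0) playing = false;

  // 3. Feed the decoder a few 32-byte chunks per call; DREQ throttles
  //    the actual pace, and capping the count keeps WiFi serviced.
  for (int n = 0; n < 16 && playing && buffered() >= CHUNK; n++) {
    uint8_t chunk[CHUNK];
    for (size_t i = 0; i < CHUNK; i++) {
      chunk[i] = ring[tail];
      tail = (tail + 1) % BUF_SIZE;
    }
    sendToVS1053(chunk, CHUNK);
  }
}
```

Call pumpAudio() repeatedly from loop(). A common refinement on the ESP32 is to run the socket-filling side and the decoder-feeding side as separate FreeRTOS tasks (one per core) so a slow read never starves the decoder; if you do that, protect head and tail appropriately.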
I am working on a project where we are doing a live performance with about 6 musicians placed apart from each other in a big space. The audience will be wearing headphones, and as they move around we want them to hear different kinds of effects in different areas of the space. For calculating the position of the users we are using Bluetooth beacons. We're expecting around 100 users, and we can't have a latency of more than 2 seconds.
Is such a setup possible?
The current way we're thinking of implementing this is that we'll divide the place into about 30 different sections.
On the server we'll take the input from all the musicians, mix a different stream for every section, and stream it over a local WLAN using the RTP protocol.
We'll have Android and iOS apps that will locate the users using Bluetooth beacons and switch the live streams accordingly.
PreSonus Studio One (music mixer) - can have multiple channels that are output to devices: 30 channels.
Virtual Audio Cable - used to create virtual devices that receive the output from the channels: 30 devices.
FFmpeg - used to create an RTP stream for each of the devices: 30 streams.
Is this a good idea? Are there other ways of doing this?
Any help will be appreciated.
Audio Capture and Mixing
First, you need to capture those six channels of audio into something you can use. I don't think your idea of virtual audio cables is sustainable. In my experience, once you get more than a few, they don't work so great. You need to be able to go from your mixer directly to what's doing the encoding for the stream, which means you need something like JACK audio.
There are two ways to do this. One is to use a digital mixer to create those 30 mixes for you, and send you the resulting stream. Another is to simply capture the 6 channels of audio and then do the mixing in software. Normally I think the flexibility of mixing externally is what you want, and typically I'd recommend the Behringer X32 series for you. I haven't tried it with JACK audio myself, but I've heard it can work and the price point is good. You can get just a rackmount package of it for cheap which has all the functionality without the surface for control (cheaper, and sufficient for what you need). However, the X32 only has 16 buses so you would need two of them to get the number of mixes you need. (You could get creative with the matrix mixes, but that only gets you 6 more, a total of 22.)
I think what you'll need to do is capture that audio and mix in software. You'll probably want to use Liquidsoap for this. It can programmatically mix audio streams pulled in via JACK, and create internet radio style streams on the output end.
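Liquidsoap would hide most of the plumbing for you, but to show the shape of the software-mixing idea, here is a bare-bones zone mixer written against the JACK C API. The client and port names and the gain values are made up, and a real version would register 30 outputs, each with its own per-input gain set:

```cpp
// Minimal JACK client mixing 6 inputs into one zone output.
// Build with: g++ mixer.cpp -ljack
#include <jack/jack.h>
#include <cstdio>

const int NUM_INPUTS = 6;
jack_port_t *inputs[NUM_INPUTS];
jack_port_t *zone_out;
// Illustrative per-input levels for this one zone.
float gains[NUM_INPUTS] = {1.0f, 0.8f, 0.6f, 0.4f, 0.2f, 0.1f};

// Real-time callback: sum the weighted inputs into the zone buffer.
int process(jack_nframes_t nframes, void *) {
  float *out = (float *)jack_port_get_buffer(zone_out, nframes);
  for (jack_nframes_t f = 0; f < nframes; f++) out[f] = 0.0f;
  for (int ch = 0; ch < NUM_INPUTS; ch++) {
    float *in = (float *)jack_port_get_buffer(inputs[ch], nframes);
    for (jack_nframes_t f = 0; f < nframes; f++) out[f] += gains[ch] * in[f];
  }
  return 0;
}

int main() {
  jack_client_t *client = jack_client_open("zone_mixer", JackNullOption, nullptr);
  if (!client) { fprintf(stderr, "JACK server not running?\n"); return 1; }
  for (int ch = 0; ch < NUM_INPUTS; ch++) {
    char name[16];
    snprintf(name, sizeof name, "in_%d", ch + 1);
    inputs[ch] = jack_port_register(client, name, JACK_DEFAULT_AUDIO_TYPE,
                                    JackPortIsInput, 0);
  }
  zone_out = jack_port_register(client, "zone_1", JACK_DEFAULT_AUDIO_TYPE,
                                JackPortIsOutput, 0);
  jack_set_process_callback(client, process, nullptr);
  jack_activate(client);
  getchar();                    // run until Enter; real code needs proper lifecycle handling
  jack_client_close(client);
  return 0;
}
```

You would then connect the capture ports from your audio interface to in_1 through in_6, and point an encoder (or Liquidsoap) at zone_1.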
Streaming
You're going to need a server. There are plenty of RTP/RTSP servers available, but I'd recommend Icecast. It's going to be easier to set up and clients are more compatible. (Rather than making an app, for example, you could easily play back these streams in HTML5 audio tags on a web page.) Liquidsoap can send streams directly to Icecast.
Latency
Keeping latency under 2 seconds is going to be a problem. You'll want to lower the buffers everywhere you can, particularly on your Icecast server. This is on the fringe of what is reasonably possible, so you'll want to test to ensure the latency meets your requirements.
Network
100 clients on the same spectrum is also problematic. What you need depends on the specifics of your space, but you're right on the line of what you can get away with using regular consumer access points. Given your latency and bandwidth requirements, I'd recommend getting some commercial access points with built-in sector antennas and multiple radios. There are many manufacturers of such gear.
Best of luck with this unique project! Please post some photos of your setup once you've done it.
I am using a 32-bit AVR microcontroller (AT32UC3A3256) with High speed USB support. I want to stream data regularly from my PC to the device (without acknowledgement of the data), so exactly like a USB audio interface, except the data I want to send isn't audio. Such an interface is described here: http://www.edn.com/design/consumer/4376143/Fundamentals-of-USB-Audio.
I am a bit confused about USB isochronous transfers. I understand how a single transfer works, but how and when is the next transfer scheduled? I want a continuous stream of data that is calculated a little ahead of time, but streamed with minimum latency and without interruptions (except for occasional data loss). From my understanding, Windows is not a real-time OS, so I think the transfers should not be scheduled with a timer every x milliseconds, but rather using interrupts/events? Or maybe a buffer needs to be filled continuously with as much data as is available?
I think my question is still about the concepts of USB and not code-related, but if anyone wants to see my code, I am testing and modifying the "USB Vendor Class" example in the ASF framework of Atmel Studio, which contains the firmware source for the AVR and the source for the Windows EXE as well. The Windows example program uses libusb with a supplied driver.
Stephen -
You say "exactly like USB Audio"; but beware! The USB Audio class is very, very complicated because it implements a closed-loop servo system to establish long-term synchronisation between the PC and the audio device. You probably don't need all of that in your application.
To explain a bit more about long-term synchronisation: the audio codec at one end (e.g. the USB headphones) may run at a nominal 48 kHz sampling rate, and the audio source at the other end (e.g. the PC) may be designed to offer 48 thousand samples per second, but the PC and the headphones are never going to run at exactly the same speed. Sooner or later there is going to be a buffer overrun or underrun. So the USB audio class implements a control pipe as well as the audio pipe(s). The control pipe is used to negotiate a slight speed-up or slow-down at one end, usually the device end (e.g. the headphones), to avoid data loss. That's why the USB descriptors for audio device class products are so incredibly complex.
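To put rough numbers on the drift: crystal oscillators are commonly specified to around ±100 ppm, so two nominally 48 kHz clocks can disagree by up to about 200 ppm, i.e. 48,000 × 0.0002 ≈ 10 samples per second. A 1,000-sample FIFO between them would then overrun or underrun within a couple of minutes; a bigger buffer only postpones the failure, which is why open-ended playback needs the closed-loop servo (or occasional sample insertion/dropping).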
If your application can tolerate a slight error in the speed at which data is delivered to the AVR from the PC, you can dispense with the closed-loop servo. That makes things much, much simpler.
You are absolutely right in assuming the need for long-term buffering when streaming data using isochronous pipes. A single isochronous transfer is pointless - you may as well use a bulk pipe for that. The whole reason for isochronous pipes is to handle data streaming. So a lot of look-ahead buffering has to be set up, just as you say.
I use libusbK for my iso transfers in product-specific applications which do not fit any preconceived USB classes. There is reasonably good documentation for iso transfers in the libusbK docs. In short: you decide how many bytes per packet and how many packets per transfer. You decide how many buffers to pre-fill (I use five) and offer the libusbK driver the whole lot to start things going. Then you get callbacks as each of those buffers gets emptied by the driver, so you can fill them with new data. It works well for me, even though I have awkward sampling rates to deal with. In my case I set up a bunch of twenty-one packets, where twenty of them carry 40 bytes and the twenty-first carries 44 bytes!
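This isn't libusbK's actual API, but the general shape of that pre-fill-and-refill cycle can be sketched like this; submit_to_driver() and the completion loop are hypothetical stand-ins for the real submit/wait calls:

```cpp
// Library-agnostic sketch of the pre-filled multi-buffer pattern.
#include <cstdint>
#include <cstring>
#include <cstdio>
#include <queue>

const int NUM_BUFFERS      = 5;     // transfers kept in flight
const int PACKETS_PER_XFER = 21;    // e.g. the 21-packet transfers above
const int BYTES_PER_PACKET = 44;    // worst-case packet size
const int XFER_SIZE        = PACKETS_PER_XFER * BYTES_PER_PACKET;

uint8_t buffers[NUM_BUFFERS][XFER_SIZE];
std::queue<int> in_flight;          // stands in for transfers queued in the driver

void fill_buffer(uint8_t *buf) {
  memset(buf, 0, XFER_SIZE);        // placeholder: copy the next chunk of real data here
}

void submit_to_driver(int index) {  // hypothetical stand-in for the real submit call
  in_flight.push(index);
}

int main() {
  // Pre-fill and submit all five buffers before streaming starts, so the
  // driver always has look-ahead data even if the app is briefly delayed.
  for (int i = 0; i < NUM_BUFFERS; i++) { fill_buffer(buffers[i]); submit_to_driver(i); }

  // Steady state: wait for the oldest transfer to complete, refill it,
  // resubmit. With real hardware this is the driver's completion callback.
  for (int done_count = 0; done_count < 100; done_count++) {
    int done = in_flight.front(); in_flight.pop();  // "transfer completed"
    fill_buffer(buffers[done]);                     // top it up with fresh data
    submit_to_driver(done);                         // hand it straight back
  }
  printf("streamed 100 transfers\n");
  return 0;
}
```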
Hope that helps
- Tony
I have created a MediaStreamSource to decode a live internet audio stream and pass it to the BackgroundAudioPlayer. This now works very well on the device. However, I would now like to implement some form of buffering control. Currently everything works well over WLAN; however, I fear that in live situations over mobile operator networks there will be a lot of cutting in and out of the stream.
What I would like to find out is if anybody has any advice on how best to implement buffering.
Does the background audio player itself build up some sort of buffer before it begins to play, and if so, can the size of this buffer be increased if necessary?
Is there something I can set whilst sampling to help with buffering, or do I simply need to implement a kind of storage buffer as I retrieve the stream from the network, and build up a substantial reserve in it before sampling?
What approach have others taken to this problem?
Thanks,
Brian
One approach to this that I've seen is to have two processes managing the stream. The first gets the stream and writes it to a series of sequentially numbered files in Isolated Storage. The second reads the files and plays them.
Obviously that's a very simplified description but hopefully you get the idea.
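Platform aside, the shape of that two-process scheme looks roughly like the following C++ sketch (the names and sizes are invented; on Windows Phone the writer would target Isolated Storage and the reader would feed your MediaStreamSource):

```cpp
// One thread downloads the stream into sequentially numbered chunk
// files; the other consumes them in order. The file backlog on disk
// is the buffer, so playback is decoupled from download hiccups.
#include <atomic>
#include <chrono>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

std::atomic<int> last_written{-1};  // highest chunk index safely on disk

std::string chunk_name(int i) { return "chunk_" + std::to_string(i) + ".bin"; }

void downloader() {
  for (int i = 0; i < 100; i++) {
    std::vector<char> data(32 * 1024);  // placeholder: read this from the network
    std::ofstream(chunk_name(i), std::ios::binary).write(data.data(), data.size());
    last_written = i;                   // publish only after the file is complete
  }
}

void player() {
  for (int i = 0; i < 100; i++) {
    while (last_written < i)            // wait until the chunk exists
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::ifstream in(chunk_name(i), std::ios::binary);
    // ... feed the chunk's contents to the audio pipeline ...
  }
}

int main() {
  std::thread d(downloader), p(player);
  d.join(); p.join();
  return 0;
}
```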
I don't know how using a MediaStreamSource might affect this, but from experience with a simple Background Audio Player agent streaming direct from remote MP3 files or MP3 live radio streams:
- The player does build up a buffer of data received from the server before it will start playing your track.
- You can't control the size of this buffer or how long it takes to fill it (I've seen it take over a minute of buffering in some cases).
- Once playback starts, if you lose the connection or bandwidth drops so low that your buffer empties, the player doesn't try to rebuffer the audio, so you can lose the audio completely or it can cut in and out.
- You can't control that either.
Implementing the suggestion in Matt's answer solves this by allowing you to take control of the buffering and separates download and playback neatly.
I am making a VoIP application for a mobile platform. My question is: what algorithm should be used to calculate whether an RTP packet is "expired"?
I transfer PCMU-encoded audio wrapped in RTP packets over UDP.
As you know, some of the datagrams are not delivered, while others are delivered late, and it's pointless to play the audio in those packets.
Using the sequence number in the RTP header you can detect lost packets, but I want to know how to calculate when a packet is too late.
I saw that there is something called jitter, which basically measures the variation between the intervals at which consecutive packets are sent and the intervals at which they are received.
Can I use that? Or something else?
What your application considers "expired" or "too late" is really up to your application, but you should make sure you play out the audio evenly. So the measure for "too late" is the size of your playout buffer, and the size of that buffer depends on the type of application. Bidirectional communication needs a smaller buffer than simple movie playback.
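As a sketch of the fixed-delay version (PCMU's RTP clock is 8 kHz; the 100 ms playout delay is just an illustrative figure):

```cpp
// A packet is "too late" if it arrives after its scheduled playout
// slot, which is anchored on the arrival time of the first packet.
#include <chrono>
#include <cstdint>
#include <cstdio>

using Clock = std::chrono::steady_clock;

const uint32_t RTP_CLOCK_HZ = 8000;                         // PCMU timestamp rate
const auto PLAYOUT_DELAY    = std::chrono::milliseconds(100);

Clock::time_point base_arrival;   // arrival time of the first packet
uint32_t base_timestamp;          // RTP timestamp of the first packet
bool have_base = false;

bool is_late(uint32_t rtp_timestamp, Clock::time_point arrival) {
  if (!have_base) {               // first packet anchors the timeline
    base_arrival   = arrival;
    base_timestamp = rtp_timestamp;
    have_base      = true;
    return false;
  }
  // Scheduled playout = first arrival + playout delay + media time elapsed.
  // Unsigned subtraction handles RTP timestamp wraparound.
  auto media_offset = std::chrono::microseconds(
      (uint64_t)(rtp_timestamp - base_timestamp) * 1000000ULL / RTP_CLOCK_HZ);
  auto deadline = base_arrival + PLAYOUT_DELAY + media_offset;
  return arrival > deadline;
}

int main() {
  auto t0 = Clock::now();
  is_late(0, t0);
  // 1600 ticks = 200 ms of media; arriving 400 ms in is 100 ms past its slot.
  bool late = is_late(1600, t0 + std::chrono::milliseconds(400));
  printf("late: %d\n", late);     // prints 1
  return 0;
}
```

An adaptive jitter buffer would grow or shrink PLAYOUT_DELAY based on the measured interarrival jitter (the estimator defined in RFC 3550) rather than using a constant.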
I am looking to write an Arduino sketch that uses whatever audio signal is going to the speakers to create a physical visualization.
The Arduino is connected to the Windows machine only through USB, so I need to use USB to find out what is being sent to the speakers. How would I access this information?
As of right now, the Arduino can only communicate with the computer via serial over USB. Things have changed with the new Arduino Uno, but the examples have not yet been released to show how to have the new Arduino act as other USB devices.
You would have to write something for the Windows box that monitors the system audio and sends the info about it over serial to the Arduino, as long as you want it to connect only via USB.
There isn't a very good way to interface an audio signal to an Arduino without some external hardware.
One way to do it though would be to connect the audio line to a biased pin with a capacitor, then you could use the ADC directly. There will be pretty terrible dynamic range, but it only takes 3 passive parts. Running that through an opamp before going to the ADC pin could significantly improve dynamic range and provide a filtering opportunity (see below). Alternatively, you could switch on an on-chip voltage reference to use (typically 1-1.5 V) instead of the main supply.
It doesn't matter that much for a straight visualization, but the sample rate will not be good enough to capture the full spectral content of the audio (in addition to the poor dynamic range resolution). The default Arduino sample rate is 10 kHz-ish (and possibly asynchronous), so you will only get valid data if your signal is below 5 kHz; otherwise aliasing will muck it up. (If you write your own analog driver for the ATmega328P you can get up to a 76 kHz sample rate with 8-bit samples.)
Then, to actually communicate that data to a computer, you can fairly easily throw all those ADC values onto the UART for the computer to pick up and process as it sees fit. An ATmega will not have the power to compute FFTs on the fly (which is what you'd almost always do for a visualization anyway).
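For illustration, a minimal sketch of the biased-ADC approach, assuming the audio line is AC-coupled into A0 with a Vcc/2 resistor bias; it tracks the peak deviation over 20 ms windows and prints it for the PC side to visualize:

```cpp
// Sample the biased audio input on A0, strip the DC bias, and send a
// crude envelope value over serial for visualization on the computer.
const int AUDIO_PIN = A0;
const int BIAS      = 512;    // mid-scale of the 10-bit ADC with a Vcc/2 bias

void setup() {
  Serial.begin(115200);
}

void loop() {
  int peak = 0;
  // Collect ~20 ms of samples and track the peak deviation from bias.
  unsigned long t0 = millis();
  while (millis() - t0 < 20) {
    int deviation = abs(analogRead(AUDIO_PIN) - BIAS);
    if (deviation > peak) peak = deviation;
  }
  Serial.println(peak);       // 0..511; drives the visualization on the PC
}
```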
Or to skip all that, connect the audio signal to your computer's sound card (or USB sound card...they're pretty nice) and use some audio driver.
There is a Java library for Processing called ESS that lets you access the audio out.