I am making a VOIP application for mobile platform. My question is what algorithms should be used to calculate whether the RTP package is "expired".
I transfer PCMU encoded audio wrapped into RTP packages via UDP.
As you know some of the datagrams are not delivered while others are delivered late and its pointless to play the audio in those packages.
Using the sequence number in the RTP header you can calculate the lost packages, but I want to know how to calculate when a packet is late.
I saw that there is something called jitter which basically measures the difference between the times for sending two consecutive packages and receiving them.
Can I use that? Or something else?
What your application considers 'expired' or 'too late' is really up to your application, but you should make sure you play out the audio evenly. So the measure for too late is the size of your playout buffer and the size of the buffer depends on the type of application. Bidirectional communication will need a smaller buffer than simple movie playback.
Related
I have the following task: to organize the broadcast of several gamers on the director's computer, which will switch the image to, to put it simply, the one who currently has more interesting gameplay.
The obvious solution would be to raise an RTMP server and broadcast to it. We tried that. The image quality clearly correlates with the bitrate of the broadcast, but the streams aren't synchronized and there is no way to synchronize them. As far as I know, it's just not built into the RTMP protocol.
We also tried streaming via UDP, SRT and RTSP protocols. We got minimal delay but a very blurry image and artifacts from lost packets. It feels like all these formats are trying to achieve constant FPS and sacrifice the quality.
What we need:
A quality image.
Broken frames can be discarded (it's okay to have not constant FPS).
Latency isn't important.
The streams should be synchronized within a second or two.
There is an assumption that broadcasting on UDP should be a solution, but some kind of intermediate buffer is needed to provide the necessary broadcasting conditions. But I don't know how to do that. I assume that we need an intermediate ffmpeg instance, which will read the incoming stream, buffer it and publish the result to some local port, from which the picture will be already taken by the director's OBS.
Is there any solution to achieve our goals?
NDI is perfect for this, and in fact used a lot to broadcast games. Providing your network is in order, it offers great quality at very low latency and comes with a free utility to capture the screens and output them in NDI. There are several programmes supporting the NDI intake and the broadcasting (I developed one of them). With proper soft- and hardware you can quite easily handle a few dozen games. If you're limited to OBS then you'll have to check it supports NDI, but I'd find that very likely. Not sure which programmes support synchronisation between streams, but there's at least one ;). See also ndi.newtek.com.
I am currently researching the feasibility of making a device that outputs PCM audio through the USB audio class streaming interface. This device has its own clock, and importantly does not generate samples at a multiple of 1 hz as the USB spec can specify, and produces packets in asynchronous mode. How does Windows handle it when a USB audio stream is consistently giving it samples at a rate above or below what the USB descriptor indicates, and at what level of the OS is this handled?
Second (and depending on the answer to the first question this may be already answered), the entire purpose of this project would be to capture this digital audio in its native format and sampling rate. What application Windows APIs would provide the exact PCM input from the USB audio stream, with no interpolation or other alterations or artifacts?
IDK about Windows in specific here, but Java on Windows would likely be set up to read the data as an AudioInputStream and would output as a SourceDataLine.
As far as timing issues, PCM processed by the SourceDataLine will be configured to a given sample rate and byte structure (configuration details provided in an AudioFormat class). The code underlying a SourceDataLine employs a buffer and a I think some sort of BlockingQueue or the native code implementation of something similar. I'm not entirely clear on this latter detail.
But the gist is that the SourceDataLine will suspend operation until it is able to fulfill its task. Thus, if the native code's DAC function is not ready, the SourceDataLine thread will suspend and wait until the output stage is ready to accept the next block of data for processing.
There are multiple transmission layers on the incoming data, much of which I don't know enough about. But I presume that if you have a way of assembling the incoming packets into a stream (with whatever buffering is required), then you should be able to receive and play PCM. Surely there are structures in C that would provide the equivalent functions of the Java classes I cited.
I am looking to build an application that requires real-time control (or as good as possible) control of audio output in OSX.
I need the ability to send samples of audio to the sound-card with as much control as possible, no delays as when the audio frames are sent will be closely tied to a timer event run via the clock.
Is the Audio Queue what I am looking for?
Audio Units can be configured for the lowest latency (using short buffers of raw PCM sample) in OS X and iOS. The Audio Queue API is built on top of Audio Units, and may have buffering overhead, thus increasing latency.
I'm looking for the fastest way to encode a webcam stream that will be viewable in a html5 video tag. I'm using a Pandaboard: http://www.digikey.com/product-highlights/us/en/texas-instruments-pandaboard/686#tabs-2 for the hardware. Can use gstreamer, cvlc, ffmpeg. I'll be using it to drive a robot, so need the least amount of lag in the video stream. Quality doesn't have to be great and it doesn't need audio. Also, this is only for one client so bandwidth isn't an issue. The best solution so far is using ffmpeg with a mpjpeg gives me around 1 sec delay. Anything better?
I have been asked this many times so I will try and answer this a bit generically and not just for mjpeg. Getting very low delays in a system requires a bit of system engineering effort and also understanding of the components.
Some simple top level tweaks I can think of are:
Ensure the codec is configured for the lowest delay. Codecs will have (especially embedded system codecs) a low delay configuration. Enable it. If you are using H.264 it's most useful. Most people don't realize that by standard requirements H.264 decoders need to buffer frames before displaying it. This can be upto 16 for Qcif and upto 5 frames for 720p. That is a lot of delay in getting the first frame out. If you do not use H.264 still ensure you do not have B pictures enabled. This adds delay to getting the first picture out.
Since you are using mjpeg, I don't think this is applicable to you much.
Encoders will also have a rate control delay. (Called init delay or vbv buf size). Set it to the smallest value that gives you acceptable quality. That will also reduce the delay. Think of this as the bitstream buffer between encoder and decoder. If you are using x264 that would be the vbv buffer size.
Some simple other configurations: Use as few I pictures as possible (large intra period).
I pictures are huge and add to the delay to send over the network. This may not be very visible in systems where end to end delay is in the range of 1 second or more but when you are designing systems that need end to end delay of 100ms or less, this and several other aspects come into play. Also ensure you are using a low latency audio codec aac-lc (and not heaac).
In your case to get to lower latencies I would suggest moving away from mjpeg and use at least mpeg4 without B pictures (Simple profile) or best is H.264 baseline profile (x264 gives a zerolatency option). The simple reason you will get lower latency is that you will get lower bitrate post encoding to send the data out and you can go to full framerate. If you must stick to mjpeg you have close to what you can get without more advanced features support from the codec and system using the open source components as is.
Another aspect is the transmission of the content to the display unit. If you can use udp it will reduce latency quite a lot compared to tcp, though it can be lossy at times depending on network conditions. You have mentioned html5 video. I am curious as to how you are doing live streaming to a html5 video tag.
There are other aspects that can also be tweaked which I would put in the advanced category and requires the system engineer to try various things out
What is the network buffering in the OS? The OS also buffers data before sending it out for performance reasons. Tweak this to get a good balance between performance and speed.
Are you using CR or VBR encoding? While CBR is great for low jitter you can also use capped vbr if the codec provides it.
Can your decoder start decoding partial frames? So you don't have to worry about framing the data before providing it to the decoder. Just keep pushing the data to the decoder as soon as possible.
Can you do field encoding? Halves the time from frame encoding before getting the first picture out.
Can you do sliced encoding with callbacks whenever a slice is available to send over the network immediately?
In sub 100 ms latency systems that I have worked in all of the above are used. Some of the features may not be available in open source components but if you really need it and are enthusiastic you could go ahead and implement them.
EDIT:
I realize you cannot do a lot of the above for a ipad streaming solution and there are limitations because of hls also to the latency you can achieve. But I hope it will prove useful in other cases when you need any low latency system.
We had a similar problem, in our case it was necessary to time external events and sync them with the video stream. We tried several solutions but the one described here solved the problem and is extremely low latency:
Github Link
It uses gstreamer transcode to mjpeg which is then sent to a small python streaming server. This has the advantage that it uses the tag instead of so it can be viewed by most modern browsers, including the iPhone.
As you want the <video> tag, a simple solution is to use http-launch. That
had the lowest latency of all the solutions we tried so it might work for you. Be warned that ogg/theora will not work on Safari or IE so those wishing to target the Mac or Windows will have to modify the pipe to use MP4 or WebM.
Another solution that looks promising, gst-streaming-server. We simply couldn't find enough documentation to make it worth pursuing. I'd grateful if somebody could ask a stackoverflow question about how it should be used!
Let me explain what I mean when I say fluent audio stream.
I have a VOIP application which transfers PCMU encoded audio wrapped in RTP packages through UDP. I already implemented mechanisms which deal with package losses(as suggested in rfc3550).
The problem is that due to platform limitations(blackberry OS) I need to maintain a constant flow of data i.e. I need to pass X bytes every S milliseconds.
Because of network delays, undelivered datagrams etc. I can't guarantee that constant data flow so I created a separate thread which compensates the packages which were dropped or delivered late with fake packages("silence").
So my question is - can anyone suggest a good way to combine the fake packages and the real ones? I realize that adding a fake package automatically increases the lag and maybe I should ignore a real RTP packages after that but as I said this is because of platform limitations and I am willing to make compromises with the quality of the audio and have some additional speech loss.
You need to read up on:
Jitter Buffers
Packet Loss Concealment
These exist to handle exactly the sort of problems you're dealing with.