Pion WebRTC Audio stream cutting out while video works - go

I am trying to send an MP4 video through Pion WebRTC to the browser.
Using FFmpeg, I split it into an Opus OGG stream and an Annex-B H.264 video stream. While the video works fine, the audio keeps cutting in and out. It plays fine for a few seconds, then stops for a second, and continues.
This is the FFmpeg command I use for audio:
ffmpeg -i demo.mp4 -c:a libopus -vn -page_duration 20000 demo.ogg
And this is my transmitter (shortened):
var lastGranule uint64
for {
	pageData, pageHeader, err := ogg.ParseNextPage() // uses Pion's OggReader
	util.HandleError(err)

	// Taken from the play-from-disk example: derive the page duration from the
	// granule position delta (granule positions count 48 kHz samples).
	sampleCount := float64(pageHeader.GranulePosition - lastGranule)
	lastGranule = pageHeader.GranulePosition
	sampleDuration := time.Duration((sampleCount/48000)*1000) * time.Millisecond

	err = audioTrack.WriteSample(media.Sample{Data: pageData, Duration: sampleDuration})
	util.HandleError(err)

	time.Sleep(sampleDuration)
}
I tried hardcoding the delay to 15ms, which fixes the cutting out, but then the audio randomly plays way too fast or starts skipping. Since I had glitchy video before I updated my FFmpeg command (adding keyframes and removing B-frames), I assume this is also an encoder problem.
What could be the cause for this?
Update: Using WebRTC logging in Chrome, I discovered the following log lines that occurred frequently:
[27216:21992:0809/141533.175:WARNING:rtcp_receiver.cc(452)] 30 RTCP blocks were skipped due to being malformed or of unrecognized/unsupported type, during the past 10 second period.
This is probably the reason for the cutouts, although I can't figure out why it receives malformed data.

The problem in the end was an inaccuracy in the Sleep time, caused by issue #44343 in Go itself. It caused the samples not to be sent at a constant rate, but at a rate that was randomly between 5 and 15 ms off, resulting in a choppy stream.
Sean DuBois and I fixed this in the latest play-from-disk and play-from-disk-h264 examples in the Pion repository by replacing the for-loop and Sleep() with a Ticker, which is more accurate.
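For reference, here is a minimal sketch of that Ticker-based pacing, adapted to the shortened transmitter above. It is not the exact code from the examples; it assumes the 20 ms OGG pages produced by -page_duration 20000, and the errors/io imports needed for the EOF check.

// A Ticker fires at a fixed interval, so the pacing no longer accumulates
// the drift that time.Sleep introduced. With -page_duration 20000 every
// OGG page is 20 ms long, so a fixed tick matches the sample duration.
ticker := time.NewTicker(20 * time.Millisecond)
defer ticker.Stop()
for ; true; <-ticker.C {
	pageData, _, err := ogg.ParseNextPage()
	if errors.Is(err, io.EOF) {
		break // the whole file has been sent
	}
	util.HandleError(err)

	err = audioTrack.WriteSample(media.Sample{Data: pageData, Duration: 20 * time.Millisecond})
	util.HandleError(err)
}

Even when an individual wake-up is late, the Ticker keeps the long-run rate at one page per 20 ms, which the Sleep-based loop could not guarantee.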

Related

How to scale and mux audio?

The first problem is with audio resampling. I'm trying to redo doc/examples/transcode_aac.c so that it also resamples from 41100 to 48000; it contained a warning that it can't do that.
Using doc/examples/resampling_audio.c as a reference, I saw that before doing swr_convert, I need to find the number of audio samples at the output with the code like this:
int dst_nb_samples = av_rescale_rnd(input_frame->nb_samples + swr_get_delay(resampler_context, 41100),
                                    48000, 41100, AV_ROUND_UP);
The problem is, when I just set int dst_nb_samples = input_frame->nb_samples (which is 1024), it encodes and plays normally, but when I do the av_rescale_rnd computation (which results in 1196), the audio is slowed down and distorted, as if there are skips in the audio.
The second problem is with trying to mux WebM with Opus audio.
When I set AVStream->time_base to 1/48000 and increase AVFrame->pts by 960, the resulting file is shown in the player as much longer than it is: 17 seconds of audio shows as 16m11s, but it plays normally.
When I increase pts by 20, it displays normally, but there are a lot of [libopus @ 00ffa660] Queue input is backward in time messages during encoding. The same happens for pts += 30.
Should I try a time_base of 1/1000? WebM always has timecodes in milliseconds, and Opus has a packet size of 20 ms (960 samples at 48000 Hz).
Search for pts += 20; in the file below.
Here is the whole file; all modifications I made are marked with //MINE: http://www.mediafire.com/file/jlgo7x4hiz7bw64/transcode_aac.c
Here is the file I tested it on: http://www.mediafire.com/file/zdy0zarlqw3qn6s/480P_600K_71149981_soundonly.mkv
The easiest way to achieve that is by using swr_convert_frame, which takes a frame and resamples it into a completely different one.
You can read more about it here: https://ffmpeg.org/doxygen/3.2/swresample_8h_source.html
dst_nb_samples can be calculated like this:
dst_nb_samples = 48000.0 / audio_stream->codec->sample_rate * inputAudioFrame->nb_samples;
Yours is probably correct too; I didn't check, but this is the one I've used before. Compare it with yours, but the number you gave checks out, so the real problem is probably somewhere else. Try supplying 960 samples in sync with the video frames; to do this you need to store the audio samples in an additional linear buffer. See if that fixes the problem.
And/or:
Secondly, in my experience the audio pts increases by the number of samples per frame (i.e. 960 for 50 fps video at 48000 Hz, since 48000/50 = 960), not by milliseconds. If you supply 1196 samples, use pts += 1196 (if you haven't used the additional buffer I mentioned above). This is different from the video frame pts. Hope that helps.
You are definitely on the right path. I'll examine the source code if I have time. Anyway, hope that helps.
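To put rough numbers on both points, using only the values already given above (plain arithmetic, nothing library-specific):
1024 samples * 48000 / 41100 ≈ 1195.9, which AV_ROUND_UP turns into 1196, so the two dst_nb_samples formulas above agree.
One Opus packet is 20 ms. With a time_base of 1/48000 that is 0.020 * 48000 = 960 ticks, so pts += 960; with a time_base of 1/1000 it is 0.020 * 1000 = 20 ticks, so pts += 20. The pts step has to be expressed in whichever time_base the stream actually uses.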

FFMPEG: RTSP to HLS restream stops with "No more output streams to write to, finishing."

I'm trying to do a live restream of an RTSP feed from a webcam using ffmpeg, but the stream repeatedly stops with the error:
"No more output streams to write to, finishing."
The problem seems to get worse at higher bitrates (256kbps is mostly reliable) and is pretty random in its occurrence. At 1mbps, sometimes the stream will run for several hours without any trouble, on other occasions the stream will fail every few minutes. I've got a cron job running which restarts the stream automatically when it fails, but I would prefer to avoid the continued interruptions.
I have seen this problem reported in a handful of other forums, so this is not a unique problem, but none of those reports had a solution attached. My ffmpeg command looks like this:
ffmpeg -loglevel verbose -r 25 -rtsp_transport tcp -i rtsp://user:password@camera.url/live/ch0 -reset_timestamps 1 -movflags frag_keyframe+empty_moov -bufsize 7168k -stimeout 60000 -hls_flags temp_file -hls_time 5 -hls_wrap 180 -acodec copy -vcodec copy streaming.m3u8 > encode.log 2>&1
What gets me is that the error makes no sense: this is a live stream, so output is always wanted until I shut off the stream. Having it shut down because output isn't wanted is downright odd. If ffmpeg were complaining about a problem with the input, it would make more sense.
I'm running version 3.3.4, which I believe is the latest.
Update 13 Oct 17:
After extensive testing I've established that the "No more outputs" error message generated by FFMPEG is very misleading. The error seems to be generated if the data coming in from RTSP is delayed, e.g. by other activity on the router the camera is connected via. I've got a large buffer and timeout set which should be sufficient for 60 seconds, but I can still deliberately trigger this error with far shorter interruptions, so clearly the buffer and timeout aren't having the desired effect. This might be fixed by setting a QoS policy on the router and by checking that the TCP packets from the camera have a suitably high priority set; it's possible this isn't the case.
However, I would still like to improve the robustness of the input stream if it is briefly interrupted. Is there any way to persuade FFMPEG to tolerate this or to actually make use of the buffer it seems to be ignoring? Can FFMPEG be persuaded to simply stop writing output and wait for input to become available rather than bailing out? Or could I get FFMPEG to duplicate the last complete frame until it's able to get more data? I can live with the stream stuttering a bit, but I've got to significantly reduce the current behaviour where the stream drops at the slightest hint of a problem.
Further update 13 Oct 2017:
After more tests, I've found that the problem actually seems to be that HLS is incapable of coping with a discontinuity in the incoming video stream. If I deliberately cut the network connection between the camera and FFMPEG, FFMPEG will wait for the connection to be re-established for quite a long time. If the interruption was long (>10 seconds) the stream will immediately drop with the "No More Outputs" error the instant that the connection is re-established. If the interruption is short, then RTSP will actually start pulling data from the camera again, but the stream will then drop with the same error a few seconds later. So it seems clear that the gap in the input data is causing the HLS encoder to have a fit and give up once the stream is resumed, but the size of the gap has an impact on whether the drop is instant or not.
I had a similar problem. In my case the stream stopped without any errors after a few minutes. I fixed this by switching from FreeBSD to Linux. Maybe the problem is bad package dependencies or the ffmpeg version, so my suggestion is to try an older or newer version of ffmpeg or another OS.
Update: Actually this doesn't solve the problem. I've tested a bit more and the stream stopped after 15 minutes.
I've been facing the same problem. After extended trial and error I found that the problem resided in my CCTV camera's parameters. More exactly, I adjusted the key frame interval parameter to match the frame rate of the recording camera.
My syntax (Windows):
SET cam1_rtsp="rtsp://192.168.0.93:554/11?timeout=30000000&listen_timeout=30000000&recv_buffer_size=30000000"
ffmpeg -rtsp_transport tcp -vsync -1 -re -i %cam1_rtsp% -vcodec copy -af apad -shortest -async 1 -strftime 1 -reset_timestamps 1 -metadata title=\\"Cam\\" -map 0 -f segment -segment_time 300 -segment_atclocktime 1 -segment_format mp4 CCTV\%%Y-%%m-%%d_%%H-%%M-%%S.mp4 -loglevel verbose
After this correction I got a smooth 120-hour input stream with no errors.
Hope this helps anyone.

Can ffmpeg periodically report statistics on a real-time audio stream (rather than file)?

I currently use ffmpeg to capture the desktop screen and the audio that the computer speakers are playing, something like a screencast. ffmpeg is started by an app that captures its console output, so I can have that app read the output and look for information.
I'd like to know if there is a set of switches I can supply to ffmpeg whereby it will periodically output some audio statistics that will directly report, or allow me to infer, that the audio stream has gone silent.
I see some audio statistics switches/filters, but the help docs for these seem to imply they collect their stats over the processing of an entire stream and then report them at the end. I'd prefer something like "the average audio volume over the past 5 seconds" reported every 5 seconds. I could even deduce it from the encoder's audio bitrate, I think, if it's VBR and the rate consistently falls because it's encoding nothing.
Turns out there's a silencedetect audio filter:
https://ffmpeg.org/ffmpeg-filters.html#silencedetect
It works just fine on the streaming audio, used like:
//some switches have been removed for clarity
ffmpeg -i audio="Line 1 (Virtual Audio Cable)" -af silencedetect=n=-50dB:d=5
The d=5 sets the minimum duration of silence, in seconds, before it is reported. After 5 seconds of silence, the standard output/error has something like the following pumped into it:
[silencedetect @ 0000000002ffe5a0] silence_start: 12.345
After noise returns, something like the following appears in the console
[silencedetect @ 0000000002ffe5a0] silence_end: 23.456 | silence_duration: 11.111
It's the job of the app reading the output to parse this and do something with it. In my case, as the recording is unattended, I'll signal an alert that the screencast has lost audio.
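Purely as an illustration (the controlling app's language isn't specified above), a small Go program along these lines could launch ffmpeg and watch its stderr for the silencedetect lines. The -f dshow input flag and the null output are assumptions added only to keep the sketch self-contained.

package main

import (
	"bufio"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// silencedetect writes its messages to ffmpeg's log output (stderr).
	cmd := exec.Command("ffmpeg",
		"-f", "dshow", "-i", "audio=Line 1 (Virtual Audio Cable)",
		"-af", "silencedetect=n=-50dB:d=5",
		"-f", "null", "-")
	stderr, err := cmd.StderrPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	scanner := bufio.NewScanner(stderr)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.Contains(line, "silence_start"):
			log.Println("audio went silent:", line)
		case strings.Contains(line, "silence_end"):
			log.Println("audio came back:", line)
		}
	}
	_ = cmd.Wait()
}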

FFmpegFrameRecorder calls record() 20 times but the resulting mp4 file only has 2 frames

I'm using FFmpegFrameRecorder to create mp4(H264) video from camera preview. My recorder configuration is as follows.
recorder = new FFmpegFrameRecorder(filePath, width, height);
recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
recorder.setFormat("mp4");
recorder.setFrameRate(VIDEO_FPS);
recorder.setVideoBitrate(16384);
recorder.setPixelFormat(avutil.AV_PIX_FMT_YUV420P);
For the rest I followed the sample code RecordActivity.java closely and was able to verify that
recorder.record(yuvIplimage)
gets called 20 (or more) times, which should create an mp4 with 20 frames. However, the resulting mp4 file, when opened up, only has 2 frames (the first two frames of the preview)! I have no idea what has caused this behavior. Any help would be greatly appreciated. Thank you.
I figured it out: the issue was that I didn't know what I was doing. I was new to javacv, and I was assuming, based on this stackoverflow entry, that the number of frames in the resulting video should be equal to the number of record() calls. However, this is not the case with video encoding, especially with H264. I figured this out by trying MPEG4 encoding, where there were definitely more than 2 frames. H264 seems to require a minimum number of input frames and hence is not suitable for generating short (<1 minute) video clips (which is my application). One solution is to switch to MPEG4 encoding. However, most browsers that do play .mp4 files do not support MPEG4 encoding. Another solution is to use H264 with minimal compression, by adding the following configuration:
recorder.setVideoQuality(0); // maximum quality; replaces recorder.setVideoBitrate(16384)
recorder.setVideoOption("preset", "veryfast"); // or ultrafast, fast, etc.

Extract frames as images from an RTMP stream in real-time

I am streaming short videos (4 or 5 seconds), encoded in H264 at 15 fps in VGA quality, from different clients to a server using RTMP, which produces an FLV file. I need to analyse the frames from the video as images as soon as possible, so I need the frames to be written as PNG images as they are received.
Currently I use Wowza to receive the streams, and I have tried using the transcoder API to access the individual frames and write them to PNGs. This partially works, but there is about a second of delay before the transcoder starts processing, and when the stream ends Wowza flushes its buffers, causing the last second not to get transcoded, meaning I can lose the last 25% of the video frames. I have tried to find a workaround, but Wowza say that it is not possible to prevent the buffer getting flushed. It is also not the ideal solution because there is a 1 second delay before I start getting frames, and I have to re-encode the video when using the transcoder, which is computationally expensive and unnecessary for my needs.
I have also tried piping a video in real-time to FFmpeg and getting it to produce the PNG images but unfortunately it waits until it receives the entire video before producing the PNG frames.
How can I extract all of the frames from the stream as close to real-time as possible? I don’t mind what language or technology is used as long as it can run on a Linux server. I would be happy to use FFmpeg if I can find a way to get it to write the images while it is still receiving the video or even Wowza if I can find a way not to lose frames and not to re-encode.
Thanks for any help or suggestions.
Since you linked this question from the red5 user list, I'll add my two cents. You may certainly grab the video frames on the server side, but the issue you'll run into is transcoding from h.264 into PNG. The easiest way would be to use ffmpeg / avconv after getting the VideoData object. Here is a post that gives some details about getting the VideoData: http://red5.5842.n7.nabble.com/Snapshot-Image-from-VideoData-td44603.html
Another option is on the player side using one of Dan Rossi's FlowPlayer plugins: http://flowplayer.electroteque.org/snapshot
I finally found a way to do this with FFmpeg. The trick was to disable audio, use a different FLV metadata analyser, and reduce the duration that FFmpeg waits before processing. My FFmpeg command now starts like this:
ffmpeg -an -flv_metadata 1 -analyzeduration 1 ...
This starts producing frames within a second of receiving input from a pipe, so it writes the streamed frames pretty close to real time.
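As a purely illustrative sketch (not the remainder of the actual command, which is elided above), a complete invocation along those lines could look like this, with the FLV data written to the process's standard input as it arrives:
ffmpeg -an -flv_metadata 1 -analyzeduration 1 -i pipe:0 -f image2 frame_%04d.png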
