Video Streaming is slow with OpenCV - performance

I'm writing a video processing application in C++ with OpenCV. This is how I initialize the webcam:
CvMemStorage *storage = cvCreateMemStorage( 0 );
CvCapture *capture = cvCaptureFromCAM( 1 );
IplImage *frame = NULL;
int key = 0;

cvNamedWindow( "video", 1 );
while( key != 'q' ) {
    frame = cvQueryFrame( capture );
    if( !frame ) {
        fprintf( stderr, "Cannot query frame!\n" );
        break;
    }
    cvFlip( frame, frame, 1 );   /* mirror horizontally */
    frame->origin = 0;
    key = cvWaitKey( 1 );
}
Can anyone suggest a way to increase the speed of capturing frames from the webcam? There is roughly a 3-second delay in the OpenCV application's video stream compared to the actual webcam feed.
Thank you.

What version of OpenCV are you using? Are you using the build that uses Intel Threading Building Blocks (tbb.dll)? If not, then use it; that is your speed-up right there.
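A quick way to check (a sketch, assuming OpenCV 2.4 or later, where cv::getBuildInformation() is available) is to print the build configuration and look for TBB in the parallel-framework section:

#include <opencv2/core/core.hpp>
#include <iostream>

int main() {
    // Dumps the compile-time configuration, including which
    // parallel framework (TBB, OpenMP, ...) the build was built with.
    std::cout << cv::getBuildInformation() << std::endl;
    return 0;
}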
You can also try a bare-bones version just to see what kind of speed-up you get:
storage = cvCreateMemStorage( 0 );
capture = cvCaptureFromCAM( 1 );
while( 1 ) {
    frame = cvQueryFrame( capture );
    cvWaitKey( 1 );
}
Other than that, I suggest using the C++ interface to OpenCV; the C interface is quite ugly and might be slower.
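For reference, here is a minimal sketch of the same loop using the C++ interface (cv::VideoCapture), keeping the device index 1 from the original code:

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture capture( 1 );
    if ( !capture.isOpened() )
        return 1;

    cv::Mat frame;
    while ( true ) {
        capture >> frame;              // grab and decode the next frame
        if ( frame.empty() )
            break;
        cv::flip( frame, frame, 1 );   // mirror horizontally, as in the original
        cv::imshow( "video", frame );
        if ( cv::waitKey( 1 ) == 'q' )
            break;
    }
    return 0;
}

The C++ interface also manages memory for you, so there are no cvReleaseXxx calls to forget.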

Related

FFmpeg libavcodec decode then re-encode video issue

I'm trying to use the libavcodec library in FFmpeg to decode and then re-encode an H.264 video.
I have the decoding part working (it renders to an SDL window fine), but when I try to re-encode the frames I get bad data in the re-encoded video's samples.
Here is a cut-down snippet of my encode logic:
EncodeResponse H264Codec::EncodeFrame(AVFrame* pFrame, StreamCodecContainer* pStreamCodecContainer, AVPacket* pPacket)
{
    int result = avcodec_send_frame(pStreamCodecContainer->pEncodingCodecContext, pFrame);
    if (result < 0)
    {
        return EncodeResponse::Fail;
    }

    while (result >= 0)
    {
        result = avcodec_receive_packet(pStreamCodecContainer->pEncodingCodecContext, pPacket);

        // If the encoder needs more frames to produce a packet, return and wait
        // for this method to be called again when a new frame is available.
        // Otherwise check whether encoding failed for some reason.
        // Otherwise a packet has successfully been returned; write it to the file.
        if (result == AVERROR(EAGAIN) || result == AVERROR_EOF)
        {
            // Higher-level logic decodes the next frame from the source
            // video and then calls this method again.
            return EncodeResponse::SendNextFrame;
        }
        else if (result < 0)
        {
            return EncodeResponse::Fail;
        }
        else
        {
            // Prepare the packet for muxing.
            if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
            {
                av_packet_rescale_ts(pPacket, pStreamCodecContainer->pEncodingCodecContext->time_base,
                    m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base);
            }
            pPacket->stream_index = pStreamCodecContainer->streamIndex;

            // Use a separate variable so the receive loop's result isn't clobbered.
            int writeResult = av_interleaved_write_frame(m_pEncodingFormatContext, pPacket);
            av_packet_unref(pPacket);
        }
    }

    return EncodeResponse::EncoderEndOfFile;
}
One strange behaviour I notice is that I have to send 50+ frames to avcodec_send_frame before I get the first packet from avcodec_receive_packet.
I built a debug build of FFmpeg and, stepping into the code, I noticed that AVERROR(EAGAIN) is returned by avcodec_receive_packet because of the following check in x264_encoder_encode in encoder.c:
if( h->frames.i_input <= h->frames.i_delay + 1 - h->i_thread_frames )
{
    /* Nothing yet to encode, waiting for filling of buffers */
    pic_out->i_type = X264_TYPE_AUTO;
    return 0;
}
For some reason my codec context (h) never has any frames. I have spent a long time trying to debug FFmpeg and determine what I'm doing wrong, but I have reached the limit of my video codec knowledge (which is little).
I'm testing this with a video that has no audio, to reduce complications.
I have created a cut-down version of my application and provided a self-contained project (with FFmpeg and SDL built dependencies). Hopefully this can help anyone willing to help me :).
Project Link
https://github.com/maxhap/video-codec
After looking into encoder initialisation I found that I have to set the AV_CODEC_FLAG_GLOBAL_HEADER flag on the codec context before calling avcodec_open2:
pStreamCodecContainer->pEncodingCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
This change made the re-encoded moov box look much healthier (I used MP4Box.js to parse it). However, the video still does not play correctly: the output has grey frames at the start when played in VLC and won't play in other players.
I have since tried creating the encoding context as in the FFmpeg transcoding sample code, rather than from my decoding codec parameters. This fixed the bad-data encoding issue. However, my DTS values were being rescaled to huge numbers.
Here is my new codec initialisation:
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
    pStreamCodecContainer->pEncodingCodecContext->height = pStreamCodecContainer->pDecodingCodecContext->height;
    pStreamCodecContainer->pEncodingCodecContext->width = pStreamCodecContainer->pDecodingCodecContext->width;
    pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;

    /* take first format from list of supported formats */
    if (pStreamCodecContainer->pEncodingCodec->pix_fmts)
    {
        pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pEncodingCodec->pix_fmts[0];
    }
    else
    {
        pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pDecodingCodecContext->pix_fmt;
    }

    /* video time_base can be set to whatever is handy and supported by encoder */
    pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
}
else
{
    pStreamCodecContainer->pEncodingCodecContext->channel_layout = pStreamCodecContainer->pDecodingCodecContext->channel_layout;
    pStreamCodecContainer->pEncodingCodecContext->channels =
        av_get_channel_layout_nb_channels(pStreamCodecContainer->pEncodingCodecContext->channel_layout);

    /* take first format from list of supported formats */
    pStreamCodecContainer->pEncodingCodecContext->sample_fmt = pStreamCodecContainer->pEncodingCodec->sample_fmts[0];
    pStreamCodecContainer->pEncodingCodecContext->time_base = AVRational{ 1, pStreamCodecContainer->pEncodingCodecContext->sample_rate };
}
Any ideas why my DTS values are being rescaled incorrectly?
I managed to fix the DTS scaling by using the time_base value directly from the decoding stream.
So
pStreamCodecContainer->pEncodingCodecContext->time_base = m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base;
Instead of
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
I will create an answer based on all my findings.
To fix the initial problem of a corrupted moov box I had to add the AV_CODEC_FLAG_GLOBAL_HEADER flag to the encoding codec context before calling avcodec_open2.
encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
The next issue was badly scaled DTS values in the encoded packets, which had the side effect of making the final mp4's duration hundreds of hours long. To fix this I had to change the encoding codec context's time base to that of the decoding context's stream. This is different from using av_inv_q(framerate), as suggested in the avcodec transcoding example.
encCodecContext->time_base = decCodecFormatContext->streams[streamIndex]->time_base;
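Putting both fixes together, the relevant part of encoder initialisation looks roughly like this (a sketch; encCodecContext, decCodecFormatContext, and streamIndex follow the naming of the snippets above, and pEncodingCodec stands for the AVCodec chosen for encoding):

// Fix 1: the MP4 muxer needs global headers, set before avcodec_open2.
encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

// Fix 2: reuse the decoding stream's time base so DTS values rescale correctly.
encCodecContext->time_base = decCodecFormatContext->streams[streamIndex]->time_base;

int result = avcodec_open2(encCodecContext, pEncodingCodec, nullptr);
if (result < 0)
{
    // Handle the error, e.g. log av_err2str(result), and abort encoder setup.
}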

How to have a custom video media/stream sink request RGB32 frames in Media Foundation?

I am trying to make a custom media sink for video playback in an OpenGL application (without the various WGL_NV_DX_INTEROP extensions, as I am not sure all my target devices support them).
What I have done so far is to write a custom stream sink that accepts RGB32 samples and to set up playback with a media session. However, I encountered two problems when initially testing playback of an mp4 file:
- One (or more) of the MFTs in the generated topology keeps failing with the error code MF_E_TRANSFORM_NEED_MORE_INPUT, so my stream sink never receives samples.
- After a few samples have been requested, the media session receives the event MF_E_ATTRIBUTENOTFOUND, but I still don't know where it is coming from.
If, however, I configure the stream sink to receive NV12 samples, everything seems to work fine.
My best guess is that the color converter MFT generated by the TopologyLoader needs some more configuration, but I don't know how to do that, considering that I need to keep this entire process independent of the original file types.
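(For reference, a hypothetical sketch of what configuring a color converter by hand can look like, using the stock Color Converter DSP (CLSID_CColorConvertDMO) to go from NV12 to RGB32; the function name and its parameters are invented for this illustration and are not part of the question's code:)

#include <mfapi.h>
#include <mfidl.h>
#include <mftransform.h>
#include <wmcodecdsp.h>   // CLSID_CColorConvertDMO

HRESULT CreateNv12ToRgb32Converter(UINT32 width, UINT32 height, IMFTransform** ppMFT)
{
    IMFTransform* pMFT = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_CColorConvertDMO, nullptr,
                                  CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pMFT));
    if (FAILED(hr)) return hr;

    // Describe the NV12 input the converter will receive.
    IMFMediaType* pInType = nullptr;
    hr = MFCreateMediaType(&pInType);
    if (SUCCEEDED(hr)) hr = pInType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (SUCCEEDED(hr)) hr = pInType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
    if (SUCCEEDED(hr)) hr = MFSetAttributeSize(pInType, MF_MT_FRAME_SIZE, width, height);
    if (SUCCEEDED(hr)) hr = pMFT->SetInputType(0, pInType, 0);

    // Describe the RGB32 output the custom sink wants.
    IMFMediaType* pOutType = nullptr;
    if (SUCCEEDED(hr)) hr = MFCreateMediaType(&pOutType);
    if (SUCCEEDED(hr)) hr = pOutType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (SUCCEEDED(hr)) hr = pOutType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    if (SUCCEEDED(hr)) hr = MFSetAttributeSize(pOutType, MF_MT_FRAME_SIZE, width, height);
    if (SUCCEEDED(hr)) hr = pMFT->SetOutputType(0, pOutType, 0);

    if (pInType) pInType->Release();
    if (pOutType) pOutType->Release();

    if (SUCCEEDED(hr)) { *ppMFT = pMFT; }
    else if (pMFT) { pMFT->Release(); }
    return hr;
}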
I've made a minimal test case that demonstrates the use of a custom video renderer with a classical Media Session.
I use big_buck_bunny_720p_50mb.mp4, and I don't see any problems using the RGB32 format.
Sample code is here, under MinimalSinkRenderer: https://github.com/mofo7777/Stackoverflow
EDIT
Your program works well with big_buck_bunny_720p_50mb.mp4. I think that your mp4 file is the problem. Share it, if you can.
I just made a few changes:
Stop on MESessionEnded, and Close on MESessionStopped:
case MediaEventType.MESessionEnded:
    Debug.WriteLine("MediaSession:SessionEndedEvent");
    hr = mediaSession.Stop();
    break;

case MediaEventType.MESessionClosed:
    Debug.WriteLine("MediaSession:SessionClosedEvent");
    receiveSessionEvent = false;
    break;

case MediaEventType.MESessionStopped:
    Debug.WriteLine("MediaSession:SessionStoppedEvent");
    hr = mediaSession.Close();
    break;

default:
    Debug.WriteLine("MediaSession:Event: " + eventType);
    break;
I added this to wait for the sound and to check that the sample is OK:
internal HResult ProcessSample(IMFSample s)
{
    //Debug.WriteLine("Received sample!");
    CurrentFrame++;

    if (s != null)
    {
        long llSampleTime = 0;
        HResult hr = s.GetSampleTime(out llSampleTime);

        if (hr == HResult.S_OK && ((CurrentFrame % 50) == 0))
        {
            // Sample times are in 100-nanosecond units; convert to milliseconds.
            TimeSpan ts = TimeSpan.FromMilliseconds(llSampleTime / (10000000 / 1000));
            Debug.WriteLine("Frame {0} : {1}", CurrentFrame.ToString(), ts.ToString());
        }

        // Do not call SafeRelease here; it is done by the caller (s is a parameter).
        //SafeRelease(s);
    }

    System.Threading.Thread.Sleep(26);
    return HResult.S_OK;
}
In

public HResult SetPresentationClock(IMFPresentationClock pPresentationClock)

add

SafeRelease(PresentationClock);

before

if (pPresentationClock != null)
    PresentationClock = pPresentationClock;

Why isn't my v210 format video showing as such through a V4L loopback device?

In a user-space application, I'm writing v210-formatted video data to a V4L2 loopback device. When I watch the video in VLC or another viewer, I just get clownbarf, and the viewer claims that the stream is UYVY or something else, not v210. I suspect I need to tell the loopback device something more than what I have, to make the stream appear as v210 to the viewer. Is there one more place or way to tell it that it'll be handling a certain format?
What I do now:
int frame_w = /* some sane value */, frame_h = /* some sane value */;

int outputfd = open("/dev/video4", O_RDWR);
// check VIDIOC_QUERYCAP, ...

struct v4l2_format fmt;
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
fmt.fmt.pix.width = frame_w;
fmt.fmt.pix.height = frame_h;
fmt.fmt.pix.bytesperline = inbpr;               // no padding
fmt.fmt.pix.field = V4L2_FIELD_NONE;            // was the literal 1, which is V4L2_FIELD_NONE
fmt.fmt.pix.sizeimage = frame_h * fmt.fmt.pix.bytesperline;
fmt.fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;

int v210width = (((frame_w + 47) / 48) * 48);   // round up to multiple of 48 px
int byte_per_row = (v210width * 8) / 3;

fmt.fmt.pix.pixelformat = v4l2_fourcc('v', '2', '1', '0');
fmt.fmt.pix.width = v210width;
fmt.fmt.pix.bytesperline = byte_per_row;        // note: sizeimage above still reflects the earlier bytesperline

ioctl(outputfd, VIDIOC_S_FMT, &fmt);

// later, in some inner loop...
// ... write stuff to uint8_t buffer[] ...
write(outputfd, buffer, buffersize);
If I write UYVY, RGB, or other formats, it can be made to work: viewers display the video and report the correct format.
This code is based on examples, reading the V4L2 docs, and some working in-house code. No one here knows exactly what all the things one must do to open and write to a video device are.
While there is an easily found example online of how to read video from a V4L2 device, I couldn't find a similar-quality example for writing. If such an example exists, it may show the missing piece.
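One check worth adding (a sketch, not a confirmed fix): VIDIOC_S_FMT is allowed to adjust the requested format, so read the format back with VIDIOC_G_FMT and verify the driver actually kept v210 rather than silently substituting another pixel format:

struct v4l2_format check;
memset(&check, 0, sizeof(check));
check.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

if (ioctl(outputfd, VIDIOC_G_FMT, &check) == 0) {
    uint32_t pf = check.fmt.pix.pixelformat;
    // Print the fourcc the driver actually accepted, plus the row stride.
    fprintf(stderr, "driver reports %c%c%c%c, %u bytes per line\n",
            pf & 0xff, (pf >> 8) & 0xff, (pf >> 16) & 0xff, (pf >> 24) & 0xff,
            check.fmt.pix.bytesperline);
}

If the reported fourcc is not v210, the loopback driver itself rejected or remapped the format, which would explain what the viewers are seeing.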

MultiRoute Audio Input in iOS

We've been working with AudioUnits in Core Audio. It is simultaneously a very powerful audio framework and one of the worst documented, which makes it both a joy and a frustration to work with.
We want to accomplish something we know iPads have been able to do since iOS 6.0: multiple audio inputs.
So far, going by the 2012 developer talk, it appears you have to set the audio session category to MultiRoute. We've done this. If I plug in a soundcard from a keyboard, I can see that there are two inputs. Great. We're then told that we need to set a ChannelMap on a Remote I/O unit.
To what? Well... here's where it gets vague. We need to set all the channels we don't want to -1 and the channels we want to 0 and 1 (is that for stereo input, or for mono?).
We attempt this and... nothing. Sound still plays through on the 'last in wins' principle: the microphone if everything is unplugged, the soundcard if that's the one plugged in. But we can't switch between them.
This setup code is always run before the other function listed:
func setupAudioSession() {
    self.audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryMultiRoute, with: [.mixWithOthers])
        try audioSession.setActive(true)
        audioSessionWasSetup = true
    } catch let error {
        //TODO: Implement something here
        print(error)
        audioSessionWasSetup = false
    }
}
We then have a Remote I/O unit with an associated audio graph set up. This has been tested and works beautifully, but we need to be able to set where it's pulling sound from.
I've attempted to do it with the following, but not only does it not have any effect, nothing happens at all.
Am I missing something?
private func setChannelMap(onAudioUnit audioUnit: AudioUnit?, toChannel channelIndex: Int = 0) {
    guard let audioUnit = audioUnit else {
        return
    }

    let numberOfInputChannels: UInt32 = 4 // Two stereo inputs? - I'm just guessing here
    let mapSize: UInt32 = numberOfInputChannels * UInt32(MemoryLayout<Int32>.size)

    // Initialise every entry to -1 (channel unused). Note the half-open range:
    // 0...(numberOfInputChannels) would append one entry too many.
    var channelMap: [Int32] = []
    for _ in 0..<numberOfInputChannels {
        channelMap.append(-1)
    }

    channelMap[2 * channelIndex] = 0
    channelMap[2 * channelIndex + 1] = 1

    let status = AudioUnitSetProperty(audioUnit,
                                      kAudioOutputUnitProperty_ChannelMap,
                                      kAudioUnitScope_Input,
                                      0,
                                      &channelMap,
                                      mapSize)
    self.checkError(status, "Failed to set Channel Map on input unit")
}
There isn't any documentation on this at all as far as I've been able to find, nor any code examples.
I hope you can help us.
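One hedged, untested guess, sketched against the C Core Audio API (which is what AudioUnitSetProperty is): on the Remote I/O unit the hardware input side is element 1 (element 0 is the output side), so the channel map may need to target element 1 instead of element 0. kNumDeviceChannels and the chosen map below are assumptions for illustration only:

#include <AudioUnit/AudioUnit.h>

OSStatus SetInputChannelMap(AudioUnit remoteIOUnit) {
    const UInt32 kNumDeviceChannels = 4;  // assumption: e.g. built-in mic + stereo soundcard
    SInt32 channelMap[kNumDeviceChannels] = { -1, -1, 0, 1 };  // take device channels 2/3 as L/R

    return AudioUnitSetProperty(remoteIOUnit,
                                kAudioOutputUnitProperty_ChannelMap,
                                kAudioUnitScope_Input,
                                1,  // element 1 = the input (mic/soundcard) side of Remote I/O
                                channelMap,
                                sizeof(channelMap));
}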

Stream publishing using ffmpeg rtmp: network bandwidth not fully utilized

I'm developing an application that needs to publish a media stream to an rtmp "ingestion" url (as used in YouTube Live, or as input to Wowza Streaming Engine, etc), and I'm using the ffmpeg library (programmatically, from C/C++, not the command line tool) to handle the rtmp layer. I've got a working version ready, but am seeing some problems when streaming higher bandwidth streams to servers with worse ping. The problem exists both when using the ffmpeg "native"/builtin rtmp implementation and the librtmp implementation.
When streaming to a local target server with low ping through a good network (specifically, a local Wowza server), my code has so far handled every stream I've thrown at it and managed to upload everything in real time - which is important, since this is meant exclusively for live streams.
However, when streaming to a remote server with a worse ping (e.g. the youtube ingestion urls on a.rtmp.youtube.com, which for me have 50+ms pings), lower bandwidth streams work fine, but with higher bandwidth streams the network is underutilized - for example, for a 400kB/s stream, I'm only seeing ~140kB/s network usage, with a lot of frames getting delayed/dropped, depending on the strategy I'm using to handle network pushback.
Now, I know this is not a problem with the network connection to the target server, because I can successfully upload the stream in real time when using the ffmpeg command line tool to the same target server or using my code to stream to a local Wowza server which then forwards the stream to the youtube ingestion point.
So the network connection is not the problem and the issue seems to lie with my code.
I've timed various parts of my code and found that when the problem appears, calls to av_write_frame / av_interleaved_write_frame sometimes take a really long time. (I never mix and match them; I always use one version consistently in any specific build. I've just experimented with both to see if there is any difference.) I've seen those calls sometimes take up to 500-1000 ms, though the average "bad case" is in the 50-100 ms range. Not all calls to them take this long, and most return instantly, but the average time spent in these calls grows bigger than the average frame duration, so I'm not getting a real-time upload anymore.
The main suspect, it seems to me, could be the rtmp Acknowledgement Window mechanism, where a sender of data waits for a confirmation of receipt after sending every N bytes, before sending any more data - this would explain the available network bandwidth not being fully used, since the client would simply sit there and wait for a response (which takes a longer time because of the lower ping), instead of using the available bandwidth. Though I haven't looked at ffmpeg's rtmp/librtmp code to see if it actually implements this kind of throttling, so it could be something else entirely.
The full code of the application is too much to post here, but here are some important snippets:
Format context creation:
const int nAVFormatContextCreateError = avformat_alloc_output_context2(&m_pAVFormatContext, nullptr, "flv", m_sOutputUrl.c_str());
Stream creation:
m_pVideoAVStream = avformat_new_stream(m_pAVFormatContext, nullptr);
m_pVideoAVStream->id = m_pAVFormatContext->nb_streams - 1;
m_pAudioAVStream = avformat_new_stream(m_pAVFormatContext, nullptr);
m_pAudioAVStream->id = m_pAVFormatContext->nb_streams - 1;
Video stream setup:
m_pVideoAVStream->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;
m_pVideoAVStream->codecpar->codec_id = AV_CODEC_ID_H264;
m_pVideoAVStream->codecpar->width = nWidth;
m_pVideoAVStream->codecpar->height = nHeight;
m_pVideoAVStream->codecpar->format = AV_PIX_FMT_YUV420P;
m_pVideoAVStream->codecpar->bit_rate = 10 * 1000 * 1000;
m_pVideoAVStream->time_base = AVRational { 1, 1000 };
m_pVideoAVStream->codecpar->extradata_size = int(nTotalSizeRequired);
m_pVideoAVStream->codecpar->extradata = (uint8_t*)av_malloc(m_pVideoAVStream->codecpar->extradata_size + AV_INPUT_BUFFER_PADDING_SIZE);
// Fill in the extradata here - I'm sure I'm doing that correctly.
Audio stream setup:
m_pAudioAVStream->time_base = AVRational { 1, 1000 };
// Let's leave creation of m_pAudioCodecContext out of the scope of this question, I'm quite sure everything is done right there.
const int nAudioCodecCopyParamsError = avcodec_parameters_from_context(m_pAudioAVStream->codecpar, m_pAudioCodecContext);
Opening the connection:
const int nAVioOpenError = avio_open2(&m_pAVFormatContext->pb, m_sOutputUrl.c_str(), AVIO_FLAG_WRITE, nullptr, nullptr);
Starting the stream:
AVDictionary * pOptions = nullptr;
const int nWriteHeaderError = avformat_write_header(m_pAVFormatContext, &pOptions);
Sending a video frame:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.dts = nTimestamp;
pkt.pts = nTimestamp;
pkt.duration = nDuration; // I know that I have the wrong duration sometimes, but I don't think that's the issue.
pkt.data = pFrameData;
pkt.size = pFrameDataSize;
pkt.flags = bKeyframe ? AV_PKT_FLAG_KEY : 0;
pkt.stream_index = m_pVideoAVStream->index;
const int nWriteFrameError = av_write_frame(m_pAVFormatContext, &pkt); // This is where too much time is spent.
Sending an audio frame:
AVPacket pkt = { 0 };
av_init_packet(&pkt);
pkt.pts = m_nTimestampMs;
pkt.dts = m_nTimestampMs;
pkt.duration = m_nDurationMs;
pkt.stream_index = m_pAudioAVStream->index;
const int nWriteFrameError = av_write_frame(m_pAVFormatContext, &pkt);
Any ideas? Am I on the right track with thinking about the Acknowledgement Window? Am I doing something else completely wrong?
I don't think this explains everything but, just in case it helps someone in a similar situation, the fix/workaround I found was:
1) Build ffmpeg with the librtmp implementation of the rtmp protocol.
2) Build ffmpeg with --enable-network; it adds a couple of features to the librtmp protocol.
3) Pass the "rtmp_buffer_size" parameter to avio_open2, and increase its value to a satisfactory one.
I can't give you a full step-by-step explanation of what was going wrong, but this fixed at least the symptom that was causing me problems.
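For reference, passing that option through the AVDictionary parameter of avio_open2 might look roughly like this (a sketch; the option name comes from the steps above, and the value here is an arbitrary example you would tune for your stream):

AVDictionary* pProtocolOptions = nullptr;
av_dict_set(&pProtocolOptions, "rtmp_buffer_size", "8388608", 0); // hypothetical value

const int nAVioOpenError = avio_open2(&m_pAVFormatContext->pb, m_sOutputUrl.c_str(),
                                      AVIO_FLAG_WRITE, nullptr, &pProtocolOptions);

// Any options the protocol did not consume remain in the dictionary.
av_dict_free(&pProtocolOptions);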
