While working with video in OpenCV using Python, when we are reading the frames using
cap=cv2.VideoCapture(0)
ret,frame=cap.read()
What do the variables ret and frame store?
Reference Material: Getting Started with Videos
ret,frame=cap.read()
From the documentation above:
cap.read() returns a bool (True/False). If frame is read correctly, it
will be True. So you can check end of the video by checking this
return value.
So in assence,
ret = True or ret = False based on whether a frame was read correctly and frame is the actual frame that was read (provided ret == True)
Related
I have a camera recording app that takes in a camera image and records and saves the camera image output as an .avi file that I can play and do whatever once done recording. I want to create a real time opencv code that can take these .avi file that is being created real time, open it, manipulate it do some classification real time. Is there anyway for opencv to open these .avi as they are being written? preferably python but also C++ implementation? This will be done on windows10.
edit:
Currently when I try to do the generic video capture the with the output .avi with CV2 as
cap = cv2.VideoCapture('out.avi')
ret, frame = cap.read()
while(True):
ret, frame = cap.read()
cv2.imshow('frame', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
it gives me the error
Traceback (most recent call last):
File "video_grab.py", line 15, in <module>
cv2.imshow('frame', frame)
cv2.error: OpenCV(4.1.2) C:\projects\opencv-
python\opencv\modules\highgui\src\window.cpp:376: error: (-215:Assertion
failed) size.width>0 && size.height>0 in function 'cv::imshow'
where the assertion fail most likely since the video has 0 by 0 dimension while the capture is stopped and saved.
The following solution may work or not, depending on the codec of the AVI file:
Verify that ret value is True.
Increase the time in waitKey to something larger than 1msec.
import cv2
cap = cv2.VideoCapture('out.avi')
frame_period = 100 # 100msec - assume frame rate is about 10Hz
while(True):
ret, frame = cap.read()
if ret:
cv2.imshow('frame', frame)
if cv2.waitKey(frame_period) & 0xFF == ord('q'):
break
I testes it using "Motion JPEG" codec, and I am getting a warning message like: [mjpeg # 000002a22394b0e0] overread 8, when reading in faster rate than actual frame rate.
You may also try, start reading fast, and reduce the rate when ret = False:
frame_period = 1 # Start reading fast (wait only 1msec)
while(True):
ret, frame = cap.read()
if ret:
cv2.imshow('frame', frame)
else:
frame_period = 100 # Reduce the rate to 10Hz when reaching end of file.
if cv2.waitKey(frame_period) & 0xFF == ord('q'):
break
I think it's going to work better if you have some indication that new frame was captured.
I couldn't find a solution for that by just "querying" the AVI file using OpenCV.
I'm trying to use libavcodec library in FFMpeg to decode then re-encode a h264 video.
I have the decoding part working (rendes to an SDL window fine) but when I try to re-encode the frames I get bad data in the re-encoded videos samples.
Here is a cut down code snippet of my encode logic.
EncodeResponse H264Codec::EncodeFrame(AVFrame* pFrame, StreamCodecContainer* pStreamCodecContainer, AVPacket* pPacket)
{
int result = 0;
result = avcodec_send_frame(pStreamCodecContainer->pEncodingCodecContext, pFrame);
if(result < 0)
{
return EncodeResponse::Fail;
}
while (result >= 0)
{
result = avcodec_receive_packet(pStreamCodecContainer->pEncodingCodecContext, pPacket);
// If the encoder needs more frames to create a packed then return and wait for
// method to be called again upon a new frame been present.
// Else check if we have failed to encode for some reason.
// Else a packet has successfully been returned, then write it to the file.
if (result == AVERROR(EAGAIN) || result == AVERROR_EOF)
{
// Higher level logic, dedcodes next frame from source
// video then calls this method again.
return EncodeResponse::SendNextFrame;
}
else if (result < 0)
{
return EncodeResponse::Fail;
}
else
{
// Prepare packet for muxing.
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
av_packet_rescale_ts(m_pPacket, pStreamCodecContainer->pEncodingCodecContext->time_base,
m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base);
}
m_pPacket->stream_index = pStreamCodecContainer->streamIndex;
int result = av_interleaved_write_frame(m_pEncodingFormatContext, m_pPacket);
av_packet_unref(m_pPacket);
}
}
return EncodeResponse::EncoderEndOfFile;
}
Strange behaviour I notice is that before I get the first packet from avcodec_receive_packet I have to send 50+ frames to avcodec_send_frame.
I built a debug build of FFMpeg and stepping into the code I notice that AVERROR(EAGAIN) is returned by avcodec_receive_packet because of the following in x264encoder::encode in encoder.c
if( h->frames.i_input <= h->frames.i_delay + 1 - h->i_thread_frames )
{
/* Nothing yet to encode, waiting for filling of buffers */
pic_out->i_type = X264_TYPE_AUTO;
return 0;
}
For some reason my code-context (h) never has any frames. I have spent a long time trying to debug ffmpeg and to determine what I'm doing wrong. But have reached the limit of my video codec knowledge (which is little).
I'm testing this with a video that has no audio to reduce complication.
I have created a cut down version of my application and provided a self contained (with ffmpeg and SDL built dependencies) project. Hopefully this can help anyone-one willing to help me :).
Project Link
https://github.com/maxhap/video-codec
After looking into encoder initialisation I found that I have to set the codec AV_CODEC_FLAG_GLOBAL_HEADER before calling avcodec_open2
pStreamCodecContainer->pEncodingCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
This change led to the re-encoded moov box looking much heathier (used MP4Box.js to parse it). However, the video still does not play correctly, the output video has grey frames at the start when played in VLC and won't play in other players.
I have since tried creating an encoding context via the sample code, rather than using my decoding codec parameters. This led to fixing the bad/data or encoding issue. However, my DTS times are scaling to huge numbers
Here is my new codec init
if (pStreamCodecContainer->codecType == AVMEDIA_TYPE_VIDEO)
{
pStreamCodecContainer->pEncodingCodecContext->height = pStreamCodecContainer->pDecodingCodecContext->height;
pStreamCodecContainer->pEncodingCodecContext->width = pStreamCodecContainer->pDecodingCodecContext->width;
pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;
/* take first format from list of supported formats */
if (pStreamCodecContainer->pEncodingCodec->pix_fmts)
{
pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pEncodingCodec->pix_fmts[0];
}
else
{
pStreamCodecContainer->pEncodingCodecContext->pix_fmt = pStreamCodecContainer->pDecodingCodecContext->pix_fmt;
}
/* video time_base can be set to whatever is handy and supported by encoder */
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
pStreamCodecContainer->pEncodingCodecContext->sample_aspect_ratio = pStreamCodecContainer->pDecodingCodecContext->sample_aspect_ratio;
}
else
{
pStreamCodecContainer->pEncodingCodecContext->channel_layout = pStreamCodecContainer->pDecodingCodecContext->channel_layout;
pStreamCodecContainer->pEncodingCodecContext->channels =
av_get_channel_layout_nb_channels(pStreamCodecContainer->pEncodingCodecContext->channel_layout);
/* take first format from list of supported formats */
pStreamCodecContainer->pEncodingCodecContext->sample_fmt = pStreamCodecContainer->pEncodingCodec->sample_fmts[0];
pStreamCodecContainer->pEncodingCodecContext->time_base = AVRational{ 1, pStreamCodecContainer->pEncodingCodecContext->sample_rate };
}
Any ideas why my DTS time is re-scaling incorrectly?
I managed to fix the DTS scalling by using the time_base value directly from the decoding streams.
So
pStreamCodecContainer->pEncodingCodecContext->time_base = m_pDecodingFormatContext->streams[pStreamCodecContainer->streamIndex]->time_base
Instead of
pStreamCodecContainer->pEncodingCodecContext->time_base = av_inv_q(pStreamCodecContainer->pDecodingCodecContext->framerate);
I will create an answer based on all my finding.
To fix the initial problem of a corrupted moov box I had to add the AV_CODEC_FLAG_GLOBAL_HEADER flag to the encoding codec context before calling avcodec_open2.
encCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
The next issue was badly scaled DTS values in the encoded package, this was causing a side effect of the final mp4 duration being in the hundreds of hours long. To fix this I had to change the encoding codec context timebase to be that of the decoding context streams timebase. This is different than using av_inv_q(framerate) as suggested in the avcodec transcoding example.
encCodecContext->time_base = decCodecFormatContext->streams[streamIndex]->time_base;
In a user-space application, I'm writing v210 formatted video data to a V4L2 loopback device. When I watch the video in VLC or other viewer, I just get clownbarf and claims that the stream is UYUV or other, not v210. I suspect I need to tell the loopback device something more than what I have, to make the stream appear as v210 to the viewer. Is there one more place/way to tell it that it'll be handling a certain format?
What I do now:
int frame_w, frame_h = ((some sane values))
outputfd = open("/dev/video4", O_RDWR);
// check VIDIOC_QUERYCAPS, ...
struct v4l2_format fmt;
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
fmt.fmt.pix.width = frame_w;
fmt.fmt.pix.height = frame_h;
fmt.fmt.pix.bytesperline = inbpr; // no padding
fmt.fmt.pix.field = 1;
fmt.fmt.pix.sizeimage = frame_h * fmt.fmt.pix.bytesperline;
fmt.fmt.pix.colorspace = V4L2_COLORSPACE_SRGB;
v210width = (((frame_w+47)/48)*48); // round up to mult of 48 px
byte_per_row = (v210width*8)/3;
fmt.fmt.pix.pixelformat = 'v' | '2' << 8 | '1' << 16 | '0' << 24;
fmt.fmt.pix.width = v210width;
fmt.fmt.pix.bytesperline = byte_per_row ;
ioctl(outputfd, VIDIOC_S_FMT, &fmt);
// later, in some inner loop...
... write stuff to uint8_t buffer[] ...
write(outputfd, buffer, buffersize);
If I write UYVY format, or RGB or others, it can be made to work. Viewers display the video and report the correct format.
This code is based on examples, reading the V4L docs, and some working in-house code. No one here knows exactly what are all the things one must do to open and write to a video device.
While there is an easily found example online of how to read video from a V4L device, I couldn't find a similar quality example for writing. If such exists, it may show the missing piece.
I use packet duration to translate from frame index to pts and back, and I'd like to be sure that this is a reliable method of doing so.
Alternatively, is there a better way to translate pts to a frame index and vice versa?
A snippet showing my usage:
bool seekFrame(int64_t frame)
{
if(frame > container.frameCount)
frame = container.frameCount;
// Seek to a frame behind the desired frame because nextFrame() will also increment the frame index
int64_t seek = pts_cache[frame-1]; // pts_cache is an array of all frame pts values
// get the nearest prior keyframe
int preceedingKeyframe = av_index_search_timestamp(container.video_st, seek, AVSEEK_FLAG_BACKWARD);
// here's where I'm worried that packetDuration isn't a reliable method of translating frame index to
// pts value
int64_t nearestKeyframePts = preceedingKeyframe * container.packetDuration;
avcodec_flush_buffers(container.pCodecCtx);
int ret = av_seek_frame(container.pFormatCtx, container.videoStreamIndex, nearestKeyframePts, AVSEEK_FLAG_ANY);
if(ret < 0) return false;
container.lastPts = nearestKeyframePts;
AVFrame *pFrame = NULL;
while(nextFrame(pFrame, NULL) && container.lastPts < seek)
{
;
}
container.currentFrame = frame-1;
av_free(pFrame);
return true;
}
No, not guaranteed. It may work with some codec/container combination where frame-rate is static. avi, h264 raw (annex-b) and yuv4mpeg come to mind. But other containers like flv, mp4, ts, have a PTS/DTS (or CTS) for EVERY frame. The source could be variable frame rate, or frames could have be dropped at some point during processing due to bandwidth. Also some codecs will remove duplicate frames.
So unless you created the file yourself. Do not trust it. There is no guaranteed way to look at a frame and know its 'index' except start at the beginning and count.
Your method, MAY be good enough for most files however.
I have built some code to process video files on OSX, frame by frame. The following is an extract from the code which builds OK, opens the file, locates the video track (only track) and starts reading CMSampleBuffers without problem. However each CMSampleBufferRef I obtain returns NULL when I try to extract the pixel buffer frame. There's no indication in iOS documentation as to why I could expect a NULL return value or how I could expect to fix the issue. It happens with all the videos on which I've tested it, regardless of capture source or CODEC.
Any help greatly appreciated.
NSString *assetInPath = #"/Users/Dave/Movies/movie.mp4";
NSURL *assetInUrl = [NSURL fileURLWithPath:assetInPath];
AVAsset *assetIn = [AVAsset assetWithURL:assetInUrl];
NSError *error;
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:assetIn error:&error];
AVAssetTrack *track = [assetIn.tracks objectAtIndex:0];
AVAssetReaderOutput *assetReaderOutput = [[AVAssetReaderTrackOutput alloc]
initWithTrack:track
outputSettings:nil];
[assetReader addOutput:assetReaderOutput];
// Start reading
[assetReader startReading];
CMSampleBufferRef sampleBuffer;
do {
sampleBuffer = [assetReaderOutput copyNextSampleBuffer];
/**
** At this point, sampleBuffer is non-null, has all appropriate attributes to indicate that
** it's a video frame, 320x240 or whatever and looks perfectly fine. But the next
** line always returns NULL without logging any obvious error message
**/
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
if( pixelBuffer != NULL ) {
size_t width = CVPixelBufferGetWidth(pixelBuffer);
size_t height = CVPixelBufferGetHeight(pixelBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
...
other processing removed here for clarity
}
} while( ... );
To be clear, I've stripped all error checking code but no problems were being indicated in that code. i.e. The AVAssetReader is reading, CMSampleBufferRef looks fine etc.
You haven't specified any outputSettings when creating your AVAssetReaderTrackOutput. I've run into your issue when specifying "nil" in order to receive the video track's original pixel format when calling copyNextSampleBuffer. In my app I wanted to ensure no conversion was happening when calling copyNextSampleBuffer for the sake of performance, if this isn't a big concern for you, specify a pixel format in the output settings.
The following are Apple's recommend pixel formats based on the hardware capabilities:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
Because you haven't supplied any outputSettings you're forced to use the raw data contained within in the frame.
You have to get the block buffer from the sample buffer using CMSampleBufferGetDataBuffer(sampleBuffer), after you have that you need to get the actual location of the block buffer using
size_t blockBufferLength;
char *blockBufferPointer;
CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &blockBufferLength, &blockBufferPointer);
Look at *blockBufferPointer and decode the bytes using the frame header information for your required codec.
FWIW: Here is what official docs say for the return value of CMSampleBufferGetImageBuffer:
"Result is a CVImageBuffer of media data. The result will be NULL if the CMSampleBuffer does not contain a CVImageBuffer, or if the CMSampleBuffer contains a CMBlockBuffer, or if there is some other error."
Also note that the caller does not own the returned dataBuffer from CMSampleBufferGetImageBuffer, and must retain it explicitly if the caller needs to maintain a reference to it.
Hopefully this info helps.