JavaCV overlay FFmpegFrameFilter not working

When I use the overlay filter in JavaCV to overlay two videos, it doesn't work.
video1 info: Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(progressive), 1512x982, 2208 kb/s, SAR 1:1 DAR 756:491, 29.37 fps, 29.42 tbr, 16k tbn (default)
video2 info: png (png / 0x20676E70), rgba(pc, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 470102 kb/s, 50 fps, 50 tbr, 12800 tbn (default)
video2 has a transparent background; it was converted with a chromakey filter.
I can do this with an ffmpeg command:
ffmpeg -i bg.mp4 -i out1.mov -filter_complex overlay xx.mp4
But when I use JavaCV to do the same thing, it doesn't work: the output video loses its color and is divided into three equal parts, each showing the same content.
Here is the code:
String bgMp4 = "bg.mp4";
String mov = "front.mov";
FFmpegFrameGrabber video1 = new FFmpegFrameGrabber(bgMp4);
FFmpegFrameGrabber video2 = new FFmpegFrameGrabber(mov);
// Open the videos for reading
video1.start();
video2.start();
// Create an instance of the FFmpegFrameFilter class to handle the video overlay
FFmpegFrameFilter overlayFilter = new FFmpegFrameFilter("[0:v][1:v]overlay=10:10[v]", video1.getImageWidth(), video1.getImageHeight());
overlayFilter.setVideoInputs(2);
overlayFilter.start();
FFmpegFrameRecorder recorder = new FFmpegFrameRecorder("out.mp4", video1.getImageWidth(), video1.getImageHeight());
recorder.setFrameRate(video1.getFrameRate());
// Neither AV_CODEC_ID_PNG nor AV_CODEC_ID_H264 works.
recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
recorder.setPixelFormat(avutil.AV_PIX_FMT_YUV420P);
recorder.start();
// Read in the frames from the input videos and apply the overlay filter
Frame frame1, frame2;
while ((frame1 = video1.grabImage()) != null && (frame2 = video2.grabImage()) != null) {
    overlayFilter.push(0, frame1);
    overlayFilter.push(1, frame2);
    Frame frame = overlayFilter.pullImage();
    recorder.record(frame);
}
// Close the videos and filter
video1.stop();
video2.stop();
overlayFilter.stop();
recorder.stop();

Related

How to get actual PTS and DTS after frame decoding with libavcodec of FFmpeg

C++, FFmpeg, libavcodec, video frame decoding.
To parse data from the network and split it into frames, I use the av_parser_parse2(_parser, _context, &_packet_encoded_frame->data, &_packet_encoded_frame->size, data, data_size, pts ? pts : AV_NOPTS_VALUE, dts ? dts : AV_NOPTS_VALUE, 0) function, to which I pass non-zero PTS and DTS values as inputs.
Next, I use the avcodec_send_packet(_context, _packet_encoded_frame) and avcodec_receive_frame(_context, _frame) functions to decode the frames.
Why are PTS (_frame->pts) and DTS (_frame->pkt_dts) ALWAYS equal to AV_NOPTS_VALUE after SUCCESSFUL frame decoding? How can I get the actual PTS and DTS values of a frame after decoding it? What am I doing wrong?
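For illustration, here is a minimal sketch of the parse-and-decode loop described above. The step of copying the parser's pts/dts onto the packet before avcodec_send_packet is my assumption about what may be missing, not something stated in the question; _parser, _context, _packet_encoded_frame and _frame are assumed to be set up as in the question.
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch (assumption, not the poster's actual code): feed a raw buffer through
// the parser and decoder while propagating timestamps.
void decode_buffer(AVCodecParserContext *_parser, AVCodecContext *_context,
                   AVPacket *_packet_encoded_frame, AVFrame *_frame,
                   const uint8_t *data, int data_size, int64_t pts, int64_t dts)
{
    while (data_size > 0) {
        int used = av_parser_parse2(_parser, _context,
                                    &_packet_encoded_frame->data, &_packet_encoded_frame->size,
                                    data, data_size,
                                    pts ? pts : AV_NOPTS_VALUE,
                                    dts ? dts : AV_NOPTS_VALUE, 0);
        data      += used;
        data_size -= used;
        if (_packet_encoded_frame->size == 0)
            continue;                              // parser needs more input
        // The parser keeps the timestamps it matched to this packet in its own
        // context; copy them onto the packet so the decoder can pass them through.
        _packet_encoded_frame->pts = _parser->pts;
        _packet_encoded_frame->dts = _parser->dts;
        if (avcodec_send_packet(_context, _packet_encoded_frame) < 0)
            break;
        while (avcodec_receive_frame(_context, _frame) == 0) {
            // _frame->pts, _frame->pkt_dts and _frame->best_effort_timestamp
            // should now reflect the values supplied above.
        }
    }
}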

Converting numpy list of images to mp4 video

I have a NumPy array of 1000 RGB images with shape (1000, 96, 96, 3). I used OpenCV to create an MP4 video out of these images. My road is brown and my car is red, but after I create the video they turn blue.
Could you please tell me how I can avoid this problem?
My code:
img_array = []
for img in brown_dataset:
    img_array.append(img)

size = (96,96)
out = cv2.VideoWriter('project_brown.mp4',cv2.VideoWriter_fourcc(*'DIVX'),15, size)

for i in range(len(img_array)):
    out.write(img_array[i])
out.release()
Before video:
After video:
As mentioned in the comments, OpenCV uses the BGR channel order by default, whereas your input dataset is RGB.
Here is one way to fix it:
img_array = []
for img in brown_dataset:
    img_array.append(img)

size = (96,96)
out = cv2.VideoWriter('project_brown.mp4',cv2.VideoWriter_fourcc(*'DIVX'),15, size)

for i in range(len(img_array)):
    bgr_img = cv2.cvtColor(img_array[i], cv2.COLOR_RGB2BGR)  # convert RGB to the BGR order OpenCV expects
    out.write(bgr_img)
out.release()

Using ImageMagick to efficiently stitch together a line scan image

I’m looking for alternatives to line scan cameras for use in sports timing, or rather for the part where placing needs to be figured out. I found that common industrial cameras can readily match the speed of commercial camera solutions at >1000 frames per second. For my needs, the absolute timing accuracy is usually not important; what matters is the relative placing of the athletes. I figured I could use one of the cheapest Basler, IDS or other area scan industrial cameras for this purpose. Of course there are line scan cameras that can do a lot more than a few thousand fps (or Hz), but it is possible to get area scan cameras that can do the required 1000-3000 fps for less than 500€.
My holy grail would of course be the near-real time image composition capabilities of FinishLynx (or any other line scan system), basically this part: https://youtu.be/7CWZvFcwSEk?t=23s
The whole process I was thinking of for my alternative is:
1. Use Basler Pylon Viewer (or other software) to record 2 px wide images at the camera’s fastest read speed. For the camera I am currently using, this means it has to be turned on its side and the height reduced, since that is the only way it will read 1920x2 px frames at >250 fps.
2. Make a program or batch script that stitches these 1920x2 px frames together, so that for example one second of recording gives 1000 1920x2 px frames, resulting in an image with a resolution of 1920x2000 px (horizontal x vertical).
3. Finally, using the same program or another tool, rotate the image so it reflects how the camera is positioned, giving an image with a resolution of 2000x1920 px (again horizontal x vertical).
4. Open the image in an analysis program (currently ImageJ) to quickly analyze the results.
I am no programmer, but this is what I was able to put together just using batch scripts, with the help of Stack Overflow of course.
Currently, recording a whole 10 seconds to disk as a raw/MJPEG (avi/mkv) stream can be done in real time.
Recording individual frames as TIFF or BMP, or using FFmpeg to save them as PNG or JPG, takes ~20-60 seconds. The appending and rotation then take a further ~45-60 seconds.
This all needs to be achieved in less than 60 seconds for 10 seconds of footage (1000-3000 fps at 10 s = 10000-30000 frames), which is why I need something faster.
I was able to figure out how to be pretty efficient with ImageMagick:
magick convert -limit file 16384 -limit memory 8GiB -interlace Plane -quality 85 -append +rotate 270 "%folder%\Basler*.Tiff" "%out%"
(%out% is a .jpg filename that is built dynamically from the folder name and the number of frames.)
This command works and gets me 10000 frames encoded in about 30 seconds on an i5-2520M (most of the processing seems to use only one thread though, since CPU usage sits at about 25%). This is the resulting image: https://i.imgur.com/OD4RqL7.jpg (19686x1928px)
However, since recording to TIFF frames using Basler’s Pylon Viewer takes that much longer than recording an MJPEG video stream, I would like to use the MJPEG (avi/mkv) file as the source for the appending. I noticed FFmpeg has an "image2pipe" command, which should be able to feed images directly to ImageMagick. I was not able to get this working though:
$ ffmpeg.exe -threads 4 -y -i "Basler acA1920-155uc (21644989)_20180930_043754312.avi" -f image2pipe - | convert - -interlace Plane -quality 85 -append +rotate 270 "%out%" >> log.txt
ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 7.2.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Invalid Parameter - -interlace
[mjpeg @ 000000000046b0a0] EOI missing, emulating
Input #0, avi, from 'Basler acA1920-155uc (21644989)_20180930_043754312.avi':
Duration: 00:00:50.02, start: 0.000000, bitrate: 1356 kb/s
Stream #0:0: Video: mjpeg (MJPG / 0x47504A4D), yuvj422p(pc, bt470bg/unknown/unknown), 1920x2, 1318 kb/s, 200 fps, 200 tbr, 200 tbn, 200 tbc
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
Output #0, image2pipe, to 'pipe:':
Metadata:
encoder : Lavf57.83.100
Stream #0:0: Video: mjpeg, yuvj422p(pc), 1920x2, q=2-31, 200 kb/s, 200 fps, 200 tbn, 200 tbc
Metadata:
encoder : Lavc57.107.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
av_interleaved_write_frame(): Invalid argument
Error writing trailer of pipe:: Invalid argument
frame= 1 fps=0.0 q=1.6 Lsize= 0kB time=00:00:00.01 bitrate= 358.4kbits/s speed=0.625x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
Conversion failed!
If I go a bit higher with the height, I no longer get the "[mjpeg @ 000000000046b0a0] EOI missing, emulating" error. However, the whole thing will only work with <2 px high/wide footage.
Edit: Oh yes, I can also use ffmpeg -i file.mpg -r 1/1 $filename%03d.bmp or ffmpeg -i file.mpg $filename%03d.bmp to extract all the frames from the MJPEG/RAW stream. However, this is an extra step I do not want to take. (Just deleting a folder of 30000 JPGs takes 2 minutes alone…)
Can someone think of a working solution for the piping method or a totally different alternative way of handling this?
I had another go at this to see if I could speed up my other answer by doing things a couple of different ways - hence a different answer. I used the same synthetic video clip that I generated in the other answer to test with.
Rather than pass the 2x1920 scanlines into ImageMagick for it to append together and write as a JPEG, I did the following:
- created the full output frame up front in a C++ program, then looped, reading in a 2x1920 scanline on each iteration and stuffing it into the correct position in the output frame, and
- when the entire sequence had been read, compressed it into a JPEG using turbo-jpeg and wrote it to disk.
As such, ImageMagick is no longer required. The entire program now runs in around 1.3 seconds rather than the 10.3 seconds via ImageMagick.
Here is the code:
////////////////////////////////////////////////////////////////////////////////
// stitch.cpp
// Mark Setchell
//
// Read 2x1920 RGB frames from `ffmpeg` and stitch into 20000x1920 RGB image.
////////////////////////////////////////////////////////////////////////////////
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <unistd.h>
#include <turbojpeg.h>
using namespace std;
int main()
{
    int frames = 10000;
    int height = 1920;
    int width  = frames * 2;

    // Buffer in which to assemble complete output image (RGB), e.g. 20000x1920
    unsigned char *img = new unsigned char[width*height*3];

    // Buffer for one scanline image 1920x2 (RGB)
    unsigned char *scanline = new unsigned char[2*height*3];

    // Output column
    int ocol = 0;

    // Read frames from `ffmpeg` fed into us like this:
    // ffmpeg -threads 4 -y -i video.mov -frames 10000 -vf "transpose=1" -f image2pipe -vcodec rawvideo -pix_fmt rgb24 - | ./stitch
    for(int f=0; f<frames; f++){
        // Read one scanline from stdin, i.e. 2x1920 RGB image...
        ssize_t bytesread = read(STDIN_FILENO, scanline, 2*height*3);

        // ... and place into finished frame
        // ip is pointer to input image
        unsigned char *ip = scanline;
        for(int row=0; row<height; row++){
            unsigned char *op = &(img[(row*width*3)+3*ocol]);
            // Copy 2 RGB pixels from scanline to output image
            *op++ = *ip++;   // red
            *op++ = *ip++;   // green
            *op++ = *ip++;   // blue
            *op++ = *ip++;   // red
            *op++ = *ip++;   // green
            *op++ = *ip++;   // blue
        }
        ocol += 2;
    }

    // Now encode to JPEG with turbo-jpeg
    const int JPEG_QUALITY = 75;
    long unsigned int jpegSize = 0;
    unsigned char* compressedImage = NULL;
    tjhandle _jpegCompressor = tjInitCompress();

    // Compress in memory
    tjCompress2(_jpegCompressor, img, width, 0, height, TJPF_RGB,
                &compressedImage, &jpegSize, TJSAMP_444, JPEG_QUALITY,
                TJFLAG_FASTDCT);

    // Clean up
    tjDestroy(_jpegCompressor);

    // And write to disk
    ofstream f("result.jpg", ios::out | ios::binary);
    f.write(reinterpret_cast<char*>(compressedImage), jpegSize);
}
Notes:
Note 1: In order to pre-allocate the output image, the program needs to know how many frames are coming in advance - I did not parameterize that, I just hard-coded 10,000 but it should be easy enough to change.
One way to determine the number of frames in the video sequence is this:
ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 video.mov
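As a small illustration of that change (my own sketch, not part of the original program), the count could be taken from the command line instead, with the ffprobe output supplied as the argument:
// Hypothetical tweak to stitch.cpp: take the frame count from argv[1],
// falling back to the original hard-coded 10000.
#include <cstdlib>

int main(int argc, char *argv[])
{
    int frames = (argc > 1) ? std::atoi(argv[1]) : 10000;
    int height = 1920;
    int width  = frames * 2;
    // ... rest of stitch.cpp unchanged ...
    return 0;
}
It could then be run as, for example, ffmpeg ... | ./stitch 10000.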
Note 2: Note that I compiled the code with a couple of switches for performance:
g++-8 -O3 -march=native stitch.cpp -o stitch
Note 3: If you are running on Windows, you may need to re-open stdin in binary mode before doing:
read(STDIN_FILENO...)
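On Windows that might look like the sketch below, using the Microsoft CRT's _setmode; this is my addition, not part of the original program:
// Windows-only sketch: switch the already-open stdin to binary mode so the
// raw RGB byte stream is not altered by CR/LF translation or Ctrl-Z handling.
#include <io.h>
#include <fcntl.h>
#include <stdio.h>

int main()
{
    _setmode(_fileno(stdin), _O_BINARY);
    // ... then read(STDIN_FILENO, scanline, 2*height*3) as in stitch.cpp ...
    return 0;
}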
Note 4: If you don't want to use turbo-jpeg, you could remove everything after the end of the main loop, and simply send a NetPBM PPM image to ImageMagick via a pipe and let it do the JPEG writing. That would look like, very roughly:
printf("P6 %d %d 255\n", width, height);          // PPM header: magic, width, height, maxval
fwrite(img, 1, (size_t)width*height*3, stdout);   // raw RGB pixel data
Then you would run with:
ffmpeg ... | ./stitch | magick ppm:- result.jpg
I generated a sample video of 10,000 frames, and did some tests. Obviously, my machine is not the same specification as yours, so results are not directly comparable, but I found it is quicker to let ffmpeg transpose the video and then pipe it into ImageMagick as raw RGB24 frames.
I found that I can convert a 10 second movie into a 20,000x1920 pixel JPEG in 10.3s like this:
ffmpeg -threads 4 -y -i video.mov -frames 10000 -vf "transpose=1" -f image2pipe -vcodec rawvideo -pix_fmt rgb24 - | convert -depth 8 -size 2x1920 rgb:- +append result.jpg
The resulting image looks like this:
I generated the video like this with CImg. Basically it just draws a Red/Green/Blue splodge successively further across the frame till it hits the right edge, then starts again at the left edge:
#include <iostream>
#include "CImg.h"
using namespace std;
using namespace cimg_library;
int main()
{
    // Frame we will fill
    CImg<unsigned char> frame(1920,2,1,3);
    int width  = frame.width();
    int height = frame.height();

    // Item to draw in frame - created with ImageMagick
    // convert xc:red xc:lime xc:blue +append -resize 256x2\! splodge.ppm
    CImg<unsigned char> splodge("splodge.ppm");

    int offs = 0;

    // We are going to output 10000 frames of RGB raw video
    for(int f=0; f<10000; f++){
        // Generate white image
        frame.fill(255);

        // Draw coloured splodge at correct place
        frame.draw_image(offs,0,splodge);
        offs = (offs + 1) % (width - splodge.width());

        // Output to ffmpeg to make video, in planar GBR format
        // i.e. run program like this
        // ./main | ffmpeg -y -f rawvideo -pixel_format gbrp -video_size 1920x2 -i - -c:v h264 -pix_fmt yuv420p video.mov
        char* s = reinterpret_cast<char*>(frame.data()+(width*height));   // Get start of G plane
        std::cout.write(s, width*height);                                 // Output it
        s = reinterpret_cast<char*>(frame.data()+2*(width*height));       // Get start of B plane
        std::cout.write(s, width*height);                                 // Output it
        s = reinterpret_cast<char*>(frame.data());                        // Get start of R plane
        std::cout.write(s, width*height);                                 // Output it
    }
}
The splodge is 192x2 pixels and looks like this:

How to overlay two videos with blend filter in ffmpeg

I need to make a lot of videos with the following specifications:
- A background video (bg.mp4)
- Overlay a sequence of PNG images img1.png to img300.png (img%d.png) at a rate of 30 fps
- Overlay a video with dust effects using a blend-lighten filter (dust.mp4)
- Scale all the inputs to 1600x900 and, if an input does not have that aspect ratio, crop it
- Set the duration of the output video to 10 seconds (the duration of the image sequence at 30 fps)
I've been doing a lot of tests with different commands, but they always show errors.
Well, I think I got it with the following command:
ffmpeg -ss 00:00:18.300 -i music.mp3 -loop 1 -i bg.mp4 -i ac%d.png -i dust.mp4 -filter_complex "[1:0]scale=1600:ih*1200/iw, crop=1600:900[a];[a][2:0] overlay=0:0[b]; [3:0] scale=1600:ih*1600/iw, crop=1600:900,setsar=1[c]; [b][c] blend=all_mode='overlay':all_opacity=0.2" -shortest -y output.mp4
I'm going to explain in order to share what I've found:
Declaring the inputs:
ffmpeg -ss 00:00:18.300 -i music.mp3 -loop 1 -i bg.mp4 -i ac%d.png -i dust.mp4
Adding the filter complex. First part: [1:0] is the second input (bg.mp4); it is scaled up to the maximum size and then cropped to the size I need. The result of this operation is stored in the [a] element.
[1:0]scale=1600:ih*1600/iw, crop=1600:900, setsar=1[a];
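If I understand the scale filter correctly, the second argument ih*1600/iw simply preserves the input's aspect ratio once the width is forced to 1600. Worked with example sizes (my own numbers, not the actual inputs):
1920x1080 -> scale=1600:ih*1600/iw -> 1600x900  (1080*1600/1920 = 900, so crop=1600:900 trims nothing)
1440x1080 -> scale=1600:ih*1600/iw -> 1600x1200 (1080*1600/1440 = 1200, so crop=1600:900 trims 300 px)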
Second part: put the PNG sequence over the resized video (bg.mp4, now [a]) and save the result in the [b] element.
[a][2:0] overlay=0:0[b];
Third part: scaling and cropping the fourth input (dust.mp4) and saving it in the [c] element.
[3:0]scale=1600:ih*1600/iw, crop=1600:900,setsar=1[c];
Fourth part: mixing the first result with the overlay video using the "overlay" blend mode, with an opacity of 0.1 because the video has gray tones and would otherwise make the result too dark.
[b][c] blend=all_mode='overlay':all_opacity=0.1
That's all.
If anyone can explain how this scaling filter works, I would be very thankful!
I needed to process a stack of images and was unable to get ffmpeg to work for me reliably, so I built a Python tool to help mediate the process:
#!/usr/bin/env python3

import functools
import numpy as np
import os
from PIL import Image, ImageChops, ImageFont, ImageDraw
import re
import sys
import multiprocessing
import time

def get_trim_box(image_name):
    im = Image.open(image_name)
    bg = Image.new(im.mode, im.size, im.getpixel((0,0)))
    diff = ImageChops.difference(im, bg)
    diff = ImageChops.add(diff, diff, 2.0, -100)
    #The bounding box is returned as a 4-tuple defining the left, upper, right, and lower pixel coordinate. If the image is completely empty, this method returns None.
    return diff.getbbox()

def rect_union(rect1, rect2):
    left1, upper1, right1, lower1 = rect1
    left2, upper2, right2, lower2 = rect2
    return (
        min(left1,left2),
        min(upper1,upper2),
        max(right1,right2),
        max(lower1,lower2)
    )

def blend_images(img1, img2, steps):
    return [Image.blend(img1, img2, alpha) for alpha in np.linspace(0,1,steps)]

def make_blend_group(options):
    print("Working on {0}+{1}".format(options["img1"], options["img2"]))
    font = ImageFont.truetype(options["font"], size=options["fontsize"])
    img1 = Image.open(options["img1"], mode='r').convert('RGB')
    img2 = Image.open(options["img2"], mode='r').convert('RGB')
    img1.crop(options["trimbox"])
    img2.crop(options["trimbox"])
    blends = blend_images(img1, img2, options["blend_steps"])
    for i,img in enumerate(blends):
        draw = ImageDraw.Draw(img)
        draw.text(options["textloc"], options["text"], fill=options["fill"], font=font)
        img.save(os.path.join(options["out_dir"],"out_{0:04}_{1:04}.png".format(options["blendnum"],i)))

if len(sys.argv)<3:
    print("Syntax: {0} <Output Directory> <Images...>".format(sys.argv[0]))
    sys.exit(-1)

out_dir = sys.argv[1]
image_names = sys.argv[2:]

pool = multiprocessing.Pool()

image_names = sorted(image_names)
image_names.append(image_names[0]) #So we can loop the animation

#Assumes image names are alphabetic with a UNIX timestamp mixed in.
image_times = [re.sub('[^0-9]','', x) for x in image_names]
image_times = [time.strftime('%Y-%m-%d (%a) %H:%M', time.localtime(int(x))) for x in image_times]

#Crop off the edges, assuming upper left pixel is representative of background color
print("Finding trim boxes...")
trimboxes = pool.map(get_trim_box, image_names)
trimboxes = [x for x in trimboxes if x is not None]
trimbox = functools.reduce(rect_union, trimboxes, trimboxes[0])

#Put dates on images
testimage = Image.open(image_names[0])
font = ImageFont.truetype('DejaVuSans.ttf', size=90)
draw = ImageDraw.Draw(testimage)
tw, th = draw.textsize("2019-04-04 (Thu) 00:30", font)
tx, ty = (50, trimbox[3]-1.1*th) # starting position of the message

options = {
    "blend_steps": 10,
    "trimbox": trimbox,
    "fill": (255,255,255),
    "textloc": (tx,ty),
    "out_dir": out_dir,
    "font": 'DejaVuSans.ttf',
    "fontsize": 90
}

#Generate pairs of images to blend
pairs = zip(image_names, image_names[1:])

#Tuple of (Image,Image,BlendGroup,Options)
pairs = [{**options, "img1": x[0], "img2": x[1], "blendnum": i, "text": image_times[i]} for i,x in enumerate(pairs)]

#Run in parallel
pool.map(make_blend_group, pairs)
This produces a series of images which can be made into a video like this:
ffmpeg -pattern_type glob -i "/z/out_*.png" -pix_fmt yuv420p -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -r 30 /z/out.mp4

Decoded H.264 gives different frame and context size

We're using avcodec to decode H.264, and in some circumstances, after changing the resolution, avcodec gets confused, and gives two different sizes for the decoded frame:
if (av_init_packet_dll)
    av_init_packet_dll(&avpkt);

avpkt.data = pBuffer;
avpkt.size = lBuffer;

// Make sure the output frame has NULLs for the data lines
pAVFrame->data[0] = NULL;
pAVFrame->data[1] = NULL;
pAVFrame->data[2] = NULL;
pAVFrame->data[3] = NULL;

res = avcodec_decode_video2_dll(pCodecCtx, pAVFrame, &FrameFinished, &avpkt);

DEBUG_LOG("Decoded frame: %d, %d, resulting dimensions: context: %dx%d, frame: %dx%d\n",
          res, FrameFinished, pCodecCtx->width, pCodecCtx->height, pAVFrame->width, pAVFrame->height);

if (pCodecCtx->width != pAVFrame->width || pCodecCtx->height != pAVFrame->height) {
    OutputDebugStringA("Size mismatch, ignoring frame!\n");
    FrameFinished = 0;
}

if (FrameFinished == 0)
    OutputDebugStringA("Unfinished frame\n");
This results in this log (with some surrounding lines):
[5392] Decoded frame: 18690, 1, resulting dimensions: context: 640x480, frame: 640x480
[5392] Set dimensions to 640x480 in DecodeFromMap
[5392] checking size 640x480 against 640x480
[5392] Drawing 640x480, 640x480, 640x480, 0x05DB0060, 0x05DFB5C0, 0x05E0E360, 0x280, to surface 0x03198100, 1280x800
[5392] Drawing 640x480, 640x480, 640x480, 0x05DB0060, 0x05DFB5C0, 0x05E0E360, 0x280, to surface 0x03198100, 1280x800
[5392] Delayed frames seen. Reenabling low delay requires a codec flush.
[5392] Reinit context to 1280x800, pix_fmt: yuvj420p
*[5392] Decoded frame: 54363, 1, resulting dimensions: context: 1280x800, frame: 640x480
[5392] Set dimensions to 1280x800 in DecodeFromMap
[5392] checking size 1280x800 against 640x480
[5392] Found adapter NVIDIA GeForce GTX 650 ({D7B71E3E-4C86-11CF-4E68-7E291CC2C435}) on monitor 00020003
[5392] Found adapter NVIDIA GeForce GTX 650 ({D7B71E3E-4C86-11CF-4E68-7E291CC2C435}) on monitor FA650589
[5392] Creating Direct3D interface on adapter 1 at 1280x800 window 0015050C
[5392] Direct3D created using hardware vertex processing on HAL.
[5392] Creating D3D surface of 1280x800
[5392] Result 0x00000000, got surface 0x03210C40
[5392] Drawing 1280x800, 1280x800, 640x480, 0x02E3B0A0, 0x02E86600, 0x02E993A0, 0x280, to surface 0x03210C40, 1280x800
The line where this breaks is marked with a *. pAVFrame contains the old frame dimensions, while pCodecCtx contains the new dimensions. When the drawing code then tries to access the data as a 1280x800 image, it hits an access violation.
When going down a size, avcodec transitions correctly: it sets FrameFinished to 0 and leaves the pAVFrame resolution at 0x0.
Can anyone think of what is causing this, why avcodec is reporting success yet not doing anything, and what I can do to resolve this correctly?
For now, the mismatch check is protecting against this.
The avcodec in use is built from git-5cba529 by Zeranoe.
FFmpeg version: 2015-03-31 git-5cba529
libavutil 54. 21.100 / 54. 21.100
libavcodec 56. 32.100 / 56. 32.100
AVCodecContext.width/height is not guaranteed to be identical to AVFrame.width/height. For any practical purpose, use AVFrame.width/height.
AVCodecContext.width/height is the size of the current state of the decoder, which may be several frames ahead of the AVFrame being returned to the user. Example: let's assume that you have a display sequence of IBPBP in any MPEG-style codec, which is coded as IPBPB. Let's assume that there was scalability, so each frame has a different size. When the P is consumed, it's not yet returned, but an earlier frame is returned instead. In this example, when P1 is decoded, nothing is returned, when B1 is decoded, it is returned (before P1), and when P2 is decoded, P1 is returned. If each P had a different size, this means when you're decoding P2, P1 is returned to the user, and thus AVCodecContext.w/h and AVFrame.w/h are different (since one reflects P2, yet the other reflects P1). Another example where this happens is when frame-level multithreading is enabled.
In all cases, rely on AVFrame.width/height, and ignore AVCodecContext.width/height.
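As an illustration only, the drawing side might then look something like the sketch below; RecreateSurface and DrawFrame are hypothetical stand-ins for the surface/drawing code seen in the question's log, and surfaceWidth/surfaceHeight are assumed state:
// Sketch: always size the destination from the returned AVFrame, never from the codec context.
if (FrameFinished) {
    int w = pAVFrame->width;    // geometry of the picture actually returned
    int h = pAVFrame->height;   // (may lag behind pCodecCtx->width/height)

    if (w != surfaceWidth || h != surfaceHeight) {
        RecreateSurface(w, h);  // hypothetical helper: rebuild the D3D surface
        surfaceWidth  = w;
        surfaceHeight = h;
    }
    DrawFrame(pAVFrame, w, h);  // hypothetical helper: blit using w x h and pAVFrame->linesize
}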
