Using ImageMagick to efficiently stitch together a line scan image - ffmpeg

I’m looking for alternatives to line scan cameras for sports timing, or rather for the part where the placing needs to be figured out. I found that common industrial cameras can readily match the speed of commercial camera solutions at >1000 frames per second. For my needs the timing accuracy is usually not important, only the relative placing of athletes. I figured I could use one of the cheapest Basler, IDS or other area scan industrial cameras for this purpose. Of course there are line scan cameras that can do a lot more than a few thousand fps (or Hz), but it is possible to get area scan cameras that can do the required 1000-3000 fps for less than 500€.
My holy grail would of course be the near-real time image composition capabilities of FinishLynx (or any other line scan system), basically this part: https://youtu.be/7CWZvFcwSEk?t=23s
The whole process I was thinking of for my alternative is:
1. Use Basler Pylon Viewer (or other software) to record 2px wide images at the camera’s fastest read speed. For the camera I am currently using this means it has to be turned on its side and the height needs to be reduced, since that is the only way it will read 1920x2px frames @ >250fps.
2. Make a program or batch script that stitches these 1920x2px frames together, so that for example one second of recording (1000 frames of 1920x2px) results in an image with a resolution of 1920x2000px (Horizontal x Vertical).
3. Using the same program or another tool, rotate the image so it reflects how the camera is positioned, giving an image with a resolution of 2000x1920px (again Horizontal x Vertical).
4. Open the image in an analyzing program (currently ImageJ) to quickly analyze the results.
I am no programmer, but this is what I was able to put together just using batch scripts, with the help of stackoverflow of course.
Currently, recording a whole 10 seconds to disk as a raw/MJPEG (avi/mkv) stream can be done in real time.
Recording individual frames as TIFF or BMP, or using FFmpeg to save them as PNG or JPG, takes ~20-60 seconds. The appending and rotation then takes a further ~45-60 seconds.
This all needs to be achieved in less than 60 seconds for 10 seconds of footage (1000-3000fps @ 10s = 10000-30000 frames), which is why I need something faster.
I was able to figure out how to be pretty efficient with ImageMagick:
magick convert -limit file 16384 -limit memory 8GiB -interlace Plane -quality 85 -append +rotate 270 "%folder%\Basler*.Tiff" "%out%"
#%out% has a .jpg filename that is dynamically built from the folder name and the number of frames.
This command works and gets me 10000 frames encoded in about 30 seconds on an i5-2520M (most of the processing seems to use only one thread though, since it runs at 25% CPU usage). This is the resulting image: https://i.imgur.com/OD4RqL7.jpg (19686x1928px)
However, since recording to TIFF frames using Basler’s Pylon Viewer takes that much longer than recording an MJPEG video stream, I would like to use the MJPEG (avi/mkv) file as the source for the appending. I noticed FFmpeg has an "image2pipe" format, which should be able to pipe images directly to ImageMagick. I was not able to get this working though:
$ ffmpeg.exe -threads 4 -y -i "Basler acA1920-155uc (21644989)_20180930_043754312.avi" -f image2pipe - | convert - -interlace Plane -quality 85 -append +rotate 270 "%out%" >> log.txt
ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 7.2.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Invalid Parameter - -interlace
[mjpeg @ 000000000046b0a0] EOI missing, emulating
Input #0, avi, from 'Basler acA1920-155uc (21644989)_20180930_043754312.avi':
Duration: 00:00:50.02, start: 0.000000, bitrate: 1356 kb/s
Stream #0:0: Video: mjpeg (MJPG / 0x47504A4D), yuvj422p(pc, bt470bg/unknown/unknown), 1920x2, 1318 kb/s, 200 fps, 200 tbr, 200 tbn, 200 tbc
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
Output #0, image2pipe, to 'pipe:':
Metadata:
encoder : Lavf57.83.100
Stream #0:0: Video: mjpeg, yuvj422p(pc), 1920x2, q=2-31, 200 kb/s, 200 fps, 200 tbn, 200 tbc
Metadata:
encoder : Lavc57.107.100 mjpeg
Side data:
cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
av_interleaved_write_frame(): Invalid argument
Error writing trailer of pipe:: Invalid argument
frame= 1 fps=0.0 q=1.6 Lsize= 0kB time=00:00:00.01 bitrate= 358.4kbits/s speed=0.625x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
Conversion failed!
If I go a bit higher with the height, I no longer get the "[mjpeg @ 000000000046b0a0] EOI missing, emulating" error. However, the whole thing will only work with <2px high/wide footage.
edit: Oh yes, I can also use ffmpeg -i file.mpg -r 1/1 $filename%03d.bmp or ffmpeg -i file.mpg $filename%03d.bmp to extract all the frames from the MJPEG/RAW stream. However this is an extra step I do not want to take. (just deleting a folder of 30000 jpgs takes 2 minutes alone…)
Can someone think of a working solution for the piping method or a totally different alternative way of handling this?

I had another go at this to see if I could speed up my other answer by doing things a couple of different ways - hence a different answer. I used the same synthetic videoclip that I generated in the other answer to test with.
Rather than pass the 2x1920 scanlines into ImageMagick for it to append together and write as a JPEG, I did the following:
created the full output frame up front in a C++ program, and then looped reading in a 2x1920 scanline on each iteration and stuffed that into the correct position in the output frame, and
when the entire sequence had been read, compressed it into a JPEG using turbo-jpeg and wrote it to disk.
As such, ImageMagick is no longer required. The entire program now runs in around 1.3 seconds rather than the 10.3 seconds via ImageMagick.
Here is the code:
////////////////////////////////////////////////////////////////////////////////
// stitch.cpp
// Mark Setchell
//
// Read 2x1920 RGB frames from `ffmpeg` and stitch into 20000x1920 RGB image.
////////////////////////////////////////////////////////////////////////////////
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <unistd.h>
#include <turbojpeg.h>
using namespace std;
int main()
{
    int frames = 10000;
    int height = 1920;
    int width  = frames * 2;

    // Buffer in which to assemble complete output image (RGB), e.g. 20000x1920
    unsigned char *img = new unsigned char[width*height*3];

    // Buffer for one scanline image 1920x2 (RGB)
    unsigned char *scanline = new unsigned char[2*height*3];

    // Output column
    int ocol = 0;

    // Read frames from `ffmpeg` fed into us like this:
    // ffmpeg -threads 4 -y -i video.mov -frames 10000 -vf "transpose=1" -f image2pipe -vcodec rawvideo -pix_fmt rgb24 - | ./stitch
    for(int f=0; f<frames; f++){
        // Read one scanline from stdin, i.e. 2x1920 RGB image...
        // (a pipe may deliver it in pieces, so keep reading until we have the whole scanline)
        ssize_t needed = 2*height*3;
        ssize_t got = 0;
        while(got < needed){
            ssize_t bytesread = read(STDIN_FILENO, scanline+got, needed-got);
            if(bytesread <= 0) return 1;   // EOF or error before the expected data arrived
            got += bytesread;
        }
        // ... and place into finished frame
        // ip is pointer to input image
        unsigned char *ip = scanline;
        for(int row=0; row<height; row++){
            unsigned char *op = &(img[(row*width*3)+3*ocol]);
            // Copy 2 RGB pixels from scanline to output image
            *op++ = *ip++; // red
            *op++ = *ip++; // green
            *op++ = *ip++; // blue
            *op++ = *ip++; // red
            *op++ = *ip++; // green
            *op++ = *ip++; // blue
        }
        ocol += 2;
    }

    // Now encode to JPEG with turbo-jpeg
    const int JPEG_QUALITY = 75;
    long unsigned int jpegSize = 0;
    unsigned char* compressedImage = NULL;
    tjhandle _jpegCompressor = tjInitCompress();

    // Compress in memory
    tjCompress2(_jpegCompressor, img, width, 0, height, TJPF_RGB,
                &compressedImage, &jpegSize, TJSAMP_444, JPEG_QUALITY,
                TJFLAG_FASTDCT);

    // Clean up
    tjDestroy(_jpegCompressor);

    // And write to disk
    ofstream f("result.jpg", ios::out | ios::binary);
    f.write(reinterpret_cast<char*>(compressedImage), jpegSize);
}
Notes:
Note 1: In order to pre-allocate the output image, the program needs to know how many frames are coming in advance - I did not parameterize that, I just hard-coded 10,000 but it should be easy enough to change.
One way to determine the number of frames in the video sequence is this:
ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 video.mov
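One way to avoid the hard-coded count (my own sketch, not part of the original program) is to take it from the command line and feed in the number ffprobe reports:
#include <cstdlib>

// Hypothetical tweak (not in the original answer): take the frame count
// from argv so it can be fed from ffprobe, e.g.
//   ./stitch $(ffprobe -v error -count_frames -select_streams v:0 \
//              -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 video.mov)
int main(int argc, char *argv[])
{
    int frames = (argc > 1) ? std::atoi(argv[1]) : 10000;  // default keeps the old behaviour
    int height = 1920;
    int width  = frames * 2;
    // ... the rest of stitch.cpp stays the same ...
}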
Note 2: I compiled the code with a couple of switches for performance:
g++-8 -O3 -march=native stitch.cpp -o stitch
Note 3: If you are running on Windows, you may need to re-open stdin in binary mode before doing:
read(STDIN_FILENO...)
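On Windows that would be something along these lines (a sketch assuming the Microsoft/MinGW CRT, where _setmode comes from <io.h>, _O_BINARY from <fcntl.h> and _fileno from <cstdio>; it is not needed on Linux/macOS):
#include <cstdio>
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

int main()
{
#ifdef _WIN32
    // Switch stdin to binary mode so the CRT does not translate CR/LF
    // or stop at a stray Ctrl-Z byte in the raw RGB stream.
    _setmode(_fileno(stdin), _O_BINARY);
#endif
    // ... read(STDIN_FILENO, ...) as before ...
}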
Note 4: If you don't want to use turbo-jpeg, you could remove everything after the end of the main loop, and simply send a NetPBM PPM image to ImageMagick via a pipe and let it do the JPEG writing. That would look like, very roughly:
writeToStdout("P6 20000 1920 255\n");
writeToStdout(img, width*height*3);
Then you would run with:
ffmpeg ... | ./stitch | magick ppm:- result.jpg
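For what it's worth, the writeToStdout shorthand above could be fleshed out roughly like this (my sketch, assuming the same img, width and height as in stitch.cpp):
#include <cstdio>

// Write a binary PPM (P6) to stdout: text header, then the raw RGB24 pixels,
// which ImageMagick can read from a pipe as "ppm:-".
static void write_ppm(const unsigned char *img, int width, int height)
{
    std::printf("P6\n%d %d\n255\n", width, height);
    std::fwrite(img, 1, static_cast<size_t>(width) * height * 3, stdout);
    std::fflush(stdout);
}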

I generated a sample video of 10,000 frames, and did some tests. Obviously, my machine is not the same specification as yours, so results are not directly comparable, but I found it is quicker to let ffmpeg transpose the video and then pipe it into ImageMagick as raw RGB24 frames.
I found that I can convert a 10 second movie into a 20,000x1920 pixel JPEG in 10.3s like this:
ffmpeg -threads 4 -y -i video.mov -frames 10000 -vf "transpose=1" -f image2pipe -vcodec rawvideo -pix_fmt rgb24 - | convert -depth 8 -size 2x1920 rgb:- +append result.jpg
The resulting image looks like this:
I generated the video like this with CImg. Basically it just draws a Red/Green/Blue splodge successively further across the frame till it hits the right edge, then starts again at the left edge:
#include <iostream>
#include "CImg.h"
using namespace std;
using namespace cimg_library;

int main()
{
    // Frame we will fill
    CImg<unsigned char> frame(1920,2,1,3);
    int width  = frame.width();
    int height = frame.height();

    // Item to draw in frame - created with ImageMagick
    // convert xc:red xc:lime xc:blue +append -resize 256x2\! splodge.ppm
    CImg<unsigned char> splodge("splodge.ppm");

    int offs = 0;

    // We are going to output 10000 frames of RGB raw video
    for(int f=0; f<10000; f++){
        // Generate white image
        frame.fill(255);
        // Draw coloured splodge at correct place
        frame.draw_image(offs,0,splodge);
        offs = (offs + 1) % (width - splodge.width());

        // Output to ffmpeg to make video, in planar GBR format
        // i.e. run program like this
        // ./main | ffmpeg -y -f rawvideo -pixel_format gbrp -video_size 1920x2 -i - -c:v h264 -pix_fmt yuv420p video.mov
        char* s = reinterpret_cast<char*>(frame.data()+(width*height));    // Get start of G plane
        std::cout.write(s, width*height);                                  // Output it
        s = reinterpret_cast<char*>(frame.data()+2*(width*height));        // Get start of B plane
        std::cout.write(s, width*height);                                  // Output it
        s = reinterpret_cast<char*>(frame.data());                         // Get start of R plane
        std::cout.write(s, width*height);                                  // Output it
    }
}
The splodge is 192x2 pixels and looks like this:

Related

JavaCV overlay FFmpegFrameFilter not working

When I use overlay in JavaCV to overlay two videos, it doesn't work.
video1 info:Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(progressive), 1512x982, 2208 kb/s, SAR 1:1 DAR 756:491, 29.37 fps, 29.42 tbr, 16k tbn (default)
video2 info: png (png / 0x20676E70), rgba(pc, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 470102 kb/s, 50 fps, 50 tbr, 12800 tbn (default)
video2 has a transparent background, and it was converted with a chromakey filter.
I can do it with ffmpeg command:
ffmpeg -i bg.mp4 -i out1.mov -filter_complex overlay xx.mp4
But when I use JavaCV to do the same thing, it doesn't work. The output video loses its color, and it is divided into three equal parts, each with the same content.
Here is the code:
String bgMp4 = "bg.mp4";
String mov = "front.mov";
FFmpegFrameGrabber video1 = new FFmpegFrameGrabber(bgMp4);
FFmpegFrameGrabber video2 = new FFmpegFrameGrabber(mov);
// Open the videos for reading
video1.start();
video2.start();
// Create an instance of the FFmpegFrameFilter class to handle the video overlay
FFmpegFrameFilter overlayFilter = new FFmpegFrameFilter("[0:v][1:v]overlay=10:10[v]", video1.getImageWidth(), video1.getImageHeight());
overlayFilter.setVideoInputs(2);
overlayFilter.start();
FFmpegFrameRecorder recorder = new FFmpegFrameRecorder("out.mp4", video1.getImageWidth(), video1.getImageHeight());
recorder.setFrameRate(video1.getFrameRate());
// whether AV_CODEC_ID_PNG or AV_CODEC_ID_H264, neither works.
recorder.setVideoCodec(avcodec.AV_CODEC_ID_H264);
recorder.setPixelFormat(avutil.AV_PIX_FMT_YUV420P);
recorder.start();
// Read in the frames from the input videos and apply the overlay filter
Frame frame1, frame2;
while ((frame1 = video1.grabImage()) != null && (frame2 = video2.grabImage()) != null) {
    overlayFilter.push(0, frame1);
    overlayFilter.push(1, frame2);
    Frame frame = overlayFilter.pullImage();
    recorder.record(frame);
}
// Close the videos and filter
video1.stop();
video2.stop();
overlayFilter.stop();
recorder.stop();

How to get actual PTS and DTS after frame decoding with libavcodec of FFmpeg

C++, FFmpeg, libavcodec, video frames decoding.
To parse data from the network and split it into frames, I use the av_parser_parse2(_parser, _context, &_packet_encoded_frame->data, &_packet_encoded_frame->size, data, data_size, pts ? pts : AV_NOPTS_VALUE, dts ? dts : AV_NOPTS_VALUE, 0) function to which I pass non-zero values PTS and DTS as inputs.
Next, I use the avcodec_send_packet(_context, _packet_encoded_frame) and avcodec_receive_frame(_context, _frame) functions to decode the frames.
Why do I ALWAYS have PTS(_frame->pts) and DTS(_frame->pkt_dts) equal to AV_NOPTS_VALUE after SUCCESSFUL frame decoding? How can I get the actual PTS and DTS values of that frame after decoding a frame? What am I doing wrong?

Detect if video is black and white in bash

I have a folder with hundreds of films, and I'd like to separate the color ones from black and white. Is there a bash command to do this for general video files?
I already extract a frame:
ffmpeg -ss 00:15:00 -i vid.mp4 -t 1 -r 1/1 image.bmp
How can I check if the image has a color component?
I never found out why video processing questions are answered on SO, but as they typically are not closed, I'll do my best... As this is a developer board, I cannot recommend a ready-made command-line tool for your bash command, nor do I know of one. I also cannot give a bash-only solution, because I do not know how to process binary data in bash.
To work out whether an image is grey or not, you'll need to check each pixel for its color and "guess" if it is some kind of grey. As others say in the comments, you will need to analyze multiple pictures from each video to get a more accurate result. For this you could possibly use ffmpeg's scene change detection filter, but that's another topic.
I'd start by resizing the image to save processing power, e.g. to 4x4 pixels. Also make sure the colorspace, or better the pix_fmt, is known, so you know what a pixel looks like.
Using this ffmpeg line, you would extract one frame in 4x4 pixels to raw RGB24:
ffmpeg -i D:\smpte.mxf -pix_fmt rgb24 -t 1 -r 1/1 -vf scale=4:4 -f rawvideo d:\out_color.raw
The resulting file contains exactly 48 bytes: 16 pixels of 3 bytes each, representing the R, G and B values. To check if all pixels are gray, you need to compare the differences between R, G and B. Typically R, G and B have the same value when a pixel is gray, but in practice you will need to allow some fuzzy matching, e.g. all values the same ±10.
Some example perl code:
use strict;
use warnings;

my $fuzz = 10;
my $inputfile = "d:\\out_color.raw";

die "input file is not an RGB24 raw picture." if ( (-s $inputfile) % 3 != 0);

open (my $fh, $inputfile);
binmode $fh;

my $colordetected = 0;
for (my $i = 0; $i < -s $inputfile; $i += 3){
    my ($R,$G,$B);
    read ($fh,$R,1);
    $R = ord($R);
    read ($fh,$G,1);
    $G = ord($G);
    read ($fh,$B,1);
    $B = ord($B);
    if ( $R >= $B-$fuzz and $R <= $B+$fuzz and $B >= $G-$fuzz and $B <= $G+$fuzz ) {
        # this pixel seems gray
    }else{
        $colordetected++;
    }
}
if ($colordetected != 0){
    print "There seem to be colors in this image"
}

Imagemagick & Pillow generate malformed GIF frames

I need to extract the middle frame of a gif animation.
Imagemagick:
convert C:\temp\orig.gif -coalesce C:\temp\frame.jpg
generates the frames properly:
However when I extract a single frame:
convert C:\temp\orig.gif[4] -coalesce C:\temp\frame.jpg
then the frame is malformed, as if the -coalesce option was ignored:
Extraction of individual frames with Pillow and ffmpeg also results in malformed frames, tested on a couple of gifs.
Download gif: https://i.imgur.com/Aus8JpT.gif
I need to be able to extract middle frames of every gif using either PIL, ImageMagick or ffmpeg (ideally PIL).
You are attempting to coalesce a single input image into a single output image. What you got is what you asked for.
Instead you should "flatten" frames 0-4 into a single output image:
convert C:\temp\orig.gif[0-4] -flatten C:\temp\frame.jpg
If you use "-coalesce" you'll get 5 frames of output in frame-0.jpg through frame-4.jpg, the last of them being the image you wanted.
Ok, this script will find and save the middle frame of an animated GIF using Pillow.
It will also display the duration of the GIF by counting the milliseconds of each frame.
from PIL import Image


def iter_frames(im):
    try:
        i = 0
        while 1:
            im.seek(i)
            frame = im.copy()
            if i == 0:
                # Save palette of the first frame
                palette = frame.getpalette()
            else:
                # Copy the palette to the subsequent frames
                frame.putpalette(palette)
            yield frame
            i += 1
    except EOFError:  # End of gif
        pass


im = Image.open('animated.gif')
middle_frame_pos = int(im.n_frames / 2)
durations = []
for i, frame in enumerate(iter_frames(im)):
    if i == middle_frame_pos:
        middle_frame = frame.copy()
    try:
        durations.append(frame.info['duration'])
    except KeyError:
        pass

middle_frame.save('middle_frame.png', **frame.info)
duration = float("{:.2f}".format(sum(durations)))
print('Total duration: %d ms' % (duration))
Helpful code:
Python: Converting GIF frames to PNG
https://github.com/alimony/gifduration
You can do it like this:
convert pour.gif -coalesce -delete 0-3,5-8 frame4.png
Basically, it generates all the frames in full and then deletes every frame other than frame 4.

Decoded H.264 gives different frame and context size

We're using avcodec to decode H.264, and in some circumstances, after changing the resolution, avcodec gets confused, and gives two different sizes for the decoded frame:
if (av_init_packet_dll)
    av_init_packet_dll(&avpkt);

avpkt.data = pBuffer;
avpkt.size = lBuffer;

// Make sure the output frame has NULLs for the data lines
pAVFrame->data[0] = NULL;
pAVFrame->data[1] = NULL;
pAVFrame->data[2] = NULL;
pAVFrame->data[3] = NULL;

res = avcodec_decode_video2_dll(pCodecCtx, pAVFrame, &FrameFinished, &avpkt);

DEBUG_LOG("Decoded frame: %d, %d, resulting dimensions: context: %dx%d, frame: %dx%d\n",
          res, FrameFinished, pCodecCtx->width, pCodecCtx->height, pAVFrame->width, pAVFrame->height);

if (pCodecCtx->width != pAVFrame->width || pCodecCtx->height != pAVFrame->height) {
    OutputDebugStringA("Size mismatch, ignoring frame!\n");
    FrameFinished = 0;
}

if (FrameFinished == 0)
    OutputDebugStringA("Unfinished frame\n");
This results in this log (with some surrounding lines):
[5392] Decoded frame: 18690, 1, resulting dimensions: context: 640x480, frame: 640x480
[5392] Set dimensions to 640x480 in DecodeFromMap
[5392] checking size 640x480 against 640x480
[5392] Drawing 640x480, 640x480, 640x480, 0x05DB0060, 0x05DFB5C0, 0x05E0E360, 0x280, to surface 0x03198100, 1280x800
[5392] Drawing 640x480, 640x480, 640x480, 0x05DB0060, 0x05DFB5C0, 0x05E0E360, 0x280, to surface 0x03198100, 1280x800
[5392] Delayed frames seen. Reenabling low delay requires a codec flush.
[5392] Reinit context to 1280x800, pix_fmt: yuvj420p
*[5392] Decoded frame: 54363, 1, resulting dimensions: context: 1280x800, frame: 640x480
[5392] Set dimensions to 1280x800 in DecodeFromMap
[5392] checking size 1280x800 against 640x480
[5392] Found adapter NVIDIA GeForce GTX 650 ({D7B71E3E-4C86-11CF-4E68-7E291CC2C435}) on monitor 00020003
[5392] Found adapter NVIDIA GeForce GTX 650 ({D7B71E3E-4C86-11CF-4E68-7E291CC2C435}) on monitor FA650589
[5392] Creating Direct3D interface on adapter 1 at 1280x800 window 0015050C
[5392] Direct3D created using hardware vertex processing on HAL.
[5392] Creating D3D surface of 1280x800
[5392] Result 0x00000000, got surface 0x03210C40
[5392] Drawing 1280x800, 1280x800, 640x480, 0x02E3B0A0, 0x02E86600, 0x02E993A0, 0x280, to surface 0x03210C40, 1280x800
The line where this breaks is marked with a *. pAVFrame contains the old frame dimensions, while pCodecCtx contains the new dimensions. When the drawing code then tries to access the data as a 1280x800 image, it hits an access violation.
When going down a size, avcodec transitions correctly, and sets FrameFinished to 0 and leaves pAVFrame resolution at 0x0.
Can anyone think what is causing this, why avcodec is reporting success, yet not doing anything, and what I can do to correctly resolve this?
For now, the mismatch check is protecting against this.
The avcodec in use is built from git-5cba529 by Zeranoe.
FFmpeg version: 2015-03-31 git-5cba529
libavutil 54. 21.100 / 54. 21.100
libavcodec 56. 32.100 / 56. 32.100
AVCodecContext.width/height is not guaranteed to be identical to AVFrame.width/height. For any practical purpose, use AVFrame.width/height.
AVCodecContext.width/height is the size of the current state of the decoder, which may be several frames ahead of the AVFrame being returned to the user. Example: let's assume that you have a display sequence of IBPBP in any MPEG-style codec, which is coded as IPBPB. Let's assume that there was scalability, so each frame has a different size. When the P is consumed, it's not yet returned, but an earlier frame is returned instead. In this example, when P1 is decoded, nothing is returned, when B1 is decoded, it is returned (before P1), and when P2 is decoded, P1 is returned. If each P had a different size, this means when you're decoding P2, P1 is returned to the user, and thus AVCodecContext.w/h and AVFrame.w/h are different (since one reflects P2, yet the other reflects P1). Another example where this happens is when frame-level multithreading is enabled.
In all cases, rely on AVFrame.width/height, and ignore AVCodecContext.width/height.
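Put differently: size your drawing surface from pAVFrame->width/height, not from pCodecCtx->width/height. With the current send/receive API this looks roughly as follows (a sketch only; the question uses the older avcodec_decode_video2, and draw_frame here is a hypothetical renderer callback):
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: take the dimensions of each decoded picture from the AVFrame itself.
int decode_packet(AVCodecContext *ctx, AVPacket *pkt, AVFrame *frame)
{
    int ret = avcodec_send_packet(ctx, pkt);
    if (ret < 0)
        return ret;

    while ((ret = avcodec_receive_frame(ctx, frame)) >= 0) {
        // ctx->width/height may already describe a *later* frame still inside
        // the decoder; frame->width/height always describe this picture.
        // draw_frame(frame->data, frame->linesize, frame->width, frame->height);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}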
