I am currently using the ffmpeg libraries to extract images from a 1080i YUV 4:2:2 raw file. Because the data is interlaced, some lines are dropped when I extract an image from a single frame of the video. Is it possible to merge two or three frames to make a single high-definition image? Please guide me on how to move forward.
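One route, for what it's worth: ffmpeg's yadif deinterlacer interpolates the missing lines and rebuilds full-height frames from the two fields (with mode=1 it emits one frame per field). A sketch, assuming 1920x1080 planar 4:2:2 input at 25 fps; adjust -s, -r, and -pix_fmt to match your raw file:

    ffmpeg -f rawvideo -pix_fmt yuv422p -s 1920x1080 -r 25 -i input.yuv \
           -vf yadif=mode=1 frame_%04d.png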
I am getting a list of small rectangle images containing the parts of the image that have changed since the previous image. They come from desktop capture with DirectX 11, which reports which parts of the desktop image have changed and provides the rectangles for them.
I am trying to figure out whether I can pass them to ffmpeg's libavcodec encoder for H.264. I looked into AVFrame and didn't see a way to specify which parts have actually changed from the previous image.
Is there a way, when passing an image to the ffmpeg codec context for encoding, to pass only the parts that changed from the previous frame? Doing so might reduce CPU usage, which matters because this is for a live stream.
I use the standard avcodec_send_frame to send a frame to the codec for encoding; it only takes an AVFrame and a codec context as parameters.
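For reference, the standard send/receive loop below is essentially all the API exposes; there is no per-frame dirty-rectangle parameter, so the encoder's own motion estimation has to rediscover the unchanged regions. A minimal sketch modeled on ffmpeg's encode_video example (error handling trimmed, muxing omitted):

    extern "C" {
    #include <libavcodec/avcodec.h>
    }
    #include <cstdio>

    // Send one frame (or nullptr to flush) and drain every packet it produces.
    static int encode(AVCodecContext *ctx, AVFrame *frame, AVPacket *pkt, FILE *out) {
        int ret = avcodec_send_frame(ctx, frame);   // whole frames only; no dirty-rect hints
        if (ret < 0)
            return ret;
        for (;;) {
            ret = avcodec_receive_packet(ctx, pkt);
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                return 0;                           // needs more input, or fully flushed
            if (ret < 0)
                return ret;
            fwrite(pkt->data, 1, pkt->size, out);   // in real code, mux via libavformat
            av_packet_unref(pkt);
        }
    }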
I've got a video out of OBS that plays normally on my system if I open it with VLC, for example, but when I import it into my editor (Adobe Premiere) it gets weirdly cropped down. Inspecting the video's metadata shows that, for some reason, it was encoded with a new width and height on top of the old ones. Is there a way, using ffmpeg, to re-encode/transcode the video to a new file with only the original width and height?
Bonus question: would there be a way for me to extract the audio channels from my video as separate .mp3 files? There are four audio channels in the video.
Every time you re-encode a video you lose some quality. Scaling the video back up will not reintroduce details that were lost when it was scaled down.
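For the concrete commands, a hedged sketch, assuming the original size is 1920x1080 and the four channels are stored as four separate audio streams (check with ffprobe first; if they are channels inside a single stream, you would need the channelsplit filter instead):

    # re-encode the video at the original dimensions, keeping the audio untouched
    ffmpeg -i input.mkv -vf scale=1920:1080 -c:v libx264 -c:a copy fixed.mkv

    # write each audio stream to its own mp3
    ffmpeg -i input.mkv -map 0:a:0 ch1.mp3 -map 0:a:1 ch2.mp3 \
           -map 0:a:2 ch3.mp3 -map 0:a:3 ch4.mp3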
I am stuck on a video-processing feature: specifically, uploading an image and then generating a video based on various video templates.
Here are the video templates:
http://video-static.biku8.com/data/video/template/3286012076458048/7437ab55-2e83-4a36-9046-5708fcddf4c1.mp4
http://video-static.biku8.com/data/video/template/3274256089907264/ae8fa3f7-6c9c-45ca-810f-48db92cc14cb.mp4
http://video-static.biku8.com/data/video/template/3213894231425088/bf107d439b9043a58c1ea0ba26f811db_template.mp4
...
As shown in the video templates above, I just need to upload a photo to generate a great video.
My questions:
What is the general approach for implementing this kind of video?
Which third-party libraries are needed? (ffmpeg, OpenCV?)
PS: I am using dlib and OpenCV for face recognition. I can generate the face image, but I don't know how to insert it into the correct position in these template videos.
I would suggest you follow these three steps:
Load the template video with OpenCV, so you can access the video frame by frame.
Modify each frame, one by one.
Save each frame to a video writer.
Regarding step 2: you copy the uploaded image into each frame through a mask (a pixel from the source image is copied to the destination image if its coordinate in the mask is non-black). The mask can be defined by a list of points or by an image. You should pre-define a mask for each frame in a file, then load the mask for each frame and copy; see the sketch after the links below.
How to read and save video: OpenCV read-write Video
How to insert an image into another image: Copy non rectangular ROI
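A minimal sketch of those three steps in OpenCV/C++, assuming one pre-made mask image per frame (all file names here are made up):

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap("template.mp4");             // step 1: read the template frame by frame
        cv::Mat face = cv::imread("face.png");            // the uploaded photo / generated face image
        int w = (int)cap.get(cv::CAP_PROP_FRAME_WIDTH);
        int h = (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT);
        double fps = cap.get(cv::CAP_PROP_FPS);
        cv::VideoWriter out("result.mp4",
                            cv::VideoWriter::fourcc('m','p','4','v'),
                            fps, cv::Size(w, h));
        cv::Mat frame;
        for (int i = 0; cap.read(frame); ++i) {
            // step 2: copy the face into the frame wherever the mask is non-black
            cv::Mat mask = cv::imread(cv::format("mask_%04d.png", i), cv::IMREAD_GRAYSCALE);
            if (!mask.empty()) {
                cv::Mat layer;
                cv::resize(face, layer, frame.size());    // crude placement; see note below
                layer.copyTo(frame, mask);                // non-zero mask pixels are copied
            }
            out.write(frame);                             // step 3: append to the output video
        }
        return 0;
    }

In practice you would warp the face into place per frame (e.g. cv::warpAffine driven by your dlib landmarks) instead of the plain resize used here.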
Generating videos like these is not an easy task. I recommend using Adobe After Effects or other video-creation software (with some scripts and actions) if you don't need to generate the result from a single program or programming language.
That said, here are my answers for the case where you do need to generate it programmatically.
For the first one, you need to recognize faces and bones, so you should use OpenCV. (I recommend tools like openFrameworks or TouchDesigner.)
For the second one, I don't know exactly what you want, but if you want to recognize the position of the bottle dynamically, you will have to use deep learning or some other detection method, so you may need TensorFlow or OpenCV. (If you just want to merge layers, ffmpeg etc. will do.)
For the last one, you should split the video frame into boxes and then control each of them. There are many ways to implement this; I might use openFrameworks, TouchDesigner, vvvv, or Processing.
I don't recommend ffmpeg for these; it is not the best tool for generating complicated video. ffmpeg does fine, however, for simpler jobs such as merging two videos with alpha.
I have been looking for a way to convert a sequence of PNGs to a video. There are ways to do this using FFmpeg's concat demuxer and a script file.
The problem is that I want to show certain images longer than others, and I need it to be accurate. I can set a duration (in seconds) in the script file, but I need it to be frame-accurate, and so far I have not been successful.
This is what I want to make:
A QuickTime video with transparency (ProRes 4444 or another codec that supports an alpha channel)
25fps
This is what I have: [ TimecodeIn - TimecodeOut in destination video ]
img001.png [0:00:05:10 - 0:00:07:24]
img002.png [0:00:09:02 - 0:00:12:11]
img003.png [0:00:15:00 - 0:00:17:20]
...
img120.png [0:17:03:11 - 0:17:07:01]
Of course this is not the actual format of the script file, just an idea of the kind of data I am dealing with. The PNG image files are subtitles I generate elsewhere in my application. I would like to export the subtitles as a transparent movie that I can easily import into my video-editing software.
I have also been thinking of using blank transparent images as spacers between the actual subtitle images.
After looking around, I think this might help:
On the FFmpeg site they explain how to make a timed slideshow.
In the Concat demuxer section they describe building a slideshow from a text file that references the image files and gives each image's duration.
So, I create all the PNG images I need. These images contain the subtitle text; each image holds one subtitle page.
For the moments when I want to hide the subtitles, I use a blank PNG.
I generate a text file as explained on the FFmpeg website.
This text file references all the PNGs. For each duration I just calculate out-cue minus in-cue. Easy... I think...
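At 25 fps a frame-accurate duration is just a frame count divided by 25. For example, assuming the out point is exclusive, img001.png runs from 0:00:05:10 to 0:00:07:24, i.e. from frame 5×25+10 = 135 to frame 7×25+24 = 199, which is 64 frames = 2.56 s; the gap before img002.png at 0:00:09:02 (frame 227) is 28 frames = 1.12 s of blank spacer. A sketch of the concat file and the encode command (file names assumed):

    ffconcat version 1.0
    # 135 frames (5.40 s) of transparency before the first subtitle
    file 'blank.png'
    duration 5.40
    # img001: frames 135-199 = 64 frames = 2.56 s
    file 'img001.png'
    duration 2.56
    # 28-frame gap (1.12 s) until img002 at 0:00:09:02
    file 'blank.png'
    duration 1.12
    # img002: frames 227-311 = 84 frames = 3.36 s
    file 'img002.png'
    duration 3.36

    ffmpeg -f concat -i list.txt -vf fps=25,format=yuva444p10le \
           -c:v prores_ks -profile:v 4444 subtitles.mov

The fps=25 filter snaps the concat timestamps onto the 25 fps grid, which is what makes the durations land on exact frames.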
I have some images that were taken from a video via screen capture. I would like to know when in the video these images appear (timestamps). Is there a way to programmatically match an image with a specific frame in a video using ffmpeg or some other tool?
I am very open to different technologies, as I'm eager to automate this; it would be extremely time-consuming to do manually.
You can compute the PSNR between the image and each frame in the video; the match is the frame with the highest PSNR. ffmpeg has a tool to calculate PSNR in tests/tiny_psnr, which you can use to script this together, or there is also a psnr filter in ffmpeg's libavfilter module if you prefer to code rather than script.
Scripting it, you would decode the video to a FIFO, decode the image to a file, and then repeatedly match the FIFO frames against the image file using tiny_psnr, selecting the frame number with the highest PSNR. The output is a frame number, which (using the fps reported on the command line) you can convert to an approximate timestamp.
Programming it, you would decode the video and the image to AVFrames, use the psnr filter to compare the two, read the PSNR value from the output frame's metadata in your program, and search for the frame with the highest value; for that frame, AVFrame->pkt_pts (AVFrame->pts in current FFmpeg releases) is the timestamp.
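A command-line variant of the same idea, using the psnr filter: loop the still image as a second input, scale it to the video size (assumed 1920x1080 here), log per-frame PSNR, then pick the frame with the highest psnr_avg (paths are placeholders):

    ffmpeg -i video.mp4 -loop 1 -i screenshot.png -filter_complex \
      "[1:v]scale=1920:1080[ref];[0:v][ref]psnr=stats_file=psnr.log" -f null -
    # psnr.log gets one line per frame, e.g. "n:137 ... psnr_avg:41.23 ..."
    # the n with the largest psnr_avg is the frame number; timestamp ~ n / fps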