Creating video from images using Microsoft Media Foundation - winapi

Is there a way to create a video of a predefined duration from a set of images using Microsoft Media Foundation?
Say, for example, I have 50 images. Is there a way to use those 50 images to create a video of, say, 50 seconds or 100 seconds?

The Sink Writer API is designed for exactly this.
You set it up and then feed it images (such as RGB data) together with their timestamps. A suitably configured Sink Writer builds a pipeline that accepts the images, converts and encodes them as necessary, and writes the result to a file such as an MP4.
The MSDN article links to a tutorial at the bottom of the page. You will find other questions on Stack Overflow that also reference this tutorial.
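The "associated time data" is the part that controls the final duration. As a rough sketch (not Media Foundation code), the timestamps for the 50-image case in the question could be planned like this. Media Foundation expresses sample times and durations in 100-nanosecond units, which is what `IMFSample::SetSampleTime` and `IMFSample::SetSampleDuration` expect. The function name `still_image_schedule` and the 25 fps default are assumptions made for illustration. Whether one sample with a long duration is enough, or each image has to be written repeatedly at a fixed frame rate, depends on how the encoder is configured; `frame_count` covers the second case.

```python
# Media Foundation expresses sample times and durations in 100-nanosecond units.
HNS_PER_SECOND = 10_000_000


def still_image_schedule(image_count, video_seconds, fps=25):
    """Return one (start_time, duration, frame_count) tuple per image.

    Times are in 100-ns units. Each image is held for an equal share of
    the video; frame_count is how many identical frames to write for that
    image at the chosen frame rate (the 25 fps default is an assumption).
    """
    if image_count <= 0 or video_seconds <= 0:
        raise ValueError("image_count and video_seconds must be positive")
    total = video_seconds * HNS_PER_SECOND
    schedule = []
    for i in range(image_count):
        # Integer arithmetic keeps the boundaries exact and avoids drift.
        start = total * i // image_count
        end = total * (i + 1) // image_count
        frames = max(1, round((end - start) * fps / HNS_PER_SECOND))
        schedule.append((start, end - start, frames))
    return schedule


if __name__ == "__main__":
    for seconds in (50, 100):
        start, duration, frames = still_image_schedule(50, seconds)[1]
        print(f"{seconds} s video: image 2 starts at {start}, "
              f"lasts {duration}, written as {frames} frames")
```

The same 50 images give a 50-second or a 100-second video purely by changing the timestamps; the image data passed to the writer stays the same.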


How to add a Poster Frame to an MP4 video by timecode?

The mvhd atom (or box) of the original QuickTime MOV format supports a poster time field: a timecode that selects the frame to use as a poster frame, which can serve as a thumbnail image or cover picture in preview scenarios. As far as I can tell, the ISOBMFF-based MP4 format (.m4v) has inherited this feature, but I cannot find a way to set it using FFmpeg, MP4Box or similar cross-platform CLI software. Edit: Actually, neither ISOBMFF nor MP4 imports this feature from MOV. Is there any other way to achieve this, e.g. using something like HEIFʼs derived images with a thmb (see Amendment 2) role?
The original Apple QuickTime (Pro) editor did have a menu option for doing just that. (Apple Compressor and Photos could do it, too.)
To be clear, I do not want to attach a separate image file, which could possibly be a screenshot grabbed from a movie still, as a separate track to the multimedia container. I know how to do that:
Stack Overflow #54717175
Super User #597945
I also know that some people used to copy the designated poster frame from its original position to the very first frame, but many automatically generated previews use a later time index, e.g. from 10 seconds, 30 seconds, 10% or 50% into the video stream.

Upload a picture to generate a video with special effects

I am stuck on a video processing feature. Specifically, the user uploads an image, and a video is then generated from it based on one of several video templates.
Here are the video templates:
As shown in the video templates above, I just need to upload a photo to generate a great video.
My questions:
What is the general approach for implementing this kind of video?
Which third-party libraries are needed? (ffmpeg, opencv)
PS: I am using dlib and opencv for face recognition. I can generate a face image, but I don't know how to insert the face image into the correct position in these template videos.
I would suggest the following three steps:
1. Load the template video with OpenCV, which lets you access the video frame by frame.
2. Modify each frame, one by one.
3. Save each frame to a video stream writer.
Regarding step 2, you copy the uploaded image onto each frame through a mask: a pixel from the source image is copied to the destination image only if the mask is non-black at that coordinate. The mask can be defined by a list of points or by an image. You should predefine a mask for each frame in a file, then load the mask for each frame and copy.
How to read and save video: OpenCV read-write Video
How to insert an image into another image: Copy non-rectangular ROI
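The mask rule from step 2 can be sketched without any library, so the logic is visible. This is only an illustration under simplifying assumptions: frames, the uploaded image and the masks are equally sized nested lists, pixels are (B, G, R) tuples, and the names `copy_with_mask` and `render` are invented for the example. With OpenCV you would read and write real frames and use its masked copy instead of the Python loops.

```python
def copy_with_mask(src, dst, mask):
    """Copy src pixels into dst wherever the mask pixel is non-black.

    src, dst and mask are equally sized 2-D lists. dst is modified in
    place and returned.
    """
    for y, mask_row in enumerate(mask):
        for x, m in enumerate(mask_row):
            if m:  # non-black mask pixel: take the uploaded image here
                dst[y][x] = src[y][x]
    return dst


def render(template_frames, masks, face):
    """Apply the per-frame masked copy to a copy of every template frame."""
    return [copy_with_mask(face, [row[:] for row in frame], mask)
            for frame, mask in zip(template_frames, masks)]


if __name__ == "__main__":
    face = [[(1, 1, 1), (2, 2, 2)],
            [(3, 3, 3), (4, 4, 4)]]
    black = [[(0, 0, 0)] * 2 for _ in range(2)]
    # One predefined mask per template frame, as suggested above.
    masks = [[[255, 0], [0, 0]],
             [[0, 0], [0, 255]]]
    for frame in render([black, black], masks, face):
        print(frame)
```

Because each frame has its own mask, the inserted image can follow a moving region in the template from frame to frame.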
Generating videos like these is not an easy task. If you don't need to generate them with a single program or programming language, I recommend using Adobe After Effects or other video creation software (with some scripts and actions).
If you do need to generate them programmatically, here are my answers for each template.
For the first one, you need to recognize faces and bones, so you should use OpenCV. (I also recommend tools such as openFrameworks or TouchDesigner.)
For the second one, I don't know exactly what you want, but if you want to recognize the position of the bottle dynamically, you have to use deep learning or some other detection method, for which you may need TensorFlow or OpenCV. (If you just want to merge layers, you can use ffmpeg or similar.)
For the last one, you should split the video frame into boxes and then control each box. I think there are many ways to implement this; I might use openFrameworks, TouchDesigner, vvvv, or Processing.
I would not recommend ffmpeg for these. It is not the best tool for generating complicated video, although it does well for simpler jobs, for example merging two videos with alpha.

Render images progressively in an MFC-based application

Browsers can render progressive images progressively.
Images can only be progressively decoded if they were progressively encoded, e.g. GIF or PNG images saved with the "interlaced" option, or JPEG images saved with the "progressive" option.
I want to render progressive images in my MFC-based application just like a browser does.
Windows Imaging Component provides the IWICProgressiveLevelControl interface to decode images progressively.
But I can't find any example that shows how to stream and display an image progressively at the same time using IWICProgressiveLevelControl.
Any advice would be appreciated. Thanks.
There's a good sample here:
Once you've used IWICProgressiveLevelControl::SetCurrentLevel to select the scan, the decoder will behave normally but only use the scans up to and including the one you selected. So any call to CopyPixels or any IWICBitmapSource components in your chain will receive the fully decoded image at the selected scan level.
The trick, as demonstrated in the sample, is that you can't use IWICProgressiveLevelControl::GetLevelCount and select the max level immediately if you don't know the complete file is available. As the documentation for the sample states,
IWICProgressiveLevelControl allows you to control which progressive level of detail to use on the frame decode. It also allows you to query the total number of progressive levels in the file; however it is not recommended to use this method on JPEG images because the total count is not known until the entire image has been downloaded, defeating the purpose of progressive decode. Instead, this sample demonstrates the recommended practice of iteratively requesting increasing levels of detail until WIC returns WINCODEC_ERR_INVALIDPROGRESSIVELEVEL.
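The control flow of that recommended practice can be sketched as follows. This is not WIC code: `FakeProgressiveDecoder` is a hypothetical stand-in whose `set_current_level` plays the role of IWICProgressiveLevelControl::SetCurrentLevel, and `InvalidProgressiveLevel` stands in for WINCODEC_ERR_INVALIDPROGRESSIVELEVEL. Resuming from the rejected level once more data has arrived is an assumption of this sketch, not something the quoted documentation states.

```python
class InvalidProgressiveLevel(Exception):
    """Stand-in for WINCODEC_ERR_INVALIDPROGRESSIVELEVEL."""


class FakeProgressiveDecoder:
    """Hypothetical stand-in for a WIC frame decoder, for illustration only."""

    def __init__(self, scans_available):
        self.scans_available = scans_available  # scans received so far
        self.level = None

    def set_current_level(self, level):
        if level >= self.scans_available:
            raise InvalidProgressiveLevel(level)
        self.level = level

    def copy_pixels(self):
        return f"image decoded up to scan {self.level}"


def render_progressively(decoder, display, start_level=0):
    """Request increasing levels until the decoder rejects one.

    Never asks for the total level count. Returns the first rejected
    level so the caller can try again from there later.
    """
    level = start_level
    while True:
        try:
            decoder.set_current_level(level)
        except InvalidProgressiveLevel:
            return level
        display(decoder.copy_pixels())  # repaint with the sharper image
        level += 1


if __name__ == "__main__":
    decoder = FakeProgressiveDecoder(scans_available=2)
    next_level = render_progressively(decoder, print)
    decoder.scans_available = 5  # pretend more of the file has arrived
    render_progressively(decoder, print, next_level)
```

In an MFC application the `display` callback would correspond to copying the pixels into a bitmap and invalidating the view.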

Media Foundation video decoding

I'm using Media Foundation and the IMFSampleGrabberSinkCallback to play back video files and render them to a texture. I am able to get video samples in the IMFSampleGrabberSinkCallback::OnProcessSample method, but those samples are compressed: each sample holds far less data than my render target has pixels. According to this, the media session should load any decoder that is needed (if available), but that does not seem to be the case. Even if I create the decoder and add it to the topology myself, the video samples are still compressed. Is there anything in particular I am missing here?
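The size comparison behind that observation can be made explicit. The sketch below only checks the symptom and does not fix the topology: it computes the smallest buffer an uncompressed frame could occupy for a few common formats and flags samples that are smaller. The function names and the choice of NV12 as the default are assumptions for illustration; stride padding is ignored.

```python
# Bits per pixel for a few common uncompressed video formats.
BITS_PER_PIXEL = {"RGB32": 32, "RGB24": 24, "YUY2": 16, "NV12": 12}


def uncompressed_frame_bytes(width, height, fmt):
    """Minimum byte size of one uncompressed frame (ignores stride padding)."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


def looks_compressed(sample_bytes, width, height, fmt="NV12"):
    """True when a sample is too small to hold an uncompressed frame."""
    return sample_bytes < uncompressed_frame_bytes(width, height, fmt)


if __name__ == "__main__":
    for fmt in BITS_PER_PIXEL:
        print(fmt, uncompressed_frame_bytes(1920, 1080, fmt))
    # A 40 KB sample for a 1920x1080 stream cannot be a decoded frame.
    print(looks_compressed(40_000, 1920, 1080))
```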

Where does DirectShow get image dimensions from?

We are using a DirectShow interface to capture images from a video stream. These images are presented in a fixed-size window.
Once we have captured an image, we store it as a bitmap. Downstream, we can add annotation to the image, for example letters in a fixed-size font.
In one of our desktop environments, the annotation has started appearing at half its normal size. This implies that the image we are merging the text onto has dimensions that are perhaps twice as large.
The system this happens on is a shared resource, and some unknown individual has installed software on it that differs from our baseline.
We have two approaches. The first is to reimage the system to get our default text size behaviour back. The second is to figure out how DirectShow manages image dimensions so that we can set the scaling on the image correctly.
A survey of the DirectShow literature indicates that this is not a trivial task. The original work was done by another team that did not document what they did. Can anybody point us to the DirectShow object we need to deal with to properly size the sampled image?
DirectShow, as a framework, does not deal with resolutions directly. Your video source (such as capture hardware) provides a video feed at a certain resolution, which you may be able to change. You normally use IAMStreamConfig, as described in Configure the Video Output Format, to choose the capture resolution.
Sometimes you cannot affect the capture resolution and need to resample the image from whatever dimensions it was captured in. There is no stock filter for this; however, Media Foundation provides a suitable Video Resizer DSP which does most of the task. Unfortunately it does not fit into a DirectShow pipeline smoothly, so you need a wrapper and/or a custom filter for resizing.
When filters connect in DirectShow, they agree on an AM_MEDIA_TYPE for the connection. For video, its format block holds a VIDEOINFOHEADER containing a BITMAPINFOHEADER, and this header has biWidth and biHeight fields.
Try building the filter graph manually (with GraphEdit or GraphStudioNext) and inspect these fields.
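Since the captured images are stored as bitmaps, the same biWidth and biHeight fields can also be read back from a saved file and compared between a good machine and the affected one. The sketch below assumes the images are saved as ordinary .bmp files with a 14-byte BITMAPFILEHEADER followed by a BITMAPINFOHEADER; the function name `bitmap_dimensions` is made up for the example.

```python
import struct

# BITMAPINFOHEADER fields, little-endian, 40 bytes in total:
# biSize, biWidth, biHeight, biPlanes, biBitCount, biCompression,
# biSizeImage, biXPelsPerMeter, biYPelsPerMeter, biClrUsed, biClrImportant
BITMAPINFOHEADER = struct.Struct("<IiiHHIIiiII")
BITMAPFILEHEADER_SIZE = 14  # precedes the info header in a .bmp file


def bitmap_dimensions(bmp_bytes):
    """Return (width, height, bit_count) from the bytes of a .bmp file."""
    if bmp_bytes[:2] != b"BM":
        raise ValueError("not a BMP file")
    fields = BITMAPINFOHEADER.unpack_from(bmp_bytes, BITMAPFILEHEADER_SIZE)
    bi_size, bi_width, bi_height, _planes, bi_bit_count = fields[:5]
    if bi_size < BITMAPINFOHEADER.size:
        raise ValueError("unsupported header")
    # A negative biHeight marks a top-down bitmap; the size is its magnitude.
    return bi_width, abs(bi_height), bi_bit_count


if __name__ == "__main__":
    # Synthetic header standing in for a captured frame.
    info = BITMAPINFOHEADER.pack(40, 1280, -720, 1, 24, 0, 0, 0, 0, 0, 0)
    print(bitmap_dimensions(b"BM" + bytes(12) + info))  # (1280, 720, 24)
```

If the affected machine reports roughly twice the width and height of the baseline, the capture format has changed and the annotation font is not the cause.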
