Where does DirectShow get image dimensions from?

We are using a DirectShow interface to capture images from a video stream. These images are presented in a fixed-size window.
Once we have captured an image, we store it as a bitmap. Downstream, we can add annotation to the image, for example letters in a fixed-size font.
In one of our desktop environments, the annotation has started appearing at half its normal size. This implies that the image we are merging the text onto has dimensions roughly twice as large as expected.
The system this happens on is a shared resource, and some unknown individual has installed software on it that differs from our baseline.
We have two approaches. The first is to reimage the system to get our default text size behaviour back. The second is to figure out how DirectShow manages image dimensions so that we can set the scaling on the image correctly.
A survey of the DirectShow literature indicates that this is not a trivial task. The original work was done by another team that did not document what they did. Can anybody point us to the DirectShow object we need to deal with to properly size the sampled image?

DirectShow - as a framework - does not deal with resolutions directly. Your video source (such as capture hardware) can provide a video feed at certain resolutions, which you may be able to change. You normally use IAMStreamConfig, as described in "Configure the Video Output Format", to choose the capture resolution.
Sometimes you cannot affect the capture resolution and you need to resample the image from whatever dimensions it was captured in. There is no stock filter for this; however, Media Foundation provides a suitable Video Resizer DSP which does most of the task. Unfortunately it does not fit into a DirectShow pipeline smoothly, so you need a wrapper and/or a custom filter for resizing.

When filters connect in DirectShow, they agree on an AM_MEDIA_TYPE. For video, its format block typically holds a VIDEOINFOHEADER, which contains a BITMAPINFOHEADER, and this header has the biWidth and biHeight fields that give the frame dimensions.
Try building the filter graph manually (with GraphEdit or GraphStudioNext) and inspect these fields.


Upload a picture to generate a video with special effects

I am stuck on a video processing feature: specifically, uploading an image and then generating a video based on various video templates.
Here are the video templates:
As shown in the video templates above, I just need to upload a photo to generate a great video.
My question
What is the specific idea for implementing this video?
Which third-party libraries are needed? (ffmpeg, opencv)
PS: I am using dlib and OpenCV for face recognition. I can generate a face image, but I don't know how to insert the face image into the correct position in these template videos.
I would suggest following these three steps:
1. Load the template video with OpenCV, which lets you access the video frame by frame.
2. Modify each frame, one by one.
3. Save each frame to a video stream writer.
Regarding step 2, you must copy the uploaded image onto each frame through a mask (a pixel from the source image is copied to the destination image if the pixel at the same coordinate in the mask is non-black). The mask can be defined by a list of points or by an image. You should pre-define a mask for each frame in a file, then load the mask for each frame and copy.
How to read and save video: OpenCV read-write Video
How to insert an image into another image: Copy non rectangular ROI
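The masked copy in step 2 can be sketched as follows. This is a dependency-free illustration in which plain nested lists stand in for image arrays; the function name `copy_with_mask` and the tiny sample data are made up for the example.

```python
def copy_with_mask(src, mask, dst):
    """Copy src pixels into dst wherever the mask pixel is non-black (non-zero).

    All three arguments are equally sized 2-D lists; dst is modified in place.
    """
    for y, mask_row in enumerate(mask):
        for x, mask_value in enumerate(mask_row):
            if mask_value != 0:
                dst[y][x] = src[y][x]
    return dst


# Tiny 2x3 "frame": 0 is the template background, 9 is the uploaded face image.
frame = [[0, 0, 0],
         [0, 0, 0]]
face = [[9, 9, 9],
        [9, 9, 9]]
mask = [[0, 255, 0],
        [0, 255, 255]]

result = copy_with_mask(face, mask, frame)
print(result)  # [[0, 9, 0], [0, 9, 9]]
```

With OpenCV in Python, frames are NumPy arrays, so the same per-frame copy would normally be written with boolean indexing, for example `dst[mask > 0] = src[mask > 0]`, rather than explicit loops.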
Generating videos like these is not an easy task. I recommend using Adobe After Effects or other video creation software (with some scripts and actions) if you don't need to generate them with a single program or programming language.
That said, here is how I would approach each one if you need to generate them programmatically.
For the first one, you need to recognize faces and bones, so you should use OpenCV. (I recommend using tools like openFrameworks, TouchDesigner and so on.)
For the second one, I don't know exactly what you want, but if you want to recognize the position of the bottle dynamically, you have to use deep learning or some other way to detect it. In that case you may need TensorFlow or OpenCV. (If you just want to merge layers, you can use ffmpeg etc.)
For the last one, you should split the video frame into boxes and then control each of them. I think there are many ways to implement this. I might use openFrameworks, TouchDesigner, vvvv, or Processing.
I would not recommend ffmpeg for these, since it is not the best tool for generating complicated video. ffmpeg does well, though, if for example you just need to merge two videos with alpha.

How can I overlay an image onto a video

How can I overlay an image onto a video without changing the video file?
I have many videos and I want to be able to open them, overlay a ruler onto them, and then visually measure the distance an individual moved. All I want is to play a video, open an image with some transparency, and position the image over the video. This way I would be able to look at the video and see how far the individual moved.
I would like to do this without having to embed the image like a watermark, because that is computationally expensive. I would need to copy the video, embed the ruler in it, watch the video, and then delete that video file. This seems unnecessary. I would like to just watch the video and have a transparent image over it while I am watching.
Is there a program that does this all together?
Alternatively, is there a program which I can use to open an image and make it transparent and then move it over the video that is playing?
Note: I am using Windows.
It sounds from your requirements as though simply overlaying a separate image layer over the video will meet your needs.
Implementing this approach will depend on the video player client you are using, but you could implement an HTML5 based solution and play the videos locally with this (or even from a URL on the web if you have them there).
There is a nice answer with a working fiddle which shows how to do this with HTML5 here: https://stackoverflow.com/a/31175193/334402
One thing to note: you have not mentioned scale in your question. If you need to measure how far the person has moved in real distance, rather than just in centimetres across the video screen, then you will need to somehow work out the scale of the video. This makes things considerably harder, as the video may zoom in and out during the sequence you want to measure, so you would need some reference to calculate the scale for each frame. One approach would be to use the individual as a reference, assuming they are in all the frames you are interested in.
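As a rough sketch of that reference-based scaling: if an object of known real size spans a known number of pixels, its ratio converts a pixel displacement into real units. The function name and the sample numbers below are assumptions for illustration, and the calculation is only valid when the movement happens at about the same distance from the camera as the reference and the zoom does not change between the two positions.

```python
import math


def real_distance(p_start, p_end, ref_pixels, ref_real):
    """Convert a pixel displacement to real units using a reference object.

    ref_pixels: size of the reference object in the frame, in pixels
    ref_real:   known real size of the same object (e.g. a person's height in metres)
    """
    units_per_pixel = ref_real / ref_pixels
    pixel_distance = math.hypot(p_end[0] - p_start[0], p_end[1] - p_start[1])
    return pixel_distance * units_per_pixel


# Assumed numbers: a 1.8 m tall person spans 360 px and moves
# from pixel (100, 200) to pixel (400, 600), i.e. 500 px.
moved = real_distance((100, 200), (400, 600), ref_pixels=360, ref_real=1.8)
print(round(moved, 3))
```

If the video zooms, `ref_pixels` would have to be re-measured for each frame involved, which is the difficulty described above.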
What about using good old VLC for that?
Open VLC, go to Tools→Effects and Filters→Video Effects→Overlay, and select the Add logo checkbox:
Then, add your transparent overlay image and play any video with VLC. The output looks like this:

Render images progressively in a MFC based application

Browsers can render progressive images progressively.
Images can only be progressively decoded if they were progressively encoded,
e.g. GIF or PNG images saved with the "interlaced" option, or JPEG images saved with the "progressive" option.
I want to render progressive images in my MFC-based application just like the browser does.
Windows Imaging Component provides the IWICProgressiveLevelControl interface to decode images progressively.
But I can't find any example showing how to stream and display an image progressively at the same time using IWICProgressiveLevelControl.
Any advice would be appreciated. Thanks.
There's a good sample here:
Once you've used IWICProgressiveLevelControl::SetCurrentLevel to select the scan, the decoder will behave normally but only use the scans up to and including the one you selected. So any call to CopyPixels or any IWICBitmapSource components in your chain will receive the fully decoded image at the selected scan level.
The trick, as demonstrated in the sample, is that you can't use IWICProgressiveLevelControl::GetLevelCount and select the max level immediately if you don't know the complete file is available. As the documentation for the sample states,
IWICProgressiveLevelControl allows you to control which progressive level of detail to use on the frame decode. It also allows you to query the total number of progressive levels in the file; however it is not recommended to use this method on JPEG images because the total count is not known until the entire image has been downloaded, defeating the purpose of progressive decode. Instead, this sample demonstrates the recommended practice of iteratively requesting increasing levels of detail until WIC returns WINCODEC_ERR_INVALIDPROGRESSIVELEVEL.
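The control flow of that recommended practice can be sketched in Python as below. Note that `FakeProgressiveDecoder` and `InvalidProgressiveLevel` are hypothetical stand-ins written for this example and are not the WIC API; in the real application the two calls would be IWICProgressiveLevelControl::SetCurrentLevel and CopyPixels, and the exception corresponds to the WINCODEC_ERR_INVALIDPROGRESSIVELEVEL result.

```python
class InvalidProgressiveLevel(Exception):
    """Stand-in for the WINCODEC_ERR_INVALIDPROGRESSIVELEVEL result."""


class FakeProgressiveDecoder:
    """Hypothetical stand-in for a progressive frame decoder (not a real API)."""

    def __init__(self, available_levels):
        self.available_levels = available_levels  # scans received so far
        self.current_level = None

    def set_current_level(self, level):
        if level >= self.available_levels:
            raise InvalidProgressiveLevel(level)
        self.current_level = level

    def copy_pixels(self):
        return f"image at level {self.current_level}"


def render_progressively(decoder, draw):
    """Request increasing levels of detail until the decoder rejects one."""
    level = 0
    while True:
        try:
            decoder.set_current_level(level)
        except InvalidProgressiveLevel:
            break  # no further level is available
        draw(decoder.copy_pixels())  # repaint with the more detailed image
        level += 1
    return level  # number of levels drawn


frames = []
drawn = render_progressively(FakeProgressiveDecoder(3), frames.append)
print(drawn, frames)
```

The point of the loop is that it never asks for the total level count up front; it simply stops at the first level the decoder refuses.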

Better thumbnail creation of raw images

I'm building a web application (RoR) that manages images that are in raw image format. I need to create thumbnail/web versions of these images to be displayed on the site. Currently, I'm using imagemagick, which delegates to dcraw to produce the jpeg thumbnail. The problem I'm running into is that the thumbnail deviates from the look of the original; the image gets darker and the white balance is sometimes heavily shifted.
I'm assuming that the raw format's default settings can't be read by dcraw, so it is left guessing how to parameterize the raw conversion. I can play around with customizing these settings, but getting it right on one image seems to push others further off the mark.
Is there a better way to do this in order to get a result that more closely mimics what I might see in a raw viewer like Photoshop, or even Mac OS X Preview? Given that Mac OS X supports a variety of digital camera raw formats, is there any way to utilize the OS's ability to render preview images (especially considering that its result is what is expected)?
The raw images that I'm using are 3FRs and fffs (both from Hasselblad).
I can post samples if people are interested.
Look at "sips" and "Resizing images using the command line" to get you started.
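As a starting point, a sips call for one thumbnail might look like the sketch below, wrapped in Python only to keep the argument list explicit. The file names and the `build_sips_command` and `make_thumbnail` helpers are made up for the example, sips exists only on macOS, and whether it can decode a particular raw format such as 3FR or fff is an assumption you would need to verify on your system.

```python
import subprocess


def build_sips_command(raw_path, jpeg_path, max_size=1024):
    """Build a sips command that writes a JPEG no larger than max_size pixels."""
    return [
        "sips",
        "-s", "format", "jpeg",  # set the output format to JPEG
        "-Z", str(max_size),     # resample so neither side exceeds max_size
        raw_path,
        "--out", jpeg_path,
    ]


def make_thumbnail(raw_path, jpeg_path, max_size=1024):
    """Run sips; this only works on macOS, where the tool is installed."""
    subprocess.run(build_sips_command(raw_path, jpeg_path, max_size), check=True)


# Show the command that would be run for a 256-pixel thumbnail.
print(" ".join(build_sips_command("photo.3fr", "photo.jpg", 256)))
```

From a Rails application the same argument list could be passed to a system call directly; the Python wrapper is not required.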

Drawing video with text on top

I am working on an application and I have a problem I just can't seem to find a solution for. The application is written in VC++. What I need to do is display a YUV video feed with text on top of it.
Right now it works correctly by drawing the text in the OnPaint method using GDI and the video on a DirectDraw overlay. I need to get rid of the overlay because it causes too many problems. It won't work on some video cards, Vista, 7, etc.
I can't figure out a way to accomplish the same thing in a more compatible way. I can draw the video using DirectDraw with a back buffer and copy it to the primary buffer just fine. The issue here is that the text drawn in GDI flickers because of how often the video is refreshed. I would really like to keep the code that draws the text intact if possible, since it works well.
Is there a way to draw the text directly to a DirectDraw buffer or a memory buffer or something similar and then blt it to the back buffer? Should I be looking at another method altogether? The two important OSes are XP and 7. If anyone has any ideas, just let me know and I will test them out. Thanks.
Try to look into DirectShow and the Ticker sample on microsoft.com:
DirectShow Ticker sample
This sample uses the Video Mixing Renderer to blend video and text. It uses the IVMRMixerBitmap9 interface to blend text onto the bottom portion of the video window.
DirectShow is for building filter graphs that play back audio or video streams, and for adding different filters for different effects and for manipulating video and audio samples.
Instead of using the Video Mixing Renderer of DirectShow, you can also use the ISampleGrabber interface. The advantage is that it is a filter that can be used with other renderers as well, for example when you are not showing the video on the screen but streaming it over the network or dumping it to a file.
