Image similarity comparison - algorithm

I originally asked this question on cstheory.stackexchange.com, but it was suggested that I move it to stats.stackexchange.com.
Is there an existing algorithm that returns a similarity metric between two bitmap images? By "similar", I mean that a human would say these two images were altered from the same photograph. For example, the algorithm should say the following 3 images are the same (original, position-shifted, shrunken).
Same
I don't need to detect warped or flipped images. I also don't need to detect if it's the same object in different orientations.
Different
I would like to use this algorithm to prevent spam on my website. I noticed that the spammers are too lazy to change their spam images. It's not limited to faces; I already know there are many great facial recognition algorithms out there. The spam image could be anything from a URL to a soccer field to a naked body.

There is a discussion of image similarity algorithms on Stack Overflow. Since you don't need to detect warped or flipped images, the histogram approach may be sufficient, provided the image crop isn't too severe.

You can use existing deep learning architectures like VGG to generate features from images and then use a similarity metric like cosine similarity to see if two images are essentially the same.
The whole pipeline is pretty easy to set up and you do not need to understand the neural network architecture (you can just treat it like a black box). Also, these features are pretty generic and can be applied to find similarity between any kind of objects, not just faces.
Here are a couple of blogs that walk you through the process.
http://blog.ethanrosenthal.com/2016/12/05/recasketch-keras/
https://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1.html
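For illustration, here is a minimal sketch of that pipeline using a pretrained VGG16 from Keras (the file names and the rough 0.9 threshold are assumptions for illustration, not something the blogs prescribe):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Drop the classifier head; global average pooling gives a 512-d feature vector.
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

def feature_vector(path):
    # VGG16 expects 224x224 RGB input.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity(feature_vector('img_1.png'), feature_vector('img_2.png'))
print(sim)  # close to 1.0 (e.g. above ~0.9) suggests the same underlying image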

Amazon has a new API called Rekognition which allows you to compare two images for facial similarity. The API returns a similarity percentage for each pair of faces, along with the bounding box for each face.
Rekognition also includes APIs for Facial Analysis (returning the gender, approximate age, and other relevant facial details) and Object and Scene Detection (returning tags for objects that are within an image).
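As a concrete sketch, the comparison can be done with boto3, the AWS SDK for Python (the file names and the 80% threshold are illustrative assumptions; AWS credentials must be configured):

import boto3

client = boto3.client('rekognition')

with open('face_1.jpg', 'rb') as f1, open('face_2.jpg', 'rb') as f2:
    response = client.compare_faces(
        SourceImage={'Bytes': f1.read()},
        TargetImage={'Bytes': f2.read()},
        SimilarityThreshold=80,  # only return matches at or above 80%
    )

# Each match carries a similarity percentage and a bounding box.
for match in response['FaceMatches']:
    print(match['Similarity'], match['Face']['BoundingBox'])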

One good technique for calculating the similarity of two images is mean structural similarity (SSIM).
import cv2
from skimage.metrics import structural_similarity

# Load as grayscale; SSIM requires both images to have the same shape.
img = cv2.imread('img_1.png', cv2.IMREAD_GRAYSCALE)
img_2 = cv2.imread('img_2.png', cv2.IMREAD_GRAYSCALE)

# structural_similarity returns a score in [-1, 1]; 1.0 means identical images.
print(structural_similarity(img, img_2))

If you just want image similarity, that's one thing, but facial similarity is quite another. Two very different individuals could appear in front of the same background and an analysis of image similarity could show them to be the same, while the same person could be shot in two different settings and the similarity analysis could show them to be different.
If you need to do facial analysis you should search for algorithms specific to that. Calculating relative eye, nose and mouth size and position is often done in this kind of analysis.

Use https://github.com/Netflix/vmaf to compare the two sets of images.
First convert the images to yuv422p using ffmpeg, then run the test and note the score difference. This can be used to tell whether the images are similar or different. For this sample, both look quite similar...
ffmpeg -i .\different-pose-1.jpg -s 1920x1080 -pix_fmt yuv422p different-pose-1.yuv
ffmpeg -i .\different-pose-2.jpg -s 1920x1080 -pix_fmt yuv422p different-pose-2.yuv
.\vmafossexec.exe yuv422p 1920 1080 different-pose-1.yuv different-pose-2.yuv vmaf_v0.6.1.pkl --ssim --ms-ssim --log-fmt json --log different.json
Start calculating VMAF score...
Exec FPS: 0.772885
VMAF score = 2.124272
SSIM score = 0.424488
MS-SSIM score = 0.415149
ffmpeg.exe -i .\same-pose-1.jpg -s 1920x1080 -pix_fmt yuv422p same-pose-1.yuv
ffmpeg.exe -i .\same-pose-2.jpg -s 1920x1080 -pix_fmt yuv422p same-pose-2.yuv
.\vmafossexec.exe yuv422p 1920 1080 same-pose-1.yuv same-pose-2.yuv vmaf_v0.6.1.pkl --ssim --ms-ssim --log-fmt json --log same.json
Start calculating VMAF score...
Exec FPS: 0.773098
VMAF score = 5.421821
SSIM score = 0.285583
MS-SSIM score = 0.400130
Reference: How can I create a YUV422 frame from a JPEG or other image on Ubuntu

Robust hash functions do that, but there's still a lot of research going on in that domain. I'm not sure if there are already usable prototypes.
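As a rough illustration of the general idea, here is a minimal average hash, one of the simplest perceptual hashes (the 8x8 hash size and file names are assumptions; published robust hashes are considerably more sophisticated):

from PIL import Image

def average_hash(path, hash_size=8):
    # Shrink to a tiny grayscale thumbnail, then threshold each pixel at the mean.
    img = Image.open(path).convert('L').resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return ''.join('1' if p > mean else '0' for p in pixels)

def hamming_distance(h1, h2):
    return sum(c1 != c2 for c1, c2 in zip(h1, h2))

# A small Hamming distance suggests both images derive from the same photo.
print(hamming_distance(average_hash('img_1.png'), average_hash('img_2.png')))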
Hope that helps.

Related

perspective correction example

I have some videos taken of a display, with the camera not perfectly oriented, so the result shows a strong trapezoidal effect.
I know that there is a perspective filter in ffmpeg (https://ffmpeg.org/ffmpeg-filters.html#perspective), but I can't work out how it works from the docs, and I cannot find a single example.
Can somebody show me how it works?
The following example extracts a trapezoidal perspective section from an input Matroska video into an output video.
An estimated coordinate had to be inserted to complete the trapezoidal pattern (the out-of-frame coordinate x2=-60,y2=469).
The input video frame was 1280x720. Pixel interpolation was set to linear, which is also the default if not specified at all; cubic interpolation bloats the output with no apparent improvement in video quality. The output frame size will match the input video's frame size.
The video output was viewable but of rough quality due to sampling error.
ffmpeg -hide_banner -i input.mkv -lavfi "perspective=x0=225:y0=0:x1=715:y1=385:x2=-60:y2=469:x3=615:y3=634:interpolation=linear" output.mkv
You can also make use of ffplay (or any player which lets you access ffmpeg filters, like mpv) to preview the effect, or if you want to keystone-correct a display surface.
For example, if you have your TV above your fireplace mantle and you're sitting on the floor looking up at it, this will un-distort the image to a large extent:
ffplay video.mkv -vf 'perspective=W*.1:0:W*.9:0:-W*.1:H:W*1.1:H'
The above expands the top by 20% and compresses the bottom by 20%, cropping the top and infilling the bottom with the edge pixels.
Also handy for playing back video of a building you're standing in front of with the camera pointed up around 30 degrees.

How to filter motion vectors?

My video is very noisy temporally. The video was taken under low light conditions at a high frame rate.
Currently I've tried
ffplay -flags2 +export_mvs -i test.mp4 -vf "edgedetect=low=0.05:high=0.17,hqdn3d=4.0:3.0:6.0:4.5,codecview=mv=pf+bf+bb,lutyuv=y='if(lt(val,19),0,val)'"
The motion vectors are tracking noise: in the near-dark areas the vectors vary greatly in magnitude and angle.
How do I decimate or filter the display motion vectors based on magnitude and/or location?
Remember that codecview will display the motion vectors from the encoded file, so if you denoise that file after decoding (such as ffplay [..] -vf hqdn3d), then the motion vectors aren't actually affected by the denoising, because they come from an earlier part in the pipeline.
To change the motion vectors in the compressed file, you need to re-encode it and denoise/degrain before encoding. I don't remember if there's a way to generate motion vectors (post-decoding) within the filter chain.
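As a sketch of that re-encode step (the hqdn3d settings are copied from the question; the libx264 codec choice is an assumption):
ffmpeg -i test.mp4 -vf hqdn3d=4.0:3.0:6.0:4.5 -c:v libx264 denoised.mp4
ffplay -flags2 +export_mvs denoised.mp4 -vf codecview=mv=pf+bf+bb
The vectors displayed by codecview now come from the encode of the denoised frames.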

Does simple rescaling from 1080p to frame height of 720 lead to 720p?

I want to convert a 1080p video to 720p, and eventually to lower resolutions as well.
I have been using ffmpeg for all my video processing activities so far, and would simply approach this task using the following command:
ffmpeg -i tos.mov -vf scale=-1:720 tos_0x720.mov
I understand that this will rescale my video to a new frame size having 720 pixels set as a fixed height and the width dynamically calculated.
What I am not sure about are the implications regarding the quality factors of the video when using ffmpeg this way.
Is it valid to assume that running this command will output a perfect HD 720p quality video?
What would be a benefit of using dedicated video conversion software to accomplish my goal compared to running the above command?
You can choose which scaling algorithm to use by setting the flags option in the scale filter. Some algorithms work better for upscaling (bilinear) while others are better for downscaling (bicubic, lanczos). Some are better for sharp graphics, others for gradual changes; some are faster and some are slower.
I think the default value of flags for downscaling is bicubic, while some people recommend lanczos.
To set the flag use:
-vf scale=-1:720:flags=lanczos
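Applied to the command from the question, that would look like:
ffmpeg -i tos.mov -vf scale=-1:720:flags=lanczos tos_0x720.mov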
Commercial video conversion software uses the same algorithms. For example, Adobe Premiere uses variable-radius bicubic for Maximum Render Quality. Such tools might help you choose one algorithm or another depending on what you're after (speed vs. quality), and they may provide tweaks to reduce artifacts resulting from scaling.
There's a lot of literature covering the different algorithms.

Similar images compression

I have a bunch of similar images. The images contain different noise, but their edges and histograms are very similar. I need to compress these images losslessly.
Is there any algorithm that can use image similarity for more efficient compression?
I have tried improved compression via prediction (a modified MED predictor from LOCO-I), but my gain was only about 0.4%.
What do you mean exactly by "similar"? Having similar histograms isn't going to help much. Do the images look the same?
You could simply try subtracting the previous image from this image, pixel by pixel and color by color, and see if the difference image is more compressible.
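As a quick way to test that idea, here is a sketch using numpy and Pillow (file names are placeholders); a wrap-around difference keeps the scheme lossless for 8-bit images:

import io
import numpy as np
from PIL import Image

def png_size(arr):
    # Bytes needed to store the array losslessly as a PNG.
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format='PNG')
    return buf.tell()

a = np.asarray(Image.open('frame_1.png'))
b = np.asarray(Image.open('frame_2.png'))

# Modular difference: lossless, since b == (a + diff) mod 256.
diff = (b.astype(np.int16) - a.astype(np.int16)).astype(np.uint8)

print('plain:', png_size(b), 'diff:', png_size(diff))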
The next step would be to make the series of images a video, and use video compression, which can exploit more complex correlations between successive images.
If you have heard of Set Redundancy Compression (SRC), it would make your task very easy. It provides lossless and lossy compression techniques for sets of similar images. The min-max differential technique might be the one you seek.
Try doing "composite -compose difference image1 image2 diff" on sequential images (or arbitrarily order the images in some way if you don't already have an order). The 'diff' image might be very small, and you can recover image2 by doing composite -compose difference diff image1 (or some variation).
Are the images grayscale or color? If they are grayscale, I have an application that I developed 7 years ago, and I can try it for you.
It is based on a technique called Set Redundancy Compression.
If they are color, you can try Lagarith (a lossless video codec).

Detecting image equality at different resolutions

I'm trying to build a script to go through my original, high-res photos and replace the old, low-res ones I uploaded to Flickr before I had a pro account.
For many of them I can just use Exif info such as date taken to determine a match. But some are really old, and either the original file didn't have Exif info, or it got clobbered by whatever stupid resizing software I used at the time.
So, unable to rely on metadata, I'm forced to resort to the content itself. The problem is that the originals are in different resolutions than the ones on Flickr (which is the whole point of this endeavour). So is there a way for me to compare them with some sort of fuzzy similarity measure that would allow me to set a threshold for requiring human input or not?
I guess knowing one image is a resized version of the other can yield better results than general similarity. A solution in any language will do, but Ruby would be a plus :)
Interesting problem, btw :)
Slow-ish solution - excellent chance of success
Use a scale-invariant feature detector to find corresponding features in both images. If the features are matched with a high score at similar locations, then you have your match.
I'd recommend SIFT, which generates a scale- and rotation-invariant 128-dimensional descriptor for each feature found in an image. SURF (available in OpenCV) is another (faster) feature point detector.
You can match features across two images via brute force (comparing each descriptor to every descriptor in the other image), which is O(n^2) but pretty fast (especially in the VLFeat SIFT implementation). But if you need to compare the features in one image against several images (which you might have to), you should build a tree of the features to query with the other image's features. K-d trees are useful, and OpenCV has a nice implementation.
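Here is a sketch of the brute-force matching with OpenCV's SIFT (file names are placeholders; the 0.75 factor is Lowe's standard ratio test for discarding ambiguous matches):

import cv2

img1 = cv2.imread('original.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('flickr_version.jpg', cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute 128-dimensional SIFT descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching; keep a match only if it clearly beats the runner-up.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), 'confident matches')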
Fast solution - might work
Downsample your high-res image to the low-res dimensions and use a similarity measure like SAD (sum of absolute differences, where the score is the sum of the differences between blocks of, say, 3x3 pixels around corresponding pixels in both images) to determine a match.
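A sketch of that fast path with Pillow and numpy (file names and the per-pixel mean normalization are assumptions):

import numpy as np
from PIL import Image

def sad_score(high_res_path, low_res_path):
    low = Image.open(low_res_path).convert('L')
    # Downsample the high-res image to the low-res dimensions.
    high = Image.open(high_res_path).convert('L').resize(low.size, Image.LANCZOS)
    a = np.asarray(high, dtype=np.int32)
    b = np.asarray(low, dtype=np.int32)
    # Mean absolute difference per pixel; lower means more similar.
    return np.abs(a - b).mean()

print(sad_score('original.jpg', 'flickr_version.jpg'))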
I'd recommend scripting a solution off of ImageMagick. The following (from the documentation on comparing images with IM) would output a comparative value that you can use.
convert image1 image2 \
-compose difference -composite -colorspace gray miff:- |\
identify -verbose - |\
sed -n '/^.*Mean: */{s//scale=2;/;s/(.*)//;s/$/*100\/32768/;p;q;}' | bc
Compute the normalized color histogram of both images and compare them using some method (histogram intersection, for example; see the link above). Note that the normalized histogram is needed because the images have different resolutions. If the histograms are very dissimilar, the images are not the same picture. But if they are similar, you have one of two cases: (i) they are the same picture, or (ii) they are different pictures that happen to have similar global color distributions.
For case (ii), split the images into rectangular tiles and repeat the process, comparing corresponding tiles. This accounts for local properties of the image. Rank the results and pick the best match.
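A sketch of the normalized-histogram comparison (the 4 bins per channel and the file names are arbitrary choices for illustration):

import numpy as np
from PIL import Image

def normalized_histogram(path, bins_per_channel=4):
    # 3-D color histogram, normalized to sum to 1 (resolution-independent).
    img = np.asarray(Image.open(path).convert('RGB'))
    hist, _ = np.histogramdd(img.reshape(-1, 3).astype(float),
                             bins=(bins_per_channel,) * 3,
                             range=((0, 256),) * 3)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    # 1.0 for identical distributions, smaller for dissimilar ones.
    return float(np.minimum(h1, h2).sum())

print(histogram_intersection(normalized_histogram('original.jpg'),
                             normalized_histogram('flickr_version.jpg')))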
