What's the best technique for comparing images' similarity? - ruby

I have an image master.png and more than 10.000 of other images (slave_1.png, slave_2.png, ...). They all have:
The same dimensions (Eg. 100x50 pixels)
The same format (png)
The same image background
98% of the slaves are identical to the master, but 2% of the slaves have a slightly different content:
New colors appear
New small shapes appear in the middle of the image
I need to spot those different slaves. I'm using Ruby, but I have no problem in use a different technology.
I tried to File.binread both images and then compare using ==. It worked for 80% of the slaves. In other slaves, it was spotting changes but the images was visually identical. So it doesn't work.
Alternatives are:
Count the number of colors present in each slave and compare with master. It will work in 100% of the time. But I don't know how to do it in Ruby in a "light" way.
Use some image processor to compare by histograms like RMagick or ruby-vips8. This way should also work but I need to consume the less CPU/Memory possible.
Write a C++/Go/Crystal program to read pixel by pixel and return a number of colors. I think in this way we can get performance out of if. But for sure is the hard way.
Any enlightenment? Suggestions?

In ruby-vips, you could do it like this:
#!/usr/bin/ruby
require 'vips'
# find normalised histogram of reference image
ref = Vips::Image.new_from_file ARGV[0], access: :sequential
ref_hist = ref.hist_find.hist_norm
ARGV[1..-1].each do |filename|
# find sample hist
sample = Vips::Image.new_from_file filename, access: :sequential
sample_hist = sample.hist_find.hist_norm
# calculate sum of squares of differences; if it's over a threshold, print
# the filename
diff_hist = (ref_hist - sample_hist) ** 2
diff = diff_hist.avg * diff_hist.width * diff_hist.height
if diff > 100
puts "#{filename}, #{diff}"
end
end
If I make some test data:
$ vips crop ~/pics/k2.jpg ref.png 0 0 100 50
$ for i in {1..10000}; do cp ref.png $i.png; done
I can run it like this:
$ time ../similarity.rb ref.png *.png
real 0m55.974s
user 1m31.921s
sys 0m54.433s
It runs in a steady ~80mb of memory.

Related

Memory not being released back to OS

I've created an image resizing server that creates a few different thumbnails of and image that you upload to it. I'm using the package https://github.com/h2non/bimg for resizing, which is using libvips with c-bindings.
Before going to production I've started to stress test my app with jmeter and upload 100 images to it concurrently for a few times after each other and noticed that the memory is not being released back to the OS.
To illustrate the problem I've written a few lines of code that reads 100 images and resize them (without saving them anywhere) and then waits for 10 minutes. It repeats like this for 5 times
My code and memory/CPU graph can be found here:
https://github.com/hamochi/bimg-memory-issue
It's clear that the memory is being reused for ever cycle, otherwise it should have doubled (I think). But it's never released back to the OS.
Is this a general behaviour for cgo? Or bimg that is doing something weird. Or is it just my code that is faulty?
Thank you very much for any help you can give!
There's a libvips thing to track and debug reference counts -- you could try enabling that and see if you have any leaks.
https://libvips.github.io/libvips/API/current/libvips-vips.html#vips-leak-set
Though from your comment above about bimg memory stats, it sounds like it's probably all OK.
It's easy to test libvips memory from Python. I made this small program:
#!/usr/bin/python3
import pyvips
import sys
# disable libvips operation caching ... without this, it'll cache all the
# thumbnail operations and we'll just be testing the jpg write
pyvips.cache_set_max(0)
for i in range(0, 10000):
print("loop {} ...".format(i))
for filename in sys.argv[1:]:
# thumbnail to fit 128x128 box
image = pyvips.Image.thumbnail(filename, 128)
thumb = image.write_to_buffer(".jpg")
ie. repeatedly thumbnail a set of source images. I ran it like this:
$ for i in {1..100}; do cp ~/pics/k2.jpg $i.jpg; done
$ ../fing.py *
And watched RES in top. I saw:
loop | RES (kb)
-- | --
100 | 39220
250 | 39324
300 | 39276
400 | 39316
500 | 39396
600 | 39464
700 | 39404
1000 | 39420
As long as you have no refcount leaks, I think what you are seeing is expected behaviour. Linux processes can only release pages at the end of the heap back to the OS (have a look at the brk and sbrk sys calls):
https://en.wikipedia.org/wiki/Sbrk
Now imagine if 1) libvips allocates 6GB, 2) the Go runtime allocates 100kb, 3) libvips releases 6GB. Your libc (the thing in your process that will call sbrk and brk on your behalf) can't hand the 6GB back to the OS because of the 100kb alloc at the end of the heap. Some malloc implementations have better memory fragmentation behaviour than others, but the default linux one is pretty good.
In practice, it doesn't matter. malloc will reuse holes in your memory space, and even if it doesn't, they will get paged out anyway under memory pressure and won't end up eating RAM. Try running your process for a few hours, and watch RES. You should see it creep up, but then stabilize.
(I'm not at all a kernel person, the above is just my understanding, corrections very welcome of course)
The problem is in the resize code:
_, err = bimg.NewImage(buffer).Resize(width, height)
The image is gobject and need unref explicitly to release the memory, try:
image, err = bimg.NewImage(buffer).Resize(width, height)
defer C.g_object_unref(C.gpointer(image))

OpenCV - Removing the independent short line in an image

How do I remove the short and disconnected line but retain all the other connected lines in the following image?
If you image is always this well connected, you can select the components based on their size. My code in Python (might be a simpler way, but that's how I do it) :
#get all connected components in the image with their stats (including their size, in pixel)
nb_edges, output, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=8)
#output is an image where every component has a different value
size=stats[1:,-1] #extracting the size from the statistics
#selecting bigger components
for e in range(0,nb_edges-1):
#replace this line depending on your application, here I chose to keep
#all components above the mean size of components in the image
if size[e]>=np.mean(size):
th_up = e + 2
th_do = th_up
#masking to keep only the components which meet the condition
mask = cv2.inRange(output, th_do, th_up)
result = cv2.bitwise_xor(original_img, mask)

How to use output of CIFilter recursively as new input?

I've written an own CIFilter kernel which is doing some image processing on the camera signal. It takes two arguments:
Argument one is "inputImage" (the current camera image) argument 2 is "backgroundImage" which is being initialized with the first camera image.
The filter is supposed to work recursively. The result of the filter should be used as new "backgroundImage" in the next iteration. I am calculating a background image and some variances and therefore need the result from the previous render.
Unfortunately I cannot use the output CIImage of the CIFilter in the next iteration, because the memory load gets up and up. After 10 seconds of processing it ends up with 1.4GB of RAM usage. Using the filter in a standard manner (without recursion) memory management is fine.
How can I reuse the output of a filter as input in the next iteration?
I've done a NSLog on the result image. Ant it told me
background {
CISampler:0x1002e0360 image {
FEPromise: 0x1013aa230 extent [0 0 1280 720]; DOD [0 0 1280 720]; filter MyFeatureDetectFilter: 0x101388dd0; kernel coreImageKernel; image {
CISampler:0x10139e200 image {
FEBufferImage: 0x10139bee0 metadata.colorspace: HDTV; extent: [0 0 1280 720]; format: BGRA_8; uid 5
}
After some seconds the log becomes sth. like
}
}
}
}
}
This tells me that CIImages are 'always' prototypes of the desired operation. And using them recursively adds just the "resulting CIImage 'prototype'" as input into the new 'prototype'.
Over time the "rule" for rendering blows up into a huge structure of nested prototypes.
Is there any way to force CIImages to flatten the structure inside memory?
I would be happy if I could do recursive processing, because this would blow up the power of QuartzCore to the extreme.
I tried the same in QuartzComposer. Connecting the output with the input works, but takes a lot of memory, too. After some time it crashes. Then I tried to use the Queue from QC and everything worked fine. What is the "xcode" equivalent of the QC Queue? Or is there any mechanism to rewrite my kernel to keep "results" in memory for the next iteration?
It seems like what you're looking for is the CIImageAccumulator class. This allows you to use the output of a filter as its input on the next iteration.
Edit:
For an example of how to use it, you can check out this Apple sample code.

MATLAB : access loaded MAT file very slow

I'm currently working on a project involving saving/loading quite big MAT files (around 150 MB), and I realized that it was much slower to access a loaded cell array than the equivalent version created inside a script or a function.
I created this example to simulate my code and show the difference :
clear; clc;
disp('Test for computing with loading');
if exist('data.mat', 'file')
delete('data.mat');
end
n_tests = 10000;
data = {};
for i=1:n_tests
data{end+1} = rand(1, 4096);
end
% disp('Saving data');
% save('data.mat', 'data');
% clear('data');
%
% disp('Loading data');
% load('data.mat', '-mat');
for i=1:n_tests
tic;
for j=1:n_tests
d = sum((data{i} - data{j}) .^ 2);
end
time = toc;
disp(['#' num2str(i) ' computed in ' num2str(time) ' s']);
end
In this code, no MAT file is saved nor loaded. The average time for one iteration over i is 0.75s. When I uncomment the lines to save/load the file, the computation for one iteration over i takes about 6.2s (the saving/loading time is not taking into consideration). The difference is 8x slower !
I'm using MATLAB 7.12.0 (R2011a) 64 bits with Windows 7 64 bits, and the MAT files are saved with the version v7.3.
Can it be related to the compression of the MAT file? Or caching variables ?
Is there any way to prevent/avoid this ?
I also know this problem. I think it's also related to the inefficient managing of memory in matlab - and as I remember it's not doing well with swapping.
A 150MB file can easily hold a lot of data - maybe more than can be quickly allocated.
I just made a quick calculation for your example using the information by mathworks
In your case total_size = n_tests*121 + n_tests*(1*4096* 8) is about 313MB.
First I would suggest to save them in format 7 (instead of 7.3) - I noticed very poor performance in reading this new format. That alone could be the reason of your slowdown.
Personally I solved this in two ways:
Split the data in smaller sets and then use functions that load the data when needed or create it on the fly (can be elegantly done with classes)
Move the data into a database. SQLite and MySQL are great. Both work efficiently with MUCH larger datasets (in the TBs instead of GBs). And the SQL language is quite efficient to quickly get subsets to manipulate.
I test this code with Windows 64bit, matlab 64bit 2014b.
Without saving and loading, the computation is around 0.22s,
Save the data file with '-v7' and then load, the computation is around 0.2s.
Save the data file with '-v7.3' and then load, the computation is around 4.1s.
So it is related to the compression of the MAT file.

How do I Acquire Images at Timed Intervals using MATLAB?

I'm a MATLAB beginner and I would like to know how I can acquire and save 20 images at 5 second intervals from my camera. Thank you very much.
First construct a video input interface
vid = videoinput('winvideo',1,'RGB24_400x300');
You'll need to adjust the last bit for your webcam. To find a list of webcam devices (and other things besides) use:
imaqhwinfo
The following makes the first webcam into an object
a=imaqhwinfo('winvideo',1)
Find the list of supported video formats with
a.SupportedFormats
You'll then want to start up the interface:
start(vid);
preview(vid);
Now you can do the following:
pics=cell(1,20)
for i=1:20
pause(5);
pics{i}=getsnapshot(vid);
end
Or, as other commentators have noted, you could also use a Matlab timer for the interval.
If you wish to capture images with a considerably shorter interval (1 or more per second), it may be more useful to consider the webcam as a video source. I've left an answer to this question which lays out methods for achieving that.
There are several ways to go about this, each with advantages and disadvantages. Based on the information that you've posted so far, here is how I would do this:
vid = videoinput('dcam', 1'); % Change for your hardware of course.
vid.FramesPerTrigger = 20;
vid.TriggerRepeat = inf;
triggerconfig(vid, 'manual');
vid.TimerFcn = 'trigger(vid)';
vid.TimerPeriod = 5;
start(vid);
This will acquire 20 images every five seconds until you call STOP. You can change the TriggerRepeat parameter to change how many times acquisition will occur.
This obviously doesn't do any processing on the images after they are acquired.
Here is a quick tutorial on getting one image http://www.mathworks.com/products/imaq/description5.html Have you gotten this kind of thing to work yet?
EDIT:
Now that you can get one image, you want to get twenty. A timer object or a simple for loop is what you are going to need.
Simple timer object example
Video example of timers in MATLAB
Be sure to set the "tasks to execute" field to twenty. Also, you should wrap up all the code you have for one picture snap into a single function.
To acquire the image, does the camera comes with some documented way to control it from a computer? MATLAB supports linking to outside libraries. Or you can buy the appropriate MATLAB toolbox as suggested by MatlabDoug.
To save the image, IMWRITE is probably the easiest option.
To repeat the action, a simple FOR loop with a PAUSE will give you roughly what you want with very little work:
for ctr = 1:20
img = AcquireImage(); % your function goes here
fname = ['Image' num2str(ctr)]; % make a file name
imwrite(img, fname, 'TIFF');
pause(5); % or whatever number suits your needs
end
If, however, you need exact 5 second intervals, you'll have to dive into TIMERs. Here's a simple example:
function AcquireAndSave
persistent FileNum;
if isempty(FileNum)
FileNum = 1;
end
img = AcquireImage();
fname = ['Image' num2str(FileNum)];
imwrite(img, fname, 'TIFF');
disp(['Just saved image ' fname]);
FileNum = FileNum + 1;
end
>> t = timer('TimerFcn', 'ShowTime', 'Period', 5.0, 'ExecutionMode', 'fixedRate');
>> start(t);
...you should see the disp line from AcquireAndSave repeat every 5 seconds...
>> stop(t);
>> delete(t);

Resources