zoom a large picture - algorithm

There is a very large picture that could not load into memory once. Because it may cause out of memory exception. I need to zoom this picture to small size. So what should I do?
The simple thought is open an inputstream, and process a buffer size at a time. But the zoom algorithm?

If you can access the picture row-by-row (e.g. it's a bitmap), the simplest thing you could do is just downsample it, e.g. only read every nth pixel of every nth row.
// n is an integer that is the downsampling factor
// width, height are the width and height of the original image, in pixels
// down is a new image that is (height/n * width/n) pixels in size
for (y = 0; y < height; y += n) {
row = ... // read row y from original image into a buffer
for (x = 0; x < width; x += n) {
down[y/n, x/n] = row[x]; // image[row,col] -- shorthand for accessing a pixel
}
}
This is a quick-and-dirty way that can quickly and cheaply resize the original image without ever loading the whole thing into memory. Unfortunately, it also introduces aliasing in the output image (down). Dealing with aliasing would require performing interpolation -- still possible using the above row-by-row approach, but is a bit more involved.
If you can't easily access the image row-by-row, e.g. it's a JPEG, which encodes data in 8x8 blocks, you can still do something similar to the approach I described above. You would simply read a row of blocks instead of a row of pixels -- the remainder of the algorithm would work the same. Furthermore, if you're downsampling by a factor of 8, then it's really easy with JPEG -- you just take the DC coefficient of each block. Downsampling by factors that are multiples of 8 is also possible using this approach.
I've glossed over many other details (such as color channels, pixel stride, etc), but it should be enough to get you started.

There are a lot of different resizing algorithms which offer varying level of quality with the trade off being cpu time.
I believe with any of these you should be able to process a massive file in chunks relatively easily, however, you should probably try existing tools to see whether they can already just handle the massive file anyway.
Gd graphics library allows you to define how much working memory it can use I believe so it obviously already has logic for processing files in chunks.

Related

Blob detection on embedded platform, memory restricted

I have a STM32H7 MCU with 1MB of RAM and 1MB of ROM. I need to make a blob detection algorithm on a binary image array of max size 1280x1024.
I have searched about blob detection algorithms and found out that they are mainly divided into 2 categories, LINK:
Algorithms based on label-propagation (One component at a time):
They first search an unlabeled object pixel, label the pixel with a new label; then, in the later processing, they propagate the same label to all object pixels that are connected to the pixel. A demo code would look something like this:
void setLabels(){
int m=2;
for(int y=0; y<height; y++){
for(int x=0; x<width; x++){
if(getPixel(x,y) == 1) compLabel(x,y,m++);
}
}
}
void compLabel(int i, int j,int m){
if(getPixel(i,j)==1){
setPixel(i,j,m); //assign label
compLabel(i-1,j-1,m);
compLabel(i-1,j,m);
compLabel(i-1,j+1,m);
compLabel(i,j-1,m);
compLabel(i,j+1,m);
compLabel(i+1,j-1,m);
compLabel(i+1,j,m);
compLabel(i+1,j+1,m);
}
}
Algorithms based on label-equivalent-resolving (Two-pass): They consist of two steps: in the first step, they assign a provisional label to each object pixel. In the second step, they integrate all provisional labels assigned to each object, which are called equivalent labels, to a unique label, which called the representative label, and replace the provisional label of each object pixel by its representative label.
The down sides of the 1st algorithm is that it is using recursive calls for all the pixel around the original pixel. I am afraid that it will cause hard fault errors on STM32 because of the limited stack.
The down sides of the 2nd algorithm is that it requires a lot of memory for the labeling image. For instance, for the max. resolution of 1280x1024 and for the max. number of labels 255 (0 for no label), image label size is 1.25MB. Way more than we have available.
I am looking for some advice on how to proceed. How to get center coordinates and area information of all blobs in the image without using to much memory? Any help is appreciated. I presume that the 2nd algorithm is out of the picture since there is no memory available.
You firstly have to go over you image with a scaling kernel to scale your image back to something that is able to be processed. 4:1 or 9:1 are good possibilities. Or you are going to have to get more RAM. Because this situation seems unworkable otherwise. Bit access is not really fast and is going to kill your efficiency and I don't even think that you need that big of an image. (at least that is my experience with vision systems)
You can then store the pixels in straight unsigned char array which can be labeled with the first method you named. It doesn't have to be a recursive process. You can also determine if a blob was relabeled to another blob and set a flag to do it again.
This makes it possible to have an externally visible function have a while loop which keeps calling your labeling function without creating a big stack.
Area determination is then done by going over the image and counting the instance of a pixel for every labeled blob.
The center of a certain blob can be found by calulating the moments of a blob and then calculating the center of mass. This is some pretty hefty math so don't be discouraged, it is a though apple to bite through but it is a great solution.
(small hint: you can take the C++ code from OpenCV and look through their code to find out how it's done)

is it possible to partial update pvrtc texture

I created a 1024*1024 texture with
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, 1024, 1024, 0, nDataLen*4, pData1);
then update it's first 512*512 part like this
glCompressedTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 512, 512, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG, nDataLen, pData2);
This update generated glerror 1282(invalid operation), if I update the whole 1024*1024 region all are ok, it seems that pvrtc texture cannot be partial updated.
Is it possible to partial update pvrtc textur, if it is how how?
Sounds to me like you can't on GLES2 (link to spec, see 3.7.3.)
Calling CompressedTexSubImage2D will result in an INVALID_OPERATION error if xoffset or yoffset is not equal to zero, or if width and height do not match the width and height of the texture, respectively. The contents of any texel outside the region modified by the call are undefined. These restrictions may be relaxed for specific compressed internal formats whose images are easily modified
Makes glCompressedTexSubImage2D sound a bit useless to me, tbh, but I guess it's for updating individual mips or texture array levels.
Surprisingly, i copyed a small pvrtc texture data into a large one, it works just like glCompressedTexSubImage2D.But i'am not sure whether it's safe to use this solution in my engine.
Rightly or wrongly, the reason PVRTC1 does not have CompressedTexSubImage2D support is that unlike, say, ETC* or S3TC, the texture data is not compressed as independent 4x4 squares of texels which, in turn, get represented as either 64 or 128 bits of data depending on the format. With ETC*/S3TC any aligned 4x4 block of texels can be replaced without affecting any other region of the texture simply by just replacing its corresponding 64- or 128-bit data block.
With PVRTC1, two aims were to avoid block artifacts and to take advantage of the fact that neighbouring areas are usually very similar and thus can share information. Although the compressed data is grouped into 64-bit units, these affect overlapping areas of texels. In the case of 4bpp they are ~7x7 and for 2bpp, 15x7.
As you later point out, you could copy the data yourself but there may be a fuzzy boundary: For example, I took these 64x64 and 32x32 textures (which have been compressed and decompressed with PVRTC1 #4bpp ) ...
+
and then did the equivalent of "TexSubImage" to get:
As you should be able to see, the border of the smaller texture has smudged as the colour information is shared across the boundaries.
In practice it might not matter but since it doesn't strictly match the requirements of TexSubImage, it's not supported.
PVRTC2 has facilities to do better subimage replacement but is not exposed on at least one well-known platform.
< Unsubtle plug > BTW if you want some more info on texture compression, there is a thread on the Stack Exchange Computer Graphics site < /Unsubtle plug >

Combine multiple images using vips (ruby-vips8)

How do I apply a function to corresponding pixels of two images of the same resolution? Like Photoshop does when covering one layer with another one. What about more than two images?
If it was Wolfram Mathematica I would take a List of those images and transpose them to get a single "image" where each "pixel" would be an array of N pixels -- there I would apply a Mean[] function to them.
But how do I do that with vips? There are so many Vips::Image methods and only here I could find some minimal description on what do they all mean. So for example:
images = Dir["shots/*"].map{ |i| Vips::Image.new_from_file(i) }
ims = images.map(&:bandmean)
(ims.inject(:+) / ims.size).write_to_file "temp.png"
I wanted it to mean "calculating an average image" but I'm not sure what I've done here.
ruby-vips8 comes with a complete set of operator overloads, so you can just do arithmetic on images. It also does automatic common-subexpression elimination, so you don't need to be too careful about ordering or grouping, you can just write an equation and it should work well.
In your example:
require 'vips8'
images = Dir["shots/*"].map{ |i| Vips::Image.new_from_file(i) }
sum = images.reduce (:+)
avg = sum / images.length
avg.write_to_file "out.tif"
+-*/ with a constant always makes a float image, so you might want to cast the result down to uchar before saving (or maybe ushort?) or you'll have a HUGE output tiff. You could write:
avg = sum / images.length
avg.cast("uchar").write_to_file "out.tif"
By default, new_from_file opens images for random access. If your sources images are JPG or PNG, this will involve decompressing them entirely to memory (or to a disk temp if they are very large) before processing can start.
In this case, you only need to scan the input images from top to bottom as you write the result, so you can stream the images through your system. Change the new_from_file to be:
images = Dir["shots/*"].map { |i| Vips::Image.new_from_file(i, :access => "sequential") }
to hint that you will only be using the image pixels sequentially, and you should see a nice drop in memory and CPU use.
PNG is a horribly slow format, I would use tiff if possible.
You could experiment with bandrank This does something like a median filter over a set of images: you give it an array of images and at each pixel position it sorts the images by pixel value and selects the Nth one. It's a very effective way to remove transitory artifacts.
You can use condition.ifthenelse(then, else) to compute more complex functions. For example, to set all pixels greater than their local average equal to the local average, you could write:
(image > image.gaussblur(1)).ifthenelse(image.gaussblur(1), image)
You might be curious how vips will execute the program above. The code:
(images.reduce(:+) / images.length).cast("uchar")
will construct a pipeline of image processing operations: a series of vips_add() to sum the array, then a vips_linear() to do the divide, and finally a vips_cast() to knock it back to uchar.
When you call write_to_file, each core on your machine will be given a copy of the pipeline and they will queue up to process tiles from the source images as they arrive from the decompressor. Each time a line of output tiles is completed, a background thread will use the selected image write library (libtiff in my example) to send those scanlines back to disk.
You should see low memory use and good CPU utilization.

How to average multiple images using Octave and matrix manipulation to reduce noise?

UPDATE
Here is my code that is meant to add up the two matrices and using element by element addition and then divide by two.
function [ finish ] = stackAndMeanImage (initFrame, finalFrame)
cd 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
pkg load image;
i = initFrame;
f = finalFrame;
astr = num2str(i);
tmp = imread(astr, 'jpg');
d = f - i
for a = 1:d
a
astr = num2str(i + 1);
read_tmp = imread(astr, 'jpg');
read_tmp = rgb2gray(read_tmp);
tmp = tmp :+ read_tmp;
tmp = tmp / 2;
end
imwrite(tmp, 'meanimage.JPG');
finish = 'done';
end
Here are two example input images
http://imgur.com/5DR1ccS,AWBEI0d#1
And here is one output image
http://imgur.com/aX6b0kj
I am really confused as to what is happening. I have not implemented what the other answers have said yet though.
OLD
I am working on an image processing project where I am now manually choosing images that are 'empty' or only have the background, so that my algorithm can compute the differences and then do some more analysis, I have a simple piece of code that computes the mean of the two images, which I have converted to grayscale matrices, but this only works for two images, because when I find the mean of two, then take this mean and find the mean of this versus the next image, and do this repeatedly, I end up with a washed out white image that is absolutely useless. You can't even see anything.
I found that there is a function in Matlab called imFuse that is able to average images. I was wondering if anyone knew the process that imFuse uses to combine images, I am happy to implement this into Octave, or if anyone knew of or has already written a piece of code that achieves something similiar to this. Again, I am not asking for anyone to write code for me, just wondering what the process for this is and if there are already pre-existing functions out there, which I have not found after my research.
Thanks,
AeroVTP
You should not end up with a washed-out image. Instead, you should end up with an image, which is technically speaking temporally low-pass filtered. What this means is that half of the information content is form the last image, one quarter from the second last image, one eight from the third last image, etc.
Actually, the effect in a moving image is similar to a display with slow response time.
If you are ending up with a white image, you are doing something wrong. nkjt's guess of type challenges is a good one. Another possibility is that you have forgotten to divide by two after summing the two images.
One more thing... If you are doing linear operations (such as averaging) on images, your image intensity scale should be linear. If you just use the RGB values or some grayscale values simply calculated from them, you may get bitten by the nonlinearity of the image. This property is called the gamma correction. (Admittedly, most image processing programs just ignore the problem, as it is not always a big challenge.)
As your project calculates differences of images, you should take this into account. I suggest using linearised floating point values. Unfortunately, the linearisation depends on the source of your image data.
On the other hand, averaging often the most efficient way of reducing noise. So, there you are in the right track assuming the images are similar enough.
However, after having a look at your images, it seems that you may actually want to do something else than to average the image. If I understand your intention correctly, you would like to get rid of the cars in your road cam to give you just the carless background which you could then subtract from the image to get the cars.
If that is what you want to do, you should consider using a median filter instead of averaging. What this means is that you take for example 11 consecutive frames. Then for each pixel you have 11 different values. Now you order (sort) these values and take the middle (6th) one as the background pixel value.
If your road is empty most of the time (at least 6 frames of 11), then the 6th sample will represent the road regardless of the colour of the cars passing your camera.
If you have an empty road, the result from the median filtering is close to averaging. (Averaging is better with Gaussian white noise, but the difference is not very big.) But your averaging will be affected by white or black cars, whereas median filtering is not.
The problem with median filtering is that it is computationally intensive. I am very sorry I speak very broken and ancient Octave, so I cannot give you any useful code. In MatLab or PyLab you would stack, say, 11 images to a M x N x 11 array, and then use a single median command along the depth axis. (When I say intensive, I do not mean it couldn't be done in real time with your data. It can, but it is much more complicated than averaging.)
If you have really a lot of traffic, the road is visible behind the cars less than half of the time. Then the median trick will fail. You will need to take more samples and then find the most typical value, because it is likely to be the road (unless all cars have similar colours). There it will help a lot to use the colour image, as cars look more different from each other in RGB or HSV than in grayscale.
Unfortunately, if you need to resort to this type of processing, the path is slightly slippery and rocky. Average is very easy and fast, median is easy (but not that fast), but then things tend to get rather complicated.
Another BTW came into my mind. If you want to have a rolling average, there is a very simple and effective way to calculate it with an arbitrary length (arbitrary number of frames to average):
# N is the number of images to average
# P[i] are the input frames
# S is a sum accumulator (sum of N frames)
# calculate the sum of the first N frames
S <- 0
I <- 0
while I < N
S <- S + P[I]
I <- I + 1
# save_img() saves an averaged image
while there are images to process
save_img(S / N)
S <- -P[I-N] + S + P[I]
I <- I + 1
Of course, you'll probably want to use for-loops, and += and -= operators, but still the idea is there. For each frame you only need one subtraction, one addition, and one division by a constant (which can be modified into a multiplication or even a bitwise shift in some cases if you are in a hurry).
I may have misunderstood your problem but I think what you're trying to do is the following. Basically, read all images into a matrix and then use mean(). This is providing that you are able to put them all in memory.
function [finish] = stackAndMeanImage (ini_frame, final_frame)
pkg load image;
dir_path = 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
imgs = cell (1, 1, d);
## read all images into a cell array
current_frame = ini_frame;
for n = 1:(final_frame - ini_frame)
fname = fullfile (dir_path, sprintf ("%i", current_frame++));
imgs{n} = rgb2gray (imread (fname, "jpg"));
endfor
## create 3D matrix out of all frames and calculate mean across 3rd dimension
imgs = cell2mat (imgs);
avg = mean (imgs, 3);
## mean returns double precision so we cast it back to uint8 after
## rescaling it to range [0 1]. This assumes that images were all
## originally uint8, but since they are jpgs, that's a safe assumption
avg = im2uint8 (avg ./255);
imwrite (avg, fullfile (dir_path, "meanimage.jpg"));
finish = "done";
endfunction

Scaling Laplacian of Gaussian Edge Detection

I am using Laplacian of Gaussian for edge detection using a combination of what is described in http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm and http://wwwmath.tau.ac.il/~turkel/notes/Maini.pdf
Simply put, I'm using this equation :
for(int i = -(kernelSize/2); i<=(kernelSize/2); i++)
{
for(int j = -(kernelSize/2); j<=(kernelSize/2); j++)
{
double L_xy = -1/(Math.PI * Math.pow(sigma,4))*(1 - ((Math.pow(i,2) + Math.pow(j,2))/(2*Math.pow(sigma,2))))*Math.exp(-((Math.pow(i,2) + Math.pow(j,2))/(2*Math.pow(sigma,2))));
L_xy*=426.3;
}
}
and using up the L_xy variable to build the LoG kernel.
The problem is, when the image size is larger, application of the same kernel is making the filter more sensitive to noise. The edge sharpness is also not the same.
Let me put an example here...
Suppose we've got this image:
Using a value of sigma = 0.9 and a kernel size of 5 x 5 matrix on a 480 × 264 pixel version of this image, we get the following output:
However, if we use the same values on a 1920 × 1080 pixels version of this image (same sigma value and kernel size), we get something like this:
[Both the images are scaled down version of an even larger image. The scaling down was done using a photo editor, which means the data contained in the images are not exactly similar. But, at least, they should be very near.]
Given that the larger image is roughly 4 times the smaller one... I also tried scaling the sigma by factor of 4 (sigma*=4) and the output was... you guessed it right, a black canvas.
Could you please help me realize how to implement a LoG edge detector that finds the same features from an input signal, even if the incoming signal is scaled up or down (scaling factor will be given).
Looking at your images, I suppose you are working in 24-bit RGB. When you increase your sigma, the response of your filter weakens accordingly, thus what you get in the larger image with a larger kernel are values close to zero, which are either truncated or so close to zero that your display cannot distinguish.
To make differentials across different scales comparable, you should use the scale-space differential operator (Lindeberg et al.):
Essentially, differential operators are applied to the Gaussian kernel function (G_{\sigma}) and the result (or alternatively the convolution kernel; it is just a scalar multiplier anyways) is scaled by \sigma^{\gamma}. Here L is the input image and LoG is Laplacian of Gaussian -image.
When the order of differential is 2, \gammais typically set to 2.
Then you should get quite similar magnitude in both images.
Sources:
[1] Lindeberg: "Scale-space theory in computer vision" 1993
[2] Frangi et al. "Multiscale vessel enhancement filtering" 1998

Resources