For example, say I have a 2D array of pixels (in other words, an image) and I want to arrange them into groups so that the number of groups adds up exactly to a certain number (say, the total number of items in another 2D array of pixels). At the moment I use a combination of ratios and pixels, but this fails on anything other than perfect integer ratios (1:2, 1:3, 1:4, etc.). When it fails, it just scales to the next lower integer, so, for example, a 1:2.93 ratio would use a 1:2 scale with part of the image cut off. I'd rather not do this, so what are some algorithms I could use that do not get into matrix multiplication? I remember seeing something similar to what I described mentioned somewhere, but I cannot find it. Is this an NP-type problem?
For example, say I have a 12-by-12 pixel image and I want to split it up into exactly 64 sub-images of n-by-m size. Through analysis one could see that I could break it up into 8 2-by-2 sub-images, and 56 2-by-1 sub-images in order to get that exact number of sub-images. So, in other words, I would get 8+56=64 sub-images using all 4(8)+56(2)=144 pixels.
Similarly, if I had a 13 by 13 pixel image and I wanted 81 sub-images of n-by-m size, I would need to break it up into 4 2-by-2 sub-images, 76 2-by-1 sub-images, and 1 1-by-1 sub-image to get the exact number of sub-images needed. In other words, 4(4)+76(2)+1=169 and 4+76+1=81.
Yet another example: if I wanted to split the same 13 by 13 image into 36 sub-images of n-by-m size, I would need 14 4-by-2 sub-images, 7 2-by-2 sub-images, 14 2-by-1 sub-images, and 1 1-by-1 sub-image. In other words, 8(14)+4(7)+2(14)+1=169 and 14+7+14+1=36.
Of course, the image need not be square, and neither does the number of sub-images, but neither should be prime. In addition, the number of sub-images should be less than the number of pixels in the image. I'd probably want to stick to powers of two for the width and height of the sub-images, for ease of translating one larger sub-image into multiple sub-images, but if I can find an algorithm which doesn't require that, it'd be better. That is basically what I'm trying to find an algorithm for.
I understand that you want to split a rectangular image of a given size into n rectangular sub-images. Let's say that you have:
an image of size w * h
and you want to split into n sub-images of size x * y
I think that what you want is
R = { (x, y) | x in [1..w], y in [1..h], x * y == (w * h) / n }
That is the set of pairs (x, y) such that x * y is equal to (w * h) / n, where / is the integer division. Also, you probably want to take the x * y rectangle having the smallest perimeter, i.e. the smallest value of x + y.
For the three examples in the questions:
splitting a 12 x 12 image into 64 sub-images, you get R = {(1,2),(2,1)}, and so you have either 64 1 x 2 sub-images, or 64 2 x 1 sub-images
splitting a 13 x 13 image into 81 sub-images, you get R = {(1,2),(2,1)}, and so you have either 81 1 x 2 sub-images, or 81 2 x 1 sub-images
splitting a 13 x 13 image into 36 sub-images, you get R = {(1,4),(2,2),(4,1)}, and so you could use 36 2 x 2 sub-images (smallest perimeter)
For every example, you can of course combine rectangles of different sizes.
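The set R above can be enumerated directly. Here is a minimal Python sketch; the function name candidate_tiles is made up for illustration, and it only lists the candidate sizes (it does not check that the tiles actually cover the image).

```python
def candidate_tiles(w, h, n):
    """Return all (x, y) with x * y == (w * h) // n, smallest x + y first."""
    area = (w * h) // n  # integer division, as in the definition of R
    if area == 0:        # more sub-images requested than there are pixels
        return []
    pairs = [(x, area // x) for x in range(1, w + 1)
             if area % x == 0 and area // x <= h]
    # Sort by x + y (half the perimeter) so the "squarest" tile comes first.
    return sorted(pairs, key=lambda p: (p[0] + p[1], p))


print(candidate_tiles(12, 12, 64))  # [(1, 2), (2, 1)]
print(candidate_tiles(13, 13, 36))  # [(2, 2), (1, 4), (4, 1)]
```

Because of the integer division, n tiles of that size may not use every pixel (81 tiles of 1 x 2 cover 162 of the 169 pixels), which is why combining different sizes can still be necessary.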
If you want to do something else, maybe tiling your original image, you may want to have a look at rectangle tiling algorithms
If you don't care about the subimages being differently sized, a simple way to do this is repeatedly splitting subimages in two. Every new split increases the number of subimages by one.
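A rough Python sketch of this repeated splitting is below. The choice to always cut the largest remaining rectangle across its longer side is my own assumption (the idea only requires that each split adds one sub-image), and split_into is a hypothetical name.

```python
def split_into(w, h, n):
    """Split a w-by-h rectangle into exactly n rectangles (x, y, width, height)."""
    if not 1 <= n <= w * h:
        raise ValueError("n must be between 1 and w * h")
    rects = [(0, 0, w, h)]
    while len(rects) < n:
        # Take the rectangle with the largest area; it has at least 2 pixels
        # because fewer than w * h rectangles exist at this point.
        rects.sort(key=lambda r: r[2] * r[3])
        x, y, rw, rh = rects.pop()
        if rw >= rh:  # cut across the longer side
            half = rw // 2
            rects += [(x, y, half, rh), (x + half, y, rw - half, rh)]
        else:
            half = rh // 2
            rects += [(x, y, rw, half), (x, y + half, rw, rh - half)]
    return rects


tiles = split_into(13, 13, 81)
print(len(tiles), sum(tw * th for _, _, tw, th in tiles))  # 81 169
```

Every split replaces one rectangle with two, so the count goes up by exactly one per iteration and all pixels stay covered.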
I'm building an application that creates a spritesheet from a series of images. Currently the application requires the user to specify the number of columns, but I would like to add an option that suggests this parameter automatically so that the resulting spritesheet is almost square.
If the images were square, a square root of the total number of images would suffice but this is not the case.
The images must all be the same size but may be taller than wide or the opposite.
For example: the sprite sheet has a walking character and each image is 331 high and 160 wide. The number of frames is 25.
I should find an algorithm that suggests the number of columns (and rows) that allows me to obtain a sheet that is as square as possible.
Unfortunately I have no code to present just because I have no idea what kind of reasoning to do.
Do you have any suggestions to work on?
The basic idea is that, if the image height is twice the width, you will need twice as many columns as rows.
If:
q is the image ratio (width/height)
c is the number of columns
r is the number of rows
n is the total number of images
then we have:
r / c = q and r * c = n
After a few substitutions:
c = sqrt(n / q)
r = q * c
In your case, q = 160 / 331 = 0.48
c = sqrt(25 / 0.48) = 7.2
r = 0.48 * c = 3.5
So (after rounding, with the row count rounded up so that all 25 frames fit) the answer is 4 rows and 7 columns.
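As a quick check of the arithmetic, here is a small Python sketch of the formula. The function name grid_from_ratio is hypothetical, and deriving the rows as ceil(n / c) instead of rounding q * c is my own choice so that every image gets a cell.

```python
import math


def grid_from_ratio(n, width, height):
    """Suggest (columns, rows) for n images of size width x height."""
    q = width / height                       # image ratio, as defined above
    cols = max(1, round(math.sqrt(n / q)))   # c = sqrt(n / q), rounded
    rows = math.ceil(n / cols)               # enough rows to hold all n images
    return cols, rows


cols, rows = grid_from_ratio(25, 160, 331)
print(cols, rows)                # 7 4
print(cols * 160, rows * 331)    # 1120 1324 (sheet size in pixels)
```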
Mathematically, this is an interesting question.
I don't have time to think about it extensively, but here are my two cents:
Let the width and height of the sprites be w and h, respectively, and let the number of sprites be n.
If you put these in a grid consisting of c columns and r rows, the total width of the grid will be cw and the total height rh. You want the quotient cw/rh to be as close to 1 as possible.
Now, if you chose c and r freely, the number of grid cells, N := cr, might well be slightly larger than n. In most cases, I would expect you to accept a partially empty last row.
Since N is close to n, we have r ≈ n/c, and therefore cw/rh ≈ c²w/(nh).
Hence, we want to find c such that |c²w/(nh) − 1|
is as small as possible. Clearly this happens when c² ≈ nh/w, i.e. c ≈ √(nh/w).
Hence, if you let the number of columns be √(nh/w) rounded to the nearest integer, you will probably get a fairly square grid.
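Since the number of sprites is usually small, the estimate can also be compared against an exhaustive search over every column count. The sketch below is only an illustration: squarest_grid is a made-up name, and measuring squareness as |cw/rh − 1| with r = ceil(n/c) is an assumption about how to treat the partially empty last row.

```python
import math


def squarest_grid(n, w, h):
    """Try every column count and keep the grid whose cw/rh is closest to 1."""
    best = None
    for c in range(1, n + 1):
        r = math.ceil(n / c)                # rows needed to hold n sprites
        error = abs((c * w) / (r * h) - 1)  # distance of cw/rh from 1
        if best is None or error < best[0]:
            best = (error, c, r)
    return best[1], best[2]


print(round(math.sqrt(25 * 331 / 160)))  # 7 (the closed-form estimate)
print(squarest_grid(25, 160, 331))       # (8, 4)
```

For the 160 x 331 sprites, the search prefers 8 columns (1280 x 1324, with 7 empty cells) over the estimate of 7 columns (1120 x 1324, with 3 empty cells), so which one is "better" depends on how much you mind empty cells.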
I need to find the optimal placement of a given N child rectangles keeping the aspect ratio of the father rectangle.
Use case is the following:
- the father rectangle is a big picture, let's say 4000x3000 pixels (this one can be rescaled).
- child rectangles are 296x128 pixels (e-ink displays of users)
The objective is to show the big picture across all the current number of displays (this number can change from 1 to 100)
This is an example:
It can happen that the number of small rectangles does not fit the aspect ratio of the big rectangle, for example when the number of small rectangles is odd. In this case I could allow a small number (max 5) of spare rectangles to be added in order to complete the big rectangle.
this seems to be a valid approach (python + opencv)
import cv2
import imutils


def split_image(image, boards_no=25, boards_shape=(128, 296), additional=5):
    # find image aspect ratio (width / height)
    aspect_ratio = image.shape[1]/image.shape[0]
    print("\nIMAGE INFO:", image.shape, aspect_ratio)
    # find all valid combinations of a, b that use at least boards_no boards
    # (the upper bound is exclusive, so at most additional-1 spare boards are added)
    valid_props = [(a, b)
                   for a in range(1, boards_no+additional+1)
                   for b in range(1, boards_no+additional+1)
                   if boards_no <= a*b < boards_no+additional]
    print("\nVALID COMBINATIONS", valid_props)
    # find the aspect ratio of every combination, with boards laid horizontally ('h') ...
    aspect_ratio_all = [
        {
            'board_x': a,
            'board_y': b,
            'aspect_ratio': (a*boards_shape[1])/(b*boards_shape[0]),
            'shape': (b*boards_shape[0], a*boards_shape[1]),
            'type': 'h'
        } for (a, b) in valid_props]
    # ... and with boards laid vertically ('v')
    aspect_ratio_all += [
        {
            'board_x': a,
            'board_y': b,
            'aspect_ratio': (a*boards_shape[0])/(b*boards_shape[1]),
            'shape': (b*boards_shape[1], a*boards_shape[0]),
            'type': 'v'
        } for (a, b) in valid_props]
    # keep the first layout whose aspect ratio is closest to the image's
    best_ratio = min(aspect_ratio_all, key=lambda x: abs(aspect_ratio-x['aspect_ratio']))
    print("\nMOST SIMILAR ASPECT RATIO:", best_ratio)
    # resize image maximizing height or width
    resized_img = imutils.resize(image, height=best_ratio['shape'][0])
    border_width = int((best_ratio['shape'][1] - resized_img.shape[1]) / 2)
    border_height = 0
    if resized_img.shape[1] > best_ratio['shape'][1]:
        # too wide for the layout: fit the width instead and pad top/bottom
        resized_img = imutils.resize(image, width=best_ratio['shape'][1])
        border_height = int((best_ratio['shape'][0] - resized_img.shape[0]) / 2)
        border_width = 0
    print("RESIZED SHAPE:", resized_img.shape, "BORDERS (H, W):", (border_height, border_width))
    # fill the border with black
    resized_img = cv2.copyMakeBorder(
        resized_img,
        top=border_height,
        bottom=border_height,
        left=border_width,
        right=border_width,
        borderType=cv2.BORDER_CONSTANT,
        value=[0, 0, 0]
    )
    # split in tiles
    M = resized_img.shape[0] // best_ratio['board_y']
    N = resized_img.shape[1] // best_ratio['board_x']
    return [resized_img[x:x+M, y:y+N]
            for x in range(0, resized_img.shape[0], M)
            for y in range(0, resized_img.shape[1], N)]


image = cv2.imread('image.jpeg')
tiles = split_image(image)
Our solutions will always be rectangles into which we have fit the biggest picture that we can keeping the aspect ratio correct. The question is how we grow them.
In your example a single display is 296 x 128. (Which I assume is width and height.) Our image scaled to 1 display is 170.7 x 128. (You can drop fractional pixels in your scaling.)
The rule is that at every step, whichever dimension the image fills completely gets more displays, so that we can expand the picture. In the single-display solution the height is filled, so we go from a 1x1 rectangle to a 1x2 one and we now have 296 x 256. Our scaled image is now 296 x 222.
Our next solution will be a 2x2 display. This gives us 592 x 256 and our scaled image is 341.3 x 256.
Next we get a 2x3 display. This gives us 592 x 384 and our scaled image is now 512 x 384.
Since we are still maxing on the second dimension we next go to 2x4. This gives us 592 x 512 and our scaled image is 592 x 444. And so on.
For your problem it will not take long to run through all of the sizes up to however many displays you have, and you just take the biggest rectangle that you can make from the list.
Important special case: if the display rectangle and the image have the same aspect ratio, you have to add to both dimensions, which gives you 1 x 1, 2 x 2, 3 x 3 and so on through the squares.
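The growth rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: grow_grids is a hypothetical name, grids are written as (columns, rows), and the aspect ratios are compared with integer cross-multiplication to avoid rounding problems.

```python
def grow_grids(img_w, img_h, disp_w, disp_h, max_displays):
    """Yield (cols, rows, scaled_w, scaled_h) for each grid the rule visits."""
    cols = rows = 1
    while cols * rows <= max_displays:
        grid_w, grid_h = cols * disp_w, rows * disp_h
        # Largest scale at which the image still fits inside the grid.
        scale = min(grid_w / img_w, grid_h / img_h)
        yield cols, rows, img_w * scale, img_h * scale
        fills_width = grid_w * img_h <= grid_h * img_w
        fills_height = grid_w * img_h >= grid_h * img_w
        if fills_height:   # height is used up: add a row of displays
            rows += 1
        if fills_width:    # width is used up: add a column of displays
            cols += 1      # (both fire when the aspect ratios match)


for cols, rows, w, h in grow_grids(4000, 3000, 296, 128, 8):
    print(cols, rows, round(w, 1), round(h, 1))
# last line printed: 2 4 592.0 444.0
```

The scaled image never shrinks from one step to the next, so the last grid produced within the display budget is the one to use.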
I am trying to understand MATLAB's code for the Hough Transform.
Some items are clear to me in this picture,
binary_image is the monochrome version of input_image.
hough_lines is a vector containing detected lines in the image. I see that, four lines have been detected.
T contain the thetas in the (ϴ, ρ) space of the image.
R contain the rhos in the (ϴ, ρ) space of the image.
I have the following questions,
Why is the image rotated before applying Hough Transform?
What do the entries in H represent?
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
Why is T of size 1x180? Where does this size come from?
Why is R of size 1x45? Where does this size come from?
What do the entries in P represent? Are they (x, y) or (ϴ, ρ) ?
29 162
29 165
28 170
21 5
29 158
Why is the value 5 passed into houghpeaks()?
What is the logic behind ceil(0.3*max(H(:)))?
Relevant source code
% Read image into workspace.
input_image = imread('Untitled.bmp');
%Rotate the image.
rotated_image = imrotate(input_image,33,'crop');
% Convert RGB to grayscale
rotated_image = rgb2gray(rotated_image);
%Create a binary image.
binary_image = edge(rotated_image,'canny');
%Create the Hough transform using the binary image.
[H,T,R] = hough(binary_image);
%Find peaks in the Hough transform of the image.
P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));
%Find lines
hough_lines = houghlines(binary_image,T,R,P,'FillGap',5,'MinLength',7);
% Plot the detected lines
figure, imshow(rotated_image), hold on
max_len = 0;
for k = 1:length(hough_lines)
    xy = [hough_lines(k).point1; hough_lines(k).point2];
    plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');
    % Plot beginnings and ends of lines
    plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
    plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');
    % Determine the endpoints of the longest line segment
    len = norm(hough_lines(k).point1 - hough_lines(k).point2);
    if ( len > max_len)
        max_len = len;
        xy_long = xy;
    end
end
% Highlight the longest line segment by coloring it cyan.
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','cyan');
Those are some good questions. Here are my answers for you:
Why is the image rotated before applying Hough Transform?
This I don't believe is MATLAB's "official example". I just took a quick look at the documentation page for the function. I believe you pulled this from another website that we don't have access to. In any case, in general it is not necessary for you to rotate the images prior to using the Hough Transform. The goal of the Hough Transform is to find lines in the image in any orientation. Rotating them should not affect the results. However, if I were to guess the rotation was performed as a preemptive measure because the lines in the "example image" were most likely oriented at a 33 degree angle clockwise. Performing the reverse rotation would make the lines more or less straight.
What do the entries in H represent?
H is what is known as an accumulator matrix. Before we get into what the purpose of H is and how to interpret the matrix, you need to know how the Hough Transform works. With the Hough transform, we first perform an edge detection on the image. This is done using the Canny edge detector in your case. If you recall the Hough Transform, we can parameterize a line using the following relationship:
rho = x*cos(theta) + y*sin(theta)
x and y are points in the image and most customarily they are edge points. theta would be the angle made from the intersection of a line drawn from the origin meeting with the line drawn through the edge point. rho would be the perpendicular distance from the origin to this line drawn through (x, y) at the angle theta.
Note that the equation can yield infinitely many lines passing through (x, y), so it's common to bin or discretize the possible angles to a predefined set. MATLAB by default assumes there are 180 possible angles that range over [-90, 90) with a sampling step of 1, i.e. [-90, -89, -88, ... , 88, 89]. What you generally do is, for each edge point, search over the predefined angles and determine what the corresponding rho is. Afterwards, you count how many times you see each rho and theta pair. Here's a quick example pulled from Wikipedia:
Source: Wikipedia: Hough Transform
Here we see three black dots that follow a straight line. Ideally, the Hough Transform should determine that these black dots together form a straight line. To give you a sense of the calculations, take a look at the example at 30 degrees. Consulting earlier, when we extend a line where the angle made from the origin to this line is 30 degrees through each point, we find the perpendicular distance from this line to the origin.
Now what's interesting is if you see the perpendicular distance shown at 60 degrees for each point, the distance is more or less the same at about 80 pixels. Seeing this rho and theta pair for each of the three points is the driving force behind the Hough Transform. Also, what's nice about the above formula is that it will implicitly find the perpendicular distance for you.
The process of the Hough Transform is very simple. Suppose we have an edge detected image I and a set of angles theta:
For each point (x, y) in the image:
    For each angle A in the angles theta:
        Substitute A into: rho = x*cos(A) + y*sin(A)
        Solve for rho to find the perpendicular distance
        Remember this (rho, A) pair and increase its count by 1
So ideally, if we had edge points that follow a straight line, we should see a rho and theta pair where the count of how many times we see this pair is relatively high. This is the purpose of the accumulator matrix H. The rows denote a unique rho value and the columns denote a unique theta value.
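To make the voting loop concrete, here is a minimal pure-Python sketch of an accumulator. It is not MATLAB's implementation: the function name hough_accumulator, the rho_max parameter, and rounding rho to the nearest integer bin are all assumptions made for illustration.

```python
import math


def hough_accumulator(points, thetas_deg, rho_max):
    """Vote for (rho, theta) pairs; rows are rho bins, columns are theta bins."""
    rhos = list(range(-rho_max, rho_max + 1))      # rho sampled in steps of 1
    H = [[0] * len(thetas_deg) for _ in rhos]
    for x, y in points:
        for j, theta in enumerate(thetas_deg):
            t = math.radians(theta)
            rho = x * math.cos(t) + y * math.sin(t)
            i = round(rho) + rho_max               # index of the nearest rho bin
            if 0 <= i < len(rhos):
                H[i][j] += 1
    return H, rhos


# Four edge points on the line x + y = 10.
points = [(0, 10), (2, 8), (5, 5), (10, 0)]
thetas = list(range(-90, 90))
H, rhos = hough_accumulator(points, thetas, rho_max=15)
print(H[rhos.index(7)][thetas.index(45)])  # 4
```

All four points vote for theta = 45 and rho ≈ 7.07 (binned to 7), so that cell collects one vote per point, which is exactly the kind of peak the transform looks for.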
An example of this is shown below:
Source: Google Patents
Therefore using an example from this matrix, located at theta between 25 - 30 with a rho of 4 - 4.5, we have found that there are 8 edge points that would be characterized by a line given this rho, theta range pair.
Note that the range of rho is also infinitely many values so you need to not only restrict the range of rho that you have, but you also have to discretize the rho with a sampling interval. The default in MATLAB is 1. Therefore, if you calculate a rho value it will inevitably have floating point values, so you remove the decimal precision to determine the final rho.
For the above example the rho resolution is 0.5, so that means that for example if you calculated a rho value that falls between 2 to 2.5, it falls in the first column. Also note that the theta values are binned in intervals of 5. You traditionally would compute the Hough Transform with a theta sampling interval of 1, then you merge the bins together. However for the defaults of MATLAB, the bin size is 1. This accumulator matrix tells you how many times an edge point fits a particular rho and theta combination. Therefore, if we see many points that get mapped to a particular rho and theta value, this is a great potential for a line to be detected here and that is defined by rho = x*cos(theta) + y*sin(theta).
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
This is a consequence of the previous point. Take note that the largest distance we would expect from the origin to any point in the image is bounded by the diagonal of the image. This makes sense because going from the top left corner to the bottom right corner, or from the bottom left corner to the top right corner, gives you the greatest distance in the image. In MATLAB, this is defined as D = sqrt((rows - 1)^2 + (cols - 1)^2), where rows and cols are the number of rows and columns of the image.
For the MATLAB defaults, rho spans from -ceil(D) to ceil(D) in steps of 1. Your rows and columns are both 16, so D = sqrt(15^2 + 15^2) = 21.21..., and the range of rho spans from -22 to 22, which results in 45 unique rho values. Remember that the default range of theta is [-90, 90) (with steps of 1), resulting in 180 unique angle values. Going with this, we have 45 rows and 180 columns in the accumulator matrix and hence H is 45 x 180.
Why is T of size 1x180? Where does this size come from?
This is an array that tells you all of the angles that were being used in the Hough Transform. This should be an array going from -90 to 89 in steps of 1.
Why is R of size 1x45? Where does this size come from?
This is an array that tells you all of the rho values that were being used in the Hough Transform. This should be an array that spans from -22 to 22 in steps of 1.
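The sizes of R and T can be reproduced with a few lines of Python. This sketch assumes the sizing rule as I understand it from the MATLAB documentation for hough (D = sqrt((rows - 1)^2 + (cols - 1)^2), rho from -ceil(D) to ceil(D) at the default resolution of 1) and the 16 x 16 image size stated in this answer; hough_axes is a hypothetical helper name.

```python
import math


def hough_axes(rows, cols):
    """Return the (rho, theta) sample lists for the default resolutions of 1."""
    d = math.hypot(rows - 1, cols - 1)   # image diagonal in pixel steps
    q = math.ceil(d)
    rho = list(range(-q, q + 1))         # -ceil(D) ... ceil(D)
    theta = list(range(-90, 90))         # -90 ... 89 degrees
    return rho, theta


R, T = hough_axes(16, 16)
print(len(R), R[0], R[-1])  # 45 -22 22
print(len(T), T[0], T[-1])  # 180 -90 89
```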
What you should take away from this is that each entry H(i, j) counts how many edge points were mapped to the bin with rho = R(i) and theta = T(j), i.e. how many edge points support a line with those parameters.
What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?
P is the output of the houghpeaks function. Basically, this determines what the possible lines are by finding where the peaks in the accumulator matrix happen. Each row of P gives you the location (row and column) in H where there is a peak. These locations are:
29 162
29 165
28 170
21 5
29 158
Each row gives you a gateway to the rho and theta parameters required to generate the detected line. Specifically, the first line is characterized by rho = R(29) and theta = T(162). The second line is characterized by rho = R(29) and theta = T(165), etc. To answer your question, the values in P are neither (x, y) nor (ϴ, ρ). They are the row and column indices into H; cross-referencing them with R and T gives you the parameters that characterize each line detected in the image.
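As a worked example of that cross-referencing, the sketch below assumes R spans -22 to 22 and T spans -90 to 89 in steps of 1 (the ranges described in this answer) and converts MATLAB's 1-based indices to Python's 0-based ones. The helper name peaks_to_lines is made up.

```python
def peaks_to_lines(P, R, T):
    """Map 1-based (row, column) peak indices to (rho, theta) pairs."""
    return [(R[i - 1], T[j - 1]) for i, j in P]


R = list(range(-22, 23))   # assumed rho samples
T = list(range(-90, 90))   # assumed theta samples, in degrees
P = [(29, 162), (29, 165), (28, 170), (21, 5), (29, 158)]

print(peaks_to_lines(P, R, T))
# [(6, 71), (6, 74), (5, 79), (-2, -86), (6, 67)]
```

So under these assumptions the first detected line would be rho = 6, theta = 71 degrees.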
Why is the value 5 passed into houghpeaks()?
The 5 passed to houghpeaks is the maximum number of peaks (and hence lines) you would like to detect. We can see that P has 5 rows, corresponding to 5 lines. If fewer than 5 peaks can be found, MATLAB returns as many as it can.
What is the logic behind ceil(0.3*max(H(:)))?
The logic behind this is that if you want to determine peaks in the accumulator matrix, you have to define a minimum threshold that tells you whether a particular rho and theta combination should be considered a valid line. Making this threshold too low reports a lot of false lines, and making it too high misses a lot of lines. What they decided to do here was find the largest bin count in the accumulator matrix, take 30% of that, and take the mathematical ceiling; any values in the accumulator matrix that reach this amount are candidate lines.
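A simplified Python sketch of that thresholding is shown below. It is not how houghpeaks works internally (the real function also suppresses the neighbourhood around each peak it picks, which is skipped here), and treating a cell equal to the threshold as a candidate is my assumption; simple_peaks is a hypothetical name.

```python
import math


def simple_peaks(H, num_peaks, fraction=0.3):
    """Return up to num_peaks (row, col) cells at or above the threshold."""
    threshold = math.ceil(fraction * max(max(row) for row in H))
    cells = [(v, i, j)
             for i, row in enumerate(H)
             for j, v in enumerate(row)
             if v >= threshold]
    cells.sort(key=lambda c: -c[0])      # strongest cells first
    return [(i, j) for _, i, j in cells[:num_peaks]]


H = [[0, 1, 2],
     [7, 3, 0],
     [0, 4, 6]]
# max is 7, so the threshold is ceil(0.3 * 7) = 3
print(simple_peaks(H, 2))  # [(1, 0), (2, 2)]
```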
Hope this helps!
I have a three dimensional cell that holds images (i.e. images = cell(10,4,5)) and each cell block holds images of different sizes. The sizes are not too important in terms of what I’m trying to achieve. I would like to know if there is an efficient way to compute the sharpness of each of these cell blocks (total cell blocks = 10*4*5 = 200). I need to compute the sharpness of each block using the following function:
If it matters:
40 cell blocks contain images of size 240 X 320
40 cell blocks contain images of size 120 X 160
40 cell blocks contain images of size 60 X 80
40 cell blocks contain images of size 30 X 40
40 cell blocks contain images of size 15 X 20
which totals to 200 cells.
%% Sharpness Estimation From Image Gradients
% Estimate sharpness using the gradient magnitude.
% sum of all gradient norms / number of pixels give us the sharpness
% metric.
function [sharpness]=get_sharpness(G)
[Gx, Gy]=gradient(double(G));
S=sqrt(Gx.*Gx+Gy.*Gy);
sharpness=sum(sum(S))./(480*640);
Currently I am doing the following:
for i = 1 : 10
    for j = 1 : 4
        for k = 1 : 5
            sharpness = get_sharpness(images{i,j,k});
        end
    end
end
The sharpness function isn’t anything fancy. I just have a lot of data hence it takes a long time to compute everything.
Currently I am using a nested for loop that iterates through each cell block. Hope someone can help me find a better solution.
(P.S. This is my first time asking a question hence if anything is unclear please ask further questions. THANK YOU)
In Matlab I've got a 3D matrix (over 100 frames 512x512). My goal is to find some representative points through the whole hyper-matrix. To do so I've implemented the traditional (and not very efficient) method: I subdivide the large matrix into smaller sub-matrices and then I look for the pixel with the highest value. After doing that I change those relative coordinates of that very pixel in the sub-matrix to global coordinates referenced to the large matrix.
Now, I'm redesigning the algorithm. I've seen that in order to analyze a large matrix block-by-block (that's actually what I'm doing with my old algorithm) the BLOCKPROC function is very efficient. I've read the documentation but I don't know how the "fun" function should be implemented to extract the pixel with the highest value in each block. Thank you in advance.
*I'm trying to get the coordinates of those maximum pixels referenced to the global matrix, I really don't care about their value.
First define a function to find the location of the maximum of a (sub)matrix:
function loc = max_location(M)
    % return the [row col] position of the maximum of M
    [~, ii] = max(M(:));
    [r, c] = ind2sub(size(M), ii);
    loc = [r c];
Then use
blockproc(im, blocksize, @(x) x.location+max_location(x.data)-1)
where im is your image (2D array) and blocksize is a 1x2 vector specifying block size. Within blockproc, the data field is the submatrix (which you pass to max_location), and the location field contains the coordinates of the top-left corner of the submatrix (which you add to the result of max_location, minus 1).
Example:
>> blocksize = [3 3];
>> im = [ 0.3724 0.0527 0.4177 0.6981 0.0326 0.4607
0.1981 0.7379 0.9831 0.6665 0.5612 0.9816
0.4897 0.2691 0.3015 0.1781 0.8819 0.1564
0.3395 0.4228 0.7011 0.1280 0.6692 0.8555
0.9516 0.5479 0.6663 0.9991 0.1904 0.6448
0.9203 0.9427 0.5391 0.1711 0.3689 0.3763 ];
>> blockproc(im, blocksize, @(x) x.location+max_location(x.data)-1)
ans =
2 3 2 6
5 1 5 4
meaning your block maxima are located at coordinates (2,3), (5,1), (2,6) and (5,4)
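For readers without the Image Processing Toolbox, the same bookkeeping (local maximum position plus block offset) can be sketched in plain Python. This is an illustration, not a port of blockproc; block_max_locations is a hypothetical name and the coordinates are reported 1-based to match the MATLAB output above.

```python
def block_max_locations(im, block_rows, block_cols):
    """Return 1-based (row, col) of the maximum in each distinct block."""
    n_rows, n_cols = len(im), len(im[0])
    result = []
    for r0 in range(0, n_rows, block_rows):
        row_of_blocks = []
        for c0 in range(0, n_cols, block_cols):
            # All global (row, col) positions that belong to this block.
            cells = [(r, c)
                     for r in range(r0, min(r0 + block_rows, n_rows))
                     for c in range(c0, min(c0 + block_cols, n_cols))]
            r, c = max(cells, key=lambda rc: im[rc[0]][rc[1]])
            row_of_blocks.append((r + 1, c + 1))
        result.append(row_of_blocks)
    return result


im = [[0.3724, 0.0527, 0.4177, 0.6981, 0.0326, 0.4607],
      [0.1981, 0.7379, 0.9831, 0.6665, 0.5612, 0.9816],
      [0.4897, 0.2691, 0.3015, 0.1781, 0.8819, 0.1564],
      [0.3395, 0.4228, 0.7011, 0.1280, 0.6692, 0.8555],
      [0.9516, 0.5479, 0.6663, 0.9991, 0.1904, 0.6448],
      [0.9203, 0.9427, 0.5391, 0.1711, 0.3689, 0.3763]]
print(block_max_locations(im, 3, 3))
# [[(2, 3), (2, 6)], [(5, 1), (5, 4)]]
```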
Another possibility is to use im2col for each frame. If I is your frame (512,512):
% rearranges 512 x 512 image into 4096 x 64
% each column of I2 represents a 64 x 64 block
n = 64;
I2 = im2col(I,[n,n],'distinct');
% find max in each block
% ~ to ignore that output
[~,y] = max(I2);
% convert those values to overall indices
% (one column of I2 per block, so index every column of I2)
ind = sub2ind(size(I2), y, 1:size(I2,2));
% create new matrix
I3 = zeros(size(I2));
I3(ind)=1;
I3 = col2im(I3,[n,n],size(I),'distinct');
I3 should now be an image the same size of input I but with all zeros except for the locations of the maximum points in each sub-matrix.
The tricky part with the function handle "fun" is that it receives each sub-block as a struct, i.e. an object with one or more fields and a value assigned to each field.
The values of your sub-blocks are stored in a field called "data", so the function call
@(x) max(x)
is not enough; in this case the correct version is
@(x) max(x.data)
A 2D example of what you are looking for would look like this:
a=magic(4);
b=blockproc(a,[2,2],@(x) find(x.data==max(max(x.data)))); %linear indexes
outputs
a =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
b =
1 3
4 2
b are the linear indexes of each subblock, so that's the values 16, 13, 14, 15 in a.
Hope that helps!