How to implement part of histogram equalization in MATLAB without for loops, for better speed and performance

Suppose that I have these three variables in MATLAB: NewGrayLevels, OldGrayLevels and OldHistogram.
I want to extract the distinct values in NewGrayLevels and, for each distinct value, sum the rows of OldHistogram that sit in the same rows as that value.
For example, the first six rows of NewGrayLevels are equal to zero. This means that gray level 0 in NewGrayLevels takes its value from gray levels (0 1 2 3 4 5) of OldGrayLevels, so the corresponding rows in OldHistogram should be summed.
So 0+2+12+38+113+163=328 would be the frequency of gray level 0 in the equalized histogram, and so on.
Those who are familiar with image processing know that this is part of the histogram equalization algorithm.
Note that I don't want to use the built-in function "histeq" available in the Image Processing Toolbox; I want to implement it myself.
I know how to write the algorithm with for loops. I'm asking whether there is a faster way without for loops.
The code using for loops:
for k=0:255
Condition = NewGrayLevels==k;
ConditionMultiplied = Condition.*OldHistogram;
NewHistogram(k+1,1) = sum(ConditionMultiplied);
end
I'm afraid this code will get slow for large, high-resolution images. The variables that I have uploaded are for a small image downloaded from the internet, but my code may be used for satellite images.

I know you say you don't want to use histeq, but it might be worth your time to look at the MATLAB source file to see how the developers wrote it and copy the parts of their code that you would like to implement. Just do edit('histeq') or edit('histeq.m'), I forget which.
Usually the MATLAB code is vectorized where possible and runs pretty quick. This could save you from having to reinvent the entire wheel, just the parts you want to change.

I can't think of a way to implement this without a for loop somewhere, but one optimisation you could make would be to use indexing instead of multiplication:
for k=0:255
Condition = NewGrayLevels==k; % These act as logical indices to OldHistogram
NewHistogram(k+1,1) = sum(OldHistogram(Condition)); % Removes a vector multiplication, some additions, and an index-to-double conversion
end
Edit:
On rereading your initial post, I think that the way to do this without a for loop is to use accumarray (I find this a difficult function to understand, so read the documentation and search online and on here for examples to do so):
NewHistogram = accumarray(1+NewGrayLevels,OldHistogram);
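This should work as long as NewGrayLevels and OldHistogram are column vectors of the same length and NewGrayLevels contains non-negative integers (+1 because you are starting at zero). The result has max(NewGrayLevels)+1 rows; pass [256 1] as a third argument if you always want 256 bins.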
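To make the accumarray call concrete, here is a minimal sketch on a toy input. The first six entries reproduce the numbers from the question (0+2+12+38+113+163 = 328); the remaining values are made up for illustration, and the size argument [4 1] is an assumption that fits this toy data (for 8-bit images it would be [256 1]).

```matlab
% Toy data: column vectors of equal length (values after the sixth row are illustrative).
NewGrayLevels = [0; 0; 0; 0; 0; 0; 1; 1; 3];
OldHistogram  = [0; 2; 12; 38; 113; 163; 10; 20; 5];

% Vectorized: sum the OldHistogram entries that share the same new gray level.
% +1 because gray levels start at 0 but MATLAB indices start at 1.
NewHistogram = accumarray(1 + NewGrayLevels, OldHistogram, [4 1]);

% Reference: the indexing loop from above, for comparison.
LoopHistogram = zeros(4, 1);
for k = 0:3
    LoopHistogram(k + 1) = sum(OldHistogram(NewGrayLevels == k));
end

disp(isequal(NewHistogram, LoopHistogram));  % both are [328; 30; 0; 5]
```

Gray level 2 never occurs in the toy input, so its bin stays at 0; fixing the output size keeps such empty bins at the end of the range as well.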

Well, I realized that there's no need to write the code that @Hugh Nolan suggested. See the explanation here:
%The green lines are because after writing the code, I understood that
%there's no need to calculate the equalized histogram in
%"HistogramEqualization" function and after gaining the equalized image
%matrix you can pass it to the "ExtractHistogram" function
% (which there's no loops in it) to acquire the
%equalized histogram.
%But I didn't delete those lines of code because I had tried a lot to
%understand the algorithm and write them.
For more information and studying the code, please see my next question.

Related

Poor performance in matlab

So I had to write a program in Matlab to calculate the convolution of two functions, manually. I wrote this simple piece of code that I know is not that optimized probably:
syms recP(x);
recP(x) = rectangularPulse(-1,1,x);
syms triP(x);
triP(x) = triangularPulse(-1,1,x);
t = -10:0.1:10;
s1 = -10:0.1:10;
for i = 1:201
s1(i) = 0;
for j = t
s1(i) = s1(i) + ( recP(j) * triP(t(i)-j) );
end
end
plot(t,s1);
I have a core i7-7700HQ coupled with 32 GB of RAM. Matlab is stored on my HDD and my Windows is on my SSD. The problem is that this simple code is taking I think at least 20 minutes to run. I have it in a section and I don't run the whole code. Matlab is only taking 18% of my CPU and 3 GB of RAM for this task. Which is I think probably enough, I don't know. But I don't think it should take that long.
Am I doing anything wrong? I've searched for how to increase the RAM limit of Matlab, and I found that it is not limited and it takes how much it needs. I don't know if I can increase the CPU usage of it or not.
Is there any way to make things a little bit faster? I have 6 or 7 of these for loops in my homework and it takes forever if I run the whole live script. Thanks in advance for your help.
(Also, it highlights the piece of code that is currently running. It is the for loop, the outer one is highlighted)
Like Ander said, use the symbolic toolbox in matlab as a last resort. Additionally, when trying to speed up matlab code, focus on taking advantage of matlab's vectorized operations. What I mean by this is matlab is very efficient at performing operations like this:
y = x.*z;
where x and z are N-by-1 vectors and the operator '.*' is element-wise multiplication (sometimes called 'dot multiplication'). This tells MATLAB to compute x(1)*z(1), x(2)*z(2), ..., x(N)*z(N) and assign each product to the corresponding element of y. Additionally, many MATLAB functions accept vectors as inputs, operate on each element, and return a vector of the same size. You can check this for any given function by scrolling down in its documentation to the inputs and outputs section and checking what form of array the inputs and outputs can take. For example, rectangularPulse's documentation says it can accept vectors as inputs. Therefore, you can replace your whole inner loop with one element-wise product followed by a sum:
s1(i) = s1(i) + sum( rectangularPulse(-1,1,t) .* triP(t(i)-t) );
So to summarize:
Avoid the symbolic toolbox in matlab until you have a better handle of what you're doing or you absolutely have to use it.
Use matlab's ability to handle vectors and arrays very well.
Deconstruct any nested loops you write one at a time from the inside out. Usually this dramatically accelerates matlab code especially when you are new to writing it.
See if you can even further simplify the code and get rid of your outer loop as well.
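As a sketch of that last step, the outer loop can also be removed by building the matrix of all differences t(i) - t(j) once. The helpers rect and tri below are hypothetical numeric stand-ins for rectangularPulse(-1,1,x) and triangularPulse(-1,1,x), introduced here so that no symbolic variables are needed; they are assumptions, not part of the original answer.

```matlab
t = -10:0.1:10;   % same grid as in the question

% Numeric stand-ins for the symbolic pulses (hypothetical helpers, not toolbox calls).
rect = @(x) double(abs(x) < 1) + 0.5 * (abs(x) == 1);  % 1 inside (-1,1), 0.5 at the edges
tri  = @(x) max(1 - abs(x), 0);                        % peak 1 at 0, zero outside (-1,1)

% D(i,j) = t(i) - t(j); bsxfun also works on releases without implicit expansion.
D = bsxfun(@minus, t(:), t);

% s1(i) = sum over j of rect(t(j)) * tri(t(i) - t(j)), written as a matrix-vector product.
s1 = tri(D) * rect(t(:));
```

As in the question's loop, the sum is not multiplied by the step size 0.1, so s1 is a scaled version of the convolution; plot(t, s1) shows the result.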

Never ending 'for' loop prevents my RStudio notebook from being rendered into a .md file

I'm trying to calculate the Kolmogorov-Smirnov statistic in R. I have the following sample, which clearly comes from a random variable that follows a long-tailed distribution.
Download link
https://drive.google.com/file/d/1hIgqikX7p343zdyc-Goq34THUpsZA63n/view?usp=sharing
As you may know, the Kolmogorov-Smirnov statistic requires the calculation of the empirical cumulative distribution function and the presumed cumulative distribution function. For both calculations I take the following approach: first, I create a vector with the same length as the length of the sample, and then I modify each of the components of the vector so as for it to contain the empirical cdf (or presumed cdf) of the corresponding observation of the sample.
For the sake of illustration, I'll show you the code I wrote in order to calculate the empirical cdf.
I'm assuming that the data has been read and stored in a dataframe called data.
ecdf = vector("numeric", length(data$logueos))
for (i in 1:length(data$logueos)) {
ecdf[i] = sum (data$logueos <= data$logueos[i])/length(data$logueos)
}
The code I wrote for the calculation of the presumed cdf is analogous to the preceding one; the only difference is that I set each component of the pcdf vector equal to the formula $P(X<=t)$ —where t is the corresponding observation of the sample— according to the distribution that I'm assuming.
The problem is that this 'for' loop never ends. If I force it to end by clicking RStudio's stop button it works: it makes the vector store what I want it to store. But, if I press Ctrl+Shift+k in order to render my notebook and preview it, the load gets stuck when trying to execute the first chunk encountered that contains one of those loops.
First of all, your loop is not endless. It will finish, eventually.
You start by initializing a vector with as many elements as the number of observations (1.245.888, which is a lot of iterations). This vector is FULL OF ZEROS.
What your loop does is iterate, replacing each zero with the result of sum (data$logueos <= data$logueos[i])/length(data$logueos). Note that when you stop the execution, the first values of your vector will be between 0 and 1 while the last values will still be 0s (because the loop hasn't reached them yet).
So, you will have to wait longer: each iteration compares one observation against all the others, so the whole loop performs roughly 1.5 × 10^12 comparisons.
To make the execution faster, you could consider parallelizing the loop. Standard loops run sequentially, one iteration at a time; parallelization runs several at once (for example, 4 at a time, depending on your computer's capabilities). Here you'll find some information about it: https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html
Then, my proposal to you:
if(!require(doParallel)){install.packages("doParallel")}; require(doParallel)
registerDoParallel(detectCores() - 1)
x = data$logueos
n = length(x)
ecdf = foreach (i=1:n, .combine = c) %dopar% {
sum (x <= x[i])/n
}
The first line installs (if needed) and loads the doParallel package, which also loads foreach and parallel; you need them for parallelization.
detectCores() - 1 uses all the cores that your computer has except one (to avoid freezing your machine) for computing this loop. It should be faster.
The registerDoParallel function tells foreach how many cores to use, and %dopar% (rather than %do%, which runs sequentially) is what actually runs the iterations in parallel. Because the workers cannot modify ecdf in place, each iteration returns its value and .combine = c collects them into a vector.

Optimizing MATLAB work on N dim array(512,512,400)

I am working on images that are 512x512 pixels. I have written code that analyzes my images and, using pre-allocation, gives me the values that I need in matrices of dimensions (512,512,400) in about 10 minutes.
My problem comes when I want to work with these matrices: it takes me hours to see results, and I want to implement a script that does what I want in much less time. Can you help me?
% meanm is a matrix (512,512,400) that contains the mean of every inputmatrix
% sigmam is a matrix (512,512,400) that contains the std of every inputmatrix
% Basically what I want is that for every inputmatrix (512x512), that is stored inside
% an array of dimensions (512,512,400),
% if a value is higher than meanm + sigmam it has to be replaced with
% the corresponding value of the meanm matrix.
p=400;
for h=1:p
if (inputmatrix(:,:,h) > meanm(:,:,h) + sigmam(:,:,h))
inputmatrix(:,:,h) = meanm(:,:,h);
end
end
I know that MATLAB performs better on matrix calculations, but I have no idea how to translate this for loop over my 400 images into something easier for it.
Try using the condition of your for loop to make a logical matrix
logical_mask = (meanm + sigmam) < inputmatrix;
inputmatrix(logical_mask) = meanm(logical_mask);
This should improve your performance by using two features of MATLAB:
Vectorization uses matrix operations instead of loops. To quote the linked site, "Vectorized code often runs much faster than the corresponding code containing loops."
Logical indexing lets you access all elements of your array that meet a condition at once.
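A small sketch of the masking approach on a 2x2x2 array (the numbers are made up for illustration). It also shows why the original loop rarely changed anything: an if applied to a whole matrix is true only when every element of the condition is true, so a slice was replaced only if all of its pixels exceeded the threshold.

```matlab
% Two 2x2 "images" stacked along the third dimension (illustrative values).
inputmatrix = cat(3, [1 9; 2 3], [4 4; 8 1]);
meanm  = 3 * ones(2, 2, 2);
sigmam = 2 * ones(2, 2, 2);

% Element-wise test over the whole array at once: true where the value exceeds mean + std.
logical_mask = inputmatrix > (meanm + sigmam);

% Replace only the flagged elements with the corresponding mean.
inputmatrix(logical_mask) = meanm(logical_mask);

disp(inputmatrix);  % the 9 and the 8 have become 3; everything else is unchanged
```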

Fast check if element is in MATLAB matrix

I would like to verify whether an element is present in a MATLAB matrix.
At the beginning, I implemented as follows:
if ~isempty(find(matrix(:) == element))
which is obviously slow. Thus, I changed to:
if sum(matrix(:) == element) ~= 0
but this is again slow: I am calling a lot of times the function that contains this instruction, and I lose 14 seconds each time!
Is there a way to further optimize this instruction?
Thanks.
If you just need to know if a value exists in a matrix, using the second argument of find to specify that you just want one value will be slightly faster (25-50%) and even a bit faster than using sum, at least on my machine. An example:
matrix = randi(100,1e4,1e4);
element = 50;
~isempty(find(matrix(:)==element,1))
However, in recent versions of Matlab (I'm using R2014b), nnz is finally faster for this operation, so:
matrix = randi(100,1e4,1e4);
element = 50;
nnz(matrix==element)~=0
On my machine this is about 2.8 times faster than any other approach (including using any, strangely) for the example provided. To my mind, this solution also has the benefit of being the most readable.
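Since the ranking depends on the MATLAB release and the machine, it may be worth timing the candidates yourself. The following is a rough sketch using tic/toc on a smaller matrix than the example above (the size is an assumption chosen to keep the run short); single-shot timings are noisy, so repeat the measurement before drawing conclusions.

```matlab
matrix  = randi(100, 1e3, 1e3);   % smaller than the 1e4-by-1e4 example to keep the run short
element = 50;

% Each line times one way of asking "does element occur in matrix?".
tic; r1 = ~isempty(find(matrix(:) == element, 1)); t1 = toc;
tic; r2 = sum(matrix(:) == element) ~= 0;          t2 = toc;
tic; r3 = any(matrix(:) == element);               t3 = toc;
tic; r4 = nnz(matrix == element) ~= 0;             t4 = toc;

% All four must agree on the answer; only the timings differ.
fprintf('find: %.4g s, sum: %.4g s, any: %.4g s, nnz: %.4g s\n', t1, t2, t3, t4);
```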
In my opinion, there are several things you could try to improve performance:
Following your initial idea, I would use the function any to test whether any of the equality tests succeeded:
if any(matrix(:) == element)
I tested this on a 1000 by 1000 matrix and it is faster than the solutions you have tested.
I do not think that the unfolding matrix(:) is penalizing since it is equivalent to a reshape and Matlab does this in a smart way where it does not actually allocate and move memory since you are not modifying the temporary object matrix(:)
If your matrix does not change between the calls to the function, or changes rarely, you could simply keep another vector containing all the elements of your matrix, but sorted. This way you could use a more efficient O(log(N)) search algorithm to test for the presence of your element.
I personally like the ismember function for this kind of problem. It might not be the fastest, but for non-critical parts of the code it greatly improves readability and code maintenance (and I prefer to spend one hour coding something that will take a day to run rather than one day coding something that will run in one hour; this of course depends on how often you use the program, but it is something one should never forget).
If you can have a sorted copy of the elements of your matrix, you could consider using the undocumented Matlab function ismembc but remember that inputs must be sorted non-sparse non-NaN values.
If performance really is critical you might want to write your own mex file and for this task you could even include some simple parallelization using openmp.
Hope this helps,
Adrien.
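The sorted-copy idea above can be sketched as follows. This is only an illustration under the stated assumption that the matrix stays fixed between lookups, so the sort cost is paid once; magic(4) and the value 7 are example data, not taken from the question.

```matlab
matrix  = magic(4);   % example data: contains the integers 1..16
element = 7;

sortedVals = sort(matrix(:));   % done once, then reused for many lookups

% Binary search on the sorted copy: O(log(N)) comparisons per lookup.
lo = 1;
hi = numel(sortedVals);
found = false;
while lo <= hi
    mid = floor((lo + hi) / 2);
    if sortedVals(mid) == element
        found = true;
        break;
    elseif sortedVals(mid) < element
        lo = mid + 1;    % element can only be in the upper half
    else
        hi = mid - 1;    % element can only be in the lower half
    end
end
```

For a single lookup the sort dominates and any(matrix(:) == element) is cheaper; the sorted copy pays off only when the same matrix is queried many times.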

Correct use of Simplify in Mathematica (with multiphase trig)

I just started working with Mathematica (5.0) for the first time, and while the manual has been helpful, I'm not entirely sure my technique has been correct using (Full)Simplify. I am using the program to check my work on a derived transform to change between reference frames, which consisted of multiplying a trio of relatively large square matrices.
A colleague and I each did the work by hand, separately, to make sure there were no mistakes. We hoped to get a third check from the program, which seemed that it would be simple enough to ask. The hand calculations took some time due to matrix size, but we came to the same conclusions. The fact that we had the same answer made me skeptical when the program produced different results.
I've checked and double checked my inputs.
I am definitely using . (Dot) to multiply the matrices, so the multiplication is a true matrix product.
FullSimplify made no difference.
Neither have combinations with TrigReduce / expanding algebraically before simplifying.
I've taken individual entries from the final matrix and tried to simplify them in isolation, to no avail, so the problem isn't due to the use of matrices.
I've also tried to multiply the first two matrices, simplify, and then multiply that with the third matrix; however, this produced the same results as before.
I thought Simplify automatically crossed into all levels of Heads, so I didn't need to worry about mapping, but even where zeros would be expected as outputs in the matrix, there are terms, and where we would expect terms, there are close answers, plus a host of sin and cosine terms that do not reduce.
Does anyone regularly use a particular technique with Simplify to get better results than Simplify gives on its own?
If there are assumptions on parameter ranges you will want to feed them to Simplify. The following simple examples will indicate why this might be useful.
In[218]:= Simplify[a*Sqrt[1 - x^2] - Sqrt[a^2 - a^2*x^2]]
Out[218]= a Sqrt[1 - x^2] - Sqrt[-a^2 (-1 + x^2)]
In[219]:= Simplify[a*Sqrt[1 - x^2] - Sqrt[a^2 - a^2*x^2],
Assumptions -> a > 0]
Out[219]= 0
Assuming this and other responses miss the mark, if you could provide an example that in some way shows the possibly bad behavior, that would be very helpful. Disguise it howsoever necessary in order to hide proprietary features: bleach out watermarks, file down registration numbers, maybe dress it in a moustache.
Daniel Lichtblau
Wolfram Research
As you didn't give many details to chew on, I can only give you a few tips:
Mma5 is pretty old. The current version is 8. If you have access to someone with 8 you might ask him to try it to see whether that makes a difference. You could also try WolframAlpha online (http://www.wolframalpha.com/), which also understands some (all?) Mma syntax.
Have you tried comparing your own and Mma's result numerically? Generate a Table of differences for various parameter values or use Plot. If the differences are negligible (use Chop to cut off small residuals) the results are probably equivalent.
Cheers -- Sjoerd

Resources