I am working with dilated versions of image superpixels, but bwmorph and imdilate are too slow for my application. For example, the following snippet takes more than 1 second for N=200 (parfor on 4 threads):
parfor i = 1:N
    idx = superpixels == i;
    bwF = bwmorph(idx, 'dilate', 10);
end
Does anyone know about any other MATLAB code that speeds up this process?
Thanks!
MATLAB's Image Processing Toolbox includes mathematical morphology; the dilation function is called imdilate. imdilate can also run on the GPU if you pass it a gpuArray.
If you need high-performance image processing, you should consider moving to C++ and using the GPU (CUDA, for example); for this kind of per-pixel work it is typically faster than parallelizing across CPU cores.
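For comparison outside MATLAB, the iterated 3x3 dilation that bwmorph performs can be sketched in pure NumPy (an illustrative Python sketch, not from the original post; `dilate_mask` is a made-up helper, and the 3x3 structuring element is assumed to match bwmorph's default):

```python
import numpy as np

def dilate_mask(mask, radius):
    """Dilate a boolean mask by `radius` pixels, i.e. `radius` passes of a
    3x3 (8-connected) binary dilation."""
    out = mask.copy()
    for _ in range(radius):
        p = np.pad(out, 1)  # pads with False, so edges stay valid
        # OR together the centre pixel and its 8 neighbours.
        out = (p[:-2, :-2] | p[:-2, 1:-1] | p[:-2, 2:]
               | p[1:-1, :-2] | p[1:-1, 1:-1] | p[1:-1, 2:]
               | p[2:, :-2] | p[2:, 1:-1] | p[2:, 2:])
    return out

# Toy stand-in for one superpixel mask.
mask = np.zeros((64, 64), dtype=bool)
mask[28:36, 28:36] = True
dilated = dilate_mask(mask, 10)
```

One practical note: a single dilation with one large structuring element is usually cheaper than many iterated 3x3 passes, which is worth trying in MATLAB via imdilate with an appropriately sized strel.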
I am building an ordinary image classifier for rock-paper-scissors. I am training on my local GPU, which is not high end. When I began training the model it kept giving the error:
ResourceExhaustedError: OOM when allocating tensor with shape.
I googled this error and the advice was to decrease my batch size, which I did. That alone did not solve anything; however, when I later changed my image size from 200x200 to 50x50, it started training with an accuracy of 99%.
Later I wanted to see if I could do it with 150x150 images, as I had found a tutorial on the official TensorFlow channel on YouTube. I followed their exact code and it still did not work. I reduced the batch size; still no solution. Then I reduced the number of units in the dense layer from 512 to 200 and it trained fine, but now the accuracy is pretty bad. Is there any way I can tune my model to fit my GPU without hurting accuracy? And how does the number of units in the dense layer matter? It would really help me a lot.
# Assumes TensorFlow 2.x Keras and an existing X_train array.
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=X_train[0].shape)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(3, activation='softmax')(x)
model = Model(i, x)
When I run this with an image size of 150x150 it throws that error. If I change the image size to 50x50 and reduce the batch size to 8, it works and gives me an accuracy of 99%. But if I use 150x150 and reduce the number of units in the dense layer to 200 (chosen at random), it runs fine but the accuracy is very, very bad.
I am using a low-end NVIDIA GeForce MX230 GPU, and my VRAM is 4 GB.
For 200x200 images the output of the last MaxPool has shape (50, 50, 128), which is then flattened and serves as input to the Dense layer, giving a total of 50*50*128*512 = 163,840,000 parameters. This is a lot.
To reduce the number of parameters you can do one of the following:
- reduce the number of filters in the last Conv2D layer
- use a MaxPool larger than 2x2
- reduce the size of the Dense layer
- reduce the size of the input images.
You have already tried the latter two options. Which method ultimately gives you the best accuracy you will only find out by trial and error. You were already at 99%, which is good.
If you want a platform with more VRAM available, you can use Google Colab https://colab.research.google.com/
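The arithmetic behind these parameter counts can be checked directly (a quick Python sketch; `dense_weights` is a made-up helper, and biases are ignored):

```python
# Weights of the Flatten -> Dense(units) connection, biases ignored.
# Each 2x2 max-pool halves the spatial side (with floor for odd sizes).
def dense_weights(img_side, num_pools, filters, units):
    side = img_side // (2 ** num_pools)
    return side * side * filters * units

print(dense_weights(200, 2, 128, 512))  # 163840000, as computed above
print(dense_weights(150, 2, 128, 512))  # ~90M: still very large
print(dense_weights(50, 2, 128, 512))   # ~9.4M: small enough for 4 GB VRAM
```

This makes clear why 150x150 still blew past the MX230's memory: the dense-layer weights shrink roughly with the fourth power of the input side.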
In an algorithm we are using a median filter with a big window of 257x257 on a UInt16 image. My task is to implement that algorithm with OpenCL on a GPU.
In fact I do not only need the median but in some cases also the 0.001, 0.02 and 0.999 quantiles.
The obvious approach is to have an OpenCL kernel run for every output pixel: the kernel loads all pixels in the window from the input image into local memory, sorts these values, and finally computes the quantiles.
Now the problem: with a window size of 257x257 this approach would need at least 257*257*2 = 132,098 bytes of local memory. But local memory is very limited; my Quadro K4000 has only 49,152 bytes.
So what would be a good approach to implement such a median filter on the GPU?
A solution for CUDA would also be acceptable. But I guess the underlying problem would be the same.
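Before writing a kernel, the per-pixel semantics of the approach described above can be pinned down with a CPU reference (a Python/NumPy sketch using a nearest-rank quantile convention; `window_quantiles` is a made-up helper, not part of the original algorithm):

```python
import numpy as np

def window_quantiles(img, y, x, r, qs):
    """Quantiles of the (2r+1)x(2r+1) window centred on (y, x).
    Mirrors the naive kernel: gather the window, sort, then index
    by nearest rank. qs is an array of quantiles in [0, 1]."""
    win = np.sort(img[y - r:y + r + 1, x - r:x + r + 1].ravel())
    idx = np.clip((qs * (win.size - 1)).round().astype(int), 0, win.size - 1)
    return win[idx]

# A 257x257 uint16 window is 257*257*2 = 132,098 bytes: too big for
# the 49,152 bytes of local memory mentioned above. Demo with r=8:
rng = np.random.default_rng(0)
img = rng.integers(0, 2**16, size=(64, 64), dtype=np.uint16)
med, lo, hi = window_quantiles(img, 32, 32, 8, np.array([0.5, 0.001, 0.999]))
```

Having a trusted scalar reference like this makes it much easier to validate whatever GPU formulation (histogram-based selection, tiling, etc.) is eventually chosen.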
I am trying to use the MATLAB Parallel Computing Toolbox. My PC has 6 'workers', or cores, so I would expect my code to run roughly 6x as fast (i.e. a ~600% increase in speed). However, when I actually time the operations, I find I am only getting roughly a 40% increase in speed.
Is this normal, or am I doing something wrong?
Here is my code:
N = 5000;
%% Parallel
pp = parpool(6);
ts = tic;
parfor i = 1:12
    q = eye(N); q^-1;
end
disp(['Time for Parallel Computation: ' num2str(toc(ts),3) 's']);
delete(pp);
% Serial
ts = tic;
for i = 1:12
    q = eye(N); q^-1;
end
disp(['Time for Serial Computation: ' num2str(toc(ts),3) 's']);
The readout is:
Time for Parallel Computation: 24.6s
Time for Serial Computation: 35.9s
I would have expected the parallel computation to take roughly 35.9/6 ≈ 6s, not 24s.
Any advice?
Thanks
Roman
I'm familiar with matlabpool and parfor usage, but I still need to speed up the computation. I have a more powerful computer on my 1GB network. Both computers have R2010b and have the same code and paths.
What is the simplest way to use both computers for parallel computation?
Example of the code I use today:
--- main.m---
matlabpool('open', 3);
% ...
x = randn(1e5,1);
y = nan(size(x));
parfor k = 1 : length(x)
y(k) = myfunc(x(k));
end
--- myfunc.m---
function y = myfunc(x)
y = x; % some computation
return
For real cluster computing, you'll need the distributed computing toolbox, as you can read on the parallel computing info page:
Without changing the code, you can run the same application on a computer cluster or a grid computing service (using MATLAB Distributed Computing Server™). You can run parallel applications interactively or in batch.
But installing (=buying) a toolbox just for adding one computer to the worker pool might be a bit too expensive. Luckily there are also alternatives: http://www.mathworks.com/matlabcentral/fileexchange/13775
I personally haven't used this, but think it's definitely worth a look.
Recently I profiled some MATLAB code and I was shocked to see the following in a heavily used function:
5.76 198694 58 persistent CONSTANTS;
3.44 198694 59 if isempty(CONSTANTS) % initialize CONSTANTS
In other words, MATLAB spent about 9 seconds, over 198694 function calls, declaring the persistent CONSTANTS and checking if it has been initialized. That represents 13% of the total time spent in that function.
Do persistent variables really carry that much of a performance penalty in MATLAB? Or are we doing something terribly wrong here?
UPDATE
@Andrew, I tried your sample script and I am very, very perplexed by the output:
time calls line
6 function has_persistent
6.48 200000 7 persistent CONSTANTS
1.91 200000 8 if isempty(CONSTANTS)
9 CONSTANTS = 42;
10 end
I tried the bench() command and it showed my machine in the middle range of the sample machines. I am running 64-bit Ubuntu on an Intel(R) Core(TM) i7 CPU with 4GB RAM.
That's the standard way of using persistent variables in MATLAB. You're doing what you're supposed to. There will be noticeable overhead for it, but your timings do seem surprisingly high.
Here's a similar test I ran in 32-bit Matlab R2009b on a 3.0 GHz Intel Core 2 QX9650 machine under Windows XP x64. Similar results on other machines and versions. About 5x faster than your timings.
Test:
function call_has_persistent
for i = 1:200000
    has_persistent();
end

function has_persistent
persistent CONSTANTS
if isempty(CONSTANTS)
    CONSTANTS = 42;
end
Results:
0.89 200000 7 persistent CONSTANTS
0.25 200000 8 if isempty(CONSTANTS)
What Matlab version, OS, and CPU are you running on? What does CONSTANTS get initialized with? Does Matlab's bench() output seem reasonable for your machine?
Your timings do seem high; there may be a bug or configuration issue there to fix. But if you really want to make MATLAB code fast, the standard advice is to "vectorize" it: restructure the code so that it makes fewer function calls on larger input arrays, and use MATLAB's built-in vectorized functions instead of loops or control structures, so that you avoid making 200,000 calls to the function in the first place, if possible. MATLAB has relatively high overhead per function or method call (see "Is MATLAB OOP slow or am I doing something wrong?" for some numbers), so you can often get more mileage by refactoring to eliminate function calls than by making the individual calls faster.
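To make the vectorization idea concrete (a Python/NumPy sketch rather than MATLAB, but the principle is identical: pay the call overhead once over a whole array instead of once per element):

```python
import numpy as np

x = np.arange(200_000, dtype=float)

def f(v):
    # Stand-in for the hot function; plain arithmetic works on
    # scalars and whole arrays alike.
    return v * 2.0 + 1.0

# Loop version: 200,000 separate calls, each paying call overhead.
looped = np.array([f(v) for v in x])

# Vectorized version: one call over the entire array.
vectorized = f(x)

assert np.array_equal(looped, vectorized)
```

The two produce identical results; only the number of function-call dispatches differs, which is exactly the overhead the profile above is measuring.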
It may be worth benchmarking some other basic Matlab operations on your machine, to see if it's just "persistent" that seems slow. Also try profiling just this little call_has_persistent test script in isolation to see if the context of your function makes a difference.