Matlab Query: Image Processing, Editing the Script

I am quite new to image processing and would like to produce an array that stores 10 images. I would then like to run a for loop over some code that identifies properties of the images, specifically the surface area of a biological specimen, and outputs an array containing the 10 areas.
Below is what I have managed to scrape together so far, and this is the resulting error message:
??? Index exceeds matrix dimensions.
Error in ==> Testing1 at 14
nova(i).img = imread([myDir B(i).name]);
Below is the code I've been working on so far:
my_Dir = 'AC04/';
ext_img='*.jpg';
B = dir([my_Dir ext_img]);
nfile = max(size(B));
nova = zeros(1,nfile);
for i = 1:nfile
nova(i).img = imread([myDir B(i).name]);
end
areaarray = zeros(1,nfile);
for k = 1:nfile
[nova(k), threshold] = edge(nova(k), 'sobel');
.
.
.
.%code in this area is irrelevant to the problem I think%
.
.
.
areaarray(k) = bwarea(BWfinal);
end
areaarray

There are a few ways to store images in an array-like structure in Matlab. You could use an array of structs. In that case you can do as you did:
nova(i).img = imread([myDir B(i).name]);
You access the first image with nova(1).img, the second with nova(2).img, etc.
The other way is to use a cell array (similar to an array, but more flexible in that its members can be of different types):
nova{i} = imread([myDir B(i).name]);
You access the first image with nova{1}, the second with nova{2}, etc.
[ IMPORTANT ] In both cases you should remove this line from the code:
nova = zeros(1,nfile);
It creates a numeric array, which can hold neither struct fields nor cell contents. I suppose you were trying to pre-allocate memory for the images, and since you're a beginner I advise you not to be concerned with that. It is an optimization to address if you run into performance issues; if you don't, take advantage of Matlab's automatic memory (re)allocation.
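If preallocation does become worthwhile later, one possible way to do it is sketched below. It assumes a struct array with an img field as in the question. The image count and the random logical matrices are hypothetical stand-ins for the files read with imread, and nnz stands in for bwarea so that the sketch runs without a toolbox.

```matlab
nfile = 3;                               % hypothetical number of images

% Preallocate a 1-by-nfile struct array whose img fields start out empty.
% zeros(1, nfile) would create a numeric array instead, which cannot hold fields.
nova = struct('img', cell(1, nfile));

for i = 1:nfile
    % Stand-in for: nova(i).img = imread([myDir B(i).name]);
    nova(i).img = rand(4, 5) > 0.5;
end

% Numeric results are the case where zeros is the right preallocation.
areaarray = zeros(1, nfile);
for k = 1:nfile
    areaarray(k) = nnz(nova(k).img);     % pixel count; the question calls bwarea here
end
disp(areaarray)
```

Note that the second loop reads nova(k).img rather than nova(k), since each element of the struct array wraps its image in the img field.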

Related

cellfun in Matlab and Classification with Wavelet Scattering

I want to apply the following example to my data:
https://www.mathworks.com/help/wavelet/ug/digit-classification-with-wavelet-scattering.html
I have more than 4000 images. The images are 224x224x3, in other words 224*224 with 3 channels. After loading the images in Matlab I want to apply "Wavelet Image Scattering Feature Extraction". At first, I got the following error:
Error using tall/cellfun (line 21)
Argument 2 to CELLFUN must be one of the following data types: cell.
My code is:
sf = waveletScattering2('ImageSize',[224 224],'InvarianceScale',112, ...
'NumRotations',[8 8]);
Ttrain = tall(x_train.X);
Ttest = tall(x_test.X);
trainfeatures = cellfun(@(x)helperScatImages(sf,x),Ttrain,'UniformOutput',false);
testfeatures = cellfun(@(x)helperScatImages(sf,x),Ttest,'UniformOutput',false);
As an example, Ttrain in the above code is:
4093x224x224x3 tall single (unevaluated)
How should I change the entire code in https://www.mathworks.com/help/wavelet/ug/digit-classification-with-wavelet-scattering.html to work properly?
Thank you in advance for any help.

Reduce the output layer size from XLTransformers

I'm running the following using the huggingface implementation:
t1 = "My example sentence is really great."
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
encoded_input = tokenizer(t1, return_tensors='pt', add_space_before_punct_symbol=True)
output = model(**encoded_input)
tmp = output[0].detach().numpy()
print(tmp.shape)
>>> (1, 7, 267735)
With the goal of getting output embeddings that I'll use downstream.
The last dimension is /substantially/ larger than I expected, and it looks like it is the size of the entire vocab_size rather than a reduction based on the ECL from the paper (which potentially I am misinterpreting).
What argument would I provide the model to reduce this layer size to a smaller dimensional space, something more like the basic BERT at 400 or 768 and still obtain good performance based on the pretrained embeddings?
That's because you used ...LMHeadModel, which predicts the next token. You can use TransfoXLModel.from_pretrained("transfo-xl-wt103") instead, then output[0] is the last hidden state which has the shape (batch_size, sequence_length, hidden_size).

Matlab : image region analyzer. Alternative for 'bwpropfilt'?

I'm running basic edge detection to detect windows region based on this http://www.mathworks.com/videos/edge-detection-with-matlab-119353.html
The edge works successfully :
final_edge = edge(gray_I,'sobel');
BW_out = bwareaopen(imfill(final_edge,'holes'),20);
figure;
imshow(BW_out);
Now, when it comes to the following code, which filters the image based on region properties, my MATLAB R2013a does not seem to recognize the bwpropfilt function.
% imageRegionAnalyzer(BW);
% Filter image based on image properties
BW_out = bwpropfilt(BW_out,'Area', [400, 467]);
BW_out = bwpropfilt(BW_out,'Solidity',[0.5, 1]);
It says:
Undefined function 'bwpropfilt' for input arguments of type 'char'.
What alternative can I use in place of bwpropfilt?
bwpropfilt looks at the corresponding property in the output of regionprops, keeps the objects whose value falls within the given range, and filters out those outside it. You can reproduce the algorithm by calling regionprops explicitly and building a logical array that indexes into the returned structure, so that only regions whose property (the second input of bwpropfilt) lies within the range (the third input of bwpropfilt) are retained. To reconstruct the image after filtering, take the column-major linear indices in the PixelIdxList field of each retained region, stack them into a single vector, and set those locations to true in a new output image.
Specifically, you can use the following code to reproduce the last two lines of code you have shown:
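% Run regionprops and get all properties
s = regionprops(BW_out, 'all');
%%% For the first line of code
% Bounds are treated as inclusive, matching the [low high] range
values = [s.Area];
s = s(values >= 400 & values <= 467);
%%% For the second line of code
% Inclusive upper bound so that regions with solidity exactly 1 are kept
values = [s.Solidity];
s = s(values >= 0.5 & values <= 1);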
% Stack column major indices
ind = vertcat(s.PixelIdxList);
% Create output image
final_out = false(size(BW_out));
final_out(ind) = true;
final_out contains the filtered image only retaining the values within the range specified by the desired property.
Caution
The above logic only works for regionprops properties that return a single scalar value per region. If you examine the properties supported by bwpropfilt, you will see that they are a subset of the full list in regionprops. This makes sense: some regionprops properties return a vector or a matrix, and filtering by a range becomes ambiguous when several values characterize a single region.
Minor Note
Being curious, I opened up bwpropfilt to see how it is implemented, as I currently have MATLAB R2016a. Apart from some error handling, the above logic is essentially how bwpropfilt is implemented, so the code I wrote is in line with the logic of the function.
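A variation on the reconstruction step, sketched on the assumption that bwconncomp, labelmatrix and ismember are available in your release: rather than stacking PixelIdxList, keep the labels of the regions that pass the test. The test image and the area range below are made up for illustration, and the range is assumed to be inclusive.

```matlab
% Hypothetical test image with two separate square blobs
BW = false(20, 20);
BW(2:4, 2:4) = true;        % 9-pixel region
BW(8:15, 8:15) = true;      % 64-pixel region

CC = bwconncomp(BW);                    % connected components
s = regionprops(CC, 'Area');            % one scalar Area per region
areas = [s.Area];

% Range test on the chosen property (bounds assumed inclusive)
keep = areas >= 50 & areas <= 100;

% Keep only the pixels whose label belongs to a retained region
filtered = ismember(labelmatrix(CC), find(keep));
disp(nnz(filtered))
```

Working from the label matrix avoids building the index vector by hand, and the same caution applies: the property being tested must be a scalar per region.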

How to setup my function with blockproc to process the image in parts?

I have an image:
I want to divide this image into 3 equal parts and calculate the SIFT for each part individually and then concatenate the results.
I found out that Matlab's blockproc does just that, but I do not know how to get it to work with my function. Here is what I have:
[r c] = size(image);
c_new = floor(c/3); %round it
B = blockproc(image, [r c_new], @block_fun)
So, according to MATLAB's documentation, the function block_fun will be applied to the original image in blocks of size r by c_new.
This is what I wrote as block_fun:
function feats = block_fun(img)
[keypoints, descriptors] = vl_sift(single(img));
feats = descriptors;
end
So my matrix B should be a concatenation of the SIFT descriptors of all three parts of the same image, right?
But this is the error that I get when I run the command:
B = blockproc(image, [r c_new], @block_fun)
Function BLOCKPROC encountered an error while evaluating the user
supplied function handle, FUN.
The cause of the error was:
Error using single Conversion to single from struct is not possible.
For your custom function, blockproc sends in a structure where the image data is stored in a field called data. As such, you simply need to change your function so that it accesses the data field in the input. Like so:
function feats = block_fun(block_struct) %// Change
[keypoints, descriptors] = vl_sift(single(block_struct.data)); %// Change
feats = descriptors;
end
This error is caused by the fact that the function that is called via its handle by blockproc expects a block struct.
The real problem is that blockproc will attempt to concatenate all results and you will have a different set of 128xN feature vectors for each block, which blockproc doesn't allow.
I think that using im2col and reshape would be much simpler.
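One way around the concatenation limit is to skip blockproc and slice the image into column strips directly, collecting each strip's result in a cell array. This is only a minimal sketch, and it is not the im2col approach mentioned above: the magic(6) matrix is a stand-in image, and the placeholder descriptor replaces the vl_sift call, which needs VLFeat installed.

```matlab
img = magic(6);                        % hypothetical stand-in for a grayscale image
[r, c] = size(img);
c_new = floor(c / 3);                  % strip width; leftover columns are ignored

feats = cell(1, 3);                    % one entry per strip, sizes may differ
for p = 1:3
    cols = (p - 1) * c_new + (1:c_new);
    part = img(:, cols);
    % With VLFeat this would be: [~, feats{p}] = vl_sift(single(part));
    feats{p} = single(part(:));        % placeholder "descriptor" column
end

% Descriptors share their row count, so they can be joined side by side
allFeats = [feats{:}];
disp(size(allFeats))
```

With real SIFT output each cell would hold a 128-by-N matrix with a different N per strip, which is why the results are kept in a cell array until the final horizontal concatenation.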

Removing a "row" from a structure array

This is similar to a question I asked before, but is slightly different:
So I have a very large structure array in matlab. Suppose, for argument's sake and to simplify the situation, that I have something like:
structure(1).name, structure(2).name, structure(3).name
structure(1).returns, structure(2).returns, structure(3).returns
(in my real program I have 647 structures)
Suppose further that structure(i).returns is a vector (very large vector, approximately 2,000,000 entries) and that a condition comes along where I want to delete the jth entry from structure(i).returns for all i. How do you do this? or rather, how do you do this reasonably fast? I have tried some things, but they are all insanely slow (I will show them in a second) so I was wondering if the community knew of faster ways to do this.
I have parsed my data two different ways; the first way had everything saved as cell arrays, but because things hadn't been working well for me I parsed the data again and placed everything as vectors.
What I'm actually doing is trying to delete NaN data, as well as all data in the same corresponding row of my data file, and then doing the very same thing after applying the Hampel filter. The relevant part of my code in this attempt is:
for i=numStock+1:-1:1
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
stock(i).return = sort(stock(i).return);
stock(i).returnLength = length(stock(i).return);
stock(i).medianReturn = median(stock(i).return);
stock(i).madReturn = mad(stock(i).return,1);
end;
for i=numStock:-1:1
for j = length(stock(i+1).volume):-1:1
if(isnan(stock(i+1).volume(j)))
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end
end
stock(i+1).volume = sort(stock(i+1).volume);
stock(i+1).volumeLength = length(stock(i+1).volume);
stock(i+1).medianVolume = median(stock(i+1).volume);
stock(i+1).madVolume = mad(stock(i+1).volume,1);
end;
for i=numStock+1:-1:1
for j=stock(i).returnLength:-1:1
if (abs(stock(i).return(j) - stock(i).medianReturn) > 3*stock(i).madReturn)
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end;
end;
end;
for i=numStock:-1:1
for j=stock(i+1).volumeLength:-1:1
if (abs(stock(i+1).volume(j) - stock(i+1).medianVolume) > 3*stock(i+1).madVolume)
for k=numStock:-1:1
stock(k+1).volume(j) = [];
end
end;
end;
end;
However, this returns an error:
"Matrix index is out of range for deletion.
Error in Failure (line 110)
stock(k).return(j) = [];"
So instead I tried parsing everything in as vectors. Then I decided to delete the appropriate entries in the vectors before building the structure array. This doesn't return an error, but it is very slow:
%% Delete bad data, Hampel Filter
% Delete bad entries
id=strcmp(returns,'');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
id=strcmp(returns,'C');
returns(id)=[];
volume(id)=[];
date(id)=[];
ticker(id)=[];
name(id)=[];
permno(id)=[];
sp500(id) = [];
% Convert returns from string to double
returns=cellfun(@str2double,returns);
sp500=cellfun(@str2double,sp500);
% Delete all data for which a return is not a number
nanid=isnan(returns);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Delete all data for which a volume is not a number
nanid=isnan(volume);
returns(nanid)=[];
volume(nanid)=[];
date(nanid)=[];
ticker(nanid)=[];
name(nanid)=[];
permno(nanid)=[];
% Apply the Hampel filter, and delete all data corresponding to
% observations deleted by the filter.
medianReturn = median(returns);
madReturn = mad(returns,1);
for i=length(returns):-1:1
if (abs(returns(i) - medianReturn) > 3*madReturn)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
medianVolume = median(volume);
madVolume = mad(volume,1);
for i=length(volume):-1:1
if (abs(volume(i) - medianVolume) > 3*madVolume)
returns(i) = [];
volume(i)=[];
date(i)=[];
ticker(i)=[];
name(i)=[];
permno(i)=[];
end;
end
As I said, this is very slow, probably because I'm using a for loop on a very large data set; however, I'm not sure how else one would do this. Sorry for the gigantic post, but does anyone have a suggestion as to how I might go about doing what I'm asking in a reasonable way?
EDIT: I should add that getting the vector method to work is probably preferable, since my aim is to put all of the return vectors into a matrix and get all of the volume vectors into a matrix and perform PCA on them, and I'm not sure how I would do that using cell arrays (or even if princomp would work on cell arrays).
EDIT2: I have altered the code to match your suggestion (although I decided to give up some speed and keep the for-loops and the structure array, since reparsing this data would be far worse time-wise). The new code snippet is:
stock_return = zeros(numStock+1,length(stock(1).return));
for i=1:numStock+1
for j=1:length(stock(i).return)
stock_return(i,j) = stock(i).return(j);
end
end
stock_return = stock_return(~any(isnan(stock_return)), : );
This returns an Index exceeds matrix dimensions error, and I'm not sure why. Any suggestions?
I could not find a convenient way to handle structures, so I would restructure the code to use plain arrays instead of structures.
For example, instead of stock(i).return(j) I would use stock_return(i,j).
I will show, on part of your code, how to get rid of the for-loops.
Say we deal with this code:
for j=length(stock(i).return):-1:1
if(isnan(stock(i).return(j)))
for k=numStock+1:-1:1
stock(k).return(j) = [];
end
end
end
Now, the deletion of columns with any NaN data goes like this:
stock_return = stock_return(:, ~any(isnan(stock_return)) );
As for the absolute difference from the median, you can write similar code:
% stock_return_length is a scalar (the number of observations per stock)
% stock_median_return is a column vector (eg. [1;2;3]), one median per stock
% stock_mad_return is also a column vector.
% Replicate both column vectors across the columns of stock_return
median_return = repmat(stock_median_return, 1, stock_return_length);
mad_return = repmat(stock_mad_return, 1, stock_return_length);
is_bad = abs(stock_return - median_return) > 3 .* mad_return;
stock_return = stock_return(:, ~any(is_bad));
Using a scalar for stock_return_length means of course that the return lengths are the same, but you implicitly assume it in your original code anyway.
The important point in my answer is using any. Logical indexing is not sufficient in itself, since in your original code you delete all the values if any of them is bad.
Reference to any: http://www.mathworks.co.uk/help/matlab/ref/any.html.
If you want to preserve the original structure and stick with stock(i).return, you can speed up your code using essentially the same scheme, but you can eliminate one fewer for-loop, so your program will be substantially slower than the matrix version.
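Putting the pieces together, here is a minimal end-to-end sketch on made-up data. The field name returns, the sample values, and the use of bsxfun with a hand-computed median absolute deviation (instead of mad from the Statistics Toolbox) are assumptions for illustration. It also assumes every stock has the same number of observations.

```matlab
% Hypothetical struct array; the field is named returns here
stock(1).returns = [1 2 NaN 3 100 2];
stock(2).returns = [10 12 11 11 13 14];

% One row per stock, one column per observation
X = vertcat(stock.returns);

% Drop every column in which any stock has a NaN.
% any(..., 1) tests each column, so the mask indexes columns, not rows.
X = X(:, ~any(isnan(X), 1));

% Hampel-style test per stock: more than 3 MADs away from the row median
med = median(X, 2);
dev = abs(bsxfun(@minus, X, med));
madv = median(dev, 2);
is_bad = bsxfun(@gt, dev, 3 * madv);

% Drop every column flagged for any stock
X = X(:, ~any(is_bad, 1));
disp(X)
```

The mask produced by any(..., 1) has one entry per column, so it belongs in the second index position; using it as a row index is one way to get an "Index exceeds matrix dimensions" error.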
