Raster merge using doParallel creates temp file for each merge step - parallel-processing

I am working with the raster and glcm packages to compute Haralick texture features on satellite imagery. I have successfully run the glcm() function using a single core but am working on running it in parallel. Here is the code I'm using:
# tiles is a list of raster extents, r is a raster
registerDoParallel(7)
out_raster = foreach(i=1:length(tiles),.combine = merge,.packages=c("raster","glcm")) %dopar%
glcm(crop(r,tiles[[i]]), n_grey=16, window=c(17,17), shift = c(1,1),
min_x = rmin, max_x = rmax)
When I examine the temp files that are created, it appears each step of the merge creates a temp file, which takes a lot of hard drive space. Here is the overall image (2GB):
Full raster
and here are two of the temp files: Merge Step 1 Merge Step 2
Because the glcm function output for each tile is 3 GB, creating a temp file for each stepwise merge operation creates ~160GB of temp raster files. Is there a more space efficient way to run this in parallel?

I managed to save hard drive space by using gdal and building vrts. Below is the code I wrote running on the example data from the glcm package. The steps were 1: Create vrt files of the tiles; 2) Run the glcm function in parallel on each vrt tile (see glcm_parallel function); 3) Merge the tiles into a vrt and write the output raster using gdal warp. The vrt files are very small and the only temp files are just those created by the glcm function. This should help a lot with large rasters.
#Load Packages
library(raster)
library(sf)
library(rgdal)
library(glcm)
library(doParallel)
library(gdalUtils)
#Source helper functions
source("./tilebuild_buff.R")
source("./glcm_parallel.R")
#Read raster - example data from glcm package (saved to disk)
rasterfile = "./L5TSR_1986.tif"
r = raster("L5TSR_1986.tif")
#Create tiles directory if it doesn't exist and clear files if it exists
if(!file.exists("./tiles")){dir.create("./tiles")}
file.remove(list.files("./tiles/",full.names=T))
#Calculate tiles for parallel processing - returns x and y offsets, and widths
#to use with gdal_translate
jobs_buff = tilebuild_buff(r,nx=5,ny=2,buffer=c(5,5))
#Create vrt files for buffered tiles
for (i in 1:length(jobs_buff[,1])){
fin = rasterfile
fout = paste0("./tiles/t_",i,'.vrt')
ex = as.numeric(jobs_buff[i,])
gdal_utils('translate',fin,fout,options = c('-srcwin',ex,'-of','vrt'))
}
#Read in vrt files of raster tiles and set the nodata value
input.rasters = lapply(paste0("./tiles/", list.files("./tiles/",pattern="\\.vrt$")), raster)
for(i in 1:length(input.rasters)){ NAvalue(input.rasters[[i]])= -3.4E38 }
#Create a directory for temporary raster grids and clear files
tempdir = "./rastertemp/"
if(!file.exists(tempdir)){dir.create(tempdir)}
file.remove(list.files(tempdir,full.names=T))
registerDoParallel(6)
#Determine min and max values over original raster
rmin = cellStats(r,'min')
rmax = cellStats(r,'max')
#Run glcm function in parallel
glcm_split = foreach(i=1:length(jobs_buff[,1]),.packages=c("raster","glcm")) %dopar%
glcm_parallel(inlist = input.rasters,temp=tempdir,window=c(3,3),n_grey=16,
min_x=rmin,max_x=rmax)
#Get list of temp raster files created by glcm function
temps = paste0(tempdir,list.files(tempdir,pattern="\\.grd$"))
#trim off buffer (from tilebuild_buff function) and create mosaic raster
mosaic_rasters(temps,dst_dataset = "./mosaic.vrt", trim_margins = 5, srcnodata=-3.4E38,overwrite=T)
#write output tif
vrt_mosaic = "./mosaic.vrt"
outtif = "./final_merged.tif"
gdalwarp(vrt_mosaic,outtif,overwrite=T,verbose=T)
The two helper functions are here:
glcm_parallel <- function(inlist, temp, n_grey=16, window=c(11,11), shift=c(1,1), min_x=NULL, max_x=NULL){
require(glcm)
#todisk option required if output rasters are small enough to fit in memory
rasterOptions(tmpdir = temp, todisk=T, maxmemory = 1E8)
## run glcm over tile
r_glcm=glcm(inlist[[i]], n_grey = n_grey, window = window, shift=shift, min_x=min_x, max_x=max_x, na_opt = 'any')
}
and here:
tilebuild_buff <- function(r, nx=5, ny=2, buffer=c(0,0)){
round_xw = floor(ncol(r)/nx)
xsize = c(rep(round_xw,nx-1), round_xw + ncol(r)%%nx)
xoff = c(0,cumsum(rep(round_xw,nx-1)))
round_yh = floor(nrow(r)/ny)
ysize = c(rep(round_yh,ny-1), round_yh + nrow(r)%%ny)
yoff = c(0,cumsum(rep(round_yh,ny-1)))
pix_widths = expand.grid(xsize = xsize ,ysize = ysize)
offsets = expand.grid(xoff = xoff,yoff = yoff)
srcwins = cbind(offsets,pix_widths)
srcwins_buff = srcwins
#Add buffer
srcwins_buff$xoff = srcwins$xoff - buffer[1]
srcwins_buff$yoff = srcwins$yoff - buffer[2]
srcwins_buff$xsize = srcwins$xsize + 2*buffer[1]
srcwins_buff$ysize = srcwins$ysize + 2*buffer[2]
return(srcwins_buff)
}

Related

Tensorflow: Best way of creating training data with float as label

I want to use tensorflow version 2.4.0-dev20201009 in python 3.7.
My dataset are in the subfolder "data\Images". The label of an image is a float number between 1 and 5 and can read from the allTestData.csv from the subfolder "data".
What is the best way to read the data with validation split of 30 percent? So far I wanted to use
tf.keras.preprocessing.image_dataset_from_directory but this doesn't help me to incorperate the labels correctly, as all my images are in one folder and do not have one-hot encoded vectors as labels. How would you do this in tensorflow?
For the sake of completeness, I planed to use
def create_model():
model = keras.Sequential()
model.add(MobileNetV2(input_shape=(224, 224, 3), include_top=False))
model.trainable = True
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(1024, activation="relu"))
model.add(layers.Dense(1, activation="softmax"))
model.compile(optimizer='adam',
loss=tf.losses.mean_squared_error,
metrics=[tf.metrics.SparseCategoricalAccuracy()])
model.summary()
return model
for training the model. The question is only regarding how to read the training data?
I will answer my own question.
The best way was to write a manual function that reads the labels and images.
Assume that the images are in 'data\Images' and the labels are in a .txt file and the labels are in a .txt file at 'data\train_test_files\All_labels.txt'. Then the following two methods will do the job:
def loadImages(IMG_SIZE):
path = os.path.join(os.getcwd(), 'data\\Images')
training_data=[]
labelMap = getLabelMap()
for img in os.listdir(path):
out_array = np.zeros((350,350, 3), np.float32) #350x350 is the pixel size of the images
try:
img_array = cv2.imread(os.path.join(path, img))
img_array=img_array.astype('float32')
out_array = cv2.normalize(img_array, out_array, 0, 1, cv2.NORM_MINMAX)
out_array = cv2.resize(out_array, (IMG_SIZE, IMG_SIZE)
training_data.append([out_array, float(labelMap[img])])
except Exception as e:
pass
return training_data
def getLabelMap():
map = {}
path = os.getcwd()
path = os.path.join(path, "data\\train_test_files\\All_labels.txt")
f = open(path, "r")
for line in f:
line = line.split() #lines in txt file are of the form 'image_name.jpg 3.2'
map[line[0]] = line[1] #3.2 is the label
f.close()
return map
#call of method:
training_set=[]
training_set = loadImages(244) #I want to have my images resized to 244x244

matlab imwrite new image instead of overriding it

I have function that checks for bottle cap. If there is no cap detected, it writes it as an image into folder. My problem is that if I pass different image, the old one get overridden with new one, is there a way to make new image instead of overwriting old one such as nocap0,jpg, then new one nocap1.jpg etc?
Code:
function [ ] = CHECK_FOR_CAP( image )
%crop loaction of cap
imageCROP = imcrop(image,[130 0 100 50]);
%turn to BW
imageBW=im2bw(imageCROP);
%count black pixels
answer = sum(sum(imageBW==0));
%if <250 black, save it to folder NOCAP
if answer<250
imwrite(image, 'TESTINGFOLDERS/NOCAP/nocap.jpg', 'jpg');
disp('NO CAP DETECTED');
end
UPDATE
I changed the code a bit now. Everytime I give a different image it now writes new one, BUT it overwrites the previous one aswell like so: http://imgur.com/a/KIuvg
My new code:
function [ ] = CHECK_FOR_CAP( image )
folder = 'TESTINGFOLDERS/NOCAP';
filePattern = fullfile(folder, '/*.*');
ImageFiles = dir(filePattern);
%crop loaction of cap
imageCROP = imcrop(image,[130 0 100 50]);
%turn to BW
imageBW=im2bw(imageCROP);
%count black pixels
answer = sum(sum(imageBW==0));
%if <250 black, save it to folder NOCAP
if answer<250
a = length(ImageFiles)-1;
for j = 1:a
baseFileName = [num2str(j),'.jpg'];
filename = fullfile(folder,baseFileName);
if exist(filename,'file')
imwrite(image,filename);
end
imwrite(image, fullfile(filename));
end
disp('NO CAP DETECTED');
end
You write
for j = 1:a
baseFileName = [num2str(j),'.jpg'];
filename = fullfile(folder,baseFileName);
if exist(filename,'file')
imwrite(image,filename);
end
imwrite(image, fullfile(filename));
end
This means that whenever you find a file, you overwrite it. Then you overwrite it again. You do this for as many files as exist (a comes from some dir you do on your folder). What you want is the opposite: find one that does not exist. Something like this:
j = 0;
while true
j = j + 1;
baseFileName = [num2str(j),'.jpg'];
filename = fullfile(folder,baseFileName);
if ~exist(filename,'file')
break
end
end
imwrite(image, fullfile(filename));
This could be further shortened (e.g., by looping while exist(...)) but it conveys the idea...

Call a function in MATLAB GUIDE gui

I have a script called 'main.m' that basically takes the paths where I've saved all my images and insert them in arrays. It saves the images name in a .dat file and call a function named 'selectFolder.m'.
I posted all the script and functions under, my request is at the bottom.
%% Folders
imgFolder = './1.Dataset/';
functFolder = './2.Functions/' ;
%resFolder = './3.Results/';
%% Add path
addpath(genpath(imgFolder));
addpath(genpath(functFolder));
%% Listing Folders where my images are at
myFolder1 = '../Always'; %folder path
[..] %12 folders in total
myFolder12 = '../Random'; %folder path
%% Distinguish folder 'Always' & 'Random'
% Always Folders: subset of images for all users
mfA = {myFolder1, myFolder3, myFolder5, myFolder7, myFolder9, myFolder11};
dimA = length(mfA);
% Random Folders: subset of images randomly showed
mfR = {myFolder2, myFolder4, myFolder6, myFolder8, myFolder10, myFolder12};
dimR = length(mfR);
% check if folders are present
for i = 1:dimA
if ~isdir(mfA{i})
errorMessage = sprintf('Error: The following folder does not exist:\n%s', mfA{i});
uiwait(warndlg(errorMessage));
return;
end
end
for j = 1:dimR
if ~isdir(mfR{j})
errorMessage = sprintf('Error: The following folder does not exist:\n%s', mfR{j});
uiwait(warndlg(errorMessage));
return;
end
end
%% Take images and insert'em in Arrays
% Always
MyImgs1 = dir(fullfile(mfA{1}, '*.jpg'));
[..] %for every cell
MyImgs6 = dir(fullfile(mfA{6}, '*.jpg'));
% Random
MyImgs1r = dir(fullfile(mfR{1}, '*.jpg'));
[..] %for every cell
MyImgs6r = dir(fullfile(mfR{6}, '*.jpg'));
% create arrays with images names
Array_mfA = {MyImgs1.name, MyImgs2.name, MyImgs3.name, MyImgs4.name, MyImgs5.name, MyImgs6.name};
Array_mfR = {MyImgs1r.name, MyImgs2r.name, MyImgs3r.name, MyImgs4r.name, MyImgs5r.name, MyImgs6r.name};
%% Print content of array on file
fileIDA = fopen('2.Functions/Array_Always.dat','w');
formatSpec = '%s,';
nrows = length(Array_mfA);
for row = 1 : nrows
fprintf(fileIDA, formatSpec, Array_mfA{row});
end
fclose(fileIDA);
fileIDR = fopen('2.Functions/Array_Random.dat','w');
formatSpec = '%s,';
nrows = length(Array_mfR);
for row = 1 : nrows
fprintf(fileIDR, formatSpec, Array_mfR{row});
end
fclose(fileIDR);
%disclaimer
nrc = 1;
file = fopen('2.Functions/disclaimer.dat', 'w');
fprintf(file, '%d', nrc);
fclose(file);
%% call function
selectFolder(mfA, mfR);
This function takes two array as input, these array contains all the names of my images sorted. It does some operation and then it calls another function 'selectImage.m' that displays fullscreen the selected image.
function [] = selectFolder(mfA, mfR)
clc
%% Open Arrays from file
% Always
fileID = fopen('2.Functions/Array_Always.dat', 'rt');
Array_A = textscan(fileID,'%s', 'Delimiter', ',');
fclose(fileID);
% Random
fileID2 = fopen('2.Functions/Array_Random.dat', 'rt');
Array_R = textscan(fileID2,'%s', 'Delimiter', ',');
fclose(fileID2);
%% Show Disclaimer
file = fopen('2.Functions/disclaimer.dat', 'r');
dis = fscanf(file, '%d');
fclose(file);
if (dis == 1)
set(gcf,'Toolbar','none','Menubar','none', 'NumberTitle','off');
set(gcf,'units','normalized','outerposition',[0 0 1 1]);
hAx = gca;
set(hAx,'Unit','normalized','Position',[0 0 1 1]);
imshow('1.Dataset/Disclaimer/DIS.jpg');
drawnow;
nrc = 0;
file = fopen('2.Functions/disclaimer.dat', 'w');
fprintf(file, '%d', nrc);
fclose(file);
return;
end
%% select random folder from 'Array_A' aka Always Array
dimA = length(mfA);
if ~isempty(Array_A{1})
rndn = randperm(dimA, 1);
A_check = Array_A;
while isequal(A_check,Array_A)
Array_A = selectImage(mfA{rndn}, Array_A);
if isequal(A_check,Array_A)
rndn = randperm(dimA, 1);
end
end
fileIDA = fopen('2.Functions/Array_Always.dat','w');
formatSpec = '%s,';
nrows = cellfun('length', Array_A);
for row = 1 : nrows
fprintf(fileIDA, formatSpec, Array_A{1}{row});
end
fclose(fileIDA);
return;
end
%% select random folder from 'Array_R' aka Random Array
if ~isempty(Array_R{1})
dimR = length(mfR);
rndnr = randperm(dimR, 1);
R_check = Array_R;
while isequal(R_check,Array_R)
Array_R = selectImage(mfR{rndnr}, Array_R);
if isequal(R_check, Array_R)
rndnr = randperm(dimR, 1);
end
end
fileIDR = fopen('2.Functions/Array_Random.dat','w');
formatSpec = '%s,';
nrows = cellfun('length', Array_R);
for row = 1 : nrows
fprintf(fileIDR, formatSpec, Array_R{1}{row});
end
fclose(fileIDR);
end
end
selectImage:
function [ Array ] = selectImage( myFolder, Array )
%% Check
MyImgs = dir(fullfile(myFolder, '*.jpg'));
dim = length(MyImgs);
n = 0;
for i = 1 : dim
MyImgs(i).name
if ~any(strcmp(Array{1}, MyImgs(i).name))
disp(MyImgs(i).name);disp('not present in ');disp(myFolder);
n = n + 1;
end
end
if (n == dim)
disp('empty folder')
return;
end
rN = randperm(dim, 1);
baseFileName = MyImgs(rN).name;
while ~any(strcmp(Array{1}, baseFileName))
fprintf(1, 'not present %s\n', baseFileName);
rN = randperm(dim, 1);
baseFileName = MyImgs(rN).name;
end
%% Dispay image
dim = cellfun('length', Array);
for i = 1 : dim
if strcmp(baseFileName, Array{1}(i))
Array{1}(i) = [];
break
end
end
fullFileName = fullfile(myFolder, baseFileName);
fprintf(1, 'Now reading %s\n', fullFileName);
imageArray = imread(fullFileName);
set(gcf,'Toolbar','none','Menubar','none', 'NumberTitle','off');
set(gcf,'units','normalized','outerposition',[0 0 1 1]);
hAx = gca;
set(hAx,'Unit','normalized','Position',[0 0 1 1]);
imshow(imageArray); % Display image.
drawnow;
end
Now I have to integrate these functions in my gui. What I want to do is call the 'main.m' script just one time with a button like 'Let's Start' and with that will show the disclaimer.
Then repeat the process calling only the 'Next' button, which calls 'selectFolder.m' and display the images with the procedure described above.
Is it possibile to do it this way? I mean, how can I pass the variable 'mfA' and 'mfR' to selectFolder? Is there a better and simpler way to do it?
The code in the gui is like:
-main:
% --- Executes on button press in Start.
function Start_Callback(hObject, eventdata, handles)
% hObject handle to Start (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
axes(handles.axes1);
figure
main
-selectFolder:
function Next_Callback(hObject, eventdata, handles)
% hObject handle to Next (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
axes(handles.axes1);
figure %show the image in another window
selectFolder(mfA, mfR)
An easy way to share variables among the callback of a GUI is to use the
guidata function.
With respect to your specific variables mfA and mfR you can use guidata
to store them, this way: in the callback in which you generate the variables you want to share with other callback you can insert the following code:
% Get the GUI data
my_guidata=guidata(gcf);
%
% section of your code in which you create the mfA and mfR vars
%
% Store the variables to be shared among the callbacks in the guidata
my_guidata.mfA=mfA;
my_guidata.mfR=mfR;
% Save the updated GUI data
guidata(gcf,my_guidata);
In the callback in which you wnat to retreive the data, you can insert the
following code:
% Get the GUI data
my_guidata=guidata(gcf);
% Retrieve the data from the GUI data structure
mfA=my_guidata.mfA;
mfR=my_guidata.mfR;
In both the examples, the struct my_guidata holds the handles of the GUI and the additional varaibles you have defined.
With respect to the the architecture of the GUI, there are lots of possibilities.
Firt a comment: looking at the two callback you've posted at the bottom of your question, it seems that your GUI has, at least, one axes, nevertheless, you create, in both of them a new figure so it is not clear the role of that axes
Considering now your questions
What I want to do is call the 'main.m' script just one time with a button like 'Let's Start' and with that will show the disclaimer. Then repeat the process calling only the 'Next' button, which calls 'selectFolder.m' and display the images with the procedure described above
call the 'main.m' script just one time with a button like 'Let's Start' and with that will show the disclaimer
You have just to copy the relevant code of your main in the Start pushbutton callback.
Notice that the code which shows the disclaimer is actually in your selectFolder function, so you have to move it in the
Start callback.
Then repeat the process calling only the 'Next' button, which calls 'selectFolder.m' and display the images with the procedure described above
to do this, you have to remove the call to selectFolder from the main and move the body of your in the Next pushbotton callback.
Also you can copy the selectImage in the GUI .m file.
Hope this helps.
Qapla'

Reverse image search implementation

I am currently trying to make a site which will contain several images with patterns and shapes (Lets say few squares and circles of various colors and shape in each picture). And I am aiming to provide the user a way to upload their images of the pattern and do a reverse image search to check whether similar pattern image already exists in my site or not. So is there any way to implement the same, either by custom code or by using any third party api/widgets etc?
Hi Ashish below is a matlab code for a function which generates signature of a particular binary object's surface, which is nearly size dependent, you can use this concept for matching a shape on different scale.
function sig = signature(bw,prec)
boundry = bwboundaries(bw);
xy = boundry{1};
x = xy(:,1);
y = xy(:,2);
len = length(x);
res = (len/prec);
re = rem(res,2);
if re
res = ceil(res);
end
indexes = 1:res:len;
xnew = x(indexes);
ynew = y(indexes);
cx = round(mean(xnew));
cy = round(mean(ynew));
xn = abs(xnew-cx);
yn = abs(ynew-cy);
sig = (xn.^2+yn.^2);
sig = sig/max(sig);
Following is the example of how to use signature function:
clc
clear all
close all
path = 'E:\GoogleDrive\Mathworks\irisDEt\shapes';
im1 = imread([path,'\3.png']);
gray1 = ((im1));
scales = [1,2,3,4];
gray1 = im2bw(gray1);
for i = 1:length(scales)
im = imresize(gray1,scales(i));
sig = signature(im,25);
figure,plot(sig)
fra = getframe();
image = frame2im(fra);
imwrite(image,['E:\GoogleDrive\Mathworks\irisDEt\shapes\',num2str(i),'.png'])
end
following is the test image and its signature for changing in size od images which looks similar in shape.
All above signatures are generated by the code given above.

Using MATLAB to save a montage of many images as one large image file at full resolution

I am trying to save a montage of many (~500, 2MB each) images using MATLAB function imwrite, however I keep getting this error:
Error using imwrite>validateSizes (line 632)
Images must contain fewer than 2^32 - 1 bytes of data.
Error in imwrite (line 463)
validateSizes(data);
here is the code I am working with:
close all
clear all
clc
tic
file = 'ImageRegistrations.txt';
info = importdata(file);
ImageNames = info.textdata(:,1);
xoffset = info.data(:,1);
yoffset = info.data(:,2);
for i = 1:length(ImageNames);
ImageNames{i,1} = imread(ImageNames{i,1});
ImageNames{i,1} = flipud(ImageNames{i,1});
end
ImageNames = flipud(ImageNames);
for i=1:length(ImageNames)
diffx(i) = xoffset(length(ImageNames),1) - xoffset(i,1);
end
diffx = (diffx)';
diffx = flipud(diffx);
for j=1:length(ImageNames)
diffy(j) = yoffset(length(ImageNames),1) - yoffset(j,1);
end
diffy = (diffy)';
diffy = flipud(diffy);
matrix = zeros(max(diffy)+abs(min(diffy))+(2*1004),max(diffx)+abs(min(diffx))+(2*1002));
%matrix(1:size(ImageNames{1,1},1),1:size(ImageNames{1,1},2)) = ImageNames{1,1};
for q=1:length(ImageNames)
matrix((diffy(q)+abs(min(diffy))+1):(diffy(q)+abs(min(diffy))+size(ImageNames{q,1},1)),(diffx(q)+abs(min(diffx))+1):((diffx(q)+abs(min(diffx))+size(ImageNames{q,1},2)))) = ImageNames{q,1};
end
graymatrix = mat2gray(matrix);
graymatrix = flipud(graymatrix);
figure(2)
imshow(graymatrix)
imwrite(graymatrix, 'montage.tif')
toc
I use imwrite because it perserves the final montage in a full resolution file, whereas if I simply click save on the figure file it saves it as a low resolution file.
thanks!
Error does what it says on the tin, really. There is some sort of inbuilt limitation to input variable size in imwrite, and you're going over it.
Note that most images are stored as uint8 but I would guess that you end up with doubles as a result of your processing. That increases the memory usage.
It may be, therefore, that casting to another type would help. Try using im2uint8 (presuming your variable graymatrix is double, scaled between 0 and 1), before calling imwrite.

Resources