Tensorflow: Best way of creating training data with float as label - image

I want to use TensorFlow version 2.4.0-dev20201009 with Python 3.7.
My dataset is in the subfolder "data\Images". The label of each image is a float between 1 and 5 and can be read from allTestData.csv in the subfolder "data".
What is the best way to read the data with a validation split of 30 percent? So far I wanted to use tf.keras.preprocessing.image_dataset_from_directory, but this doesn't help me incorporate the labels correctly, as all my images are in one folder and the labels are not one-hot encoded vectors. How would you do this in TensorFlow?
For the sake of completeness, I planned to use
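import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2

def create_model():
    model = keras.Sequential()
    model.add(MobileNetV2(input_shape=(224, 224, 3), include_top=False))
    model.trainable = True
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(1024, activation="relu"))
    # Single linear output for regression; softmax over one unit always returns 1.
    model.add(layers.Dense(1))
    model.compile(optimizer='adam',
                  loss=tf.losses.mean_squared_error,
                  # Accuracy is not meaningful for float labels; track MAE instead.
                  metrics=[tf.metrics.MeanAbsoluteError()])
    model.summary()
    return model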
for training the model. My question is only about how to read the training data.

I will answer my own question.
The best way I found was to write a function by hand that reads the labels and images.
Assume that the images are in 'data\Images' and the labels are in a .txt file at 'data\train_test_files\All_labels.txt'. Then the following two functions will do the job:
import os

import cv2

def loadImages(IMG_SIZE):
    path = os.path.join(os.getcwd(), 'data', 'Images')
    training_data = []
    labelMap = getLabelMap()
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img))  # source images are 350x350 pixels
        if img_array is None or img not in labelMap:
            continue  # skip unreadable files and images without a label
        img_array = img_array.astype('float32')
        # Scale pixel values to the range 0..1, then resize
        out_array = cv2.normalize(img_array, None, 0, 1, cv2.NORM_MINMAX)
        out_array = cv2.resize(out_array, (IMG_SIZE, IMG_SIZE))
        training_data.append([out_array, float(labelMap[img])])
    return training_data

def getLabelMap():
    labelMap = {}
    path = os.path.join(os.getcwd(), 'data', 'train_test_files', 'All_labels.txt')
    with open(path, "r") as f:
        for line in f:
            parts = line.split()  # lines in txt file are of the form 'image_name.jpg 3.2'
            if len(parts) >= 2:
                labelMap[parts[0]] = parts[1]  # 3.2 is the label
    return labelMap
#call of the function:
training_set = loadImages(244)  # I want to have my images resized to 244x244 (the model's input_shape must match this size)
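The question also asks for a 30 percent validation split, which the loader above does not cover. Below is a minimal sketch using only the standard library. It assumes a hypothetical layout for the CSV (no header row, file name in the first column, float label in the second), since the question does not say how allTestData.csv is organised; the function names are illustrative as well.

```python
import csv
import random


def read_labels(csv_path):
    """Return a list of (filename, label) pairs from a two-column CSV.

    Assumed layout (not stated in the question): no header row,
    first column is the image file name, second column the float label.
    """
    pairs = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            pairs.append((row[0], float(row[1])))
    return pairs


def split_train_val(pairs, val_fraction=0.3, seed=0):
    """Shuffle reproducibly, then split off val_fraction of the pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Splitting the (file name, label) pairs before loading any pixels keeps the two sets disjoint, and each list can then be passed to whichever loader is used for the images.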

Related

Raster merge using doParallel creates temp file for each merge step

I am working with the raster and glcm packages to compute Haralick texture features on satellite imagery. I have successfully run the glcm() function using a single core but am working on running it in parallel. Here is the code I'm using:
# tiles is a list of raster extents, r is a raster
registerDoParallel(7)
out_raster = foreach(i=1:length(tiles),.combine = merge,.packages=c("raster","glcm")) %dopar%
glcm(crop(r,tiles[[i]]), n_grey=16, window=c(17,17), shift = c(1,1),
min_x = rmin, max_x = rmax)
When I examine the temp files that are created, it appears each step of the merge creates a temp file, which takes a lot of hard drive space. Here is the overall image (2GB): [image: Full raster] and here are two of the temp files: [images: Merge Step 1, Merge Step 2]
Because the glcm function output for each tile is 3 GB, creating a temp file for each stepwise merge operation creates ~160GB of temp raster files. Is there a more space efficient way to run this in parallel?
I managed to save hard drive space by using GDAL and building VRTs. Below is the code I wrote, running on the example data from the glcm package. The steps were: 1) create VRT files of the tiles; 2) run the glcm function in parallel on each VRT tile (see the glcm_parallel function); 3) merge the tiles into a VRT and write the output raster using gdalwarp. The VRT files are very small, and the only temp files are those created by the glcm function. This should help a lot with large rasters.
#Load Packages
library(raster)
library(sf)
library(rgdal)
library(glcm)
library(doParallel)
library(gdalUtils)
#Source helper functions
source("./tilebuild_buff.R")
source("./glcm_parallel.R")
#Read raster - example data from glcm package (saved to disk)
rasterfile = "./L5TSR_1986.tif"
r = raster("L5TSR_1986.tif")
#Create tiles directory if it doesn't exist and clear files if it exists
if(!file.exists("./tiles")){dir.create("./tiles")}
file.remove(list.files("./tiles/",full.names=T))
#Calculate tiles for parallel processing - returns x and y offsets, and widths
#to use with gdal_translate
jobs_buff = tilebuild_buff(r,nx=5,ny=2,buffer=c(5,5))
#Create vrt files for buffered tiles
for (i in 1:length(jobs_buff[,1])){
fin = rasterfile
fout = paste0("./tiles/t_",i,'.vrt')
ex = as.numeric(jobs_buff[i,])
gdal_utils('translate',fin,fout,options = c('-srcwin',ex,'-of','vrt'))
}
#Read in vrt files of raster tiles and set the nodata value
input.rasters = lapply(paste0("./tiles/", list.files("./tiles/",pattern="\\.vrt$")), raster)
for(i in 1:length(input.rasters)){ NAvalue(input.rasters[[i]])= -3.4E38 }
#Create a directory for temporary raster grids and clear files
tempdir = "./rastertemp/"
if(!file.exists(tempdir)){dir.create(tempdir)}
file.remove(list.files(tempdir,full.names=T))
registerDoParallel(6)
#Determine min and max values over original raster
rmin = cellStats(r,'min')
rmax = cellStats(r,'max')
#Run glcm function in parallel
glcm_split = foreach(i=1:length(jobs_buff[,1]),.packages=c("raster","glcm")) %dopar%
glcm_parallel(inlist = input.rasters,temp=tempdir,window=c(3,3),n_grey=16,
min_x=rmin,max_x=rmax)
#Get list of temp raster files created by glcm function
temps = paste0(tempdir,list.files(tempdir,pattern="\\.grd$"))
#trim off buffer (from tilebuild_buff function) and create mosaic raster
mosaic_rasters(temps,dst_dataset = "./mosaic.vrt", trim_margins = 5, srcnodata=-3.4E38,overwrite=T)
#write output tif
vrt_mosaic = "./mosaic.vrt"
outtif = "./final_merged.tif"
gdalwarp(vrt_mosaic,outtif,overwrite=T,verbose=T)
The two helper functions are here:
glcm_parallel <- function(inlist, temp, n_grey=16, window=c(11,11), shift=c(1,1), min_x=NULL, max_x=NULL){
require(glcm)
#todisk option required if output rasters are small enough to fit in memory
rasterOptions(tmpdir = temp, todisk=T, maxmemory = 1E8)
## run glcm over tile
r_glcm=glcm(inlist[[i]], n_grey = n_grey, window = window, shift=shift, min_x=min_x, max_x=max_x, na_opt = 'any')
}
and here:
tilebuild_buff <- function(r, nx=5, ny=2, buffer=c(0,0)){
round_xw = floor(ncol(r)/nx)
xsize = c(rep(round_xw,nx-1), round_xw + ncol(r)%%nx)
xoff = c(0,cumsum(rep(round_xw,nx-1)))
round_yh = floor(nrow(r)/ny)
ysize = c(rep(round_yh,ny-1), round_yh + nrow(r)%%ny)
yoff = c(0,cumsum(rep(round_yh,ny-1)))
pix_widths = expand.grid(xsize = xsize ,ysize = ysize)
offsets = expand.grid(xoff = xoff,yoff = yoff)
srcwins = cbind(offsets,pix_widths)
srcwins_buff = srcwins
#Add buffer
srcwins_buff$xoff = srcwins$xoff - buffer[1]
srcwins_buff$yoff = srcwins$yoff - buffer[2]
srcwins_buff$xsize = srcwins$xsize + 2*buffer[1]
srcwins_buff$ysize = srcwins$ysize + 2*buffer[2]
return(srcwins_buff)
}
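The window arithmetic in tilebuild_buff can be easier to follow outside R. The following Python sketch is a port of the same logic under the same assumptions (the last tile in each direction absorbs the remainder, and the buffer is added on every side). The name tile_windows is illustrative and is not part of any package.

```python
from itertools import product


def tile_windows(ncol, nrow, nx=5, ny=2, buffer=(0, 0)):
    """Return (xoff, yoff, xsize, ysize) windows covering an ncol x nrow grid.

    Each window is grown by buffer[0] columns and buffer[1] rows on every
    side, matching the order of values expected by gdal_translate -srcwin.
    """
    def split(n, parts):
        base = n // parts
        # The last part takes whatever is left over after integer division.
        sizes = [base] * (parts - 1) + [base + n % parts]
        offsets = [base * k for k in range(parts)]
        return list(zip(offsets, sizes))

    bx, by = buffer
    windows = []
    # x varies fastest, the same ordering that expand.grid produces in R.
    for (yoff, ysize), (xoff, xsize) in product(split(nrow, ny), split(ncol, nx)):
        windows.append((xoff - bx, yoff - by, xsize + 2 * bx, ysize + 2 * by))
    return windows
```

Like the R function, this sketch does not clamp the buffered windows to the raster bounds, so tiles on the edge get negative offsets or extend past the last row or column.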

Working on more than one image in Matlab

I have recently started learning Matlab and am trying to learn about classification. I want to classify my 23 images. In my function file I am using
I = imread('img.jpg');
a = rgb2gray(I);
bw = double(imread('mask_img.jpg'))/255;
b = rgb2gray(bw);
bwi = 1-b;
and I work on the original image and its ground truth. I can handle one image, and I have a loop in my main file:
for i=1:original_images_db.Count
original = original_images_db.ImageLocation(i);
groundtruth = original_file;
[x,y] = calculateFeatures(original, groundtruth, parameters);
dataset.HorizonFeats{i} = features;
end
I linked original_images_db to the files with imageSet. When I run my main file, the function file naturally reads img.jpg every time, although the main file can see the other images. My question is: how can I make a loop in my function file so that it processes all the other images?
Thank you
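Create a cell array that contains the file paths of all images:
fname = {'1.jpg','2.jpg','3.jpg'};
Now you can iterate over all the images:
for i = 1:length(fname)
    im = imread(fname{i});
end
Alternatively, use the dir function to list the files in the image directory:
fnames = dir('image_directory_path');
dir returns a struct array; the name of each file is in fnames(k).name.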
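For comparison, here is the same pattern (list the image files in a folder, then run one function per file) as a minimal Python sketch. The names list_images and process_all are illustrative, and process_one is a hypothetical stand-in for a per-image function such as calculateFeatures.

```python
from pathlib import Path


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the image files in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def process_all(folder, process_one):
    """Call process_one(path) for every image and collect results by file name."""
    return {p.name: process_one(p) for p in list_images(folder)}
```

Filtering by extension skips non-image files in the folder, so only real images are passed to the per-image function.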

Reverse image search implementation

I am currently trying to make a site which will contain several images with patterns and shapes (Lets say few squares and circles of various colors and shape in each picture). And I am aiming to provide the user a way to upload their images of the pattern and do a reverse image search to check whether similar pattern image already exists in my site or not. So is there any way to implement the same, either by custom code or by using any third party api/widgets etc?
Hi Ashish, below is MATLAB code for a function that generates a signature of a binary object's boundary. The signature is nearly size-independent, so you can use this concept to match a shape at different scales.
function sig = signature(bw,prec)
boundry = bwboundaries(bw);
xy = boundry{1};
x = xy(:,1);
y = xy(:,2);
len = length(x);
res = (len/prec);
re = rem(res,2);
if re
res = ceil(res);
end
indexes = 1:res:len;
xnew = x(indexes);
ynew = y(indexes);
cx = round(mean(xnew));
cy = round(mean(ynew));
xn = abs(xnew-cx);
yn = abs(ynew-cy);
sig = (xn.^2+yn.^2);
sig = sig/max(sig);
The following example shows how to use the signature function:
clc
clear all
close all
path = 'E:\GoogleDrive\Mathworks\irisDEt\shapes';
im1 = imread([path,'\3.png']);
gray1 = ((im1));
scales = [1,2,3,4];
gray1 = im2bw(gray1);
for i = 1:length(scales)
im = imresize(gray1,scales(i));
sig = signature(im,25);
figure,plot(sig)
fra = getframe();
image = frame2im(fra);
imwrite(image,['E:\GoogleDrive\Mathworks\irisDEt\shapes\',num2str(i),'.png'])
end
Following are the test image and its signatures at different image sizes; the signatures look similar in shape.
All of the signatures above were generated by the code given above.
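The idea behind the signature (sample the boundary, measure the squared distance of each sample from the centroid, normalise by the maximum) can be sketched in Python with NumPy. This is an illustration and not a line-by-line port: it assumes the ordered boundary points are already available as an (N, 2) array, it samples a fixed number of points instead of using a step size, and it does not round the centroid.

```python
import numpy as np


def signature(boundary_xy, n_samples=25):
    """Normalised squared centroid distances of evenly sampled boundary points."""
    pts = np.asarray(boundary_xy, dtype=float)
    # Pick n_samples indices spread evenly along the boundary.
    idx = np.linspace(0, len(pts), n_samples, endpoint=False).astype(int)
    sampled = pts[idx]
    centre = sampled.mean(axis=0)
    sq_dist = ((sampled - centre) ** 2).sum(axis=1)
    # Dividing by the maximum removes the dependence on overall size.
    return sq_dist / sq_dist.max()


# The same ellipse at two sizes gives the same signature.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
ellipse = np.column_stack([3 * np.cos(t), np.sin(t)])
print(np.allclose(signature(ellipse), signature(4 * ellipse)))  # True
```

Two shapes could then be compared by, for example, the mean absolute difference between their signatures. The signature depends on where the boundary trace starts, so a rotated shape would need its signature shifted into alignment first.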

Creating Image stacks and writing GDF file

I am attempting to write a function that stacks a series of images into an image stack and converts it into a GDF file. I don't really know much about GDF files, so please help me out.
X=[];
for i=1:10
if numel(num2str(i))==1
X{i}=imread(strcat('0000',num2str(i),'.tif'));
elseif numel(num2str(i))==2
X{i}=imread(strcat('000',num2str(i),'.tif'));
end
end
myImage=cat(3,X{1:10});
s=write_gdf('stack.gdf',myImage);
The code above reads my images, named 00001 to 00010, all in grayscale. Everything is fine except for the last line
s=write_gdf('stack.gdf',myImage);
as when I run it, I receive an error:
Data type uint8 not supported
Any help on what this means? Should I convert it to some other colour format?
Thank you in advance!
I would write the code more like this (I do not have the write_gdf function, so I cannot properly test the code):
NumberOfFiles = 10;
X = cell(1, NumberOfFiles); % preallocate cell array
for n = 1:NumberOfFiles % avoid "i" as a variable name because it is the imaginary unit in MATLAB
    FileName = sprintf('%05d.tif', n);
    img = imread(FileName); % load image
    X{n} = double(img); % and convert to desired format
end
myImage = cat(3, X{1:NumberOfFiles});
s = write_gdf('stack.gdf', myImage);
Keep in mind that
double(img); % and convert to desired format
will not change the data range. Even in double format, your image will have a data range of 0 to 255 if it was stored as uint8 on disk. If you need to normalize your data to the 0..1 range, you should do
X{n} = double(img)/255;
or, in a more universal form,
X{n} = double(img) / double(intmax(class(img)));
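The two ideas in this answer, zero-padded file names and scaling by the maximum of the integer type, look like this in a minimal Python/NumPy sketch. The function names are illustrative, and the sketch does not read or write GDF files.

```python
import numpy as np


def stack_file_names(count, width=5, ext=".tif"):
    """Return names such as 00001.tif ... 00010.tif for files 1..count."""
    return [f"{n:0{width}d}{ext}" for n in range(1, count + 1)]


def to_unit_range(img):
    """Convert an integer image to float64 scaled to 0..1 by the dtype's maximum."""
    img = np.asarray(img)
    if not np.issubdtype(img.dtype, np.integer):
        raise TypeError("expected an image with an integer dtype")
    # Convert first, then divide, so the division happens in floating point.
    return img.astype(np.float64) / np.iinfo(img.dtype).max
```

Because the divisor comes from the image's own dtype, the same function works for 8-bit and 16-bit images without hard-coding 255.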

Figure window showing up matlab

I have written this code to help me compare different image histograms; however, when I run it, a figure window pops up. I can't see anywhere in the code where I have written imshow and am really confused. Can anyone see why? Thanks.
%ensure we start with an empty workspace
clear
myPath= 'C:\coursework\'; %#'
number_of_desired_results = 5; %top n results to return
images_path = strcat(myPath, 'fruitnveg');
images_file_names = dir(fullfile(images_path, '*.png'));
images = cell(length(images_file_names), 3);
number_of_images = length(images);
%textures contruction
%loop through all textures and store them
disp('Starting construction of search domain...');
for i = 1:length(images)
image = strcat(images_path, '\', images_file_names(i).name); %#'
%store image object of image
images{i, 1} = imread(image);
%store histogram of image
images{i, 2} = imhist(rgb2ind(images{i, 1}, colormap(colorcube(256))));
%store name of image
images{i, 3} = images_file_names(i).name;
disp(strcat({'Loaded image '}, num2str(i)));
end
disp('Construction of search domain done');
%load the three example images
RGB1 = imread('C:\coursework\examples\salmon.jpg');
X1 = rgb2ind(RGB1,colormap(colorcube(256)));
example1 = imhist(X1);
RGB2 = imread('C:\coursework\examples\eggs.jpg');
X2 = rgb2ind(RGB2,colormap(colorcube(256)));
example2 = imhist(X2);
RGB3 = imread('C:\coursework\examples\steak.jpg');
X3 = rgb2ind(RGB3,colormap(colorcube(256)));
example3 = imhist(X3);
disp('three examples loaded');
disp('compare examples to loaded fruit images');
results = cell(length(images), 2);
results2 = cell(length(images), 2);
results3 = cell(length(images), 2);
for i = 1:length(images)
results{i,1} = images{i,3};
results{i,2} = hi(example1,images{i, 2});
end
results = flipdim(sortrows(results,2),1);
for i = 1:length(images)
results2{i,1} = images{i,3};
results2{i,2} = hi(example2,images{i, 2});
end
results2 = flipdim(sortrows(results2,2),1);
for i = 1:length(images)
results3{i,1} = images{i,3};
results3{i,2} = hi(example3,images{i, 2});
end
results3 = flipdim(sortrows(results3,2),1);
The colormap function sets the current figure's colormap; if there is no figure, one is created.
The second parameter of imhist should be the number of bins used in the histogram, not the colormap.
Run your code in the Matlab debugger, step through it line by line, and see when the figure window pops up. That'll tell you what's creating it.
Etienne's answer is right for why you're getting a figure, but I'd just like to add that colormap is unnecessary in this code:
images{i, 2} = imhist(rgb2ind(images{i, 1}, colormap(colorcube(256))));
All you need is:
images{i, 2} = imhist(rgb2ind(images{i, 1}, colorcube(256)));
The second input of rgb2ind should be a colormap, yes. But the output of colorcube is a colormap already. Unless you've got an existing figure and you either want to set the colormap of it or retrieve the colormap it is currently using, the actual function colormap is not necessary.
Other than opening an unnecessary figure, the output of your existing code won't be wrong, as I think in this situation colormap will just pass as an output argument the colormap it was given as an input argument. For example, if you want to set the current figure colormap to one of the inbuilts and return the actual colormap:
cmap = colormap('bone');

Resources