Resizing and saving images on a new directory - image

I am writing a simple function that reads a sequence of images, re-sizes them and then saves each set of re-sized images to a new folder. Here is my code:
function [ image ] = FrameResize(Folder, ImgType)
Frames = dir([Folder '/' ImgType]);
NumFrames = size(Frames,1);
new_size = 2;
for i = 1 : NumFrames,
image = double(imread([Folder '/' Frames(i).name]));
for j = 2 : 10,
new_size = power(new_size, j);
% Creating a new folder called 'Low-Resolution' on the
% previous directory
mkdir ('.. Low-Resolution');
image = imresize(image, [new_size new_size]);
imwrite(image, 'Low-Resolution');
end
end
end
I have two main questions:
How can I save those images with specific names, like im_1_64, im_2_64, etc. according to the iteration and to the resolution?
How can I make the name of the folder being created change with each iteration so that I save images with the same resolution on the same folder?

Since you know the resolution will be: new_size x new_size, you can use this in the imwrite function:
imwrite(image, ['im_' num2str(i) '_' num2str(new_size) '.' ImgType]);
Assuming that ImgType holds the extension.
To set up the folders, you can do something like this:
mkdir(num2str(new_size))
cd(num2str(new_size))
imwrite(image, ['im_' num2str(i) '_' num2str(new_size) '.' ImgType]);
cd ..

You have an answer you are satisfied with, but I strongly suggest doing two things differently:
Use fullfile to create/concatenate file and path names.
For example, instead of:
imread([Folder '/' Frames(i).name])
do
imread(fullfile(Folder,Frames(i).name))
It's good for relative paths too:
fullfile('..','Low-Resolution')
ans =
..\Low-Resolution
Use sprintf to create strings containing numerical data from variables. Instead of:
['im_' num2str(i) '_' num2str(new_size) '.' ImgType]
do
sprintf('im_%d_%d.%s', i, new_size, ImgType)
You can even specify how many digits you want per integer. Compare:
K>> sprintf('im_%d_%d.%s', i, new_size, ImgType)
ans =
im_2_64.png
K>> sprintf('im_%02d_%d.%s', i, new_size, ImgType)
ans =
im_02_64.png
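For readers working outside MATLAB, the naming scheme discussed in the two answers above can be sketched in Python. This is only an illustration: the helper name `output_path`, its `pad` parameter, and the choice of one folder per resolution are assumptions, and only the `im_<index>_<size>` pattern comes from the question.

```python
import os

def output_path(root, index, size, ext, pad=1):
    """Build <root>/<size>/im_<index>_<size>.<ext> (hypothetical helper)."""
    folder = os.path.join(root, str(size))           # one folder per resolution
    name = f"im_{index:0{pad}d}_{size}.{ext}"        # zero-pad the index to `pad` digits
    return os.path.join(folder, name)

# The path separator in the printed result depends on the operating system.
print(output_path("Low-Resolution", 2, 64, "png"))
print(output_path("Low-Resolution", 2, 64, "png", pad=2))
```

As with fullfile and sprintf, the joining of path parts and the number formatting are kept in one place, so changing the padding or the folder layout touches a single line.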


Extract Exif data from Tif

I have this code and would like to simplify it so that it no longer needs both the TIF and the CR2 file. Basically, I would like to read the exposure time, f-number, ISO and date from the TIF as the variables t, f, S and date, so that I don't have to use the CR2 file. Here is my code so far:
clear all % clear workspace
RGB = imread('IMG_0069.tif');
info = imfinfo('IMG_0069.CR2'); % get Metadata
C = 1; % Constant to adjust image
x = info.DigitalCamera; % get EXIF
t = getfield(x, 'ExposureTime');% save ExposureTime
f = getfield(x, 'FNumber'); % save FNumber
S = getfield(x, 'ISOSpeedRatings');% save ISOSpeedRatings
date = getfield(x,'DateTimeOriginal'); % save DateTimeOriginal
I = rgb2gray(RGB);
You can easily concatenate strings to form file names.
fname='IMG_XXX';
imread([fname, '.tif']);
imfinfo([fname,'.CR2'])
imfinfo should give you any information encoded in the metadata, but from the comments I can see that your files do not contain the information you want.

Julia: How can I write and store files in a loop?

I have a massive dataset that I divided into k mini-datasets, where k = 100. Now I want to store these mini-datasets in different files.
To store the full dataset, I used the following instructions:
using JLD, HDF5
X=rand(100000)
file = jldopen("path to my file/mydata.jld", "w") # the extension of file is jld so you should add packages JLD and HDF5, Pkg.add("JLD"), Pkg.add("HDF5"),
write(file, "X", X) # alternatively, say "@write file X"
close(file)
Now I divided my dataset into k sub-datasets, where k = 100:
function get_mini_batch(X)
mini_batches = round(Int, ceil(X / 100))
for i=1:mini_batches
mini_batch = X[((i-1)*100 + 1):min(i*100, end)]
file= jldopen("/path to my file/mydata.jld", "w")
write(file, "mini_batch", mini_batch) # alternatively, say "@write file mini_batch"
close(file)
end
end
However, this function writes every sub-dataset to the same file, which is overwritten at each iteration.
file= jldopen("/path to my file/mydata1.jld", "w") # at each iteration I want a separate file: mydata1, mydata2, ..., mydata100
file= jldopen("/path to my file/mydata2.jld", "w")
file= jldopen("/path to my file/mydata3.jld", "w")
file= jldopen("/path to my file/mydata4.jld", "w")
.
.
.
file= jldopen("/path to my file/mydata100.jld", "w")
Alternatively, I tried this procedure:
function get_mini_batch(X)
mini_batches = round(Int, ceil(X / 100))
for i=1:mini_batches
mini_batch[i] = X[((i-1)*100 + 1):min(i*100, end)]
file[i]= jldopen("/path to my file/mydata.jld", "w")
write(file, "mini_batch", mini_batch) # alternatively, say "@write file mini_batch"
close(file)
end
end
but I don't know how to use the loop variable i = 1, ..., 100 inside this line of code: file[i]= jldopen("/path to my file/mydata(i).jld", "w")
You are looking for string formatting.
To create the filenames, you can use @sprintf(). Then you can use these strings to write your objects to disk.
julia> using Printf # Needed in Julia 1.0.0
julia> @sprintf("myfilename%02.d.jld", 5)
"myfilename05.jld"
Example in a loop:
julia> for i in 1:3
println(@sprintf("myfilename%03.d.jl", i))
end
myfilename001.jl
myfilename002.jl
myfilename003.jl
I used %03.d here to show how you can add leading zeros to your file names. This will help later on when it comes to sorting.
I agree with niczky12 that you are looking for string formatting. But I would personally write it this alternative way:
"/path to my file/mydata$i.jld"
instead of using sprintf.
Example:
julia> i = 4
4
julia> "/path/mydata$i.jld"
"/path/mydata4.jld"
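Putting the two answers together, the overall loop looks roughly like the sketch below. It is written in Python purely to illustrate the logic (slice the data, derive a numbered file name per slice), not as Julia code; the function name `batch_filenames` and the 250-element sample are assumptions, and nothing is written to disk.

```python
def batch_filenames(data, batch_size=100, pattern="mydata{}.jld"):
    """Yield (filename, batch) pairs, one per consecutive slice of data."""
    for start in range(0, len(data), batch_size):
        index = start // batch_size + 1              # 1-based: mydata1, mydata2, ...
        yield pattern.format(index), data[start:start + batch_size]

# Hypothetical data: 250 items give two full batches and one partial batch.
pairs = list(batch_filenames(list(range(250))))
for name, batch in pairs:
    print(name, len(batch))
```

The key point is that the file name is computed inside the loop from the loop index, so each batch ends up in its own file instead of overwriting the previous one.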

How to batch rename images in several folders?

I have 600 images (e.g. images_0, images_1, images_2, ..., images_599) which are saved in 12 folders (e.g. dataset_1, dataset_2, dataset_3, ..., dataset_12).
I am currently using this code to rename images:
mainDirectory = 'C:\Users\Desktop\data';
subDirectory = dir([mainDirectory '/dataset_*']);
for m = 1 : length(subDirectory)
subFolder = dir(fullfile(mainDirectory, subDirectory(m).name, '*.png'));
fileNames = {subFolder.name};
for iFile = 1 : numel( subFolder )
newName = fullfile(mainDirectory, subDirectory(m).name, sprintf('%00d.png', iFile));
movefile(fullfile(mainDirectory, subDirectory(m).name, fileNames{iFile}), newName);
end
end
This code works well but I want to change the format of newName to the following: number-of-dataset_name-of-image (e.g. 1_images_0, 1_images_1, 2_images_0, 2_images_1, etc.). How can I make this change to newName?
You can first split your folder name to get the dataset number (1 to 12):
str = strsplit('dataset_12', '_'); % split along '_'
The folder number will be in str{2}.
Then concatenate this piece of information with the image name:
new_name = [str{2} '_' original_image_name]
where original_image_name is the original image name. Alternatively, use sprintf, as you already did.
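The split-and-prefix step described above can be sketched as follows. This is an illustrative Python version rather than MATLAB; the helper name `new_image_name` is an assumption, and it only builds the new name without moving any file.

```python
def new_image_name(folder_name, image_name):
    """Prefix image_name with the number after the last '_' in folder_name."""
    dataset_number = folder_name.rsplit("_", 1)[1]   # 'dataset_12' -> '12'
    return f"{dataset_number}_{image_name}"

print(new_image_name("dataset_1", "images_0.png"))
print(new_image_name("dataset_12", "images_599.png"))
```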

Why is MATLAB slow in a for loop with a large number of iterations but fast with a small number of iterations?

I am running a function to extract some information from 100,000+ patient X-ray DICOM files. The files are stored within a VeraCrypt encrypted container for security purposes.
When I run the function on a small sample of files it performs really quickly; however, when I run it on the entire dataset it is very slow in comparison, going from several files per second to approximately 1 file per second.
I was wondering why this is happening. I have tried storing the data on an SSD and on a normal hard drive and get the same sort of slowdown when using a larger dataset compared to a small one.
I have added the code below for reference but haven't commented it fully yet. This is for my thesis, so I will do that once the extraction is finished.
Thanks for any help.
function [ DB, corrupted_files ] = extract_from_dcm( folder_name )
%EXTRACT_FROM_DCM Summary of this function goes here
% Detailed explanation goes here
if nargin == 0
folder_name = 'I:\Find and Treat\MXU Old Backup\2005';
end
Database_Check = strcat(folder_name, '\DataBase.mat');
if exist(Database_Check, 'file')
load(Database_Check);
entry_start = length(DB) + 1;
else
entry_start = 1;
[ found_dicoms ] = recursive_search( folder_name );
end
mat_file_location = strcat(folder_name, '\DataBase.mat');
excel_DB_file = strcat(folder_name, '\DataBase.xlsx');
excel_Corrupted_file = strcat(folder_name, '\Corrupted_Files.xlsx');
% the recursive search creates a struct with the path for each
% dcm file found. the list is then recursivly used to locate
% the image and extract the relevant information from it.
fprintf('---------------------------------------------\n');
fprintf('Start Patient Data Extraction\n');
tic
h = waitbar(0,'','Name','Patient Data Extraction');
entry_end = length(found_dicoms);
if entry_end == 0
% set(handles.info_box, 'String', 'No Dicom Files Found in this Folder or its Subfolders');
else
% set(handles.info_box, 'String', 'Congratulations Dicom Files have been found Look Through the Data Base using the Buttons Below....Press Save Button to save the Database. (Database Save format is EXCEL SpreadSheet and MAT file');
for kk = entry_start : entry_end
progress = kk/entry_end;
progress_percent = round(progress * 100);
waitbar(progress,h, sprintf('%d%% %d/%d of images processed', progress_percent, kk, entry_end));
img_full_path = found_dicoms(kk).name;
% search_path = folder_name;
% img_full_path = strrep(img_full_path, search_path, '');
try %# Attempt to perform some computation
dicom_info = dicominfo(img_full_path); %# The operation you are trying to perform goes here
try %# Attempt to perform some computation
dicom_read = dicomread(dicom_info); %# The operation you are trying to perform goes here
old = dicominfo(img_full_path);
DB(kk).StudyDate = old.StudyDate;
DB(kk).StudyTime = old.StudyTime;
if isfield(old.PatientName, 'FamilyName')
DB(kk).Forename = old.PatientName.FamilyName;
else
DB(kk).Forename = 'NA';
end
if isfield(old.PatientName, 'GivenName')
DB(kk).LastName = old.PatientName.GivenName;
else
DB(kk).LastName = 'NA';
end
if isfield(old, 'PatientSex')
DB(kk).PatientSex = old.PatientSex;
else
DB(kk).PatientSex = 'NA';
end
if isempty(old.PatientBirthDate)
DB(kk).PatientBirthDate = '00000000';
else
DB(kk).PatientBirthDate = old.PatientBirthDate;
end
if strcmp(old.Manufacturer, 'Philips Medical Systems')
DB(kk).Van = '1';
else
DB(kk).Van = '0';% section to represent organising by different vans
end
DB(kk).img_Path = img_full_path;
save(mat_file_location,'DB','found_dicoms');
catch exception %# Catch the exception
fprintf('read - file %d corrupt.\n',kk);
continue %# Pass control to the next loop iteration
end
catch exception %# Catch the exception
fprintf('info - file %d corrupt.\n',kk);
continue %# Pass control to the next loop iteration
end
end
end
[ corrupted_files, DB ] = corruption_check( DB, found_dicoms, folder_name );
toc
fprintf('End Patient Data Extraction\n');
fprintf('---------------------------------------------\n');
fprintf('---------------------------------------------\n');
fprintf('Start Saving Extracted Data \n');
tic
save(mat_file_location,'DB','corrupted_files','found_dicoms');
if isempty(DB)
msg = sprintf('No Dicom Files Found');
msgbox(strcat(msg));
else
DB_table = struct2table(DB);
writetable(DB_table, excel_DB_file);
end
close(h);
toc
fprintf('End Saving Extracted Data \n');
fprintf('---------------------------------------------\n');
end
OK, thanks for all the help.
My problem was partly the saving at the end of each iteration, but the biggest problem was the line where I run the dicomread function. I changed the saving to occur once for every 20 images processed.
I also removed the preallocation suggested in the comments to see what difference it made, without the dicomread call and the saving as well. It was considerably slower than with the preallocation.
I just need to find a solution for dicomread (which I was using as a way to check whether the file was corrupt).

Detect duplicate videos from YouTube

For my M.Tech project, I want to know whether there is any algorithm to detect duplicate videos on YouTube.
For example (here are links of two videos):
random user upload
upload by official channel
Of these, the second is the official video, and T-Series holds its copyright.
Is YouTube officially doing something to remove duplicate videos?
It is not only videos: duplicate YouTube channels exist as well.
Sometimes the original video has fewer views than the pirated version.
So, while searching, I found this
(see page number [49] of pdf)
What I learnt from the given link:
A classifier is used to distinguish original videos from copyright-infringing ones.
Given a query, the top k search results are retrieved first. Three parameters are then used to classify the videos:
Number of subscribers
user profile
username popularity
On the basis of these parameters, the original video is identified as described in the link.
EDIT 1:
There are basically two different objectives
To identify original video with the above method
To eliminate the duplicate videos
Obviously, identifying the original video is easier than finding all the duplicate videos.
So I preferred to find the original video first.
The approach I can think of so far
to improve the accuracy:
We can first find the original videos with the above method.
Then use the most widely publicized frames (possibly several) of that video to search on Google Images. This retrieves a list of candidate duplicate videos from the image search results.
After getting these duplicate videos, we can check them again frame by frame until we are satisfied that the retrieved videos are an "exact" or "almost exact" duplicate copy of the original video.
Will this approach work?
If not, is there a better algorithm that improves upon the given method?
Please let me know in the comments section if I have not explained my approach clearly.
I will soon add some more details.
I've recently hacked together a small tool for that purpose. It's still work in progress but usually pretty accurate. The idea is to simply compare time between brightness maxima in the center of the video. Therefore it should work with different resolutions, frame rates and rotation of the video.
ffmpeg is used for decoding, imageio as a bridge to Python, numpy/scipy for the maxima computation and a k-nearest-neighbor library (annoy, cyflann, hnsw) for comparison.
At the moment it's not polished at all, so you should know a little Python to run it, or simply copy the idea.
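The tool itself is not shown in this answer, so the following is only a rough sketch of the stated idea: measure the time between brightness maxima so that the result does not depend on the frame rate. The function name `peak_intervals` and the brightness values are made up for illustration, and the real steps of decoding with ffmpeg, sampling the center of each frame, and nearest-neighbor matching are left out.

```python
def peak_intervals(brightness, fps):
    """Return times in seconds between successive local brightness maxima."""
    peaks = [i for i in range(1, len(brightness) - 1)
             if brightness[i - 1] < brightness[i] >= brightness[i + 1]]
    return [(b - a) / fps for a, b in zip(peaks, peaks[1:])]

# Hypothetical per-frame brightness of one clip at 10 fps, and the same clip
# at 20 fps (every frame repeated once).
clip_10fps = [0, 5, 1, 0, 0, 7, 2, 0, 9, 1]
clip_20fps = [value for value in clip_10fps for _ in range(2)]

print(peak_intervals(clip_10fps, 10))
print(peak_intervals(clip_20fps, 20))
```

Both calls yield the same list of intervals, which is why such a signature can survive re-encoding at a different frame rate. A real implementation would also need to tolerate small timing differences rather than compare for exact equality.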
I had the same problem, so I wrote a program myself.
The difficulty was that my videos came in various formats and resolutions, so I needed to take a hash of each video frame and compare those.
https://github.com/gklc811/duplicate_video_finder
You only need to change the directories at the top and you are good to go.
from os import path, walk, makedirs, rename
from time import perf_counter  # time.clock() was removed in Python 3.8
from imagehash import average_hash
from PIL import Image
from cv2 import VideoCapture, CAP_PROP_FRAME_COUNT, CAP_PROP_FRAME_WIDTH, CAP_PROP_FRAME_HEIGHT, CAP_PROP_FPS
from json import dump, load
from multiprocessing import Pool, cpu_count
# Change these four directories to match your machine.
input_vid_dir = r'C:\Users\gokul\Documents\data\\'
json_dir = r'C:\Users\gokul\Documents\db\\'
analyzed_dir = r'C:\Users\gokul\Documents\analyzed\\'
duplicate_dir = r'C:\Users\gokul\Documents\duplicate\\'
if not path.exists(json_dir):
    makedirs(json_dir)
if not path.exists(analyzed_dir):
    makedirs(analyzed_dir)
if not path.exists(duplicate_dir):
    makedirs(duplicate_dir)
def write_to_json(filename, data):
    file_full_path = json_dir + filename + ".json"
    with open(file_full_path, 'w') as file_pointer:
        dump(data, file_pointer)
    return
def video_to_json(filename):
    # Hash every frame of one video and store the result as a JSON file.
    file_full_path = input_vid_dir + filename
    start = perf_counter()
    size = round(path.getsize(file_full_path) / 1024 / 1024, 2)
    video_pointer = VideoCapture(file_full_path)
    frame_count = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_COUNT)))
    width = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_WIDTH)))
    height = int(VideoCapture.get(video_pointer, int(CAP_PROP_FRAME_HEIGHT)))
    fps = int(VideoCapture.get(video_pointer, int(CAP_PROP_FPS)))
    success, image = video_pointer.read()
    video_hash = {}
    while success:
        frame_hash = average_hash(Image.fromarray(image))
        video_hash[str(frame_hash)] = filename
        success, image = video_pointer.read()
    stop = perf_counter()
    time_taken = stop - start
    print("Time taken for ", file_full_path, " is : ", time_taken)
    data_dict = dict()
    data_dict['size'] = size
    data_dict['time_taken'] = time_taken
    data_dict['fps'] = fps
    data_dict['height'] = height
    data_dict['width'] = width
    data_dict['frame_count'] = frame_count
    data_dict['filename'] = filename
    data_dict['video_hash'] = video_hash
    write_to_json(filename, data_dict)
    return
def multiprocess_video_to_json():
    # Hash all videos in the input directory, one worker process per CPU core.
    files = next(walk(input_vid_dir))[2]
    processes = cpu_count()
    print(processes)
    pool = Pool(processes)
    start = perf_counter()
    pool.starmap_async(video_to_json, zip(files))
    pool.close()
    pool.join()
    stop = perf_counter()
    print("Time Taken : ", stop - start)
def key_with_max_val(d):
    max_value = 0
    required_key = ""
    for k in d:
        if d[k] > max_value:
            max_value = d[k]
            required_key = k
    return required_key
def duplicate_analyzer():
    files = next(walk(json_dir))[2]
    data_dict = {}  # frame hash -> first file in which that hash was seen
    for file in files:
        filename = json_dir + file
        with open(filename) as f:
            data = load(f)
        video_hash = data['video_hash']
        count = 0
        duplicate_file_dict = dict()  # earlier file -> number of shared frame hashes
        for key in video_hash:
            count += 1
            if key in data_dict:
                if data_dict[key] in duplicate_file_dict:
                    duplicate_file_dict[data_dict[key]] = duplicate_file_dict[data_dict[key]] + 1
                else:
                    duplicate_file_dict[data_dict[key]] = 1
            else:
                data_dict[key] = video_hash[key]
        if duplicate_file_dict:
            duplicate_file = key_with_max_val(duplicate_file_dict)
            duplicate_percentage = ((duplicate_file_dict[duplicate_file] / count) * 100)
            # Treat the video as a duplicate if more than half of its frame hashes were seen before.
            if duplicate_percentage > 50:
                file = file[:-5]  # strip the ".json" suffix
                print(file, " is dup of ", duplicate_file)
                src = analyzed_dir + file
                tgt = duplicate_dir + file
                if path.exists(src):
                    rename(src, tgt)
                # else:
                #     print("File already moved")
def mv_analyzed_file():
    # Move videos that already have a JSON file out of the input directory.
    files = next(walk(json_dir))[2]
    for filename in files:
        filename = filename[:-5]  # strip the ".json" suffix
        src = input_vid_dir + filename
        tgt = analyzed_dir + filename
        if path.exists(src):
            rename(src, tgt)
        # else:
        #     print("File already moved")
if __name__ == '__main__':
    mv_analyzed_file()
    multiprocess_video_to_json()
    mv_analyzed_file()
    duplicate_analyzer()
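To make the two core ideas of the program easier to see without OpenCV or imagehash installed, here is a simplified, standard-library-only sketch. It is not the author's code: `average_hash_bits` and `overlap_percent` are hypothetical names, the "frames" are tiny hand-written grayscale grids, and the real `average_hash` first shrinks each frame to a small grayscale image before thresholding against the mean.

```python
def average_hash_bits(gray):
    """Average hash of a small grayscale grid: 1 where a pixel exceeds the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)

def overlap_percent(hashes_a, hashes_b):
    """Percentage of distinct frame hashes in hashes_a that also occur in hashes_b."""
    a, b = set(hashes_a), set(hashes_b)
    return 100.0 * len(a & b) / len(a) if a else 0.0

# A made-up 2x2 "frame" and a uniformly brighter copy of it.
frame = [[10, 200], [30, 220]]
brighter = [[p + 20 for p in row] for row in frame]

print(average_hash_bits(frame), average_hash_bits(brighter))
print(overlap_percent(["0101", "1100"], ["0101"]))
```

Because each pixel is compared with the frame's own mean, a uniform brightness shift leaves the hash unchanged, which is what lets re-encoded copies match. The overlap percentage plays the role of the 50% threshold used in duplicate_analyzer above.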
