How to get my X_rotated with Keras ImageDataGenerator - image

Hi, I just want my MNIST dataset randomly rotated:
I have my X, which is a numpy array of shape (5000, 1, 28, 28).
I want X_rotated in the same order and with the same dimensions.
I have written this:
datagen = ImageDataGenerator(rotation_range=360)
datagen.fit(X)
Now how do I get my X_rotated?
The docs only explain how to do tricky things with epochs and batches. I just want to get back my array with each image randomly rotated, nothing more, nothing tricky.
I don't understand why tutorials only explain the tricky stuff but not the basics...
https://keras.io/preprocessing/image/

The NumpyArrayIterator and DirectoryIterator objects behave pretty much like any Python iterator:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
g = ImageDataGenerator(...)
d = g.flow(..., batch_size=256, shuffle=False)
# flow all batches through the iterator,
# then zip all inputs and outputs, respectively.
batches = zip(*(next(d) for _ in range(len(d))))
# concatenate all inputs and outputs, respectively.
x, y = (np.concatenate(b) for b in batches)
print(x.shape, y.shape)
This should output something similar to this:
(5000, 1, 28, 28) (5000, ?)
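To see the zip-and-concatenate pattern in isolation, here is a minimal sketch that runs without Keras. The `fake_flow` generator is a hypothetical stand-in for an unshuffled batch iterator, not part of the Keras API, and the zero-filled `X` and integer `Y` are placeholder data.

```python
import numpy as np

def fake_flow(x, y, batch_size):
    """Hypothetical stand-in for an unshuffled batch iterator:
    yields (x, y) slices in their original order."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Placeholder data with the same shape as in the question.
X = np.zeros((5000, 1, 28, 28), dtype=np.float32)
Y = np.arange(5000)

# zip(*...) turns a sequence of (x_batch, y_batch) pairs into
# one tuple of all x batches and one tuple of all y batches.
batches = zip(*fake_flow(X, Y, batch_size=256))

# Concatenate each tuple along the first axis to rebuild full arrays.
x_all, y_all = (np.concatenate(b) for b in batches)
print(x_all.shape, y_all.shape)
```

Because the batches are produced in order, `y_all` comes back in the same order as `Y`; with a shuffling iterator that would not hold.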

Input f into play3d() and movie3d() in the rgl package in R

I don't understand the input f expected by play3d and movie3d in the rgl package.
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
n<-rep(1:nobs)
df<-as.data.frame(cbind(x,y,z,n))
listofobs<-split(df,n)
plot3d(df[,1],df[,2],df[,3], type = "n", radius = .2 )
myplotfunction<-function(x) {
rgl.spheres(x=x$x,y=x$y,z=x$z, type="s", r=0.025)
}
When executing the 2 lines below, the animation does play but both lines (play3d() and movie3d()) trigger the error displayed below:
play3d(f=lapply(listofobs,myplotfunction), fps=1 )
movie3d(f=lapply(listofobs,myplotfunction), fps=1 , duration=20)
I am hoping someone can correct my code and help me understand the f input to play3d and movie3d.
Question 1: Why is the play3d line above correct enough that the animation does display correctly?
Question 2: Why is the play3d line above incorrect enough that it triggers the error?
Question 3: What is wrong with the movie3d line that it does not produce a video output?
As the docs say, f is "A function returning a list that may be passed to par3d". It's not a list, which is what your usage passes.
To answer the questions:
1. R evaluates the lapply call first, which is what draws the animation; play3d then looks at the result and dies because it's not a function.
2. f needs to be a function, as described in the help page.
3. movie3d dies when it looks at f, for the same reason: it's not a function.
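The difference between passing a function and passing the result of calling it can be sketched in a few lines of Python. This is only an analogy, not rgl code: `play` and `draw` are hypothetical names, and `play` simply calls its argument once per frame.

```python
def play(f, duration, fps=1):
    """Hypothetical animation driver: calls f(time) once per frame."""
    if not callable(f):
        raise TypeError("f must be a function of time, got " + type(f).__name__)
    return [f(frame / fps) for frame in range(int(duration * fps))]

def draw(time):
    # Stand-in for drawing one frame; returns a label instead of rendering.
    return "frame at t=%.1f" % time

# Correct: pass the function itself, and let the driver call it.
frames = play(draw, duration=3)

# Wrong: the list comprehension runs all the drawing up front,
# then the driver receives a list rather than a function.
try:
    play([draw(t) for t in range(3)], duration=3)
    error = None
except TypeError as exc:
    error = str(exc)

print(frames)
print(error)
```

In the wrong call the frames are still "drawn" while the argument is being evaluated, which mirrors why the animation appears before the error.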
This looks like it will do what you want:
library(rgl)
nobs<-10
x<-runif(nobs)
y<-runif(nobs)
z<-runif(nobs)
df<-data.frame(x,y,z)
plot3d(df, type = "n" )
id <- NA
myplotfunction<-function(time) {
  index <- round(time)
  # For a 3x faster display, use index <- round(3*time)
  # To cycle through the points several times, use
  # index <- round(3*time) %% nobs + 1
  if (!is.na(id))
    pop3d(id = id) # Delete previous item
  id <<- spheres3d(df[index,], r=0.025)
  list()
}
play3d(myplotfunction, startTime = 1, duration = nobs - 1)
movie3d(myplotfunction, startTime = 1, duration = nobs - 1, fps = 1)
This will leave a GIF in file.path(tempdir(), "movie.gif").
Some other notes:
Don't call rgl.spheres. It will cause you immense pain later. Use spheres3d, or never call any *3d function, and never upgrade rgl: you're living in the past using the rgl.* functions. The *3d functions and the rgl.* functions don't play nicely together.
To construct a dataframe, just use the data.frame() function; don't convert a matrix.
You don't need all those contortions to extract points from the dataframe. Most rgl functions can handle a dataframe with x, y, and z columns.
You might notice the plot3d frame move a little: spheres are bigger than points, so it will adjust to accommodate them. You could use xlim, ylim and zlim to set the original frame a little bigger if you don't like this.

Even an image in data set used to train is giving opposite values when making prediction

I am new to ML and TensorFlow. I am trying to build a CNN to classify good images against corrupted images, similar to the rock-paper-scissors tutorial in TensorFlow, except with only two categories.
The Colab Notebook
Model Architecture
train_generator = training_datagen.flow_from_directory(
TRAINING_DIR,
target_size=(150,150),
class_mode='categorical'
)
validation_generator = validation_datagen.flow_from_directory(
VALIDATION_DIR,
target_size=(150,150),
class_mode='categorical'
)
model = tf.keras.models.Sequential([
# Note the input shape is the desired size of the image 150x150 with 3 bytes color
# This is the first convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(150, 150, 3)),
tf.keras.layers.MaxPooling2D(2, 2),
# The second convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The third convolution
tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The fourth convolution
tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# Flatten the results to feed into a DNN
tf.keras.layers.Flatten(),
tf.keras.layers.Dropout(0.5),
# 512 neuron hidden layer
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(2, activation='softmax')
])
model.summary()
model.compile(loss = 'categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
history = model.fit_generator(train_generator, epochs=25, validation_data = validation_generator, verbose = 1)
model.save("rps.h5")
The only change I made was changing the input shape from (150,150,1) to (150,150,3) and the last layer's output from 3 neurons to 2. Training consistently gave me accuracy above 90% on a data set of 600 images in each class. But when I make a prediction using the code from the tutorial, it gives me very wrong values, even for images in the data set.
PREDICTION
Original code in TensorFlow tutorial
for file in onlyfiles:
    path = fn
    img = image.load_img(path, target_size=(150, 150,3)) # changed target_size to (150, 150,3)) from (150,150 )
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(fn)
    print(classes)
I changed target_size from (150,150) to (150,150,3), in the belief that this was needed since my input is a 3-channel image.
Result
It gives very wrong values, [0,1][0,1], even for images that are in the dataset.
But when I changed the code to this:
for file in onlyfiles:
    path = fn
    img = image.load_img(path, target_size=(150, 150,3))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x /= 255.
    classes = model.predict(images, batch_size=10)
    print(fn)
    print(classes)
In this case the values come out like this:
[[9.9999774e-01 2.2242968e-06]]
[[9.9999785e-01 2.1864464e-06]]
[[9.9999785e-01 2.1641024e-06]]
There are one or two errors, but it is mostly correct.
So my question is: even though the last activation is softmax, why are the outputs now decimal values? Is there a logical mistake in the way I am making predictions? I also tried binary, but couldn't find much difference.
Please note -
When you are changing output classes from 2 to 3, you are asking the model to categorise into 3 classes. This would contradict your problem statement which separates good and corrupted ones i.e 2 output classes (a binary problem). I think it can be reversed from 3 to 2 if I have understood the question correctly.
Second, the output you are getting is perfectly correct: neural network models output probabilities rather than hard class values like 0 or 1. Each probability tells you how likely the image is to belong to, say, class 0 or class 1.
Also, as mentioned by @BBloggsbott, you just have to use np.argmax on the output array, which will tell you the index of the class with the highest probability.
Hope this helps.
Thanks.
Softmax returns probability distributions for the vector it gets as input. So, the fact that you are getting decimal values is not a problem. If you want to find the exact class each image belongs to, try using the argmax function on the predictions.
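As a minimal sketch of that last step, the NumPy snippet below applies softmax to some raw scores and then picks the class with argmax. The `logits` values are made up for illustration, and `softmax` here is a hand-written helper rather than the Keras layer.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for three images and two classes.
logits = np.array([[13.0, 0.0], [0.5, 2.5], [-1.0, -3.0]])

# Each row of probs is a probability distribution that sums to 1.
probs = softmax(logits)

# argmax returns the index of the most probable class per row.
labels = np.argmax(probs, axis=-1)

print(probs)
print(labels)
```

With `model.predict` output in place of `probs`, the same `np.argmax(..., axis=-1)` call would give one class index per image.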

How to save Julia for loop returns in an array or dataframe?

I am trying to apply a function over each row of a DataFrame as the code shows.
using RDatasets
iris = dataset("datasets", "iris")
function mean_n_var(x)
    mean1=mean([x[1], x[2], x[3], x[4]])
    var1=var([x[1], x[2], x[3], x[4]])
    rst=[mean1, var1]
    return rst
end
mean_n_var([2,4,5,6])
for row in eachrow(iris[1:4])
    println(mean_n_var(convert(Array, row)))
end
However, instead of printing results, I'd like to save them in an array or another DataFrame.
Thanks in advance.
I thought it worth mentioning some more options beyond what was already mentioned.
I assume you want a Matrix or a DataFrame. There are several possible approaches.
The first is the most direct way to get a Matrix:
mean_n_var(a) = [mean(a), var(a)]
hcat((mean_n_var(Array(x)) for x in eachrow(iris[1:4]))...) # rows
vcat((mean_n_var(Array(x)).' for x in eachrow(iris[1:4]))...) # cols
Another possible approach is vectorized, e.g.:
mat_iris = Matrix(iris[1:4])
mat = hcat(mean(mat_iris, 2), var(mat_iris, 2))
df = DataFrame([vec(f(mat_iris, 2)) for f in [mean,var]], [:mean, :var])
DataFrame(mat) # this constructor also accepts variable names on master but is not released yet

For-loop/ simple data extraction and comparison in R

In this example dataset I have created a column called 'Var'. This is the result I would like from the code. The pseudo-code to give Var is like this: for each ID_Survey, compare the Distances in sequence; if the difference between sequential Distances is 10, then Var=1, otherwise Var=0. Var should be 1 for both elements of the sequence where the difference is 10.
#Generate data
ID_Survey=rep(seq(1,3,1),each=4)
Distance= c(0,25,30,40,50,160,170,190,200,210,1000,1010)
Var= c(0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1);
TestData=data.frame(cbind(ID_Survey,Distance,Var))
TestData
I can use a simple for-loop like this, which nearly works, but it trips up when moving between ID_Survey groups.
for(i in 1:(nrow(TestData)-1)){
TestData$Var2[i]=(TestData$Distance[i+1]==TestData$Distance[i]+10)}
I need to incorporate the above into a function which splits the data.frame into groups based on ID_Survey. I'm trying to build something like the following...
New6=do.call(rbind, by(TestData,list(TestData$ID_Survey),
FUN=function(x)
for (i in nrow(x)){ #loop must build an argument then return it.
#conditional statements in here.
return(x[i,])})); #loop appears to return 1st argument only.
... but I can't get the for-loop to operate inside the by-statement.
Any guidance much appreciated. Many thanks.
data.table's .SD (the subset of data for each group, here defined by ID_Survey) takes care of splitting the data.frame into chunks, sending each chunk to a function, and collating the results. No doubt someone else will have a more elegant solution, but this seems to do the job:
library(data.table)
ComPair=function(a,b){V=ifelse(a==b-10,TRUE,FALSE);return(V)}
TestFunction=function(FData){
  if(nrow(FData)>1){
    for(i in 1:(nrow(FData)-1)){
      V=ComPair(FData$Distance[i],FData$Distance[i+1])
      if(V==1){ FData$Var2[i]=V;FData$Var2[i+1]=V}
    }
  }
  return(FData)
}
TestData_dt=data.table(TestData)
TestData2=TestData_dt[,TestFunction(.SD),ID_Survey]
TestData2
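For readers who want the logic spelled out independently of data.table, here is a rough Python sketch of the same per-group comparison, using the data from the question. `flag_step_pairs` is a hypothetical helper name, and the sketch assumes the rows are already sorted so that each ID_Survey forms one contiguous run.

```python
from itertools import groupby

def flag_step_pairs(ids, distances, step=10):
    """Return 0/1 flags; both members of a consecutive pair within the
    same id get 1 when their distances differ by exactly `step`."""
    flags = [0] * len(distances)
    position = 0
    for _, group in groupby(ids):
        size = len(list(group))
        # Compare neighbours inside this group only, never across groups.
        for i in range(position, position + size - 1):
            if distances[i + 1] - distances[i] == step:
                flags[i] = flags[i + 1] = 1
        position += size
    return flags

ids = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
distances = [0, 25, 30, 40, 50, 160, 170, 190, 200, 210, 1000, 1010]
var = flag_step_pairs(ids, distances)
print(var)
```

The result matches the Var column given in the question, including the boundary between surveys 2 and 3, where 190 and 200 differ by 10 but belong to different groups.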

Extrapolating variance components from Weir-Fst on Vcftools

vcftools --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2
The above command computes Fst distances on 1000 Genomes population data using Weir and Cockerham's 1984 formula. This formula uses 3 variance components, namely a, b, c (between populations; between individuals within populations; between gametes within individuals within populations).
The output directly provides the result of the formula but not the components that the program calculated to arrive at the final result. How can I ask Vcftools to output the values for a,b,c?
If you can get the data into the format for hierfstat, you can get the variance components from varcomp.glob. What I normally do is:
use vcftools with --012 to get genotypes
convert 0/1/2/-1 to hierfstat format (e.g., 11/12/22/NA)
load the data into hierfstat and compute (see below)
R example:
library(hierfstat)
data = read.table("hierfstat.txt", header=T, sep="\t")
levels = data.frame(data$popid)
loci = data[,2:ncol(data)]
res = varcomp.glob(levels=levels, loci=loci, diploid=T)
print(res$loc)
print(res$F)
Without a hierarchical design, Fst for each locus (row of res$loc) is therefore the first component divided by the sum of the components in that row: res$loc[1]/sum(res$loc). If you have more complicated sampling, you'll need to interpret the variance components differently.
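The arithmetic can be sketched in a few lines of Python, assuming you already have the three components per locus. The numbers below are invented for illustration, and `weir_cockerham_fst` is a hypothetical helper name; the multi-locus estimate shown is the usual ratio of sums rather than an average of per-locus values.

```python
def weir_cockerham_fst(a, b, c):
    """Single-locus Fst from variance components: a / (a + b + c)."""
    return a / (a + b + c)

# Hypothetical variance components (a, b, c) for three loci.
components = [(0.02, 0.01, 0.17), (0.05, 0.00, 0.20), (0.00, 0.02, 0.18)]

# Per-locus Fst, one value per row of components.
per_locus = [weir_cockerham_fst(a, b, c) for a, b, c in components]

# Multi-locus estimate: sum the numerators and denominators first,
# then divide, instead of averaging the per-locus ratios.
total_a = sum(a for a, _, _ in components)
total_all = sum(a + b + c for a, b, c in components)
overall = total_a / total_all

print(per_locus)
print(overall)
```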
--update per your comment--
I do this in Pandas, but any language would do. It's a text replacement exercise. Just get your .012 file into a dataframe and convert as below. I read in row by row into numpy b/c I have tons of snps, but read_csv would work, too.
import pandas as pd
import numpy as np

z12_file = "out.012"  # assumed path to the vcftools --012 output; adjust to yours

z12_data = []
with open(z12_file) as handle:
    for i, line in enumerate(handle):
        # each line holds tab-separated integer genotype codes
        line = [int(x) for x in line.strip().split("\t")]
        z12_data.append(np.array(line))
        if i % 10 == 0:
            print(i)  # progress indicator
z12_data = np.array(z12_data)
z12_df = pd.DataFrame(z12_data)
z12_df = z12_df.drop(0, axis=1)  # drop the leading individual-index column
z12_df.columns = pd.Series(z12_df.columns)-1
# map 0/1/2/-1 genotype codes to hierfstat's two-digit allele format
hierf_trans = {0:11, 1:12, 2:22, -1:'NA'}
def apply_hierf_trans(series):
    return [hierf_trans[x] if x in hierf_trans else x for x in series]
hierf = z12_df.apply(apply_hierf_trans)
hierf.to_csv("hierfstat.txt", header=True, index=False, sep="\t")
Then, you'd read that file hierfstat.txt into R, these are your loci. You'd need to specify your levels in your sampling design (e.g., your population). Then call varcomp.glob() to get the variance components. I have a parallel version of this here if you want to use it.
Note that you are specifying 0 as the reference allele, in this case. May be what you want, maybe not. I often calculate minor allele frequency and make 2 the minor allele, but it depends on your study goal.
