Is there a way to resolve or prevent this cardinality_threshold error? - correlation

I have about 17 soil variables that I'd like to correlate, along with elevation, temperature and rainfall, against species richness and abundance. I have 39 plots (rows); the columns contain environmental variables such as elevation, abundance, species richness, temperature and rainfall, followed by the 17 soil variables. My script is below.
Is the problem with my script, or with the Mac laptop I am using? Please help. Thanks.
After running the code, I get this error:
Error in stop_if_high_cardinality(data, columns, cardinality_threshold) :
Column 'pH' has more levels (24) than the threshold (15) allowed.
Please remove the column or increase the 'cardinality_threshold' parameter. Increasing the cardinality_threshold may produce long processing times
GGally::ggpairs(
  na.omit(nfi_nontree_soilclim_data[, c(11:18)]),
  upper = list(
    continuous = wrap(
      custom_ggally_cor,
      method = "spearman", exact = FALSE,
      size = 2.5, col = "black", family = "serif", digits = 2
    ),
    combo = "box_no_facet", discrete = "count", na = "na"
  ),
  lower = list(
    continuous = wrap(
      ggally_smooth,
      method = "loess", formula = y ~ x,
      se = FALSE, lwd = 3, col = "red", shrink = TRUE
    ),
    combo = "facethist", discrete = "facetbar", na = "na"
  ),
  diag = list(
    continuous = wrap(
      ggally_densityDiag,
      col = "darkgrey", lwd = .1,
      stat = "density", fill = "darkgrey"
    ),
    na = "naDiag"
  ),
  axisLabels = c("show")
) + theme_bw() + theme(
  text = element_text(family = "serif", size = 4),
  axis.text = element_text(family = "serif", size = 4),
  panel.grid = element_blank()
)

This error is a built-in stop: by default, ggpairs only allows a discrete variable with up to 15 levels to be displayed in one plot matrix. One of your variables has 24 levels, so you can either raise the cardinality_threshold parameter to 24 or set it to NULL. NULL is the more general option if the number of levels isn't always 24. In general, though, displaying that many levels at once is discouraged, which is why these stop-limits exist.
library(GGally)
data(iris)
# Create data that has a factor with more than 15 levels
iris$group = as.factor(sample(sample(letters, 16), 150, replace = TRUE))
# Just demonstrating that either entry can work
ggpairs(iris, cardinality_threshold = 16)
ggpairs(iris, cardinality_threshold = NULL)

Related

ValueError: Input 0 of layer sequential is incompatible with the layer

I am trying to run this model, but I keep getting this error. There is some mistake with regard to the shape of the input data; I played around with it but I still get these errors.
Error:
ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 1 but received input with shape (None, 32, 32, 3)
# Image size
img_width = 32
img_height = 32

# Define X as the feature variable and Y as the class label
X = []
Y = []
for features, label in data_set:
    X.append(features)
    Y.append(label)

X = np.array(X).reshape(-1, img_width, img_height, 3)
Y = np.array(Y)
print(X.shape)  # Output: (4943, 32, 32, 3)
print(Y.shape)  # Output: (4943,)

# Normalize the pixels
X = X / 255.0

# Build the model
cnn = Sequential()
cnn.add(keras.Input(shape = (32, 32, 1)))
cnn.add(Conv2D(32, (3, 3), activation = "relu", input_shape = X.shape[1:]))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Conv2D(32, (3, 3), activation = "relu", input_shape = X.shape[1:]))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Conv2D(64, (3, 3), activation = "relu", input_shape = X.shape[1:]))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Flatten())
cnn.add(Dense(activation = "relu", units = 150))
cnn.add(Dense(activation = "relu", units = 50))
cnn.add(Dense(activation = "relu", units = 10))
cnn.add(Dense(activation = 'softmax', units = 1))
cnn.summary()
cnn.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Model fit
cnn.fit(X, Y, epochs = 15)
I tried reading about this issue, but still didn't understand it very well.
Your input shape should be (32, 32, 3). Y is your label matrix; I assume it contains N unique integer values, where N is the number of classes. If N = 2 you can treat this as a binary classification problem. In that case your code for the top layer should be
cnn.add(Dense(1, activation = 'sigmoid'))
and your code for compile should be
cnn.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
If you have more than 2 classes, then your code should be
cnn.add(Dense(N, activation = 'softmax'))
cnn.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
where N is the number of classes.
Change this line (the last dimension):
cnn.add(keras.Input(shape = (32,32,3)))
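Putting both fixes together, here is a minimal sketch of the corrected model. It assumes Y holds integer labels 0..9, i.e. N = 10 classes (a guess based on the Dense(units = 10) layer in the question, which becomes the softmax output here), and reuses the X and Y arrays built above:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

N = 10  # assumed number of classes; set this to however many distinct labels Y contains

cnn = Sequential()
cnn.add(keras.Input(shape = (32, 32, 3)))  # last dimension is 3, matching the (None, 32, 32, 3) data
cnn.add(Conv2D(32, (3, 3), activation = "relu"))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Conv2D(32, (3, 3), activation = "relu"))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Conv2D(64, (3, 3), activation = "relu"))
cnn.add(MaxPooling2D(pool_size = (2, 2)))
cnn.add(Flatten())
cnn.add(Dense(150, activation = "relu"))
cnn.add(Dense(50, activation = "relu"))
cnn.add(Dense(N, activation = "softmax"))  # N output units, not 1
cnn.compile(loss = 'sparse_categorical_crossentropy',  # integer labels, so the 'sparse' variant
            optimizer = 'adam', metrics = ['accuracy'])
cnn.fit(X, Y, epochs = 15)
With an Input layer supplying the shape, the input_shape arguments on the Conv2D layers are redundant and have been dropped.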

Tensorflow/Keras: volatile validation loss

I've been training a U-Net for single-class small lesion segmentation, and have been getting consistently volatile validation loss. I have about 20k images split 70/30 between training and validation sets, so I don't think the issue is too little data. I've tried shuffling and re-splitting the sets a few times with no change in volatility, so I don't think the validation set is unrepresentative. I have tried lowering the learning rate with no effect on volatility. And I have tried a few loss functions (dice coefficient, focal tversky, weighted binary cross-entropy). I'm using a decent amount of augmentation so as to avoid overfitting. I've also run through all my data (512x512 float64s with corresponding 512x512 int64 masks, both stored as numpy arrays) to double check that the value ranges, dtypes, etc. aren't screwy, and I even removed any ROIs in the masks under 35 pixels in area, which I thought might be artifacts messing with the loss.
I'm using Keras ImageDataGenerator.flow_from_directory. I was initially using zca_whitening and brightness_range augmentation, but I think these cause issues with flow_from_directory (the link between mask and image gets lost), so I skipped them.
I've tried validation generators with and without shuffle=True. Batch size is 8.
Here's some of my code, happy to include more if it would help:
# loss
from keras.losses import binary_crossentropy
import keras.backend as K
import tensorflow as tf

epsilon = 1e-5
smooth = 1

def dsc(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    loss = 1 - dsc(y_true, y_pred)
    return loss

def bce_dice_loss(y_true, y_pred):
    loss = binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
    return loss

def confusion(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.clip(y_pred, 0, 1)
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.clip(y_true, 0, 1)
    y_neg = 1 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    prec = (tp + smooth) / (tp + fp + smooth)
    recall = (tp + smooth) / (tp + fn + smooth)
    return prec, recall

def tp(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pos = K.round(K.clip(y_true, 0, 1))
    tp = (K.sum(y_pos * y_pred_pos) + smooth) / (K.sum(y_pos) + smooth)
    return tp

def tn(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos
    tn = (K.sum(y_neg * y_pred_neg) + smooth) / (K.sum(y_neg) + smooth)
    return tn

def tversky(y_true, y_pred):
    y_true_pos = K.flatten(y_true)
    y_pred_pos = K.flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos)
    false_neg = K.sum(y_true_pos * (1 - y_pred_pos))
    false_pos = K.sum((1 - y_true_pos) * y_pred_pos)
    alpha = 0.7
    return (true_pos + smooth) / (true_pos + alpha * false_neg + (1 - alpha) * false_pos + smooth)

def tversky_loss(y_true, y_pred):
    return 1 - tversky(y_true, y_pred)

def focal_tversky(y_true, y_pred):
    pt_1 = tversky(y_true, y_pred)
    gamma = 0.75
    return K.pow((1 - pt_1), gamma)
model = BlockModel((len(os.listdir(os.path.join(imageroot, 'train_ct', 'train'))), 512, 512, 1), filt_num=16, numBlocks=4)
#model.compile(optimizer=Adam(learning_rate=0.001), loss=weighted_cross_entropy)
#model.compile(optimizer=Adam(learning_rate=0.001), loss=dice_coef_loss)
model.compile(optimizer=Adam(learning_rate=0.001), loss=focal_tversky)

train_mask = os.path.join(imageroot, 'train_masks')
val_mask = os.path.join(imageroot, 'val_masks')

model.load_weights(model_weights_path)  # I'm initializing with some pre-trained weights from a similar model

data_gen_args_mask = dict(
    rotation_range=10,
    shear_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=[0.8, 1.2],
    horizontal_flip=True,
    #vertical_flip=True,
    fill_mode='nearest',
    data_format='channels_last'
)
data_gen_args = dict(
    **data_gen_args_mask
)

image_datagen_train = ImageDataGenerator(**data_gen_args)
mask_datagen_train = ImageDataGenerator(**data_gen_args)  #_mask)
image_datagen_val = ImageDataGenerator()
mask_datagen_val = ImageDataGenerator()

seed = 1
BS = 8
steps = int(np.floor((len(os.listdir(os.path.join(train_ct, 'train')))) / BS))
print(steps)
val_steps = int(np.floor((len(os.listdir(os.path.join(val_ct, 'val')))) / BS))
print(val_steps)

train_image_generator = image_datagen_train.flow_from_directory(
    train_ct,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
train_mask_generator = mask_datagen_train.flow_from_directory(
    train_mask,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
val_image_generator = image_datagen_val.flow_from_directory(
    val_ct,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
val_mask_generator = mask_datagen_val.flow_from_directory(
    val_mask,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)

train_generator = zip(train_image_generator, train_mask_generator)
val_generator = zip(val_image_generator, val_mask_generator)

# make callback for checkpointing
plot_losses = PlotLossesCallback(skip_first=0, plot_extrema=False)
%matplotlib inline
filepath = os.path.join(versionPath, model_version + "_saved-model-{epoch:02d}-{val_loss:.2f}.hdf5")
if reduce:
    cb_check = [ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=1, save_best_only=False,
                                save_weights_only=True, mode='auto', period=1),
                reduce_lr,
                plot_losses]
else:
    cb_check = [ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=1, save_best_only=False,
                                save_weights_only=True, mode='auto', period=1),
                plot_losses]

# train model
history = model.fit_generator(train_generator, epochs=numEp,
                              steps_per_epoch=steps,
                              validation_data=val_generator,
                              validation_steps=val_steps,
                              verbose=1,
                              callbacks=cb_check,
                              use_multiprocessing=False)
And here's how my loss looks (plot omitted; the validation loss is volatile from epoch to epoch while the training loss looks fine).
Another potentially relevant thing: I tweaked the flow_from_directory code a bit (added npy to the whitelist). But training loss looks fine, so I'm assuming the issue isn't there.
Two suggestions:
Switch to the classic validation data format (i.e. a numpy array) instead of using a generator; this ensures you always use exactly the same validation data every time. If you then see a different validation curve, there is something "random" in the validation generator giving you different data at different epochs. A sketch of this follows below.
Use a fixed set of samples (100 or 1000 should be enough without any data augmentation) for both training and validation. If everything goes well, you should see your network quickly overfit to this dataset, and your training and validation curves should look very similar. If not, debug your network.
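For the first suggestion, a minimal sketch. It assumes the val_image_generator and val_mask_generator defined in the question, that the whole validation set fits in memory, and it reuses the question's val_steps, model, train_generator, and cb_check:
import numpy as np

# Draw every validation batch once and concatenate into fixed arrays,
# so the identical data is evaluated at every epoch.
val_image_batches = [next(val_image_generator) for _ in range(val_steps)]
val_mask_batches = [next(val_mask_generator) for _ in range(val_steps)]
val_X = np.concatenate(val_image_batches, axis=0)
val_Y = np.concatenate(val_mask_batches, axis=0)

history = model.fit_generator(train_generator, epochs=numEp,
                              steps_per_epoch=steps,
                              validation_data=(val_X, val_Y),  # fixed arrays instead of a generator
                              verbose=1,
                              callbacks=cb_check,
                              use_multiprocessing=False)
Note that shuffle=True in the validation generators is one possible source of epoch-to-epoch variation if validation_steps doesn't cover the full set, so pinning the arrays down isolates exactly that kind of randomness.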

How to wrap text in ggplot for facet_grid labels

I have been searching for how to wrap text. It seems there should be a way to use labeller = label_wrap_gen(3), but I keep getting an error:
Error in margins(vars, margins) : unused argument (margins)
Here is part of my code:
# simpson by protected status for domain FKNMS
ggplot(data = fk_strata_abun_diversity, aes(x = YEAR)) +
  geom_point(aes(y = strata_simpson), color = "blue") +
  geom_line(aes(y = strata_simpson), color = "blue") +
  facet_grid(STRAT ~ protected_status,
             labeller = labeller(.rows = strata_names, .cols = protected_status_names),
             label_wrap_gen(width = 2)) + # error: in margins(vars, margins) : unused argument (margins) ??
  labs(x = "Year", y = "Effective Number of Species") +
  ggtitle("Simpson Diversity of Reef Fish in the Florida Keys by Strata") +
  theme(plot.title = element_text(hjust = 0.5, face = 'bold', size = 12)) +
  scale_x_continuous(limits = c(1999, 2016), breaks = c(1999:2016)) +
  scale_y_continuous(limits = c(0, 25), breaks = c(5, 10, 15, 20, 25))
Thank you in advance for the help.
I found that labeller = labeller(label_wrap_gen(width = 2... does not wrap.
Try
facet_grid(STRAT ~ protected_status,
           labeller = label_wrap_gen(width = 2, multi_line = TRUE))
The following will work for facet_grid() and facet_wrap():
facet_grid(labeller = labeller(facet_category = label_wrap_gen(width = 16)))
where facet_category is the faceting variable to modify and width sets the maximum number of characters before wrapping.
multi_line is only needed if you have specified multiple factors in your faceting formula (e.g. ~ first + second).

Why does this error pop up, and what are your thoughts on my neural network/genetic algorithm?

Preamble:
This is a combination of my first and second programs in Python (besides hello-world-level tutorials). Any questions I've had have led me to this site, so it seemed fitting that I post it here. I come from a TI-Basic background, so if you have no idea why I did it this way when you should do it that way, that is likely why.
My first program was a genetic learning algorithm. Its testing setup was/is to guess your input string. There is currently a problem with it, but it only slightly affects the efficiency of the program.[1]
My second is a simple feed-forward neural network (I am currently only working on the XOR problem). Some of the code for customizing the variables (the number of inputs, the number of outputs, the number of hidden layers, the number of neurons in those hidden layers) is there but is currently not my focus.
What I am trying to do now is train my network with my genetic algorithm. All seems to be fine, but I keep getting an error I can't explain.
Traceback (most recent call last):
  File "python", line 174, in <module>
  File "python", line 68, in fitness_function
  File "python", line 146, in weight_dot_value_plus_bias
TypeError: 'int' object is not subscriptable
Now the weird thing is, the code this error refers to is a direct transfer of code from the original neural network.
I am using repl.it as my compiler; could that be the problem?
import random
from random import choice
from random import randint

# Global variables
length_of_phrase = 15
generation_number = 0
max_number_of_generations = 250
population = 150
perckill = 40
percparents = 35
percrandom = 1
percmutate = 1
individual_by_gene_matrix = [0]
one = 1
zero = 0
number_of_layers = 3
number_of_neurons = [2, 3, 1]
nnv = [0] * number_of_layers
nnw = [0] * number_of_layers
nnb = [0] * number_of_layers
val1 = randint(0, 1)
val2 = randint(0, 1)
living = int(((100 - perckill) * population) // 100)
dead = population - living
random_strings = int(((percrandom) * population) // 100)
reproduced_strings = int(living + random_strings)
parents = int(((100 - percparents) * population) // 100)
"""
print(living)
print(dead)
print(population)
print(random_strings)
print(reproduced_strings)
"""

def random_matrix_generator():
    # generates a matrix with width = number of genes in the target and height = population
    from random import randint
    individual_by_gene_matrix = [[randint(-200, 200) / 100 for x in range(length_of_phrase)] for x in range(population)]
    # horizontal is traits, vertical is individuals
    # each gene represents a letter
    # each individual represents a word
    return (individual_by_gene_matrix)

"""
def convert_matrix_into_list_of_stings():
    listofstrings = [() for var in range(population)]
    for var in range(population):
        list = individual_by_gene_matrix[var]  # creates a list for each individual with their traits
        lista = [(chr(n)) for n in list]  # the traits become letters
        listofstrings[var] = ''.join(lista)  # creates a list of all the individuals with letters joined
    return (listofstrings)
"""

def fitness_function():
    for individual in range(population):
        number_of_layers, number_of_neurons, nnv, nnw, nnb = NN_setup(val1, val2, individual_by_gene_matrix[individual][0], individual_by_gene_matrix[individual][1], individual_by_gene_matrix[individual][2], individual_by_gene_matrix[individual][3], individual_by_gene_matrix[individual][4], individual_by_gene_matrix[individual][5], individual_by_gene_matrix[individual][6], individual_by_gene_matrix[individual][7], individual_by_gene_matrix[individual][8], individual_by_gene_matrix[individual][9], individual_by_gene_matrix[individual][10], individual_by_gene_matrix[individual][11], individual_by_gene_matrix[individual][12], individual_by_gene_matrix[individual][13], individual_by_gene_matrix[individual][14])
        for var in range(1, number_of_layers):
            nnv = weight_dot_value_plus_bias(var)
            nnv = sigmoid(var)
        fitness[individual] = 1 - abs((val1 ^ val2) - (nnv[2][0]))
    # for n in range(population):
    #     print('{} : {} : {}'.format(n, listofstrings[n], fitness[n]))
    return (fitness)

def matrix_reorder():
    temp_individual_by_gene_matrix = [[0 for var in range(length_of_phrase)] for var in range(population)]
    temp_fitness = [(0) for var in range(population)]
    for var in range(population):
        var_a = fitness.index(max(fitness))
        temp_fitness[var] = fitness.pop(var_a)
        temp_individual_by_gene_matrix[var] = individual_by_gene_matrix.pop(var_a)
    return (temp_individual_by_gene_matrix, temp_fitness)

def kill():
    for individal in range(living, population):
        individual_by_gene_matrix[individal] = [0] * length_of_phrase
    return (individual_by_gene_matrix)

def reproduce():
    for individual in range(living, reproduced_strings):
        for gene in range(length_of_phrase):
            individual_by_gene_matrix[individual][gene] = randint(-200, 200) / 100
    for individual in range(reproduced_strings, population):
        mom = randint(0, parents)
        dad = randint(0, parents)
        for gene in range(length_of_phrase):
            individual_by_gene_matrix[individual][gene] = random.choice([individual_by_gene_matrix[mom][gene], individual_by_gene_matrix[dad][gene]])
    return (individual_by_gene_matrix)

def mutate():
    for individual in range(population):
        for gene in range(length_of_phrase):
            if randint(0, 100) <= percmutate:
                individual_by_gene_matrix[individual][gene] = random.gauss(individual_by_gene_matrix[individual][gene], 0.5)
    return (individual_by_gene_matrix)

def NN_setup(val1, val2, w100, w101, w110, w111, w120, w121, w200, w201, w202, b00, b01, b10, b11, b12, b20):
    number_of_layers = 3
    number_of_neurons = [2, 3, 1]
    nnv = [0] * number_of_layers
    nnw = [0] * number_of_layers
    nnb = [0] * number_of_layers
    for layer in range(number_of_layers):
        nnv[layer] = [0] * number_of_neurons[layer]
        nnw[layer] = [0] * number_of_neurons[layer]
        nnb[layer] = [0] * number_of_neurons[layer]
        if layer != 0:
            for neuron in range(number_of_neurons[layer]):
                nnw[layer][neuron] = [0] * number_of_neurons[layer - 1]
    nnv = [[val1, val2], [0.0, 0.0, 0.0], [0.0]]
    nnw = [['inputs have no weight'], [[w100, w101], [w110, w111], [w120, w121]], [[w200, w201, w202]]]
    nnb = [[b00, b01], [b10, b11, b12], [b20]]
    return (number_of_layers, number_of_neurons, nnv, nnw, nnb)

# The function below is the one the traceback points into;
# the marked line is line 146 from the traceback.
def weight_dot_value_plus_bias(layer):
    for nueron in range(number_of_neurons[layer]):
        for weight in range(number_of_neurons[layer - 1]):
            nnv[layer][nueron] += (nnv[layer - 1][weight]) * (nnw[layer][nueron][weight])  # ---> TypeError raised here
        nnv[layer][nueron] += nnb[layer][nueron]
    return (nnv)

def sigmoid(layer):
    for neuron in range(number_of_neurons[layer]):
        nnv[layer][neuron] = (1 / (1 + 3 ** (-nnv[layer][neuron])))
    return (nnv)

individual_by_gene_matrix = random_matrix_generator()
while (generation_number <= max_number_of_generations):
    val1 = randint(0, 1)
    val2 = randint(0, 1)
    fitness = [(0) for var in range(population)]
    # populations_phenotypes_by_individual = convert_matrix_into_list_of_stings()
    fitness = fitness_function()
    individual_by_gene_matrix, fitness = matrix_reorder()
    individual_by_gene_matrix = kill()
    individual_by_gene_matrix = reproduce()
    individual_by_gene_matrix = mutate()
    individual_by_gene_matrix, fitness = matrix_reorder()
    # populations_phenotypes_by_individual = convert_matrix_into_list_of_stings()
    print('{} {} {} {}'.format(generation_number, (10000 * fitness[0]) // 100, val1, val2))
    generation_number += 1
print('')
print('')
print(individual_by_gene_matrix[0])
That was way too many indents!!!
How the hell do I just insert a block of code????!!!!!
I'll give you the source code to the individual programs once I learn how to insert a block of code.
[1] You're going to have to wait till I give you the source code to just the genetic algorithm.
Any tips or suggestions? How would you write the code for what I'm trying to do?
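One likely explanation for the TypeError, sketched below under the assumption that the traceback's line 146 is the marked line in weight_dot_value_plus_bias: the tuple assignment inside fitness_function creates local variables, so weight_dot_value_plus_bias still reads the module-level nnv and nnw, which are plain lists of ints ([0, 0, 0]), and subscripting the int 0 raises exactly this error. A minimal reproduction of the pattern (hypothetical names):
nnw = [0, 0, 0]  # module-level placeholder: a list of ints

def setup():
    return [[0.5], [0.25], [0.125]]  # nested lists, like NN_setup's real return value

def use_weights():
    return nnw[1][0]  # reads the *global* nnw, which still holds plain ints

def fitness():
    nnw = setup()  # this assignment only creates a LOCAL nnw; the global is untouched
    return use_weights()  # TypeError: 'int' object is not subscriptable

fitness()
Declaring those names global inside fitness_function, or passing nnv, nnw, and nnb explicitly into weight_dot_value_plus_bias, would make the set-up values visible where they are used.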

How to model a mixture of 3 Normals in PyMC?

There is a question on CrossValidated about how to use PyMC to fit two Normal distributions to data. The answer from Cam.Davidson.Pilon was to use a Bernoulli distribution to assign data to one of the two Normals:
size = 10
p = Uniform("p", 0, 1)  # this is the fraction that comes from mean1 vs mean2
ber = Bernoulli("ber", p=p, size=size)  # produces 1 with proportion p.
precision = Gamma('precision', alpha=0.1, beta=0.1)
mean1 = Normal("mean1", 0, 0.001)
mean2 = Normal("mean2", 0, 0.001)

@deterministic
def mean(ber=ber, mean1=mean1, mean2=mean2):
    return ber * mean1 + (1 - ber) * mean2
Now my question is: how to do it with three Normals?
Basically, the issue is that you can't use a Bernoulli distribution and 1-Bernoulli anymore. But how to do it then?
edit: With CDP's suggestion, I wrote the following code:
import numpy as np
import pymc as mc

n = 3
ndata = 500

dd = mc.Dirichlet('dd', theta=(1,) * n)
category = mc.Categorical('category', p=dd, size=ndata)
precs = mc.Gamma('precs', alpha=0.1, beta=0.1, size=n)
means = mc.Normal('means', 0, 0.001, size=n)

@mc.deterministic
def mean(category=category, means=means):
    return means[category]

@mc.deterministic
def prec(category=category, precs=precs):
    return precs[category]

v = np.random.randint(0, n, ndata)
data = (v == 0) * (50 + np.random.randn(ndata)) \
     + (v == 1) * (-50 + np.random.randn(ndata)) \
     + (v == 2) * np.random.randn(ndata)
obs = mc.Normal('obs', mean, prec, value=data, observed=True)

model = mc.Model({'dd': dd,
                  'category': category,
                  'precs': precs,
                  'means': means,
                  'obs': obs})
The traces with the following sampling procedure look good as well. Solved!
mcmc = mc.MCMC(model)
mcmc.sample(50000, 0)
mcmc.trace('means').gettrace()[-1, :]
There is an mc.Categorical object that does just this.
p = [0.2, 0.3, .5]
t = mc.Categorical('test', p)
t.random()
# array(2, dtype=int32)
It returns an int between 0 and len(p) - 1. To model the 3 Normals, you make p a mc.Dirichlet object (it accepts a k-length array as the hyperparameters; setting the values in the array to be the same sets the prior probabilities to be equal). The rest of the model is nearly identical.
This is a generalization of the model I suggested above.
Update:
Okay, so instead of having different means, we can collapse them all into one:
means = Normal("means", 0, 0.001, size=3)
...
@mc.deterministic
def mean(categorical=categorical, means=means):
    return means[categorical]
