Torchmetrics Frechet Inception Distance Weird Behaviour - torchmetrics

I am trying to create an FID to measure the performance of my generative models on MNIST.
I provide my own feature extractor.
However, in order to find the output dimension of the feature extractor you provide, torchmetrics tries to pass it a dummy image to see what dimension it outputs.
The problem is that the dummy image it generates does not follow the shape or data type my feature extractor expects.
There is no way for me to manually specify the dummy image that should be passed in, so I can't control that.
Here is an example of what I'm trying to do:
import torch as th
import torch.nn as nn
import torch.nn.functional as F
from torchmetrics.image.fid import FrechetInceptionDistance

N = <appropriate number>

class SimpleConvFeatureExtractor(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=2)
        self.out = nn.Sequential(nn.Linear(N, embed_dim))

    def forward(self, x):
        print(x.shape)
        print(x.dtype)
        x = F.silu(self.conv(x))
        x = self.out(x.view(x.shape[0], -1))
        return x

fid = FrechetInceptionDistance(feature=SimpleConvFeatureExtractor(128))
with output
torch.Size([1, 3, 299, 299])
torch.uint8
RuntimeError: Input type (unsigned char) and bias type (float) should be the same
As you can see, the image being passed through is hardly an MNIST image.

I had a similar error with a project of mine. I wanted to see if anyone else would be able to answer your post, but given the silence, I will give my best attempt at an answer! For me, the solution lay in the class definitions: when you create your class and define __init__, you should try to pass in a transform that converts the input into a tensor of the form your extractor expects.
If you want to see the similarity between our issues you can check out my question here.
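Building on that idea, here is a minimal sketch (my own, not from the original answer; the wrapper name, the resize to 28x28, and the channel-averaging choice are illustrative assumptions) of wrapping the extractor so the uint8, 3-channel 299x299 dummy image that torchmetrics passes in is converted to the float, single-channel MNIST format before it reaches the network:

import torch.nn as nn
import torch.nn.functional as F
from torchmetrics.image.fid import FrechetInceptionDistance

class FIDCompatibleWrapper(nn.Module):
    # Hypothetical adapter: casts the uint8 RGB input to float, averages the
    # channels down to one, and resizes to 28x28 so the Linear(N, embed_dim)
    # sizing of the MNIST extractor stays consistent.
    def __init__(self, extractor):
        super().__init__()
        self.extractor = extractor

    def forward(self, x):
        x = x.float() / 255.0                   # uint8 -> float in [0, 1]
        if x.shape[1] == 3:                     # RGB -> single channel
            x = x.mean(dim=1, keepdim=True)
        x = F.interpolate(x, size=(28, 28), mode="bilinear", align_corners=False)
        return self.extractor(x)

# SimpleConvFeatureExtractor is the class from the question above
fid = FrechetInceptionDistance(feature=FIDCompatibleWrapper(SimpleConvFeatureExtractor(128)))

Real MNIST batches passed to fid.update() would then need the same preprocessing (or be fed through the wrapper as well) so that real and generated features come from the same input distribution.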

Related

How to get total test accuracy for pytorch lightning?

How can the trainer.test method be used to get total accuracy over all batches?
I know I can implement model.test_step, but that is for a single batch only. I need the accuracy over the whole data set. I can use torchmetrics.Accuracy to accumulate accuracy. But what is the proper way to combine that and get the total accuracy out? What is model.test_step supposed to return anyway, since batch-wise test scores are not very useful? I could hack it somehow, but I'm surprised that I couldn't find any example on the internet that demonstrates how to get accuracy the PyTorch Lightning native way.
You can see here (https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html#automatic-logging) that the on_epoch argument in log automatically accumulates and logs at the end of the epoch. The right way of doing this would be:
from torchmetrics import Accuracy

def validation_step(self, batch, batch_idx):
    x, y = batch
    preds = self.forward(x)
    loss = self.criterion(preds, y)
    accuracy = Accuracy()
    acc = accuracy(preds, y)
    self.log('accuracy', acc, on_epoch=True)
    return loss
If you want a custom reduction function you can set it using the reduce_fx argument; the default is torch.mean(). log() can be called from any method in your LightningModule.
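For instance, a one-line sketch (assuming you wanted, say, the maximum batch accuracy over the epoch instead of the mean):

self.log('accuracy', acc, on_epoch=True, reduce_fx=torch.max)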
I am working on a notebook. I did some initial experimentation with the following code.
def test_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    self.test_acc(logits, y)
    self.log('test_acc', self.test_acc, on_step=False, on_epoch=True)
It prints out a nicely formatted result after calling
model = Cifar100Model()
trainer = pl.Trainer(max_epochs=1, accelerator='cpu')
trainer.test(model, test_dataloader)
This printed test_acc 0.008200000040233135
I tried verifying whether the printed value is actually an average over the test data batches by modifying test_step as follows:
def test_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    self.test_acc(logits, y)
    self.log('test_acc', self.test_acc, on_step=False, on_epoch=True)
    preds = logits.argmax(dim=-1)
    acc = (y == preds).float().mean()
    print(acc)
Then I ran trainer.test() again. This time the following values were printed out:
tensor(0.0049)
tensor(0.0078)
tensor(0.0088)
tensor(0.0078)
tensor(0.0122)
Averaging them gets me: 0.0083
which is very close to the value printed by trainer.test().
The logic behind this solution is that in
self.log('test_acc', self.test_acc, on_step=False, on_epoch=True)
I specified on_epoch=True and passed a TorchMetrics metric object, so the epoch-level average is computed by PL automatically using the metric's compute() function.
I'll try to post my full notebook shortly. You can check there too.
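For reference, here is a minimal sketch (my own reconstruction, not the notebook; the backbone is a placeholder) of how self.test_acc would be wired up in the LightningModule so that the epoch-level accumulation above works:

import torch.nn as nn
import pytorch_lightning as pl
from torchmetrics import Accuracy

class Cifar100Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # placeholder network; the real notebook presumably defines a proper CNN here
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))
        # the metric object is created once; newer torchmetrics versions want the
        # task/num_classes spelled out, older ones accept Accuracy() with no arguments
        self.test_acc = Accuracy(task="multiclass", num_classes=100)

    def forward(self, x):
        return self.backbone(x)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        self.test_acc(logits, y)  # update the metric state for this batch
        # on_epoch=True: PL calls self.test_acc.compute() once the test epoch ends
        self.log('test_acc', self.test_acc, on_step=False, on_epoch=True)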

SARIMAX model in PyMC3

I would like to write down the following SARIMAX model, (2,0,0)(2,0,0,12), in PyMC3 to perform Bayesian estimation of its coefficients, but I cannot figure out how to start with the seasonal part.
Has anyone tried something like this?
import pymc3 as pm
import arviz as az

with pm.Model() as ar2:
    theta = pm.Normal("theta", 0.0, 1.0, shape=2)
    sigma = pm.HalfNormal("sigma", 3)
    likelihood = pm.AR("y", theta, sigma=sigma, observed=data)

    trace = pm.sample(
        1000,
        tune=2000,
        random_seed=13,
    )
    idata = az.from_pymc3(trace)
Although it would be best (e.g. for performance) if you can get an answer that uses PyMC3 exclusively, in case that does not exist yet, there is an alternative way to do this that uses the SARIMAX model in Statsmodels in combination with PyMC3.
There are too many details to repeat a full answer here, but basically you wrap the log-likelihood and gradient methods associated with a Statsmodels SARIMAX model. Here is a link to an example Jupyter notebook that shows how to do this:
https://www.statsmodels.org/stable/examples/notebooks/generated/statespace_sarimax_pymc3.html
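If a PyMC3-only route to the seasonal part is still wanted, here is a rough sketch of my own (not from the linked notebook; it assumes data is a zero-mean 1-D numpy array and simply conditions on the first 26 observations) that expands the multiplicative (2,0,0)(2,0,0,12) AR polynomial by hand:

import pymc3 as pm

s = 12           # seasonal period
max_lag = 26     # 2 + 2 * 12, the longest lag in the expanded polynomial

def lagged(k):
    # slice of data lagged by k steps, aligned with data[max_lag:]
    return data[max_lag - k: len(data) - k]

with pm.Model() as seasonal_ar:
    phi = pm.Normal("phi", 0.0, 1.0, shape=2)     # non-seasonal AR coefficients
    Phi = pm.Normal("Phi", 0.0, 1.0, shape=2)     # seasonal AR coefficients
    sigma = pm.HalfNormal("sigma", 3)

    # (1 - phi1 B - phi2 B^2)(1 - Phi1 B^12 - Phi2 B^24) y_t = eps_t, expanded:
    mu = (
        phi[0] * lagged(1) + phi[1] * lagged(2)
        + Phi[0] * lagged(s) + Phi[1] * lagged(2 * s)
        - phi[0] * Phi[0] * lagged(1 + s) - phi[1] * Phi[0] * lagged(2 + s)
        - phi[0] * Phi[1] * lagged(1 + 2 * s) - phi[1] * Phi[1] * lagged(2 + 2 * s)
    )

    pm.Normal("y", mu=mu, sigma=sigma, observed=data[max_lag:])
    trace = pm.sample(1000, tune=2000, random_seed=13)

This is only the conditional likelihood of a pure seasonal AR; the Statsmodels wrapper described above handles the full state-space likelihood (and MA or differencing terms) for you.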
I'm not sure if you'll still need it, however, expanding on cfulton's answer, here is how to fix the error in the statsmodels example (https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_pymc3.html, cell 8):
import theano.tensor as tt
import pymc3 as pm

with pm.Model():
    # Priors
    arL1 = pm.Uniform('ar.L1', -0.99, 0.99)
    maL1 = pm.Uniform('ma.L1', -0.99, 0.99)
    sigma2 = pm.InverseGamma('sigma2', 2, 4)

    # convert variables to a tensor vector
    # # this is wrong:
    # theta = tt.as_tensor_variable([arL1, maL1, sigma2])
    # # this is correct:
    theta = tt.as_tensor_variable([arL1, maL1, sigma2], 'v')

    # use a DensityDist (use a lambda function to "call" the Op)
    # # this is wrong:
    # pm.DensityDist('likelihood', lambda v: loglike(v), observed={'v': theta})
    # # this is correct:
    pm.DensityDist('likelihood', lambda v: loglike(v), observed=theta)

    # Draw samples
    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, cores=4)
I'm no pymc3/theano expert, but I think the error means that Theano has failed to associate the tensor's name with the values. If you define the name along with the values right at the beginning, it works.
I know it's not a direct answer to your question. Nevertheless, I hope it helps.

Reduce the output layer size from XLTransformers

I'm running the following using the huggingface implementation:
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

t1 = "My example sentence is really great."
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
encoded_input = tokenizer(t1, return_tensors='pt', add_space_before_punct_symbol=True)
output = model(**encoded_input)
tmp = output[0].detach().numpy()
print(tmp.shape)
>>> (1, 7, 267735)
with the goal of getting output embeddings that I'll use downstream.
The last dimension is substantially larger than I expected, and it looks like it is the size of the entire vocabulary (vocab_size) rather than a reduction based on the ECL from the paper (which potentially I am misinterpreting).
What argument would I provide the model to reduce this layer size to a smaller dimensional space, something more like basic BERT at 400 or 768, and still obtain good performance based on the pretrained embeddings?
That's because you used ...LMHeadModel, which predicts the next token. You can use TransfoXLModel.from_pretrained("transfo-xl-wt103") instead; then output[0] is the last hidden state, which has the shape (batch_size, sequence_length, hidden_size).
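A minimal sketch of that change, reusing the setup from the question (the exact hidden_size depends on the checkpoint's d_model):

from transformers import TransfoXLTokenizer, TransfoXLModel

t1 = "My example sentence is really great."

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLModel.from_pretrained('transfo-xl-wt103')   # no LM head on top

encoded_input = tokenizer(t1, return_tensors='pt', add_space_before_punct_symbol=True)
output = model(**encoded_input)

hidden = output[0].detach().numpy()
print(hidden.shape)   # (1, 7, hidden_size) rather than (1, 7, 267735)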

Even an image in data set used to train is giving opposite values when making prediction

I am new to ML and TensorFlow. I am trying to build a CNN to categorize good images against corrupted images, similar to the rock-paper-scissors tutorial in TensorFlow, except with only two categories.
The Colab Notebook
Model Architecture
train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(150, 150),
    class_mode='categorical'
)

validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(150, 150),
    class_mode='categorical'
)

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 150x150 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

model.summary()

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

history = model.fit_generator(train_generator, epochs=25, validation_data=validation_generator, verbose=1)

model.save("rps.h5")
The only changes I made were turning the input shape from (150,150,1) to (150,150,3) and changing the last layer's output from 3 neurons to 2. The training consistently gave me an accuracy above 90 for a data set of 600 images in each class. But when I make a prediction using the code in the tutorial, it gives me highly wrong values, even for images from the data set.
PREDICTION
Original code in TensorFlow tutorial
for fn in onlyfiles:
    path = fn
    img = image.load_img(path, target_size=(150, 150, 3))  # changed target_size to (150, 150, 3) from (150, 150)
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(fn)
    print(classes)
I changed target_size from (150, 150) to (150, 150, 3) in the belief that it was needed since my input is a 3-channel image.
Result
It gives very wrong values, [0,1][0,1], even for images that are in the data set.
But when I changed the code to this
for fn in onlyfiles:
    path = fn
    img = image.load_img(path, target_size=(150, 150, 3))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x /= 255.
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(fn)
    print(classes)
In this case the values come out like
[[9.9999774e-01 2.2242968e-06]]
[[9.9999785e-01 2.1864464e-06]]
[[9.9999785e-01 2.1641024e-06]]
There are one or two errors, but it is mostly correct.
So my question is: even though the last activation is softmax, why is the output coming out as decimal values? Is there any logical mistake in the way I am making predictions? I tried binary mode also, but couldn't find much difference.
Please note -
When you change the number of output classes from 2 to 3, you are asking the model to categorise into 3 classes. This would contradict your problem statement, which separates good and corrupted images, i.e. 2 output classes (a binary problem). I think it can be reversed from 3 to 2, if I have understood the question correctly.
Second, the output you are getting is perfectly correct: neural network models output probabilities instead of absolute class values like 0 or 1. The probability tells you how likely the input is to belong to, say, class 0 or class 1.
Also, as mentioned above by @BBloggsbott, you just have to use np.argmax on the output array, which returns the index of the most likely class.
Hope this helps.
Thanks.
Softmax returns probability distributions for the vector it gets as input. So, the fact that you are getting decimal values is not a problem. If you want to find the exact class each image belongs to, try using the argmax function on the predictions.
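To make that concrete, here is a small sketch (my own, assuming the training generators used rescale=1./255, which is why dividing by 255 at prediction time matters) of turning the softmax output into a hard class label:

import numpy as np
from tensorflow.keras.preprocessing import image

img = image.load_img(path, target_size=(150, 150))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x /= 255.                                      # match the training-time rescaling

probs = model.predict(x)                       # e.g. [[9.9999774e-01 2.2242968e-06]]
predicted_class = np.argmax(probs, axis=-1)    # index of the most likely class
print(predicted_class)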

How to setup my function with blockproc to process the image in parts?

I have an image:
I want to divide this image into 3 equal parts and calculate the SIFT for each part individually and then concatenate the results.
I found out that MATLAB's blockproc does just that, but I do not know how to get it to work with my function. Here is what I have:
[r c] = size(image);
c_new = floor(c/3); % round it
B = blockproc(image, [r c_new], @block_fun)
So, according to MATLAB's documentation, the function block_fun will be applied to the original image in blocks of size [r c_new].
This is what I wrote as block_fun:
function feats = block_fun(img)
    [keypoints, descriptors] = vl_sift(single(img));
    feats = descriptors;
end
So my matrix B should be a concatenation of the SIFT descriptors of all three parts of the same image, right?
But the error that I get when I run the command:
B = blockproc(image, [r c_new], @block_fun)
Function BLOCKPROC encountered an error while evaluating the user
supplied function handle, FUN.
The cause of the error was:
Error using single Conversion to single from struct is not possible.
For your custom function, blockproc sends in a structure where the image data is stored in a field called data. As such, you simply need to change your function so that it accesses the data field in the input. Like so:
function feats = block_fun(block_struct) %// Change
    [keypoints, descriptors] = vl_sift(single(block_struct.data)); %// Change
    feats = descriptors;
end
This error is caused by the fact that the function that is called via its handle by blockproc expects a block struct.
The real problem is that blockproc will attempt to concatenate all the results, and you will have a different set of 128xN feature vectors for each block, which blockproc doesn't allow.
I think that using im2col and reshape would be much simpler.
