Keras predict gives different error than evaluate, loss different from metrics - validation

I have the following problem:
I have an autoencoder in Keras, and train it for a few epochs. The training overview shows a validation MAE of 0.0422 and an MSE of 0.0024.
However, if I then call network.predict and manually calculate the validation errors, I get 0.035 and 0.0024.
One would assume that my manual calculation of the MAE is simply incorrect, but the weird thing is that if I use an identity model (simply outputs what you input) and use that to evaluate the predicted values, the same error value is returned as for my manual calculation. The code looks as follows:
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers
from keras.callbacks import EarlyStopping
import numpy as np

input = Input(shape=(X_train.shape[1], ))
encoded = Dense(50, activation='relu', activity_regularizer=regularizers.l1(10e-5))(input)
encoded = Dense(50, activation='relu', activity_regularizer=regularizers.l1(10e-5))(encoded)
encoded = Dense(50, activation='relu', activity_regularizer=regularizers.l1(10e-5))(encoded)
decoded = Dense(50, activation='relu', activity_regularizer=regularizers.l1(10e-5))(encoded)
decoded = Dense(50, activation='relu', activity_regularizer=regularizers.l1(10e-5))(decoded)
decoded = Dense(X_train.shape[1], activation='sigmoid')(decoded)
network = Model(input, decoded)
# sgd = SGD(lr=8, decay=1e-6)
# network.compile(loss='mean_squared_error', optimizer='adam')
network.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mse'])
# Fitting the data
network.fit(X_train, X_train, epochs=2, batch_size=1, shuffle=True, validation_data=(X_valid, X_valid),
callbacks=[EarlyStopping(monitor='val_loss', min_delta=0.00001, patience=20, verbose=0, mode='auto')])
# Results
recon_valid = network.predict(X_valid, batch_size=1)
score2 = network.evaluate(X_valid, X_valid, batch_size=1, verbose=0)
print('Network evaluate result: mae={}, mse={}'.format(*score2))
x = Input((X_train.shape[1],))
m = Model(x, x)
m.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mse'])
score1 = m.evaluate(recon_valid, X_valid, batch_size=1, verbose=0)
print('Identity evaluate result: mae={}, mse={}'.format(*score1))
errors_test = np.absolute(X_valid - recon_valid)
print("Manual MAE: {}".format(np.average(errors_test)))
errors_test = np.square(X_valid - recon_valid)
print("Manual MSE: {}".format(np.average(errors_test)))
Which outputs the following:
Train on 282 samples, validate on 94 samples
Epoch 1/2
2018-04-18 17:24:01.464947: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
282/282 [==============================] - 0s - loss: 0.0861 - mean_squared_error: 0.0187 - val_loss: 0.0451 - val_mean_squared_error: 0.0025
Epoch 2/2
282/282 [==============================] - 0s - loss: 0.0440 - mean_squared_error: 0.0025 - val_loss: 0.0422 - val_mean_squared_error: 0.0024
Network evaluate result: mae=0.04216482736011769, mse=0.0024067993242382767
Identity evaluate result: mae=0.03506102238563781, mse=0.0024067993242382767
Manual MAE: 0.03506102412939072
Manual MSE: 0.002406799467280507
I know that my manual calculation is correct, since the identity model (m) returns the same value. The only possible explanation for the difference in MAE values would then be if network.evaluate(X_valid, X_valid) somehow uses different values than those returned by network.predict(X_valid), but then the MSE would also be different.
This leaves me completely confused, thinking there might be a bug in the Keras MAE calculation. Has anyone had this issue before or have any ideas how it might be fixed? I am using the Tensorflow backend.
Any help would be much appreciated!
EDIT: I'm almost certain this is a bug. If I keep loss='mae' but also add metrics=['mse', 'mae'], the MAE returned by the metrics is the same as my manual computation and the identity model. The same is true for MSE: if I set loss='mse', the MSE returned by the metric is different from the loss.

It turns out that the loss is supposed to be different from the metric because of the regularization: the L1 activity regularizer adds a penalty on the layer activations to the loss (making it higher, in my case), but the metrics do not include this term. They therefore return a different value, which equals what one gets when computing the error manually.
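To see roughly what is going on, here is a hedged sketch with made-up numbers (not Keras internals and not the asker's data):
import numpy as np

# Toy illustration: the loss that fit()/evaluate() report includes the
# activity-regularization penalty; the metric and the manual calculation do not.
y_true = np.array([[0.20, 0.80], [0.50, 0.10]])
y_pred = np.array([[0.25, 0.75], [0.45, 0.15]])
activations = np.array([[1.2, 0.0, 3.4], [0.7, 2.1, 0.0]])  # stand-in for the regularized layers' outputs

mae_metric = np.mean(np.abs(y_true - y_pred))      # what metrics=['mae'] and the manual calc give
l1_penalty = 10e-5 * np.sum(np.abs(activations))   # roughly what activity_regularizer adds to the loss
reported_loss = mae_metric + l1_penalty            # higher than the metric, like 0.0422 vs 0.0351
print(mae_metric, l1_penalty, reported_loss)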

The metrics during training and validation are different for several reasons:
The dataset is different.
During training the weights are changing at every step, so the metrics change too.
The metric during training is that of the current batch of data, or a running average of the metrics of the last batches; for evaluation, the metric is computed over the whole dataset (see the sketch below).
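A toy illustration of the last point, with made-up numbers (not from the question):
import numpy as np

# Made-up per-batch MAE values: early batches are worse because the weights are
# still far from converged.
batch_mae = np.array([0.09, 0.06, 0.05, 0.04])
running_avg = np.cumsum(batch_mae) / np.arange(1, len(batch_mae) + 1)
print(running_avg[-1])  # 0.06 -- the kind of number the progress bar shows for the epoch
# evaluate() on the finished model over the whole dataset would be closer to the
# later batches (e.g. ~0.04), hence the mismatch with the training log.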

Related

Why is PyTorch inference faster when preloading all mini-batches to list?

While benchmarking different dataloaders I noticed some peculiar behavior with the PyTorch built-in dataloader. I am running the below code on a cpu-only machine with the MNIST dataset.
It seems that a simple forward pass in my model is much faster when mini-batches are preloaded to a list rather than fetched during iteration:
import torch, torchvision
import torch.nn as nn
import torchvision.transforms as T
from torch.profiler import profile, record_function, ProfilerActivity

mnist_dataset = torchvision.datasets.MNIST(root=".", train=True, transform=T.ToTensor(), download=True)
loader = torch.utils.data.DataLoader(dataset=mnist_dataset, batch_size=128, shuffle=False, pin_memory=False, num_workers=4)
model = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Linear(256, 10))
model.train()

# Variant 1: mini-batches fetched from the dataloader during iteration
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        for (images_iter, labels_iter) in loader:
            outputs_iter = model(images_iter)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# Variant 2: mini-batches preloaded into a list first
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        train_list = [sample for sample in loader]
        for (images_iter, labels_iter) in train_list:
            outputs_iter = model(images_iter)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
The subset of most interesting output from the Torch profiler is:
Name               Self CPU %   Self CPU    CPU total %   CPU total    CPU time avg   # of Calls
aten::batch_norm        0.02%   644.000us         4.57%   134.217ms      286.177us          469
Self CPU time total: 2.937s

Name               Self CPU %   Self CPU    CPU total %   CPU total    CPU time avg   # of Calls
aten::batch_norm       70.48%      6.888s        70.62%      6.902s       14.717ms          469
Self CPU time total: 9.773s
It seems that aten::batch_norm (batch normalization) takes significantly more time when the samples are not preloaded to a list, but I can't figure out why, since it should be the same operation.
The above was tested on a 4-core CPU with Python 3.8.
If anything, the preloading version should be slightly slower overall because of the overhead of creating the list.
With
torch=1.10.2+cu102
torchvision=0.11.3+cu102
I had the following results:
Self CPU time total: 2.475s
Self CPU time total: 2.800s
Try to reproduce this again using different library versions.
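If it helps, a plain wall-clock timing can cross-check the profiler numbers; this sketch assumes the loader and model defined in the question are in scope:
import time

# Time both variants with a simple timer to rule out profiler overhead as the cause.
start = time.perf_counter()
for images_iter, labels_iter in loader:
    outputs_iter = model(images_iter)
print(f"fetched during iteration: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
train_list = [sample for sample in loader]   # preload all mini-batches first
for images_iter, labels_iter in train_list:
    outputs_iter = model(images_iter)
print(f"preloaded to a list:      {time.perf_counter() - start:.3f}s")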

Dynamic number system in Qlik Sense

My data consists of large numbers. I have a column, say 'amount', and when I use it in charts (sum of amount on the Y axis) it shows something like 1.4G. I want to show the values as 2.8B if they are in billions, 80M if they are in millions, or simply 14k if they are in thousands (14,000).
I have used if(sum(amount)/1000000000 > 1, Num(sum(amount)/1000000000, '#,###B'), Num(sum(amount)/1000000, '#,###M')), but it does not show the M or B at the end of the figure, and I also don't know how to include thousands in the same expression.
EDIT: Updated to include the dual() function.
This worked for me:
=dual(
    if(sum(amount) < 1,          Num(sum(amount), '#,##0.00'),
    if(sum(amount) < 1000,       Num(sum(amount), '#,##0'),
    if(sum(amount) < 1000000,    Num(sum(amount)/1000, '#,##0k'),
    if(sum(amount) < 1000000000, Num(sum(amount)/1000000, '#,##0M'),
                                 Num(sum(amount)/1000000000, '#,##0B')
    )))),
    sum(amount)
)
Here are some example outputs using this script to format it:
=sum(amount)       Formatted
2,526,163,764      3B
79,342,364         79M
5,589,255          5M
947,470            947k
583                583
0.6434             0.64
To get more decimals for any of those, like 2.53B instead of 3B, you can format them like '#,##0.00B' by adding more zeroes at the end.
Also make sure that the Number Formatting property is set to Auto or Measure expression.
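For intuition about what dual() does here (pairing a display string with the underlying numeric value so sorting still works), a rough Python analogy of the same tiered logic, purely illustrative and not Qlik code:
def format_amount(value):
    """Return (display_text, numeric_value), roughly mirroring the dual() expression above."""
    if value < 1:
        text = f"{value:,.2f}"
    elif value < 1_000:
        text = f"{value:,.0f}"
    elif value < 1_000_000:
        text = f"{value / 1_000:,.0f}k"
    elif value < 1_000_000_000:
        text = f"{value / 1_000_000:,.0f}M"
    else:
        text = f"{value / 1_000_000_000:,.0f}B"
    return text, value

print(format_amount(2_526_163_764))  # ('3B', 2526163764)
print(format_amount(947_470))        # ('947k', 947470)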

Is it possible to vectorize annotation for matplotlib?

As a part of a large QC benchmark I am creating a large number (approx 100K) of scatter plots in a single PDF using PdfPages backend. (See further down for the code)
The issue I am having is that the plotting takes too much time, see output from a custom profiling/debugging effort:
Checkpoint1: Predictions done in 1.110076904296875 millis
Checkpoint2: df created and correlations calculated in 3.108978271484375 millis
Checkpoint3: plotting and accumulating done in 231.31990432739258 millis
Cycle completed in 0.23553895950317383 secs
----------------------
Checkpoint1: Predictions done in 3.718852996826172 millis
Checkpoint2: df created and correlations calculated in 2.353191375732422 millis
Checkpoint3: plotting and accumulating done in 155.93385696411133 millis
Cycle completed in 0.16200590133666992 secs
----------------------
Checkpoint1: Predictions done in 2.920866012573242 millis
Checkpoint2: df created and correlations calculated in 1.995086669921875 millis
Checkpoint3: plotting and accumulating done in 161.8819236755371 millis
Cycle completed in 0.16679787635803223 secs
Plotting takes 2-3x longer if I annotate the points, which is necessary for the use case. As you can see below, I have tried both itertuples() and apply(); switching to apply() did not give a significant change in the times as far as I can see.
import matplotlib.pyplot as plt

def annotate(row, ax):
    ax.annotate(row.name, (row.exp, row.model),
                xytext=(10, 20), textcoords='offset points',
                arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
                family='sans-serif', fontsize=8, color='darkslategrey')

def plot2File(df, file, seq, z, p, s):
    """ Plot predictions vs experimental """
    plttitle = f"Correlations for {seq}+{z} \n pearson={p} \n spearman={s}"
    ax = df.plot(x='exp', y='model', kind='scatter', title=plttitle, s=40)
    df.apply(annotate, ax=ax, axis=1)
    # for row in df.itertuples():
    #     ax.annotate(row.Index, (row.exp, row.model),
    #                 xytext=(10, 20), textcoords='offset points',
    #                 arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
    #                 family='sans-serif', fontsize=8, color='darkslategrey')
    plt.savefig(file, bbox_inches='tight', format='pdf')
    plt.close()
Given the nice explanation by Jeff on a question regarding iterrows(), I was wondering: is it possible to vectorize the annotation process? Or should I ditch the data frame altogether?

Vowpal Wabbit readable model weight interpretation

Recently I have been using Vowpal Wabbit for classification, and I have a question about the readable_model output.
Here is my command: vw --quiet --save_resume --compressed aeroplane.txt.gzip --loss_function=hinge --readable_model aeroplane.txt
And the readable model file as below:
Version 7.7.0
Min label:-1.000000
Max label:1.000000
bits:18
0 pairs:
0 triples:
rank:0
lda:0
0 ngram:
0 skip:
options:
:1
initial_t 0.000000
norm normalizer 116869.664062
t 3984.000051
sum_loss 2400.032932
sum_loss_since_last_dump 2400.032932
dump_interval 1.000000
min_label -1.000000
max_label 1.000000
weighted_examples 3984.000051
weighted_labels 0.000051
weighted_unlabeled_examples 0.000000
example_number 2111
total_features 1917412
0:4.879771 0.004405 0.007933
1:5.268138 0.017729 0.020223
2:0.464031 0.001313 0.007443
3:3.158707 0.083495 0.029674
4:-22.006199 0.000721 0.004386
5:7.686290 0.018617 0.011562
......
1023:0.363004 0.022025 0.020973
116060:0.059659 2122.647461 1.000000
I have 1024 features for each example, and I use i-1 as the feature name for the i-th feature.
My question is: why do I get 3 weights for each feature? Isn't it supposed to be only 1 weight? I'm new to ML and quite confused.
Simple SGD (stochastic gradient descent) needs just one parameter per feature, the weight. However, VW uses --adaptive --normalized --invariant update by default.
So in addition to the weight, it needs to store two other parameters per feature:
sum of gradients for --adaptive (AdaGrad-style update),
per-feature normalization for --normalized.
These two additional parameters are stored in the model (readable or binary) only if --save_resume is provided.
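For intuition only, here is a generic AdaGrad-style sketch (not VW's actual update rule) showing why an adaptive update has to keep a per-feature accumulator alongside the weight:
import numpy as np

num_features = 1024
w = np.zeros(num_features)          # the usual per-feature weight
g_sq_sum = np.zeros(num_features)   # extra per-feature state an adaptive update has to keep

def adagrad_step(x, grad_scalar, lr=0.5, eps=1e-8):
    # grad_scalar is dLoss/dPrediction for one example; the per-feature gradient is grad_scalar * x
    g = grad_scalar * x
    g_sq_sum[:] += g ** 2
    w[:] -= lr * g / (np.sqrt(g_sq_sum) + eps)

adagrad_step(np.random.rand(num_features), grad_scalar=0.3)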
What is strange is that if you run vw --sgd --save_resume, the readable model still contains three parameters per feature, although --sgd effectively means not-adaptive, not-normalized and not-invariant. I think this is a bug in the implementation of --readable_model.
EDIT: This last strange behavior was a bug indeed and it is fixed now.

Jmeter : Summary report : Throughput

Is the total throughput shown in the last row of the Summary Report correct? I'm using JMeter 2.11.
I find it difficult to match the displayed figure by manual calculation.
I followed the formula (x/sec): number of requests / total response time required (in sec),
or 1 / average response time (sec).
For example: 50 requests with an average response time of 2000 ms each gives throughput = 50/(50*2) = 0.5/sec.
But JMeter shows a different value than 0.5/sec (or 30/min).
Can someone help me here?
I also had a similar assumption, but this is the formula used for calculating throughput:
endTime = lastSampleStartTime + lastSampleLoadTime
startTime = firstSampleStartTime
conversion = unit time conversion value
Throughput = numRequests / ((endTime - startTime) * conversion)
(I got this a few months back from the answer below.)
Calculating throughput from Jmeter jtl log file
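A small sketch of that formula with made-up sample data (not JMeter code); it shows why the result differs from 1 / average response time whenever samples overlap across threads or there are gaps between them:
# Made-up samples: (startTime in ms, loadTime in ms)
samples = [
    (0,    2000),
    (500,  2000),   # overlaps the first sample, e.g. a second thread
    (1000, 2000),
]

num_requests = len(samples)
start_time = samples[0][0]                    # firstSampleStartTime
end_time = samples[-1][0] + samples[-1][1]    # lastSampleStartTime + lastSampleLoadTime

throughput = num_requests / ((end_time - start_time) / 1000.0)  # ms -> s conversion
print(throughput)  # 1.0/sec, while 1 / avg response time would give only 0.5/sec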
