Keras validation loss and metric inconsistent

An RNN model used for regression, cf. Chollet, Deep Learning with Python, section 6.3.1, "A temperature-forecasting problem".
In this example I used random data for both the regressors and the regressand.
I used the mean absolute error both as the loss function and as a metric.
I do not understand the values I get for val_loss and val_mean_absolute_error; neither of them makes sense to me.
Code:
import tensorflow as tf
import numpy as np
from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam
import keras
I use random input data:
data_np = np.random.rand(6400,10)
target_np = np.random.rand(6400,)
Normalizing the data:
mean1 = data_np[:].mean(axis=0)
std1 = data_np[:].std(axis=0)
data_np -= mean1
data_np /= std1
mean2 = target_np.mean(axis=0)
std2 = target_np.std(axis=0)
target_np -= mean2
target_np /= std2
Create RNN input with lookback:
lookback = 7
train_data = np.array([data_np[(i-lookback):i,:] for i in range(lookback,len(data_np))])
target_data = target_np[lookback:len(data_np)]
And then set up a simple RNN:
model = Sequential()
model.add(layers.SimpleRNN(64,
                           activation='relu',
                           return_sequences=False,
                           input_shape=(train_data.shape[1], train_data.shape[2])))
model.add(layers.Dense(1))
opt = Adam(learning_rate=0.1)
mae = tf.keras.losses.MeanAbsoluteError()
model.compile(optimizer=opt, loss=mae, metrics=[mae])
history = model.fit(train_data, target_data,
                    steps_per_epoch=round(0.7*len(train_data))//64,
                    epochs=10,
                    shuffle=False,
                    validation_split=0.3,
                    validation_steps=round(0.3*len(train_data))//64,
                    verbose=1)
The output then looks like this:
Train on 3495 samples, validate on 1498 samples
Epoch 1/10
54/54 [==============================] - 2s 38ms/step - loss: 0.7955 - mean_absolute_error: 0.7955 - val_loss: 0.0428 - val_mean_absolute_error: 22.6301
Epoch 2/10
54/54 [==============================] - 2s 30ms/step - loss: 0.7152 - mean_absolute_error: 0.7152 - val_loss: 0.0421 - val_mean_absolute_error: 22.2968
I would expect val_loss and val_mean_absolute_error to be the same. Moreover, the levels don't make much sense either. After 10 epochs, I get
Epoch 10/10
54/54 [==============================] - 2s 32ms/step - loss: 0.7747 - mean_absolute_error: 0.7747 - val_loss: 0.0409 - val_mean_absolute_error: 21.6337
If I calculate the mean absolute error manually:
N=len(data_np)
val_data = np.array([data_np[(i-lookback):i,:] for i in range(round(0.7*N),N)])
val_target = target_np[round(0.7*N):N]
model_output = model.predict(val_data)
model_output=[output[0] for output in model_output]
np.mean(abs(model_output-val_target))
0.940300949276649
This looks like a result that one could expect. However, it is not even close to either val_loss or val_mean_absolute_error. What is wrong here?

OK. I managed to solve the issue by consistently using tensorflow.keras. So, replacing
import tensorflow as tf
import numpy as np
from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam
import keras
with
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
import tensorflow.keras
(and corrected a couple of details in the original question)
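For reference, a minimal sketch of the fully consistent setup (assuming the same data preparation as above; steps_per_epoch and validation_steps are left out for simplicity):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(layers.SimpleRNN(64,
                           activation='relu',
                           return_sequences=False,
                           input_shape=(train_data.shape[1], train_data.shape[2])))
model.add(layers.Dense(1))

# using the string identifier 'mae' for both loss and metric keeps the two values comparable
model.compile(optimizer=Adam(learning_rate=0.1), loss='mae', metrics=['mae'])

history = model.fit(train_data, target_data,
                    epochs=10,
                    shuffle=False,
                    validation_split=0.3,
                    verbose=1)

With everything coming from tensorflow.keras, the validation loss and the validation MAE metric should track each other, and both should agree with a manual np.mean(np.abs(...)) check on the held-out slice.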


Forecasting validation loss fluctuation

I have a question for those who have some experience with time-series forecasting.
I have been experimenting with this field for a few weeks, trying to forecast some time series with both ARIMA and LSTM models so that I can compare the results.
Basically, I plotted the graph in Figure 1, which has 4 panels:
Top left: ARIMA training data points and fitted model points.
Top right: ARIMA test and forecast points.
Bottom left: LSTM training data and fitted data (I could not really find fitted points for the LSTM, so I just forecasted the training data; you can ignore that part).
Bottom right: Test and forecast data for the LSTM model.
This graph was acceptable, and I also computed the RMSE and MSE; the LSTM gave the lower error, which agrees with most of the literature online stating the superiority of LSTM over ARIMA models.
However, after I plotted the loss and validation loss of the LSTM model to get more insight, I noticed that the validation loss follows a weird fluctuating pattern (Figure 2).
I can explain this as follows: the time series has a lot of outliers and abnormal behaviour, so splitting it into train/validation/test means the validation set cannot really be a good measure of how well the model learns (a small sketch of one way to probe this idea follows the code below).
But since research papers never show this graph or discuss this problem, I don't have a solid argument to defend the idea.
What do you think?
Thank you in advance.
import warnings
import time
import random as rn

import numpy as np
from numpy import array
import pandas as pd
from pandas import Series, DataFrame
from pandas.plotting import register_matplotlib_converters
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_percentage_error

import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import STL, seasonal_decompose
import pmdarima as pm

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import initializers
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional
from keras.preprocessing.sequence import TimeseriesGenerator
import keras_tuner as kt

register_matplotlib_converters()
print(tf.__version__)

np.random.seed(123)
rn.seed(123)
tf.random.set_seed(123)
tf.keras.utils.set_random_seed(123)
keras.utils.set_random_seed(123)
warnings.filterwarnings('ignore')
df3 = pd.read_csv('favorita_train.csv')
## 1 - Get TS and do STL
print("TS lenbgth : "+str(len(df3)))
results = seasonal_decompose(df3['unit_sales'],period=30)
results.plot();
train_all = df3.iloc[:int(len(df3)*0.8)]
train = df3.iloc[:int(len(df3)*0.6)]
val = df3.iloc[int(len(df3)*0.6):int(len(df3)*0.8)]
test = df3.iloc[int(len(df3)*0.8):]
scaler = MinMaxScaler()
scaler.fit(train_all)
scaled_all = scaler.transform(df3)
scaled_train = scaler.transform(train)
scaled_train_all = scaler.transform(train_all)
scaled_val = scaler.transform(val)
scaled_test = scaler.transform(test)
# We do the same thing, but now instead for 12 months
n_features = 1
n_input = 5
train_generator_all = TimeseriesGenerator(scaled_train_all, scaled_train_all, length=n_input, batch_size=1,shuffle=True)
train_generator = TimeseriesGenerator(scaled_train, scaled_train, length=n_input, batch_size=1,shuffle=True)
val_generator = TimeseriesGenerator(scaled_val, scaled_val, length=n_input, batch_size=1,shuffle=True)
adfPValue = adfuller(scaled_all)
adfPValue=adfPValue[1]
adi = len(scaled_all)/((scaled_all != 0).sum())
sd=scaled_all.std()
mean=scaled_all.mean()
cv2 = np.square(sd/mean)
print("CV2 (describe magnitude of demande variability <0.5 is good) :"+str(cv2))
print("SD (-2,2 is good | mean data variance is low) :"+str(sd))
print("ADI (1.3 or smaller means smooth ts) :"+str(adi))
print("Stationarity test (stationary if <0.05) :"+str(adfPValue))
def model_builder(hp):
    model = keras.Sequential()
    hp_units = hp.Int('units', min_value=1, max_value=50, step=1)
    hp_layers = hp.Int('layers', min_value=1, max_value=3, step=1)
    if hp_layers == 1:
        model.add(Bidirectional(LSTM(hp_units, activation='relu'), input_shape=(n_input, n_features)))
    elif hp_layers == 2:
        model.add(Bidirectional(LSTM(hp_units, activation='relu', return_sequences=True), input_shape=(n_input, n_features)))
        model.add(Bidirectional(LSTM(hp_units, activation='relu')))
    else:
        model.add(Bidirectional(LSTM(hp_units, activation='relu', return_sequences=True), input_shape=(n_input, n_features)))
        for i in range(hp_layers-2):
            model.add(Bidirectional(LSTM(hp_units, activation='relu', return_sequences=True)))
        model.add(Bidirectional(LSTM(hp_units, activation='relu')))
    model.add(Dense(1))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate), loss='mse', metrics=['accuracy'])
    return model
tuner = kt.Hyperband(model_builder,
                     objective='val_loss',
                     max_epochs=300,
                     factor=3,
                     directory='499',
                     project_name='949',
                     seed=123)
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=30)
tuner.search(train_generator, epochs=300, validation_data=val_generator, shuffle=True, callbacks=[stop_early], batch_size=len(train_generator))
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.get('units'))
print(best_hps.get('layers'))
print(best_hps.get('window'))
print(best_hps.get('learning_rate'))
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
model = tuner.hypermodel.build(best_hps)
history = model.fit(img_train, label_train, epochs=50, validation_split=0.2)
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))
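As a minimal, model-free sketch of how the outlier hypothesis above could be probed (this is not part of the original script; it reuses the scaled_all array defined earlier and sklearn's TimeSeriesSplit), one can look at how much a naive baseline error varies across successive validation windows:

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(scaled_all)):
    val = scaled_all[val_idx, 0]
    # naive one-step-ahead baseline: predict the previous value
    naive_mse = np.mean((val[1:] - val[:-1]) ** 2)
    print(f"fold {fold}: naive-baseline validation MSE = {naive_mse:.4f}")

If even this baseline swings a lot from fold to fold, the fluctuating val_loss says more about where the validation window falls (outliers, regime changes) than about the LSTM itself.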

GEKKO: MHE load data of previous cycle

I am developing a model predictive controller (MPC) with a moving horizon estimation (MHE) plugin for a dynamic simulation program.
My problem is that the simulation program executes the Python script in each timestep, so a new GEKKO model is produced every timestep. Is there a way to reload the model and the data files, for example by giving GEKKO the path to the data?
Best Regards,
Moritz
Try using a Pickle file to store the Gekko model. If the Gekko model archive exists then it is read back into Python.
from os.path import exists
import pickle
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as plt
if exists('m.pkl'):
    # load model from subsequent call
    m = pickle.load(open('m.pkl','rb'))
    m.solve()
else:
    # define model the first time
    m = GEKKO()
    m.time = np.linspace(0,20,41)
    m.p = m.MV(value=0, lb=0, ub=1)
    m.v = m.CV(value=0)
    m.Equation(5*m.v.dt() == -m.v + 10*m.p)
    m.options.IMODE = 6
    m.p.STATUS = 1; m.p.DCOST = 1e-3
    m.v.STATUS = 1; m.v.SP = 40; m.v.TAU = 5
    m.options.CV_TYPE = 2
    m.solve()
pickle.dump(m,open('m.pkl','wb'))
plt.figure()
plt.subplot(2,1,1)
plt.plot(m.time,m.p.value,'b-',lw=2)
plt.ylabel('gas')
plt.subplot(2,1,2)
plt.plot(m.time,m.v.value,'r--',lw=2)
plt.ylabel('velocity')
plt.xlabel('time')
plt.show()
Each cycle of the controller, the plot updates with the automatic time-shift of the initial condition.
This is similar to what happens in a loop with a combined MHE and MPC. As long as you include everything in the Pickle file, it should reload on the next cycle.
Here is the example code for MHE and MPC.
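As a sketch of how the per-timestep call from the simulation program might look (this is an assumption about the surrounding loop, not part of the original answer; FSTATUS, MEAS and NEWVAL are the standard Gekko options for feeding a measurement and reading back the next move):

from os.path import exists
import pickle
import numpy as np
from gekko import GEKKO

def mpc_step(measured_velocity):
    # rebuild or reload the controller, as in the answer above
    if exists('m.pkl'):
        m = pickle.load(open('m.pkl','rb'))
    else:
        m = GEKKO()
        m.time = np.linspace(0,20,41)
        m.p = m.MV(value=0, lb=0, ub=1)
        m.v = m.CV(value=0)
        m.Equation(5*m.v.dt() == -m.v + 10*m.p)
        m.options.IMODE = 6
        m.p.STATUS = 1; m.p.DCOST = 1e-3
        m.v.STATUS = 1; m.v.SP = 40; m.v.TAU = 5
        m.v.FSTATUS = 1              # accept a measurement each cycle
        m.options.CV_TYPE = 2
    m.v.MEAS = measured_velocity     # feed the new measurement from the simulator
    m.solve(disp=False)
    pickle.dump(m, open('m.pkl','wb'))
    return m.p.NEWVAL                # first optimized move of the manipulated variable

# the dynamic simulation program would call this once per timestep, e.g.
# u = mpc_step(current_velocity)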

Two StatsModels modules have totally different 'end-runs'

I'm running StatsModels to estimate the parameters of a multiple regression model, using county-level data for 3085 counties. When I use statsmodels.formula.api and drop a few rows from the data, I get the desired results. All seems well enough.
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
%matplotlib inline
from statsmodels.compat import lzip
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
eg=pd.read_csv(r'C:/Users/user/anaconda3/une_edu_pipc_06.csv')
pd.options.display.precision = 3
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
sm_col = eg["lt_hsd_17"] + eg["hsd_17"]
eg["ut_hsd_17"] = sm_col
sm_col2 = eg["sm_col_17"] + eg["col_17"]
eg["bnd_hsd_17"] = sm_col2
eg["d_09"]= eg["Rate_09"]-eg["Rate_06"]
eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
res = sm.ols(formula = "Rate_18 ~ p_c_inc_18 + ut_hsd_17 + d_10 + inc_2",
             data=eg, missing='drop').fit()
print(res.summary())
(BTW, eg["p_c_inc_18"]is per-capita income, and inc_2 is p_c_inc_18 squarred).
But when I wish to use import statsmodels.api as smas the module, everything else staying pretty much the same, and run the following code after all appropriate preliminaries,
inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
y = eg["Rate_18"]
X = sm.add_constant(X)
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
then things fall apart, and the Python interpreter throws an error, as follows:
[......]
KeyError: "['inc_2'] not in index"
BTW, the only difference between the two 'runs' is that 15 rows are dropped during the first, successful, model run, while I don't as yet know how to drop missing rows from the second model formulation. Could that difference be responsible for why the second run fails? (I chose to omit large parts of the error message, to reduce clutter.)
You need to assign inc_2 as a column of your DataFrame.
inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
should be
eg["inc_2"] = eg["p_c_inc_18"]*eg["p_c_inc_18"]

Not able to fit a function with scipy.optimize.curve_fit()

I would like to fit the following function:
def invlaplace_stehfest2(time,EL,tau):
    upsilon=0.25
    pmax=6.9
    E0=0.0154
    M=8
    results=[]
    for t in time:
        func=0
        for k in range(1,2*M+1):
            SUM=0
            for j in range(int(math.floor((k+1)/2)),min(k,M)+1):
                dummy=j**(M+1)*scipy.special.binom(M,j)*scipy.special.binom(2*j,j)*scipy.special.binom(j,k-j)/math.factorial(M)
                SUM+=dummy
            s=k*math.log(2)/t
            func+=(-1)**(M+k)*SUM*pmax*EL/(mp.exp(tau*s)*mp.expint(1,tau*s)*E0+EL)/s
        func=func*math.log(2)/t
        results.append(func)
    return [float(i) for i in results]
To do so I use the following data:
data_time=np.array([69.0,99.0,139.0,179.0,219.0,259.0,295.5,299.03])
data_relax=np.array([6.2536,6.1652,6.0844,6.0253,5.9782,5.9404,5.9104,5.9066])
With the following guess:
guess=np.array([0.1,0.05])
And scipy.optimize.curve_fit() as follows:
Parameter,Covariance=scipy.optimize.curve_fit(invlaplace_stehfest2,data_time,data_relax,guess)
For a reason that I don't understand, I am not able to fit the data correctly. I get the following graph.
Bad fitting
My function is undoubtedly correct since when I use the correct guess:
guess=np.array([0.33226685047281592707364253044085038793404361200072,8.6682623502960394383501102909774397295654841654769])
I am able to fit my data correctly.
Expected fitting
Any hint on why I am not able to fit correctly? Should I use another method?
Here is the whole program:
##############################################
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.pylab as mplab
import math
from math import *
import numpy as np
import scipy
from scipy.optimize import curve_fit
import mpmath as mp
############################################################################################
def invlaplace_stehfest2(time,EL,tau):
    upsilon=0.25
    pmax=6.9
    E0=0.0154
    M=8
    results=[]
    for t in time:
        func=0
        for k in range(1,2*M+1):
            SUM=0
            for j in range(int(math.floor((k+1)/2)),min(k,M)+1):
                dummy=j**(M+1)*scipy.special.binom(M,j)*scipy.special.binom(2*j,j)*scipy.special.binom(j,k-j)/math.factorial(M)
                SUM+=dummy
            s=k*math.log(2)/t
            func+=(-1)**(M+k)*SUM*pmax*EL/(mp.exp(tau*s)*mp.expint(1,tau*s)*E0+EL)/s
        func=func*math.log(2)/t
        results.append(func)
    return [float(i) for i in results]
############################################################################################
###Constant###
####Value####
data_time=np.array([69.0,99.0,139.0,179.0,219.0,259.0,295.5,299.03])
data_relax=np.array([6.2536,6.1652,6.0844,6.0253,5.9782,5.9404,5.9104,5.9066])
###Fitting###
guess=np.array([0.33226685047281592707364253044085038793404361200072,8.6682623502960394383501102909774397295654841654769])
#guess=np.array([0.1,0.05])
Parameter,Covariance=scipy.optimize.curve_fit(invlaplace_stehfest2,data_time,data_relax,guess)
print(Parameter)
residu=sum(data_relax-invlaplace_stehfest2(data_time,Parameter[0],Parameter[1]))
Graph_Curves=plt.figure()
ax = Graph_Curves.add_subplot(111)
ax.plot(data_time,invlaplace_stehfest2(data_time,Parameter[0],Parameter[1]),"-")
ax.plot(data_time,data_relax,"o")
plt.show()
Non-linear fitters such as the default Levenberg-Marquardt solver used in scipy.optimize.curve_fit(), like most iterative solvers, can stop in a local minimum in error space. If error space is "bumpy" then initial parameter estimates become very important, as in this case.
Scipy has added the Differential Evolution genetic algorithm to the optimize module, which can be used to determine initial parameter estimates for curve_fit(). Scipy's implementation uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, requiring parameter upper and lower bounds within which to search. As you can see below, I have used this scipy module to replace the hard-coded values for the value named "guess" in your code. This does not run quickly, but a somewhat slower correct result is much better than a fast wrong result. Try this code:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.pylab as mplab
import math
from math import *
import numpy as np
import scipy
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import mpmath as mp
############################################################################################
def invlaplace_stehfest2(time,EL,tau):
    upsilon=0.25
    pmax=6.9
    E0=0.0154
    M=8
    results=[]
    for t in time:
        func=0
        for k in range(1,2*M+1):
            SUM=0
            for j in range(int(math.floor((k+1)/2)),min(k,M)+1):
                dummy=j**(M+1)*scipy.special.binom(M,j)*scipy.special.binom(2*j,j)*scipy.special.binom(j,k-j)/math.factorial(M)
                SUM+=dummy
            s=k*math.log(2)/t
            func+=(-1)**(M+k)*SUM*pmax*EL/(mp.exp(tau*s)*mp.expint(1,tau*s)*E0+EL)/s
        func=func*math.log(2)/t
        results.append(func)
    return [float(i) for i in results]
############################################################################################
###Constant###
####Value####
data_time=np.array([69.0,99.0,139.0,179.0,219.0,259.0,295.5,299.03])
data_relax=np.array([6.2536,6.1652,6.0844,6.0253,5.9782,5.9404,5.9104,5.9066])
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    return np.sum((data_relax - invlaplace_stehfest2(data_time, *parameterTuple)) ** 2)
###Fitting###
#guess=np.array([0.33226685047281592707364253044085038793404361200072,8.6682623502960394383501102909774397295654841654769])
#guess=np.array([0.1,0.05])
parameterBounds = [[0.0, 1.0], [0.0, 10.0]]
# "seed" the numpy random number generator for repeatable results
# note the ".x" here to return only the parameter estimates in this example
guess = differential_evolution(sumOfSquaredError, parameterBounds, seed=3).x
Parameter,Covariance=scipy.optimize.curve_fit(invlaplace_stehfest2,data_time,data_relax,guess)
print(Parameter)
residu=sum(data_relax-invlaplace_stehfest2(data_time,Parameter[0],Parameter[1]))
Graph_Curves=plt.figure()
ax = Graph_Curves.add_subplot(111)
ax.plot(data_time,invlaplace_stehfest2(data_time,Parameter[0],Parameter[1]),"-")
ax.plot(data_time,data_relax,"o")
plt.show()
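If reasonable parameter ranges are known in advance, another option (not used in the answer above, just a sketch) is to pass the same bounds directly to curve_fit, which then switches to a bounded trust-region solver:

# same bounds as supplied to the genetic algorithm above
lower = [0.0, 0.0]
upper = [1.0, 10.0]
Parameter, Covariance = curve_fit(invlaplace_stehfest2, data_time, data_relax,
                                  p0=guess, bounds=(lower, upper))
print(Parameter)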

Scikit learn algorithms performing extremely poorly

I'm new to scikit-learn and I'm banging my head against the wall. I've used both real-world and test data, and the scikit-learn algorithms are not performing above chance level at predicting anything. I've tried knn, decision trees, svc and naive bayes.
Basically, I made a test data set consisting of a column of 0s and 1s, with all the 0s having a feature value between 0 and .5 and all the 1s having a feature value between .5 and 1. This should be extremely easy and give near 100% accuracy. However, none of the algorithms perform above chance level; accuracies range from 45 to 55%. I've already tried tweaking a whole bunch of parameters for every algorithm, but nothing helps. I think something is fundamentally wrong with my implementation.
Please help me out. Here's my code:
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation in older scikit-learn versions
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score
import sklearn
import pandas
import numpy as np
df=pandas.read_excel('Test.xlsx')
# Make data into np arrays
y = np.array(df[1])
y=y.astype(float)
y=y.reshape(399)
x = np.array(df[2])
x=x.astype(float)
x=x.reshape(399, 1)
# Creating training and test data
labels_train, labels_test = train_test_split(y)
features_train, features_test = train_test_split(x)
#####################################################################
# PERCEPTRON
#####################################################################
from sklearn import linear_model
perceptron=linear_model.Perceptron()
perceptron.fit(features_train, labels_train)
perc_pred=perceptron.predict(features_test)
print(sklearn.metrics.accuracy_score(labels_test, perc_pred, normalize=True, sample_weight=None))
print('perceptron')
#####################################################################
# KNN classifier
#####################################################################
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(features_train, labels_train)
knn_pred = knn.predict(features_test)
# Accuracy
print(sklearn.metrics.accuracy_score(labels_test, knn_pred, normalize=True, sample_weight=None))
print('knn')
#####################################################################
## SVC
#####################################################################
from sklearn.svm import SVC
from sklearn import svm
svm2 = SVC(kernel="linear")
svm2 = svm.SVC()  # note: this overwrites the linear-kernel SVC defined on the previous line
svm2.fit(features_train, labels_train)
svc_pred = svm2.predict(features_test)
print(sklearn.metrics.accuracy_score(labels_test, svc_pred, normalize=True, sample_weight=None))
#####################################################################
# Decision tree
#####################################################################
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features_train, labels_train)
tree_pred=clf.predict(features_test)
# Accuracy
print(sklearn.metrics.accuracy_score(labels_test, tree_pred, normalize=True, sample_weight=None))
print('tree')
#####################################################################
# Naive bayes
#####################################################################
import sklearn
from sklearn.naive_bayes import GaussianNB
from time import time  # needed for the training-time print below
clf = GaussianNB()
t0 = time()
clf.fit(features_train, labels_train)
print("training time:", round(time()-t0, 3), "s")
bayes_pred = clf.predict(features_test)
print(sklearn.metrics.accuracy_score(labels_test, bayes_pred, normalize=True, sample_weight=None))
You seem to use train_test_split the wrong way.
labels_train, labels_test = train_test_split(y) #WRONG
features_train, features_test = train_test_split(x) #WRONG
The splits of your labels and your data aren't necessarily the same, because each call shuffles independently. One easy way to split your data manually:
randomvec=np.random.rand(len(data))
randomvec=randomvec>0.5
train_data=data[randomvec]
train_label=labels[randomvec]
test_data=data[np.logical_not(randomvec)]
test_label=labels[np.logical_not(randomvec)]
or to use the scikit method properly:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
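To illustrate the point, here is a minimal end-to-end sketch (not from the original answer) with synthetic data like that described in the question, where label 0 maps to a feature in [0, 0.5) and label 1 to a feature in [0.5, 1); with a single, aligned train_test_split call, any of the classifiers above should score close to 100%:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(42)
y = rng.randint(0, 2, size=399).astype(float)
x = (0.5 * rng.rand(399) + 0.5 * y).reshape(-1, 1)   # feature < 0.5 for class 0, >= 0.5 for class 1

# one call keeps features and labels aligned
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)

knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
print(accuracy_score(y_test, knn.predict(x_test)))   # should be close to 1.0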
