Tensorflow - Keras, training on GPU is much slower then on CPU [duplicate]

Tensorflow - Keras, training on GPU is much slower then on CPU [duplicate] - performance

I have set up a simple linear regression problem in Tensorflow, and have created simple conda environments using Tensorflow CPU and GPU both in 1.13.1 (using CUDA 10.0 in the backend on an NVIDIA Quadro P600).
However, it looks like the GPU environment always takes longer time than the CPU environment. The code I'm running is below.
import time
import warnings
import numpy as np
import scipy
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import edward2 as ed
from tensorflow.python.ops import control_flow_ops
from tensorflow_probability import distributions as tfd
# Handy snippet to reset the global graph and global session.
def reset_g():
with warnings.catch_warnings():
warnings.simplefilter('ignore')
tf.reset_default_graph()
try:
sess.close()
except:
pass
N = 35000
inttest = np.ones(N).reshape(N, 1)
stddev_raw = 0.09
true_int = 1.
true_b1 = 0.15
true_b2 = 0.7
np.random.seed(69)
X1 = (np.atleast_2d(np.linspace(
0., 2., num=N)).T).astype(np.float64)
X2 = (np.atleast_2d(np.linspace(
2., 1., num=N)).T).astype(np.float64)
Ytest = true_int + (true_b1*X1) + (true_b2*X2) + \
np.random.normal(size=N, scale=stddev_raw).reshape(N, 1)
Ytest = Ytest.reshape(N, )
X1 = X1.reshape(N, )
X2 = X2.reshape(N, )
reset_g()
# Create data and param
model_X1 = tf.placeholder(dtype=tf.float64, shape=[N, ])
model_X2 = tf.placeholder(dtype=tf.float64, shape=[N, ])
model_Y = tf.placeholder(dtype=tf.float64, shape=[N, ])
alpha = tf.get_variable(shape=[1], name='alpha', dtype=tf.float64)
# these two params need shape of one if using trainable distro
beta1 = tf.get_variable(shape=[1], name='beta1', dtype=tf.float64)
beta2 = tf.get_variable(shape=[1], name='beta2', dtype=tf.float64)
# Yhat
tf_pred = (tf.multiply(model_X1, beta1) + tf.multiply(model_X2, beta2) + alpha)
# # Make difference of squares
# resid = tf.square(model_Y - tf_pred)
# loss = tf.reduce_sum(resid)
# # Make a Likelihood function based on simple stuff
stddev = tf.square(tf.get_variable(shape=[1],
name='stddev', dtype=tf.float64))
covar = tfd.Normal(loc=model_Y, scale=stddev)
loss = -1.0*tf.reduce_sum(covar.log_prob(tf_pred))
# Trainer
lr=0.005
N_ITER = 20000
opt = tf.train.AdamOptimizer(lr, beta1=0.95, beta2=0.95)
train = opt.minimize(loss)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
start = time.time()
for step in range(N_ITER):
out_l, out_b1, out_b2, out_a, laws = sess.run([train, beta1, beta2, alpha, loss],
feed_dict={model_X1: X1,
model_X2: X2,
model_Y: Ytest})
if step % 500 == 0:
print('Step: {s}, loss = {l}, alpha = {a:.3f}, beta1 = {b1:.3f}, beta2 = {b2:.3f}'.format(
s=step, l=laws, a=out_a[0], b1=out_b1[0], b2=out_b2[0]))
print(f"True: alpha = {true_int}, beta1 = {true_b1}, beta2 = {true_b2}")
end = time.time()
print(end-start)
Here are some outputs printed if they're any indicative of what's happening:
For the CPU run:
Colocations handled automatically by placer.
2019-04-18 09:00:56.329669: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-18 09:00:56.351151: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-04-18 09:00:56.351672: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x558fefe604c0 executing computations on platform Host. Devices:
2019-04-18 09:00:56.351698: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
For the GPU run:
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0418 09:03:21.674947 139956864096064 deprecation.py:506] From /home/sadatnfs/.conda/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/training/slot_creator.py:187: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-04-18 09:03:21.712913: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-18 09:03:21.717598: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-04-18 09:03:21.951277: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1009] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-18 09:03:21.952212: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e583bc4480 executing computations on platform CUDA. Devices:
2019-04-18 09:03:21.952225: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Quadro P600, Compute Capability 6.1
2019-04-18 09:03:21.971218: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-04-18 09:03:21.971816: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e58577f290 executing computations on platform Host. Devices:
2019-04-18 09:03:21.971842: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-04-18 09:03:21.972102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1551] Found device 0 with properties:
name: Quadro P600 major: 6 minor: 1 memoryClockRate(GHz): 1.5565
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.91GiB
2019-04-18 09:03:21.972147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Adding visible gpu devices: 0
2019-04-18 09:03:21.972248: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-04-18 09:03:21.973094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1082] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-18 09:03:21.973105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1088] 0
2019-04-18 09:03:21.973110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1101] 0: N
2019-04-18 09:03:21.973279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1222] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1735 MB memory) -> physical GPU (device: 0, name: Quadro P600, pci bus id: 0000:01:00.0, compute capability: 6.1)
I am about to post another question about implementing CUBLAS in R as well because that was giving me slow speed times compared to Intel MKL, but I'm hoping that maybe there's a clear cut reason why even something as well built as TF (compared to hacky R and CUBLAS patching) is being slow with GPU.
EDIT: Following Vlad's suggestion, I wrote up the following script to try and throw some large sized objects and training it, but I think I might not be setting it up correctly because the CPU one in this case even as the size of the matrices are increasing. Any suggestions perhaps?
import time
import warnings
import numpy as np
import scipy
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import edward2 as ed
from tensorflow.python.ops import control_flow_ops
from tensorflow_probability import distributions as tfd
np.random.seed(69)
# Handy snippet to reset the global graph and global session.
def reset_g():
with warnings.catch_warnings():
warnings.simplefilter('ignore')
tf.reset_default_graph()
try:
sess.close()
except:
pass
# Loop over the different number of feature columns
for x_feat in [30, 50, 100, 1000, 10000]:
y_feat=10;
# Simulate data
N = 5000
inttest = np.ones(N).reshape(N, 1)
stddev_raw = np.random.uniform(0.01, 0.25, size=y_feat)
true_int = np.linspace(0.1 ,1., num=y_feat)
xcols = x_feat
true_bw = np.random.randn(xcols, y_feat)
true_X = np.random.randn(N, xcols)
true_errorcov = np.eye(y_feat)
np.fill_diagonal(true_errorcov, stddev_raw)
true_Y = true_int + np.matmul(true_X, true_bw) + \
np.random.multivariate_normal(mean=np.array([0 for i in range(y_feat)]),
cov=true_errorcov,
size=N)
## Our model is:
## Y = a + b*X + error where, for N=5000 observations:
## Y : 10 outputs;
## X : 30,50,100,1000,10000 features
## a, b = bias and weights
## error: just... error
# Number of iterations
N_ITER = 1001
# Training rate
lr=0.005
with tf.device('gpu'):
# Create data and weights
model_X = tf.placeholder(dtype=tf.float64, shape=[N, xcols])
model_Y = tf.placeholder(dtype=tf.float64, shape=[N, y_feat])
alpha = tf.get_variable(shape=[y_feat], name='alpha', dtype=tf.float64)
# these two params need shape of one if using trainable distro
betas = tf.get_variable(shape=[xcols, y_feat], name='beta1', dtype=tf.float64)
# Yhat
tf_pred = alpha + tf.matmul(model_X, betas)
# Make difference of squares (loss fn) [CONVERGES TO TRUTH]
resid = tf.square(model_Y - tf_pred)
loss = tf.reduce_sum(resid)
# Trainer
opt = tf.train.AdamOptimizer(lr, beta1=0.95, beta2=0.95)
train = opt.minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
start = time.time()
for step in range(N_ITER):
out_l, laws = sess.run([train, loss], feed_dict={model_X: true_X, model_Y: true_Y})
if step % 500 == 0:
print('Step: {s}, loss = {l}'.format(
s=step, l=laws))
end = time.time()
print("y_feat: {n}, x_feat: {x2}, Time elapsed: {te}".format(n = y_feat, x2 = x_feat, te = end-start))
reset_g()

As I said in a comment, the overhead of invoking GPU kernels, and copying data to and from GPU, is very high. For operations on models with very little parameters it is not worth of using GPU since frequency of CPU cores is much higher. If you compare matrix multiplication (this is what DL mostly does), you will see that for large matrices GPU outperforms CPU significantly.
Take a look at this plot. X-axis are the sizes of two square matrices and y-axis is time took to multiply those matrices on GPU and on CPU. As you can see at the beginning, for small matrices the blue line is higher, meaning that it was faster on CPU. But as we increase the size of the matrices the benefit from using GPU increases significantly.
The code to reproduce:
import tensorflow as tf
import time
cpu_times = []
sizes = [1, 10, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000, 10000]
for size in sizes:
tf.reset_default_graph()
start = time.time()
with tf.device('cpu:0'):
v1 = tf.Variable(tf.random_normal((size, size)))
v2 = tf.Variable(tf.random_normal((size, size)))
op = tf.matmul(v1, v2)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
sess.run(op)
cpu_times.append(time.time() - start)
print('cpu time took: {0:.4f}'.format(time.time() - start))
import tensorflow as tf
import time
gpu_times = []
for size in sizes:
tf.reset_default_graph()
start = time.time()
with tf.device('gpu:0'):
v1 = tf.Variable(tf.random_normal((size, size)))
v2 = tf.Variable(tf.random_normal((size, size)))
op = tf.matmul(v1, v2)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
sess.run(op)
gpu_times.append(time.time() - start)
print('gpu time took: {0:.4f}'.format(time.time() - start))
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(sizes, gpu_times, label='GPU')
ax.plot(sizes, cpu_times, label='CPU')
plt.xlabel('MATRIX SIZE')
plt.ylabel('TIME (sec)')
plt.legend()
plt.show()

Select your Device using tf.device()
with tf.device('/cpu:0'):
#enter code here of tf data
On a typical system, there are multiple computing devices. In TensorFlow, the supported device types are CPU and GPU. They are represented as strings. For example:
"/cpu:0": The CPU of your machine.
"/device:GPU:0": The GPU of your machine, if you have one.
"/device:GPU:1": The second GPU of your machine, etc.
GPU:
with tf.device('/device:GPU:0'):
#code here: tf data and model
Reference: Link

to reproduce answer from first comment for tensorflow 2.0+
tf.compat.v1.disable_eager_execution()
cpu_times = []
sizes = [1, 10, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000, 10000]
for size in sizes:
ops.reset_default_graph()
start = time.time()
with tf.device('cpu:0'):
v1 = tf.Variable(tf.random.normal((size, size)))
v2 = tf.Variable(tf.random.normal((size, size)))
op = tf.matmul(v1, v2)
with tf.compat.v1.Session() as sess:
sess.run(tf.compat.v1.global_variables_initializer())
sess.run(op)
cpu_times.append(time.time() - start)
print('cpu time took: {0:.4f}'.format(time.time() - start))
import tensorflow as tf
import time
gpu_times = []
for size in sizes:
ops.reset_default_graph()
start = time.time()
with tf.device('gpu:0'):
v1 = tf.Variable(tf.random.normal((size, size)))
v2 = tf.Variable(tf.random.normal((size, size)))
op = tf.matmul(v1, v2)
with tf.compat.v1.Session() as sess:
sess.run(tf.compat.v1.global_variables_initializer())
sess.run(op)
gpu_times.append(time.time() - start)
print('gpu time took: {0:.4f}'.format(time.time() - start))
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(sizes, gpu_times, label='GPU')
ax.plot(sizes, cpu_times, label='CPU')
plt.xlabel('MATRIX SIZE')
plt.ylabel('TIME (sec)')
plt.legend()
plt.show()

Your model is very small, so the overhead of transferring data to GPU and back to CPU outweights the speed up.

The code below will produce the exact same thing what you saw with the acccepted answer but this code is fully tested for version 2.10 of tensorflow (also the above code by #Art does not work in later version of tensorflow 2.x)
import tensorflow as tf
import time
import matplotlib.pyplot as plt
#something of an extra or the below code will produce an error as "Tensor.graph is undefined when eager execution is enabled."
#this code is needed to not let tensorflow produce error for the cpu part of code or the gpu part.
#the reason for this error is because Session does not work with either eager execution or tf.function, and you should not invoke it directly.
tf.compat.v1.disable_eager_execution()
cpu_times = []
sizes = [1, 10, 100, 500, 1000, 2000, 3000, 4000, 5000, 8000, 10000]
for size in sizes:
tf.compat.v1.reset_default_graph()
start = time.time()
with tf.device('cpu:0'):
v1 = tf.Variable(tf.random.normal((size, size)))
v2 = tf.Variable(tf.random.normal((size, size)))
op = tf.matmul(v1, v2)
with tf.compat.v1.Session() as sess:
sess.run(tf.compat.v1.global_variables_initializer())
sess.run(op)
cpu_times.append(time.time() - start)
print('cpu time took: {0:.4f}'.format(time.time() - start))
gpu_times = []
for size in sizes:
tf.compat.v1.reset_default_graph()
start = time.time()
with tf.device('gpu:0'):
v1 = tf.Variable(tf.random.normal((size, size)))
v2 = tf.Variable(tf.random.normal((size, size)))
op = tf.matmul(v1, v2)
with tf.compat.v1.Session() as sess:
sess.run(tf.compat.v1.global_variables_initializer())
sess.run(op)
gpu_times.append(time.time() - start)
print('gpu time took: {0:.4f}'.format(time.time() - start))
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(sizes, gpu_times, label='GPU')
ax.plot(sizes, cpu_times, label='CPU')
plt.xlabel('MATRIX SIZE')
plt.ylabel('TIME (sec)')
plt.legend()
plt.show()

Related

Using GEKKO for Moving Horizon Estimation online 2

This is the following question after appyling comments from: Using GEKKO for Moving Horizon Estimation online
I have studied example from estimation iterative example on the Dynamic Optimization course website and revised my code as follows:
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
import matplotlib; matplotlib.use('TkAgg')
class Observer():
def __init__(self, window_size, r_init, alpha_init):
self.m = GEKKO(remote=False)
self.dt = 0.05
self.m.time = [i*self.dt for i in range(window_size)]
#Parameters
self.m.u = self.m.MV()
#Variables
self.m.r = self.m.CV(lb=0) # value=r_init) #ub=20 can be over 20
self.m.alpha = self.m.CV() # value=alpha_init) #ub lb for angle?
#Equations
self.m.Equation(self.m.r.dt()== -self.m.cos(self.m.alpha))
self.m.Equation(self.m.alpha.dt()== self.m.sin(self.m.alpha)/self.m.r - self.m.u) # differential equation
#Options
self.m.options.MV_STEP_HOR = 2
self.m.options.IMODE = 5 # dynamic estimation
self.m.options.EV_TYPE = 2 #Default 1: absolute error form 2: squared error form
self.m.options.DIAGLEVEL = 0 #diagnostic level
self.m.options.NODES = 5 #nodes # collocation nodes default:2
self.m.options.SOLVER = 3 #solver_num
# STATUS = 0, optimizer doesn't adjust value
# STATUS = 1, optimizer can adjust
self.m.u.STATUS = 0
self.m.r.STATUS = 1
self.m.alpha.STATUS = 1
# FSTATUS = 0, no measurement
# FSTATUS = 1, measurement used to update model
self.m.u.FSTATUS = 1 #default
self.m.r.FSTATUS = 1
self.m.alpha.FSTATUS = 1
self.m.r.TR_INIT = 0
self.m.alpha.TR_INIT = 0
self.count = 0
def MHE(self, observed_state, u_data):
self.count =+ 1
self.m.u.MEAS = u_data
self.m.r.MEAS = observed_state[0]
self.m.alpha.MEAS = observed_state[1]
self.m.solve(disp=False)
return self.m.r.MODEL, self.m.alpha.MODEL
if __name__=="__main__":
FILE_PATH00 = '/home/shane16/Project/model_guard/uav_paper/adversarial/SA_PPO/src/DATA/4end_estimation_results_r.csv'
FILE_PATH01 = '/home/shane16/Project/model_guard/uav_paper/adversarial/SA_PPO/src/DATA/4end_estimation_results_alpha.csv'
FILE_PATH02 = '/home/shane16/Project/model_guard/uav_paper/adversarial/SA_PPO/src/DATA/4end_action_buffer_eps0.0_sig0.0.csv'
cycles = 55
x = np.arange(cycles) # 1...300
matrix00 = np.loadtxt(FILE_PATH00, delimiter=',')
matrix01 = np.loadtxt(FILE_PATH01, delimiter=',')
matrix02 = np.loadtxt(FILE_PATH02, delimiter=',')
vanilla_action_sigma_0 = matrix02
vanilla_estimation_matrix_r = np.zeros(cycles)
vanilla_estimation_matrix_alpha = np.zeros(cycles)
# sigma = 0.0
# vanilla model true/observed states
r_vanilla_sigma_0_true = matrix00[0, 3:] # from step 1
r_vanilla_sigma_0_observed = matrix00[1, 3:] # from step1
alpha_vanilla_sigma_0_true = matrix01[0, 3:]
alpha_vanilla_sigma_0_observed = matrix01[1, 3:]
# initialize estimator
sigma = 0.0 #1.0
solver_num = 3
nodes = 5
# for window_size in [5, 10, 20, 30, 40, 50]:
window_size = 5
observer = Observer(window_size, r_vanilla_sigma_0_observed[0], alpha_vanilla_sigma_0_observed[0])
for i in range(cycles):
if i % 100 == 0:
print('cylcle: {}'.format(i))
vanilla_observed_states = np.hstack((r_vanilla_sigma_0_observed[i], alpha_vanilla_sigma_0_observed[i])) # from current observed state
r_hat, alpha_hat = observer.MHE(vanilla_observed_states, vanilla_action_sigma_0[i]) # and current action -> estimate current state
vanilla_estimation_matrix_r[i] = r_hat
vanilla_estimation_matrix_alpha[i] = alpha_hat
#plot vanilla
plt.figure()
plt.subplot(3,1,1)
plt.title('Vanilla model_sig{}'.format(sigma))
plt.plot(x, vanilla_action_sigma_0[:cycles],'b:',label='action (w)')
plt.legend()
plt.subplot(3,1,2)
plt.ylabel('r')
plt.plot(x, r_vanilla_sigma_0_true[:cycles], 'k-', label='true_r')
plt.plot(x, r_vanilla_sigma_0_observed[:cycles], 'gx', label='observed_r')
plt.plot(x, vanilla_estimation_matrix_r, 'r--', label='time window: 10')
# plt.legend()
plt.subplot(3,1,3)
plt.xlabel('time steps')
plt.ylabel('alpha')
plt.plot(x, alpha_vanilla_sigma_0_true[:cycles], 'k-', label='true_alpha')
plt.plot(x, alpha_vanilla_sigma_0_observed[:cycles], 'gx', label='observed_alpha')
plt.plot(x, vanilla_estimation_matrix_alpha, 'r--', label='time window: {}'.format(window_size))
plt.legend()
plt.savefig('plot/revision/4estimated_STATES_vanilla_sig{}_window{}_cycles{}_solver{}_nodes{}.png'.format(sigma, window_size,cycles, solver_num, nodes))
plt.show()
csv files: https://drive.google.com/drive/folders/1jW_6zBCdbJHB7yU3HmCIhamEyOT1LJqD?usp=sharing
The code works when initialized with values specified at line 15,16 (m.r, m.alpha).
However, if I try with no initial value,(as same condition in example), solution is not found.
terminal output:
cylcle: 0 Traceback (most recent call last): File
"4observer_mhe.py", line 86, in
r_hat, alpha_hat = observer.MHE(vanilla_observed_states, vanilla_action_sigma_0[i]) # and current action -> estimate current
state File "4observer_mhe.py", line 49, in MHE
self.m.solve(disp=False) File "/home/shane16/Project/model_guard/LipSDP/lipenv/lib/python3.7/site-packages/gekko/gekko.py",
line 2140, in solve
raise Exception(apm_error) Exception: #error: Solution Not Found
What could be the solution to this problem?
I have tried below strategies, but couldn't find the solution.
Reduce the number of decision variables by using m.FV() or m.MV() with m.options.MV_STEP_HOR=2+ to reduce the degrees of freedom for the solver for the unknown parameters.
Try other solvers with m.options.SOLVER=1 or m.options.SOLVER=2.
I expect to see estimation results that follow the true state well.
But I guess I'm doing something wrong.
Could anyone help me please?
Thank you.

Solvers sometimes need good initial guess values or constraints (lower and upper bounds) on the degrees of freedom (MV or FV) to find the optimal solution.
One of the equations may be the source of the problem:
self.m.alpha.dt() == self.m.sin(self.m.alpha)/self.m.r - self.m.u
The initial value of r is zero (default) because no initial value is provided when it is declared as self.m.r = self.m.CV(lb=0). A comment suggests that it was formerly initialized with value r_init. The zero value creates a divide-by-zero for that equation. Try rearranging the equation into an equivalent form that avoids the potential for divide-by-zero either with the initial guess or when the solver is iterating.
self.m.r*self.m.alpha.dt() == self.m.sin(self.m.alpha) - self.m.r*self.m.u
There may be other things that are also causing the model to not converge. When the solution does not converge then the infeasibilities.txt file can be a source to troubleshoot the specific equations that are having trouble. Here are instructions to retrieve the infeasibilities.txt file: How to retrieve the 'infeasibilities.txt' from the gekko

How to set up GEKKO for parameter estimation from multiple independent sets of data?

I am learning how to use GEKKO for kinetic parameter estimation based on laboratory batch reactor data, which essentially consists of the concentration profiles of three species A, C, and P. For the purposes of my question, I am using a model that I previously featured in a question related to parameter estimation from a single data set.
My ultimate goal is to be able to use multiple experimental runs for parameter estimation, leveraging data that may be collected at different temperatures, species concentrations, etc. Due to the independent nature of individual batch reactor experiments, each data set features samples collected at different time points. These different time points (and in the future, different temperatures for instance) are difficult for me to implement into a GEKKO model, as I previosly used the experimental data collection time points as the m.time parameter for the GEKKO model. (See end of post for code) I have solved problems like this in the past with gPROMS and Athena Visual Studio.
To illustrate my problem, I generated an artificial data set of 'experimental' data from my original model by introducing noise to the species concentration profiles, and shifting the experimental time points slightly. I then combined all data sets of the same experimental species into new arrays featuring multiple columns. My thought process here was that GEKKO would carry out the parameter estimation by using the experimental data of each corresponding column of the arrays, so that times_comb[:,0] would be related to A_comb[:,0] while times_comb[:,1] would be related to A_comb[:,1].
When I attempt to run the GEKKO model, the system does obtain a solution for the parameter estimation, but it is unclear to me if the problem solution is reasonable, as I notice that the GEKKO Variables A, B, C, and P are 34 element vectors, which is double the elements in each of the experimental data sets. I presume GEKKO is somehow combining both columns of the time and Parameter vectors during model setup that leads to those 34 element variables? I am also concerned that during this combination of the columns of each input parameter, that the relationship between a certain time point and the collected species information is lost.
How could I improve the use of multiple data sets that GEKKO can simultaneously use for parameter estimation, with the consideration that the time points of each data set may be different? I looked on the GEKKO documentation examples as well as the APMonitor website, but I could not find examples featuring multiple data sets that I could use for guidance, as I am fairly new to the GEKKO package.
Thank you for your time reading my question and for any help/ideas you may have.
Code below:
import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO
#Experimental data
times = np.array([0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
0.934375, 1.006250, 1.078125, 1.150000])
A_obs = np.array([1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
0.001944, 0.001116, 0.000732, 0.000426])
C_obs = np.array([0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
0.385798, 0.358132, 0.380497, 0.383051])
P_obs = np.array([0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
0.303198, 0.277822, 0.284194, 0.301471])
#Generate second set of 'experimental data'
times_new = times + np.random.uniform(0.0,0.01)
P_obs_noisy = P_obs+np.random.normal(0,0.05,P_obs.shape)
A_obs_noisy = A_obs+np.random.normal(0,0.05,A_obs.shape)
C_obs_noisy = A_obs+np.random.normal(0,0.05,C_obs.shape)
#Combine two data sets into multi-column arrays
times_comb = np.array([times, times_new]).T
P_comb = np.array([P_obs, P_obs_noisy]).T
A_comb = np.array([A_obs, A_obs_noisy]).T
C_comb = np.array([C_obs, C_obs_noisy]).T
m = GEKKO(remote=False)
t = m.time = times_comb #using two column time array
Am = m.Param(value=A_comb) #Using the two column data as observed parameter
Cm = m.Param(value=C_comb)
Pm = m.Param(value=P_comb)
A = m.Var(1, lb = 0)
B = m.Var(0, lb = 0)
C = m.Var(0, lb = 0)
P = m.Var(0, lb = 0)
k = m.Array(m.FV,6,value=1,lb=0)
for ki in k:
ki.STATUS = 1
k1,k2,k3,k4,k5,k6 = k
r1 = m.Var(0, lb = 0)
r2 = m.Var(0, lb = 0)
r3 = m.Var(0, lb = 0)
r4 = m.Var(0, lb = 0)
r5 = m.Var(0, lb = 0)
r6 = m.Var(0, lb = 0)
m.Equation(r1 == k1 * A)
m.Equation(r2 == k2 * A * B)
m.Equation(r3 == k3 * C * B)
m.Equation(r4 == k4 * A)
m.Equation(r5 == k5 * A)
m.Equation(r6 == k6 * A * B)
#mass balance diff eqs, function calls rxn function
m.Equation(A.dt() == - r1 - r2 - r4 - r5 - r6)
m.Equation(B.dt() == r1 - r2 - r3 - r6)
m.Equation(C.dt() == r2 - r3 + r4)
m.Equation(P.dt() == r3 + r5 + r6)
m.Minimize((A-Am)**2)
m.Minimize((P-Pm)**2)
m.Minimize((C-Cm)**2)
m.options.IMODE = 5
m.options.SOLVER = 3 #IPOPT optimizer
m.options.NODES = 6
m.solve()
k_opt = []
for ki in k:
k_opt.append(ki.value[0])
print(k_opt)
plt.plot(t,A)
plt.plot(t,C)
plt.plot(t,P)
plt.plot(t,B)
plt.plot(times,A_obs,'bo')
plt.plot(times,C_obs,'gx')
plt.plot(times,P_obs,'rs')
plt.plot(times_new, A_obs_noisy,'b*')
plt.plot(times_new, C_obs_noisy,'g*')
plt.plot(times_new, P_obs_noisy,'r*')
plt.show()

To have multiple data sets with different times and data points, you can join the data sets as a pandas dataframe. Here is a simple example:
# data set 1
t_data1 = [0.0, 0.1, 0.2, 0.4, 0.8, 1.00]
x_data1 = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
# data set 2
t_data2 = [0.0, 0.15, 0.25, 0.45, 0.85, 0.95]
x_data2 = [3.6, 2.25, 1.75, 1.00, 0.35, 0.20]
The merged data has NaN where the data is missing:
x1 x2
Time
0.00 2.0 3.60
0.10 1.6 NaN
0.15 NaN 2.25
0.20 1.2 NaN
0.25 NaN 1.75
Take note of where the data is missing with a 1=measured and 0=not measured.
# indicate which points are measured
z1 = (data['x1']==data['x1']).astype(int) # 0 if NaN
z2 = (data['x2']==data['x2']).astype(int) # 1 if number
The final step is to set up Gekko variables, equations, and objective to accommodate the data sets.
xm = m.Array(m.Param,2)
zm = m.Array(m.Param,2)
for i in range(2):
m.Equation(x[i].dt()== -k * x[i]) # differential equations
m.Minimize(zm[i]*(x[i]-xm[i])**2) # objectives
You can also calculate the initial condition with m.free_initial(x[i]). This gives an optimal solution for one parameter value (k) over the 2 data sets. This approach can be expanded to multiple variables or multiple data sets with different times.
from gekko import GEKKO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# data set 1
t_data1 = [0.0, 0.1, 0.2, 0.4, 0.8, 1.00]
x_data1 = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
# data set 2
t_data2 = [0.0, 0.15, 0.25, 0.45, 0.85, 0.95]
x_data2 = [3.6, 2.25, 1.75, 1.00, 0.35, 0.20]
# combine with dataframe join
data1 = pd.DataFrame({'Time':t_data1,'x1':x_data1})
data2 = pd.DataFrame({'Time':t_data2,'x2':x_data2})
data1.set_index('Time', inplace=True)
data2.set_index('Time', inplace=True)
data = data1.join(data2,how='outer')
print(data.head())
# indicate which points are measured
z1 = (data['x1']==data['x1']).astype(int) # 0 if NaN
z2 = (data['x2']==data['x2']).astype(int) # 1 if number
# replace NaN with any number (0)
data.fillna(0,inplace=True)
m = GEKKO(remote=False)
# measurements
xm = m.Array(m.Param,2)
xm[0].value = data['x1'].values
xm[1].value = data['x2'].values
# index for objective (0=not measured, 1=measured)
zm = m.Array(m.Param,2)
zm[0].value=z1
zm[1].value=z2
m.time = data.index
x = m.Array(m.Var,2) # fit to measurement
x[0].value=x_data1[0]; x[1].value=x_data2[0]
k = m.FV(); k.STATUS = 1 # adjustable parameter
for i in range(2):
m.free_initial(x[i]) # calculate initial condition
m.Equation(x[i].dt()== -k * x[i]) # differential equations
m.Minimize(zm[i]*(x[i]-xm[i])**2) # objectives
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 2 # collocation nodes
m.solve(disp=True) # solve
k = k.value[0]
print('k = '+str(k))
# plot solution
plt.plot(m.time,x[0].value,'b.--',label='Predicted 1')
plt.plot(m.time,x[1].value,'r.--',label='Predicted 2')
plt.plot(t_data1,x_data1,'bx',label='Measured 1')
plt.plot(t_data2,x_data2,'rx',label='Measured 2')
plt.legend(); plt.xlabel('Time'); plt.ylabel('Value')
plt.xlabel('Time');
plt.show()

Including my updated code (not fully cleaned up to minimize number of variables) incorporating the selected answer to my question for reference. The model does a regression of 3 measured species in two separate 'datasets.'
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from gekko import GEKKO
#Experimental data
times = np.array([0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
0.934375, 1.006250, 1.078125, 1.150000])
A_obs = np.array([1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
0.001944, 0.001116, 0.000732, 0.000426])
C_obs = np.array([0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
0.385798, 0.358132, 0.380497, 0.383051])
P_obs = np.array([0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
0.303198, 0.277822, 0.284194, 0.301471])
#Generate second set of 'experimental data'
times_new = times + np.random.uniform(0.0,0.01)
P_obs_noisy = (P_obs+ np.random.normal(0,0.05,P_obs.shape))
A_obs_noisy = (A_obs+np.random.normal(0,0.05,A_obs.shape))
C_obs_noisy = (C_obs+np.random.normal(0,0.05,C_obs.shape))
#Combine two data sets into multi-column arrays using pandas DataFrames
#Set dataframe index to be combined time discretization of both data sets
exp1 = pd.DataFrame({'Time':times,'A':A_obs,'C':C_obs,'P':P_obs})
exp2 = pd.DataFrame({'Time':times_new,'A':A_obs_noisy,'C':C_obs_noisy,'P':P_obs_noisy})
exp1.set_index('Time',inplace=True)
exp2.set_index('Time',inplace=True)
exps = exp1.join(exp2, how ='outer',lsuffix = '_1',rsuffix = '_2')
#print(exps.head())
#Combine both data sets into a single data frame
meas_data = pd.DataFrame().reindex_like(exps)
#define measurement locations for each data set, with NaN written for time points
#not common in both data sets
for cols in exps:
meas_data[cols] = (exps[cols]==exps[cols]).astype(int)
exps.fillna(0,inplace = True) #replace NaN with 0
m = GEKKO(remote=False)
t = m.time = exps.index #set GEKKO time domain to use experimental time points
#Generate two-column GEKKO arrays to store observed values of each species, A, C and P
Am = m.Array(m.Param,2)
Cm = m.Array(m.Param,2)
Pm = m.Array(m.Param,2)
Am[0].value = exps['A_1'].values
Am[1].value = exps['A_2'].values
Cm[0].value = exps['C_1'].values
Cm[1].value = exps['C_2'].values
Pm[0].value = exps['P_1'].values
Pm[1].value = exps['P_2'].values
#Define GEKKO variables that determine if time point contatins data to be used in regression
#If time point contains species data, meas_ variable = 1, else = 0
meas_A = m.Array(m.Param,2)
meas_C = m.Array(m.Param,2)
meas_P = m.Array(m.Param,2)
meas_A[0].value = meas_data['A_1'].values
meas_A[1].value = meas_data['A_2'].values
meas_C[0].value = meas_data['C_1'].values
meas_C[1].value = meas_data['C_2'].values
meas_P[0].value = meas_data['P_1'].values
meas_P[1].value = meas_data['P_2'].values
#Define Variables for differential equations A, B, C, P, with initial conditions set by experimental observation at first time point
A = m.Array(m.Var,2, lb = 0)
B = m.Array(m.Var,2, lb = 0)
C = m.Array(m.Var,2, lb = 0)
P = m.Array(m.Var,2, lb = 0)
A[0].value = exps['A_1'][0] ; A[1].value = exps['A_2'][0]
B[0].value = 0 ; B[1].value = 0
C[0].value = exps['C_1'][0] ; C[1].value = exps['C_2'][0]
P[0].value = exps['P_1'][0] ; P[1].value = exps['P_2'][0]
#Define kinetic coefficients, k1-k6 as regression FV's
k = m.Array(m.FV,6,value=1,lb=0,ub = 20)
for ki in k:
ki.STATUS = 1
k1,k2,k3,k4,k5,k6 = k
#If doing paramrter estimation, enable free_initial condition, else not include them in model to reduce DOFs (for simulation, for example)
if k1.STATUS == 1:
for i in range(2):
m.free_initial(A[i])
m.free_initial(B[i])
m.free_initial(C[i])
m.free_initial(P[i])
#Define reaction rate variables
r1 = m.Array(m.Var,2, value = 1, lb = 0)
r2 = m.Array(m.Var,2, value = 1, lb = 0)
r3 = m.Array(m.Var,2, value = 1, lb = 0)
r4 = m.Array(m.Var,2, value = 1, lb = 0)
r5 = m.Array(m.Var,2, value = 1, lb = 0)
r6 = m.Array(m.Var,2, value = 1, lb = 0)
#Model Equations
for i in range(2):
#Rate equations
m.Equation(r1[i] == k1 * A[i])
m.Equation(r2[i] == k2 * A[i] * B[i])
m.Equation(r3[i] == k3 * C[i] * B[i])
m.Equation(r4[i] == k4 * A[i])
m.Equation(r5[i] == k5 * A[i])
m.Equation(r6[i] == k6 * A[i] * B[i])
#Differential species balances
m.Equation(A[i].dt() == - r1[i] - r2[i] - r4[i] - r5[i] - r6[i])
m.Equation(B[i].dt() == r1[i] - r2[i] - r3[i] - r6[i])
m.Equation(C[i].dt() == r2[i] - r3[i] + r4[i])
m.Equation(P[i].dt() == r3[i] + r5[i] + r6[i])
#Minimization objective functions
m.Obj(meas_A[i]*(A[i]-Am[i])**2)
m.Obj(meas_P[i]*(P[i]-Pm[i])**2)
m.Obj(meas_C[i]*(C[i]-Cm[i])**2)
#Solver options
m.options.IMODE = 5
m.options.SOLVER = 3 #APOPT optimizer
m.options.NODES = 6
m.solve()
k_opt = []
for ki in k:
k_opt.append(ki.value[0])
print(k_opt)
plt.plot(t,A[0],'b-')
plt.plot(t,A[1],'b--')
plt.plot(t,C[0],'g-')
plt.plot(t,C[1],'g--')
plt.plot(t,P[0],'r-')
plt.plot(t,P[1],'r--')
plt.plot(times,A_obs,'bo')
plt.plot(times,C_obs,'gx')
plt.plot(times,P_obs,'rs')
plt.plot(times_new, A_obs_noisy,'b*')
plt.plot(times_new, C_obs_noisy,'g*')
plt.plot(times_new, P_obs_noisy,'r*')
plt.show()

Kernel Restarting The kernel appears to have died & dst tensor is not initialized

I'm using Tensorflow (GPU) to fit a CNN model (the total input datasize is only 9.8MB(np array form) and I'm on Windows 10 (Kaby Lake), Tensorflow GPU mode, Geforce GTX 1050, RAM 32GB.
Each time I try running this below piece of code, it either ends the kernel or throws up the error "dst tensor is not initialized". This code seems to be executable by others with relatively lower computing power than mine but I'm not sure how to get it to work.
I am able to run the below code on Tensorflow CPU mode without any problem (but it takes almost 12 hours to finish running it, especially with the epoch is set to more than just 3). That's why I need to run it using my GPU for faster execution.
import tensorflow as tf
import numpy as np
IMG_PX_SIZE = 50
HM_SLICES = 20
n_classes = 2
x = tf.placeholder('float')
y = tf.placeholder('float')
keep_rate = 0.8
keep_prob = tf.placeholder(tf.float32)
def conv3d(x, W):
return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')
def maxpool3d(x):
return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1],
padding='SAME')
def convolutional_neural_network(x):
weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),
'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),
'W_fc':tf.Variable(tf.random_normal([62720 ,1024])),
'out':tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
'b_conv2':tf.Variable(tf.random_normal([64])),
'b_fc':tf.Variable(tf.random_normal([1024])),
'out':tf.Variable(tf.random_normal([n_classes]))}
x = tf.reshape(x, shape=[-1, IMG_PX_SIZE, IMG_PX_SIZE, HM_SLICES, 1])
conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])
conv1 = maxpool3d(conv1)
conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])
conv2 = maxpool3d(conv2)
fc = tf.reshape(conv2,[-1, 62720 ])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out']) + biases['out']
return output
def train_neural_network(x):
much_data = np.load('muchdata_sampled-50-50-20.npy')
train_data = much_data[:100]
validation_data = much_data[-100:]
prediction = convolutional_neural_network(x)
cost = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y) )
optimizer = tf.train.AdamOptimizer().minimize(cost)
hm_epochs = 3
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(hm_epochs):
epoch_loss = 0
for data in train_data:
X = data[0]
Y = data[1]
_, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
epoch_loss += c
print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1]
for i in validation_data]}))
train_neural_network(x)
Please kindly provide some help as I'm stuck with this for sometime now. My only tip is to feed the data in batches instead of the whole thing into CNN, but I'm not successful with that technique yet. Could someone please point out a way ?

Numpy version of rolling MAD (mean absolute deviation)

How to make a rolling version of the following MAD function
from numpy import mean, absolute
def mad(data, axis=None):
return mean(absolute(data - mean(data, axis)), axis)
This code is an answer to this question
At the moment i convert numpy to pandas then apply this function, then convert the result back to numpy
pandasDataFrame.rolling(window=90).apply(mad)
but this is inefficient on larger data-frames. How to get a rolling window for the same function in numpy without looping and give the same result?

Here's a vectorized NumPy approach -
# From this post : http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S ): # Window len = L, Stride len/stepsize = S
nrows = ((a.size-L)//S)+1
n = a.strides[0]
return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))
# From this post : http://stackoverflow.com/a/14314054/3293881 by #Jaime
def moving_average(a, n=3) :
ret = np.cumsum(a, dtype=float)
ret[n:] = ret[n:] - ret[:-n]
return ret[n - 1:] / n
def mad_numpy(a, W):
a2D = strided_app(a,W,1)
return np.absolute(a2D - moving_average(a,W)[:,None]).mean(1)
Runtime test -
In [617]: data = np.random.randint(0,9,(10000))
...: df = pd.DataFrame(data)
...:
In [618]: pandas_out = pd.rolling_apply(df,90,mad).values.ravel()
In [619]: numpy_out = mad_numpy(data,90)
In [620]: np.allclose(pandas_out[89:], numpy_out) # Nans part clipped
Out[620]: True
In [621]: %timeit pd.rolling_apply(df,90,mad)
10 loops, best of 3: 111 ms per loop
In [622]: %timeit mad_numpy(data,90)
100 loops, best of 3: 3.4 ms per loop
In [623]: 111/3.4
Out[623]: 32.64705882352941
Huge 32x+ speedup there over the loopy pandas solution!

how to get reproducible result in Tensorflow

I built 5-layer neural network by using tensorflow.
I have a problem to get reproducible results (or stable results).
I found similar questions regarding reproducibility of tensorflow and the corresponding answers, such as How to get stable results with TensorFlow, setting random seed
But the problem is not solved yet.
I also set random seed like the following
tf.set_random_seed(1)
Furthermore, I added seed options to every random function such as
b1 = tf.Variable(tf.random_normal([nHidden1], seed=1234))
I confirmed that the first epoch shows the identical results, but not identical from the second epoch little by little.
How can I get the reproducible results?
Am I missing something?
Here is a code block I use.
def xavier_init(n_inputs, n_outputs, uniform=True):
if uniform:
init_range = tf.sqrt(6.0 / (n_inputs + n_outputs))
return tf.random_uniform_initializer(-init_range, init_range, seed=1234)
else:
stddev = tf.sqrt(3.0 / (n_inputs + n_outputs))
return tf.truncated_normal_initializer(stddev=stddev, seed=1234)
import numpy as np
import tensorflow as tf
import dataSetup
from scipy.stats.stats import pearsonr
tf.set_random_seed(1)
x_train, y_train, x_test, y_test = dataSetup.input_data()
# Parameters
learningRate = 0.01
trainingEpochs = 1000000
batchSize = 64
displayStep = 100
thresholdReduce = 1e-6
thresholdNow = 0.6
#dropoutRate = tf.constant(0.7)
# Network Parameter
nHidden1 = 128 # number of 1st layer nodes
nHidden2 = 64 # number of 2nd layer nodes
nInput = 24 #
nOutput = 1 # Predicted score: 1 output for regression
# save parameter
modelPath = 'model/model_layer5_%d_%d_mini%d_lr%.3f_noDrop_rollBack.ckpt' %(nHidden1, nHidden2, batchSize, learningRate)
# tf Graph input
X = tf.placeholder("float", [None, nInput])
Y = tf.placeholder("float", [None, nOutput])
# Weight
W1 = tf.get_variable("W1", shape=[nInput, nHidden1], initializer=xavier_init(nInput, nHidden1))
W2 = tf.get_variable("W2", shape=[nHidden1, nHidden2], initializer=xavier_init(nHidden1, nHidden2))
W3 = tf.get_variable("W3", shape=[nHidden2, nHidden2], initializer=xavier_init(nHidden2, nHidden2))
W4 = tf.get_variable("W4", shape=[nHidden2, nHidden2], initializer=xavier_init(nHidden2, nHidden2))
WFinal = tf.get_variable("WFinal", shape=[nHidden2, nOutput], initializer=xavier_init(nHidden2, nOutput))
# biases
b1 = tf.Variable(tf.random_normal([nHidden1], seed=1234))
b2 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
b3 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
b4 = tf.Variable(tf.random_normal([nHidden2], seed=1234))
bFinal = tf.Variable(tf.random_normal([nOutput], seed=1234))
# Layers for dropout
L1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))
L2 = tf.nn.relu(tf.add(tf.matmul(L1, W2), b2))
L3 = tf.nn.relu(tf.add(tf.matmul(L2, W3), b3))
L4 = tf.nn.relu(tf.add(tf.matmul(L3, W4), b4))
hypothesis = tf.add(tf.matmul(L4, WFinal), bFinal)
print "Layer setting DONE..."
# define loss and optimizer
cost = tf.reduce_mean(tf.square(hypothesis - Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learningRate).minimize(cost)
# Initialize the variable
init = tf.initialize_all_variables()
# save op to save and restore all the variables
saver = tf.train.Saver()
with tf.Session() as sess:
# initialize
sess.run(init)
print "Initialize DONE..."
# Training
costPrevious = 100000000000000.0
best = float("INF")
totalBatch = int(len(x_train)/batchSize)
print "Total Batch: %d" %totalBatch
for epoch in range(trainingEpochs):
#print "EPOCH: %04d" %epoch
avgCost = 0.
for i in range(totalBatch):
np.random.seed(i+epoch)
randidx = np.random.randint(len(x_train), size=batchSize)
batch_xs = x_train[randidx,:]
batch_ys = y_train[randidx,:]
# Fit traiing using batch data
sess.run(optimizer, feed_dict={X:batch_xs, Y:batch_ys})
# compute average loss
avgCost += sess.run(cost, feed_dict={X:batch_xs, Y:batch_ys})/totalBatch
# compare the current cost and the previous
# if current cost > the previous
# just continue and make the learning rate half
#print "Cost: %1.8f --> %1.8f at epoch %05d" %(costPrevious, avgCost, epoch+1)
if avgCost > costPrevious + .5:
#sess.run(init)
load_path = saver.restore(sess, modelPath)
print "Cost increases at the epoch %05d" %(epoch+1)
print "Cost: %1.8f --> %1.8f" %(costPrevious, avgCost)
continue
costNow = avgCost
reduceCost = abs(costPrevious - costNow)
costPrevious = costNow
#Display logs per epoch step
if costNow < best:
best = costNow
bestMatch = sess.run(hypothesis, feed_dict={X:x_test})
# model save
save_path = saver.save(sess, modelPath)
if epoch % displayStep == 0:
print "step {}".format(epoch)
pearson = np.corrcoef(bestMatch.flatten(), y_test.flatten())
print 'train loss = {}, current loss = {}, test corrcoef={}'.format(best, costNow, pearson[0][1])
if reduceCost < thresholdReduce or costNow < thresholdNow:
print "Epoch: %04d, Cost: %.9f, Prev: %.9f, Reduce: %.9f" %(epoch+1, costNow, costPrevious, reduceCost)
break
print "Optimization Finished"

It seems that your results are perhaps not reproducible because you are using Saver to write/restore from checkpoint each time? (i.e. the second time that you run the code, the variable values aren't initialized using your random seed -- they are restored from your previous checkpoint)
Please trim down your code example to just the code necessary to reproduce irreproducibility.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Tensorflow - Keras, training on GPU is much slower then on CPU [duplicate] - performance

Your model is very small, so the overhead of transferring data to GPU and back to CPU outweights the speed up.

Related

Using GEKKO for Moving Horizon Estimation online 2

How to set up GEKKO for parameter estimation from multiple independent sets of data?

Kernel Restarting The kernel appears to have died & dst tensor is not initialized

Numpy version of rolling MAD (mean absolute deviation)

how to get reproducible result in Tensorflow

Categories

Resources