Stanford NLP Tree LSTM running error - macos

I'm testing the code offered by Stanford NLP and followed the instructions at https://github.com/stanfordnlp/treelstm
However, when I run th sentiment/main.lua -m constituency -b
I get this error:
--------------------------------------------------------------------------------
Constituency Tree LSTM for Sentiment Classification
--------------------------------------------------------------------------------
/Users/Solomon/torch/install/bin/luajit:
/Users/Solomon/Downloads/treelstm-master/util/Vocab.lua:19:
attempt to index local 'file' (a nil value)
stack traceback:
/Users/Solomon/Downloads/treelstm-master/util/Vocab.lua:19: in function '__init'
/Users/Solomon/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/Solomon/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'Vocab'
sentiment/main.lua:48: in main chunk
[C]: in function 'dofile'
...omon/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0104141d50
It was run under OS X Yosemite. I've spent hours and hours but couldn't figure it out. I am new to these things; does anyone know what's wrong?

The error at Vocab.lua:19 usually means the vocabulary and word-vector files the script expects are missing, so the Vocab constructor fails to open them. Prepare the GloVe vectors first:
1. Copy glove.840B.300d.zip into the data/glove folder
2. Unzip glove.840B.300d.zip inside the data/glove folder
3. Run the following command:
th scripts/convert-wordvecs.lua data/glove/glove.840B.300d.txt data/glove/glove.840B.vocab data/glove/glove.840B.300d.th
You will see something like
Converting data/glove/glove.840B.300d.txt to Torch serialized format
count = 2196017
dim = 300
Then run the following command:
th sentiment/main.lua --model constituency --layers 4 --dim 300 --epochs 2
--------------------------------------------------------------------------------
Constituency Tree LSTM for Sentiment Classification
--------------------------------------------------------------------------------
loading word embeddings
unk count = 976
loading datasets
num train = 8544
num dev = 1101
num test = 2210
--------------------------------------------------------------------------------
model configuration
--------------------------------------------------------------------------------
max epochs = 2
fine grained sentiment = true
num params = 1085105
num compositional params = 1083600
word vector dim = 300
Tree-LSTM memory dim = 300
regularization strength = 1.00e-04
minibatch size = 25
learning rate = 5.00e-02
word vector learning rate = 1.00e-01
dropout = true
--------------------------------------------------------------------------------
Training model
--------------------------------------------------------------------------------
-- epoch 1

Related

How to solve a simple mixing operation in Gekko?

I am trying to solve a simple mixing operation in Gekko. The mixer mx takes two inlet streams, Feed1 and Feed2. The expected result is that the mass flow of the outlet stream mx.outlet should be the sum of the mass flows of the inlet streams.
Here is what I have tried.
from gekko import GEKKO, chemical
m = GEKKO(remote=False)
f = chemical.Flowsheet(m)
P = chemical.Properties(m)
c1 = P.compound('Butane')
c2 = P.compound('Propane')
feed1 = f.stream()
m_feed1 = f.massflows(sn=feed1.name)
m_feed1.mdot = 200
m_feed1.mdoti = [50,150]
feed2 = f.stream()
m_feed2 = f.massflows(sn=feed2.name)
m_feed2.mdot = 200
m_feed2.mdoti = [50,150]
mx = f.mixer(ni=2)
mx.inlet = [feed1.name,feed2.name]
m.options.SOLVER = 1
mf = f.massflows(sn=mx.outlet)
m.solve()
The code runs successfully. However, mf.mdot seems to output an incorrect value, [-1.8220132454e-06]; the expected value is 400. Can anyone help me figure out what is wrong with my code?
Here is source code that works for this mixing application:
from gekko import GEKKO, chemical
import json
m = GEKKO(remote=False)
f = chemical.Flowsheet(m)
P = chemical.Properties(m)
# define compounds
c1 = P.compound('Butane')
c2 = P.compound('Propane')
# create feed streams
feed1 = f.stream(fixed=False)
feed2 = f.stream(fixed=False)
# create massflows objects
m_feed1 = f.massflows(sn=feed1.name)
m_feed2 = f.massflows(sn=feed2.name)
# create mixer
mx = f.mixer(ni=2)
# connect feed streams to the mixer inlets
f.connect(feed1,mx.inlet[0])
f.connect(feed2,mx.inlet[1])
m.options.SOLVER = 1
mf = f.massflows(sn = mx.outlet)
# specify mass inlet flows
mi = [50,150]
for i in range(2):
    m.fix(m_feed1.mdoti[i],val=mi[i])
    m.fix(m_feed2.mdoti[i],val=mi[i])
# fix pressure and temperature
m.fix(feed1.P,val=101325)
m.fix(feed2.P,val=101325)
m.fix(feed1.T,val=300)
m.fix(feed2.T,val=305)
m.solve(disp=True)
# print results
print(f'The total massflow out is {mf.mdot.value}')
print('')
# get additional solution information
with open(m.path+'//results.json') as f:
    r = json.load(f)
for name, val in r.items():
    print(f'{name}={val[0]}')
Below is the solver output. This will only work with APM 0.9.1 and Gekko v0.2.3 (release coming Aug 2019). The thermo and flowsheet object libraries were released with v0.2.2 and there are several features that are still under development. The next release should resolve many of them.
----------------------------------------------------------------
APMonitor, Version 0.9.1
APMonitor Optimization Suite
----------------------------------------------------------------
--------- APM Model Size ------------
Each time step contains
Objects : 6
Constants : 0
Variables : 19
Intermediates: 0
Connections : 44
Equations : 2
Residuals : 2
Number of state variables: 14
Number of total equations: - 14
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 0
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter Objective Convergence
0 3.86642E-16 1.99000E+02
1 4.39087E-18 1.11937E+01
2 8.33448E-19 6.05819E-01
3 1.84640E-19 1.62783E-01
4 2.91981E-20 7.21250E-02
5 1.55439E-21 2.28110E-02
6 5.51232E-24 1.21437E-03
7 7.03139E-29 4.30650E-06
8 7.03139E-29 4.30650E-06
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 0.0469 sec
Objective : 0.
Successful solution
---------------------------------------------------
v1 not found in results file
The total massflow out is [400.0]
time=0.0
feed1.h=44154989.486
feed1.x[2]=0.79815448476
feed1.vdot=104.9180373
feed1.dens=0.040621756423
feed1.c[1]=0.0081993193551
feed1.c[2]=0.032422437068
feed1.mdot=200.0
feed1.y[1]=0.25
feed1.y[2]=0.75
feed1.sfrc=0.0
feed1.lfrc=0.0
feed1.vfrc=1.0
feed2.h=44552246.421
feed2.x[2]=0.79815448476
feed2.vdot=106.66667125
feed2.dens=0.03995582599
feed2.c[1]=0.0080649042837
feed2.c[2]=0.031890921707
feed2.mdot=200.0
feed2.y[1]=0.25
feed2.y[2]=0.75
feed2.sfrc=0.0
feed2.lfrc=0.0
feed2.vfrc=1.0
mixer5.outlet.t=381.10062836
mixer5.outlet.h=44353617.96
mixer5.outlet.ndot=8.5239099109
mixer5.outlet.x[1]=0.20184551524
mixer5.outlet.x[2]=0.79815448476
mixer5.outlet.vdot=1.5797241143
mixer5.outlet.dens=5.5635215396
mixer5.outlet.c[1]=1.0891224437
mixer5.outlet.c[2]=4.3066994177
mixer5.outlet.mdot=400.0
mixer5.outlet.y[1]=0.25
mixer5.outlet.y[2]=0.75
mixer5.outlet.sfrc=0.0
mixer5.outlet.lfrc=1.0
mixer5.outlet.vfrc=0.0
v2=300.0
v3=4.2619549555
v4=0.20184551524
v5=0.79815448476
v6=101325.0
v7=305.0
v8=4.2619549555
v9=0.20184551524
v10=0.79815448476
v11=200.0
v12=50.0
v13=150.0
v14=200.0
v15=50.0
v16=150.0
v17=400.0
v18=100.0
v19=300.0
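As a quick sanity check (not part of the original answer; it just reuses the results dictionary r parsed above), the mass balance can be read straight out of results.json:
# hypothetical check, assuming the dict `r` loaded from results.json above
total_in = r['feed1.mdot'][0] + r['feed2.mdot'][0]   # 200 + 200
total_out = r['mixer5.outlet.mdot'][0]               # 400
print('mass in :', total_in)
print('mass out:', total_out)
assert abs(total_in - total_out) < 1e-3              # mass balance holds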

Why does PyTorch training on CUDA work much slower than on CPU?

I guess I have made a mistake somewhere in the following simple neural network with PyTorch, because it runs much slower with CUDA than on the CPU. Can you find the mistake, please? Using a function like
def backward(ctx, input):
    return backward_sigm(ctx, input)
seems to have no real impact on performance.
import torch
import torch.nn as nn
import torch.nn.functional as f
dname = 'cuda:0'
dname = 'cpu'
device = torch.device(dname)
print(torch.version.cuda)
def forward_sigm(ctx, input):
    sigm = 1 / (1 + torch.exp(-input))
    ctx.save_for_backward(sigm)
    return sigm

def forward_step(ctx, input):
    return torch.tensor(input > 0.5, dtype = torch.float32, device = device)

def backward_sigm(ctx, grad_output):
    sigm, = ctx.saved_tensors
    return grad_output * sigm * (1-sigm)

def backward_step(ctx, grad_output):
    return grad_output

class StepAF(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return forward_sigm(ctx, input)

    @staticmethod
    def backward(ctx, input):
        return backward_sigm(ctx, input)
        #else return grad_output
class StepNN(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(StepNN, self).__init__()
        self.linear1 = torch.nn.Linear(input_size, hidden_size)
        #self.linear1.cuda()
        self.linear2 = torch.nn.Linear(hidden_size, output_size)
        #self.linear2.cuda()
        #self.StepAF = StepAF.apply

    def forward(self, x):
        h_line_1 = self.linear1(x)
        h_thrash_1 = StepAF.apply(h_line_1)
        h_line_2 = self.linear2(h_thrash_1)
        output = StepAF.apply(h_line_2)
        return output
inputs = torch.tensor( [[1,0,1,0],[1,0,0,1],[0,1,0,1],[0,1,1,0],[1,0,0,0],[0,0,0,1],[1,1,0,1],[0,1,0,0],], dtype = torch.float32, device = device)
expected = torch.tensor( [[1,0,0],[1,0,0],[0,1,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0],[0,0,1],], dtype = torch.float32, device = device)
nn = StepNN(4,8,3)
#print(*(x for x in nn.parameters()))
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(nn.parameters(), lr=1e-3)
steps = 50000
print_steps = steps // 20
good_loss = 1e-5
for t in range(steps):
    output = nn(inputs)
    loss = criterion(output, expected)
    if t % print_steps == 0:
        print('step ', t, ', loss :', loss.item())
    if loss < good_loss:
        print('step ', t, ', loss :', loss.item())
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
test = torch.tensor( [[0,1,0,1],[0,1,1,0],[1,0,1,0],[1,1,0,1],], dtype = torch.float32, device=device)
print(nn(test))
Unless you have large enough data, you won't see any performance improvement from using a GPU. GPUs get their speed from massive parallelism, so with small amounts of data the CPU can process the samples almost as fast as the GPU.
As far as I can see in your example, you are using 8 samples of size (4, 1). I would expect the GPU to start paying off only once you have hundreds or thousands of samples. With a sample size of (4, 1) and a hidden layer of size 8, the CPU can perform the calculations very quickly.
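To make this concrete, here is a minimal timing sketch (not from the question; the shapes and sizes below are made up for illustration) comparing a tiny and a large matrix multiply on the CPU and, if available, the GPU:
# illustration only: time a batched matmul on CPU vs GPU
import time
import torch

def time_matmul(device, batch, dim, reps=100):
    x = torch.randn(batch, dim, device=device)
    w = torch.randn(dim, dim, device=device)
    if device.type == 'cuda':
        torch.cuda.synchronize()   # make sure setup has finished
    start = time.time()
    for _ in range(reps):
        y = x @ w
    if device.type == 'cuda':
        torch.cuda.synchronize()   # wait for the GPU to finish before stopping the clock
    return time.time() - start

cpu = torch.device('cpu')
print('cpu, batch 8    :', time_matmul(cpu, 8, 8))
print('cpu, batch 4096 :', time_matmul(cpu, 4096, 1024))
if torch.cuda.is_available():
    gpu = torch.device('cuda:0')
    print('gpu, batch 8    :', time_matmul(gpu, 8, 8))
    print('gpu, batch 4096 :', time_matmul(gpu, 4096, 1024))
The tiny case typically favors the CPU (kernel-launch overhead dominates), while the large case favors the GPU.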
There are lots of example notebooks online where people use MNIST data (around 60,000 training images), so you could load one in, say, Google Colab and try training on the CPU and then on the GPU and compare the training times. You could try this link, for example. It uses TensorFlow instead of PyTorch, but it will give you an idea of the performance improvement from a GPU.
Note: If you haven't used Google Colab before, you need to change the runtime type (None for CPU and GPU for GPU) in the Runtime menu at the top.
Also, I will post the results from that notebook here (look at the times in the brackets; if you run it yourself, you can see firsthand how fast it runs):
On CPU :
INFO:tensorflow:loss = 294.3736, step = 1
INFO:tensorflow:loss = 28.285727, step = 101 (23.769 sec)
INFO:tensorflow:loss = 23.518856, step = 201 (24.128 sec)
On GPU :
INFO:tensorflow:loss = 295.08328, step = 0
INFO:tensorflow:loss = 47.37291, step = 100 (4.709 sec)
INFO:tensorflow:loss = 23.31364, step = 200 (4.581 sec)
INFO:tensorflow:loss = 9.980572, step = 300 (4.572 sec)
INFO:tensorflow:loss = 17.769928, step = 400 (4.560 sec)
INFO:tensorflow:loss = 16.345463, step = 500 (4.531 sec)

Tensorflow dataset API slow to get data on large batch size

I find that getting one batch from the TensorFlow Dataset API can be super slow when the batch size is large, even when all the data is in memory. Below is one example. Does anyone have any insight?
FEATURE_NUM = 500
tf_X = tf.placeholder(dtype=tf.float32, shape=[None, FEATURE_NUM], name="X")
tf_Y = tf.placeholder(dtype=tf.float32, shape=[None, 1], name="Y")
batch_size = 1000000
dataset = tf.data.Dataset.from_tensor_slices((tf_X, tf_Y)).batch(batch_size)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
se = tf.Session()
se.run(tf.global_variables_initializer())
se.run(iterator.initializer, feed_dict={tf_X : numpy_array_X, tf_Y : numpy_array_Y})
while True:
    data = se.run(next_element)  # This takes more than 5 seconds per call
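For reference, here is a self-contained version of the snippet above with per-batch timing added; the array sizes are assumptions (they match the batch size in the question and need several GB of RAM):
import time
import numpy as np
import tensorflow as tf   # TF 1.x API, as in the question

FEATURE_NUM = 500
ROWS = 2000000            # assumed row count, roughly 4 GB of float32 features
batch_size = 1000000

numpy_array_X = np.random.rand(ROWS, FEATURE_NUM).astype(np.float32)
numpy_array_Y = np.random.rand(ROWS, 1).astype(np.float32)

tf_X = tf.placeholder(dtype=tf.float32, shape=[None, FEATURE_NUM], name="X")
tf_Y = tf.placeholder(dtype=tf.float32, shape=[None, 1], name="Y")
dataset = tf.data.Dataset.from_tensor_slices((tf_X, tf_Y)).batch(batch_size)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

se = tf.Session()
se.run(iterator.initializer, feed_dict={tf_X: numpy_array_X, tf_Y: numpy_array_Y})
while True:
    start = time.time()
    try:
        data = se.run(next_element)
    except tf.errors.OutOfRangeError:
        break
    print('fetched', data[0].shape[0], 'rows in', round(time.time() - start, 2), 's')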

H2O GLM model: saved MOJO's prediction is very different when running on the same validation data

I built a GLM model using H2O (version 3.14) in R. Note that the training data contains integer columns and also many NAs, which I handle with MeanImputation.
glm <- h2o.glm(
  training_frame = train.truth,
  x = getColNames(train.truth),
  y = "isFemale",
  family = "binomial",
  missing_values_handling = "MeanImputation",
  seed = 1000000)
I then use a validation data set to look at the performance, and the precision looks good to me:
h2o.performance(glm, newdata=valid.truth)%>% h2o.confusionMatrix()
Confusion Matrix (vertical: actual; across: predicted) for max f1 # threshold = 0.529384526696015:
           0      1     Error         Rate
0      41962    300  0.007099   =300/42262
1        863  13460  0.060253   =863/14323
Totals 42825  13760  0.020553  =1163/56585
I then saved the model as a MOJO:
h2o.download_mojo(glm, path="models/mojo", get_genmodel_jar=TRUE)
I exported the validation DF to a CSV file:
dt.valid <- data.table(as.data.frame(valid.truth))
write.table(dt.valid, row.names = F, na="", file="models/test.csv")
I tried to use the saved mojo to do the same prediction by running this on my Linux shell:
java -cp h2o-genmodel.jar hex.genmodel.tools.PredictCsv \
    --mojo GLM_model_R_1511161743608_15 \
    --decimal --mojo GLM_model_R_1511161743608_15.zip \
    --input ../test.csv --output output.csv
However, the result is terrible: all the records were predicted as 0, which is very different from what I got when I ran the model in R.
I have been stuck on this for a day but couldn't figure out what went wrong. Can anyone shed some light on this?

LSTM - future value prediction error

After some research, I was able to predict future values using the LSTM code below. The Dmd1ahr.csv file that I am using is attached at the GitHub link below.
https://github.com/ukeshchawal/hello-world/blob/master/Dmd1ahr.csv
As you can see below, 90 data points are used as the training set, and the 91st to 100th are the future value predictions.
However, some of the questions that I still have are:
In order to predict these values, I originally had to take more than a hundred data points (here, I have taken 500), which is not exactly my primary goal. Is there a way that, given 500 data points, it will predict the remaining 10 or 20 out-of-sample data points? If yes, could you please write sample code that takes just the 500 data points from the Dmd1ahr.csv file linked above and predicts some future values (say the 501st to 520th) based on those 500 points?
The predictions are way off compared to the ones in your blogs (which definitely indicates a need for parameter tuning; I tried changing epochs, LSTM layers, activation, and the optimizer). What other parameter tuning can I do to make the model more robust?
Thank you all in advance.
import numpy as np
import matplotlib.pyplot as plt
import pandas
# By tweaking the architecture it could be made more robust
np.random.seed(7)
numOfSamples = 500
lengthTrain = 90
lengthValidation = 100
look_back = 1 # Can be set higher, in my experiments it made performance worse though
transientTime = 90 # Time to "burn in" time series
series = pandas.read_csv('Dmd1ahr.csv')
def generateTrainData(series, i, look_back):
    return series[i:look_back+i+1]
trainX = np.stack([generateTrainData(series, i, look_back) for i in range(lengthTrain)])
testX = np.stack([generateTrainData(series, lengthTrain + i, look_back) for i in range(lengthValidation)])
trainX = trainX.reshape((lengthTrain,look_back+1,1))
testX = testX.reshape((lengthValidation, look_back + 1, 1))
trainY = trainX[:,1:,:]
trainX = trainX[:,:-1,:]
testY = testX[:,1:,:]
testX = testX[:,:-1,:]
############### Build Model ###############
import keras
from keras.models import Model
from keras import layers
from keras import regularizers
inputs = layers.Input(batch_shape=(1,look_back,1), name="main_input")
inputsAux = layers.Input(batch_shape=(1,look_back,1), name="aux_input")
# this layer makes the actual prediction, i.e. decides if and how much it goes up or down
x = layers.recurrent.LSTM(300,return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(200,return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(100,return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(50,return_sequences=True, stateful=True)(inputs)
x = layers.wrappers.TimeDistributed(layers.Dense(1, activation="linear",
        kernel_regularizer=regularizers.l2(0.005),
        activity_regularizer=regularizers.l1(0.005)))(x)
# auxiliary input: the current input is fed directly to the output
# this way the prediction from the step before is used as a "base", and the network just has to
# learn whether it goes a little up or down
auxX = layers.wrappers.TimeDistributed(layers.Dense(1,
        kernel_initializer=keras.initializers.Constant(value=1),
        bias_initializer='zeros',
        input_shape=(1,1), activation="linear", trainable=False
        ))(inputsAux)
outputs = layers.add([x, auxX], name="main_output")
model = Model(inputs=[inputs, inputsAux], outputs=outputs)
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mean_squared_error'])
#model.summary()
#model.fit({"main_input": trainX, "aux_input": trainX[look_back-1,look_back,:]},{"main_output": trainY}, epochs=4, batch_size=1, shuffle=False)
model.fit({"main_input": trainX, "aux_input": trainX[:,look_back-1,:].reshape(lengthTrain,1,1)},{"main_output": trainY}, epochs=100, batch_size=1, shuffle=False)
############### make predictions ###############
burnedInPredictions = np.zeros(transientTime)
testPredictions = np.zeros(len(testX))
# burn the series in; here we use the first transientTime samples from the test data
for i in range(transientTime):
    prediction = model.predict([np.array(testX[i, :, 0].reshape(1, look_back, 1)), np.array(testX[i, look_back - 1, 0].reshape(1, 1, 1))])
    testPredictions[i] = prediction[0,0,0]
burnedInPredictions[:] = testPredictions[:transientTime]
# prediction: now don't use any previous data at all; the network just has to run on its own output
for i in range(transientTime, len(testX)):
    prediction = model.predict([prediction, prediction])
    testPredictions[i] = prediction[0,0,0]
# for plotting reasons
testPredictions[:np.size(burnedInPredictions)-1] = np.nan
############### plot results ###############
#import matplotlib.pyplot as plt
plt.plot(testX[:, 0, 0])
plt.show()
plt.plot(burnedInPredictions, label = "training")
plt.plot(testPredictions, label = "prediction")
plt.legend()
plt.show()
