multiple input does not work on Tensorflow 2 installation on MAC OS M1 - AttributeError: 'tuple' object has no attribute 'shape' - macos

I have a Tensorflow (sequence) model where the model takes 2 input streams. Here is the part of the code where the model is defined:
generic_in = Input(shape=(sequence_length, nr_features), name='generic_input')
input1_in = Input(shape=(sequence_length, nr_features), name='input1_input')
input2_in = Input(shape=(sequence_length, nr_features), name='input2_input')
generic_out, generic_state_h, generic_state_c = LSTM(50,
return_sequences = False,
return_state = True,
dropout = 0.15,
recurrent_dropout = 0.15,
name="generic_lstm")(generic_in)
concatenated_gen_out = Concatenate()([ generic_state_h, generic_state_c ])
gen_dense_out = Dense(100,
activation='relu',
name="generic_dense")(concatenated_gen_out)
gen_dense_out = BatchNormalization()(gen_dense_out)
gen_dense_out = Dropout(0.15)(gen_dense_out)
generic_model = Model( inputs = [ generic_in ], outputs = [ gen_dense_out ] )
input1_dense_out = generic_model(input1_in)
input2_dense_out = generic_model(input2_in)
concatenated_out = Concatenate()([ input1_dense_out, input2_dense_out ])
dense2_out = Dense(100,
activation='relu',
name="dense_2")(concatenated_out)
dense2_out = BatchNormalization()(dense2_out)
dense2_out = Dropout(0.2)(dense2_out)
softmax_out = Dense(nr_classes,
activation='softmax',
name="final_output_layer")(dense2_out)
model = Model(inputs = [ input1_in, input2_in ],
outputs = [ softmax_out ])
#opt = tf.keras.optimizers.Adam(lr=0.00008, decay=0.000001)
opt = tf.keras.optimizers.Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
history = model.fit(x=train_x,
y=train_y,
batch_size=BATCH_SIZE,
epochs=80,
verbose=2,
validation_data=(dev_x, dev_y),
shuffle=True)
Please note that train_x which is the input to the model.fit method is a list containing 2 inputs as defined in model = Model(inputs = [input1_in, input2_in], outputs = [softmax_out]).
This works perfectly fine in my Tensorflow v1.13.1 installation on Windows. I am trying to migrate my project to MAC OS Big Sur v11.3.1 with M1 chip. The Tensorflow version on the MAC OS is 2.4.0-rc0.
I obviously made some changes to make it work with Tensorflow 2 but these changes are mainly API call updates based on the new API.
The error I get on MAC Tensorflow installation:
Traceback (most recent call last):
File "/Users/me/Developer/AI_Projects/Football/M1_FULL_TIME_MR/src/train_fit_v2.py", line 288, in <module>
history = model.fit(x=train_x,
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 725, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3196, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:795 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:1259 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2730 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:3417 _call_for_each_replica
return fn(*args, **kwargs)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:788 run_step **
outputs = model.train_step(data)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:758 train_step
self.compiled_metrics.update_state(y, y_pred, sample_weight)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:387 update_state
self.build(y_pred, y_true)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:317 build
self._metrics = nest.map_structure_up_to(y_pred, self._get_metric_objects,
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1159 map_structure_up_to
return map_structure_with_tuple_paths_up_to(
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1257 map_structure_with_tuple_paths_up_to
results = [
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1258 <listcomp>
func(*args, **kwargs) for args in zip(flat_path_gen, *flat_value_gen)
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/util/nest.py:1161 <lambda>
lambda _, *values: func(*values), # Discards the path arg.
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:418 _get_metric_objects
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:418 <listcomp>
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
/Users/me/miniforge3/envs/tf_dev/lib/python3.8/site-packages/tensorflow/python/keras/engine/compile_utils.py:439 _get_metric_object
y_t_rank = len(y_t.shape.as_list())
AttributeError: 'tuple' object has no attribute 'shape'
I am totally out of solutions. What am I supposed to do to make it work?

As per the error you posted y_t is of type tuple. Tuple has attribute shape.
You can try casting y_t.
numpy.array(y_t)

Related

logit syntax error even with correct code

MY CODE:
model_1 = SM.logit(formula = f_1, data=Company_train).fit()
I AM GETTING THIS ERROR EVEN IF MY SYNTAX IS CORRECT FOR STATSMODEL LOGIT FUNCTION WHEN I AM FITTING THE DATA. THE ERROR IS AS FOLLOWS:
Traceback (most recent call last):
File "C:\Users\Asus\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "C:\Users\Asus\AppData\Local\Temp/ipykernel_8360/3618389853.py", line 1, in <module>
model_1 = SM.logit(formula = f_1, data=Company_train).fit()
File "C:\Users\Asus\anaconda3\lib\site-packages\statsmodels\base\model.py", line 169, in from_formula
tmp = handle_formula_data(data, None, formula, depth=eval_env,
File "C:\Users\Asus\anaconda3\lib\site-packages\statsmodels\formula\formulatools.py", line 63, in handle_formula_data
result = dmatrices(formula, Y, depth, return_type='dataframe',
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\highlevel.py", line 309, in dmatrices
(lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\highlevel.py", line 164, in _do_highlevel_design
design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\highlevel.py", line 66, in _try_incr_builders
return design_matrix_builders([formula_like.lhs_termlist,
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\build.py", line 689, in design_matrix_builders
factor_states = _factors_memorize(all_factors, data_iter_maker, eval_env)
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\build.py", line 354, in _factors_memorize
which_pass = factor.memorize_passes_needed(state, eval_env)
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\eval.py", line 474, in memorize_passes_needed
subset_names = [name for name in ast_names(self.code)
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\eval.py", line 474, in <listcomp>
subset_names = [name for name in ast_names(self.code)
File "C:\Users\Asus\anaconda3\lib\site-packages\patsy\eval.py", line 105, in ast_names
for node in ast.walk(ast.parse(code)):
File "C:\Users\Asus\anaconda3\lib\ast.py", line 50, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
Contingent liabilities
^
SyntaxError: invalid syntax

Am getting error trying to predict on a single image CNN pytorch

Error message
Traceback (most recent call last):
File "pred.py", line 134, in
output = model(data)
Runtime Error: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 3-dimensional input of size [1, 32, 32] instead.
Prediction code
normalize = transforms.Normalize(mean=[0.4914, 0.4824, 0.4467],
std=[0.2471, 0.2435, 0.2616])
train_set = transforms.Compose([
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
])
model = models.condensenet(args)
model = nn.DataParallel(model)
PATH = "results/savedir/save_models/checkpoint_001.pth.tar"
model.load_state_dict(torch.load(PATH)['state_dict'])
device = torch.device("cpu")
model.eval()
image = Image.open("horse.jpg")
input = train_set(image)
train_loader = torch.utils.data.DataLoader(
input,
batch_size=1,shuffle=True, num_workers=1)
for i, data in enumerate(train_loader):
#input_var = torch.autograd.Variable(data, volatile=True)
#input_var = input_var.view(1, 3, 32,32)
**output = model(data)
topk=(1,5)
maxk = max(topk)
_, pred = output.topk(maxk, 1, True, True)
Am getting this error when am trying to predict on a single image
Image shape/size error message
Link to saved model
Training code repository
Plz uncomment this line #input_var = input_var.view(1, 3, 32,32) so that your input dimension is 4.
I assume that your no. of input channels are 3 if its one then use input_var = input_var.view(1, 1, 32,32) if gray scale
Instead of doing the for loop and train_loader, solved this by just passing the input directly into the model. like this
input = train_set(image)
input = input.unsqueeze(0)
model.eval()
output = model(input)
More details can be found here link

How to run a transformers bert without pipeline?

I have found myself dealing with an enviroment that does not support multiprocessing. How do I run my DistillBert without transformers pipeline?
Here is code right now:
import json
import os
import sys
sys.path.append("/mnt/access")
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from transformers.pipelines import pipeline
def lambda_handler(event, context):
print("After:",os.listdir("/mnt/access"))
tokenizer = AutoTokenizer.from_pretrained('/mnt/access/Dis_Save/')
model = AutoModelForQuestionAnswering.from_pretrained('/mnt/access/Dis_Save/')
nlp_qa = pipeline('question-answering', tokenizer=tokenizer,model=model)
context = "tra"
question = "tra"
X = nlp_qa(context=context, question=question)
return {
'statusCode': 200,
'body': json.dumps('Hello from Lambda!')
}
Error message I get right now:
{
"errorMessage": "[Errno 38] Function not implemented",
"errorType": "OSError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 18, in lambda_handler\n X = nlp_qa(context=context, question=question)\n",
" File \"/mnt/access/transformers/pipelines.py\", line 1776, in __call__\n features_list = [\n",
" File \"/mnt/access/transformers/pipelines.py\", line 1777, in <listcomp>\n squad_convert_examples_to_features(\n",
" File \"/mnt/access/transformers/data/processors/squad.py\", line 354, in squad_convert_examples_to_features\n with Pool(threads, initializer=squad_convert_example_to_features_init, initargs=(tokenizer,)) as p:\n",
" File \"/var/lang/lib/python3.8/multiprocessing/context.py\", line 119, in Pool\n return Pool(processes, initializer, initargs, maxtasksperchild,\n",
" File \"/var/lang/lib/python3.8/multiprocessing/pool.py\", line 191, in __init__\n self._setup_queues()\n",
" File \"/var/lang/lib/python3.8/multiprocessing/pool.py\", line 343, in _setup_queues\n self._inqueue = self._ctx.SimpleQueue()\n",
" File \"/var/lang/lib/python3.8/multiprocessing/context.py\", line 113, in SimpleQueue\n return SimpleQueue(ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.8/multiprocessing/queues.py\", line 336, in __init__\n self._rlock = ctx.Lock()\n",
" File \"/var/lang/lib/python3.8/multiprocessing/context.py\", line 68, in Lock\n return Lock(ctx=self.get_context())\n",
" File \"/var/lang/lib/python3.8/multiprocessing/synchronize.py\", line 162, in __init__\n SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
" File \"/var/lang/lib/python3.8/multiprocessing/synchronize.py\", line 57, in __init__\n sl = self._semlock = _multiprocessing.SemLock(\n"
]
}
Other code:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
import json
import sys
sys.path.append("/mnt/access")
tokenizer = AutoTokenizer.from_pretrained("/mnt/access/Dis_Save/")
model = AutoModelForQuestionAnswering.from_pretrained("/mnt/access/Dis_Save/", return_dict=True)
def lambda_handler(event, context):
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = ["How many pretrained models are available in 🤗 Transformers?",]
for question in questions:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
input_ids = inputs["input_ids"].tolist()[0]
text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
answer_start_scores, answer_end_scores = model(**inputs).values()
answer_start = torch.argmax(
answer_start_scores
) # Get the most likely beginning of answer with the argmax of the score
answer_end = torch.argmax(answer_end_scores) + 1 # Get the most likely end of answer with the argmax of the score
answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
print(f"Question: {question}")
print(f"Answer: {answer}")
return {
'statusCode': 200,
'body': json.dumps(answer)
}
Edit:
I run the code. It runs well on it's own, however I get an error whne running on API itself:
{
"errorMessage": "'tuple' object has no attribute 'values'",
"errorType": "AttributeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 39, in lambda_handler\n answer_start_scores, answer_end_scores = model(**inputs).values()\n"
]
}

How to turn off Auto Mixed Precision in validation time?

I try to run the MAC network (https://github.com/stanfordnlp/mac-network/tree/gqa) with Auto Mixed Precision.
def addOptimizerOp(self):
with tf.variable_scope("trainAddOptimizer"):
self.globalStep = tf.Variable(0, dtype = tf.int32, trainable = False, name = "globalStep") # init to 0 every run?
optimizer = tf.train.AdamOptimizer(learning_rate = self.lr)
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
if config.subsetOpt:
self.subsetOptimizer = tf.train.AdamOptimizer(learning_rate = self.lr * config.subsetOptMult)
return optimizer
In the first epoch, training is ok. However, when the model run evaluation on the validation set, I got this error.
Training epoch 1...
2019-08-05 14:51:13.625899: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-08-05 14:51:13.709959: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1723] Converted 1504/6920 nodes to float16 precision using 150 cast(s) to float16 (excluding Const and Variable casts)
2019-08-05 14:51:16.930248: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-05 14:51:17.331687: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-05 14:51:29.378905: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-08-05 14:51:29.380633: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
eb 1, 10000,(160010 / 943000), t = 0.12 (0.00+0.11), lr 0.0003, l = 2.8493, a = 0.4250, avL = 2.5323, avA = 0.4188, g = 3.7617, emL = 2.3097, emA = 0.4119; gqaExperiment
Restoring EMA weights
2019-08-05 14:51:31.132804: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-08-05 14:51:31.136122: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1241] No whitelist ops found, nothing to do
2019-08-05 14:51:32.322369: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1767] Running auto_mixed_precision graph optimizer
2019-08-05 14:51:32.341609: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1723] Converted 661/1848 nodes to float16 precision using 38 cast(s) to float16 (excluding Const and Variable casts)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
[[{{node macModel/tower0/encoder/birnnLayer/bidirectional_rnn/fw/fw/while/TensorArrayWrite/TensorArrayWriteV3}}]]
[[macModel/tower0/MACnetwork/MACCell_3/write/inter2attselfAttention/Softmax/_1661]]
(1) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
[[{{node macModel/tower0/encoder/birnnLayer/bidirectional_rnn/fw/fw/while/TensorArrayWrite/TensorArrayWriteV3}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 866, in <module>
main()
File "main.py", line 728, in main
evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, getPreds = getPreds, prevRes = evalRes)
File "main.py", line 248, in runEvaluation
minLoss = prevRes["train"]["minLoss"] if prevRes else float("inf"))
File "main.py", line 594, in runEpoch
res = model.runBatch(sess, batch, imagesBatch, train, getPreds, getAtt)
File "/content/model.py", line 948, in runBatch
feed_dict = feed)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
I think the errors may come from not turning off the auto mixed precision on evaluation time or using the same session and model for train and evaluation. I tried "tf.train.experimental.disable_mixed_precision_graph_rewrite()" but I do not know how to use it in the right way.
How can I fix it? Thanks all.
def main():
with open(config.configFile(), "a+") as outFile:
json.dump(vars(config), outFile)
# set gpus
if config.gpus != "":
config.gpusNum = len(config.gpus.split(","))
os.environ["CUDA_VISIBLE_DEVICES"] = config.gpus
tf.logging.set_verbosity(tf.logging.ERROR)
# process data
print(bold("Preprocess data..."))
start = time.time()
preprocessor = Preprocesser()
data, embeddings, answerDict, questionDict = preprocessor.preprocessData()
print("took {} seconds".format(bcolored("{:.2f}".format(time.time() - start), "blue")))
nextElement = None
dataOps = None
# build model
print(bold("Building model..."))
start = time.time()
model = MACnet(embeddings, answerDict, questionDict, nextElement)
print("took {} seconds".format(bcolored("{:.2f}".format(time.time() - start), "blue")))
# initializer
init = tf.global_variables_initializer()
# savers
savers = setSavers(model)
saver, emaSaver = savers["saver"], savers["emaSaver"]
# sessionConfig
sessionConfig = setSession()
with tf.Session(config = sessionConfig) as sess:
# ensure no more ops are added after model is built
sess.graph.finalize()
# restore / initialize weights, initialize epoch variable
epoch = loadWeights(sess, saver, init)
trainRes, evalRes = None, None
if config.train:
start0 = time.time()
bestEpoch = epoch
bestRes = None
prevRes = None
# epoch in [restored + 1, epochs]
for epoch in range(config.restoreEpoch + 1, config.epochs + 1):
print(bcolored("Training epoch {}...".format(epoch), "green"))
start = time.time()
# train
# calle = lambda: model.runEpoch(), collectRuntimeStats, writer
trainingData, alterData = chooseTrainingData(data)
trainRes = runEpoch(sess, model, trainingData, dataOps, train = True, epoch = epoch,
saver = saver, alterData = alterData,
maxAcc = trainRes["maxAcc"] if trainRes else 0.0,
minLoss = trainRes["minLoss"] if trainRes else float("inf"),)
# save weights
saver.save(sess, config.weightsFile(epoch))
if config.saveSubset:
subsetSaver.save(sess, config.subsetWeightsFile(epoch))
# load EMA weights
if config.useEMA:
print(bold("Restoring EMA weights"))
emaSaver.restore(sess, config.weightsFile(epoch))
# evaluation
getPreds = config.getPreds or (config.analysisType != "")
evalRes = runEvaluation(sess, model, data["main"], dataOps, epoch, getPreds = getPreds, prevRes = evalRes)
extraEvalRes = runEvaluation(sess, model, data["extra"], dataOps, epoch,
evalTrain = not config.extraVal, getPreds = getPreds)
# restore standard weights
if config.useEMA:
print(bold("Restoring standard weights"))
saver.restore(sess, config.weightsFile(epoch))
print("")
epochTime = time.time() - start
print("took {:.2f} seconds".format(epochTime))
# print results
printDatasetResults(trainRes, evalRes, extraEvalRes)

mpi4py Gatherv facing KeyError: '0'

I am new in mpi4py. I wrote the code in order to process a large numpy array data by multiple processor. As I am unable to provide the input file I am mentioning the shape of data. Shape of data is [3000000,15] and it contains string type of data.
from mpi4py import MPI
import numpy as np
import datetime as dt
import math as math
comm = MPI.COMM_WORLD
numprocs = comm.size
rank = comm.Get_rank()
fname = "6.binetflow"
data = np.loadtxt(open(fname,"rb"), dtype=object, delimiter=",", skiprows=1)
X = data[:,[0,1,3,14,6,6,6,6,6,6,6,6]]
num_rows = math.ceil(len(X)/float(numprocs))
X = X.flatten()
sendCounts = list()
displacements = list()
for p in range(numprocs):
if p == (numprocs-1): #for last processor
sendCounts.append(int(len(X) - (p*num_rows*12)))
displacements.append(int(p*num_rows*12))
break
sendCounts.append(int(num_rows*12))
displacements.append(int(p*sendCounts[p]))
sendbuf = np.array(X[displacements[rank]: (displacements[rank]+sendCounts[rank])])
## Each processor will do some task on sendbuf
if rank == 0:
recvbuf = np.empty(sum(sendCounts), dtype=object)
else:
recvbuf = None
print("sendbuf: ",sendbuf)
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
if rank == 0:
print("Gathered array: {}".format(recvbuf))
But I am facing below error:
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 525, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34678)
File "MPI/msgbuffer.pxi", line 446, in mpi4py.MPI._p_msg_cco.for_cco_send (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:33938)
File "MPI/msgbuffer.pxi", line 148, in mpi4py.MPI.message_simple (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:30349)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Traceback (most recent call last):
File "hello.py", line 36, in <module>
comm.Gatherv(sendbuf=sendbuf, recvbuf=(recvbuf, sendCounts), root=0)
File "MPI/Comm.pyx", line 602, in mpi4py.MPI.Comm.Gatherv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:97993)
File "MPI/msgbuffer.pxi", line 516, in mpi4py.MPI._p_msg_cco.for_gather (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34587)
File "MPI/msgbuffer.pxi", line 466, in mpi4py.MPI._p_msg_cco.for_cco_recv (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:34097)
File "MPI/msgbuffer.pxi", line 261, in mpi4py.MPI.message_vector (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:31977)
File "MPI/msgbuffer.pxi", line 93, in mpi4py.MPI.message_basic (d:\build\mpi4py\mpi4py-2.0.0\src\mpi4py.MPI.c:29448)
KeyError: 'O'
Any help will be much appreciated. I am stuck in this problem for a long time.
Thanks
The problem is dtype=object.
Mpi4py provides two kinds of communication functions, those whose names begin with an upper-case letter, e.g. Scatter, and those whose names begin with a lower-case letter, e.g. scatter. From the Mpi4py documentation:
In MPI for Python, the Bcast(), Scatter(), Gather(), Allgather() and Alltoall() methods of Comm instances provide support for collective communications of memory buffers. The variants bcast(), scatter(), gather(), allgather() and alltoall() can communicate generic Python objects.
What is not clear from this is that even though numpy arrays supposedly expose memory buffers, the buffers apparently need to be to one of a small set of primitive data types, and certainly don't work with generic objects. Compare the following two pieces of code:
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(Size, dtype=object)
else:
Data = None
Data = Comm.scatter(Data, 0) # I work fine!
print("Data on rank %d: " % Rank, Data)
and
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(Size, dtype=object)
else:
Data = None
Datb = numpy.empty(1, dtype=object)
Comm.Scatter(Data, Datb, 0) # I throw KeyError!
print("Datb on rank %d: " % Rank, Datb)
Unfortunately, Mpi4py provides no scatterv. From the same place in the docs:
The vector variants (which can communicate different amounts of data to each process) Scatterv(), Gatherv(), Allgatherv() and Alltoallv() are also supported, they can only communicate objects exposing memory buffers.
These are not exceptions to the upper- vs lower-case rule for dtypes, either:
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(2*Size+1, dtype=numpy.dtype('float64'))
else:
Data = None
if Rank == 0:
Datb = numpy.empty(3, dtype=numpy.dtype('float64'))
else:
Datb = numpy.empty(2, dtype=numpy.dtype('float64'))
Comm.Scatterv(Data, Datb, 0) # I work fine!
print("Datb on rank %d: " % Rank, Datb)
versus
from mpi4py import MPI
import numpy
Comm = MPI.COMM_WORLD
Size = Comm.Get_size()
Rank = Comm.Get_rank()
if Rank == 0:
Data = numpy.empty(2*Size+1, dtype=object)
else:
Data = None
if Rank == 0:
Datb = numpy.empty(3, dtype=object)
else:
Datb = numpy.empty(2, dtype=object)
Comm.Scatterv(Data, Datb, 0) # I throw KeyError!
print("Datb on rank %d: " % Rank, Datb)
You'll unfortunately need to write your code so that it can use scatter, necessitating the same SendCount for each process, or more primitive, point-to-point communication functions, or use some parallel facility other than Mpi4py.
Using Mpi4py 2.0.0, the current stable version at the time of this writing.

Resources