I'm using the following code out of the box from this url: https://lightning-transformers.readthedocs.io/en/latest/tasks/nlp/question_answering.html
import pytorch_lightning as pl
from transformers import AutoTokenizer
from lightning_transformers.task.nlp.question_answering import (
    QuestionAnsweringTransformer,
    SquadDataModule,
)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-uncased")
model = QuestionAnsweringTransformer(pretrained_model_name_or_path="bert-base-uncased")
dm = SquadDataModule(
    batch_size=1,
    dataset_config_name="plain_text",
    max_length=384,
    version_2_with_negative=False,
    null_score_diff_threshold=0.0,
    doc_stride=128,
    n_best_size=20,
    max_answer_length=30,
    tokenizer=tokenizer,
)
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)
trainer.fit(model, dm)
This throws the following error:
AssertionError Traceback (most recent call last)
<ipython-input-2-0b608c02a52e> in <module>
14 trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)
15
---> 16 trainer.fit(model, dm)
16 frames
/usr/local/lib/python3.8/dist-packages/lightning_transformers/task/nlp/question_answering/datasets/squad/processing.py in postprocess_qa_predictions(examples, features, predictions, version_2_with_negative, n_best_size, max_answer_length, null_score_diff_threshold, output_dir, prefix)
245 all_start_logits, all_end_logits, example_ids = predictions
246
--> 247 assert len(predictions[0]) == len(features), f"Got {len(predictions[0])} predictions and {len(features)} features."
248
249 # Build a map example to its corresponding features.
AssertionError: Got 2 predictions and 10784 features.
I was simply trying to get a single example from the documentation to run in Google Colab before investigating whether the library would meet my use case. The example fails when used as is, which makes further investigation discouraging. Nothing comes up when I google "AssertionError: Got 2 predictions and 10784 features."
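For readers following the traceback, here is a minimal pure-Python sketch of the check that fails on line 247 of processing.py. It assumes only what the traceback shows: predictions is unpacked into start logits, end logits, and example ids, and the length of the first is compared with the number of features. The function name check_prediction_count and the dummy data are hypothetical. One unverified guess, not confirmed anywhere in this post, is that the 2 comes from a partial validation pass (two batches at batch_size=1) while features covers the whole validation set.

```python
def check_prediction_count(predictions, features):
    """Mimic the length check shown in the traceback (illustrative only)."""
    all_start_logits, all_end_logits, example_ids = predictions
    if len(all_start_logits) != len(features):
        return f"Got {len(all_start_logits)} predictions and {len(features)} features."
    return "ok"


# Dummy stand-ins: 10784 features, but logits for only two of them.
features = list(range(10784))
partial_predictions = ([[0.0]] * 2, [[0.0]] * 2, ["id-0", "id-1"])

message = check_prediction_count(partial_predictions, features)
print(message)
```

The check only passes when every feature has a matching row of logits, so the first thing to establish is where the two rows come from.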
In this notebook, we use the Explainable AI SDK from Google to load a model right after saving it. This fails with a message that suggests the model is missing.
But note that the info message says the model was saved, and checking working/model shows that the model is there.
However, working/model/assets is empty.
Why do we get this error message? How can we avoid it?
model_path = "working/model"
model.save(model_path)
builder = SavedModelMetadataBuilder(model_path)
builder.set_numeric_metadata(
    "numpy_inputs",
    input_baselines=[X_train.median().tolist()],  # attributions relative to the median of the target
    index_feature_mapping=X_train.columns.tolist(),  # the names of each feature
)
builder.save_metadata(model_path)
explainer = explainable_ai_sdk.load_model_from_local_path(
    model_path=model_path,
    config=configs.SampledShapleyConfig(path_count=20),
)
INFO:tensorflow:Assets written to: working/model/assets
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/tmp/ipykernel_26061/1928503840.py in <module>
18 explainer = explainable_ai_sdk.load_model_from_local_path(
19 model_path=model_path,
---> 20 config=configs.SampledShapleyConfig(path_count=20),
21 )
22
/opt/conda/lib/python3.7/site-packages/explainable_ai_sdk/model/model_factory.py in load_model_from_local_path(model_path, config)
128 """
129 if _LOCAL_MODEL_KEY not in _MODEL_REGISTRY:
--> 130 raise NotImplementedError('There are no implementations of local model.')
131 return _MODEL_REGISTRY[_LOCAL_MODEL_KEY](model_path, config)
132
NotImplementedError: There are no implementations of local model.
When saving a version in Kaggle, I get "StdinNotImplementedError: getpass was called, but this frontend does not support input requests" whenever I use the transformers.Trainer class. The general code I use:
from transformers import Trainer, TrainingArguments
training_args = TrainingArguments(params)
trainer = Trainer(params)
trainer.train()
And the specific cell I am running now:
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback
early_stopping = EarlyStoppingCallback()
training_args = TrainingArguments(
    output_dir=OUT_FINETUNED_MODEL_PATH,
    num_train_epochs=20,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=0,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=100,
    evaluation_strategy="steps",
    eval_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    callbacks=[early_stopping]
)
trainer.train()
trainer.train()
When trainer.train() is called, I get the error below, which I do not get if I train with native PyTorch. I understand that the error arises because I am being asked to enter a password, but no password is requested when using native PyTorch code, nor when running the same trainer.train() code on Google Colab.
Any solution would be fine, such as:
Avoiding the password prompt.
Enabling input requests when saving a notebook on Kaggle. After that, if I understand correctly, I would need to go to https://wandb.ai/authorize (after creating an account) and copy the generated key to the console. However, I do not understand why wandb should be necessary, since I have never explicitly used it.
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 741, in init
wi.setup(kwargs)
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py", line 155, in setup
wandb_login._login(anonymous=anonymous, force=force, _disable_warning=True)
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_login.py", line 210, in _login
wlogin.prompt_api_key()
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_login.py", line 144, in prompt_api_key
no_create=self._settings.force,
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/lib/apikey.py", line 135, in prompt_api_key
key = input_callback(api_ask).strip()
File "/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 825, in getpass
"getpass was called, but this frontend does not support input requests."
IPython.core.error.StdinNotImplementedError: getpass was called, but this frontend does not support input requests.
wandb: ERROR Abnormal program exit
---------------------------------------------------------------------------
StdinNotImplementedError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
740 wi = _WandbInit()
--> 741 wi.setup(kwargs)
742 except_exit = wi.settings._except_exit
/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py in setup(self, kwargs)
154 if not settings._offline and not settings._noop:
--> 155 wandb_login._login(anonymous=anonymous, force=force, _disable_warning=True)
156
/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_login.py in _login(anonymous, key, relogin, host, force, _backend, _silent, _disable_warning)
209 if not key:
--> 210 wlogin.prompt_api_key()
211
/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_login.py in prompt_api_key(self)
143 no_offline=self._settings.force,
--> 144 no_create=self._settings.force,
145 )
/opt/conda/lib/python3.7/site-packages/wandb/sdk/lib/apikey.py in prompt_api_key(settings, api, input_callback, browser_callback, no_offline, no_create, local)
134 )
--> 135 key = input_callback(api_ask).strip()
136 write_key(settings, key, api=api)
/opt/conda/lib/python3.7/site-packages/ipykernel/kernelbase.py in getpass(self, prompt, stream)
824 raise StdinNotImplementedError(
--> 825 "getpass was called, but this frontend does not support input requests."
826 )
StdinNotImplementedError: getpass was called, but this frontend does not support input requests.
The above exception was the direct cause of the following exception:
Exception Traceback (most recent call last)
<ipython-input-82-4d1046ab80b8> in <module>
42 )
43
---> 44 trainer.train()
/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
1067 model.zero_grad()
1068
-> 1069 self.control = self.callback_handler.on_train_begin(self.args, self.state, self.control)
1070
1071 # Skip the first epochs_trained epochs to get the random state of the dataloader at the right point.
/opt/conda/lib/python3.7/site-packages/transformers/trainer_callback.py in on_train_begin(self, args, state, control)
338 def on_train_begin(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
339 control.should_training_stop = False
--> 340 return self.call_event("on_train_begin", args, state, control)
341
342 def on_train_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl):
/opt/conda/lib/python3.7/site-packages/transformers/trainer_callback.py in call_event(self, event, args, state, control, **kwargs)
386 train_dataloader=self.train_dataloader,
387 eval_dataloader=self.eval_dataloader,
--> 388 **kwargs,
389 )
390 # A Callback can skip the return of `control` if it doesn't change it.
/opt/conda/lib/python3.7/site-packages/transformers/integrations.py in on_train_begin(self, args, state, control, model, **kwargs)
627 self._wandb.finish()
628 if not self._initialized:
--> 629 self.setup(args, state, model, **kwargs)
630
631 def on_train_end(self, args, state, control, model=None, tokenizer=None, **kwargs):
/opt/conda/lib/python3.7/site-packages/transformers/integrations.py in setup(self, args, state, model, **kwargs)
604 project=os.getenv("WANDB_PROJECT", "huggingface"),
605 name=run_name,
--> 606 **init_args,
607 )
608 # add config parameters (run may have been created manually)
/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_init.py in init(job_type, dir, config, project, entity, reinit, tags, group, name, notes, magic, config_exclude_keys, config_include_keys, anonymous, mode, allow_val_change, resume, force, tensorboard, sync_tensorboard, monitor_gym, save_code, id, settings)
779 if except_exit:
780 os._exit(-1)
--> 781 six.raise_from(Exception("problem"), error_seen)
782 return run
/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
Exception: problem
You may want to try adding report_to="tensorboard" (or any other suitable string or list of strings) to your TrainingArguments:
https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments
If you have multiple loggers that you want to use, set report_to="all" (the default value).
Alternatively, set os.environ["WANDB_DISABLED"] = "true" so that wandb is always disabled.
See: https://huggingface.co/transformers/main_classes/trainer.html#transformers.TFTrainer.setup_wandb
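The two suggestions above could be combined roughly as follows. This is a sketch, not a tested Kaggle configuration: the output_dir value is a hypothetical placeholder, and the keyword arguments are collected in a plain dict so that the snippet runs even where transformers is not installed.

```python
import os

# Disable the wandb integration before the Trainer is created,
# so that no API key prompt is triggered during training.
os.environ["WANDB_DISABLED"] = "true"

# Keyword arguments to pass on to TrainingArguments.
training_kwargs = {
    "output_dir": "./finetuned-model",  # hypothetical placeholder path
    "logging_dir": "./logs",            # same value as in the question
    "report_to": "tensorboard",         # log to TensorBoard only
}

# In the notebook this would become:
#   training_args = TrainingArguments(**training_kwargs)
print(training_kwargs["report_to"])
```

Setting the environment variable at the top of the notebook, before any Trainer is built, is the safer ordering because the integration is looked up when training starts.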
I'm trying to build a convolutional neural network for image classification in Python.
I run my code on Colab and have uploaded my data to Google Drive.
I can see all the files and folders in my Google Drive from Python, but when I try to actually load an image, I get the error in the title.
I'm using the skimage.io package. I'm just running a notebook I found on Kaggle, so the code should run fine. The only difference I noticed is that the Kaggle user was probably not working on Colab with the data in Google Drive, so I think that may be the problem. Anyway, here's my code:
from skimage.io import imread
img=imread('/content/drive/My Drive/CoLab/Data/chest_xray/train/PNEUMONIA/person53_bacteria_255.jpeg')
Which gives me the following error:
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-4a64aebb8504> in <module>()
----> 1 img=imread('/content/drive/My Drive/CoLab/Data/chest_xray/train/PNEUMONIA/person53_bacteria_255.jpeg')
4 frames
/usr/local/lib/python3.6/dist-packages/skimage/io/_io.py in imread(fname, as_gray, plugin, flatten, **plugin_args)
59
60 with file_or_url_context(fname) as fname:
---> 61 img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
62
63 if not hasattr(img, 'ndim'):
/usr/local/lib/python3.6/dist-packages/skimage/io/manage_plugins.py in call_plugin(kind, *args, **kwargs)
208 (plugin, kind))
209
--> 210 return func(*args, **kwargs)
211
212
/usr/local/lib/python3.6/dist-packages/imageio/core/functions.py in imread(uri, format, **kwargs)
221 reader = read(uri, format, "i", **kwargs)
222 with reader:
--> 223 return reader.get_data(0)
224
225
/usr/local/lib/python3.6/dist-packages/imageio/core/format.py in get_data(self, index, **kwargs)
345 self._checkClosed()
346 self._BaseReaderWriter_last_index = index
--> 347 im, meta = self._get_data(index, **kwargs)
348 return Array(im, meta) # Array tests im and meta
349
/usr/local/lib/python3.6/dist-packages/imageio/plugins/gdal.py in _get_data(self, index)
64 if index != 0:
65 raise IndexError("Gdal file contains only one dataset")
---> 66 return self._ds.ReadAsArray(), self._get_meta_data(index)
67
68 def _get_meta_data(self, index):
AttributeError: 'NoneType' object has no attribute 'ReadAsArray'
First, instead of My Drive it should be MyDrive (no space).
If it still doesn't work, you can try the following:
%cd /content/drive/MyDrive/CoLab/Data/chest_xray/train/PNEUMONIA
img=imread('person53_bacteria_255.jpeg')
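Before calling imread, it can help to confirm which spelling of the mount point actually contains the file. The sketch below uses only the standard library; the helper name find_existing_file is hypothetical, and the two root directories are assumptions based on the paths mentioned in this thread.

```python
from pathlib import Path


def find_existing_file(relative_path, roots):
    """Return the first root / relative_path that is an existing file, else None."""
    for root in roots:
        candidate = Path(root) / relative_path
        if candidate.is_file():
            return candidate
    return None


# Both spellings of the Drive mount point are tried (assumed locations).
DRIVE_ROOTS = ["/content/drive/MyDrive", "/content/drive/My Drive"]
image_path = find_existing_file(
    "CoLab/Data/chest_xray/train/PNEUMONIA/person53_bacteria_255.jpeg",
    DRIVE_ROOTS,
)
print(image_path)  # None when neither location holds the file
```

If the result is not None, it can be passed to imread as str(image_path).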
Imagine that I am manipulating a shapefile in geopandas. I then want to load it with another library (like networkx), but since my file is large I don't want to have to save and reload it. Is there a way I can save it in memory? I imagine it would look something like this:
import geopandas
from io import BytesIO
writeBytes = BytesIO()
### load the demo
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
### do something trivial to the demo
world['geometry'] = world['geometry'].buffer(0.05)
### save to bytes IO so that I can do something else with it without having to save and read a file
world.to_file(writeBytes)
Running the above yields a TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO
This is the full traceback:
TypeError Traceback (most recent call last)
<ipython-input-1-1ba22f23181a> in <module>
8 world['geometry'] = world['geometry'].buffer(0.05)
9 ### save to bytes IO so that I can do something else with it without having to save and read a file
---> 10 world.to_file(writeBytes)
~/.conda/envs/geopandas/lib/python3.7/site-packages/geopandas/geodataframe.py in to_file(self, filename, driver, schema, **kwargs)
427 """
428 from geopandas.io.file import to_file
--> 429 to_file(self, filename, driver, schema, **kwargs)
430
431 def to_crs(self, crs=None, epsg=None, inplace=False):
~/.conda/envs/geopandas/lib/python3.7/site-packages/geopandas/io/file.py in to_file(df, filename, driver, schema, **kwargs)
125 if schema is None:
126 schema = infer_schema(df)
--> 127 filename = os.path.abspath(os.path.expanduser(filename))
128 with fiona_env():
129 with fiona.open(filename, 'w', driver=driver, crs=df.crs,
~/.conda/envs/geopandas/lib/python3.7/posixpath.py in expanduser(path)
233 """Expand ~ and ~user constructions. If user or $HOME is unknown,
234 do nothing."""
--> 235 path = os.fspath(path)
236 if isinstance(path, bytes):
237 tilde = b'~'
Any assistance is appreciated. Thank you.
GeoDataFrame.to_file() requires a file path, not a BytesIO object.
Use a temporary file (or a temporary folder for shapefiles) instead.
For example: https://stackoverflow.com/a/70254174/2023941
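A minimal sketch of the temporary-folder approach, using only the standard library so it runs without geopandas. The helper name roundtrip_via_tempdir and the default file name are hypothetical. The reader must load the data fully before returning, because the folder is deleted when the with block exits.

```python
import tempfile
from pathlib import Path


def roundtrip_via_tempdir(write, read, filename="layer.shp"):
    """Call write(path), then read(path), inside a folder that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / filename
        write(path)        # e.g. world.to_file for the GeoDataFrame in the question
        return read(path)  # must finish reading before the folder is removed


# Stand-in demonstration with plain text instead of a shapefile.
result = roundtrip_via_tempdir(
    lambda p: p.write_text("demo"),
    lambda p: p.read_text(),
    filename="demo.txt",
)
print(result)
```

A folder rather than a single temporary file is used because a shapefile consists of several sibling files (.shp, .shx, .dbf, and so on) that are written next to each other.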