MXNet - Augmentations - expected uint8, got float32 - image

I am attempting to use mxnet 1.10/mxnet-cu91 for image classification, and I am using mxnet.image.ImageIter to iterate through and preprocess images. I can use ForceResizeAug successfully, but the other Augmenters I have tried raise the following error:
Traceback (most recent call last):
File "image.py", line 22, in <module>
for batch in iterator:
File "/usr/local/lib/python2.7/dist-packages/mxnet/image/image.py", line 1181, in next
data = self.augmentation_transform(data)
File "/usr/local/lib/python2.7/dist-packages/mxnet/image/image.py", line 1239, in augmentation_transform
data = aug(data)
File "/usr/local/lib/python2.7/dist-packages/mxnet/image/image.py", line 659, in __call__
src = t(src)
File "/usr/local/lib/python2.7/dist-packages/mxnet/image/image.py", line 721, in __call__
gray = src * self.coef
File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 235, in __mul__
return multiply(self, other)
File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 2566, in multiply
None)
File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 2379, in _ufunc_helper
return fn_array(lhs, rhs)
File "<string>", line 46, in broadcast_mul
File "/usr/local/lib/python2.7/dist-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
ctypes.byref(out_stypes)))
File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 146, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:02:07] src/operator/contrib/../elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node at 1-th input: expected uint8, got float32
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2ab9a8) [0x7f5c873f09a8]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2abdb8) [0x7f5c873f0db8]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2d2078) [0x7f5c87417078]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2d2b83) [0x7f5c87417b83]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x24c4c1e) [0x7f5c89609c1e]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x24c6e59) [0x7f5c8960be59]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x240539b) [0x7f5c8954a39b]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x63) [0x7f5c8954a903]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5cc334ae40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f5cc334a8ab]
The code needed to replicate the issue is below (shortened for brevity, closely resembles the code provided in the documentation):
import mxnet as mx
import glob
type1_paths = glob.glob('type1/*.jpg')
type1_list = [[1.0, path] for path in type1_paths]
type2_paths = glob.glob('type2/*.JPG')
type2_list = [[2.0, path] for path in type2_paths]
all_paths = type1_list + type2_list
iterator = mx.image.ImageIter(1, (3, 1000, 1000),
                              imglist=all_paths,
                              aug_list=[
                                  mx.image.ColorJitterAug(0.1, 0.1, 0.1),
                              ])
for batch in iterator:
    print batch.data
I am not sure why the error is occurring, as I am not using any custom augmenters that could affect the dtype. I have also reproduced this issue with the following augmenters:
RandomGrayAug
HueJitterAug
ContrastJitterAug
SaturationJitterAug
NOTE: In case this matters, the only differences I know between the loaded jpg/JPG is that some photos were taken using a phone, and others using a DSLR camera.
Please let me know if I am missing any information that would be helpful in learning.

You're getting this error because the images are loaded with a data type of uint8, but the augmenters work with float32 values. Unfortunately, the error message reads backwards relative to what you need to do: the input image (uint8) is multiplied by a float32 coefficient array inside the augmenter (src * self.coef in the traceback), and the message complains about the data type of the coefficients rather than the input data. The contrast, hue, and saturation augmenters fail for the same reason.
So to fix this you need to convert your input image data type to float32. You can do this by adding mx.image.CastAug(typ='float32') at the start of your augmenter list.
iterator = mx.image.ImageIter(1, (3, 100, 100),
                              path_root=".",
                              imglist=all_paths,
                              aug_list=[
                                  mx.image.CastAug(typ='float32'),
                                  mx.image.ColorJitterAug(0.1, 0.1, 0.1),
                                  mx.image.CenterCropAug((100,100))
                              ])
And it's always a good idea to visualize your data after augmentation to confirm the steps are being applied as you expected.
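A minimal sketch of such a check, assuming each image comes out of the iterator as a float32 array in channel-first (C, H, W) layout, for example via batch.data[0][0].asnumpy(). The helper name to_displayable is hypothetical, and the random array below only stands in for a real augmented image. Because color jitter on float32 data can push values outside 0..255, the helper clips before casting back to uint8.

```python
import numpy as np


def to_displayable(chw_image):
    """Convert one float (C, H, W) image into an (H, W, C) uint8 array for plotting."""
    # Move the channel axis last, which is the layout plotting libraries expect.
    hwc = np.transpose(np.asarray(chw_image), (1, 2, 0))
    # Clip to the valid pixel range before casting, so out-of-range values do not wrap.
    return np.clip(hwc, 0, 255).astype(np.uint8)


# Stand-in for an augmented image; a real one would come from the iterator (assumption).
fake_image = np.random.uniform(-20.0, 280.0, size=(3, 4, 5)).astype(np.float32)
displayable = to_displayable(fake_image)
print(displayable.shape, displayable.dtype)
```

The resulting array can be passed to matplotlib's imshow to inspect what the augmenters actually produced.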

Related

Pytorch is not working with DistributedDataParallel for multi gpu training

I am trying to train my model on multiple GPUs. I imported the required libraries and added the following code:
import os
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
Initialization
def ddp_setup(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    os.environ["TORCH_DISTRIBUTED_DEBUG"]="DETAIL"
    init_process_group(backend="gloo", rank=0, world_size=1)
My model
model = CMGCNnet(config,
                 que_vocabulary=glovevocabulary,
                 glove=glove,
                 device=device)
model = model.to(0)
if -1 not in args.gpu_ids and len(args.gpu_ids) > 1:
    model = DDP(model, device_ids=[0,1])
It throws the following error:
config_yml : model/config_fvqa_gruc.yml
cpu_workers : 0
save_dirpath : exp_test_gruc
overfit : False
validate : True
gpu_ids : [0, 1]
dataset : fvqa
Loading FVQATrainDataset…
True
done splitting
Loading FVQATestDataset…
Loading glove…
Building Model…
Traceback (most recent call last):
File "trainfvqa_gruc.py", line 512, in <module>
train()
File "trainfvqa_gruc.py", line 145, in train
ddp_setup(0,1)
File "trainfvqa_gruc.py", line 42, in ddp_setup
init_process_group(backend="gloo", rank=0, world_size=1)
File "/home/seecs/miniconda/envs/mucko-edit/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 360, in init_process_group
timeout=timeout)
RuntimeError: [enforce fail at /opt/conda/conda-bld/pytorch_1544202130060/work/third_party/gloo/gloo/transport/tcp/device.cc:128] rp != nullptr. Unable to find address for: 127.0.0.1localhost.
localdomainlocalhost
I tried printing the issue with os.environ["TORCH_DISTRIBUTED_DEBUG"]="DETAIL"
it outputs:
Loading FVQATrainDataset...
True
done splitting
Loading FVQATestDataset...
Loading glove...
Building Model...
Segmentation fault
With the NCCL backend, training starts but gets stuck and does not go further than this:
Training for epoch 0:
0%| | 0/2039 [00:00<?, ?it/s]
I found this solution, but where should I add these lines?
GLOO_SOCKET_IFNAME, for example export GLOO_SOCKET_IFNAME=eth0
It is mentioned in
https://discuss.pytorch.org/t/runtime-error-using-distributed-with-gloo/16579/3
Can someone help me with this issue? I am hoping to get an answer.
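On where such a line could go: export is a shell command, so it would be run in the terminal before launching the training script. A rough Python equivalent, sketched below, is to set the variable through os.environ before init_process_group is called, for instance next to the MASTER_ADDR lines in ddp_setup. The helper name configure_gloo_interface is hypothetical, and eth0 is only the example value from the linked thread; the real interface name on the machine is an assumption that has to be checked.

```python
import os


def configure_gloo_interface(ifname="eth0"):
    """Set GLOO_SOCKET_IFNAME for this process.

    Assumption: this runs before init_process_group(backend="gloo", ...),
    since the variable is only useful if it is set before the backend starts.
    """
    os.environ["GLOO_SOCKET_IFNAME"] = ifname


# "eth0" is the example from the linked thread, not a verified interface name.
configure_gloo_interface("eth0")
print(os.environ["GLOO_SOCKET_IFNAME"])
```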

Multiclass confusion matrix problem ValueError: multilabel-indicator is not supported

I have successfully created a pytorch NN with 10 inputs and two output classes (B & S). So B is either 0 or 1, and S is either 0 or 1. So Y_test is B(0),B(1),S(0),S(1). Y_Pred will output B(0...1) and S(0...1). The net trains without error...
Now I want to create a confusion matrix and I am confused.
This is my code:
cm = confusion_matrix(y_test, y_pred)
print (cm)
It generates this error message:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/3_indexes_NN/3_indexes_NN.py", line 386, in <module>
cm = confusion_matrix(y_test, y_pred)
File "C:\Users\user\anaconda3\envs\PIMA\lib\site-packages\sklearn\metrics\_classification.py", line 309, in confusion_matrix
raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
I am completely lost. Can anyone help me understand where I've gone wrong?
Thank you in advance for throwing me a life-line!!

Can both S and B be 1 simultaneously? If so, this is indeed a multilabel task, and a confusion matrix is not a suitable metric.
Otherwise, you can work around it by using something like y_pred.argmax(axis=1) on both y_test and y_pred.
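A minimal sketch of the argmax workaround, assuming exactly one of B and S is 1 for every sample (column 0 is B, column 1 is S). The arrays and the helper name confusion_from_indicator are made up for illustration. The matrix is counted with NumPy here to keep the example self-contained; the same 1-D index arrays are the kind of input sklearn's confusion_matrix accepts.

```python
import numpy as np


def confusion_from_indicator(y_true, y_score):
    """Collapse indicator/score columns to class indices and count a confusion matrix."""
    true_idx = np.asarray(y_true).argmax(axis=1)
    pred_idx = np.asarray(y_score).argmax(axis=1)
    n_classes = np.asarray(y_true).shape[1]
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Rows are true classes, columns are predicted classes.
    np.add.at(cm, (true_idx, pred_idx), 1)
    return true_idx, pred_idx, cm


# Illustrative data: columns are [B, S].
y_test = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])

true_idx, pred_idx, cm = confusion_from_indicator(y_test, y_pred)
print(cm)
```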

Pyautogui and pyscreeze crash with windll.user32.ReleaseDC failed

I'm trying to compare certain pixel values in my pyautogui script, but it crashes with the following error message, either after multiple successful runs or sometimes straight away on the first call:
Traceback (most recent call last):
File "F:\Koodit\Python\HeroWars NNet\Assets\autodataGet.py", line 219, in <module>
battle = observeBattle()
File "F:\Koodit\Python\HeroWars NNet\Assets\autodataGet.py", line 180, in observeBattle
statii = getHeroBattlePixels()
File "F:\Koodit\Python\HeroWars NNet\Assets\autodataGet.py", line 32, in getHeroBattlePixels
colormatch = pyautogui.pixelMatchesColor(location[0], location[1], alive, tolerance=5)
File "E:\Program Files\Python\lib\site-packages\pyscreeze\__init__.py", line 557, in pixelMatchesColor
pix = pixel(x, y)
File "E:\Program Files\Python\lib\site-packages\pyscreeze\__init__.py", line 582, in pixel
return (r, g, b)
File "E:\Program Files\Python\lib\contextlib.py", line 120, in __exit__
next(self.gen)
File "E:\Program Files\Python\lib\site-packages\pyscreeze\__init__.py", line 111, in __win32_openDC
raise WindowsError("windll.user32.ReleaseDC failed : return 0")
OSError: windll.user32.ReleaseDC failed : return 0
My code (this is called multiple times, sometimes it crashes on first run, sometimes it runs nicely for around 100 calls before failing, also, my screen is 4K, so the resolutions get big):
def getSomePixelStatuses():
    someLocations = [
        [1200, 990],
        [1300, 990],
        [1400, 990],
        [1500, 990],
        [1602, 990],
        [1768, 990],
        [1868, 990],
        [1968, 990],
        [2068, 990],
        [2169, 990]
    ]
    status = []
    someValue = (92, 13, 12)
    for location in someLocations:
        colormatch = pyautogui.pixelMatchesColor(location[0], location[1], someValue, tolerance=5)
        status.append(colormatch)
    return status
I have no idea how to mitigate this problem. It would seem that pyautogui uses pyscreeze to read pixel values on screen, and most probable candidate for the place where error occurs is the pyscreeze pixel function:
def pixel(x, y):
    """
    TODO
    """
    if sys.platform == 'win32':
        # On Windows, calling GetDC() and GetPixel() is twice as fast as using our screenshot() function.
        with __win32_openDC(0) as hdc: # handle will be released automatically
            color = windll.gdi32.GetPixel(hdc, x, y)
            if color < 0:
                raise WindowsError("windll.gdi32.GetPixel failed : return {}".format(color))
            # color is in the format 0xbbggrr https://msdn.microsoft.com/en-us/library/windows/desktop/dd183449(v=vs.85).aspx
            bbggrr = "{:0>6x}".format(color) # bbggrr => 'bbggrr' (hex)
            b, g, r = (int(bbggrr[i:i+2], 16) for i in range(0, 6, 2))
            return (r, g, b)
    else:
        # Need to select only the first three values of the color in
        # case the returned pixel has an alpha channel
        return RGB(*(screenshot().getpixel((x, y))[:3]))
I installed these libraries just yesterday, and I'm running python 3.8 on windows 10, and pyscreeze is version 0.1.25 so in theory everything should be up to date, but somehow something ends up crashing. Is there a way to mitigate this, either modifying my code, or even the library itself, or is my environment not suitable for this operation?

I know it's not particularly helpful, but for me this error was fixed simply by running my code on Python 3.7 instead of 3.8. You shouldn't have to change your code (unless you were using the walrus operator).
On Windows, this can be done with the -3.7 command line flag of the py launcher, as long as 3.7 is installed.

PyScreeze and PyAutoGUI maintainer here. This is an issue that has been fixed in PyScreeze 0.1.28, so you just need to update it by running pip install -U pyscreeze.
For more context, here's the GitHub issue where it was reported: https://github.com/asweigart/pyscreeze/pull/73

It's a bug. You were on the right track, as the problem is indeed in this line of the pixel() function:
with __win32_openDC(0) as hdc
That function uses ctypes.windll, which doesn't seem to cope well with the negative values sometimes returned from windll.user32.GetDC(). This subsequently raises an exception when windll.user32.ReleaseDC() is called.
The folks at pillow helped track this down and propose a fix.
issue filed at pyautogui
issue filed at pillow which led to the solution
pending PR at pyscreeze to address
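To illustrate how a handle can show up as a negative number at all, here is a small platform-independent sketch. It assumes the handle is read back as a signed 32-bit integer, and the handle value is invented for the example. It is not the PyScreeze patch, only a demonstration of the signed/unsigned reinterpretation.

```python
import ctypes


def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    return ctypes.c_int32(value).value


def as_unsigned_32(value):
    """Reinterpret the low 32 bits of value as an unsigned integer."""
    return ctypes.c_uint32(value).value


# Hypothetical handle whose top bit is set.
handle = 0x9A010123
seen_by_python = as_signed_32(handle)        # negative when read as a signed int
restored = as_unsigned_32(seen_by_python)    # same bits, read as unsigned again
print(seen_by_python, hex(restored))
```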

I can use the pixel function on Python 3.8 like this:
try:
    a = pixel(100,100)
except OSError:
    a = pixel(100,100)
I don't have any clue why this works, but it works.

I had this error too and fixed it. Just use try and except.
while True:
    try:
        x, y = pyautogui.position()
        print(pyautogui.pixel(x, y))
    except OSError:
        print("Cannot get pixel for the moment")
Since you are probably reading pixels many times anyway, wrapping the call in try and except works around this PyScreeze issue in PyAutoGUI. Honestly, I don't know what's up with PyScreeze, but this works for me. Cheers
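The try/except idea from the last two answers can be written as a small retry helper. This is only a sketch of the workaround: the helper name call_with_retry is hypothetical, and the flaky function merely simulates the ReleaseDC failure so the example runs on any platform. Upgrading PyScreeze, as the maintainer suggests above, addresses the cause rather than the symptom.

```python
import time


def call_with_retry(func, attempts=3, delay=0.05, exceptions=(OSError,)):
    """Call func() up to `attempts` times, retrying when one of `exceptions` is raised."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except exceptions as error:
            last_error = error
            time.sleep(delay)
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error


# Simulated pixel read that fails twice before succeeding (stand-in for pyautogui.pixel).
calls = {"count": 0}


def flaky_pixel():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("windll.user32.ReleaseDC failed : return 0")
    return (92, 13, 12)


result = call_with_retry(flaky_pixel, attempts=5, delay=0.0)
print(result, calls["count"])
```

With the real library, the callable would be something like lambda: pyautogui.pixel(x, y).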

SQL Alchemy with oracle: converting column overflows integer datatype

I extract data from an Oracle database with 64-bit Python 2.7. There is a numeric field with 35 digits: 1200000000000000000000000000005151
If I want to read this field with SQL Alchemy, I get the following error:
File "D:\Produkte\CoCo\Sourcen\ConsultingConnector\src\ais.py", line 253, in tabledata2table
for row in tabledata.yield_per(buffersize).enable_eagerloads(False):
File "C:\Python27\lib\site-packages\sqlalchemy\orm\loading.py", line 98, in instances
util.raise_from_cause(err)
File "C:\Python27\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Python27\lib\site-packages\sqlalchemy\orm\loading.py", line 71, in instances
fetch = cursor.fetchmany(query._yield_per)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1166, in fetchmany
self.cursor, self.context)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 1413, in _handle_dbapi_exception
exc_info
File "C:\Python27\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1076, in _fetchmany_impl
return self.cursor.fetchmany(size)
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-01455: converting column overflows integer datatype (Background on this error at: http://sqlalche.me/e/4xp6)
It seems that SQL Alchemy is trying to cast the value to an int, even though it's a long.
Do you have any idea how I can solve the problem, or is there a workaround?
Thanks in advance,
Jassin

I just ran into the same problem and solved it by converting the column to a character type in the query, e.g. "select to_char(ID)".
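Once the value arrives as text, it can be turned back into a number on the Python side without losing digits. The sketch below assumes the query returns the column as a string; the query text, the table name some_table, and the helper name parse_oracle_number are hypothetical.

```python
from decimal import Decimal

# Hypothetical query using the TO_CHAR workaround; table and column names are made up.
QUERY = "SELECT TO_CHAR(id) AS id_text FROM some_table"


def parse_oracle_number(text):
    """Parse a number that was fetched as text, keeping every digit."""
    value = Decimal(text.strip())
    # Return a plain int for whole numbers, otherwise keep the exact Decimal.
    if value == value.to_integral_value():
        return int(value)
    return value


big_id = parse_oracle_number("1200000000000000000000000000005151")
print(big_id + 1)
```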

FileNotFoundError: No such file: 'someones_epi.nii.gzip'

I am trying to load an MRI, I keep getting the following error:
Traceback (most recent call last):
File "F:/Study/Projects/BTSaG/Programs/t3.py", line 2, in <module>
epi_img = nib.load('someones_epi.nii.gzip')
File "C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nibabel\loadsave.py", line 38, in load
raise FileNotFoundError("No such file: '%s'" % filename)
FileNotFoundError: No such file: 'someones_epi.nii.gzip'
The code used is as follows:
import nibabel as nib
epi_img = nib.load('someones_epi.nii.gzip')
epi_img_data = epi_img.get_data()
epi_img_data.shape  # expected: (53, 61, 33)
import matplotlib.pyplot as plt
def show_slices(slices):
    """ Function to display row of image slices """
    fig, axes = plt.subplots(1, len(slices))
    for i, slice in enumerate(slices):
        axes[i].imshow(slice.T, cmap="gray", origin="lower")
slice_0 = epi_img_data[26, :, :]
slice_1 = epi_img_data[:, 30, :]
slice_2 = epi_img_data[:, :, 16]
show_slices([slice_0, slice_1, slice_2])
plt.suptitle("Center slices for EPI image")
I have also updated the loadsave.py file in nibabel but it didn't work. Please help.
Edit:
The earlier error was resolved. Now another error has been encountered.
Traceback (most recent call last):
File "F:\Study\Projects\BTSaG\Programs\t3.py", line 2, in <module>
epi_img = nib.load('someones_epi.nii.gzip')
File "C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nibabel\loadsave.py", line 47, in load
filename)
nibabel.filebasedimages.ImageFileError: Cannot work out file type of "someones_epi.nii.gzip"

This is an old question, but I may have a solution for it.
I just figured out that nibabel.save() does not allow a dot (.) or dash (-) in folder names, although they can appear in filenames. In your case, the current path is:
C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\Lib\site-packages\nibabel\someones_epi.nii.gzip
I would change it to:
C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35_32\Lib\site_packages\nibabel\someones_epi.nii.gzip
This is only an example. Of course, I don't mean that you should actually rename these package folders, as that might cause other errors.
The actual solution would be to move the file someones_epi.nii.gzip to a folder under your user directory, something like:
C:\Users\AnkitaShinde\Desktop\nibabel\someones_epi.nii.gzip
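Before moving files around, it may help to check the path the script actually resolves. The sketch below is a hedged diagnostic, not part of nibabel: the helper name diagnose_nifti_path is hypothetical, and the suffix list is an assumption about which NIfTI file names nibabel recognises (.nii and .nii.gz), which would make the .nii.gzip name itself worth a second look.

```python
import os

# Assumption: suffixes that nibabel is expected to recognise for NIfTI images.
ASSUMED_NIFTI_SUFFIXES = (".nii", ".nii.gz")


def diagnose_nifti_path(path):
    """Return a list of likely problems with a NIfTI file path (empty if none found)."""
    problems = []
    # Relative paths are resolved against the current working directory.
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        problems.append("no such file: " + full_path)
    if not path.lower().endswith(ASSUMED_NIFTI_SUFFIXES):
        problems.append("unrecognised suffix: " + os.path.basename(path))
    return problems


# The directory name is made up so that the file is certainly missing here.
missing = diagnose_nifti_path(os.path.join("definitely_missing_dir", "someones_epi.nii.gzip"))
for problem in missing:
    print(problem)
```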