PyTorch embedding too big for GPU but fits in CPU - pytorch-lightning

I am using PyTorch Lightning, so Lightning controls the GPU/CPU assignments and in
return I get easy multi-GPU support for training.
I would like to create an embedding that does not fit in the GPU memory.
fit_in_cpu = torch.nn.Embedding(too_big_for_GPU, embedding_dim)
Then, when I select the subset for a batch, I send it to the GPU:
GPU_tensor = fit_in_cpu(idx)
How do I do this in Pytorch Lightning?

Lightning will send anything that is registered as a model parameter to the GPU, i.e. the weights of layers (anything in torch.nn.*) and variables registered using torch.nn.parameter.Parameter.
However, if you want to declare something on the CPU and then move it to the GPU at runtime, you can go one of two ways:
Create too_big_for_GPU inside __init__ without registering it as a model parameter (using torch.zeros, torch.randn, or any other init function), then move it to the GPU in the forward pass:
class MyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.too_big_for_GPU = torch.zeros(4, 1000, 1000, 1000)

    def forward(self, x):
        # Move the tensor to the same GPU as x and operate with it
        y = self.too_big_for_GPU.to(x.device) * x**2
        return y
Or create too_big_for_GPU inside forward (it will be created on the CPU by default) and then move it to the GPU:
class MyModule(pl.LightningModule):
    def forward(self, x):
        # Create the tensor on the fly and move it to x's GPU
        too_big_for_GPU = torch.zeros(4, 1000, 1000, 1000).to(x.device)
        # Operate with it
        y = too_big_for_GPU * x**2
        return y
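Applied to the embedding from the question, here is a minimal sketch of the first approach, assuming the lookup table should stay in CPU memory and only the selected rows move to the device (the sizes and class name are illustrative):
import torch
import pytorch_lightning as pl

too_big_for_GPU = 50_000_000  # illustrative row count: fits in RAM, too large for the GPU
embedding_dim = 128

class CPUEmbeddingModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: not an nn.Parameter, so Lightning leaves it on the CPU
        self.big_table = torch.randn(too_big_for_GPU, embedding_dim)

    def forward(self, idx):
        # Look up rows on the CPU, then move only the small result to idx's device
        rows = self.big_table[idx.cpu()]
        return rows.to(idx.device)
Note that a plain tensor like this is not trainable; if the embedding weights must be learned, you would need a sparse-gradient or CPU-offloading optimizer strategy instead.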

Related

Why does Python stop using the GPU when the SimpleITK library is used in MONAI transforms?

I'm using Python 3.9 with Spyder 5.2.2 (Anaconda) for a U-Net segmentation task with MONAI. After importing all the images into a dictionary, I wrote these lines to define the pre-processing steps:
import numpy as np
import SimpleITK as sitk
from monai.data import PILReader
from monai.inferers import SimpleInferer
from monai.transforms import (
    AsDiscrete,
    DataStatsd,
    AddChanneld,
    Compose,
    Activations,
    LoadImaged,
    Resized,
    RandFlipd,
    ScaleIntensityRanged,
    DataStats,
    AsChannelFirstd,
    AsDiscreted,
    ToTensord,
    EnsureTyped,
    RepeatChanneld,
    EnsureType
)
from monai.transforms import Transform
monai_load = [
    LoadImaged(keys=["image","segmentation"], image_only=False, reader=PILReader()),
    EnsureTyped(keys=["image", "segmentation"], data_type="numpy"),
    AddChanneld(keys=["segmentation","image"]),
    RepeatChanneld(keys=["image"], repeats=3),
    AsChannelFirstd(keys=["image"], channel_dim=0),
]
monai_transforms = [
    AsDiscreted(keys=["segmentation"], threshold=0.5),
    ToTensord(keys=["image","segmentation"]),
]
class N4ITKTransform(Transform):
    def __call__(self, image):
        filtered = []
        for channel in image["image"]:
            inputImage = sitk.GetImageFromArray(channel)
            inputImage = sitk.Cast(inputImage, sitk.sitkFloat32)
            corrector = sitk.N4BiasFieldCorrectionImageFilter()
            outputImage = corrector.Execute(inputImage)
            filtered.append(sitk.GetArrayFromImage(outputImage))
        image["image"] = np.stack(filtered)
        return image
train_transforms = Compose(monai_load + [N4ITKTransform()] + monai_transforms)
When I compose these transforms and apply them to the training images, Python does not use the GPU, even though
torch.cuda.is_available()
returns True.
These are the lines where I apply the transforms:
train_ds = IterableDataset(data = train_data, transform = train_transforms)
train_loader = DataLoader(dataset = train_ds, batch_size = batch_size, num_workers = 0, pin_memory = True)
When I define the U-Net model, I send it to 'cuda'.
The problem is in the SimpleITK transform: if I don't use it, Python works on the GPU as usual.
Thank you in advance for getting back to me.
Federico
The answer is simple: SimpleITK uses the CPU for processing.
I am not sure whether it is possible to get it to use some of the GPU-accelerated filters from ITK (its base library). If you use ITK Python, you have the option of using GPU filters, but only a few filters have GPU implementations, and N4BiasFieldCorrection does NOT have one. So if you want to use this filter, it needs to run on the CPU.
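In practice this just means the N4 correction runs on the CPU inside the data pipeline; the model can still train on the GPU as long as each batch is moved there. A minimal sketch of that pattern, using the dictionary keys from the question (the model variable is a placeholder):
for batch in train_loader:
    # CPU-side preprocessing (including N4ITKTransform) has already run by now
    images = batch["image"].to("cuda")
    labels = batch["segmentation"].to("cuda")
    outputs = model(images)  # model was sent to 'cuda' beforehand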

Optimizing GPU allocation/transfer of matrix tiles

I am working with very large matrices (>1GB) but imagine that I have the following matrix:
A = [1 1 2 2;
1 1 2 2;
3 3 4 4;
3 3 4 4]
I need to pin each tile of the previous matrix to transfer it to the GPU asynchronously (using the CUDA.jl package).
The following code allocates the space for each tile on the GPU, and it works:
function allocGPU!(gpu_buf, m, n)
    dev_buf = CUDA.Mem.alloc(CUDA.Mem.DeviceBuffer, m*n*8)
    dev_ptr = convert(CuPtr{Float64}, dev_buf)
    push!(gpu_buf, dev_buf)  # keep a reference so the buffer is not freed
    tile_gpu = unsafe_wrap(CuArray{Float64}, dev_ptr, (m, n))
    return tile_gpu
end
A_coor = [(1:2,1:2) (1:2,3:4);
          (3:4,1:2) (3:4,3:4)]
A_tiles = [A[A_coor[i,j][1], A_coor[i,j][2]] for i=1:size(A_coor)[1], j=1:size(A_coor)[2]]
gpu_buf = []
A_tiles_gpu = [allocGPU!(gpu_buf, m, n) for i=1:size(A_tiles)[1], j=1:size(A_tiles)[2]]
But this copies each tile into a new object, taking more time than I would like. Is there any way to wrap a 2x2 Array around each tile in order to reduce the number of allocations?
I also tried this line:
A_tiles = [unsafe_wrap(Array{Float64}, pointer(A[A_coor[i,j][1], A_coor[i,j][2]]), (m,n)) for i=1:size(A_coor)[1], j=1:size(A_coor)[2]]
I also thought of pinning matrix A and then transferring each tile to the GPU with:
copyto!(tile_gpu, A[1:2,1:2])
but I'm guessing Julia will copy A[1:2,1:2] into a new object and then transfer the tile, yielding the same result as the first method.
Edit:
As I suspected,
copyto!(tile_gpu, A[1:2,1:2])
creates a new object in a different memory location. I also tried the @view macro; although it works on the CPU, it doesn't seem to work with copyto! to GPU memory.

How can I do an index-wise convolution-type operation in TensorFlow using Conv2D?

So I have 64 images/feature maps of dimension 512x512, making a cube of (512x512x64). I want to convolve each image with one of 64 kernels, INDEXWISE.
Example -
1st Image ------> 1st Kernel
2nd Image ------> 2nd Kernel
3rd Image ------> 3rd Kernel
.
.
.
64th Image -------> 64th Kernel
I want to do this with Conv2D in TensorFlow. As far as I know, Conv2D will take a single image and convolve it with each kernel:
1st Image --> all 64 kernels
2nd Image --> all 64 kernels
I don't want to do this.
One (inefficient but relatively simple) way to do this would be to use a custom layer:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class IndexConv(Layer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Define kernel size, strides, padding etc.

    def build(self, input_shape):
        # Store the number of kernels as a static value. This saves having to
        # get the number of kernels at runtime.
        self.num_kernels = input_shape[-1]
        # If the input has n channels, you need n separate kernels.
        # Since each kernel convolves a single channel, input and output channels for each kernel will be 1
        self.kernels = [self.add_weight(f'k{i}', (kernel_h, kernel_w, 1, 1), other_params) for i in range(input_shape[-1])]

    def call(self, inputs, **kwargs):
        # Split the input tensor into separate tensors, one per channel
        inputs = tf.unstack(inputs, axis=-1)
        # Convolve each input channel with the corresponding kernel.
        # This is the "inefficient" part I mentioned.
        # More complex but more efficient versions can make use of tf.map_fn
        outputs = [
            tf.nn.conv2d(inputs[i][:, :, :, None], self.kernels[i], other_params)
            for i in range(self.num_kernels)
        ]
        # Return the concatenated output
        return tf.concat(outputs, axis=-1)
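As a side note, this per-channel pattern is exactly what a depthwise convolution computes, so depending on your requirements you may be able to skip the custom layer entirely. A minimal sketch using tf.keras.layers.DepthwiseConv2D with depth_multiplier=1, i.e. one kernel per input channel (kernel size and padding are illustrative):
import tensorflow as tf

x = tf.random.normal((1, 512, 512, 64))  # one 512x512x64 cube, channels-last
layer = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same", depth_multiplier=1)
y = layer(x)  # each of the 64 channels is convolved with its own kernel
print(y.shape)  # (1, 512, 512, 64)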

Loading a TensorFlow graph in another file does not give the same accuracy

I trained a CNN in TensorFlow and it tested at 92% accuracy. I saved it as a typical ckpt file.
session = tf.Session(config=tf.ConfigProto(log_device_placement=True))
session.run(tf.global_variables_initializer())
<TRAINING ETC>
saver.save(session, save_path_name)
In a different file, I want to run inference, so I imported the meta-graph as explained in the documentation:
face_recognition_session = tf.Session()
saver = tf.train.import_meta_graph(<PATH TO META FILE>, clear_devices=True)
saver.restore(face_recognition_session, <PATH TO CKPT FILE>)
graph = tf.get_default_graph()
x = graph.get_tensor_by_name('input_variable_00:0')
y = graph.get_tensor_by_name('output_variable_00:0')
When performing inference or testing it anew, the accuracy drops to 3%.
Am I overlooking anything?
You are assigning the wrong method to saver. From the TF guide you can see that you want to initialize the session and then restore through tf.train.Saver().
tf.reset_default_graph()
# Create some variables (note: variable names must not include the ':0' tensor suffix).
x = tf.get_variable("input_variable_00", [x_shape])
y = tf.get_variable("output_variable_00", [y_shape])
saver = tf.train.Saver()
# Use the saver object normally after that.
with tf.Session() as sess:
    # No initializer is needed; the saver restores the variable values.
    saver.restore(sess, <PATH TO CKPT FILE>)
    print("x : %s" % x.eval())
    print("y : %s" % y.eval())
I would also recommend looking into freezing and exporting your graphs as a GraphDef if you want to have consistent inference results.
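For reference, a minimal TF1-style freezing sketch, assuming the same placeholder paths as above and taking the output node name from the question's tensor name without the ':0' suffix:
import tensorflow as tf

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(<PATH TO META FILE>, clear_devices=True)
    saver.restore(sess, <PATH TO CKPT FILE>)
    # Bake the restored variable values into the graph as constants
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["output_variable_00"])
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
A frozen GraphDef pins both the architecture and the weights, which removes restore-order mistakes as a source of inconsistent inference results.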

How to move an image in Lua?

I am new to Lua programming and I am having problems while trying to move an image from one set of coordinates to another.
What I am trying to create is to be used with the X-Plane flight simulator. There is a library called SASL (www.1-sim.com) that was created to make writing X-Plane plugins easier, since the default language is C++ and many people find it difficult.
SASL works as a bridge between Lua and X-Plane: the scripts you write read data straight from X-Plane (DataRefs) while it is running and, depending on the code you wrote, execute commands or do many other possible things.
When using SASL to create cockpit/panel gauges, it uses a base file named 'avionics.lua' that works as a class and loads all the gauges you create for the specific aircraft you are working on. For example, my avionics.lua file looks like this:
size = { 2048, 2048 }
components = {
    flaps {};
};
where 'size' is the size that will be used for drawing, and 'components' is an array of gauges, in this case the flaps gauge.
The rest of the work is to create the gauge functionality, and this is done in a separate file, in my case called 'flaps.lua'.
Within flaps.lua is where I need to code the flaps indicator functionality, which is to load two images: one for the background and a second one for the flaps indicator.
The first image is fixed. The second one will move along the 'y' axis based on the flaps DataRef (the flapsDegree property below).
When X-Plane is running, the code below displays the background image and the flaps indicator at its first stage, which is 0, as you can see in the image.
size = {78,100}
local flapsDegree = globalPropertyf("sim/cockpit2/controls/flap_ratio")
local background = loadImage("gfx/Flaps.png")
local indicator = loadImage("gfx/Flaps_Indicator.png")
local flaps = get(flapsDegree)
components = {
    texture { position = {945, 1011, 60, 100}, image = background},
    texture { position = {959, 1097, 30, 9}, image = indicator},
}
[Image]
Now, the problem comes when I need to implement the logic for moving the 'indicator' image along the 'y' axis.
I have tried this code without success:
if flaps == 0.333 then
    indicator.position = {959, 1075, 30, 9}
end
So how could I accomplish that?
I am not familiar with the SASL library; I just had a quick look into it.
Everything you need to know is in the manuals on the website you linked,
in particular http://www.1-sim.com/files/SASL300.pdf.
Every time your screen is updated, each component's draw() function will be called.
So if you want to change something dynamically, you have to put that into the component's draw function.
If you open the SASL sources you'll find basic components which show you how to use that stuff. One of them is needle.lua:
-- default angle
defineProperty("angle", 0)
-- no image
defineProperty("image")

function draw(self)
    local w, h = getTextureSize(get(image))
    local max = w
    if h > max then
        max = h
    end
    local rw = (w / max) * 100
    local rh = (h / max) * 100
    drawRotatedTexture(get(image), get(angle),
        (100 - rw) / 2, (100 - rh) / 2, rw, rh)
end
If you check the manual you'll find that there is not only a drawRotatedTexture function but many other drawing functions. Just play around; try drawTexture for your purpose.
If you don't know what to program, open every single Lua file in the library and read it together with the Lua reference manual and the SASL documentation until you understand what is going on. Once you understand the existing code you can extend it with ease.
