Why must the last argument of input_shape be 3 in Keras applications?

I want to use a pre-trained network such as VGG or ResNet. In Keras, input_shape must be given in the format (w, h, 3). Is there a trick for setting the number of channels to 1?
conv_vgg = keras.applications.VGG16(input_shape=(224,224,3))
I want to change the 3 to a 1:
conv_vgg = keras.applications.VGG16(input_shape=(224,224,1))
Thanks in advance!

Pre-trained networks are trained on ImageNet or other image datasets. This means they were trained with RGB images, which is why using a pre-trained network requires three channels.
If you want to use a pre-trained network with single-channel images, you can repeat your channel three times and proceed: copy the 1-channel image two more times, going from shape (224,224,1) to shape (224,224,3), i.e. a 3-channel image.
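The channel repetition described above can be sketched with NumPy. This is a minimal illustration, assuming the image is already an array of shape (H, W, 1); the function name gray_to_three_channels and the random stand-in image are made up for the example.

```python
import numpy as np

def gray_to_three_channels(image):
    """Repeat the single channel of an (H, W, 1) array to get (H, W, 3)."""
    if image.ndim != 3 or image.shape[-1] != 1:
        raise ValueError("expected an array of shape (H, W, 1)")
    return np.repeat(image, 3, axis=-1)

# Random stand-in for a real grayscale image (assumption for the example).
gray = np.random.rand(224, 224, 1).astype("float32")
rgb_like = gray_to_three_channels(gray)
print(rgb_like.shape)  # (224, 224, 3)
```

All three output channels hold identical values, so the array has the shape the pre-trained network expects without adding any new information.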


Faster R-CNN: test all images in a folder

I have trained my model using Faster R-CNN on my computer. As you know, the provided script can only test one image at a time, but I want to test all the images in my test folder. I need to write code (using loops) that lets me do that. Can anyone help me with this problem? Thanks in advance. Sincerely.
Hello,
Here you can find a function that I implemented for testing
Faster R-CNN on a set of images:
import os
from PIL import Image
import torchvision.transforms as T

def get_folder_results(detector, image_dir, device):
    """Print the detector's prediction for every image in a folder.

    detector: your Faster R-CNN model
    image_dir: path of the folder containing the images
    device: the device used to train (CPU, GPU, ...)
    """
    for image in os.listdir(image_dir):
        image_path = os.path.join(image_dir, image)
        # The model expects a list of image tensors.
        input_images = [T.ToTensor()(Image.open(image_path)).to(device)]
        prediction = detector(input_images)
        print(prediction)
After executing this function you will get one prediction per image, where each
prediction contains the coordinates of the bounding boxes and the
corresponding classes.
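Since os.listdir returns every entry in the folder, it may help to filter out anything that is not an image before looping. The sketch below is one possible way to do that with the standard library only; the helper name list_image_paths and the extension set are assumptions for illustration, not part of the answer above.

```python
import os

# Assumed set of extensions; adjust it to the files you actually have.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def list_image_paths(image_dir):
    """Return sorted full paths of files in image_dir that look like images."""
    paths = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        extension = os.path.splitext(name)[1].lower()
        # Skip sub-folders and files with other extensions.
        if os.path.isfile(path) and extension in IMAGE_EXTENSIONS:
            paths.append(path)
    return paths
```

Each returned path can then be opened and passed to the detector in the same way as in the loop of get_folder_results.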

Transformer-XL: Input and labels for Language Modeling

I'm trying to fine-tune the pretrained Transformer-XL model transfo-xl-wt103 for a language modeling task. Therefore, I use the model class TransfoXLLMHeadModel.
To iterate over my dataset I use the LMOrderedIterator from the file tokenization_transfo_xl.py which yields a tensor with the data and its target for each batch (and the sequence length).
Let's assume the following data with batch_size = 1 and bptt = 8:
data = tensor([[1,2,3,4,5,6,7,8]])
target = tensor([[2,3,4,5,6,7,8,9]])
mems # from the previous output
My question is: I currently pass this data into the model like this:
output = model(input_ids=data, labels=target, mems=mems)
Is this correct?
I am wondering because the documentation says for the labels parameter:
labels (:obj:torch.LongTensor of shape :obj:(batch_size, sequence_length), optional, defaults to :obj:None):
Labels for language modeling.
Note that the labels are shifted inside the model, i.e. you can set lm_labels = input_ids
So what is it about the parameter lm_labels? I only see labels defined in the forward method.
And when the labels "are shifted" inside the model, does this mean I have to pass data twice (a second time, in place of target) because it's shifted inside? But how does the model then know the next token to predict?
I also read through this bug and the fix in this pull request, but I don't quite understand how to treat the model now (before vs. after the fix).
Thanks in advance for some help!
Edit: Link to issue on Github
That does sound like a typo carried over from another model's convention. You do have to pass the data twice, once as input_ids and once as labels (in your case, [1, ... , 8] for both). The model will then attempt to predict [2, ... , 8] from [1, ... , 7]. I am not sure adding something at the beginning of the target tensor would work, as that would probably cause size mismatches later down the line.
Passing the same data twice is the default way to do this in transformers. Before the aforementioned PR, TransfoXL did not shift labels internally and you had to shift the labels yourself. The PR made it consistent with the rest of the library and the documentation, where you pass the same data twice.
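To see why passing the same sequence as both input and labels is enough, here is a conceptual sketch of the shift in plain Python. It is not the library's actual code; the function name shift_for_next_token_loss is made up for the illustration.

```python
def shift_for_next_token_loss(token_ids):
    """Pair each position with the token that follows it.

    Returns (context, targets): the prediction made at context[i] is
    scored against targets[i], which is the next token in the sequence.
    """
    context = token_ids[:-1]   # the last token has no successor to predict
    targets = token_ids[1:]    # the first token is never a target
    return context, targets

data = [1, 2, 3, 4, 5, 6, 7, 8]
context, targets = shift_for_next_token_loss(data)
print(context)  # [1, 2, 3, 4, 5, 6, 7]
print(targets)  # [2, 3, 4, 5, 6, 7, 8]
```

The "next token" is therefore already contained in the sequence itself, one position to the right, which is why no separately shifted target tensor is needed.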

Paraview rotate fields

I am using Paraview 5.0.1. If any solution requires updating, I can try.
I want to programmatically obtain field plots (and corresponding PlotOverLine) of displacements and stresses in rotated coordinate systems.
What are appropriate/convenient/possible ways of doing this?
So far, I have created one Calculator filter for each component of displacements and stresses.
For instance, I used Calculators in 2D with results
(displacement.iHat)*cos(0.7853981625)+(displacement.jHat)*sin(0.7853981625)
(stress_3-stress_0)*sin(45.0*3.14159265/180)*cos(45.0*3.14159265/180)+stress_1*((cos(45.0*3.14159265/180))^2-(sin(45.0*3.14159265/180))^2)
It works fine, but it is quite cumbersome, in several aspects:
Creating them (one filter per component).
Plotting several of them in a single XY plot
Exporting them (one export per component).
Is there a simple way to do this?
PS: The Transform filter does not accomplish this. It rotates the view, not the fields.
Two solutions:
Ugly, inefficient solution:
Use Transform and check "Transform All Input vectors"
Add a calculator and add a dummy array
Use Transform the other way around, without checking "Transform All Input vectors"
Correct solution:
Compute the transformation yourself in a programmable filter
# Script for the programmable filter. "YourArray" is the name of the vector array to transform.
inp = self.GetUnstructuredGridInput()
output = self.GetUnstructuredGridOutput()
output.ShallowCopy(inp)
data = inp.GetPointData().GetArray("YourArray")
vec = vtk.vtkDoubleArray()
vec.SetNumberOfComponents(3)
vec.SetName("TransformedVectors")
numPoints = inp.GetNumberOfPoints()
for i in range(numPoints):
    original = data.GetTuple(i)
    # Implement transform() yourself in Python; it must return the new 3-component tuple.
    rotated = transform(original)
    vec.InsertNextTuple(rotated)
output.GetPointData().AddArray(vec)
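The transform step is left open above. As a hedged sketch, the functions below reproduce the Calculator expressions from the question in plain Python: components of a vector in a frame rotated about the z axis, and the in-plane shear stress in that frame. The function names are made up, and the stress version assumes stress_0, stress_1 and stress_3 in the question are the xx, xy and yy components.

```python
import math

def to_rotated_frame(vector, angle_rad):
    """Components of a 3D vector in a frame rotated by angle_rad about z."""
    x, y, z = vector
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # First component matches the question: x*cos(angle) + y*sin(angle).
    return (x * c + y * s, -x * s + y * c, z)

def shear_in_rotated_frame(s_xx, s_xy, s_yy, angle_rad):
    """In-plane shear stress in the rotated frame (assumed component naming)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (s_yy - s_xx) * s * c + s_xy * (c * c - s * s)

print(to_rotated_frame((1.0, 0.0, 0.0), math.radians(45.0)))
```

A function like to_rotated_frame, with the angle fixed, could serve as the transform called in the loop, so that all components end up in one output array instead of one Calculator per component.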

Caffe Multiple Input Images

I'm looking at implementing a Caffe CNN which accepts two input images and a label (later perhaps other data) and was wondering if anyone was aware of the correct syntax in the prototxt file for doing this? Is it simply an IMAGE_DATA layer with additional tops? Or should I use separate IMAGE_DATA layers for each?
Thanks,
James
Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.
HDF5 is a key-value store, where each key is a string and each value is a multi-dimensional array. Thus, to use the HDF5_DATA layer, just add a new key for each top you want to use, and set the value for that key to the image you want to store. Writing these HDF5 files from Python is easy:
import h5py
import numpy as np

filelist = []
for i in range(100):
    # get_some_image / get_another_image are placeholders for your own loaders
    image1 = get_some_image(i)
    image2 = get_another_image(i)
    filename = '/tmp/my_hdf5%d.h5' % i
    with h5py.File(filename, 'w') as f:
        # Caffe expects channels first: (H, W, C) -> (C, H, W)
        f['data1'] = np.transpose(image1, (2, 0, 1))
        f['data2'] = np.transpose(image2, (2, 0, 1))
    filelist.append(filename)
with open('/tmp/filelist.txt', 'w') as f:
    for filename in filelist:
        f.write(filename + '\n')
Then simply set the source of the HDF5_DATA param to be '/tmp/filelist.txt', and set the tops to be "data1" and "data2".
I'm leaving the original response below:
====================================================
There are two good ways of doing this. The easiest is probably to use two separate IMAGE_DATA layers, one with the first image and label, and a second with the second image. Caffe retrieves images from LMDB or LEVELDB, which are key value stores, and assuming you create your two databases with corresponding images having the same integer id key, Caffe will in fact load the images correctly, and you can proceed to construct your net with the data/labels of both layers.
The problem with this approach is that having two data layers is not really very satisfying, and it doesn't scale very well if you want to do more advanced things like having non-integer labels for things like bounding boxes, etc. If you're prepared to make a time investment in this, you can do a better job by modifying the tools/convert_imageset.cpp file to stack images or other data across channels. For example you could create a datum with 6 channels - the first 3 for your first image's RGB, and the second 3 for your second image's RGB. After reading this in using the IMAGE_DATA layer, you can split the stream into two images using a SLICE layer with a slice_point at index 3 along the slice_dim = 1 dimension. If further down the road, you decide that you want to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based off of src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
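The channel-stacking idea can be illustrated outside Caffe with NumPy. This is only a sketch of the data layout, assuming blobs in (N, C, H, W) order; the two small arrays are hypothetical stand-ins for real images, and np.split plays the role of the SLICE layer.

```python
import numpy as np

# Hypothetical stand-ins for two RGB images in (N, C, H, W) layout.
image1 = np.zeros((1, 3, 4, 5), dtype=np.float32)
image2 = np.ones((1, 3, 4, 5), dtype=np.float32)

# Stack along the channel axis: 3 + 3 = 6 channels in one datum.
stacked = np.concatenate([image1, image2], axis=1)

# Equivalent of slicing at index 3 along dimension 1.
first, second = np.split(stacked, [3], axis=1)
print(stacked.shape, first.shape, second.shape)
# (1, 6, 4, 5) (1, 3, 4, 5) (1, 3, 4, 5)
```

After the split, first and second hold exactly the two original images again, which is what the two tops of the slicing layer would carry.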
You may also consider using the HDF5_DATA layer with multiple "top"s.

OpenCV cvSaveImage

I am trying to save an image using the OpenCV cvSaveImage function. I perform a DCT on the image, change the resulting coefficients, and then perform an inverse DCT to get back the pixel values. This time, however, the pixel values are decimals (e.g. 254.34576). When I save the image with cvSaveImage, it discards everything after the decimal point (e.g. saving 254.34576 as 254), and this affects my result. Please help.
"The function cvSaveImage saves the image to the specified file. The image format is chosen depending on the filename extension, see cvLoadImage. Only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function. If the format, depth or channel order is different, use cvCvtScale and cvCvtColor to convert it before saving, or use universal cvSave to save the image to XML or YAML format."
I'd suggest investigating the cvSave function.
HOWEVER, a much easier way is to just write your own save/load functions. This would be very easy:
FILE *f = fopen("image.dat", "wb");
fprintf(f, "%d %d\n", width, height);
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        fprintf(f, "%f\n", pixelAt(x, y));  /* pixelAt: placeholder for your own pixel accessor */
fclose(f);
And a corresponding mirror function for reading.
P.S. Early morning and I can't remember for the life of me if fprintf works with binary files. But you get the idea. You could use fwrite() instead.
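For illustration, here is a minimal sketch of such a save/load pair that keeps the decimal values exactly. It is written in Python with the standard struct module rather than C, and the file layout (two 32-bit integers for width and height, followed by 64-bit floats row by row) as well as the function names are assumptions made for the example.

```python
import struct

def save_float_image(path, pixels):
    """Write a 2D list of floats: width and height, then the values row by row."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", width, height))
        for row in pixels:
            f.write(struct.pack("<%dd" % width, *row))

def load_float_image(path):
    """Mirror of save_float_image: read the header, then each row of floats."""
    with open(path, "rb") as f:
        width, height = struct.unpack("<ii", f.read(8))
        row_size = 8 * width  # 8 bytes per 64-bit float
        return [list(struct.unpack("<%dd" % width, f.read(row_size)))
                for _ in range(height)]
```

Because the values are stored as raw 64-bit floats, a pixel such as 254.34576 comes back unchanged after loading. The same layout carries over to fwrite() and fread() in C.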
