Using Tensorflow to train a image classifier using my own data using inception and TFrecords - image

I follow the tutorial on how to train your own data from tensorflow at Github: https://github.com/tensorflow/models/tree/master/inception#how-to-construct-a-new-dataset-for-retraining.
I split my data (Training and validation), created labels suggested and managed to created the TFrecords using bazel-bin. Everything works and now I have my own data as TFrecords.
Now I want to train my image classifier using inception-v3 model from scratch and it seems I should use the script inception_train.py, but I am not sure. Is that right ? https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py.
If so, I have two questions:
1-) How can I train it using my TFrecords. If you can show me an example would be great.
2-) Can I run on CPU or is only possible on GPUs ?
Thank you very much.

Try the following sample code to read images and labels from your tfrecords,
import os
import glob
import tensorflow as tf
from matplotlib import pyplot as plt
def read_and_decode_file(filename_queue):
# Create an instance of tf record reader
reader = tf.TFRecordReader()
# Read the generated filename queue
_, serialized_reader = reader.read(filename_queue)
# extract the features you require from the tfrecord using their corresponding key
# In my example, all images were written with 'image' key
features = tf.parse_single_example(
serialized_reader, features={
'image': tf.FixedLenFeature([], tf.string),
'labels': tf.FixedLenFeature([], tf.int16)
})
# Extract the set of images as shown below
img = features['image']
img_out = tf.image.resize_image_with_crop_or_pad(img, target_height=128, target_width=128)
# Similarly extract the labels, be careful with the type
label = features['labels']
return img_out, label
if __name__ == "__main__":
tf.reset_default_graph()
# Path to your tfrecords
path_to_tf_records = os.getcwd() + '/*.tfrecords'
# Collect all tfrecords present in the records folder using glob
list_of_tfrecords = sorted(glob.glob(path_to_tf_records))
# Generate a tensorflow readable filename queue by supplying it with
# a list of tfrecords, optionally it is recommended to shuffle your data
# before feeding into the network
filename_queue = tf.train.string_input_producer(list_of_tfrecords, shuffle=False)
# Supply the tensorflow generated filename queue to the custom function above
image, label = read_and_decode_file(filename_queue)
# Create a new tf session to read the data
sess = tf.Session()
tf.train.start_queue_runners(sess=sess)
# Arbitrary number of iterations
for i in range(50):
img =sess.run(image)
# Show image
plt.imshow(img)
Now, you also have a function called tf.train.shuffle_batch to help you spawn multiple CPU threads that perform this function and return images and labels based on user specified batch size. You would need to create simultaneous data and training pipelines so that they work simultaneously.
To answer your second question, yes you can train your model using CPU alone but it would be slow and might take several hours or even days to achieve decent results. Remove the with tf.device('/gpu:{0}'): decorator before the creation of your inception model and tensorflow would create the model on your CPU.
Hope this explanation helps.

Related

Issue with xarray.open_raterio() img file and multiprocessing Pool

I am trying to use mutliprocessing Pool.map() to speed up my code. In the function where I have computation occurring for each process I reference an xarray.DataArray that was opened using xarray.open_rasterio(). However, I receive errors similar to this:
rasterio.errors.RasterioIOError: Read or write failed. /net/home_stu/cfite/data/CDL/2019/2019_30m_cdls.img, band 1: IReadBlock failed at X offset 190, Y offset 115: Unable to open external data file: /net/home_stu/cfite/data/CDL/2019/
I assume this is some issue related to the same file being referenced simultaneously while another worker is opening it too? I use DataArray.sel() to select small portions of the raster grid that I work with since the entire .img file is way to big to load all at once. I have tried opening the .img file in the main code and then just referencing to it in my function, and I've tried opening/closing it in the function that is being passed to Pool.map() - and receive errors like this regardless. Is my file corrupted, or will I just not be able to work with this file using multiprocessing Pool? I am very new to working with multiprocessing, so any advice is appreciated. Here is an example of my code:
import pandas as pd
import xarray as xr
import numpy as np
from multiprocessing import Pool
def select_grid(x,y):
ds = xr.open_rasterio('myrasterfile.img') #opening large file with xarray
grid = ds.sel(x=slice(x,x+50), y=slice(y,y+50))
ds.close()
return grid
def myfunction(row):
x = row.x
y = row.y
mygrid = select_grid(x,y)
my_calculation = mygrid.sum() #example calculation, but really I am doing multiple calculations
my_calculation.to_csv('filename.csv')
with Pool(30) as p:
p.map(myfunction, list_of_df_rows)

Gensim: How to load corpus from saved lda model?

When I saved my LdaModel lda_model.save('model'), it saved 4 files:
model
model.expElogbeta.npy
model.id2word
model.state
I want to use pyLDAvis.gensim to visualize the topics, which seems to need the model, corpus and dictionary. I was able to load the model and dictionary with:
lda_model = LdaModel.load('model')
dict = corpora.Dictionary.load('model.id2word')
Is it possible to load the corpus? How?
Sharing this here because it took me awhile to find out the answer to this as well. Note that dict is not a valid name for a dictionary and we use lda_dict instead.
# text array is a list of lists containing text you are analysing
# eg. text_array = [['volume', 'eventually', 'metric', 'rally'], ...]
# lda_dict is a gensim.corpora.Dictionary object
bow_corpus = [lda_dict.doc2bow(doc) for doc in text_array]
Jireh answered correctly but it may be confusing how to load all the previous LDA files. I'm not sure why gensim saves the *.state and *.npy files (I'd appreciate insights in the comments). To reuse a previous LDA model you load the *.model and *.id2word files along with your original corpus.
For instance, if I have a dataframe of my documents in column 'docs' then you load that dataframe again as you will need it to recreate your corpus.
import pandas as pd
from gensim import corpora, models
from gensim.corpora.dictionary import Dictionary
from pyLDAvis import gensim_models
df = pd.read_csv('your_file.csv')
texts = df['docs'].values
You load your previously created dictionary as follows:
dictionary = corpora.Dictionary.load('your_file.id2word')
... and then create the corpus from the dictionary and your original texts (created from the dataframe['docs'] above):
corpus = [dictionary.doc2bow(text) for text in texts]
The previously created LDA model is loaded via gensim:
lda_model = gensim.models.ldamodel.LdaModel.load('your_file.model')
These objects are then fed into your pyLDAvis instance:
lda_viz = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
If you don't use the .id2word file you can run into issues with not having the correct shape (IndexError). I've had this happen when I ran LDA multicore so I use the .id2word rather than recreating the dictionary from the corpus.
in the gensim python code, they said ignore expElogbeta and state file. It is possible to load the corpus, corpus is a set of list contain 2 numbers. It will be complex to load it out, I suggest load corpus from the origin text data and using id2word

Raspberry timelapse movie on the fly

I have a Raspberry on which I want to create a timelapse movie.
All examples I see in the internet FIRST save a bunch of images and THEN converts them into a movie all at once.
I want to create a movie over a long period of time so I can't save thousands of images. What I need is a tool that adds an image to a movie right after the image is captured.
Is there a chance to do that?
There's a flaw in your logic, I think - by adding each image to the movie, you would necessarily be adding a full-frame, rather than only a diff frame. This will result in higher quality, sure - but it will also not save you anything in terms of space as compared to saving the entire image. The space savings you see in adding things to movies is all about that diff, rather than storing a full frame.
Doing a partial diff with check-frames at increments might work, but I'm not sure what format you're targeting, nor what codexes would be needed in order to arbitrarily tack on either a diff frame or a full frame, depending on some external condition - encoding usually takes place as a series of operations rather than singly.
An answer but it isn't finished!
I need your help making this perfect!
Running in python2
import os, cv2
from picamera import PiCamera
from picamera.array import PiRGBArray
from datetime import datetime
from time import sleep
now = datetime.now()
x = now.strftime("%Y")+"-"+now.strftime("%m")+"-"+now.strftime("%d")+"-"+now.strftime("%H")+"-"+now.strftime("%M") #string of dateandtimestart
def main():
imagenum = 100 #how many images
period = 1 #seconds between images
os.chdir ("/home/pi/t_lapse")
os.mkdir(x)
os.chdir(x)
filename = x + ".avi"
camera = PiCamera()
camera.resolution=(1920,1088)
camera.vflip = True
camera.hflip = True
camera.color_effects = (128,128) #makes a black and white image for IR camera
sleep(0.1)
out = cv2.VideoWriter(filename, cv2.cv.CV_FOURCC(*'XVID'), 30, (1920,1088))
for c in range(imagenum):
with PiRGBArray(camera, size=(1920,1088)) as output:
camera.capture(output, 'bgr')
imagec = output.array
out.write(imagec)
output.truncate(0) #trying to get more than 300mb files..
pass
sleep(period-0.5)
camera.close()
out.release()
if __name__ == '__main__':
main()
I've got this configured with with a few buttons and an OLED to select time spacing and frame numbers displayed on a OLED (code not shown above for simplicity but it is also here: https://github.com/gchennell/RPi-PiLapse )
This doesn't make videos larger than 366Mb which is some sort of limit I've reached and I don't know why - if anyone has a good suggestion I would appreciate it

In Python 3, best way to open an image stored in a list as a file object?

Using python 3.4 in linux and windows, I'm trying to create qr code images from a list of string objects. I don't want to just store the image as a file because the list of strings may change frequently. I want to then tile all the objects and display the resulting image on screen for the user to scan with a barcode scanner. For the user to know which code to scan I need to add some text to the qr code image.
I can create the list of image objects correctly and they are in a list and calling .show on these objects displays them properly but I don't know how to treat these objects as a file object to open them. The object that is given to the open function, (img_list[0] in my case), in my add_text_to_img needs to support read, seek and tell methods. When I try this as is I get an attribute error. I've tried BytesIO and StringIO but I get an error message that Image.open does not support buffer interface. Maybe I am not doing that part correctly.
I'm sure there are several ways to do this, but what is the best way to open in memory objects as a file object?
from io import BytesIO
import qrcode
from PIL import ImageFont, ImageDraw, Image
def make_qr_image_list(code_list):
"""
:param code_list: a list of string objects to encode into QR code image
:return: a list of image or some type of other data objects
"""
img_list = []
for item in code_list:
qr = qrcode.QRCode(
version=None,
error_correction=qrcode.ERROR_CORRECT_L,
box_size=4,
border=10
)
qr.add_data(item)
qr_image = qr.make_image(fit=True)
img_list.append(qr_image)
return img_list
def add_text_to_img(text_list, img_list):
"""
While I was working on this, I am only saving the first image. Once
it's working, I'll save the rest of the images to a list.
:param text_list: a list of strings to add to the corresponding image.
:param img_list: the list containing the images already created from
the text_list
:return:
"""
base = Image.open(img_list[0])
# img = Image.frombytes(mode='P', size=(164,164), data=img_list[0])
text_img = Image.new('RGBA', base.size, (255,255,255,0))
font = ImageFont.truetype('sans-serif.ttf', 10)
draw = ImageDraw.Draw(text_img)
draw.text((0,-20),text_list[0], (0,0,255,128), font=font)
# include some method to save the images after the text
# has been added here. Shouldn't actually save to a file.
# Should be saved to memory/img_list
output = Image.alpha_composite(base,text_img)
output.show()
if __name__ == '__main__':
test_list = ['AlGaN','n-AlGaN','p-AlGaN','MQW','LED AlN-AlGaN']
image_list = make_qr_image_list(test_list)
add_text_to_img(test_list, image_list)
im = image_list[0]
im.save('/my_save_path/test_image.png')
im.show()
Edit: I've been using python for about a year and I feel like this is a pretty common thing to do but I'm not even sure that I'm looking up/searching for the right terms. What topics would you search for to answer this? If anyone can post a link or two to what I need to read up on regarding this, that would be very appreciated.
You already have PIL image objects; qr.make_image() returns the (a wrapper around) the right type of object and you do not need to open them again.
As such, all you need to do is:
base = img_list[0]
and go from there.
You do need to match image modes when compositing; QR codes are black-and-white images (mode 1), so either convert that or use the same mode in your text_img image object. The Image.alpha_composite() operation does require that both images have an alpha channel. Converting the base is easy:
base = img_list[0].convert('RGBA')

Caffe Multiple Input Images

I'm looking at implementing a Caffe CNN which accepts two input images and a label (later perhaps other data) and was wondering if anyone was aware of the correct syntax in the prototxt file for doing this? Is it simply an IMAGE_DATA layer with additional tops? Or should I use separate IMAGE_DATA layers for each?
Thanks,
James
Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.
HDF5 is a key value store, where each key is a string, and each value is a multi-dimensional array. Thus, to use the HDF5_DATA layer, just add a new key for each top you want to use, and set the value for that key to store the image you want to use. Writing these HDF5 files from python is easy:
import h5py
import numpy as np
filelist = []
for i in range(100):
image1 = get_some_image(i)
image2 = get_another_image(i)
filename = '/tmp/my_hdf5%d.h5' % i
with hypy.File(filename, 'w') as f:
f['data1'] = np.transpose(image1, (2, 0, 1))
f['data2'] = np.transpose(image2, (2, 0, 1))
filelist.append(filename)
with open('/tmp/filelist.txt', 'w') as f:
for filename in filelist:
f.write(filename + '\n')
Then simply set the source of the HDF5_DATA param to be '/tmp/filelist.txt', and set the tops to be "data1" and "data2".
I'm leaving the original response below:
====================================================
There are two good ways of doing this. The easiest is probably to use two separate IMAGE_DATA layers, one with the first image and label, and a second with the second image. Caffe retrieves images from LMDB or LEVELDB, which are key value stores, and assuming you create your two databases with corresponding images having the same integer id key, Caffe will in fact load the images correctly, and you can proceed to construct your net with the data/labels of both layers.
The problem with this approach is that having two data layers is not really very satisfying, and it doesn't scale very well if you want to do more advanced things like having non-integer labels for things like bounding boxes, etc. If you're prepared to make a time investment in this, you can do a better job by modifying the tools/convert_imageset.cpp file to stack images or other data across channels. For example you could create a datum with 6 channels - the first 3 for your first image's RGB, and the second 3 for your second image's RGB. After reading this in using the IMAGE_DATA layer, you can split the stream into two images using a SLICE layer with a slice_point at index 3 along the slice_dim = 1 dimension. If further down the road, you decide that you want to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based off of src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
You may also consider using HDF5_DATA layer with multiple "top"s

Resources