Issue with xarray.open_rasterio() img file and multiprocessing Pool

I am trying to use multiprocessing Pool.map() to speed up my code. In the function that does the computation for each process, I reference an xarray.DataArray that was opened using xarray.open_rasterio(). However, I receive errors similar to this:
rasterio.errors.RasterioIOError: Read or write failed. /net/home_stu/cfite/data/CDL/2019/2019_30m_cdls.img, band 1: IReadBlock failed at X offset 190, Y offset 115: Unable to open external data file: /net/home_stu/cfite/data/CDL/2019/
I assume this is an issue with the same file being read by one worker while another worker is opening it? I use DataArray.sel() to select small portions of the raster grid to work with, since the entire .img file is way too big to load all at once. I have tried opening the .img file in the main code and then just referring to it in my function, and I have tried opening and closing it inside the function that is passed to Pool.map(). I receive errors like this either way. Is my file corrupted, or will I just not be able to work with this file using multiprocessing Pool? I am very new to working with multiprocessing, so any advice is appreciated. Here is an example of my code:
import pandas as pd
import xarray as xr
import numpy as np
from multiprocessing import Pool
def select_grid(x, y):
    ds = xr.open_rasterio('myrasterfile.img')  # opening large file with xarray
    grid = ds.sel(x=slice(x, x+50), y=slice(y, y+50))
    ds.close()
    return grid
def myfunction(row):
    x = row.x
    y = row.y
    mygrid = select_grid(x, y)
    my_calculation = mygrid.sum()  # example calculation, but really I am doing multiple calculations
    my_calculation.to_csv('filename.csv')
with Pool(30) as p:
    p.map(myfunction, list_of_df_rows)
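One pattern that may be worth trying, offered as an assumption rather than a confirmed diagnosis of the error above: have each call open its own read-only handle and copy the selected window into memory before the handle is released, so that nothing returned from the function still refers lazily to a closed or shared file. The sketch below uses numpy.memmap on a small temporary file as a stand-in for the large raster; RASTER_PATH, SHAPE and window_sum are hypothetical names, and it does not use xarray or rasterio at all.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the large raster: a small array written to disk.
SHAPE = (200, 200)
RASTER_PATH = os.path.join(tempfile.mkdtemp(), "raster.dat")
source = np.arange(SHAPE[0] * SHAPE[1], dtype=np.int64).reshape(SHAPE)
source.tofile(RASTER_PATH)


def window_sum(origin):
    """Open a private read-only handle, copy a 50x50 window into memory, sum it."""
    x, y = origin
    raster = np.memmap(RASTER_PATH, dtype=np.int64, mode="r", shape=SHAPE)
    # np.array() makes an eager in-memory copy, so the result no longer
    # depends on the file handle once this function returns.
    window = np.array(raster[y:y + 50, x:x + 50])
    del raster  # drop this call's reference to the memory-mapped file
    return int(window.sum())


origins = [(0, 0), (50, 100), (150, 150)]
# Serial run for illustration; Pool.map(window_sum, origins) would call the
# function in the same way, once per origin, inside the worker processes.
results = [window_sum(origin) for origin in origins]
print(results)
```

With xarray, the analogous step would presumably be calling .load() on the selection before ds.close(), and returning plain numbers instead of a DataArray from the worker function. Whether that resolves the IReadBlock error for this particular file is not something the sketch can confirm.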

Related

Importing a Mathematica Matrix

I am trying to import a very large matrix file to a variable using the code below. The matrix is of the form {{A,B,C},{D,E,D},{G,F,A}}
inputseq2s = Import[ "C:\Users\jdd0758\Desktop\WaveformsOnly.txt","Data"]
However when I import it like this, it is not read as if it was entered as:
inputseq2s = {{A,B,C},{D,E,D},{G,F,A}}
Extra commas are added to the input and it will not work in my algorithm. How can I make this work properly?

Using TensorFlow to train an image classifier on my own data with Inception and TFRecords

I followed the TensorFlow tutorial on GitHub on how to train with your own data: https://github.com/tensorflow/models/tree/master/inception#how-to-construct-a-new-dataset-for-retraining.
I split my data (training and validation), created the labels as suggested, and managed to create the TFRecords using bazel-bin. Everything works, and now I have my own data as TFRecords.
Now I want to train my image classifier from scratch using the inception-v3 model. It seems I should use the script inception_train.py (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py), but I am not sure. Is that right?
If so, I have two questions:
1-) How can I train it using my TFRecords? An example would be great.
2-) Can I run it on a CPU, or is it only possible on GPUs?
Thank you very much.
Try the following sample code to read images and labels from your tfrecords,
import os
import glob
import tensorflow as tf
from matplotlib import pyplot as plt
def read_and_decode_file(filename_queue):
    # Create an instance of tf record reader
    reader = tf.TFRecordReader()
    # Read the generated filename queue
    _, serialized_reader = reader.read(filename_queue)
    # Extract the features you require from the tfrecord using their corresponding key
    # In my example, all images were written with 'image' key
    features = tf.parse_single_example(
        serialized_reader, features={
            'image': tf.FixedLenFeature([], tf.string),
            # FixedLenFeature accepts only tf.float32, tf.int64 and tf.string
            'labels': tf.FixedLenFeature([], tf.int64)
        })
    # Decode the serialized bytes into an image tensor (this assumes the records
    # hold JPEG-encoded images; use tf.decode_raw instead for raw pixel bytes)
    img = tf.image.decode_jpeg(features['image'], channels=3)
    img_out = tf.image.resize_image_with_crop_or_pad(img, target_height=128, target_width=128)
    # Similarly extract the labels, be careful with the type
    label = features['labels']
    return img_out, label
if __name__ == "__main__":
    tf.reset_default_graph()
    # Path to your tfrecords
    path_to_tf_records = os.getcwd() + '/*.tfrecords'
    # Collect all tfrecords present in the records folder using glob
    list_of_tfrecords = sorted(glob.glob(path_to_tf_records))
    # Generate a tensorflow readable filename queue by supplying it with
    # a list of tfrecords, optionally it is recommended to shuffle your data
    # before feeding into the network
    filename_queue = tf.train.string_input_producer(list_of_tfrecords, shuffle=False)
    # Supply the tensorflow generated filename queue to the custom function above
    image, label = read_and_decode_file(filename_queue)
    # Create a new tf session to read the data
    sess = tf.Session()
    tf.train.start_queue_runners(sess=sess)
    # Arbitrary number of iterations
    for i in range(50):
        img = sess.run(image)
        # Show image
        plt.imshow(img)
        plt.show()
TensorFlow also provides tf.train.shuffle_batch, which can run this reading function on multiple CPU threads and return batches of images and labels of a user-specified size. You would need to set up the data pipeline and the training pipeline so that they run concurrently.
To answer your second question: yes, you can train your model on a CPU alone, but it will be slow and might take several hours or even days to achieve decent results. Remove the with tf.device('/gpu:{0}'): context manager that wraps the creation of your inception model, and TensorFlow will create the model on your CPU.
Hope this explanation helps.

Importing images to prep for keras

I am trying to import a bunch of images and get them ready for keras. The goal is an array with dimensions (length, 160, 320, 3). As you can see, my reshape call is commented out. The print(images.shape) line returns (8037,), and I am not sure how to proceed to get the right array dimensions. For reference, the first column in the CSV file is a list of paths to the images. The code below joins the path of the folder with the path of each image inside it.
When I run the commented-out reshape call I get the following error: "ValueError: cannot reshape array of size 8037 into shape (8037,160,320,3)"
import csv
import cv2
import numpy as np
f = open('/Users/username/Desktop/data/driving_log.csv')
csv_f = csv.reader(f)
m=[]
for row in csv_f:
    n=(row)
    m.append(n)
images=[]
for i in range(len(m)):
    img=(m[i][1])
    img=img.lstrip()
    path='/Users/username/Desktop/data/'
    img=path+img
    image=cv2.imread(img)
    images.append(image)
item_num = len(images)
images=np.array(images)
#images=np.array(images).reshape(item_num, 160, 320, 3)
print(images.shape) #returns (8037,)
Can you print the shape of an image before it is appended to images to verify it is what you expect? Even better would be adding an imshow in the loop to make sure you're loading the images you expect (only need to do for one or two). cv2.imread does not throw an error if there isn't an image at the file path you give it, so your array might be all None which would yield the exact behavior you've described.
If that is the problem, check the img variable and make sure it's pointing exactly where you want it to.
It turns out the loop was including the first line of the CSV file, which is the header row. After I skipped it, everything ran fine and I got the requested shape.
images=[]
for i in range(1,len(m)):  # start at 1 to skip the header row
    img=(m[i][1])
    img=img.lstrip()
    path='/Users/user/Desktop/data/'
    img=path+img
    image=cv2.imread(img)
    images.append(image)
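The check suggested in the answer above can be sketched roughly as follows. This is only an illustration: stack_images is a hypothetical helper, and the list of loaded images is faked with NumPy arrays and a None entry standing in for what cv2.imread returns when a path (such as a header cell) does not point at an image.

```python
import numpy as np


def stack_images(loaded, paths):
    """Stack the images that loaded and report paths where loading gave None."""
    missing = [p for p, img in zip(paths, loaded) if img is None]
    good = [img for img in loaded if img is not None]
    return np.stack(good), missing


# Stand-ins for cv2.imread results: the first entry mimics a header cell
# that is not a real image path, so the loader would return None for it.
paths = ["center", "img_a.jpg", "img_b.jpg"]
loaded = [
    None,
    np.zeros((160, 320, 3), dtype=np.uint8),
    np.zeros((160, 320, 3), dtype=np.uint8),
]

batch, missing = stack_images(loaded, paths)
print(batch.shape)
print(missing)
```

Reporting the offending paths instead of silently dropping them makes it easier to spot a stray header row or a wrong folder prefix.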

Mysql not all arguments converted during string formatting using python

I am creating several tables in a database and inserting several CSV files into them.
Some of the CSV files were inserted successfully, but others failed.
The code that fails is below:
import sys
sys.path.append('/Users/leigh/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages')
import mysql.connector
import config
import csv
import os
dbcon = mysql.connector.connect(database=config.db, user=config.user, password=config.passwd, host=config.host)
dbcur = dbcon.cursor()
dbcur.execute('CREATE TABLE parking (Place_ID int, Parking varchar(50))')
with open('/Users/leigh/documents/RCdata/chefmozparking.csv') as csvfile:
    reader = csv.reader(csvfile)
    next(reader, None)
    for row in reader:
        dbcur.execute('INSERT INTO parking (Place_ID, Parking) VALUES(%s, "%s")' % tuple(row))
dbcon.commit()
I used the same code on other datasets with the same data structure and it succeeded, but this one fails with the following error:
dbcur.execute('INSERT INTO parking (Place_ID, Parking) VALUES(%s, "%s")' % tuple(row))
TypeError: not all arguments converted during string formatting
As far as I can tell, the columns match and the data types are correct.
I am not very experienced with MySQL, and I have searched online but still cannot find what is wrong here.
Could someone please tell me what is going on?
Thanks!
// If further information is needed, please let me know.
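One likely cause, assuming some rows of this CSV contain more than two fields: the error comes from Python's % string formatting, not from MySQL. The template has two %s placeholders, so a row tuple with three or more items raises exactly this TypeError. The sketch below reproduces it without a database; try_format and the sample rows are hypothetical.

```python
TEMPLATE = 'INSERT INTO parking (Place_ID, Parking) VALUES(%s, "%s")'


def try_format(row):
    """Return the formatted statement, or the TypeError text if formatting fails."""
    try:
        return TEMPLATE % tuple(row)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


good_row = ["135111", "public"]
bad_row = ["135111", "public", "extra"]  # hypothetical row with one field too many

print(try_format(good_row))
print(try_format(bad_row))
```

If that turns out to be the problem, printing len(row) for the failing row should confirm it. Passing the values separately, for example dbcur.execute('INSERT INTO parking (Place_ID, Parking) VALUES (%s, %s)', tuple(row[:2])), also leaves the quoting to the connector instead of to string formatting.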

Matplotlib Python Stealing Screen Focus

My code takes serial data from an Arduino, processes it, and then plots it. I am using matplotlib as the graphics interface. Every time it draws, though, the plot window forces itself into focus, and the user cannot look at anything else. What is the best way to stop this? (The code works fine apart from stealing focus.) I tried matplotlib.use('Agg') after reading about it in another post, but it did not work. (I am on Mac OS X.)
The code shown below is a very simple graph of updating data that has the same problem. I am not showing my real code because it is not copy-pastable without the right inputs.
Here is my code:
import matplotlib
from matplotlib import *
from pylab import *
# import math
x=[]
y=[]
def function(iteration):
    xValue=iteration  # Assigns current x value
    yValue=(1./iteration)*34  # Assigns current y value
    x.extend([xValue])  # adds the current x value to the x list
    y.extend([yValue])  # adds the current y value to the y list
    clf()  # clears the plot
    plot(x,y,color='green')  # tells the plot what to do
    draw()  # forces a draw
def main():
    for i in range(1,25):  # runs my function 24 times (i = 1 through 24)
        function(i)
        pause(.1)
main()
Have you tried using the interactive mode of matplotlib?
You can switch it on using ion() (see Documentation).
In interactive mode you do not need to call draw(), but you might need to clear your figures using clf(), depending on your desired output.
I find that using the Tkagg backend works
import matplotlib
matplotlib.use('Tkagg')
credit to 457290092
