H2O Frame constructed from multiple parts - h2o

My training frame is rather large, so I'd like to import it in parts, similar to S3's multipart upload. Is the correct way to do this to call import_file manually for each part and then call rbind on the resulting frames? Or is there a more correct or built-in way of doing this?

The function h2o.import_file can import from multiple files on its own. This works in both Python and R (where the function is named h2o.importFile).
Python:
data = h2o.import_file(["/home/some/path/to/airliens/airline1.csv",
"/home/some/path/to/airliens/airline2.csv"])
R:
data = h2o.importFile(c("/home/some/path/to/airliens/airline1.csv",
"/home/some/path/to/airliens/airline2.csv"))
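If there are many parts, typing every path by hand gets tedious. A minimal sketch, assuming all parts sit in one directory and share an extension, is to build the list with the standard library and pass it to h2o.import_file. The helper name collect_part_paths is made up for illustration, and the H2O call is left commented out because it needs a running cluster.

```python
import glob
import os


def collect_part_paths(directory, pattern="*.csv"):
    """Return the sorted paths of all files in `directory` matching `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Hypothetical usage (requires a running H2O cluster, so it is commented out):
# import h2o
# h2o.init()
# data = h2o.import_file(collect_part_paths("/home/some/path/to/airliens"))
```

Sorting keeps the order of the parts stable between runs, which makes it easier to check that nothing was skipped.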


How to save a list of matrices in a file on Sage?

I'm currently working on lattices. To solve some problems, I have to generate a large number of matrices of the same basis. This takes a lot of time. For example, to generate 10'000 bases, I have to launch the code when I go to bed and retrieve the list of bases in the morning. The problem is that I can't do it every day.
So I'd like to save my list of 1000 matrices once and for all in a text file. The problem is that when I do, I get strings back.
The matrix list is named BB.
import csv
with open('yourfile.csv', 'w') as f1:
    writefile = csv.writer(f1)
    writefile.writerows(BB)
with open('yourfile.csv', 'r') as f1:
    data = list(csv.reader(f1))
Do you know how I could find a way to save the matrix list and then, directly recover a list? I'm working on the Sage notebook.
The correct and easiest way to save and load SageMath objects via a file is:
save(your_list_of_matrix, 'filename.sobj')
your_list_of_matrix = load('filename.sobj')
Saving SageMath objects to CSV requires converting the values to strings, which discards the object types and can lose precision.
Refer to the official documentation for more detail.
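To see why the CSV route hands back strings, here is a small sketch in plain Python (no Sage needed). Nested lists of integers stand in for the matrices, and the standard pickle module stands in for save/load, which as far as I know is built on pickling. Treat it as an illustration of the difference, not as what Sage does internally.

```python
import csv
import io
import pickle

# Stand-in for the list of matrices: nested lists of integers.
# Inside Sage, BB would hold real matrix objects instead.
BB = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# CSV round trip: each matrix becomes one record and each of its rows one
# field, and every field comes back as a string.
buffer = io.StringIO()
csv.writer(buffer).writerows(BB)
buffer.seek(0)
from_csv = list(csv.reader(buffer))

# Pickle round trip: the original objects and their types are restored.
from_pickle = pickle.loads(pickle.dumps(BB))

print(type(from_csv[0][0]))        # a str such as '[1, 2]'
print(type(from_pickle[0][0][0]))  # an int again
```

After the CSV round trip you would have to parse every field yourself, which is exactly the step save and load avoid.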

StratifiedKFold in ImageDataGenerator flow_from_directory

Is there a way to implement cross-validation directly from ImageDataGenerator?
I have a folder containing all my training images (and masks, since I'm performing a segmentation task).
I want to perform StratifiedKFold directly from ImageDataGenerator's flow_from_directory.
Any ideas?

Difference between TensorFloat and ImageFeatureValue

When using the Windows Machine Learning library, the inputs and outputs of ONNX models are often in either TensorFloat or ImageFeatureValue format.
My question: what is the difference between these? It seems I can change the type of the input in the automatically generated model.cs file (created after ONNX import, for body pose detection) from TensorFloat to ImageFeatureValue and the code still runs. This makes it easier to work with video frames, for example, since I can then create my input via ImageFeatureValue.CreateFromVideoFrame(frame).
Is there a reason why this might lead to problems, and what are the differences between the two when using video frames as input? I don't see it in the documentation. And why does the model.cs script create a TensorFloat instead of an ImageFeatureValue in the first place if the input is a video frame?
Found the answer here.
If Windows ML does not support your model's color format or pixel range, then you can implement conversions and tensorization. You'll create an NCHW four-dimensional tensor for 32-bit floats for your input value. See the Custom Tensorization Sample for an example of how to do this.
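For anyone unsure what "NCHW" means in practice, here is a rough sketch of the layout in plain Python (not the Windows ML API). It reorders one image from [height][width][channel] pixels into a flat list in batch, channel, height, width order. The function name hwc_to_nchw and the division by 255 are assumptions for illustration. The range and channel order your model expects may differ.

```python
def hwc_to_nchw(pixels, divisor=255.0):
    """Convert one image given as [height][width][channel] integer pixels
    into (shape, flat list of floats) in NCHW order with batch size N = 1."""
    height = len(pixels)
    width = len(pixels[0])
    channels = len(pixels[0][0])
    # Channel is the slowest-varying index after the batch, then row, then column.
    flat = [
        pixels[y][x][c] / divisor
        for c in range(channels)
        for y in range(height)
        for x in range(width)
    ]
    return (1, channels, height, width), flat


# A made-up 1x2 image: one red pixel followed by one green pixel.
shape, flat = hwc_to_nchw([[[255, 0, 0], [0, 255, 0]]])
print(shape)
print(flat)
```

All values of the first channel come first, then all of the second, and so on, which is the opposite of the interleaved layout most image buffers use.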

Nvidia Digits accuracy and loss plots data

I trained my model in Nvidia Digits 5 and I would now like to extract the accuracy and loss plots that were generated during training for a report. Is this data saved somewhere, so that I could extract it, plot it in Python, and perhaps ultimately modify the plots to compare different models?
The best solution I have found is to either look at the HTML file or scan the text file caffe_output.log that is produced by Caffe. The text file is usually stored in /var/digits/jobs/insert_your_job_id/, but on Linux systems you can also just run:
locate caffe_output.log
Go to your DIGITS job folder and locate your job's subfolder. Inside you'll find a file status.pickle, which is a pickled object containing all your job's information.
You can load it in python like so:
import digits
import pickle
data = pickle.load(open('status.pickle','rb'))
This object is somewhat generic and may contain multiple tasks. For a typical classification task it will likely be just one, but you will still need to access it via data.tasks[0]. From there you can grab the plots:
data.tasks[0].combined_graph_data()
which returns a somewhat convoluted dict (unfortunately - since your network can produce many accuracy/loss outputs, as well as even custom ones). It contains everything you need though - I managed to plot accuracy with:
import matplotlib.pyplot as plt
plt.plot(data.tasks[0].combined_graph_data()['columns'][2][1:])
but it's likely that you'll have to write a bit of custom code. As always, dir() is your friend.
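As a starting point for that custom code, here is a small sketch that turns the 'columns' list into a dict keyed by series name. It assumes each column has the form [name, value_1, value_2, ...], which is what the [2][1:] slicing above suggests. The helper name columns_to_series and the sample data are made up, so check the real structure of your own job first.

```python
def columns_to_series(graph_data):
    """Map each series name to its list of values.

    Assumes every entry of graph_data['columns'] looks like
    [name, value_1, value_2, ...].
    """
    return {column[0]: list(column[1:]) for column in graph_data["columns"]}


# Made-up data in the assumed layout; real jobs will have other names and values.
example = {
    "columns": [
        ["train_epochs", 0.5, 1.0],
        ["val_epochs", 1.0],
        ["accuracy", 0.62, 0.81],
    ]
}
series = columns_to_series(example)
print(sorted(series))
```

With the real object you would call columns_to_series(data.tasks[0].combined_graph_data()) and then plot whichever names you need.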

Import multiple images and store them in a list using Mathematica

I am using Mathematica to enhance and thin images. I have done this for a single image, and now I want to do it for multiple images. So I need to import 6 images, do the thinning, and store the results in a list, for example. Can anyone show me how to do that?
The images will be used for a biometric identification system.
Since you want a list as a result you might think of using either Table or Map. Either of those can do n things, one after another, and put the result into your final list.
Since you didn't show the steps you used for processing a single image, it is a little difficult to tell you exactly how to wrap Table or Map around them.
If you have a list of image file names, then you could use Map to process those names one at a time. The processing could be a compound function that Imports the image and then enhances and thins it, so that the output of the function is a single processed image. Map would then repeat that over all the names.
Table might work in a similar way: you use each iteration to get the file name, do the processing, and store the result in your desired list.
