generating image features dataset in scikit-learn - csv file - image

I extract two edge features (HOG features and the Sobel operator) from a single image.
How can I create an image feature dataset in scikit-learn (Python), like iris_dataset?
The library ships CSV files that represent datasets, each containing only numbers. How were these numbers generated? By feature extraction?
Unfortunately, I only found a Java tutorial here http://www.coccidia.icb.usp.br/coccimorph/tutorials/Tutorial-2-Creating-..., which at point 5 talks about generating the training matrices (average and covariance matrices).
Is there any function in scikit-learn that generates these training arrays?

You don't need to wrap your data as a CSV file to load it as a dataset. scikit-learn models have a fit method that expects:
as first argument, a regular numpy array (or scipy.sparse matrix) with shape (n_samples, n_features) (most often with dtype=numpy.float64) that encodes the feature vector of each sample in the training set,
and, for supervised classification models, a second argument with shape (n_samples,) and dtype=numpy.int32 that encodes the class label of each sample in the training set as an integer value.
If you don't know the basic numpy data structure and what shape and dtype mean, I strongly advise you to have a look at a tutorial such as the SciPy Lecture Notes.
Edit: If you really need to read / write numerical CSV to / from numpy arrays, you can use numpy.loadtxt / numpy.savetxt
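As a minimal sketch of the idea above, the snippet below stacks one feature vector per image into an (n_samples, n_features) array and round-trips it through CSV text. The extract_features helper, the random images and the labels are hypothetical placeholders: the helper is only a crude gradient histogram standing in for real HOG / Sobel features, not the asker's actual extraction code.

```python
import io

import numpy as np


def extract_features(image):
    """Hypothetical stand-in for HOG / Sobel extraction: returns a 1-D vector."""
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    # 8-bin histogram of gradient magnitudes as a crude edge descriptor
    hist, _ = np.histogram(magnitude, bins=8, range=(0.0, 1.0))
    return hist.astype(np.float64)


# Synthetic grayscale "images" and made-up class labels (assumptions)
rng = np.random.default_rng(0)
images = [rng.random((16, 16)) for _ in range(6)]
labels = [0, 1, 0, 1, 0, 1]

# One row per image, one column per feature
X = np.vstack([extract_features(img) for img in images])
y = np.asarray(labels, dtype=np.int32)

# Optional CSV round trip with numpy.savetxt / numpy.loadtxt,
# storing the label in the last column
buffer = io.StringIO()
np.savetxt(buffer, np.column_stack([X, y]), delimiter=",")
buffer.seek(0)
loaded = np.loadtxt(buffer, delimiter=",")
X_loaded, y_loaded = loaded[:, :-1], loaded[:, -1].astype(np.int32)
```

X and y in this form are what a scikit-learn estimator's fit method would receive. The CSV step is only needed if the data has to be stored or exchanged as text.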

Related

How to map features from two different data using regressor for classification?

I am trying to build a Keras model to implement the approach explained in this paper.
Context of my implementation:
I have two different kinds of data representing the same set of classes (labels) that need to be classified. The first kind is image data, and the second kind is EEG data (a time series sequence).
I know that to classify image data we can use CNN models like this:
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding='valid'))
model.add(Activation('relu'))
model.add(Dense(1000))
model.add(Activation('relu'))
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())
# Output Layer
model.add(Dense(40))
model.add(Activation('softmax'))
And to classify sequence data we can use LSTM models like this:
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(40, activation='softmax'))
But the approach of the paper above shows that EEG feature vectors can be mapped with image vectors through regression like this:
The first approach is to train a CNN to map images to corresponding
EEG feature vectors. Typically, the first layers of CNN attempt to
learn the general (global) features of the images, which are common
between many tasks, thus we initialize the weights of these layers
using pre-trained models, and then learn the weights of the last
layers from scratch in an end-to-end setting. In particular, we used
the pre-trained AlexNet CNN, and modified it by replacing the
softmax classification layer with a regression layer (containing as
many neurons as the dimensionality of the EEG feature vectors),
using Euclidean loss as the objective function.
The second approach consists of extracting image features using
pre-trained CNN models and then employ regression methods to map
image features to EEG feature vectors. We used our fine-tuned
AlexNet as feature extractors by
reading the output of the last fully connected layer, and then
applied several regression methods (namely, k-NN regression, ridge
regression, random forest regression) to obtain the predicted
feature vectors
I am not able to comprehend how to code the above two approaches. I have never used a regressor for feature mapping and then done classification. Any leads on this are much appreciated.
In my understanding the training data consists of (eeg_signal, image, class_label) triplets.
1. Train the LSTM model with input=eeg_signal, output=class_label. The loss is cross-entropy.
2. Peel off the last layer of the LSTM model. Let's say the output of the second-to-last layer is a vector of size 20. Let's call it eeg_representation.
3. Run this truncated model on all your eeg_signal inputs and save the eeg_representation output. You will get a tensor of shape [batch, 20].
4. Take the AlexNet mentioned in the paper (or any other image classifier) and peel off the last layer. Let's say the output of the second-to-last layer is a vector of size 30. Let's call it image_representation.
5. Stitch a linear layer onto the end of the truncated image model. This layer converts image_representation to eeg_representation. It has 20 x 30 weights.
6. Train the stitched model on (image, eeg_representation) pairs. The loss is the Euclidean distance.
7. And now the fun part: stitch together the model trained in step 6 and the peeled-off part of the model trained in step 1. If you input an image, you will get class predictions.
This may not sound like a big deal (because we do image classification all the time), but if this really works, it means that this is a "prediction that is running through our brains" :)
Thank you for bringing up this question and linking the paper.
I feel I just repeated what's in your question and in the paper.
It would be beneficial to have some toy dataset to be able to provide code examples.
Here's a Tensorflow tutorial on how to "peel off" the last layer of a pretrained image classification model.
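For the regression-based mapping quoted from the paper, a toy sketch with purely synthetic data might look like the following. It assumes the image_representation (size 30) and eeg_representation (size 20) vectors have already been extracted and saved, fits a closed-form ridge regression between them in numpy, and then applies a classification layer to the predicted EEG features. All arrays, sizes and the head_weights / head_bias layer are made-up placeholders, not values from the paper or from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, image_dim, eeg_dim, n_classes = 200, 30, 20, 40

# Synthetic stand-ins for the saved representations (assumptions)
image_representation = rng.normal(size=(n_samples, image_dim))
true_map = rng.normal(size=(image_dim, eeg_dim))
eeg_representation = (
    image_representation @ true_map
    + 0.01 * rng.normal(size=(n_samples, eeg_dim))
)

# Ridge regression in closed form: W = (X^T X + lambda * I)^-1 X^T Y
ridge_lambda = 1e-3
gram = image_representation.T @ image_representation
gram += ridge_lambda * np.eye(image_dim)
W = np.linalg.solve(gram, image_representation.T @ eeg_representation)

# Hypothetical peeled-off classification layer of the EEG model;
# in practice these weights would come from the trained LSTM classifier.
head_weights = rng.normal(size=(eeg_dim, n_classes))
head_bias = np.zeros(n_classes)


def predict_class(image_features):
    """Map image features to EEG features, then apply the EEG classifier head."""
    predicted_eeg = image_features @ W
    logits = predicted_eeg @ head_weights + head_bias
    return logits.argmax(axis=1)


predictions = predict_class(image_representation)
```

The same structure applies when the regressor is a trainable linear layer on top of a truncated CNN. Only the way W is obtained changes.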

Exploration graphics in h2o

To whom it may concern,
Is it possible to plot an exploratory variable versus the target in h2o? I want to know whether it is possible to carry out basic data exploration in h2o, or whether it is not designed for that.
Many thanks in advance,
Kere
The main plotting functionality for an H2O frame is for histograms (hist() in Python and h2o.hist() in R).
Within Flow you can do basic data exploration if you import your dataframe, then click on inspect and then, next to the hyperlinked columns, you'll see a plot button which will let you get bar charts of counts for example and other plot types.
You can also easily convert single columns you want to plot into a pandas or R dataframe with
H2OFrame.as_data_frame() in Python
as.data.frame.H2OFrame in R
and then use the native Python and R plotting methods.

How to model a time-dependent matrix with an RNN?

I am working on video classification. Let's say I sub-sample the video temporally and, for each sub-sample, engineer some features that I can represent as a two-dimensional matrix.
The values of this matrix are time-dependent, so for each video I have a set of time-dependent matrices.
I need to use an RNN to model these time-dependent matrix values and represent the video. This representation should classify the video into classes. In other words, the RNN should be able to classify the video into classes based on these time-dependent matrix values.
Is this possible with RNNs? Would it be good practice? If so, what guidelines can anyone provide? What is a good library to use? What would be good tutorials? Thanks.
You flatten the images of the video and use them as an element of the sequence.
You will have to put a convnet under the RNN, most likely, to get reasonable results.
As such, you feed the image to a convnet, then flatten the activation map and feed it to the RNN cell.
https://stackoverflow.com/a/36992625/447599
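To make the "flatten each frame and use it as a sequence element" idea concrete, here is a minimal numpy sketch of a plain (Elman-style) RNN forward pass over a sequence of flattened matrices. The sizes and the randomly initialised, untrained weights are assumptions for illustration only. A real model would be trained in a deep learning library and, as noted above, would likely put a convnet in front of the RNN.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, rows, cols = 12, 4, 5      # 12 time steps, one 4x5 matrix per step
hidden_size, n_classes = 16, 3

# Synthetic time-dependent feature matrices for one video (assumption)
video_features = rng.normal(size=(n_steps, rows, cols))

# Flatten each matrix so every time step becomes one vector
sequence = video_features.reshape(n_steps, rows * cols)

# Untrained, randomly initialised RNN parameters (illustration only)
W_xh = rng.normal(scale=0.1, size=(rows * cols, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(scale=0.1, size=(hidden_size, n_classes))
b_y = np.zeros(n_classes)


def classify_sequence(seq):
    """Run the RNN over the sequence and return class probabilities."""
    h = np.zeros(hidden_size)
    for x_t in seq:
        # The hidden state summarises everything seen up to this time step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    logits = h @ W_hy + b_y
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


probs = classify_sequence(sequence)
```

The final hidden state acts as the video representation, and the softmax on top of it produces one probability per class.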

how do i convert photographs to tensors

I am a neophyte neural network user trying to get to grips with TensorFlow. I have used the MNIST dataset as a test, and would now like to use real world data.
Can anyone point me to a "Howto", paper or source that tells me how to convert digital photographs stored in files (jpeg, png, gif, wmf) into tensors ready for import into TensorFlow, please?
Cheers!
You can use the TensorFlow image functions to load images and convert them into tensors. After loading the images, you will likely want to look at tf.image.resize_bilinear to resize the images to standard sizes.
The standard way to load data into Tensorflow is to use a TFRecords file.
Another approach is to convert whatever data you have into a supported format. This approach makes it easier to mix and match data sets and network architectures. The recommended format for TensorFlow is a TFRecords file containing tf.train.Example protocol buffers.
-Tensorflow Documentation
Basically TFRecord is a binary representation of your data or images along with its labels, file names, and other information. Its main advantages are to allow you to stream data into the model efficiently by using Tensorflow's threading and to increase flexibility between different models.
You can use this script to generate your own TFRecord files.
Additionally, you can read on how to use the script here.

Matrix calculations in rails app

In my rails app I need to do the following:
Create a form for inputting the data for a 4x4 matrix, store this data in a model
Use a Ruby matrix to calculate the Cholesky decomposition of this matrix, and then display the resulting decomposed matrix in a view
From what I understand, the form data is stored in a 1-dimensional array. I need the data stored in a 4x4 array, but I haven't seen any examples of this. What is the best way to do this?
You may want to look at the following tools:
standard Matrix class
ruby gsl gem
extendmatrix gem

Resources