To whom it may concern,
Is it possible to plot an explanatory variable against the target in H2O? More generally, I would like to know whether basic data exploration can be done in H2O, or whether it is simply not designed for that.
Many thanks in advance,
Kere
The main plotting functionality for an H2OFrame is histograms (hist() in Python and h2o.hist() in R).
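For example, a minimal sketch in Python (the file path and column name below are placeholders, and the exact hist() arguments can vary between H2O versions):

import h2o

h2o.init()

# Placeholder path and column name; replace with your own data.
hf = h2o.import_file("my_data.csv")

# Histogram of a single column drawn directly from the H2OFrame.
hf["my_column"].hist()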
Within Flow you can do basic data exploration: import your data frame, click Inspect, and next to the hyperlinked columns you will see a Plot button that lets you build bar charts of counts, for example, and other plot types.
You can also easily convert the single columns you want to plot into a pandas or R data frame, with H2OFrame.as_data_frame() in Python or as.data.frame.H2OFrame in R, and then use the native Python and R plotting methods.
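For instance, to plot an explanatory column against the target from Python, a minimal sketch (the frame and the column names "x_col" and "target" are placeholders):

import h2o
import matplotlib.pyplot as plt

h2o.init()
hf = h2o.import_file("my_data.csv")  # placeholder path

# Pull just the two columns of interest into pandas, then plot natively.
pdf = hf[["x_col", "target"]].as_data_frame(use_pandas=True)
pdf.plot.scatter(x="x_col", y="target")
plt.show()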
Okay, I am so lost here that I cannot even formulate a concrete question, so I shall be very general and hope that someone can point me in the right direction.
I am producing some scientific plots in Julia with PyPlot, and I am very satisfied with the results (adequate, clear aesthetics, and I can handle the syntax to create very complex images). But I need to produce a so-called "heatmap" (a 2D bitmap image) in which the user should be able to select a set of points of the image with the mouse. The selection, which will be confined to a discrete grid, should be stored in some iterable, an Array or similar. I have no idea where to start: with the PyPlot library itself, or with something like Gtk or GtkReact (I could not get the examples for the latter running). Can someone point me in the right direction?
PyPlot is an interface to Python's matplotlib, which makes static plots.
Use Plotly inside a Jupyter notebook instead: https://plot.ly/julia/heatmaps/
I am using H2O's AutoML and would like to know whether it is possible to get a single AUC, confusion matrix, and ROC curve for all of the methods combined. For instance, I have AUC values for the individual models (GLM, Stacked Ensemble, deep learning, etc.). Can you get these three values for all of the methods combined? The goal is to be able to compare AutoML to other similar packages.
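For context, the per-model metrics mentioned above come from the AutoML leaderboard and from model_performance() on each model; a minimal Python sketch of that setup (the file path and column name are placeholders, and it assumes a binary classification target):

import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("train.csv")    # placeholder path
y = "response"                          # placeholder target column
x = [c for c in train.columns if c != y]
train[y] = train[y].asfactor()          # make the target categorical

# Small AutoML run; max_models kept low purely for illustration.
aml = H2OAutoML(max_models=5, seed=1)
aml.train(x=x, y=y, training_frame=train)

# One row per model, each with its own AUC.
print(aml.leaderboard)

# Metrics for an individual model, e.g. the leader.
perf = aml.leader.model_performance(train)
print(perf.auc())
print(perf.confusion_matrix())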
I have created a simple line graph with data from a MySQL database, using PHP to return the data in JSON format.
https://gist.github.com/5fc4cd5f41a6ddf2df23
I would like to simulate "live" updating, something similar to this example but less complicated:
http://bl.ocks.org/2657838
I've been searching for examples of how to achieve this simply, as I am new to D3, but to no avail.
I've looked at Mike Bostock's path transitions (http://bost.ocks.org/mike/path/), but I'm not sure how to implement this using JSON data.
Can anyone help with either an example or some direction on how I could accomplish this?
Doing that kind of line transformation is tricky in SVG, because moving a large number of points just a little and re-rendering the complete line can hurt performance.
For the case when interactivity with each data point is not paramount and the time series can grow to contain an arbitrary number of points, consider using Cubism. It is a library based on D3 but meant specifically for visualizing time-series data efficiently. To prevent re-renderings of the SVG, it draws the points on a canvas, allowing for cheap pixel-by-pixel transitions as new data arrives.
I extract two edge features (a HOG feature and the Sobel operator) from a single image.
How can I create an image feature dataset in scikit-learn (Python), like the iris dataset?
In the library there are CSV files that represent datasets, CSV files containing only numbers. How were these numbers generated? Feature extraction?
Unfortunately, I only found a Java tutorial here: http://www.coccidia.icb.usp.br/coccimorph/tutorials/Tutorial-2-Creating-..., where point 5 talks about generating the training matrices (average and covariance matrices).
Is there any function in scikit-learn that generates these training arrays?
You don't need to wrap your data in a CSV file to load it as a dataset. scikit-learn models have a fit method that expects:
as a first argument, a regular numpy array (or scipy.sparse matrix) with shape (n_samples, n_features) (most often with dtype=numpy.float64) that encodes the feature vector for each sample in the training set,
and, for supervised classification models, a second argument with shape (n_samples,) and dtype=numpy.int32 that encodes the class label assignment, as an integer value, for each sample of the training set.
If you don't know the basic numpy data structures and what shape and dtype mean, I strongly advise you to have a look at a tutorial such as the SciPy Lecture Notes.
Edit: if you really need to read/write numerical CSV data to/from numpy arrays, you can use numpy.loadtxt / numpy.savetxt.
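A minimal sketch of that calling convention (the feature values and labels below are made up purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# X: one row per sample, one column per extracted feature (e.g. HOG / Sobel statistics).
X = np.array([[0.1, 2.3, 4.5],
              [0.0, 1.9, 5.1],
              [0.7, 0.2, 3.3],
              [0.9, 0.1, 2.8]], dtype=np.float64)   # shape (n_samples, n_features)

# y: one integer class label per sample.
y = np.array([0, 0, 1, 1], dtype=np.int32)          # shape (n_samples,)

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict(X))

# If the features live in a numerical CSV file, numpy can round-trip them:
np.savetxt("features.csv", X, delimiter=",")
X_back = np.loadtxt("features.csv", delimiter=",")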
How can I work with my own dataset in scikit-learn?
The scikit-learn tutorials always load their own example datasets (digits dataset, flower dataset, ...):
http://scikit-learn.org/stable/datasets/index.html
e.g. from sklearn.datasets import load_iris
I have my own images and I have no idea how to create a new dataset from them.
In particular, to start with, I am using this example I found (it uses the OpenCV library):
import cv2
import numpy as np

img = cv2.imread('telamone.jpg')

# Convert to grayscale
imgg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# SURF extraction (in newer OpenCV builds SURF lives in the contrib module,
# e.g. surf = cv2.xfeatures2d.SURF_create())
surf = cv2.SURF()
kp, descriptors = surf.detectAndCompute(imgg, None)

# Setting up samples and responses for kNN
samples = np.array(descriptors, dtype=np.float32)
responses = np.arange(len(kp), dtype=np.float32)
I would like to extract features from a set of images, in a way that is useful for implementing a machine learning algorithm.
You would first need to clearly define what you are trying to achieve: "extract features from a set of images, in a way useful for implementing a machine learning algorithm" is much too vague to give you any guidance.
Are you trying to do:
image classification of the picture as a whole (e.g. indoor scene vs. outdoor scene)?
object recognition (e.g. recognizing several instances of the same object in different pictures) inside sub-parts of a set of pictures, maybe using a scanning procedure with windows of various sizes?
object detection and class-based categorization (e.g. finding all occurrences of cars or pedestrians in pictures, with a bounding box around each occurrence of instances of those classes)?
full-picture semantic parsing, a.k.a. segmentation of the pixels plus class categorization of each segment (building, road, people, trees)?
Each of those tasks will require a different pipeline (feature extraction + machine learning model combo).
You should probably start by reading a book on the subject, for instance: http://szeliski.org/Book/
Also, as a side note, Stack Overflow is probably not the best place to ask such open-ended questions.