Where could I find an implementation of SVM on Hadoop? - hadoop

I found an implementation at http://code.google.com/p/cascadesvm/.
However, it comes with no documentation. Has anyone tried it? Or where could I find an alternative implementation of SVM on Hadoop?
Thanks a lot~

It looks like someone did this within the Mahout project. I'm not sure whether it has been merged into trunk, but this looks like a good place to start:
https://issues.apache.org/jira/browse/MAHOUT-232

You can check it out at https://code.google.com/p/cascadesvm/
The training part and a demo of the Matlab version have been released:
https://code.google.com/p/cascadesvm/wiki/CascadeSVMMatlabVersion

Related

2D distributions in the HistFactory?

How can I specify, when constructing the HistFactory model, that the signal and background are 2-dimensional distributions?
I understand that in RooStats you need to change the TH1 to a TH2.
At the moment, when writing my model in the JSON file, can I use an ndarray to do something similar?
Which is the correct way to do this?
I hope someone can help me and thank you in advance.
Currently the best way is to unroll the distributions, e.g.
{'data': array_2d.ravel().tolist()}
since mathematically it doesn't make any difference.
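As a minimal sketch of the unrolling (the bin counts and the names `signal_2d` and `background_2d` are made up for illustration), each 2D histogram is flattened in the same row-major order. Only the `name` and `data` fields are shown here; a full pyhf sample also needs its `modifiers`.

```python
import numpy as np

# Hypothetical 2D histograms (2 x 3 bins); the values are illustrative only.
signal_2d = np.array([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])
background_2d = np.array([[10.0, 20.0, 30.0],
                          [40.0, 50.0, 60.0]])

# Unroll each histogram into a flat list. ravel() uses row-major (C) order
# by default, so every sample ends up with the same bin ordering.
signal_sample = {"name": "signal", "data": signal_2d.ravel().tolist()}
background_sample = {"name": "background", "data": background_2d.ravel().tolist()}

# The original shape is not stored in the flat list, so keep it around
# if results need to be folded back into 2D later.
shape = signal_2d.shape
restored = np.array(signal_sample["data"]).reshape(shape)
```

The important part is that every sample and the observed data are unrolled with the same ordering, so that bin i means the same 2D cell everywhere.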
If you want to convert from XML+ROOT this is not yet supported (but could be). If so, please open an issue on GitHub.
Thanks for using pyhf!

pyhf: POI application using formula

I am trying to write a likelihood model in which the POI affects two samples: for one I have the regular POI*yield, but for the other I have f(POI)*yield, where f is an arbitrary function.
Is there a simple way to implement that in pyhf?
Thanks in advance.
pyhf currently does not support this, but it's something that is on our minds. Can you open an issue on our GitHub with this as a feature request, so we can work out how to do it?

What is a good link to examples of enaml being used with traits and matplotlib?

I have done GUI construction, but not in Python. From other Stack Exchange questions and my own investigation, it looks like I want to use enaml and traits for the bulk of this work. Are there any links or references to help me get started?
This is a scientific application integrating matplotlib plots, text boxes, and buttons (very simple, I think). I have gone through this example but don't understand it too well: http://code.enthought.com/projects/traits/docs/html/tutorials/traits_ui_scientific_app.html
I have also gone through the Enthought Chaco examples and didn't get very far. Has somebody built a program that I could run and whose code I could look at? Or is there a repository of examples I am not aware of? I found the enaml examples, but the example with matplotlib is basic and does not show me how to connect my algorithms to the plots. Thanks in advance!
Not a full answer, but for additional context:
1) Use https://github.com/nucleic/enaml, along with https://github.com/enthought/traits-enaml
2) Example:
https://github.com/nucleic/enaml/blob/master/examples/widgets/mpl_canvas.enaml

How can I implement LDA using Apache Mahout?

I have a data set like the one below, in CSV format.
FileName,Topic,Tag,Frequency
File-1,Topic -1,Tag-1,10
File-2,Topic -2,Tag-2,10
File-3,Topic -3,Tag-2,10
File-4,Topic -4,Tag-4,10
File-5,Topic -1,Tag-5,10
File-6,Topic -3,Tag-1,10
File-7,Topic -1,Tag-1,10
I need to find correlations between the tags using Mahout's LDA (Latent Dirichlet Allocation) algorithm. Can anybody please help me find out how to do that using Apache Mahout?
I am also confused about exactly what input format Mahout wants.
It would be helpful if somebody could share some good material for a Mahout beginner.
I might be late in answering, but Mahout no longer supports LDA in versions above 0.6. One has to use cvb instead of lda to run topic models.
The following links can help you:
https://mahout.apache.org/users/clustering/lda-commandline.html
https://mahout.apache.org/users/clustering/latent-dirichlet-allocation.html
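On the input question: topic models of this kind generally work on per-document term counts. As a rough sketch, assuming each file is treated as a document and each tag as a term (an assumption on my part, not something the question states), the CSV can be reduced to sparse count vectors as below. This is plain Python for illustration only, not Mahout's API or its actual on-disk input format, and `build_term_counts` is a hypothetical helper; the pages linked above describe what Mahout itself expects.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the question; in practice this would be read from a file.
CSV_TEXT = """FileName,Topic,Tag,Frequency
File-1,Topic -1,Tag-1,10
File-2,Topic -2,Tag-2,10
File-6,Topic -3,Tag-1,10
"""

def build_term_counts(csv_text):
    """Return (vocabulary, {document: {term_index: count}})."""
    vocabulary = {}  # tag -> integer term index, assigned in order of first appearance
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        tag = row["Tag"].strip()
        index = vocabulary.setdefault(tag, len(vocabulary))
        # Assumption: one document per file, one term per tag.
        counts[row["FileName"].strip()][index] += int(row["Frequency"])
    return vocabulary, {doc: dict(terms) for doc, terms in counts.items()}

vocabulary, doc_term_counts = build_term_counts(CSV_TEXT)
```

With only one tag per file, as in the sample data, each document vector has a single non-zero entry, which gives a topic model very little co-occurrence information to work with.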

How does MapReduce framework implement the sort phase?

I am interested in the implementation of the MapReduce sort phase; it seems to be very efficient. Could someone provide some references about it please? Thanks!
This points to ReduceTask.java as the place where the sort phase is coded. See lines 393-408 in ReduceTask.java. If you need more info, download the entire source and dig into it.
EDITED
The "sort" phase falls under ReduceTask, as shown in a figure in the Hadoop book (page 163).
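For intuition only: each map task hands the reducer output that is already sorted by key, so the reduce-side sort amounts to merging sorted runs. A minimal Python sketch of such a k-way merge, followed by grouping the values per key, is below. It is not Hadoop's code; `merge_sorted_runs` and the sample data are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_sorted_runs(runs):
    """Merge key-sorted (key, value) runs and group the values by key."""
    # heapq.merge streams the runs lazily and never re-sorts them,
    # which is why merging already-sorted inputs is cheap.
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]

# Hypothetical outputs of three map tasks, each already sorted by key.
runs = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 1), ("banana", 1)],
    [("banana", 1), ("cherry", 1), ("date", 1)],
]
grouped = merge_sorted_runs(runs)
```

Each entry of `grouped` corresponds to one call of a reduce function: a key together with all the values emitted for it.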
