Stanbol and Hadoop integration

I'm new to Stanbol. Can it be run on Hadoop? I cannot find an example of this.
I assume it can, but before diving into it I want to be sure.
Thanks!

Related

Running the word-count example with Hadoop and HBase

I am new to Hadoop and HBase. I am going to implement a real example on a data set to understand the logic behind them.
I have already installed Hadoop and HBase on my system (Ubuntu 17.04):
hadoop-2.8.0
hbase-1.3.1
Is there a step-by-step tutorial for implementing the word-count example (or any other basic example)?
There is a comprehensive tutorial in the HBase reference guide:
http://hbase.apache.org/book.html#mapreduce.example
Note that there is also Cascading, an alternative API that runs on MapReduce but lets you write the code in a simpler way (the reference guide describes it too).
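To make "the logic behind" a word-count job concrete before touching a cluster, here is a minimal single-process sketch in plain Java. It uses no Hadoop or HBase classes (the class and method names are made up for illustration); it only mimics the three phases such a job goes through: map emits (word, 1) pairs, the shuffle groups the pairs by word, and reduce sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of a word-count job's data flow.
// Not the Hadoop API: plain Java collections stand in for the framework.
final class WordCountSketch {

    // Map phase: split one input line on whitespace and emit (word, 1) per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                emitted.add(Map.entry(token, 1));
            }
        }
        return emitted;
    }

    // Shuffle phase: group all emitted values by key. A TreeMap keeps the keys
    // sorted, loosely mirroring the sort Hadoop performs before reducing.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Runs map over every line, then shuffle, then reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(shuffle(emitted));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "the end");
        System.out.println(wordCount(input));
    }
}
```

Running `main` prints `{brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`. In a real Hadoop job the map and reduce steps run as Mapper and Reducer tasks spread across nodes, and the framework performs the shuffle and sort between them.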

Hadoop installation on Amazon cloud

I am new to Hadoop and would like to move into Hadoop administration. I have studied the basics, installed Hadoop in pseudo-distributed mode successfully, and run some basic examples. Now I want to go further and learn how to install and configure Hadoop in a realistic setting, so I decided to use an Amazon micro instance. Can anyone tell me how to install and configure Hadoop on the Amazon cloud?
Thanks in Advance.
I have tried this personally, and you will not really be able to use Hadoop on a single micro instance because of its memory limits. IMHO you should at least try a medium instance, or better yet use their Elastic MapReduce (EMR) service, which runs a modified version of Hadoop. You can run a 3-node cluster for around $0.25 an hour. If you really want to learn big data, this is the way I went.
You should check out their documentation here
http://aws.amazon.com/documentation/elasticmapreduce/

Running a non-MapReduce program in Hadoop

I have a question. I wrote a program in NetBeans that reads data from Cassandra and writes the results back into it. The program does not use MapReduce at all. I built a .jar file from it, and now I want to know whether I can execute it in Hadoop.
In other words, can I run a non-MapReduce program in Hadoop?
You could re-architect this program to run on Hadoop v2 as a YARN application. This would require restructuring your application to fit the YARN paradigm. An example of how to do this is given here: Writing App Framework on Yarn
This is not a simple exercise. Also, if you are interested in using Hadoop, I would consider simply rewriting your application to use HBase (a NoSQL wide-column database that competes with Cassandra), which is built specifically to run on Hadoop. It integrates with MapReduce, so MapReduce jobs can read from and write to HBase tables directly.
This question is old but has never been answered. Anyhow, two projects are looking into this issue:
Apache Slider (incubating): http://slider.incubator.apache.org/
and
Apache Myriad (incubating): http://myriad.incubator.apache.org/
Slider is mainly sponsored by Hortonworks while Myriad is a MapR / Mesosphere project with large assistance from PayPal.

Best practices for using Oozie for Hadoop

I have been using Hadoop for quite a while now. After some time I realized I need to chain Hadoop jobs and have some kind of workflow. I decided to use Oozie, but couldn't find much information about best practices. I would like to hear from more experienced folks.
Best Regards
The best way to learn Oozie is to download the examples tar file that comes with the distribution and run each of them. It has examples for MapReduce, Pig, and streaming workflows, as well as sample coordinator XMLs.
First run the plain workflows, and once you have debugged those, move on to running the workflows with a coordinator so that you take it step by step. Lastly, one best practice is to make most of the variables in your workflow and coordinator configurable and supply them through a component.properties file, so that you don't have to touch the XML often.
http://yahoo.github.com/oozie/releases/3.1.0/DG_Examples.html
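The payoff of parameterizing is that the workflow XML only holds `${...}` placeholders while the concrete values live in the properties file. The following minimal Java sketch mimics that substitution to show the idea; it assumes plain `${name}` placeholders only (Oozie's own EL engine also supports functions and is not reproduced here), and the property names, values, and class name are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal illustration of resolving ${name} placeholders from a properties file.
// Only plain names are handled; Oozie's EL functions are out of scope.
final class WorkflowParams {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replaces every ${name} in the template with the matching property value,
    // failing fast if a parameter has no value.
    static String resolve(String template, Properties props) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("No value supplied for parameter: " + name);
            }
            // quoteReplacement stops '$' or '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical contents of component.properties.
        String propertiesText = String.join("\n",
                "nameNode=hdfs://localhost:8020",
                "jobTracker=localhost:8032",
                "inputDir=/user/demo/input");
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));

        // Hypothetical workflow fragment that refers only to parameters.
        String fragment = String.join("\n",
                "<job-tracker>${jobTracker}</job-tracker>",
                "<name-node>${nameNode}</name-node>",
                "<value>${nameNode}${inputDir}</value>");
        System.out.println(resolve(fragment, props));
    }
}
```

Pointing the same workflow at a different cluster or input directory then means editing only the properties file, never the XML.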
There are documents about Oozie on github and apache.
https://github.com/yahoo/oozie/wiki
http://yahoo.github.com/oozie/releases/3.1.0/DG_Examples.html
http://incubator.apache.org/oozie/index.html
The Apache documentation is being updated and should be live soon.

Any tested Frameworks/Solutions similar to Apache Hadoop?

I am interested in the Apache Hadoop project, but I would like to know whether any other tested (please mind the 'tested') projects or frameworks are out there.
I'd appreciate any information or links to projects similar to Apache Hadoop, and any comments on Apache Hadoop from anyone who has used it.
Regards,
As mentioned in an answer to this question:
https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
MongoDB might be something you could look at. It's a scalable database that allows MapReduce algorithms to be run against it.
There are indeed open-source projects built on Hadoop.
See Apache Mahout for data mining: http://lucene.apache.org/mahout/
And are you aware of the other MR implementations available?
http://en.wikipedia.org/wiki/MapReduce#Implementations
Maybe. But none of them will have anywhere near the testing and real-world experience that Hadoop has. Companies like Facebook and Yahoo are paying to scale Hadoop, and I know of no similar open-source projects that are really worth looking at.
A possible way is to use org.apache.hadoop.hdfs.MiniDFSCluster and org.apache.hadoop.mapred.MiniMRCluster, which are used to test Hadoop itself.
What they do is launch a small cluster locally. To test your program, configure hdfs-site.xml and the related *-site.xml files to point at the local cluster, and add them to your classpath. This local cluster behaves just like any other cluster, only smaller. You can use hadoop/src/test/*-site.xml as templates.
For more examples, take a look at hadoop/src/test/.
There is a Hadoop-like framework (PrIter), built on top of Hadoop, that focuses on prioritized execution of iterative algorithms.
It is tested: I have run the WordCount example on it. It is very similar to Hadoop, especially the installation.
You can find the paper here:
http://rio.ecs.umass.edu/mnilpub/papers/socc11-zhang.pdf
and the code here:
https://code.google.com/p/priter/
Hope this helps
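The general idea of prioritized execution can be sketched outside any framework. Below is a single-machine Java illustration using single-source shortest paths; it is not PrIter's API or implementation, and the class and method names are hypothetical. A typical iterative MapReduce formulation of this problem relaxes every vertex in every iteration, whereas a prioritized version always processes the pending update expected to matter most, here the vertex with the smallest tentative distance, taken from a priority queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Single-machine illustration of prioritized iterative computation on
// single-source shortest paths: rather than relaxing every vertex in each
// iteration, always process the pending vertex with the smallest tentative
// distance first.
final class PrioritizedShortestPaths {

    // Adjacency list: graph.get(u) holds {target, weight} pairs (weights >= 0).
    static List<List<int[]>> emptyGraph(int vertexCount) {
        List<List<int[]>> graph = new ArrayList<>();
        for (int i = 0; i < vertexCount; i++) {
            graph.add(new ArrayList<>());
        }
        return graph;
    }

    static void addEdge(List<List<int[]>> graph, int from, int to, int weight) {
        graph.get(from).add(new int[] {to, weight});
    }

    // Returns the shortest distance from source to every vertex;
    // unreachable vertices keep Long.MAX_VALUE.
    static long[] shortestPaths(List<List<int[]>> graph, int source) {
        long[] dist = new long[graph.size()];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;

        // Entries are {tentativeDistance, vertex}; smaller distance = higher priority.
        PriorityQueue<long[]> queue = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        queue.add(new long[] {0, source});

        while (!queue.isEmpty()) {
            long[] top = queue.poll();
            long d = top[0];
            int u = (int) top[1];
            if (d > dist[u]) {
                continue; // stale entry: u has already been reached more cheaply
            }
            for (int[] edge : graph.get(u)) {
                int v = edge[0];
                long candidate = d + edge[1];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    queue.add(new long[] {candidate, v});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<List<int[]>> graph = emptyGraph(4);
        addEdge(graph, 0, 1, 4);
        addEdge(graph, 0, 2, 1);
        addEdge(graph, 2, 1, 2);
        addEdge(graph, 1, 3, 1);
        addEdge(graph, 2, 3, 5);
        System.out.println(Arrays.toString(shortestPaths(graph, 0)));
    }
}
```

`main` prints `[0, 3, 1, 4]`. With non-negative weights, each vertex's distance is final the first time it is popped with its current value, so the work concentrates on the updates that matter instead of on repeated full sweeps.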
