I would like to know: is it possible to integrate Apache Hadoop or the MapReduce framework into a grid computing environment?
Certainly it is possible; I have seen it done.
IBM does it with its Spectrum Symphony grid middleware platform.
For details on the solution read here: https://www.ibm.com/support/knowledgecenter/en/SSZUMP_7.1.2/sym_kc/sym_kc_managing_mapreduce_framework.html
I have 4 years of experience in .NET and would like to learn a new technology. Which would be better for me to learn: Hadoop or Salesforce?
There is no single answer to this question. Hadoop and Salesforce are completely different technologies: Hadoop is a distributed storage and processing framework that is great for big data, while Salesforce is a cloud-based CRM tool.
The question to ask yourself is: what do you want next? Are you looking for a steady job? Are you looking for a career in a specific field where one of these technologies would be more helpful? What do you want?
From reading about Akka and my own early use of it, it seems to me that Akka could be used, more simply than a Hadoop setup, for some applications. You wouldn't have HDFS available, but you could write an application that sends pieces of work out to different "mappers" and has the results sent to a "reducer", and it would be easier to set up than Hadoop in VMs or on hardware, with fewer services to configure.
Is this reasonable or are the two technologies used for totally different things?
Yes, totally reasonable. We have built a large scale (1000+ workers) map-reduce system using Akka 2.0. Akka 2.2+ is even better because you can use the clustering and remote deathwatch features instead of having to write that functionality yourself.
See this post to get a feel for how it might work.
Akka Cluster is currently marked experimental, but the Akka team say it's more or less ready for prime time and people are using it in production. Still, I would be cautious about going in this direction; you may instead want to consider Hadoop, or use ZooKeeper with Akka and ZeroMQ or a message queue for horizontal scaling.
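If it helps to see the shape of the mapper/reducer split, here is a minimal sketch using Akka classic actors (Java API). The class and message names are made up for illustration; a real system would add routing, supervision, and clustering on top.

    // Minimal "mappers feed a reducer" sketch with Akka classic actors.
    // All names (Chunk, PartialCount, Mapper, Reducer) are hypothetical.
    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    public class AkkaMapReduceSketch {

      // Work item sent to a mapper: a chunk of text to count words in.
      static final class Chunk {
        final String text;
        Chunk(String text) { this.text = text; }
      }

      // Partial result a mapper sends back to the reducer.
      static final class PartialCount {
        final long words;
        PartialCount(long words) { this.words = words; }
      }

      static class Mapper extends AbstractActor {
        @Override
        public Receive createReceive() {
          return receiveBuilder()
              // "Map" step: compute a partial result and reply to the sender (the reducer).
              .match(Chunk.class, chunk ->
                  getSender().tell(new PartialCount(chunk.text.split("\\s+").length), getSelf()))
              .build();
        }
      }

      static class Reducer extends AbstractActor {
        private long total = 0;

        @Override
        public Receive createReceive() {
          return receiveBuilder()
              // "Reduce" step: fold partial results into a running total.
              .match(PartialCount.class, partial -> {
                total += partial.words;
                System.out.println("total so far: " + total);
              })
              .build();
        }
      }

      public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("mini-map-reduce");
        ActorRef reducer = system.actorOf(Props.create(Reducer.class), "reducer");
        ActorRef mapper1 = system.actorOf(Props.create(Mapper.class), "mapper-1");
        ActorRef mapper2 = system.actorOf(Props.create(Mapper.class), "mapper-2");

        // Send chunks to mappers; replies go to the reducer because it is passed as the sender.
        // A real application would wait for completion and then terminate the system.
        mapper1.tell(new Chunk("the quick brown fox"), reducer);
        mapper2.tell(new Chunk("jumps over the lazy dog"), reducer);
      }
    }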
I'm trying to implement an algorithm to find connected components in a large graph (on the scale of a social network) using MapReduce. I'm not familiar with Hadoop, though I've heard it can be used for this. I need some direction on using it.
Look at Apache Giraph. It is a Hadoop-based framework for working with graph algorithms.
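For orientation, here is a minimal sketch of min-label propagation for connected components in the Giraph style: each vertex adopts the smallest vertex id it has seen and forwards it until nothing changes. Treat it as a sketch rather than a drop-in job; the exact base-class API varies between Giraph versions, and the job still needs input/output formats and configuration.

    // Sketch of min-label propagation for connected components in Giraph.
    // Each vertex keeps the smallest id it has seen as its component label.
    import java.io.IOException;

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.NullWritable;

    public class ConnectedComponents
        extends BasicComputation<IntWritable, IntWritable, NullWritable, IntWritable> {

      @Override
      public void compute(Vertex<IntWritable, IntWritable, NullWritable> vertex,
                          Iterable<IntWritable> messages) throws IOException {
        if (getSuperstep() == 0) {
          // Start with the vertex's own id as its component label and announce it.
          vertex.setValue(new IntWritable(vertex.getId().get()));
          sendMessageToAllEdges(vertex, vertex.getValue());
          vertex.voteToHalt();
          return;
        }

        // Adopt the smallest label received from any neighbour.
        int smallest = vertex.getValue().get();
        for (IntWritable message : messages) {
          smallest = Math.min(smallest, message.get());
        }

        if (smallest < vertex.getValue().get()) {
          vertex.setValue(new IntWritable(smallest));
          sendMessageToAllEdges(vertex, vertex.getValue());
        }
        vertex.voteToHalt();
      }
    }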
I have an academic course, "Middleware", which covers different aspects of distributed software systems, including an introduction to topics like distributed file systems. It also introduces HBase, Hadoop, MapReduce, HiveQL, and Pig Latin.
I want to know whether I can do a small project that integrates the above technologies. For starters, I am aware of the VM provided by Cloudera for getting a feel for Hadoop and playing around with it in Eclipse.
I was thinking along the lines of implementing an application that accepts a stream of events as input, analyzes it, and produces an output.
I have both Windows and Linux on my machine, with an i7 processor and 4 GB of RAM.
Please let me know how to get started with all of this; any suggestions for a simple example application are welcome.
Here is a blog post on analyzing Tweets using Hive/HDFS. And here is a blog post on performing Clickstream analytics using Pig and Hive.
Check some of the Big Data use cases here and try to solve an interesting problem.
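If you want something concrete to start with on the Cloudera VM, here is a minimal MapReduce sketch in Java that counts events by type. It assumes newline-delimited input where each line starts with an event type followed by a tab; the class names are placeholders.

    // Minimal MapReduce job: count occurrences of each event type.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCount {

      public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Emit (eventType, 1) for each input line ("eventType<TAB>payload").
          eventType.set(value.toString().split("\t", 2)[0]);
          context.write(eventType, ONE);
        }
      }

      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : values) {
            sum += v.get();
          }
          context.write(key, new LongWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }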
I am working on a social networking web application that uses the Apache web server and a MySQL database with the CodeIgniter MVC framework. I don't know how to integrate Hadoop into this application or how to write a map-reduce program.
Hadoop and map-reduce have no direct relationship to web applications. You should not integrate Hadoop into the web application itself, as long as by "web application" you mean something that responds (quickly) to user input (web requests).
Hadoop and map-reduce are very useful for algorithms that run on large datasets in order to transform/extract data/knowledge from those datasets.
While it is true that Hadoop is nowadays mostly used for "offline analytics", it can be useful to web projects as well. For example, to pre-compute recommendations or suggestions that are then provided to the users of a website.
Another use case is doing ETL from multiple data sources to produce an inverted index for a website (for example, jobs/cars/rentals websites with huge amounts of input data).
Always think of Hadoop when you have a "Big Data" problem, not if your website is managing small amounts of data.
Using Hadoop to tackle these sorts of problems has advantages and disadvantages. The obvious advantage is that it makes any batch process (like the examples I mentioned) scale transparently. The disadvantage is that it isn't real-time: you can't use Hadoop to update your website every 5 seconds.
I think Hadoop has two "classic" usages for social-network-style applications.
The first is using HBase to store messages and other dynamic information (a small client sketch follows after this answer). Storing user profiles in HBase could also be considered, so that MySQL is completely replaced by this kind of NoSQL solution.
The second is using Hadoop MapReduce to analyze your network. A good example of such analysis is generating friend suggestions.
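To make the messaging usage concrete, here is a small sketch of writing and reading a message with the HBase Java client. The table name, column family, and row-key scheme are made up for illustration.

    // Sketch: store and read a user message in HBase.
    // Table "messages", column family "m", and the row-key scheme are hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MessageStore {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table messages = connection.getTable(TableName.valueOf("messages"))) {

          // Row key "<userId>#<timestamp>" so one user's messages sort together.
          byte[] rowKey = Bytes.toBytes("user42#1700000000000");

          Put put = new Put(rowKey);
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("Hello!"));
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("to"), Bytes.toBytes("user7"));
          messages.put(put);

          Result result = messages.get(new Get(rowKey));
          System.out.println(Bytes.toString(
              result.getValue(Bytes.toBytes("m"), Bytes.toBytes("body"))));
        }
      }
    }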
Yes, it is possible to build a web application with Apache Hadoop as the back end.
You can build a web application on top of Apache Hive and Pig, and you can write custom mappers and reducers and use them as UDFs. But in my personal experience it is slow; if you have relatively little data, it is better to use another database and do the analytics there. I prefer Spark as the solution for better response time.
Use Hadoop to analyze your data and load the results into your MySQL database, then use that data in your web application.
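As a sketch of that flow, the snippet below reads a MapReduce text output file from HDFS and loads it into a MySQL table that the web application can then query. The HDFS path, JDBC URL, credentials, and table name are placeholders.

    // Sketch: copy MapReduce results ("key<TAB>value" lines) from HDFS into MySQL.
    // Path, JDBC URL, credentials, and table/column names are hypothetical.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadResultsIntoMysql {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path results = new Path("/user/webapp/job-output/part-r-00000");

        try (Connection db = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/socialapp", "appuser", "secret");
             PreparedStatement insert = db.prepareStatement(
                 "REPLACE INTO friend_suggestions (user_id, suggestions) VALUES (?, ?)");
             BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(results)))) {

          String line;
          while ((line = reader.readLine()) != null) {
            // MapReduce text output is "key<TAB>value".
            String[] parts = line.split("\t", 2);
            insert.setString(1, parts[0]);
            insert.setString(2, parts[1]);
            insert.executeUpdate();
          }
        }
      }
    }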
In your web application you can fetch the required data from Hadoop (such as job results) using its REST services: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
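For example, here is a small sketch of polling the ResourceManager REST API (the /ws/v1/cluster/apps endpoint documented at the link above) to list applications and their status. The host and port are placeholders for your cluster's ResourceManager web address.

    // Sketch: query the YARN ResourceManager REST API for application status.
    // "resourcemanager.example.com:8088" is a placeholder for your RM web address.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class YarnAppsClient {
      public static void main(String[] args) throws Exception {
        // /ws/v1/cluster/apps lists the applications known to the ResourceManager.
        URL url = new URL("http://resourcemanager.example.com:8088/ws/v1/cluster/apps");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
          StringBuilder json = new StringBuilder();
          String line;
          while ((line = reader.readLine()) != null) {
            json.append(line);
          }
          // The JSON response contains an "apps" array with state, progress, etc.
          System.out.println(json);
        } finally {
          conn.disconnect();
        }
      }
    }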