Depth First Search using Map Reduce - algorithm

I have successfully implemented a shortest-path algorithm in Hadoop MapReduce (breadth-first search). However, I have a question:
Is it possible to do a depth-first search graph traversal using Hadoop MapReduce?
Any links?

The nature of depth-first search makes it a poor fit for MapReduce jobs: you follow a single strict path to its end before backtracking and forking into another one, so there is never a large frontier of independent work to distribute, and you can't properly use the scalability Hadoop provides. I'm not aware of a solid implementation, and I'm fairly sure you won't find one that uses the MapReduce paradigm in a good way.
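To make the sequential dependency concrete, here is a minimal single-machine DFS in plain Java (no Hadoop involved). Every step depends on the node popped in the step before, which is exactly the chain of dependencies that a BFS frontier does not have:

    import java.util.*;

    // Minimal sequential DFS: each step depends on the previous pop, so
    // there is no independent "frontier" to hand out to parallel map tasks.
    // In BFS, by contrast, every node of the current level can be processed
    // by a separate mapper within one MapReduce iteration.
    public class DfsSketch {
        static List<Integer> dfs(Map<Integer, List<Integer>> graph, int start) {
            List<Integer> order = new ArrayList<>();
            Set<Integer> visited = new HashSet<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                int node = stack.pop();            // strictly one node at a time
                if (!visited.add(node)) continue;  // skip already-visited nodes
                order.add(node);
                for (int next : graph.getOrDefault(node, List.of()))
                    stack.push(next);              // children wait until this path is exhausted
            }
            return order;
        }

        public static void main(String[] args) {
            Map<Integer, List<Integer>> g = Map.of(
                0, List.of(1, 2), 1, List.of(3), 2, List.of(3));
            System.out.println(dfs(g, 0)); // e.g. [0, 2, 3, 1]
        }
    }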
If you want to implement graph algorithms on Hadoop on your own, you might want to have a look at some useful frameworks like Apache Giraph, xrime or Pegasus. xrime also contains a shortest-path implementation, which might be interesting for you.

Related

Elbow method for determining number of clusters on Mahout

I'm using Mahout for clustering, and I have implemented the elbow method for determining the number of clusters, so that I don't have to specify it manually.
I have tried this on one machine, but now I'm having doubts when it comes to a cluster of computers.
I plan to use Oozie to simulate looping (running the clustering algorithm each time while incrementing the number of clusters by one). I have read that Oozie is meant for DAGs, but I see no other way of doing this.
The question is, does this look like a sound approach? If not, any alternatives?
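For concreteness, the loop I want to simulate looks roughly like this when run from a plain driver instead of Oozie; runKMeans and withinClusterCost are hypothetical placeholders for the Mahout job submission and for reading the resulting cost back from HDFS:

    // Rough sketch of the elbow-method loop. runKMeans() and
    // withinClusterCost() are hypothetical placeholders for submitting the
    // Mahout clustering job and reading its cost output back from HDFS.
    public class ElbowDriver {
        public static void main(String[] args) {
            int maxK = 20;
            double[] cost = new double[maxK + 1];
            for (int k = 1; k <= maxK; k++) {
                runKMeans(k);                    // one clustering job per k
                cost[k] = withinClusterCost(k);  // e.g. sum of squared distances to centroids
            }
            // Pick the "elbow": the k after which the improvement flattens most.
            int bestK = 1;
            double bestDrop = Double.NEGATIVE_INFINITY;
            for (int k = 2; k < maxK; k++) {
                double drop = (cost[k - 1] - cost[k]) - (cost[k] - cost[k + 1]);
                if (drop > bestDrop) { bestDrop = drop; bestK = k; }
            }
            System.out.println("elbow at k = " + bestK);
        }

        static void runKMeans(int k) { /* hypothetical: submit k-means with k clusters */ }
        static double withinClusterCost(int k) { /* hypothetical: read cost */ return 0; }
    }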

Hadoop for password cracking

Hi, I came across this article and it made me wonder: how easy would it be for a hacker to crack passwords this way? What do you think?
If you want to try out several permutations in a brute-force manner, I don't think that using Hadoop would give you any benefit. Hadoop is not something that fits every use case, and it does not perform well in every scenario.
Computing permutations can be done in plain batches: just set different start and end parameters for each machine. The overhead involved in setting up a job, moving data across nodes, and cleaning up afterwards can then be saved entirely. I have seen that running separate processes over 5 nodes, pre-dividing the load equally, performed quite well compared to MapReduce. Of course, I don't mean that MapReduce is bad; it's just that this scenario wasn't the right fit for it.
I found this Recursive Algorithm on Distributed Systems an interesting way to run recursive algorithms on a distributed system. Permutation and combination algorithms can then be used to do some interesting things.
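To illustrate the start/end-parameter idea, here is a minimal sketch. The alphabet, candidate length, node count and node index are made-up example values, and the actual hashing and comparison are left as a comment:

    // Sketch of the "different start and end params per machine" idea: the
    // keyspace of fixed-length candidates over an alphabet maps onto the
    // integer range [0, alphabet^length), split evenly across the nodes.
    public class KeyspaceShard {
        static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

        // Decode the i-th candidate (base-|ALPHABET| representation of i).
        static String candidate(long index, int length) {
            char[] out = new char[length];
            for (int pos = length - 1; pos >= 0; pos--) {
                out[pos] = ALPHABET[(int) (index % ALPHABET.length)];
                index /= ALPHABET.length;
            }
            return new String(out);
        }

        public static void main(String[] args) {
            int length = 4, nodes = 5, myNode = 2;       // made-up example values
            long total = (long) Math.pow(ALPHABET.length, length);
            long start = total * myNode / nodes;          // this machine's slice
            long end = total * (myNode + 1) / nodes;
            for (long i = start; i < end; i++) {
                String guess = candidate(i, length);
                // ... hash `guess` and compare it against the target here ...
            }
        }
    }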

Is there any Hadoop implementation of the Louvain method?

This is the Louvain method for finding communities in a social graph:
https://sites.google.com/site/findcommunities/
I want to run it on a big graph.
If you are not stuck on Hadoop, I saw this implementation for Apache Spark.
https://github.com/Sotera/spark-distributed-louvain-modularity
I don't know of a Hadoop implementation of this clustering method, which looks to be based on modularity. The main source of clustering algorithms in the Hadoop ecosystem is Mahout.
Take a look here: https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
Perhaps one of the clustering algorithms listed would work or provide the basis for your own implementation.
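For reference, modularity, the score the Louvain method greedily maximizes, is straightforward to compute on a small graph. A minimal single-machine sketch for an unweighted, undirected graph given as an adjacency matrix:

    // Modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * [c_i == c_j],
    // the quantity the Louvain method greedily maximizes, computed here
    // for a small unweighted, undirected graph.
    public class Modularity {
        static double modularity(int[][] adj, int[] community) {
            int n = adj.length;
            double twoM = 0;                     // 2m = sum over all degrees
            double[] degree = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) { degree[i] += adj[i][j]; twoM += adj[i][j]; }
            double q = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (community[i] == community[j])
                        q += adj[i][j] - degree[i] * degree[j] / twoM;
            return q / twoM;
        }

        public static void main(String[] args) {
            // Two triangles joined by one edge; each triangle is a community.
            int[][] adj = {
                {0,1,1,0,0,0}, {1,0,1,0,0,0}, {1,1,0,1,0,0},
                {0,0,1,0,1,1}, {0,0,0,1,0,1}, {0,0,0,1,1,0}};
            System.out.println(modularity(adj, new int[]{0,0,0,1,1,1})); // ~0.357
        }
    }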

A good example in Hadoop that needs iteration

I am currently implementing a parallel-for on Hadoop that iterates the mapper a number of times, as specified by the user. Can someone suggest a useful example that I can use to test my implementation, i.e. an application in Hadoop that needs iteration of the mapper function?
Thank you
The simplest one is implementing the Apriori algorithm, which is used to find frequent itemsets.
What exactly do you mean by "iteration of the mapper"? I have an example of starting a job recursively (on the output of the last job).
Have a look here; it explains a simple graph mindist-search / graph exploration algorithm: http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html
A slightly more generic version is here:
http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html
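The driver side of that recursion boils down to a loop like the following rough sketch. The identity Mapper/Reducer classes and the "convergence"/"UPDATED" counter are placeholders for your own job, which would increment the counter from its tasks while work remains:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Driver-side recursion: the output of iteration i becomes the input of
    // iteration i + 1. With the identity classes used here the counter is
    // never incremented, so this stops after one pass; a real mapper would
    // increment it while there is still work left to do.
    public class IterativeDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path input = new Path(args[0]);
            for (int i = 0; i < 10; i++) {             // hard upper bound on iterations
                Path output = new Path(args[1] + "/iter-" + i);
                Job job = Job.getInstance(conf, "iteration " + i);
                job.setJarByClass(IterativeDriver.class);
                job.setMapperClass(Mapper.class);      // placeholder: identity mapper
                job.setReducerClass(Reducer.class);    // placeholder: identity reducer
                FileInputFormat.addInputPath(job, input);
                FileOutputFormat.setOutputPath(job, output);
                if (!job.waitForCompletion(true)) break;
                long updated = job.getCounters()
                        .findCounter("convergence", "UPDATED").getValue();
                if (updated == 0) break;               // converged: nothing changed
                input = output;                        // recurse on the last output
            }
        }
    }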
There are plenty of examples in data mining. You could try one of the clustering algorithms, for example.

Sort and shuffle optimization in Hadoop MapReduce

I'm looking for a research/implementation-based project on Hadoop, and I came across the list posted on the wiki page: http://wiki.apache.org/hadoop/ProjectSuggestions. But this page was last updated in September 2009, so I'm not sure whether some of these ideas have already been implemented or not. I was particularly interested in "Sort and Shuffle optimization in the MR framework", which talks about "combining the results of several maps on rack or node before the shuffle. This can reduce seek work and intermediate storage".
Has anyone tried this before? Is this implemented in the current version of Hadoop?
There is the combiner functionality (as described under the "Combine" section of http://wiki.apache.org/hadoop/HadoopMapReduce), which is more or less an in-memory shuffle. But I believe that the combiner only aggregates key-value pairs for a single map task, not all the pairs for a given node or rack.
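Wiring up a combiner takes one extra line in the job setup. This minimal word count uses the stock TokenCounterMapper and IntSumReducer classes that ship with Hadoop, with the latter doubling as the per-map-task combiner:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    // Word count with a combiner: the combiner pre-aggregates the pairs
    // emitted by one map task before the shuffle, but it never sees pairs
    // from other map tasks running on the same node or rack.
    public class WordCountWithCombiner {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountWithCombiner.class);
            job.setMapperClass(TokenCounterMapper.class);   // emits (word, 1)
            job.setCombinerClass(IntSumReducer.class);      // per-map-task pre-aggregation
            job.setReducerClass(IntSumReducer.class);       // global aggregation
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }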
The project description is aimed at "optimization": the combining itself is already present in current Hadoop MapReduce, and the idea is to extend it so the shuffle can run in a lot less time.
Sounds like a valuable enhancement to me.
I think it is a very challenging task. As I understand it, the idea is to build a computation tree instead of a "flat" map-reduce. A good example of this is Google's Dremel engine (now called BigQuery). I would suggest reading this paper: http://sergey.melnix.com/pub/melnik_VLDB10.pdf
If you are interested in this kind of architecture, you can also take a look at the open-source clone of this technology, Open Dremel:
http://code.google.com/p/dremel/
