How to implement matrix factorization recommender using ALS in Hadoop? [closed] - hadoop

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am reading the ALS algorithm paper for collaborative filtering, but I am not sure how to implement the algorithm in Hadoop.
Can anyone shed some light on this? Thanks a lot.

I think the best description of how to implement ALS yourself in a distributed environment can be found in this web article: https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html. The implementation there is for Apache Flink, but it covers everything: the basic idea, a naive approach, a version using broadcast matrices, and a blocked implementation.
For an existing ALS implementation, I would recommend the one in Spark MLlib: https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html. It can run directly on your YARN cluster and read its data from HDFS/Hive.
If you need to keep your matrix factorization latent model up to date in near real time, or to provide online recommendations for anonymous users, take a look at the new Oryx project: https://github.com/OryxProject/oryx. It is actually called Oryx 2 and is a reincarnation of the previous Oryx, built on a lambda architecture. It is a nice recommender engine in which you should find interesting parts for your research.
Last but not least, I would advise building a simple proof-of-concept implementation of ALS on a single machine first, and only then going for a distributed implementation.
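As a starting point for such a single-machine proof of concept, here is a minimal sketch in plain Python. It is not the implementation from any of the projects mentioned above. The function names (`solve`, `update_factors`, `als`, `predict`, `objective`), the dictionary layout of the ratings, and the default values for `k`, `lam` and `iterations` are all assumptions made for illustration. It uses plain L2 regularization and fits only the observed ratings.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update_factors(ratings_by_row, fixed, k, lam):
    """Ridge-regression update of one side while the other side is held fixed."""
    updated = {}
    for row, rated in ratings_by_row.items():
        a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for col, r in rated:
            v = fixed[col]
            for p in range(k):
                b[p] += r * v[p]
                for q in range(k):
                    a[p][q] += v[p] * v[q]
        updated[row] = solve(a, b)  # (sum v v^T + lam I) x = sum r v
    return updated


def als(ratings, k=2, lam=0.1, iterations=20, seed=0):
    """ratings: dict mapping (user, item) -> rating (hypothetical layout)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(by_user, items, k, lam)  # fix items, solve users
        items = update_factors(by_item, users, k, lam)  # fix users, solve items
    return users, items


def predict(users, items, u, i):
    return sum(x * y for x, y in zip(users[u], items[i]))


def objective(ratings, users, items, lam):
    """Squared error on observed ratings plus the L2 penalty on all factors."""
    sse = sum((r - predict(users, items, u, i)) ** 2 for (u, i), r in ratings.items())
    vectors = list(users.values()) + list(items.values())
    return sse + lam * sum(x * x for vec in vectors for x in vec)
```

Each user update depends only on the fixed item factors and that user's own ratings (and the same holds for items), so the per-row solves are independent of each other. That independence is the property a distributed version can exploit.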

Related

optimal sequence to be followed for studying topics like dp [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am a novice at algorithms and data structures. I recently started participating on Codeforces, SPOJ, etc. To solve the problems there I need to study DP, greedy algorithms, graph algorithms, and data structures. What should my strategy, or rather my sequence, for studying be, and which data structures do I need to know for competitive programming?
All in all, there is no "optimal" sequence; it is all about understanding the topic. Since no two people learn at the same speed, there cannot be one "optimal" sequence. It is good, though, to learn the basic approaches of each topic.
There are a lot of tutorials out there that explain the fundamentals of any topic. For example, YouTube covers most graph problems, and DP and the like can be found there too. The Topcoder tutorials in particular have a lot you can learn from.
On the other hand, you will learn next to nothing if you do not have to think for yourself, so solving such puzzles is a must. I would recommend this site (especially for DP). Just check the "problem set" link on the site and look for dynamic programming.
I recommend this book: Competitive Programming, by Halim. It is very complete and beginner-friendly.

What are the good or most efficient algorithm used in collaborative filtering? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm currently working on a recommendation system that uses collaborative filtering, and I'm now looking for a good, efficient algorithm geared towards movie recommendation. I'm confused because there are many algorithms, such as the Pearson correlation coefficient, so I don't know which one to use or implement.
Can you suggest a good, efficient algorithm, or a site that gives a good example or simulation of one?
Thanks for the help!
Give this paper about the Netflix Prize a read: Netflix Prize. Usually the "state of the art" is some variant of matrix factorisation, such as OrdRec. Check out the Funk blog post, FunkSVD, which gives a nice, simple explanation of how to implement the beginnings of a matrix factorization technique for CF.
Matrix factorisation (An example in Python) is a good starting point. Furthermore, I'd recommend Ed Chen's blog and Mining of Massive Datasets as good introductions to the variety of methods used to solve this type of problem. Having worked with this type of data, the interesting thing for me is the amount of sparseness. There are of course practical limits; papers by Emmanuel Candès seem to shed light on this area and are excellent advanced work.
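To give a feel for what such a starting point looks like, here is a minimal sketch of matrix factorization trained with stochastic gradient descent, in the general spirit of the Funk approach. It is not the code from any of the linked posts. The function names (`train_mf`, `predict`, `rmse`), the `(user, item, rating)` tuple layout, and the values of `lr`, `reg` and `epochs` are assumptions chosen for illustration.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """ratings: list of (user_index, item_index, rating) tuples (hypothetical layout)."""
    rng = random.Random(seed)
    # Small positive initial factors, so early predictions are close to zero.
    p = [[rng.uniform(0.05, 0.15) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(0.05, 0.15) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, u, i):
    return sum(a * b for a, b in zip(p[u], q[i]))


def rmse(ratings, p, q):
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5
```

Only the observed ratings are visited during training, which is why this style of factorization copes with very sparse rating matrices.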

Is there any online judge for data mining [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
There are many online judges (OJs) for ACM/ICPC questions, and there is another online judge for interview questions, named LeetCode (http://leetcode.com).
I think these OJs are very useful for learning algorithms. I am now going to learn data mining algorithms. Is there any OJ for data mining questions?
Thank you very much.
There is MLcomp, where you can submit an algorithm and it will run it on a number of data sets to judge how well it is doing.
Plus, there is Kaggle, which hosts various classification competitions.
And of course you can take classes at Coursera. These are pretty much low level, but in order to get submission points you need to reproduce the known performance.
In particular, the first also allows you to run several standard algorithms, such as naive Bayes and SVM, and see how well they did. Obviously, your own implementation should then perform similarly.
Unfortunately, both are pretty much focused on machine learning (i.e. classification and regression). There is very little in the unsupervised domain, clustering and outlier detection. On unlabeled data, things get too hard even to evaluate locally, so doing any kind of online judging is pretty much unsolved. What you can do is largely a one-class classification, or you just strip labels before running the algorithm.
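The label-stripping idea can be sketched roughly as follows: the labels are hidden from the clustering step and used only afterwards for scoring. Everything here is an assumption for illustration: the toy data, the tiny `two_means_1d` stand-in for a real clustering algorithm, and the choice of purity as the score.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return majority / len(true_labels)


def two_means_1d(xs, iterations=10):
    """Tiny stand-in clustering algorithm: 2-means on 1-D points."""
    centers = [min(xs), max(xs)]
    assign = [0] * len(xs)
    for _ in range(iterations):
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
        for c in (0, 1):
            members = [x for x, a in zip(xs, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign


# Hypothetical labeled toy data.
labeled = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (5.0, "b"), (5.3, "b"), (4.9, "b")]
features = [x for x, _ in labeled]        # labels stripped before clustering
hidden_labels = [y for _, y in labeled]   # kept aside only for scoring
clusters = two_means_1d(features)
score = purity(clusters, hidden_labels)
```

A score like this only tells you how well the clusters line up with one particular labeling, which is part of why judging unsupervised methods automatically is hard.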

Algorithm vs Code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I came across a statement in a software best-practices guide that algorithm and code shouldn't get mixed up. I'm not sure what is meant by this. As far as I understand, code is the implementation of the algorithm, isn't it? So what exactly is meant by this statement, and why is it considered good practice?
Thank You!
The context in which the author said this would be clearer if you had pasted the surrounding lines.
What it means to me, though, is that an algorithm is just the clear, step-by-step logic that you then implement. While you write or design the algorithm, you leave out the finer implementation details, such as the choice of the right data structure.
A good explanation can be found here:
An algorithm is a series of steps for solving a problem, completing a task or performing a calculation. Algorithms are usually executed by computer programs but the term can also apply to steps in domains such as mathematics for human problem solving.
Code is a series of steps that machines can execute. In many cases, code is composed in a high level language that is then automatically translated into instructions that machines understand.
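To make the distinction concrete, the sketch below states an algorithm as language-independent steps in the comments, followed by one possible implementation of it. Binary search is used purely as an illustration and is not taken from the guide the question refers to; the function name `binary_search` and the choice of returning -1 for a missing value are assumptions.

```python
# Algorithm (language-independent steps):
#   1. Keep a range of candidate positions in a sorted sequence.
#   2. Look at the middle element of the range.
#   3. If it equals the target, stop and report its position.
#   4. Otherwise discard the half that cannot contain the target and repeat.
#   5. If the range becomes empty, the target is absent.

# Code (one possible implementation of those steps):
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1              # range is empty: target not present
```

The steps stay the same whether the sequence is a Python list, a C array, or a file on disk; details such as index arithmetic and the return convention belong to the code.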

Whats the importance of data structures and algorithms for programming? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Possible Duplicate:
Why should I learn algorithms?
Hello, I'm a curious beginner and I don't understand how algorithms and data structures are useful in programming. Are they crucial for being a good programmer? Why should I learn them, and how do they actually help me when writing code?
Thanks so much!
They help you write efficient code and solve problems in optimal or near-optimal ways. Without them, you will be reinventing the wheel - not always successfully.
Also, they help you structure your code, so that it can be maintained more easily by encouraging a better design / implementation.
Algorithms and data structures are the basic tools of a programmer. They are as essential as a hammer (or nail gun) to a house framer. They are the tools that solve problems so you don't have to reinvent the solution.
You should understand what they are, why and how they work, and what their shortcomings are. Knowing this will save you a huge amount of time that could be wasted trying to solve a problem that has a solution.
You can get by programming without being proficient in a particular language, but you cannot program without knowledge of data structures. Data structures are more of a Computer Science obsession. Each problem has its own ideal data structure that fits it naturally, and to manipulate the data in that structure you need algorithms.
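As a small illustration of a data structure fitting a problem, consider checking a collection for duplicates. The two function names below are hypothetical and the example is only a sketch. Both functions return the same answers, but the second relies on a hash set, so each membership check takes roughly constant time on average instead of a scan over the remaining elements.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: simple, but the work grows with the square of the size."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_with_set(items):
    """Remember seen values in a set: one pass over the data."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

The set-based version assumes the items are hashable; for unhashable items the pairwise version, or sorting first, would be the fallback.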
