Outliers in Data mining - cluster-computing

I am learning about outlier-detection algorithms. Could you help me answer these questions? I am confused.
Thank you!

Related

What are good or efficient algorithms for collaborative filtering? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm currently working on a recommendation system that uses collaborative filtering, and I'm looking for a good, efficient algorithm geared towards movie recommendation. I'm confused because there are many algorithms to choose from, such as the Pearson correlation coefficient, so I don't know which to use or implement.
Can you suggest a good, efficient algorithm, or a site that gives a good example or simulation of one?
Thanks for the help!
Give this paper about the Netflix Prize a read: Netflix Prize. Usually the 'state of the art' is some variant of matrix factorisation, such as OrdRec. Check out the FunkSVD post on the Funk blog, which gives a nice, simple explanation of how to start implementing a matrix factorisation technique for CF.
Matrix factorisation (An example in Python) is a good starting point. Furthermore, I'd recommend Ed Chen's Blog and The Mining of Massive Datasets as good introductions to the variety of methods used to solve this type of problem. Having worked with this type of data, the interesting thing for me is how sparse it is. There are of course practical limits, and papers by Emmanuel Candès seem to shed light on this area; they are excellent advanced work.
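To make the matrix factorisation idea above more concrete, here is a minimal sketch in plain Python of FunkSVD-style training with stochastic gradient descent. It is only an illustration under stated assumptions: the function names, the hyperparameters (`n_factors`, `lr`, `reg`, `n_epochs`) and the toy ratings are made up for this example and are not taken from the linked posts or the Netflix Prize paper.

```python
import random

# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
TOY_RATINGS = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 3, 1.0), (1, 2, 2.0),
    (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def funk_svd(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             n_epochs=500, seed=0):
    """Learn user factors P and item factors Q from the observed ratings only."""
    rng = random.Random(seed)
    # Small random initial factors (an assumption; other schemes are common).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the given ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


if __name__ == "__main__":
    P, Q = funk_svd(TOY_RATINGS, n_users=3, n_items=4)
    print("training RMSE:", rmse(TOY_RATINGS, P, Q))
    print("prediction for unseen pair (user 0, item 2):", predict(P, Q, 0, 2))
```

Only the observed ratings are visited during training, which is what lets this approach cope with very sparse data. A real system would also hold out ratings for validation and usually add bias terms, both of which this sketch leaves out.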

Introduction to algorithms [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
How does reading the book Introduction to Algorithms (CLRS) help me? How is learning this course connected with the other areas of theoretical computer science? (I mean intuitions and insights, if any, that I could get.)
I'm new to these concepts. I am getting bored of the sorting algorithms that I am learning in the course right now. I wanted to have a broader view while learning the course. It would be very helpful to me if you could provide me with a structure on how things go. Thanks in advance! :)
Algorithms are the practical application of theoretical knowledge in computer science; they're the most theoretical part of the engineering side of computer science, so to speak. Without the study of algorithms, anyone in software would either be an amateur - because computation is useless without efficiency - or wouldn't produce much of anything since he would have to focus on solving problems all the time instead of actually writing implementations that are known to solve problems.
From a didactic point of view, algorithms are a distillation of theoretical knowledge into a precise expression. You may understand what graph traversal is and how strongly connected components should be contracted; if you try to give a succinct form to those thoughts, the best way to do it is writing down an algorithm that does what you want.
On a formal level, they help us understand the concepts we grapple with; when we claim some problem can be solved in this or that complexity, we need an algorithm to prove it. For example, if you read that sorting is in O(n log n) in the general case, you can just go ahead and believe your professor; maybe you even have an intuition why that might be true. But to actually prove it, you need an algorithm that solves sorting for which you then prove that it runs in O(n log n) in the general case. So on the theoretical level, algorithms help us classify problems according to their complexity (read: "difficulty").
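As a concrete instance of the point above, merge sort is one algorithm commonly used to show that sorting can be done in O(n log n). The sketch below is a hedged illustration rather than a proof: it counts element comparisons and checks them against the known worst-case bound of n * ceil(log2 n) for top-down merge sort. The names `merge_sort`, `comparisons_for` and the one-element `counter` list are hypothetical helpers introduced here.

```python
import math
import random


def merge_sort(items, counter):
    """Return a sorted copy of items; counter[0] accumulates element comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged = []
    i = j = 0
    # Merge step: each loop iteration performs exactly one comparison.
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def comparisons_for(n, seed=0):
    """Sort n pseudo-random numbers and return how many comparisons were made."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    counter = [0]
    result = merge_sort(data, counter)
    assert result == sorted(data)
    return counter[0]


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        bound = n * math.ceil(math.log2(n))
        print(n, comparisons_for(n), bound)
```

Running it shows the comparison count staying under the bound as n grows. The actual proof comes from solving the recurrence for the merge step, which the measurement only illustrates.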
I'm not really sure that this question has a specific answer and that this is the right place to ask it, but it is still a useful one. Aside from trusting the people that have spent much of their lives guiding people to learn a skill set they will use for the rest of their lives (your professors), I have always looked at algorithm design as a way to learn how to think more clearly. This is something I believe everyone can learn from.
Also, when I was a student there were many times I was frustrated with what I was being asked to learn, believing it to be a waste. I have since found virtually all of it to be very useful, and I use it frequently. Thinking back, I wish I had given some of my professors much more credit than I did when I was in school.

Resource for learning Algorithms for non-CS/Math degrees [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I've been asked to recommend a resource (on-line, book or tutorial) to learn Algorithms (in the sense of the MIT Intro to Algorithms) for non-CS or Math majors. Obviously the MIT book is way too involved, and some of the lighter treatments (like O'Reilly's Algorithms in a Nutshell) still seem as if you would need to have some background in algorithmic analysis. Is there a resource that presents the material in a way that developers who do not have a background in theoretical computer science will find useful?
I think the best way to learn algorithms is through the various competition sites.
USACO - my personal favorite, as it gives a clear path through the material
TopCoder - already mentioned
Sphere Online Judge - great if you want to work in a language other than C/C++/Java
As far as books, the best single intro I've seen for the non-math specialist is Data Structures and Algorithms. It takes you through an algorithm line by line and shows you how it decomposes mathematically, something CLRS's otherwise excellent analysis section is a little less clear on.
Skiena's Algorithm Design Manual is also excellent, as is his Programming Challenges, which is essentially a tutorial through the Valladolid Online Judge.
Honestly, though, I think the single most helpful thing a beginner can do is to implement the various algorithms -- merge sort, say, followed by Quicksort -- and time them against variously sized inputs. Create a spreadsheet with a graph that shows their growth over time. Very few non-specialists will have the patience or the know-how to set up a recurrence relation and solve their way through it. But you must understand the effect of, say, O(n^2) growth over time, and there's no better way to learn this than to watch your own program blow through its memory stack. :)
I say this as a non-CS, non-math programmer who has spent a good couple of months wrapping my mind around algorithmic analysis.
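The timing experiment suggested in this answer could be sketched as follows in plain Python. Some choices here are assumptions for illustration: insertion sort stands in as the O(n^2) algorithm, the quicksort is a simple list-based version rather than an in-place one, and the input sizes and the helper names `time_sort` and `growth_table` are made up.

```python
import random
import time


def insertion_sort(items):
    """O(n^2) on average: shift each element left until it is in place."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def quicksort(items):
    """O(n log n) on average: partition around the middle element, then recurse."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def time_sort(sort_fn, n, seed=0):
    """Return the seconds sort_fn takes on n pseudo-random numbers."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed = time.perf_counter() - start
    assert result == sorted(data)  # sanity check, outside the timed region
    return elapsed


def growth_table(sizes=(250, 500, 1000, 2000)):
    """One row per input size: (n, insertion sort seconds, quicksort seconds)."""
    return [(n, time_sort(insertion_sort, n), time_sort(quicksort, n)) for n in sizes]


if __name__ == "__main__":
    # CSV output that can be pasted into a spreadsheet and graphed.
    print("n,insertion_sort_s,quicksort_s")
    for n, t_ins, t_quick in growth_table():
        print(f"{n},{t_ins:.5f},{t_quick:.5f}")
```

Each doubling of n should roughly quadruple the insertion sort time while the quicksort time grows only a little faster than linearly, although individual measurements will be noisy.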
I'd go for the Algorithm Design Manual, by Steven Skiena. It's very readable and starts with the basics in an easy-to-understand way. For example, it explains big-O notation very well. The emphasis is on practical application, which is a big bonus for beginners coming from a non-theoretical field.
The second half of the book is a reference of common algorithm problems and practical approaches to their solutions. I found it invaluable as a learning aid, and now as a reference.
I'm not sure which MIT book you're referring to, but the canonical text is CLRS. I don't think it really assumes any background besides high school math.
Personally, I found doing TopCoder algorithm competitions over the course of the past few years to be the best way for me to learn common algorithms and put them into practice. Perhaps you should try the same. Whatever you do, I suggest that you spend a lot more hands-on-keyboard time implementing things you learn than head-in-book time, because that's the way to really internalize different techniques.

Pointers to some good SVM Tutorial [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I have been trying to grasp the basics of Support Vector Machines, and have downloaded and read many online articles, but I am still not able to grasp them.
I would like to know if there are any
nice tutorials
sample code which can be used for understanding
or anything else you can think of that will enable me to learn SVM basics easily.
PS: I somehow managed to learn PCA (Principal Component Analysis).
BTW, you guys would have guessed that I am working on Machine Learning.
The standard recommendation for a tutorial in SVMs is A Tutorial on Support Vector Machines for Pattern Recognition by Christopher Burges. Another good place to learn about SVMs is the Machine Learning Course at Stanford (SVMs are covered in lectures 6-8). Both these are quite theoretical and heavy on the maths.
As for source code: SVMLight, libsvm and TinySVM are all open-source, but the code is not very easy to follow. I haven't looked at each of them very closely, but the source for TinySVM is probably the easiest to understand. There is also a pseudo-code implementation of the SMO algorithm in this paper.
This is a very good beginner's tutorial on SVM:
SVM explained
I always thought StompChicken's recommended tutorial was a bit confusing in the way that they jump right into talking about bounds and VC statistics and trying to find the optimal machine and such. It's good if you already understand the basics, though.
Lots of video lectures on SVM:
http://videolectures.net/Top/Computer_Science/Machine_Learning/Kernel_Methods/Support_Vector_Machines/
I found the one by Colin Campbell to be very useful.
A practical guide to SVM classification for libsvm
PyML Tutorial for PyML
I think 1 is practical for use, 3 is clear for understanding.
Assuming you know the basics (e.g. max margin classifiers, constructing a kernel), solve Problem Set 2 (handout #5) of that Stanford machine learning course. There are answer keys, and he holds your hand through the whole process. Use lecture notes 3 and videos #7-8 as references.
If you don't know the basics, watch earlier videos.
I would grab a copy of R, install the e1071 package which nicely wraps libsvm, and try to get good results on your favorite data sets.
If you just figured out PCA, it might be informative to look at data with many more predictors than cases (e.g., microarray gene expression profiles, time series, spectra from analytical chemistry, etc.) and compare linear regression on the PCA'd predictors with SVM on the raw predictors.
There are a lot of great references in the other answers, but I think there's value in playing around with the black box before you read what's inside.
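For readers who want something small to play with before installing e1071 or libsvm, here is a minimal sketch of a linear SVM in plain Python. It is not what those libraries do internally: it has no kernels and trains by simple stochastic subgradient descent on the regularised hinge loss instead of SMO. The function names, the hyperparameters (`lam`, `lr`, `n_epochs`) and the toy data are all assumptions made for this example.

```python
import random

# Hypothetical, linearly separable toy data with labels +1 and -1.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]


def decision_value(w, b, x):
    """Signed score w.x + b; larger magnitude means further from the boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class label (+1 or -1) from the sign of the decision value."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def train_linear_svm(points, labels, lam=0.01, lr=0.01, n_epochs=200, seed=0):
    """Minimise lam/2 * |w|^2 + mean hinge loss by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    for _ in range(n_epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = points[idx], labels[idx]
            margin = y * decision_value(w, b, x)
            # Regularisation shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge loss contributes only when the point is inside the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


if __name__ == "__main__":
    w, b = train_linear_svm(POINTS, LABELS)
    print("w =", w, "b =", b)
    print("prediction for (4, 4):", predict(w, b, (4.0, 4.0)))
```

On real data sets a library implementation is the better choice. The value of a sketch like this is mainly in seeing that only points inside the margin change the weights, which is the intuition behind support vectors.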

Blogs to freshen up my math (in practice) [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
My question, his question, but blogs as resources to be specific.
I find blogs great to keep up to date... refresh material...
So do you know any blogs that tackle math-related programming problems...
Not exactly what you're asking for, but Project Euler freshens up my math skills.
Following MIT Open Courseware is another good computer-based way to learn and practice these skills.
Good Math, Bad Math is my favorite regular math blog.
Steve Yegge's post Math for Programmers gives a pretty decent rundown of what math is important for programmers to understand.
I also like to keep an eye on the math subreddit.
Better Explained has several good articles.
This blog has some interesting math-related things. Some of them are pretty high-level. You've been warned.
Not exactly a blog, but: Notices of the American Mathematical Society
I blog about programming and math, especially probability and statistics, at The Endeavour.
If you want highly lucid explanations and discussions of mathematics related to computer science, then this blog, Developing for Developers, is superb. [No posts for a while, but the previous posts are great.]
It sometimes goes off on tangents but this guy blogs about math software a lot
www.walkingrandomly.com
More along the lines of Project Euler than a blog, William Wu has quite a number of math and CS challenges.
MathPuzzle is one of my favorite sites on math. It may seem not directly related to problem-solving, but games are an excellent way of learning.
This Week's Finds in Mathematical Physics is well worth a look, though whether it's maths depends on where you draw the borderline.
Good question. I'm surprised to see that nobody has mentioned Wikipedia so far. However, many articles, especially those about higher math, are written by experts and are overloaded with details, which is not ideal if you just want to learn a bit. Still, Wikipedia is something that I use regularly to look up math questions.
When it comes to a specific math subject you may also want to read a book. ;)
Found this one with the help of stackoverflow... :)
link
not before you have your first coffee I think...
